xen-devel.lists.xenproject.org archive mirror
* [PATCH v3 1/2] x86/mem-sharing: Bulk mem-sharing entire domains
@ 2016-05-12 15:25 Tamas K Lengyel
  2016-05-12 15:25 ` [PATCH v3 2/2] tests/mem-sharing: Add bulk option to memshrtool Tamas K Lengyel
  2016-05-13 12:00 ` [PATCH v3 1/2] x86/mem-sharing: Bulk mem-sharing entire domains Jan Beulich
  0 siblings, 2 replies; 15+ messages in thread
From: Tamas K Lengyel @ 2016-05-12 15:25 UTC (permalink / raw)
  To: xen-devel; +Cc: Tamas K Lengyel

Currently mem-sharing can be performed on a page-by-page basis from the control
domain. However, when completely deduplicating (cloning) a VM, this requires
at least 3 hypercalls per page. As the user has to loop through all pages up
to max_gpfn, this process is very slow and wasteful.

This patch introduces a new mem_sharing memop for bulk deduplication, where
the user doesn't have to separately nominate each page in both the source and
destination domains, and the looping over all pages happens in the hypervisor.
This significantly reduces the overhead of completely deduplicating entire
domains.
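
(For illustration only, not part of the patch: a minimal sketch of how a
toolstack caller might drive the new wrapper added below, assuming the client
domain was cloned from the source domain and both are already paused with
sharing enabled.)

    #include <errno.h>
    #include <stdio.h>
    #include <string.h>
    #include <xenctrl.h>

    /* Hypothetical helper: deduplicate client domain 'cd' against 'd'. */
    static int bulk_dedup(xc_interface *xch, domid_t d, domid_t cd)
    {
        uint64_t shared = 0;
        int rc = xc_memshr_bulk_share(xch, d, cd, &shared);

        if ( rc )
            fprintf(stderr, "bulk share failed: %s\n", strerror(errno));
        else
            printf("%lu pages now shared\n", (unsigned long)shared);

        return rc;
    }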

Signed-off-by: Tamas K Lengyel <tamas@tklengyel.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
---
Ian Jackson <ian.jackson@eu.citrix.com>
George Dunlap <george.dunlap@eu.citrix.com>
Jan Beulich <jbeulich@suse.com>
Andrew Cooper <andrew.cooper3@citrix.com>

v3: Bail if domains are not paused
    Rename bulk_share struct to just bulk
    Return -ENOMEM error if nomination fails
    Return total number of shared pages (not keeping separate count)
v2: Stash hypercall continuation start point in xen_mem_sharing_op_t
    Return number of successfully shared pages in xen_mem_sharing_op_t
---
 tools/libxc/include/xenctrl.h |  15 +++++++
 tools/libxc/xc_memshr.c       |  19 ++++++++
 xen/arch/x86/mm/mem_sharing.c | 100 ++++++++++++++++++++++++++++++++++++++++++
 xen/include/public/memory.h   |  14 +++++-
 4 files changed, 147 insertions(+), 1 deletion(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index dc54612..29ff13e 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2327,6 +2327,21 @@ int xc_memshr_add_to_physmap(xc_interface *xch,
                     domid_t client_domain,
                     unsigned long client_gfn);
 
+/* Allows the entire memory of a client domain to be deduplicated in bulk.
+ * Using this function is equivalent to calling xc_memshr_nominate_gfn for
+ * each gfn in the two domains followed by xc_memshr_share_gfns. If successful,
+ * returns the number of shared pages in 'shared'. Both domains must be paused.
+ *
+ * May fail with -EINVAL if the source and client domains have different
+ * memory sizes or if memory sharing is not enabled on either domain.
+ * May also fail with -ENOMEM if there isn't enough memory available to store
+ * the sharing metadata before deduplication can happen.
+ */
+int xc_memshr_bulk_share(xc_interface *xch,
+                         domid_t source_domain,
+                         domid_t client_domain,
+                         uint64_t *shared);
+
 /* Debug calls: return the number of pages referencing the shared frame backing
  * the input argument. Should be one or greater. 
  *
diff --git a/tools/libxc/xc_memshr.c b/tools/libxc/xc_memshr.c
index deb0aa4..71350d2 100644
--- a/tools/libxc/xc_memshr.c
+++ b/tools/libxc/xc_memshr.c
@@ -181,6 +181,25 @@ int xc_memshr_add_to_physmap(xc_interface *xch,
     return xc_memshr_memop(xch, source_domain, &mso);
 }
 
+int xc_memshr_bulk_share(xc_interface *xch,
+                         domid_t source_domain,
+                         domid_t client_domain,
+                         uint64_t *shared)
+{
+    int rc;
+    xen_mem_sharing_op_t mso;
+
+    memset(&mso, 0, sizeof(mso));
+
+    mso.op = XENMEM_sharing_op_bulk_share;
+    mso.u.bulk.client_domain = client_domain;
+
+    rc = xc_memshr_memop(xch, source_domain, &mso);
+    if ( !rc && shared ) *shared = mso.u.bulk.shared;
+
+    return rc;
+}
+
 int xc_memshr_domain_resume(xc_interface *xch,
                             domid_t domid)
 {
diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index a522423..06176aa 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -1294,6 +1294,43 @@ int relinquish_shared_pages(struct domain *d)
     return rc;
 }
 
+static int bulk_share(struct domain *d, struct domain *cd, unsigned long max,
+                      struct mem_sharing_op_bulk *bulk)
+{
+    int rc;
+    shr_handle_t sh, ch;
+
+    while( bulk->start <= max )
+    {
+        rc = mem_sharing_nominate_page(d, bulk->start, 0, &sh);
+        if ( rc == -ENOMEM )
+            break;
+        if ( !rc )
+        {
+            rc = mem_sharing_nominate_page(cd, bulk->start, 0, &ch);
+            if ( rc == -ENOMEM )
+                break;
+            if ( !rc )
+                mem_sharing_share_pages(d, bulk->start, sh, cd, bulk->start, ch);
+        }
+
+        ++(bulk->start);
+
+        /* Check for continuation if it's not the last iteration. */
+        if ( bulk->start < max && hypercall_preempt_check() )
+        {
+            rc = 1;
+            break;
+        }
+    }
+
+    /* We only propagate -ENOMEM so reset rc here */
+    if ( rc < 0 && rc != -ENOMEM )
+        rc = 0;
+
+    return rc;
+}
+
 int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
 {
     int rc;
@@ -1468,6 +1505,69 @@ int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
         }
         break;
 
+        case XENMEM_sharing_op_bulk_share:
+        {
+            unsigned long max_sgfn, max_cgfn;
+            struct domain *cd;
+
+            rc = -EINVAL;
+            if ( !mem_sharing_enabled(d) )
+                goto out;
+
+            rc = rcu_lock_live_remote_domain_by_id(mso.u.bulk.client_domain,
+                                                   &cd);
+            if ( rc )
+                goto out;
+
+            rc = xsm_mem_sharing_op(XSM_DM_PRIV, d, cd, mso.op);
+            if ( rc )
+            {
+                rcu_unlock_domain(cd);
+                goto out;
+            }
+
+            if ( !mem_sharing_enabled(cd) )
+            {
+                rcu_unlock_domain(cd);
+                rc = -EINVAL;
+                goto out;
+            }
+
+            if ( !atomic_read(&d->pause_count) ||
+                 !atomic_read(&cd->pause_count) )
+            {
+                rcu_unlock_domain(cd);
+                rc = -EINVAL;
+                goto out;
+            }
+
+            max_sgfn = domain_get_maximum_gpfn(d);
+            max_cgfn = domain_get_maximum_gpfn(cd);
+
+            if ( max_sgfn != max_cgfn || max_sgfn < mso.u.bulk.start )
+            {
+                rcu_unlock_domain(cd);
+                rc = -EINVAL;
+                goto out;
+            }
+
+            rc = bulk_share(d, cd, max_sgfn, &mso.u.bulk);
+            if ( rc > 0 )
+            {
+                if ( __copy_to_guest(arg, &mso, 1) )
+                    rc = -EFAULT;
+                else
+                    rc = hypercall_create_continuation(__HYPERVISOR_memory_op,
+                                                       "lh", XENMEM_sharing_op,
+                                                       arg);
+            }
+            else
+                mso.u.bulk.shared = atomic_read(&cd->shr_pages);
+
+            rcu_unlock_domain(cd);
+        }
+        break;
+
         case XENMEM_sharing_op_debug_gfn:
         {
             unsigned long gfn = mso.u.debug.u.gfn;
diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h
index fe52ee1..202250a 100644
--- a/xen/include/public/memory.h
+++ b/xen/include/public/memory.h
@@ -453,6 +453,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_mem_access_op_t);
 #define XENMEM_sharing_op_debug_gref        5
 #define XENMEM_sharing_op_add_physmap       6
 #define XENMEM_sharing_op_audit             7
+#define XENMEM_sharing_op_bulk_share        8
 
 #define XENMEM_SHARING_OP_S_HANDLE_INVALID  (-10)
 #define XENMEM_SHARING_OP_C_HANDLE_INVALID  (-9)
@@ -488,7 +489,18 @@ struct xen_mem_sharing_op {
             uint64_aligned_t client_gfn;    /* IN: the client gfn */
             uint64_aligned_t client_handle; /* IN: handle to the client page */
             domid_t  client_domain; /* IN: the client domain id */
-        } share; 
+        } share;
+        struct mem_sharing_op_bulk {         /* OP_BULK_SHARE */
+            uint64_aligned_t start;          /* IN: start gfn. Set to 0 for
+                                                full deduplication. Field is
+                                                used internally and may change
+                                                when the hypercall returns. */
+            uint64_aligned_t shared;         /* OUT: the number of gfns
+                                                that are shared after this
+                                                operation including pages
+                                                already shared before */
+            domid_t client_domain;           /* IN: the client domain id */
+        } bulk;
         struct mem_sharing_op_debug {     /* OP_DEBUG_xxx */
             union {
                 uint64_aligned_t gfn;      /* IN: gfn to debug          */
-- 
2.8.1



* [PATCH v3 2/2] tests/mem-sharing: Add bulk option to memshrtool
  2016-05-12 15:25 [PATCH v3 1/2] x86/mem-sharing: Bulk mem-sharing entire domains Tamas K Lengyel
@ 2016-05-12 15:25 ` Tamas K Lengyel
  2016-05-13 12:00 ` [PATCH v3 1/2] x86/mem-sharing: Bulk mem-sharing entire domains Jan Beulich
  1 sibling, 0 replies; 15+ messages in thread
From: Tamas K Lengyel @ 2016-05-12 15:25 UTC (permalink / raw)
  To: xen-devel; +Cc: Tamas K Lengyel, Ian Jackson, Ian Campbell, Stefano Stabellini

Add the bulk option to the test tool to perform complete deduplication
between domains.

Signed-off-by: Tamas K Lengyel <tamas@tklengyel.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
---
 tools/tests/mem-sharing/memshrtool.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/tools/tests/mem-sharing/memshrtool.c b/tools/tests/mem-sharing/memshrtool.c
index 437c7c9..3e8f467 100644
--- a/tools/tests/mem-sharing/memshrtool.c
+++ b/tools/tests/mem-sharing/memshrtool.c
@@ -24,6 +24,8 @@ static int usage(const char* prog)
     printf("  nominate <domid> <gfn>  - Nominate a page for sharing.\n");
     printf("  share <domid> <gfn> <handle> <source> <source-gfn> <source-handle>\n");
     printf("                          - Share two pages.\n");
+    printf("  bulk <source-domid> <destination-domid>\n");
+    printf("                          - Share all pages between domains.\n");
     printf("  unshare <domid> <gfn>   - Unshare a page by grabbing a writable map.\n");
     printf("  add-to-physmap <domid> <gfn> <source> <source-gfn> <source-handle>\n");
     printf("                          - Populate a page in a domain with a shared page.\n");
@@ -180,6 +182,26 @@ int main(int argc, const char** argv)
         }
         printf("Audit returned %d errors.\n", rc);
     }
+    else if( !strcasecmp(cmd, "bulk") )
+    {
+        domid_t sdomid, cdomid;
+        int rc;
+        unsigned long shared;
+
+        if( argc != 4 )
+            return usage(argv[0]);
+
+        sdomid = strtol(argv[2], NULL, 0);
+        cdomid = strtol(argv[3], NULL, 0);
 
+        rc = xc_memshr_bulk_share(xch, sdomid, cdomid, &shared);
+        if ( rc < 0 )
+        {
+            printf("error executing xc_memshr_bulk_dedup: %s\n", strerror(errno));
+            return rc;
+        }
+
+        printf("Successfully shared %lu pages between the domains\n", shared);
+    }
     return 0;
 }
-- 
2.8.1



* Re: [PATCH v3 1/2] x86/mem-sharing: Bulk mem-sharing entire domains
  2016-05-12 15:25 [PATCH v3 1/2] x86/mem-sharing: Bulk mem-sharing entire domains Tamas K Lengyel
  2016-05-12 15:25 ` [PATCH v3 2/2] tests/mem-sharing: Add bulk option to memshrtool Tamas K Lengyel
@ 2016-05-13 12:00 ` Jan Beulich
  2016-05-13 14:50   ` Tamas K Lengyel
  1 sibling, 1 reply; 15+ messages in thread
From: Jan Beulich @ 2016-05-13 12:00 UTC (permalink / raw)
  To: Tamas K Lengyel; +Cc: xen-devel

>>> On 12.05.16 at 17:25, <tamas@tklengyel.com> wrote:
> --- a/xen/arch/x86/mm/mem_sharing.c
> +++ b/xen/arch/x86/mm/mem_sharing.c
> @@ -1294,6 +1294,43 @@ int relinquish_shared_pages(struct domain *d)
>      return rc;
>  }
>  
> +static int bulk_share(struct domain *d, struct domain *cd, unsigned long max,
> +                      struct mem_sharing_op_bulk *bulk)
> +{
> +    int rc;
> +    shr_handle_t sh, ch;
> +
> +    while( bulk->start <= max )
> +    {
> +        rc = mem_sharing_nominate_page(d, bulk->start, 0, &sh);
> +        if ( rc == -ENOMEM )
> +            break;
> +        if ( !rc )
> +        {
> +            rc = mem_sharing_nominate_page(cd, bulk->start, 0, &ch);
> +            if ( rc == -ENOMEM )
> +                break;

If we get to this break, how will the caller know that the first
nomination succeeded but the second didn't? Or perhaps there
is some undo logic missing here?

> +            if ( !rc )
> +                mem_sharing_share_pages(d, bulk->start, sh, cd, bulk->start, ch);

You shouldn't be ignoring errors here.

> +        }
> +
> +        ++(bulk->start);

Pointless parentheses.

> +        /* Check for continuation if it's not the last iteration. */
> +        if ( bulk->start < max && hypercall_preempt_check() )

The loop head has <=; why < here?

> +        {
> +            rc = 1;

I'd recommend using -ERESTART here, as we do elsewhere.

> +            break;
> +        }
> +    }
> +
> +    /* We only propagate -ENOMEM so reset rc here */
> +    if ( rc < 0 && rc != -ENOMEM )
> +        rc = 0;

What's the rationale for discarding all other errors? At least the
patch description, but perhaps even the comment (which btw
is lacking a full stop) should be explaining this.

> @@ -1468,6 +1505,69 @@ int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
>          }
>          break;
>  
> +        case XENMEM_sharing_op_bulk_share:
> +        {
> +            unsigned long max_sgfn, max_cgfn;
> +            struct domain *cd;
> +
> +            rc = -EINVAL;
> +            if ( !mem_sharing_enabled(d) )
> +                goto out;
> +
> +            rc = rcu_lock_live_remote_domain_by_id(mso.u.bulk.client_domain,
> +                                                   &cd);
> +            if ( rc )
> +                goto out;
> +
> +            rc = xsm_mem_sharing_op(XSM_DM_PRIV, d, cd, mso.op);

Either you pass XENMEM_sharing_op_share here, or you need to
update xen/xsm/flask/policy/access_vectors (even if it's only a
comment which needs updating).

That said - are this and the similar pre-existing XSM checks actually
correct? I.e. is one of the two domains here really controlling the
other? I would have expected that a tool stack domain initiates the
sharing between two domains it controls...

> +            if ( rc )
> +            {
> +                rcu_unlock_domain(cd);
> +                goto out;
> +            }
> +
> +            if ( !mem_sharing_enabled(cd) )
> +            {
> +                rcu_unlock_domain(cd);
> +                rc = -EINVAL;
> +                goto out;
> +            }
> +
> +            if ( !atomic_read(&d->pause_count) ||
> +                 !atomic_read(&cd->pause_count) )
> +            {
> +                rcu_unlock_domain(cd);
> +                rc = -EINVAL;
> +                goto out;
> +            }
> +
> +            max_sgfn = domain_get_maximum_gpfn(d);
> +            max_cgfn = domain_get_maximum_gpfn(cd);
> +
> +            if ( max_sgfn != max_cgfn || max_sgfn < mso.u.bulk.start )

Why would the two domains need to agree in their maximum
GPFN? There's nothing similar in this file so far. Nor does the
right side of the || match anything pre-existing...

> @@ -488,7 +489,18 @@ struct xen_mem_sharing_op {
>              uint64_aligned_t client_gfn;    /* IN: the client gfn */
>              uint64_aligned_t client_handle; /* IN: handle to the client page */
>              domid_t  client_domain; /* IN: the client domain id */
> -        } share; 
> +        } share;
> +        struct mem_sharing_op_bulk {         /* OP_BULK_SHARE */
> +            uint64_aligned_t start;          /* IN: start gfn. Set to 0 for
> +                                                full deduplication. Field is
> +                                                used internally and may change
> +                                                when the hypercall returns. */
> +            uint64_aligned_t shared;         /* OUT: the number of gfns
> +                                                that are shared after this
> +                                                operation including pages
> +                                                already shared before */
> +            domid_t client_domain;           /* IN: the client domain id */
> +        } bulk;

Let's not repeat pre-existing mistakes: There is explicit padding
missing here, which then also ought to be checked to be zero on
input.

Jan


* Re: [PATCH v3 1/2] x86/mem-sharing: Bulk mem-sharing entire domains
  2016-05-13 12:00 ` [PATCH v3 1/2] x86/mem-sharing: Bulk mem-sharing entire domains Jan Beulich
@ 2016-05-13 14:50   ` Tamas K Lengyel
  2016-05-13 15:09     ` Jan Beulich
  0 siblings, 1 reply; 15+ messages in thread
From: Tamas K Lengyel @ 2016-05-13 14:50 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel

On Fri, May 13, 2016 at 6:00 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>> On 12.05.16 at 17:25, <tamas@tklengyel.com> wrote:
>> --- a/xen/arch/x86/mm/mem_sharing.c
>> +++ b/xen/arch/x86/mm/mem_sharing.c
>> @@ -1294,6 +1294,43 @@ int relinquish_shared_pages(struct domain *d)
>>      return rc;
>>  }
>>
>> +static int bulk_share(struct domain *d, struct domain *cd, unsigned long max,
>> +                      struct mem_sharing_op_bulk *bulk)
>> +{
>> +    int rc;
>> +    shr_handle_t sh, ch;
>> +
>> +    while( bulk->start <= max )
>> +    {
>> +        rc = mem_sharing_nominate_page(d, bulk->start, 0, &sh);
>> +        if ( rc == -ENOMEM )
>> +            break;
>> +        if ( !rc )
>> +        {
>> +            rc = mem_sharing_nominate_page(cd, bulk->start, 0, &ch);
>> +            if ( rc == -ENOMEM )
>> +                break;
>
> If we get to this break, how will the caller know that the first
> nomination succeeded but the second didn't? Or perhaps there
> is some undo logic missing here?

No, there is not. There really is no "unnominate" feature of memshare.
So even if the user is calling nominate manually from userspace, there is
no way to unnominate the page, so that error condition is currently
useless for the user.

>
>> +            if ( !rc )
>> +                mem_sharing_share_pages(d, bulk->start, sh, cd, bulk->start, ch);
>
> You shouldn't be ignoring errors here.

The only error this function returns is if the sh/ch handles are
invalid. However we obtained those just now from successful
nominations, so we are guaranteed to have valid handles. This error
checking is only important when nominations/sharing happen
independently where the handle may go stale in-between. Here that is
not possible.

>
>> +        }
>> +
>> +        ++(bulk->start);
>
> Pointless parentheses.

Pointless but I prefer this style.

>
>> +        /* Check for continuation if it's not the last iteration. */
>> +        if ( bulk->start < max && hypercall_preempt_check() )
>
> The loop head has <=; why < here?

Because we only do the preempt check if there is more than one page left
(as the comment states).

>> +        {
>> +            rc = 1;
>
> I'd recommend using -ERESTART here, as we do elsewhere.
>

Ack.

>> +            break;
>> +        }
>> +    }
>> +
>> +    /* We only propagate -ENOMEM so reset rc here */
>> +    if ( rc < 0 && rc != -ENOMEM )
>> +        rc = 0;
>
> What's the rationale for discarding all other errors? At least the
> patch description, but perhaps even the comment (which btw
> is lacking a full stop) should be explaining this.

The reason we swallow errors here other than ENOMEM is that it's quite
possible that the max_gpfn page is unsharable, for example, thus rc would
have an EINVAL final error value. However, we don't care about the
success/fail of individual pages, we only care about the overall
state. For that only ENOMEM is critical.

>
>> @@ -1468,6 +1505,69 @@ int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
>>          }
>>          break;
>>
>> +        case XENMEM_sharing_op_bulk_share:
>> +        {
>> +            unsigned long max_sgfn, max_cgfn;
>> +            struct domain *cd;
>> +
>> +            rc = -EINVAL;
>> +            if ( !mem_sharing_enabled(d) )
>> +                goto out;
>> +
>> +            rc = rcu_lock_live_remote_domain_by_id(mso.u.bulk.client_domain,
>> +                                                   &cd);
>> +            if ( rc )
>> +                goto out;
>> +
>> +            rc = xsm_mem_sharing_op(XSM_DM_PRIV, d, cd, mso.op);
>
> Either you pass XENMEM_sharing_op_share here, or you need to
> update xen/xsm/flask/policy/access_vectors (even if it's only a
> comment which needs updating).

Right, it should actually be sharing_op_share here.

>
> That said - are this and the similar pre-existing XSM checks actually
> correct? I.e. is one of the two domains here really controlling the
> other? I would have expected that a tool stack domain initiates the
> sharing between two domains it controls...

Not sure what was the original rationale behind it either.

>
>> +            if ( rc )
>> +            {
>> +                rcu_unlock_domain(cd);
>> +                goto out;
>> +            }
>> +
>> +            if ( !mem_sharing_enabled(cd) )
>> +            {
>> +                rcu_unlock_domain(cd);
>> +                rc = -EINVAL;
>> +                goto out;
>> +            }
>> +
>> +            if ( !atomic_read(&d->pause_count) ||
>> +                 !atomic_read(&cd->pause_count) )
>> +            {
>> +                rcu_unlock_domain(cd);
>> +                rc = -EINVAL;
>> +                goto out;
>> +            }
>> +
>> +            max_sgfn = domain_get_maximum_gpfn(d);
>> +            max_cgfn = domain_get_maximum_gpfn(cd);
>> +
>> +            if ( max_sgfn != max_cgfn || max_sgfn < mso.u.bulk.start )
>
> Why would the two domains need to agree in their maximum
> GPFN? There's nothing similar in this file so far. Nor does the
> right side of the || match anything pre-existing...

The use-case for this function is to deduplicate identical VMs, not to
blindly share pages across arbitrary domains. So this is a safety
check to avoid accidentally running this function on domains that
obviously are not identical. The right-hand side is a safety check
against not properly initialized input structs where the start point
is obviously outside the memory of the domain.

>
>> @@ -488,7 +489,18 @@ struct xen_mem_sharing_op {
>>              uint64_aligned_t client_gfn;    /* IN: the client gfn */
>>              uint64_aligned_t client_handle; /* IN: handle to the client page */
>>              domid_t  client_domain; /* IN: the client domain id */
>> -        } share;
>> +        } share;
>> +        struct mem_sharing_op_bulk {         /* OP_BULK_SHARE */
>> +            uint64_aligned_t start;          /* IN: start gfn. Set to 0 for
>> +                                                full deduplication. Field is
>> +                                                used internally and may change
>> +                                                when the hypercall returns. */
>> +            uint64_aligned_t shared;         /* OUT: the number of gfns
>> +                                                that are shared after this
>> +                                                operation including pages
>> +                                                already shared before */
>> +            domid_t client_domain;           /* IN: the client domain id */
>> +        } bulk;
>
> Let's not repeat pre-existing mistakes: There is explicit padding
> missing here, which then also ought to be checked to be zero on
> input.

This struct is part of a union and is smaller than the largest struct in
the union, even with padding. So how would padding have any effect on
anything?

Thanks,
Tamas


* Re: [PATCH v3 1/2] x86/mem-sharing: Bulk mem-sharing entire domains
  2016-05-13 14:50   ` Tamas K Lengyel
@ 2016-05-13 15:09     ` Jan Beulich
  2016-05-13 15:31       ` Tamas K Lengyel
  2016-05-13 15:35       ` Daniel De Graaf
  0 siblings, 2 replies; 15+ messages in thread
From: Jan Beulich @ 2016-05-13 15:09 UTC (permalink / raw)
  To: Tamas K Lengyel, dgdegra; +Cc: Xen-devel

>>> On 13.05.16 at 16:50, <tamas@tklengyel.com> wrote:
> On Fri, May 13, 2016 at 6:00 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>> On 12.05.16 at 17:25, <tamas@tklengyel.com> wrote:
>>> +            if ( !rc )
>>> +                mem_sharing_share_pages(d, bulk->start, sh, cd, bulk->start, ch);
>>
>> You shouldn't be ignoring errors here.
> 
> The only error this function returns is if the sh/ch handles are
> invalid. However we obtained those just now from successful
> nominations, so we are guaranteed to have valid handles. This error
> checking is only important when nominations/sharing happen
> independently where the handle may go stale in-between. Here that is
> not possible.

You describe the state of things right now. What if someone
adds an error path to that function without inspecting all callers,
as it is clear that the function could have returned errors before.
Ignoring errors is plain bad. If you're sure there can't be errors,
at least add a respective ASSERT().
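
Something along these lines (only a sketch of the suggestion, untested):

    rc = mem_sharing_share_pages(d, bulk->start, sh, cd, bulk->start, ch);
    /* sh/ch were just obtained from successful nominations, so sharing
     * is not expected to fail here. */
    ASSERT(!rc);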

>>> +        }
>>> +
>>> +        ++(bulk->start);
>>
>> Pointless parentheses.
> 
> Pointless but I prefer this style.

But it's in contrast to what we do almost everywhere else ...

>>> +        /* Check for continuation if it's not the last iteration. */
>>> +        if ( bulk->start < max && hypercall_preempt_check() )
>>
>> The loop head has <=; why < here?
> 
> Because we only do the preempt check if there is more than one page left
> (as the comment states).

No, the comment says "last iteration", whereas the check is for
"next to last" (as the increment has already happened). And I
also don't see why preemption should be possible between any two
iterations, yet not between the second from last and the last one.
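
I.e. a check matching the comment would look more like this (sketch only):

    /* Check for continuation while there is still work left to do. */
    if ( bulk->start <= max && hypercall_preempt_check() )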

>>> +            break;
>>> +        }
>>> +    }
>>> +
>>> +    /* We only propagate -ENOMEM so reset rc here */
>>> +    if ( rc < 0 && rc != -ENOMEM )
>>> +        rc = 0;
>>
>> What's the rationale for discarding all other errors? At least the
>> patch description, but perhaps even the comment (which btw
>> is lacking a full stop) should be explaining this.
> 
> The reason we swallow errors here other than ENOMEM is that it's quite
> possible that the max_gpfn page is unsharable, for example, thus rc would
> have an EINVAL final error value. However, we don't care about the
> success/fail of individual pages, we only care about the overall
> state. For that only ENOMEM is critical.

And you think no possible caller would care about the hypercall
reporting success yet not everything having got done that was
requested? Sounds strange to me, but as said - at least a bold
comment please.

>>> @@ -1468,6 +1505,69 @@ int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
>>>          }
>>>          break;
>>>
>>> +        case XENMEM_sharing_op_bulk_share:
>>> +        {
>>> +            unsigned long max_sgfn, max_cgfn;
>>> +            struct domain *cd;
>>> +
>>> +            rc = -EINVAL;
>>> +            if ( !mem_sharing_enabled(d) )
>>> +                goto out;
>>> +
>>> +            rc = rcu_lock_live_remote_domain_by_id(mso.u.bulk.client_domain,
>>> +                                                   &cd);
>>> +            if ( rc )
>>> +                goto out;
>>> +
>>> +            rc = xsm_mem_sharing_op(XSM_DM_PRIV, d, cd, mso.op);
>>
>> Either you pass XENMEM_sharing_op_share here, or you need to
>> update xen/xsm/flask/policy/access_vectors (even if it's only a
>> comment which needs updating).
> 
> Right, it should actually be sharing_op_share here.
> 
>>
>> That said - are this and the similar pre-existing XSM checks actually
>> correct? I.e. is one of the two domains here really controlling the
>> other? I would have expected that a tool stack domain initiates the
>> sharing between two domains it controls...
> 
> Not sure what was the original rationale behind it either.

Daniel - any opinion on this one?

>>> +            if ( rc )
>>> +            {
>>> +                rcu_unlock_domain(cd);
>>> +                goto out;
>>> +            }
>>> +
>>> +            if ( !mem_sharing_enabled(cd) )
>>> +            {
>>> +                rcu_unlock_domain(cd);
>>> +                rc = -EINVAL;
>>> +                goto out;
>>> +            }
>>> +
>>> +            if ( !atomic_read(&d->pause_count) ||
>>> +                 !atomic_read(&cd->pause_count) )
>>> +            {
>>> +                rcu_unlock_domain(cd);
>>> +                rc = -EINVAL;
>>> +                goto out;
>>> +            }
>>> +
>>> +            max_sgfn = domain_get_maximum_gpfn(d);
>>> +            max_cgfn = domain_get_maximum_gpfn(cd);
>>> +
>>> +            if ( max_sgfn != max_cgfn || max_sgfn < mso.u.bulk.start )
>>
>> Why would the two domains need to agree in their maximum
>> GPFN? There's nothing similar in this file so far. Nor does the
>> right side of the || match anything pre-existing...
> 
> The use-case for this function is to deduplicate identical VMs, not to
> blindly share pages across arbitrary domains. So this is a safety
> check to avoid accidentally running this function on domains that
> obviously are not identical. The right-hand side is a safety check
> against not properly initialized input structs where the start point
> is obviously outside the memory of the domain.

Is that use case the only possible one, or merely the one you care
about? In the former case, I'd be okay as long as a respective
comment got added.

>>> @@ -488,7 +489,18 @@ struct xen_mem_sharing_op {
>>>              uint64_aligned_t client_gfn;    /* IN: the client gfn */
>>>              uint64_aligned_t client_handle; /* IN: handle to the client page */
>>>              domid_t  client_domain; /* IN: the client domain id */
>>> -        } share;
>>> +        } share;
>>> +        struct mem_sharing_op_bulk {         /* OP_BULK_SHARE */
>>> +            uint64_aligned_t start;          /* IN: start gfn. Set to 0 for
>>> +                                                full deduplication. Field is
>>> +                                                used internally and may change
>>> +                                                when the hypercall returns. */
>>> +            uint64_aligned_t shared;         /* OUT: the number of gfns
>>> +                                                that are shared after this
>>> +                                                operation including pages
>>> +                                                already shared before */
>>> +            domid_t client_domain;           /* IN: the client domain id */
>>> +        } bulk;
>>
>> Let's not repeat pre-existing mistakes: There is explicit padding
>> missing here, which then also ought to be checked to be zero on
>> input.
> 
> This struct is part of a union and is smaller than the largest struct in
> the union, even with padding. So how would padding have any effect on
> anything?

Being able to use the space for future extensions without having
to bump the interface version. In domctl-s it's not as important
due to them being separately versioned, but anyway.

Jan


* Re: [PATCH v3 1/2] x86/mem-sharing: Bulk mem-sharing entire domains
  2016-05-13 15:09     ` Jan Beulich
@ 2016-05-13 15:31       ` Tamas K Lengyel
  2016-05-13 16:12         ` Jan Beulich
  2016-05-13 15:35       ` Daniel De Graaf
  1 sibling, 1 reply; 15+ messages in thread
From: Tamas K Lengyel @ 2016-05-13 15:31 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel, dgdegra

On Fri, May 13, 2016 at 9:09 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>> On 13.05.16 at 16:50, <tamas@tklengyel.com> wrote:
>> On Fri, May 13, 2016 at 6:00 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>>> On 12.05.16 at 17:25, <tamas@tklengyel.com> wrote:
>>>> +            if ( !rc )
>>>> +                mem_sharing_share_pages(d, bulk->start, sh, cd, bulk->start, ch);
>>>
>>> You shouldn't be ignoring errors here.
>>
>> The only error this function returns is if the sh/ch handles are
>> invalid. However we obtained those just now from successful
>> nominations, so we are guaranteed to have valid handles. This error
>> checking is only important when nominations/sharing happen
>> independently where the handle may go stale in-between. Here that is
>> not possible.
>
> You describe the state of things right now. What if someone
> adds an error path to that function without inspecting all callers,
> as it is clear that the function could have returned errors before.
> Ignoring errors is plain bad. If you're sure there can't be errors,
> at least add a respective ASSERT().
>

Sure, that sounds like a reasonable middle ground.

>>>> +        }
>>>> +
>>>> +        ++(bulk->start);
>>>
>>> Pointless parentheses.
>>
>> Pointless but I prefer this style.
>
> But it's in contrast to what we do almost everywhere else ...
>

Alright..

>>>> +        /* Check for continuation if it's not the last iteration. */
>>>> +        if ( bulk->start < max && hypercall_preempt_check() )
>>>
>>> The loop head has <=; why < here?
>>
>> Because we only do the preempt check if there is more than one page left
>> (as the comment states).
>
> No, the comment says "last iteration", whereas the check is for
> "next to last" (as the increment has already happened). And I
> also don't see why preemption should be possible between any two
> iterations, yet not between the second from last and the last one.
>

Ah I see, yes you are right.

>>>> +            break;
>>>> +        }
>>>> +    }
>>>> +
>>>> +    /* We only propagate -ENOMEM so reset rc here */
>>>> +    if ( rc < 0 && rc != -ENOMEM )
>>>> +        rc = 0;
>>>
>>> What's the rationale for discarding all other errors? At least the
>>> patch description, but perhaps even the comment (which btw
>>> is lacking a full stop) should be explaining this.
>>
>> The reason we swallow errors here other than ENOMEM is that it's quite
>> possible that the max_gpfn page is unsharable, for example, thus rc would
>> have an EINVAL final error value. However, we don't care about the
>> success/fail of individual pages, we only care about the overall
>> state. For that only ENOMEM is critical.
>
> And you think no possible caller would care about the hypercall
> reporting success yet not everything having got done that was
> requested? Sounds strange to me, but as said - at least a bold
> comment please.

The user has no way of knowing what pages will be sharable to begin
with, nor does he have any recourse when something doesn't share. The
request is thus not to "share everything" but to share "as much as
possible". I'll state this explicitly in a comment.

>
>>>> @@ -1468,6 +1505,69 @@ int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
>>>>          }
>>>>          break;
>>>>
>>>> +        case XENMEM_sharing_op_bulk_share:
>>>> +        {
>>>> +            unsigned long max_sgfn, max_cgfn;
>>>> +            struct domain *cd;
>>>> +
>>>> +            rc = -EINVAL;
>>>> +            if ( !mem_sharing_enabled(d) )
>>>> +                goto out;
>>>> +
>>>> +            rc = rcu_lock_live_remote_domain_by_id(mso.u.bulk.client_domain,
>>>> +                                                   &cd);
>>>> +            if ( rc )
>>>> +                goto out;
>>>> +
>>>> +            rc = xsm_mem_sharing_op(XSM_DM_PRIV, d, cd, mso.op);
>>>
>>> Either you pass XENMEM_sharing_op_share here, or you need to
>>> update xen/xsm/flask/policy/access_vectors (even if it's only a
>>> comment which needs updating).
>>
>> Right, it should actually be sharing_op_share here.
>>
>>>
>>> That said - are this and the similar pre-existing XSM checks actually
>>> correct? I.e. is one of the two domains here really controlling the
>>> other? I would have expected that a tool stack domain initiates the
>>> sharing between two domains it controls...
>>
>> Not sure what was the original rationale behind it either.
>
> Daniel - any opinion on this one?
>
>>>> +            if ( rc )
>>>> +            {
>>>> +                rcu_unlock_domain(cd);
>>>> +                goto out;
>>>> +            }
>>>> +
>>>> +            if ( !mem_sharing_enabled(cd) )
>>>> +            {
>>>> +                rcu_unlock_domain(cd);
>>>> +                rc = -EINVAL;
>>>> +                goto out;
>>>> +            }
>>>> +
>>>> +            if ( !atomic_read(&d->pause_count) ||
>>>> +                 !atomic_read(&cd->pause_count) )
>>>> +            {
>>>> +                rcu_unlock_domain(cd);
>>>> +                rc = -EINVAL;
>>>> +                goto out;
>>>> +            }
>>>> +
>>>> +            max_sgfn = domain_get_maximum_gpfn(d);
>>>> +            max_cgfn = domain_get_maximum_gpfn(cd);
>>>> +
>>>> +            if ( max_sgfn != max_cgfn || max_sgfn < mso.u.bulk.start )
>>>
>>> Why would the two domains need to agree in their maximum
>>> GPFN? There's nothing similar in this file so far. Nor does the
>>> right side of the || match anything pre-existing...
>>
>> The use-case for this function is to deduplicate identical VMs, not to
>> blindly share pages across arbitrary domains. So this is a safety
>> check to avoid accidentally running this function on domains that
>> obviously are not identical. The right-hand side is a safety check
>> against not properly initialized input structs where the start point
>> is obviously outside the memory of the domain.
>
> Is that use case the only possible one, or merely the one you care
> about? In the former case, I'd be okay as long as a respective
> comment got added.

I would say "blind" sharing like this only makes sense for identical
VMs. Replacing the memory of the VM with that of another one not in
the same state will lead to some spectacular crash for sure, so we
should do at least some sanity checking before.

>
>>>> @@ -488,7 +489,18 @@ struct xen_mem_sharing_op {
>>>>              uint64_aligned_t client_gfn;    /* IN: the client gfn */
>>>>              uint64_aligned_t client_handle; /* IN: handle to the client page */
>>>>              domid_t  client_domain; /* IN: the client domain id */
>>>> -        } share;
>>>> +        } share;
>>>> +        struct mem_sharing_op_bulk {         /* OP_BULK_SHARE */
>>>> +            uint64_aligned_t start;          /* IN: start gfn. Set to 0 for
>>>> +                                                full deduplication. Field is
>>>> +                                                used internally and may change
>>>> +                                                when the hypercall returns. */
>>>> +            uint64_aligned_t shared;         /* OUT: the number of gfns
>>>> +                                                that are shared after this
>>>> +                                                operation including pages
>>>> +                                                already shared before */
>>>> +            domid_t client_domain;           /* IN: the client domain id */
>>>> +        } bulk;
>>>
>>> Let's not repeat pre-existing mistakes: There is explicit padding
>>> missing here, which then also ought to be checked to be zero on
>>> input.
>>
>> This struct is part of a union and is smaller than the largest struct in
>> the union, even with padding. So how would padding have any effect on
>> anything?
>
> Being able to use the space for future extensions without having
> to bump the interface version. In domctl-s it's not as important
> due to them being separately versioned, but anyway.

I don't really follow, we are not growing the union here. This struct
is still smaller than the space available in the union, so what would
prevent us from later on expanding this struct to the size of the
union without padding?

Tamas


* Re: [PATCH v3 1/2] x86/mem-sharing: Bulk mem-sharing entire domains
  2016-05-13 15:09     ` Jan Beulich
  2016-05-13 15:31       ` Tamas K Lengyel
@ 2016-05-13 15:35       ` Daniel De Graaf
  2016-05-13 16:19         ` Jan Beulich
  1 sibling, 1 reply; 15+ messages in thread
From: Daniel De Graaf @ 2016-05-13 15:35 UTC (permalink / raw)
  To: Jan Beulich, Tamas K Lengyel; +Cc: Xen-devel

On 05/13/2016 11:09 AM, Jan Beulich wrote:
>>>> On 13.05.16 at 16:50, <tamas@tklengyel.com> wrote:
[...]
>>>> @@ -1468,6 +1505,69 @@ int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
>>>>           }
>>>>           break;
>>>>
>>>> +        case XENMEM_sharing_op_bulk_share:
>>>> +        {
>>>> +            unsigned long max_sgfn, max_cgfn;
>>>> +            struct domain *cd;
>>>> +
>>>> +            rc = -EINVAL;
>>>> +            if ( !mem_sharing_enabled(d) )
>>>> +                goto out;
>>>> +
>>>> +            rc = rcu_lock_live_remote_domain_by_id(mso.u.bulk.client_domain,
>>>> +                                                   &cd);
>>>> +            if ( rc )
>>>> +                goto out;
>>>> +
>>>> +            rc = xsm_mem_sharing_op(XSM_DM_PRIV, d, cd, mso.op);
>>>
>>> Either you pass XENMEM_sharing_op_share here, or you need to
>>> update xen/xsm/flask/policy/access_vectors (even if it's only a
>>> comment which needs updating).
>>
>> Right, it should actually be sharing_op_share here.
>>
>>>
>>> That said - are this and the similar pre-existing XSM checks actually
>>> correct? I.e. is one of the two domains here really controlling the
>>> other? I would have expected that a tool stack domain initiates the
>>> sharing between two domains it controls...
>>
>> Not sure what was the original rationale behind it either.
>
> Daniel - any opinion on this one?

This hook checks two permissions; the primary check is that current (which
is not either argument) can perform HVM__MEM_SHARING on (cd).  When XSM is
disabled, this is checked as device model permissions.  I don't think this
is what you were asking about, because this is actually a control operation.

The other permission check invoked by this hook, only when XSM is enabled,
is a check for HVM__SHARE_MEM between (d) and (cd).  This is to allow a
security policy to be written that forbids memory sharing between different
users but allow it between VMs belonging to a single user (as an example).
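
Roughly (sketch only, names approximate the FLASK hook's logic):

    /* Control check: can current perform mem-sharing ops on cd? */
    rc = current_has_perm(cd, SECCLASS_HVM, HVM__MEM_SHARING);
    if ( rc )
        return rc;
    /* XSM-only check: may d and cd share memory with each other? */
    return domain_has_perm(d, cd, SECCLASS_HVM, HVM__SHARE_MEM);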


* Re: [PATCH v3 1/2] x86/mem-sharing: Bulk mem-sharing entire domains
  2016-05-13 15:31       ` Tamas K Lengyel
@ 2016-05-13 16:12         ` Jan Beulich
  2016-05-13 16:29           ` Tamas K Lengyel
  0 siblings, 1 reply; 15+ messages in thread
From: Jan Beulich @ 2016-05-13 16:12 UTC (permalink / raw)
  To: Tamas K Lengyel; +Cc: Xen-devel, dgdegra

>>> On 13.05.16 at 17:31, <tamas@tklengyel.com> wrote:
> On Fri, May 13, 2016 at 9:09 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>> On 13.05.16 at 16:50, <tamas@tklengyel.com> wrote:
>>> On Fri, May 13, 2016 at 6:00 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>>>> On 12.05.16 at 17:25, <tamas@tklengyel.com> wrote:
>>>>> @@ -1468,6 +1505,69 @@ int 
>>>>> +            if ( rc )
>>>>> +            {
>>>>> +                rcu_unlock_domain(cd);
>>>>> +                goto out;
>>>>> +            }
>>>>> +
>>>>> +            if ( !mem_sharing_enabled(cd) )
>>>>> +            {
>>>>> +                rcu_unlock_domain(cd);
>>>>> +                rc = -EINVAL;
>>>>> +                goto out;
>>>>> +            }
>>>>> +
>>>>> +            if ( !atomic_read(&d->pause_count) ||
>>>>> +                 !atomic_read(&cd->pause_count) )
>>>>> +            {
>>>>> +                rcu_unlock_domain(cd);
>>>>> +                rc = -EINVAL;
>>>>> +                goto out;
>>>>> +            }
>>>>> +
>>>>> +            max_sgfn = domain_get_maximum_gpfn(d);
>>>>> +            max_cgfn = domain_get_maximum_gpfn(cd);
>>>>> +
>>>>> +            if ( max_sgfn != max_cgfn || max_sgfn < mso.u.bulk.start )
>>>>
>>>> Why would the two domains need to agree in their maximum
>>>> GPFN? There's nothing similar in this file so far. Nor does the
>>>> right side of the || match anything pre-existing...
>>>
>>> The use-case for this function is to deduplicate identical VMs, not to
>>> blindly share pages across arbitrary domains. So this is a safety
>>> check to avoid accidentally running this function on domains that
>>> obviously are not identical. The right-hand side is a safety check
>>> against not properly initialized input structs where the start point
>>> is obviously outside the memory of the domain.
>>
>> Is that use case the only possible one, or merely the one you care
>> about? In the former case, I'd be okay as long as a respective
>> comment got added.
> 
> I would say "blind" sharing like this only makes sense for identical
> VMs. Replacing the memory of the VM with that of another one not in
> the same state will lead to some spectacular crash for sure, so we
> should do at least some sanity checking before.

Crash? Pages would be shared only if their contents match (and
unshared when they're about to diverge), and once that's
guaranteed, I don't see how it matters whether the two involved
guests are identical. I agree that between dissimilar guests the
potential for sharing is likely much lower, but that's about all.

>>>>> @@ -488,7 +489,18 @@ struct xen_mem_sharing_op {
>>>>>              uint64_aligned_t client_gfn;    /* IN: the client gfn */
>>>>>              uint64_aligned_t client_handle; /* IN: handle to the client page */
>>>>>              domid_t  client_domain; /* IN: the client domain id */
>>>>> -        } share;
>>>>> +        } share;
>>>>> +        struct mem_sharing_op_bulk {         /* OP_BULK_SHARE */
>>>>> +            uint64_aligned_t start;          /* IN: start gfn. Set to 0 for
>>>>> +                                                full deduplication. Field is
>>>>> +                                                used internally and may change
>>>>> +                                                when the hypercall returns. */
>>>>> +            uint64_aligned_t shared;         /* OUT: the number of gfns
>>>>> +                                                that are shared after this
>>>>> +                                                operation including pages
>>>>> +                                                already shared before */
>>>>> +            domid_t client_domain;           /* IN: the client domain id */
>>>>> +        } bulk;
>>>>
>>>> Let's not repeat pre-existing mistakes: There is explicit padding
>>>> missing here, which then also ought to be checked to be zero on
>>>> input.
>>>
>>> This struct is part of a union and is smaller than the largest struct in
>>> the union, even with padding. So how would padding have any effect on
>>> anything?
>>
>> Being able to use the space for future extensions without having
>> to bump the interface version. In domctl-s it's not as important
>> due to them being separately versioned, but anyway.
> 
> I don't really follow, we are not growing the union here. This struct
> is still smaller than the space available in the union, so what would
> prevent us from later on expanding this struct to the size of the
> union without padding?

If a future hypervisor expects some value in a field that's now
(anonymous) padding, and a caller written against the headers
with just your patch applied passes a structure with random
data in those padding fields to that new hypervisor, what do
you expect to happen?
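
For example (sketch only), something like

    struct mem_sharing_op_bulk {         /* OP_BULK_SHARE */
        uint64_aligned_t start;
        uint64_aligned_t shared;
        domid_t client_domain;
        uint16_t pad[3];                 /* Must be zero on input. */
    } bulk;

together with an input check along the lines of

    if ( mso.u.bulk.pad[0] || mso.u.bulk.pad[1] || mso.u.bulk.pad[2] )
    {
        rc = -EINVAL;
        goto out;
    }

would keep those bits usable for future extensions.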

Jan


* Re: [PATCH v3 1/2] x86/mem-sharing: Bulk mem-sharing entire domains
  2016-05-13 15:35       ` Daniel De Graaf
@ 2016-05-13 16:19         ` Jan Beulich
  0 siblings, 0 replies; 15+ messages in thread
From: Jan Beulich @ 2016-05-13 16:19 UTC (permalink / raw)
  To: Daniel De Graaf; +Cc: Xen-devel, Tamas K Lengyel

>>> On 13.05.16 at 17:35, <dgdegra@tycho.nsa.gov> wrote:
> On 05/13/2016 11:09 AM, Jan Beulich wrote:
>>>>> On 13.05.16 at 16:50, <tamas@tklengyel.com> wrote:
> [...]
>>>>> @@ -1468,6 +1505,69 @@ int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
>>>>>           }
>>>>>           break;
>>>>>
>>>>> +        case XENMEM_sharing_op_bulk_share:
>>>>> +        {
>>>>> +            unsigned long max_sgfn, max_cgfn;
>>>>> +            struct domain *cd;
>>>>> +
>>>>> +            rc = -EINVAL;
>>>>> +            if ( !mem_sharing_enabled(d) )
>>>>> +                goto out;
>>>>> +
>>>>> +            rc = rcu_lock_live_remote_domain_by_id(mso.u.bulk.client_domain,
>>>>> +                                                   &cd);
>>>>> +            if ( rc )
>>>>> +                goto out;
>>>>> +
>>>>> +            rc = xsm_mem_sharing_op(XSM_DM_PRIV, d, cd, mso.op);
>>>>
>>>> Either you pass XENMEM_sharing_op_share here, or you need to
>>>> update xen/xsm/flask/policy/access_vectors (even if it's only a
>>>> comment which needs updating).
>>>
>>> Right, it should actually be sharing_op_share here.
>>>
>>>>
>>>> That said - are this and the similar pre-existing XSM checks actually
>>>> correct? I.e. is one of the two domains here really controlling the
>>>> other? I would have expected that a tool stack domain initiates the
>>>> sharing between two domains it controls...
>>>
>>> Not sure what was the original rationale behind it either.
>>
>> Daniel - any opinion on this one?
> 
> This hook checks two permissions; the primary check is that current (which
> is not either argument) can perform HVM__MEM_SHARING on (cd).  When XSM is
> disabled, this is checked as device model permissions.  I don't think this
> is what you were asking about, because this is actually a control operation.
> 
> The other permission check invoked by this hook, only when XSM is enabled,
> is a check for HVM__SHARE_MEM between (d) and (cd).  This is to allow a
> security policy to be written that forbids memory sharing between different
> users but allow it between VMs belonging to a single user (as an example).

Ah, I see - I missed the use of current->domain. But the asymmetry
still seems odd: In a sharing operation, both domains are equally
affected, and hence current->domain should have control over both,
and the second check should be done both ways (unless
domain_has_perm()'s first two arguments are treated equally, which
it doesn't look like is the case).

Jan



* Re: [PATCH v3 1/2] x86/mem-sharing: Bulk mem-sharing entire domains
  2016-05-13 16:12         ` Jan Beulich
@ 2016-05-13 16:29           ` Tamas K Lengyel
  2016-05-17  7:49             ` Jan Beulich
  0 siblings, 1 reply; 15+ messages in thread
From: Tamas K Lengyel @ 2016-05-13 16:29 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel, dgdegra

On Fri, May 13, 2016 at 10:12 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>> On 13.05.16 at 17:31, <tamas@tklengyel.com> wrote:
>> On Fri, May 13, 2016 at 9:09 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>>> On 13.05.16 at 16:50, <tamas@tklengyel.com> wrote:
>>>> On Fri, May 13, 2016 at 6:00 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>>>>> On 12.05.16 at 17:25, <tamas@tklengyel.com> wrote:
>>>>>> @@ -1468,6 +1505,69 @@ int
>>>>>> +            if ( rc )
>>>>>> +            {
>>>>>> +                rcu_unlock_domain(cd);
>>>>>> +                goto out;
>>>>>> +            }
>>>>>> +
>>>>>> +            if ( !mem_sharing_enabled(cd) )
>>>>>> +            {
>>>>>> +                rcu_unlock_domain(cd);
>>>>>> +                rc = -EINVAL;
>>>>>> +                goto out;
>>>>>> +            }
>>>>>> +
>>>>>> +            if ( !atomic_read(&d->pause_count) ||
>>>>>> +                 !atomic_read(&cd->pause_count) )
>>>>>> +            {
>>>>>> +                rcu_unlock_domain(cd);
>>>>>> +                rc = -EINVAL;
>>>>>> +                goto out;
>>>>>> +            }
>>>>>> +
>>>>>> +            max_sgfn = domain_get_maximum_gpfn(d);
>>>>>> +            max_cgfn = domain_get_maximum_gpfn(cd);
>>>>>> +
>>>>>> +            if ( max_sgfn != max_cgfn || max_sgfn < mso.u.bulk.start )
>>>>>
>>>>> Why would the two domains need to agree in their maximum
>>>>> GPFN? There's nothing similar in this file so far. Nor does the
>>>>> right side of the || match anything pre-existing...
>>>>
>>>> The use-case for this function is to deduplicate identical VMs, not to
>>>> blindly share pages across arbitrary domains. So this is a safety
>>>> check to avoid accidentally running this function on domains that
>>>> obviously are not identical. The right-hand side is a safety check
>>>> against not properly initialized input structs where the start point
>>>> is obviously outside the memory of the domain.
>>>
>>> Is that use case the only possible one, or merely the one you care
>>> about? In the former case, I'd be okay as long as a respective
>>> comment got added.
>>
>> I would say "blind" sharing like this only makes sense for identical
>> VMs. Replacing the memory of the VM with that of another one not in
>> the same state will lead to some spectacular crash for sure, so we
>> should do at least some sanity checking before.
>
> Crash? Pages would be shared only if their contents match (and
> unshared when they're about to diverge), and once that's
> guaranteed, I don't see how it matters whether the two involved
> guests are identical. I agree that between dissimilar guests the
> potential for sharing is likely much lower, but that's about all.

No, that's not how it works. Xen's mem_sharing doesn't do _any_ checks
on the contents of the pages before sharing. It's up to the user to
know what he is doing.

>
>>>>>> @@ -488,7 +489,18 @@ struct xen_mem_sharing_op {
>>>>>>              uint64_aligned_t client_gfn;    /* IN: the client gfn */
>>>>>>              uint64_aligned_t client_handle; /* IN: handle to the client page */
>>>>>>              domid_t  client_domain; /* IN: the client domain id */
>>>>>> -        } share;
>>>>>> +        } share;
>>>>>> +        struct mem_sharing_op_bulk {         /* OP_BULK_SHARE */
>>>>>> +            uint64_aligned_t start;          /* IN: start gfn. Set to 0 for
>>>>>> +                                                full deduplication. Field is
>>>>>> +                                                used internally and may change
>>>>>> +                                                when the hypercall returns. */
>>>>>> +            uint64_aligned_t shared;         /* OUT: the number of gfns
>>>>>> +                                                that are shared after this
>>>>>> +                                                operation including pages
>>>>>> +                                                already shared before */
>>>>>> +            domid_t client_domain;           /* IN: the client domain id */
>>>>>> +        } bulk;
>>>>>
>>>>> Let's not repeat pre-existing mistakes: There is explicit padding
>>>>> missing here, which then also ought to be checked to be zero on
>>>>> input.
>>>>
>>>> This struct is part of a union and is smaller than the largest struct in
>>>> the union, even with padding. So how would padding have any effect on
>>>> anything?
>>>
>>> Being able to use the space for future extensions without having
>>> to bump the interface version. In domctl-s it's not as important
>>> due to them being separately versioned, but anyway.
>>
>> I don't really follow; we are not growing the union here. This struct
>> is still smaller than the space available in the union, so what would
>> prevent us from later on expanding this struct to the size of the
>> union without padding?
>
> If a future hypervisor expects some value in a field that's now
> (anonymous) padding, and a caller written against the headers
> with just your patch applied passes a structure with random
> data in those padding fields to that new hypervisor, what do
> you expect to happen?

I see, ok.

Tamas


* Re: [PATCH v3 1/2] x86/mem-sharing: Bulk mem-sharing entire domains
  2016-05-13 16:29           ` Tamas K Lengyel
@ 2016-05-17  7:49             ` Jan Beulich
  0 siblings, 0 replies; 15+ messages in thread
From: Jan Beulich @ 2016-05-17  7:49 UTC (permalink / raw)
  To: Tamas K Lengyel; +Cc: Xen-devel, dgdegra

>>> On 13.05.16 at 18:29, <tamas@tklengyel.com> wrote:
> On Fri, May 13, 2016 at 10:12 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>> On 13.05.16 at 17:31, <tamas@tklengyel.com> wrote:
>>> On Fri, May 13, 2016 at 9:09 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>>>> On 13.05.16 at 16:50, <tamas@tklengyel.com> wrote:
>>>>> On Fri, May 13, 2016 at 6:00 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>>>>>> On 12.05.16 at 17:25, <tamas@tklengyel.com> wrote:
>>>>>>> @@ -1468,6 +1505,69 @@ int
>>>>>>> +            if ( rc )
>>>>>>> +            {
>>>>>>> +                rcu_unlock_domain(cd);
>>>>>>> +                goto out;
>>>>>>> +            }
>>>>>>> +
>>>>>>> +            if ( !mem_sharing_enabled(cd) )
>>>>>>> +            {
>>>>>>> +                rcu_unlock_domain(cd);
>>>>>>> +                rc = -EINVAL;
>>>>>>> +                goto out;
>>>>>>> +            }
>>>>>>> +
>>>>>>> +            if ( !atomic_read(&d->pause_count) ||
>>>>>>> +                 !atomic_read(&cd->pause_count) )
>>>>>>> +            {
>>>>>>> +                rcu_unlock_domain(cd);
>>>>>>> +                rc = -EINVAL;
>>>>>>> +                goto out;
>>>>>>> +            }
>>>>>>> +
>>>>>>> +            max_sgfn = domain_get_maximum_gpfn(d);
>>>>>>> +            max_cgfn = domain_get_maximum_gpfn(cd);
>>>>>>> +
>>>>>>> +            if ( max_sgfn != max_cgfn || max_sgfn < mso.u.bulk.start )
>>>>>>
>>>>>> Why would the two domains need to agree in their maximum
>>>>>> GPFN? There's nothing similar in this file so far. Nor does the
>>>>>> right side of the || match anything pre-existing...
>>>>>
>>>>> The use-case for this function is to deduplicate identical VMs, not to
>>>>> blindly share pages across arbitrary domains. So this is a safety
>>>>> check to avoid accidentally running this function on domains that
>>>>> obviously are not identical. The right hand side is a safety check
>>>>> against not properly initialized input structs where the start point
>>>>> is obviously outside the memory of the domain.
>>>>
>>>> Is that use case the only possible one, or merely the one you care
>>>> about? In the former case, I'd be okay as long as a respective
>>>> comment got added.
>>>
>>> I would say "blind" sharing like this only makes sense for identical
>>> VMs. Replacing the memory of the VM with that of another one not in
>>> the same state will lead to some spectacular crash for sure, so we
>>> should do at least some sanity checking before.
>>
>> Crash? Pages would be shared only if their contents match (and
>> unshared when they're about to diverge), and once that's
>> guaranteed, I don't see how it matters whether the two involved
>> guests are identical. I agree that between dissimilar guests the
>> potential for sharing is likely much lower, but that's about all.
> 
> No, that's not how it works. Xen's mem_sharing doesn't do _any_ checks
> on the contents of the pages before sharing. It's up to the user to
> know what he is doing.

Ah, okay - that explains to me why this feature is rarely used and
not fully supported.

Jan



* Re: [PATCH v3 1/2] x86/mem-sharing: Bulk mem-sharing entire domains
  2015-10-16 17:02   ` Tamas K Lengyel
@ 2015-10-19  9:04     ` Jan Beulich
  0 siblings, 0 replies; 15+ messages in thread
From: Jan Beulich @ 2015-10-19  9:04 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, George Dunlap,
	Andrew Cooper, Ian Jackson, Tamas K Lengyel, Xen-devel,
	Keir Fraser

>>> On 16.10.15 at 19:02, <tamas@tklengyel.com> wrote:
> On Fri, Oct 16, 2015 at 12:46 AM, Jan Beulich <JBeulich@suse.com> wrote:
>> >>> On 15.10.15 at 20:09, <tamas@tklengyel.com> wrote:
>> > +                    rc = -EFAULT;
>> > +                else
>> > +                    rc = hypercall_create_continuation(__HYPERVISOR_memory_op,
>> > +                                                       "lh", XENMEM_sharing_op,
>> > +                                                       arg);
>> > +            } else {
>>
>> Coding style.
> 
> Coding style matches the rest of the file.

Sorry, but this is not Xen coding style, and hence if there are other
bad examples in the file, these shouldn't be used as an excuse.
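
Concretely, for the hunk in question the expected layout is roughly

            }
            else
            {
                mso.u.bulk.applied = atomic_read(&d->shr_pages);
            }

i.e. braces and the "else" on their own lines rather than "} else {".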

>> > +                mso.u.bulk.applied = atomic_read(&d->shr_pages);
>>
>> Is there no possibility for something else to also modify d->shr_pages
>> in parallel, misguiding the caller when looking at the output field. Also
>> you're not copying this back to guest memory...
> 
> It shouldn't be an issue as I don't really care about how many pages were
> shared by the operation itself, I care about the total number of pages
> shared on the domain. If there were already some pages shared before this
> operation, those should be counted here again. I could of course issue a
> separate getdomaininfo to get this same information, it's just more
> expedient to report it right away instead of having to do a separate call.
> By default shr_pages will be equivalent to the number of pages successfully
> shared during this operation. If someone decides to also do unsharing in
> parallel to this op (..how would that make sense?).. well, that's not
> supported right now so all bets are off from my perspective.

But the field being named "applied" suggests a different meaning.

> Also, we are copying it back to guest memory when the operation finishes
> for all mso. It's not bulk specific, applies to all !rc cases further down.

Oh, okay.

>> > @@ -482,7 +483,16 @@ struct xen_mem_sharing_op {
>> >              uint64_aligned_t client_gfn;    /* IN: the client gfn */
>> >              uint64_aligned_t client_handle; /* IN: handle to the client
>> page */
>> >              domid_t  client_domain; /* IN: the client domain id */
>> > -        } share;
>> > +        } share;
>> > +        struct mem_sharing_op_bulk {         /* OP_BULK_SHARE */
>> > +            domid_t client_domain;           /* IN: the client domain
>> id */
>>
>> Explicit padding here please (albeit I see already from the context
>> that this isn't being done in e.g. the share sub-structure above
>> either).
> 
> Not sure what you mean by explicit padding here. The way it's right now
> matches pretty much what was already in place.

domid_t is a 16-bit value, i.e. there's a gap after this field which
would better be made explicit (as we do in many places elsewhere,
albeit - as said - sadly not in the example visible in patch context
here).
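
I.e. something along these lines (the pad field name is purely
illustrative), ideally with the padding also checked to be zero on
input:

    struct mem_sharing_op_bulk {         /* OP_BULK_SHARE */
        domid_t client_domain;           /* IN: the client domain id */
        uint16_t _pad[3];                /* Must be set to 0 */
        uint64_aligned_t start;          /* IN: start gfn */
        uint64_aligned_t applied;        /* OUT: gfns the op applied to */
    } bulk;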

Jan


* Re: [PATCH v3 1/2] x86/mem-sharing: Bulk mem-sharing entire domains
  2015-10-16  6:46 ` Jan Beulich
@ 2015-10-16 17:02   ` Tamas K Lengyel
  2015-10-19  9:04     ` Jan Beulich
  0 siblings, 1 reply; 15+ messages in thread
From: Tamas K Lengyel @ 2015-10-16 17:02 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, George Dunlap,
	Andrew Cooper, Ian Jackson, Tamas K Lengyel, Xen-devel,
	Keir Fraser



On Fri, Oct 16, 2015 at 12:46 AM, Jan Beulich <JBeulich@suse.com> wrote:

> >>> On 15.10.15 at 20:09, <tamas@tklengyel.com> wrote:
> > --- a/xen/arch/x86/mm/mem_sharing.c
> > +++ b/xen/arch/x86/mm/mem_sharing.c
> > @@ -1293,6 +1293,42 @@ int relinquish_shared_pages(struct domain *d)
> >      return rc;
> >  }
> >
> > +static int bulk_share(struct domain *d, struct domain *cd, unsigned
> long max,
> > +                      struct mem_sharing_op_bulk *bulk)
> > +{
> > +    int rc = 0;
> > +    shr_handle_t sh, ch;
> > +
> > +    while( bulk->start <= max )
> > +    {
> > +        rc = mem_sharing_nominate_page(d, bulk->start, 0, &sh);
> > +        if ( rc == -ENOMEM )
> > +            break;
> > +        else if ( rc )
>
> Pointless else.
>

Pointless but harmless.


>
> > +            goto next;
> > +
> > +        rc = mem_sharing_nominate_page(cd, bulk->start, 0, &ch);
> > +        if ( rc == -ENOMEM )
> > +            break;
> > +        else if ( rc )
> > +            goto next;
> > +
> > +        mem_sharing_share_pages(d, bulk->start, sh, cd, bulk->start,
> ch);
> > +
> > +next:
>
> Labels indented by at least one space please.
>

Yes, I keep forgetting...


>
> > @@ -1467,6 +1503,69 @@ int
> mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
> >          }
> >          break;
> >
> > +        case XENMEM_sharing_op_bulk_share:
> > +        {
> > +            unsigned long max_sgfn, max_cgfn;
> > +            struct domain *cd;
> > +
> > +            rc = -EINVAL;
> > +            if ( !mem_sharing_enabled(d) )
> > +                goto out;
> > +
> > +            rc =
> rcu_lock_live_remote_domain_by_id(mso.u.bulk.client_domain,
> > +                                                   &cd);
> > +            if ( rc )
> > +                goto out;
> > +
> > +            rc = xsm_mem_sharing_op(XSM_DM_PRIV, d, cd, mso.op);
> > +            if ( rc )
> > +            {
> > +                rcu_unlock_domain(cd);
> > +                goto out;
> > +            }
> > +
> > +            if ( !mem_sharing_enabled(cd) )
> > +            {
> > +                rcu_unlock_domain(cd);
> > +                rc = -EINVAL;
> > +                goto out;
> > +            }
> > +
> > +            if ( !atomic_read(&d->pause_count) ||
> > +                 !atomic_read(&cd->pause_count) )
>
> As said in the reply to your respective inquiry - you want to look at
> controller_pause_count here instead.
>

Thanks!
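
So roughly this instead (assuming the plain controller_pause_count
field can be read directly here):

    if ( !d->controller_pause_count ||
         !cd->controller_pause_count )
    {
        rcu_unlock_domain(cd);
        rc = -EINVAL;
        goto out;
    }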


>
> > +            {
> > +                rcu_unlock_domain(cd);
>
> Considering how many times you do this, perhaps worth putting
> a label at the (current) success path's unlock?
>

I'll take a look.
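
Probably by having the error paths jump to the unlock that the success
path already does, e.g. (label name made up):

            if ( !mem_sharing_enabled(cd) )
            {
                rc = -EINVAL;
                goto out_unlock;
            }
            ...
         out_unlock:
            rcu_unlock_domain(cd);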


>
> > +                rc = -EINVAL;
> > +                goto out;
> > +            }
> > +
> > +            max_sgfn = domain_get_maximum_gpfn(d);
> > +            max_cgfn = domain_get_maximum_gpfn(cd);
> > +
> > +            if ( max_sgfn != max_cgfn || max_sgfn < mso.u.bulk.start )
> > +            {
> > +                rcu_unlock_domain(cd);
> > +                rc = -EINVAL;
> > +                goto out;
> > +            }
> > +
> > +            rc = bulk_share(d, cd, max_sgfn, &mso.u.bulk);
> > +            if ( rc > 0 )
> > +            {
> > +                if ( __copy_to_guest(arg, &mso, 1) )
>
> Wouldn't copying just the one field you care about suffice here?
>

Sure.
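
Something like this should do (assuming the usual
__copy_field_to_guest() helper is fine to use here):

    if ( __copy_field_to_guest(arg, &mso, u.bulk.start) )
        rc = -EFAULT;
    else
        rc = hypercall_create_continuation(__HYPERVISOR_memory_op,
                                           "lh", XENMEM_sharing_op,
                                           arg);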


>
> > +                    rc = -EFAULT;
> > +                else
> > +                    rc =
> hypercall_create_continuation(__HYPERVISOR_memory_op,
> > +                                                       "lh",
> XENMEM_sharing_op,
> > +                                                       arg);
> > +            } else {
>
> Coding style.
>

Coding style matches the rest of the file.


>
> > +                mso.u.bulk.applied = atomic_read(&d->shr_pages);
>
> Is there no possibility for something else to also modify d->shr_pages
> in parallel, misguiding the caller when looking at the output field. Also
> you're not copying this back to guest memory...
>

It shouldn't be an issue as I don't really care about how many pages were
shared by the operation itself, I care about the total number of pages
shared on the domain. If there were already some pages shared before this
operation, those should be counted here again. I could of course issue a
separate getdomaininfo to get this same information, it's just more
expedient to report it right away instead of having to do a separate call.
By default shr_pages will be equivalent to the number of pages successfully
shared during this operation. If someone decides to also do unsharing in
parallel to this op (..how would that make sense?).. well, that's not
supported right now so all bets are off from my perspective.

Also, we are copying it back to guest memory when the operation finishes
for all mso. It's not bulk specific, applies to all !rc cases further down.
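
From the toolstack side the intended use is then simply (a sketch
against the libxc wrapper added by this series, with both domains
already paused and mem-sharing enabled by the caller; src_dom and
cli_dom are placeholders):

    uint64_t shared = 0;
    int rc = xc_memshr_bulk_share(xch, src_dom, cli_dom, &shared);

    if ( !rc )
        printf("shared pages on the source domain: %lu\n",
               (unsigned long)shared);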


>
> > @@ -482,7 +483,16 @@ struct xen_mem_sharing_op {
> >              uint64_aligned_t client_gfn;    /* IN: the client gfn */
> >              uint64_aligned_t client_handle; /* IN: handle to the client
> page */
> >              domid_t  client_domain; /* IN: the client domain id */
> > -        } share;
> > +        } share;
> > +        struct mem_sharing_op_bulk {         /* OP_BULK_SHARE */
> > +            domid_t client_domain;           /* IN: the client domain
> id */
>
> Explicit padding here please (albeit I see already from the context
> that this isn't being done in e.g. the share sub-structure above
> either).


Not sure what you mean by explicit padding here. The way it's right now
matches pretty much what was already in place.

Tamas


* Re: [PATCH v3 1/2] x86/mem-sharing: Bulk mem-sharing entire domains
  2015-10-15 18:09 Tamas K Lengyel
@ 2015-10-16  6:46 ` Jan Beulich
  2015-10-16 17:02   ` Tamas K Lengyel
  0 siblings, 1 reply; 15+ messages in thread
From: Jan Beulich @ 2015-10-16  6:46 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, George Dunlap,
	Andrew Cooper, Ian Jackson, Tamas K Lengyel, xen-devel,
	Keir Fraser

>>> On 15.10.15 at 20:09, <tamas@tklengyel.com> wrote:
> --- a/xen/arch/x86/mm/mem_sharing.c
> +++ b/xen/arch/x86/mm/mem_sharing.c
> @@ -1293,6 +1293,42 @@ int relinquish_shared_pages(struct domain *d)
>      return rc;
>  }
>  
> +static int bulk_share(struct domain *d, struct domain *cd, unsigned long max,
> +                      struct mem_sharing_op_bulk *bulk)
> +{
> +    int rc = 0;
> +    shr_handle_t sh, ch;
> +
> +    while( bulk->start <= max )
> +    {
> +        rc = mem_sharing_nominate_page(d, bulk->start, 0, &sh);
> +        if ( rc == -ENOMEM )
> +            break;
> +        else if ( rc )

Pointless else.

> +            goto next;
> +
> +        rc = mem_sharing_nominate_page(cd, bulk->start, 0, &ch);
> +        if ( rc == -ENOMEM )
> +            break;
> +        else if ( rc )
> +            goto next;
> +
> +        mem_sharing_share_pages(d, bulk->start, sh, cd, bulk->start, ch);
> +
> +next:

Labels indented by at least one space please.

> @@ -1467,6 +1503,69 @@ int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
>          }
>          break;
>  
> +        case XENMEM_sharing_op_bulk_share:
> +        {
> +            unsigned long max_sgfn, max_cgfn;
> +            struct domain *cd;
> +
> +            rc = -EINVAL;
> +            if ( !mem_sharing_enabled(d) )
> +                goto out;
> +
> +            rc = rcu_lock_live_remote_domain_by_id(mso.u.bulk.client_domain,
> +                                                   &cd);
> +            if ( rc )
> +                goto out;
> +
> +            rc = xsm_mem_sharing_op(XSM_DM_PRIV, d, cd, mso.op);
> +            if ( rc )
> +            {
> +                rcu_unlock_domain(cd);
> +                goto out;
> +            }
> +
> +            if ( !mem_sharing_enabled(cd) )
> +            {
> +                rcu_unlock_domain(cd);
> +                rc = -EINVAL;
> +                goto out;
> +            }
> +
> +            if ( !atomic_read(&d->pause_count) ||
> +                 !atomic_read(&cd->pause_count) )

As said in the reply to your respective inquiry - you want to look at
controller_pause_count here instead.

> +            {
> +                rcu_unlock_domain(cd);

Considering how many times you do this, perhaps worth putting
a label at the (current) success path's unlock?

> +                rc = -EINVAL;
> +                goto out;
> +            }
> +
> +            max_sgfn = domain_get_maximum_gpfn(d);
> +            max_cgfn = domain_get_maximum_gpfn(cd);
> +
> +            if ( max_sgfn != max_cgfn || max_sgfn < mso.u.bulk.start )
> +            {
> +                rcu_unlock_domain(cd);
> +                rc = -EINVAL;
> +                goto out;
> +            }
> +
> +            rc = bulk_share(d, cd, max_sgfn, &mso.u.bulk);
> +            if ( rc > 0 )
> +            {
> +                if ( __copy_to_guest(arg, &mso, 1) )

Wouldn't copying just the one field you care about suffice here?

> +                    rc = -EFAULT;
> +                else
> +                    rc = hypercall_create_continuation(__HYPERVISOR_memory_op,
> +                                                       "lh", XENMEM_sharing_op,
> +                                                       arg);
> +            } else {

Coding style.

> +                mso.u.bulk.applied = atomic_read(&d->shr_pages);

Is there no possibility for something else to also modify d->shr_pages
in parallel, misguiding the caller when looking at the output field. Also
you're not copying this back to guest memory...

> @@ -482,7 +483,16 @@ struct xen_mem_sharing_op {
>              uint64_aligned_t client_gfn;    /* IN: the client gfn */
>              uint64_aligned_t client_handle; /* IN: handle to the client page */
>              domid_t  client_domain; /* IN: the client domain id */
> -        } share; 
> +        } share;
> +        struct mem_sharing_op_bulk {         /* OP_BULK_SHARE */
> +            domid_t client_domain;           /* IN: the client domain id */

Explicit padding here please (albeit I see already from the context
that this isn't being done in e.g. the share sub-structure above
either).

Jan

> +            uint64_aligned_t start;          /* IN: start gfn. Set to 0 for
> +                                                full deduplication. Field is
> +                                                used internally and may change
> +                                                when the hypercall returns. */
> +            uint64_aligned_t applied;        /* OUT: the number of gfns
> +                                                where the operation applied */
> +        } bulk;


* [PATCH v3 1/2] x86/mem-sharing: Bulk mem-sharing entire domains
@ 2015-10-15 18:09 Tamas K Lengyel
  2015-10-16  6:46 ` Jan Beulich
  0 siblings, 1 reply; 15+ messages in thread
From: Tamas K Lengyel @ 2015-10-15 18:09 UTC (permalink / raw)
  To: xen-devel
  Cc: Tamas K Lengyel, Wei Liu, Ian Campbell, Stefano Stabellini,
	George Dunlap, Andrew Cooper, Ian Jackson, Jan Beulich,
	Tamas K Lengyel, Keir Fraser

Currently mem-sharing can be performed on a page-by-page base from the control
domain. However, when completely deduplicating (cloning) a VM, this requires
at least 3 hypercalls per page. As the user has to loop through all pages up
to max_gpfn, this process is very slow and wasteful.

This patch introduces a new mem_sharing memop for bulk deduplication where
the user doesn't have to separately nominate each page in both the source and
destination domain, and the looping over all pages happen in the hypervisor.
This significantly reduces the overhead of completely deduplicating entire
domains.

Signed-off-by: Tamas K Lengyel <tlengyel@novetta.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
---
v3: Bail if domains are not paused
    Rename bulk_share struct to just bulk
    Return -ENOMEM error if nomination fails
    Return total number of shared pages (not keeping separate count)
v2: Stash hypercall continuation start point in xen_mem_sharing_op_t
    Return number of successfully shared pages in xen_mem_sharing_op_t
---
 tools/libxc/include/xenctrl.h | 15 +++++++
 tools/libxc/xc_memshr.c       | 19 +++++++++
 xen/arch/x86/mm/mem_sharing.c | 99 +++++++++++++++++++++++++++++++++++++++++++
 xen/include/public/memory.h   | 12 +++++-
 4 files changed, 144 insertions(+), 1 deletion(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 3bfa00b..3093a7c 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2594,6 +2594,21 @@ int xc_memshr_add_to_physmap(xc_interface *xch,
                     domid_t client_domain,
                     unsigned long client_gfn);
 
+/* Allows to deduplicate the entire memory of a client domain in bulk. Using
+ * this function is equivalent of calling xc_memshr_nominate_gfn for each gfn
+ * in the two domains followed by xc_memshr_share_gfns. If successfull,
+ * returns the number of shared pages in 'shared'. Both domains must be paused.
+ *
+ * May fail with -EINVAL if the source and client domain have different
+ * memory size or if memory sharing is not enabled on either of the domains.
+ * May also fail with -ENOMEM if there isn't enough memory available to store
+ * the sharing metadata before deduplication can happen.
+ */
+int xc_memshr_bulk_share(xc_interface *xch,
+                         domid_t source_domain,
+                         domid_t client_domain,
+                         uint64_t *shared);
+
 /* Debug calls: return the number of pages referencing the shared frame backing
  * the input argument. Should be one or greater. 
  *
diff --git a/tools/libxc/xc_memshr.c b/tools/libxc/xc_memshr.c
index deb0aa4..d38a6a9 100644
--- a/tools/libxc/xc_memshr.c
+++ b/tools/libxc/xc_memshr.c
@@ -181,6 +181,25 @@ int xc_memshr_add_to_physmap(xc_interface *xch,
     return xc_memshr_memop(xch, source_domain, &mso);
 }
 
+int xc_memshr_bulk_share(xc_interface *xch,
+                         domid_t source_domain,
+                         domid_t client_domain,
+                         uint64_t *shared)
+{
+    int rc;
+    xen_mem_sharing_op_t mso;
+
+    memset(&mso, 0, sizeof(mso));
+
+    mso.op = XENMEM_sharing_op_bulk_share;
+    mso.u.bulk.client_domain = client_domain;
+
+    rc = xc_memshr_memop(xch, source_domain, &mso);
+    if ( !rc && shared ) *shared = mso.u.bulk.applied;
+
+    return rc;
+}
+
 int xc_memshr_domain_resume(xc_interface *xch,
                             domid_t domid)
 {
diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index a95e105..b398d5a 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -1293,6 +1293,42 @@ int relinquish_shared_pages(struct domain *d)
     return rc;
 }
 
+static int bulk_share(struct domain *d, struct domain *cd, unsigned long max,
+                      struct mem_sharing_op_bulk *bulk)
+{
+    int rc = 0;
+    shr_handle_t sh, ch;
+
+    while( bulk->start <= max )
+    {
+        rc = mem_sharing_nominate_page(d, bulk->start, 0, &sh);
+        if ( rc == -ENOMEM )
+            break;
+        else if ( rc )
+            goto next;
+
+        rc = mem_sharing_nominate_page(cd, bulk->start, 0, &ch);
+        if ( rc == -ENOMEM )
+            break;
+        else if ( rc )
+            goto next;
+
+        mem_sharing_share_pages(d, bulk->start, sh, cd, bulk->start, ch);
+
+next:
+        ++(bulk->start);
+
+        /* Check for continuation if it's not the last iteration. */
+        if ( bulk->start < max && hypercall_preempt_check() )
+        {
+            rc = 1;
+            break;
+        }
+    }
+
+    return rc;
+}
+
 int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
 {
     int rc;
@@ -1467,6 +1503,69 @@ int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
         }
         break;
 
+        case XENMEM_sharing_op_bulk_share:
+        {
+            unsigned long max_sgfn, max_cgfn;
+            struct domain *cd;
+
+            rc = -EINVAL;
+            if ( !mem_sharing_enabled(d) )
+                goto out;
+
+            rc = rcu_lock_live_remote_domain_by_id(mso.u.bulk.client_domain,
+                                                   &cd);
+            if ( rc )
+                goto out;
+
+            rc = xsm_mem_sharing_op(XSM_DM_PRIV, d, cd, mso.op);
+            if ( rc )
+            {
+                rcu_unlock_domain(cd);
+                goto out;
+            }
+
+            if ( !mem_sharing_enabled(cd) )
+            {
+                rcu_unlock_domain(cd);
+                rc = -EINVAL;
+                goto out;
+            }
+
+            if ( !atomic_read(&d->pause_count) ||
+                 !atomic_read(&cd->pause_count) )
+            {
+                rcu_unlock_domain(cd);
+                rc = -EINVAL;
+                goto out;
+            }
+
+            max_sgfn = domain_get_maximum_gpfn(d);
+            max_cgfn = domain_get_maximum_gpfn(cd);
+
+            if ( max_sgfn != max_cgfn || max_sgfn < mso.u.bulk.start )
+            {
+                rcu_unlock_domain(cd);
+                rc = -EINVAL;
+                goto out;
+            }
+
+            rc = bulk_share(d, cd, max_sgfn, &mso.u.bulk);
+            if ( rc > 0 )
+            {
+                if ( __copy_to_guest(arg, &mso, 1) )
+                    rc = -EFAULT;
+                else
+                    rc = hypercall_create_continuation(__HYPERVISOR_memory_op,
+                                                       "lh", XENMEM_sharing_op,
+                                                       arg);
+            } else {
+                mso.u.bulk.applied = atomic_read(&d->shr_pages);
+            }
+
+            rcu_unlock_domain(cd);
+        }
+        break;
+
         case XENMEM_sharing_op_debug_gfn:
         {
             unsigned long gfn = mso.u.debug.u.gfn;
diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h
index 320de91..e2ba1e7 100644
--- a/xen/include/public/memory.h
+++ b/xen/include/public/memory.h
@@ -447,6 +447,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_mem_access_op_t);
 #define XENMEM_sharing_op_debug_gref        5
 #define XENMEM_sharing_op_add_physmap       6
 #define XENMEM_sharing_op_audit             7
+#define XENMEM_sharing_op_bulk_share        8
 
 #define XENMEM_SHARING_OP_S_HANDLE_INVALID  (-10)
 #define XENMEM_SHARING_OP_C_HANDLE_INVALID  (-9)
@@ -482,7 +483,16 @@ struct xen_mem_sharing_op {
             uint64_aligned_t client_gfn;    /* IN: the client gfn */
             uint64_aligned_t client_handle; /* IN: handle to the client page */
             domid_t  client_domain; /* IN: the client domain id */
-        } share; 
+        } share;
+        struct mem_sharing_op_bulk {         /* OP_BULK_SHARE */
+            domid_t client_domain;           /* IN: the client domain id */
+            uint64_aligned_t start;          /* IN: start gfn. Set to 0 for
+                                                full deduplication. Field is
+                                                used internally and may change
+                                                when the hypercall returns. */
+            uint64_aligned_t applied;        /* OUT: the number of gfns
+                                                where the operation applied */
+        } bulk;
         struct mem_sharing_op_debug {     /* OP_DEBUG_xxx */
             union {
                 uint64_aligned_t gfn;      /* IN: gfn to debug          */
-- 
2.1.4


Thread overview: 15+ messages
2016-05-12 15:25 [PATCH v3 1/2] x86/mem-sharing: Bulk mem-sharing entire domains Tamas K Lengyel
2016-05-12 15:25 ` [PATCH v3 2/2] tests/mem-sharing: Add bulk option to memshrtool Tamas K Lengyel
2016-05-13 12:00 ` [PATCH v3 1/2] x86/mem-sharing: Bulk mem-sharing entire domains Jan Beulich
2016-05-13 14:50   ` Tamas K Lengyel
2016-05-13 15:09     ` Jan Beulich
2016-05-13 15:31       ` Tamas K Lengyel
2016-05-13 16:12         ` Jan Beulich
2016-05-13 16:29           ` Tamas K Lengyel
2016-05-17  7:49             ` Jan Beulich
2016-05-13 15:35       ` Daniel De Graaf
2016-05-13 16:19         ` Jan Beulich
  -- strict thread matches above, loose matches on Subject: below --
2015-10-15 18:09 Tamas K Lengyel
2015-10-16  6:46 ` Jan Beulich
2015-10-16 17:02   ` Tamas K Lengyel
2015-10-19  9:04     ` Jan Beulich
