From: George Dunlap <dunlapg@umich.edu>
To: Tamas K Lengyel <tamas@tklengyel.com>
Cc: xen-devel <xen-devel@lists.xenproject.org>
Subject: Re: [PATCH v5 1/2] x86/mem-sharing: Bulk mem-sharing entire domains
Date: Wed, 22 Jun 2016 16:38:13 +0100	[thread overview]
Message-ID: <CAFLBxZbWGTqn4dtpJaJW9fKcNqrBgKTMA2ohWAkom4Y+1YXafw@mail.gmail.com> (raw)
In-Reply-To: <1465687481-10095-1-git-send-email-tamas@tklengyel.com>

On Sun, Jun 12, 2016 at 12:24 AM, Tamas K Lengyel <tamas@tklengyel.com> wrote:
> Currently mem-sharing can be performed on a page-by-page basis from the control
> domain. However, when completely deduplicating (cloning) a VM, this requires
> at least 3 hypercalls per page. As the user has to loop through all pages up
> to max_gpfn, this process is very slow and wasteful.
>
> This patch introduces a new mem_sharing memop for bulk deduplication where
> the user doesn't have to separately nominate each page in both the source and
> destination domain, and the looping over all pages happens in the hypervisor.
> This significantly reduces the overhead of completely deduplicating entire
> domains.
>
> Signed-off-by: Tamas K Lengyel <tamas@tklengyel.com>
> Acked-by: Wei Liu <wei.liu2@citrix.com>
> ---
> Ian Jackson <ian.jackson@eu.citrix.com>
> George Dunlap <george.dunlap@eu.citrix.com>
> Jan Beulich <jbeulich@suse.com>
> Andrew Cooper <andrew.cooper3@citrix.com>

I'm sorry I'm a bit late to this party -- I'm not sure how I managed
to miss the earlier posts of this.  A couple of questions...
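
For context, the per-page flow being replaced looks roughly like the sketch
below (using the libxc memshr wrappers as I remember them --
xc_memshr_nominate_gfn() and xc_memshr_share_gfns() -- with error handling
elided), which is where the "3 hypercalls per page" above comes from:

  /* Rough sketch: deduplicate two already-paused domains page by page from
   * the control domain.  Each gfn costs two nominate hypercalls plus one
   * share hypercall. */
  #include <xenctrl.h>

  static void dedup_per_page(xc_interface *xch, uint32_t src, uint32_t cli,
                             unsigned long max_gpfn)
  {
      unsigned long gfn;

      for ( gfn = 0; gfn <= max_gpfn; gfn++ )
      {
          uint64_t sh, ch;

          if ( xc_memshr_nominate_gfn(xch, src, gfn, &sh) )
              continue;              /* page not shareable, skip it */
          if ( xc_memshr_nominate_gfn(xch, cli, gfn, &ch) )
              continue;
          xc_memshr_share_gfns(xch, src, gfn, sh, cli, gfn, ch);
      }
  }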


> @@ -1468,6 +1516,79 @@ int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
>          }
>          break;
>
> +        case XENMEM_sharing_op_bulk_share:
> +        {
> +            unsigned long max_sgfn, max_cgfn;
> +            struct domain *cd;
> +
> +            rc = -EINVAL;
> +            if ( mso.u.bulk._pad[0] || mso.u.bulk._pad[1] || mso.u.bulk._pad[2] )
> +                goto out;
> +
> +            if ( !mem_sharing_enabled(d) )
> +                goto out;
> +
> +            rc = rcu_lock_live_remote_domain_by_id(mso.u.bulk.client_domain,
> +                                                   &cd);
> +            if ( rc )
> +                goto out;
> +
> +            /*
> +             * We reuse XENMEM_sharing_op_share XSM check here as this is essentially
> +             * the same concept repeated over multiple pages.
> +             */
> +            rc = xsm_mem_sharing_op(XSM_DM_PRIV, d, cd, XENMEM_sharing_op_share);
> +            if ( rc )
> +            {
> +                rcu_unlock_domain(cd);
> +                goto out;
> +            }
> +
> +            if ( !mem_sharing_enabled(cd) )
> +            {
> +                rcu_unlock_domain(cd);
> +                rc = -EINVAL;
> +                goto out;
> +            }
> +
> +            if ( !atomic_read(&d->pause_count) ||
> +                 !atomic_read(&cd->pause_count) )
> +            {
> +                rcu_unlock_domain(cd);
> +                rc = -EINVAL;
> +                goto out;
> +            }

I realize that Jan asked for this, but I'm really not sure what good
this is supposed to do, since the guest can be un-paused at any point
halfway through the transaction.

Wouldn't it make more sense to have this function pause and unpause
the domains itself?
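
Something like the sketch below is what I have in mind (just a sketch -- it
assumes neither domain is the caller, and the continuation path would of
course need to keep both domains paused until the final pass):

  /* Sketch: hold both domains paused for the duration of the bulk
   * operation rather than trusting the caller to have paused them. */
  domain_pause(d);
  domain_pause(cd);

  rc = bulk_share(d, cd, max_sgfn + 1, &mso.u.bulk);

  domain_unpause(cd);
  domain_unpause(d);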

> +
> +            max_sgfn = domain_get_maximum_gpfn(d);
> +            max_cgfn = domain_get_maximum_gpfn(cd);
> +
> +            if ( max_sgfn != max_cgfn || max_sgfn < mso.u.bulk.start )
> +            {
> +                rcu_unlock_domain(cd);
> +                rc = -EINVAL;
> +                goto out;
> +            }
> +
> +            rc = bulk_share(d, cd, max_sgfn + 1, &mso.u.bulk);
> +            if ( rc > 0 )
> +            {
> +                if ( __copy_to_guest(arg, &mso, 1) )
> +                    rc = -EFAULT;
> +                else
> +                    rc = hypercall_create_continuation(__HYPERVISOR_memory_op,
> +                                                       "lh", XENMEM_sharing_op,
> +                                                       arg);
> +            }
> +            else
> +            {
> +                mso.u.bulk.start = 0;
> +                mso.u.bulk.shared = atomic_read(&cd->shr_pages);
> +            }
> +
> +            rcu_unlock_domain(cd);
> +        }
> +        break;
> +
>          case XENMEM_sharing_op_debug_gfn:
>          {
>              unsigned long gfn = mso.u.debug.u.gfn;
> diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h
> index 29ec571..084f06e 100644
> --- a/xen/include/public/memory.h
> +++ b/xen/include/public/memory.h
> @@ -465,6 +465,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_mem_access_op_t);
>  #define XENMEM_sharing_op_debug_gref        5
>  #define XENMEM_sharing_op_add_physmap       6
>  #define XENMEM_sharing_op_audit             7
> +#define XENMEM_sharing_op_bulk_share        8
>
>  #define XENMEM_SHARING_OP_S_HANDLE_INVALID  (-10)
>  #define XENMEM_SHARING_OP_C_HANDLE_INVALID  (-9)
> @@ -500,7 +501,19 @@ struct xen_mem_sharing_op {
>              uint64_aligned_t client_gfn;    /* IN: the client gfn */
>              uint64_aligned_t client_handle; /* IN: handle to the client page */
>              domid_t  client_domain; /* IN: the client domain id */
> -        } share;
> +        } share;
> +        struct mem_sharing_op_bulk {         /* OP_BULK_SHARE */
> +            uint64_aligned_t start;          /* IN: start gfn. Set to 0 for
> +                                                full deduplication. Field is
> +                                                reset to 0 when hypercall
> +                                                completes */

It took me a while to figure out that this value isn't actually
intended to be used as described.  You're actually intended to always
set this to zero, and the hypervisor just uses it for scratch space.

To start with, it seems like having a "bulk share" option which could
handle just a specific range would be useful, in addition to one which
automatically dedupes the entire memory space.

Secondly, structuring the information like the other memory operations
-- for example, "nr_extents" and "nr_done" -- would make it more
consistent, and would also allow you to avoid overwriting one of
the "in" values and having to restore it when you were done.

Other than that, it looks OK.

 -George

