From: Yu Zhang <yu.c.zhang@linux.intel.com>
To: George Dunlap <george.dunlap@citrix.com>, xen-devel@lists.xen.org
Cc: Kevin Tian <kevin.tian@intel.com>,
	Jan Beulich <jbeulich@suse.com>,
	George Dunlap <george.dunlap@eu.citrix.com>,
	Andrew Cooper <andrew.cooper3@citrix.com>,
	Paul Durrant <paul.durrant@citrix.com>,
	zhiyuan.lv@intel.com, Jun Nakajima <jun.nakajima@intel.com>
Subject: Re: [PATCH v5 4/4] x86/ioreq server: Reset outstanding p2m_ioreq_server entries when an ioreq server unmaps.
Date: Tue, 30 Aug 2016 20:17:07 +0800
Message-ID: <57C57943.5030609@linux.intel.com>
In-Reply-To: <66dee11c-242c-20ff-9581-f42402c31d8b@citrix.com>



On 8/16/2016 9:35 PM, George Dunlap wrote:
> On 12/07/16 10:02, Yu Zhang wrote:
>> This patch resets p2m_ioreq_server entries back to p2m_ram_rw
>> after an ioreq server has unmapped. The resync is done both
>> asynchronously with the current p2m_change_entry_type_global()
>> interface, and synchronously by iterating the p2m table. The
>> synchronous resetting is necessary because we need to guarantee
>> the p2m table is clean before another ioreq server is mapped.
>> And since sweeping the p2m table could be time consuming, it is
>> done with hypercall continuation. The asynchronous approach is
>> also taken so that p2m_ioreq_server entries can be reset when the
>> HVM domain is scheduled between hypercall continuations.
>>
>> This patch also disallows live migration when there are still
>> outstanding p2m_ioreq_server entries left. The core reason is that
>> our current implementation of p2m_change_entry_type_global() cannot
>> tell the state of p2m_ioreq_server entries (it cannot decide whether
>> an entry is to be emulated or to be resynced).
>>
>> Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
> Thanks for doing this, Yu Zhang!  A couple of comments.

Thanks a lot for your advice, George, and sorry for the delayed reply
(I was on annual leave).

>> ---
>> Cc: Paul Durrant <paul.durrant@citrix.com>
>> Cc: Jan Beulich <jbeulich@suse.com>
>> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
>> Cc: George Dunlap <george.dunlap@eu.citrix.com>
>> Cc: Jun Nakajima <jun.nakajima@intel.com>
>> Cc: Kevin Tian <kevin.tian@intel.com>
>> ---
>>   xen/arch/x86/hvm/hvm.c    | 52 ++++++++++++++++++++++++++++++++++++++++++++---
>>   xen/arch/x86/mm/hap/hap.c |  9 ++++++++
>>   xen/arch/x86/mm/p2m-ept.c |  6 +++++-
>>   xen/arch/x86/mm/p2m-pt.c  | 10 +++++++--
>>   xen/arch/x86/mm/p2m.c     |  3 +++
>>   xen/include/asm-x86/p2m.h |  5 ++++-
>>   6 files changed, 78 insertions(+), 7 deletions(-)
>>
>> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
>> index 4d98cc6..e57c8b9 100644
>> --- a/xen/arch/x86/hvm/hvm.c
>> +++ b/xen/arch/x86/hvm/hvm.c
>> @@ -5485,6 +5485,7 @@ static int hvmop_set_mem_type(
>>       {
>>           unsigned long pfn = a.first_pfn + start_iter;
>>           p2m_type_t t;
>> +        struct p2m_domain *p2m = p2m_get_hostp2m(d);
>>   
>>           get_gfn_unshare(d, pfn, &t);
>>           if ( p2m_is_paging(t) )
>> @@ -5512,6 +5513,12 @@ static int hvmop_set_mem_type(
>>           if ( rc )
>>               goto out;
>>   
>> +        if ( t == p2m_ram_rw && memtype[a.hvmmem_type] == p2m_ioreq_server )
>> +            p2m->ioreq.entry_count++;
>> +
>> +        if ( t == p2m_ioreq_server && memtype[a.hvmmem_type] == p2m_ram_rw )
>> +            p2m->ioreq.entry_count--;
> Changing these here might make sense if they were *only* changed in the
> hvm code; but as you also have to modify this value in the p2m code (in
> resolve_misconfig), I think it makes sense to try to do all the counting
> in the p2m code.  That would take care of any locking issues as well.
>
> Logically the most sensible place to do it would be
> atomic_write_ept_entry(); that would make it basically impossible to
> miss a case where we change to or from p2m_ioreq_server.
>
> On the other hand, it would mean adding code to a core path for all p2m
> updates.

Well, atomic_write_ept_entry() is a rather core path, and it is EPT-only.
Although p2m_ioreq_server is not accepted by the shadow page table code,
I'd still like to support it on AMD SVM - which is also consistent with
the p2m_change_entry_type_global() functionality. :)

I am considering p2m_change_type_one(), which takes gfn_lock()/gfn_unlock()
internally. Since, so far, p2m type changes to/from p2m_ioreq_server are
only triggered by the hvmop, which leads to p2m_change_type_one(), I believe
maintaining entry_count there would make sense.

Besides, resolve_misconfig()/do_recalc() still need to decrement entry_count
for p2m_ioreq_server entries, but since both routines run with the p2m lock
held by their callers, that is also safe.

Other places that read entry_count can take a consistent snapshot with
read_atomic(&p2m->ioreq.entry_count).
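
To illustrate, here is a minimal sketch of what I have in mind - the
existing lookup/set logic of p2m_change_type_one() is elided, so this is
only my assumption of the shape, not the final patch:

    int p2m_change_type_one(struct domain *d, unsigned long gfn,
                            p2m_type_t ot, p2m_type_t nt)
    {
        struct p2m_domain *p2m = p2m_get_hostp2m(d);
        int rc;

        gfn_lock(p2m, gfn, 0);

        /* ... existing entry lookup and p2m->set_entry() call ... */
        rc = 0;    /* placeholder for the result of the type change */

        /* Maintain the outstanding entry count under the same lock. */
        if ( !rc )
        {
            if ( ot == p2m_ram_rw && nt == p2m_ioreq_server )
                p2m->ioreq.entry_count++;
            else if ( ot == p2m_ioreq_server && nt == p2m_ram_rw )
                p2m->ioreq.entry_count--;
        }

        gfn_unlock(p2m, gfn, 0);

        return rc;
    }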

Do you think this is acceptable?

>> +
>>           /* Check for continuation if it's not the last iteration */
>>           if ( a.nr > ++start_iter && !(start_iter & HVMOP_op_mask) &&
>>                hypercall_preempt_check() )
>> @@ -5530,11 +5537,13 @@ static int hvmop_set_mem_type(
>>   }
>>   
>>   static int hvmop_map_mem_type_to_ioreq_server(
>> -    XEN_GUEST_HANDLE_PARAM(xen_hvm_map_mem_type_to_ioreq_server_t) uop)
>> +    XEN_GUEST_HANDLE_PARAM(xen_hvm_map_mem_type_to_ioreq_server_t) uop,
>> +    unsigned long *iter)
>>   {
>>       xen_hvm_map_mem_type_to_ioreq_server_t op;
>>       struct domain *d;
>>       int rc;
>> +    unsigned long gfn = *iter;
>>   
>>       if ( copy_from_guest(&op, uop, 1) )
>>           return -EFAULT;
>> @@ -5559,7 +5568,42 @@ static int hvmop_map_mem_type_to_ioreq_server(
>>       if ( rc != 0 )
>>           goto out;
>>   
>> -    rc = hvm_map_mem_type_to_ioreq_server(d, op.id, op.type, op.flags);
>> +    if ( gfn == 0 || op.flags != 0 )
>> +        rc = hvm_map_mem_type_to_ioreq_server(d, op.id, op.type, op.flags);
>> +
>> +    /*
>> +     * Iterate p2m table when an ioreq server unmaps from p2m_ioreq_server,
>> +     * and reset the remaining p2m_ioreq_server entries back to p2m_ram_rw.
>> +     */
>> +    if ( op.flags == 0 && rc == 0 )
>> +    {
>> +        struct p2m_domain *p2m = p2m_get_hostp2m(d);
>> +
>> +        while ( gfn <= p2m->max_mapped_pfn )
>> +        {
>> +            p2m_type_t t;
>> +
>> +            if ( p2m->ioreq.entry_count == 0 )
>> +                break;
> Any reason not to make this part of the while() condition?
>
>> +
>> +            get_gfn_unshare(d, gfn, &t);
> This will completely unshare all pages in a domain below the last
> dangling p2m_ioreq_server page.  I don't think unsharing is necessary at
> all; if it's shared, it certainly won't be of type p2m_ioreq_server.
>
> Actually -- it seems like ept_get_entry() really should be calling
> resolve_misconfig(), the same way that ept_set_entry() does.  In that
> case, simply going through and reading all the p2m entries would suffice
> to finish off the type change.

Well, I have doubts about this: resolve_misconfig() is supposed to change
the p2m table, and should therefore be protected by the p2m lock, but
ept_get_entry() may be triggered by routines like get_gfn_query_unlocked(),
which hold no lock.

Besides, although calling resolve_misconfig() there would speed up the p2m
type sweep, it would also add cost to every lookup of the p2m table.

>
> Although really, it seems like having a "p2m_finish_type_change()"
> function which looked for misconfigured entries and reset them would be
> a step in the right direction, in that it could be re-used in
> other situations where the type change may not have finished.

Thanks, this sounds like a good idea. I'd like to define the interface of
this routine as:

  void p2m_finish_type_change(struct domain *d,
                              unsigned long start, unsigned long end,
                              p2m_type_t ot, p2m_type_t nt)

so that it can be driven from hvmop_map_mem_type_to_ioreq_server() with the
hypercall continuation method. Do you think this is OK?
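
To make it concrete, here is a rough sketch - the body below is my
assumption of how it could look, not final code; it reuses
p2m_change_type_one() so the entry_count bookkeeping stays in one place:

    void p2m_finish_type_change(struct domain *d,
                                unsigned long start, unsigned long end,
                                p2m_type_t ot, p2m_type_t nt)
    {
        struct p2m_domain *p2m = p2m_get_hostp2m(d);
        p2m_type_t t;
        unsigned long gfn;

        p2m_lock(p2m);

        /* Never walk beyond the highest populated p2m entry. */
        end = min(end, p2m->max_mapped_pfn + 1);

        for ( gfn = start; gfn < end; gfn++ )
        {
            /* Reading the entry is enough to spot a stale type... */
            get_gfn_query_unlocked(d, gfn, &t);

            /* ... and only entries of the old type get rewritten. */
            if ( t == ot )
                p2m_change_type_one(d, gfn, ot, nt);
        }

        p2m_unlock(p2m);
    }

The hvmop caller would then invoke this over a bounded range of gfns at a
time, checking hypercall_preempt_check() between ranges and continuing from
the next gfn, just like the loop in hvmop_set_mem_type() above.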

In fact, I have already worked out a new version of this patch based on my
understanding, and if you have no further disagreement on the above issues,
I'll send it out after some testing in a XenGT environment. :)

Thanks
Yu

