From: Yu Zhang <yu.c.zhang@linux.intel.com>
To: George Dunlap <george.dunlap@citrix.com>, xen-devel@lists.xen.org
Subject: Re: [PATCH v5 4/4] x86/ioreq server: Reset outstanding p2m_ioreq_server entries when an ioreq server unmaps.
Date: Fri, 2 Sep 2016 19:00:27 +0800	[thread overview]
Message-ID: <57C95BCB.5050003@linux.intel.com> (raw)
In-Reply-To: <57C57943.5030609@linux.intel.com>



On 8/30/2016 8:17 PM, Yu Zhang wrote:
>
>
> On 8/16/2016 9:35 PM, George Dunlap wrote:
>> On 12/07/16 10:02, Yu Zhang wrote:
>>> This patch resets p2m_ioreq_server entries back to p2m_ram_rw
>>> after an ioreq server has unmapped. The resync is done both
>>> asynchronously with the current p2m_change_entry_type_global()
>>> interface, and synchronously by iterating the p2m table. The
>>> synchronous resetting is necessary because we need to guarantee
>>> the p2m table is clean before another ioreq server is mapped.
>>> And since sweeping the p2m table can be time-consuming, it is
>>> done with hypercall continuation. The asynchronous approach is
>>> also taken so that p2m_ioreq_server entries can be reset when
>>> the HVM is scheduled between hypercall continuations.
>>>
>>> This patch also disallows live migration when there are still
>>> outstanding p2m_ioreq_server entries left. The core reason is that
>>> our current implementation of p2m_change_entry_type_global() cannot
>>> tell the state of p2m_ioreq_server entries (it cannot decide whether
>>> an entry is to be emulated or to be resynced).
>>>
>>> Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
>> Thanks for doing this, Yu Zhang!  A couple of comments.
>
> Thanks a lot for your advice, George.
> And sorry for the delayed reply (I had been on annual leave).
>
>>> ---
>>> Cc: Paul Durrant <paul.durrant@citrix.com>
>>> Cc: Jan Beulich <jbeulich@suse.com>
>>> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
>>> Cc: George Dunlap <george.dunlap@eu.citrix.com>
>>> Cc: Jun Nakajima <jun.nakajima@intel.com>
>>> Cc: Kevin Tian <kevin.tian@intel.com>
>>> ---
>>>   xen/arch/x86/hvm/hvm.c    | 52 ++++++++++++++++++++++++++++++++++++++++++++---
>>>   xen/arch/x86/mm/hap/hap.c |  9 ++++++++
>>>   xen/arch/x86/mm/p2m-ept.c |  6 +++++-
>>>   xen/arch/x86/mm/p2m-pt.c  | 10 +++++++--
>>>   xen/arch/x86/mm/p2m.c     |  3 +++
>>>   xen/include/asm-x86/p2m.h |  5 ++++-
>>>   6 files changed, 78 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
>>> index 4d98cc6..e57c8b9 100644
>>> --- a/xen/arch/x86/hvm/hvm.c
>>> +++ b/xen/arch/x86/hvm/hvm.c
>>> @@ -5485,6 +5485,7 @@ static int hvmop_set_mem_type(
>>>       {
>>>           unsigned long pfn = a.first_pfn + start_iter;
>>>           p2m_type_t t;
>>> +        struct p2m_domain *p2m = p2m_get_hostp2m(d);
>>>             get_gfn_unshare(d, pfn, &t);
>>>           if ( p2m_is_paging(t) )
>>> @@ -5512,6 +5513,12 @@ static int hvmop_set_mem_type(
>>>           if ( rc )
>>>               goto out;
>>> +        if ( t == p2m_ram_rw && memtype[a.hvmmem_type] == p2m_ioreq_server )
>>> +            p2m->ioreq.entry_count++;
>>> +
>>> +        if ( t == p2m_ioreq_server && memtype[a.hvmmem_type] == p2m_ram_rw )
>>> +            p2m->ioreq.entry_count--;
>> Changing these here might make sense if they were *only* changed in the
>> hvm code; but as you also have to modify this value in the p2m code (in
>> resolve_misconfig), I think it makes sense to try to do all the counting
>> in the p2m code.  That would take care of any locking issues as well.
>>
>> Logically the most sensible place to do it would be
>> atomic_write_ept_entry(); that would make it basically impossible to
>> miss a case where we change to or from p2m_ioreq_server.
>>
>> On the other hand, it would mean adding code to a core path for all p2m
>> updates.
>
> Well, atomic_write_ept_entry() is a rather core path, and it is only
> for EPT. Although p2m_ioreq_server is not accepted in the shadow page
> table code, I'd still like to support it on AMD SVM - which is also
> consistent with the p2m_change_entry_type_global() functionality. :)
>
> I am considering p2m_change_type_one(), which has gfn_lock/unlock()
> inside. And since the p2m type changes to/from p2m_ioreq_server were
> previously only triggered by the hvmop, which leads to
> p2m_change_type_one(), I believe calculating the entry_count there
> would make sense.
>
> Besides, resolve_misconfig()/do_recalc() still need to decrease the
> entry_count for p2m_ioreq_server, but since both routines have the
> p2m locked by their callers, that is also OK.
>
> Other places that read entry_count can be protected by 
> read_atomic(&p2m->ioreq.entry_count).
>
> Do you think this is acceptable?
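> For illustration, the accounting I have in mind would look roughly
> like this. This is only a standalone sketch with simplified stand-in
> types (the real p2m_domain, locking, and type definitions in Xen are
> different); the counter is only touched on an actual
> p2m_ram_rw <-> p2m_ioreq_server transition:

```c
#include <assert.h>
#include <stdatomic.h>

/* Simplified stand-ins for the Xen types; names are illustrative only. */
typedef enum { p2m_ram_rw, p2m_ioreq_server } p2m_type_t;

struct p2m_domain {
    atomic_ulong ioreq_entry_count;   /* outstanding p2m_ioreq_server gfns */
};

/* In the real code this would run with the gfn lock held (e.g. inside
 * p2m_change_type_one()); the count only changes on a real transition
 * to or from p2m_ioreq_server, so other transitions are no-ops here. */
static void account_type_change(struct p2m_domain *p2m,
                                p2m_type_t ot, p2m_type_t nt)
{
    if ( ot == p2m_ram_rw && nt == p2m_ioreq_server )
        atomic_fetch_add(&p2m->ioreq_entry_count, 1);
    else if ( ot == p2m_ioreq_server && nt == p2m_ram_rw )
        atomic_fetch_sub(&p2m->ioreq_entry_count, 1);
}
```

> Readers of the counter would then use an atomic load (read_atomic()
> in Xen), matching the read_atomic(&p2m->ioreq.entry_count) mentioned
> above.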
>
>>> +
>>>           /* Check for continuation if it's not the last iteration */
>>>           if ( a.nr > ++start_iter && !(start_iter & HVMOP_op_mask) &&
>>>                hypercall_preempt_check() )
>>> @@ -5530,11 +5537,13 @@ static int hvmop_set_mem_type(
>>>   }
>>>     static int hvmop_map_mem_type_to_ioreq_server(
>>> -    XEN_GUEST_HANDLE_PARAM(xen_hvm_map_mem_type_to_ioreq_server_t) uop)
>>> +    XEN_GUEST_HANDLE_PARAM(xen_hvm_map_mem_type_to_ioreq_server_t) uop,
>>> +    unsigned long *iter)
>>>   {
>>>       xen_hvm_map_mem_type_to_ioreq_server_t op;
>>>       struct domain *d;
>>>       int rc;
>>> +    unsigned long gfn = *iter;
>>>         if ( copy_from_guest(&op, uop, 1) )
>>>           return -EFAULT;
>>> @@ -5559,7 +5568,42 @@ static int hvmop_map_mem_type_to_ioreq_server(
>>>       if ( rc != 0 )
>>>           goto out;
>>> -    rc = hvm_map_mem_type_to_ioreq_server(d, op.id, op.type, op.flags);
>>> +    if ( gfn == 0 || op.flags != 0 )
>>> +        rc = hvm_map_mem_type_to_ioreq_server(d, op.id, op.type, op.flags);
>>> +
>>> +    /*
>>> +     * Iterate p2m table when an ioreq server unmaps from p2m_ioreq_server,
>>> +     * and reset the remaining p2m_ioreq_server entries back to p2m_ram_rw.
>>> +     */
>>> +    if ( op.flags == 0 && rc == 0 )
>>> +    {
>>> +        struct p2m_domain *p2m = p2m_get_hostp2m(d);
>>> +
>>> +        while ( gfn <= p2m->max_mapped_pfn )
>>> +        {
>>> +            p2m_type_t t;
>>> +
>>> +            if ( p2m->ioreq.entry_count == 0 )
>>> +                break;
>> Any reason not to make this part of the while() condition?
>>
>>> +
>>> +            get_gfn_unshare(d, gfn, &t);
>> This will completely unshare all pages in a domain below the last
>> dangling p2m_ioreq_server page.  I don't think unsharing is necessary at
>> all; if it's shared, it certainly won't be of type p2m_ioreq_server.
>>
>> Actually -- it seems like ept_get_entry() really should be calling
>> resolve_misconfig(), the same way that ept_set_entry() does.  In that
>> case, simply going through and reading all the p2m entries would suffice
>> to finish off the type change.
>
> Well, I have doubts about this - resolve_misconfig() is supposed to
> change the p2m table, and should probably be protected by the p2m
> lock. But ept_get_entry() may be triggered by routines like
> get_gfn_query_unlocked().
>
> Besides, although calling resolve_misconfig() would speed up the p2m
> type sweeping, it also introduces more cost each time we peek at the
> p2m table.
>
>>
>> Although really, it seems like having a "p2m_finish_type_change()"
>> function which looked for misconfigured entries and reset them would be
>> a step closer to the right direction, in that it could be re-used in
>> other situations where the type change may not have finished.
>
> Thanks. This sounds like a good idea. I'd like to define the
> interface of this routine like this:
>  void p2m_finish_type_change(struct domain *d,
>                              unsigned long start, unsigned long end,
>                              p2m_type_t ot, p2m_type_t nt)
> so that it can be triggered by hvmop_map_mem_type_to_ioreq_server()
> with the hypercall continuation method. Do you think this is OK?
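> To make the idea concrete, the sweep could behave roughly as below.
> This is just a standalone sketch with a flat array standing in for
> the p2m table (the real routine would walk the p2m under lock); the
> "chunk" parameter and the returned resume gfn model the hypercall
> continuation, so the caller can preempt and pick up where it left
> off:

```c
#include <assert.h>

/* Simplified stand-ins; the real Xen types and table differ. */
typedef enum { p2m_ram_rw, p2m_ioreq_server } p2m_type_t;

#define MAX_PFN 64
static p2m_type_t fake_p2m[MAX_PFN];   /* flat array standing in for the p2m */

/* Sweep [start, end) rewriting entries of type 'ot' to 'nt'.  Returns
 * the first unprocessed gfn so the caller can continue later, which
 * models resuming via hypercall continuation. */
static unsigned long p2m_finish_type_change(unsigned long start,
                                            unsigned long end,
                                            p2m_type_t ot, p2m_type_t nt,
                                            unsigned long chunk)
{
    unsigned long gfn;

    for ( gfn = start; gfn < end; gfn++ )
    {
        if ( fake_p2m[gfn] == ot )
            fake_p2m[gfn] = nt;

        if ( (gfn - start + 1) >= chunk )   /* preemption point */
            return gfn + 1;
    }

    return end;
}
```

> In the real implementation the loop could also stop early once the
> outstanding entry_count reaches zero, since no p2m_ioreq_server
> entries would remain to reset.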
>
> In fact, I have worked out a new version of this patch based on my
> understanding, and if you have no further disagreement on the above
> issues, I'll send out the new patch soon, after some testing in a
> XenGT environment. :)
>

Hi George, I have just sent out the v6 patch set; you can have a look
at that version directly when you have time. Thanks! :)

Yu


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

