Re: [PATCH v4 3/3] x86/ioreq server: Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server.

From: Yu Zhang <yu.c.zhang@linux.intel.com>
To: Jan Beulich <JBeulich@suse.com>
Cc: Kevin Tian <kevin.tian@intel.com>,
	George Dunlap <george.dunlap@eu.citrix.com>,
	Andrew Cooper <andrew.cooper3@citrix.com>,
	Tim Deegan <tim@xen.org>,
	xen-devel@lists.xen.org, Paul Durrant <paul.durrant@citrix.com>,
	zhiyuan.lv@intel.com, Jun Nakajima <jun.nakajima@intel.com>
Subject: Re: [PATCH v4 3/3] x86/ioreq server: Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server.
Date: Thu, 16 Jun 2016 17:32:52 +0800	[thread overview]
Message-ID: <57627244.7010503@linux.intel.com> (raw)
In-Reply-To: <5761657C02000078000F5512@prv-mh.provo.novell.com>

> On 6/14/2016 6:45 PM, Jan Beulich wrote:
>>>>>> On 19.05.16 at 11:05, <yu.c.zhang@linux.intel.com> wrote:
>>>> A new HVMOP - HVMOP_map_mem_type_to_ioreq_server, is added to
>>>> let one ioreq server claim/disclaim its responsibility for the
>>>> handling of guest pages with p2m type p2m_ioreq_server. Users
>>>> of this HVMOP can specify which kind of operation is supposed
>>>> to be emulated in a parameter named flags. Currently, this HVMOP
>>>> only support the emulation of write operations. And it can be
>>>> easily extended to support the emulation of read ones if an
>>>> ioreq server has such requirement in the future.
>>> Didn't we determine that this isn't as easy as everyone first thought?
>> My understanding is that to emulate read, we need to change the definition
>> of is_epte_present(), and I do not think this change will cause much
>> trouble.
>> But since no one is using the read emulation, I am convinced the more
>> cautious
>> way is to only support emulations for write operations for now.
> Well, okay. I'd personally drop the "easily", but you know what
> to tell people if later they come ask how this "easily" was meant.

OK. Let's drop the word "easily". :)

>>>> @@ -914,6 +916,45 @@ int hvm_unmap_io_range_from_ioreq_server(struct domain
>> *d, ioservid_t id,
>>>>        return rc;
>>>>    }
>>>>    
>>>> +int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id,
>>>> +                                     uint16_t type, uint32_t flags)
>>> I see no reason why both can't be unsigned int.
>> Parameter type is passed in from the type field inside struct
>> xen_hvm_map_mem_type_to_ioreq_server,
>> which is a uint16_t, followed with a uint16_t pad. Now I am wondering,
>> may be we can just remove the pad
>> field in this structure and just define type as uint32_t.
> I think keeping the interface structure unchanged is the desirable
> route here. What I dislike is the passing around of non-natural
> width types, which is more expensive in terms of processing. I.e.
> as long as a fixed width type (which is necessary to be used in
> the public interface) fits in "unsigned int", that should be the
> respective internal type. Otherwise "unsigned long" etc.
>
> There are cases where even internally we indeed want to use
> fixed width types, and admittedly there are likely far more cases
> where internally fixed width types get used without good reason,
> but just like everywhere else - let's please not make this worse.
> IOW please use fixed width types only when you really need them.
OK. I can keep the interface, and using uint32_t type in the internal 
routine
would means a implicit type conversion from uint16_6, which I do not think
is a problem.

>>>> --- a/xen/arch/x86/mm/p2m-ept.c
>>>> +++ b/xen/arch/x86/mm/p2m-ept.c
>>>> @@ -132,6 +132,12 @@ static void ept_p2m_type_to_flags(struct p2m_domain *p2m, ept_entry_t *entry,
>>>>                entry->r = entry->w = entry->x = 1;
>>>>                entry->a = entry->d = !!cpu_has_vmx_ept_ad;
>>>>                break;
>>>> +        case p2m_ioreq_server:
>>>> +            entry->r = entry->x = 1;
>>> Why x?
>> Setting entry->x to 1 is not a must. I can remove it. :)
> Please do. We shouldn't grant permissions without reason.

Got it. Thanks.

>>>> @@ -94,8 +96,16 @@ static unsigned long p2m_type_to_flags(p2m_type_t t, mfn_t mfn,
>>>>        default:
>>>>            return flags | _PAGE_NX_BIT;
>>>>        case p2m_grant_map_ro:
>>>> -    case p2m_ioreq_server:
>>>>            return flags | P2M_BASE_FLAGS | _PAGE_NX_BIT;
>>>> +    case p2m_ioreq_server:
>>>> +    {
>>>> +        flags |= P2M_BASE_FLAGS | _PAGE_RW;
>>>> +
>>>> +        if ( p2m->ioreq.flags & P2M_IOREQ_HANDLE_WRITE_ACCESS )
>>>> +            return flags & ~_PAGE_RW;
>>>> +        else
>>>> +            return flags;
>>>> +    }
>>> Same here (for the missing _PAGE_NX) plus no need for braces.
>> I'll remove the brace. And we do not need to set the _PAGE_NX_BIT, like
>> the p2m_ram_ro case I guess.
> I hope you mean the inverse: You should set _PAGE_NX_BIT here.
Oh, right. I meant the reverse. Thanks for the remind. :)
And I have a question,  here in p2m_type_to_flags(), I saw current code 
uses _PAGE_NX_BIT
to disable the executable permission,  and I wonder, why don't we choose 
the _PAGE_NX,
which is defined as:

#define _PAGE_NX       (cpu_has_nx ? _PAGE_NX_BIT : 0)

How do we know for sure that bit 63 from pte is not a reserved one 
without checking
the cpu capability(the cpu_has_nx)? Is there any other reasons, i.e. the 
page tables might
be shared with IOMMU?

>>>> +                         struct hvm_ioreq_server *s)
>>>> +{
>>>> +    struct p2m_domain *p2m = p2m_get_hostp2m(d);
>>>> +    int rc;
>>>> +
>>>> +    spin_lock(&p2m->ioreq.lock);
>>>> +
>>>> +    if ( flags == 0 )
>>>> +    {
>>>> +        rc = -EINVAL;
>>>> +        if ( p2m->ioreq.server != s )
>>>> +            goto out;
>>>> +
>>>> +        /* Unmap ioreq server from p2m type by passing flags with 0. */
>>>> +        p2m->ioreq.server = NULL;
>>>> +        p2m->ioreq.flags = 0;
>>>> +    }
>>> What does "passing" refer to in the comment?
>> It means if this routine is called with flags=0, it will unmap the ioreq
>> server.
> Oh, that's a completely different reading of the comment than I had
> implied: With what you say, "flags" here really refers to the function
> parameter, whereas I implied it to refer to the structure field. I think
> if that's what you want to say, then the comment should be put next
> to the surrounding if() to clarify what "flags" refers to.
Agreed. I'll move this comment above the surrounding if().

>>>> +{
>>>> +    struct p2m_domain *p2m = p2m_get_hostp2m(d);
>>>> +    struct hvm_ioreq_server *s;
>>>> +
>>>> +    spin_lock(&p2m->ioreq.lock);
>>>> +
>>>> +    s = p2m->ioreq.server;
>>>> +    *flags = p2m->ioreq.flags;
>>>> +
>>>> +    spin_unlock(&p2m->ioreq.lock);
>>>> +    return s;
>>>> +}
>>> Locking is somewhat strange here: You protect against the "set"
>>> counterpart altering state while you retrieve it, but you don't
>>> protect against the returned data becoming stale by the time
>>> the caller can consume it. Is that not a problem? (The most
>>> concerning case would seem to be a race of hvmop_set_mem_type()
>>> with de-registration of the type.)
>> Yes. The case you mentioned might happen. But it's not a big deal I
>> guess. If such
>> case happens, the  backend driver will receive an io request and can
>> then discard it
>> if it has just de-registered the mem type.
> Could you clarify in a comment then what the lock is (and is not)
> meant to guard against?

For now, only one ioreq server is allowed to be bounded with 
HVMMEM_ioreq_server,
one usage of this lock is that in p2m_set_ioreq_server(), it can prevent 
concurrent
setting requirements from multiple ioreq servers. And although it can 
not protect the
return value from p2m_get_ioreq_server(), it can still provide some kind 
protection inside
the routine.
I'll add the comments to illustrate this. :)

>>>> +struct xen_hvm_map_mem_type_to_ioreq_server {
>>>> +    domid_t domid;      /* IN - domain to be serviced */
>>>> +    ioservid_t id;      /* IN - ioreq server id */
>>>> +    uint16_t type;      /* IN - memory type */
>>>> +    uint16_t pad;
>>> This field does not appear to get checked in the handler.
>> I am now wondering, how about we remove this pad field and define type
>> as uint32_t?
> As above - I think the current layout is fine. But I'm also not heavily
> opposed to using uint32_t here. It's not a stable interface anyway
> (and I already have a series mostly ready to split off all control
> operations from the HVMOP_* ones, into a new HVMCTL_* set,
> which will make all of them interface-versioned).

I'd like to keep this interface. BTW, you mentioned "this field does not 
appear to
get checked in the handler", do you mean we need to check the pad in the 
handler?
And why?

Thanks
Yu

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel