From: Yu Zhang <yu.c.zhang@linux.intel.com>
To: Jan Beulich <JBeulich@suse.com>
Cc: Kevin Tian <kevin.tian@intel.com>,
	George Dunlap <george.dunlap@eu.citrix.com>,
	Andrew Cooper <andrew.cooper3@citrix.com>,
	Tim Deegan <tim@xen.org>,
	xen-devel@lists.xen.org, Paul Durrant <paul.durrant@citrix.com>,
	zhiyuan.lv@intel.com, Jun Nakajima <jun.nakajima@intel.com>
Subject: Re: [PATCH v7 2/5] x86/ioreq server: Add DMOP to map guest ram with p2m_ioreq_server to an ioreq server.
Date: Sat, 11 Mar 2017 16:42:07 +0800
Message-ID: <58C3B85F.6060205@linux.intel.com>
In-Reply-To: <58C2D462020000780014203E@prv-mh.provo.novell.com>



On 3/10/2017 11:29 PM, Jan Beulich wrote:
>>>> On 08.03.17 at 16:33, <yu.c.zhang@linux.intel.com> wrote:
>> changes in v7:
>>    - Use new ioreq server interface - XEN_DMOP_map_mem_type_to_ioreq_server.
>>    - According to comments from George: removed domain_pause/unpause() in
>>      hvm_map_mem_type_to_ioreq_server(), because it's too expensive,
>>      and we can avoid the:
>>      a> deadlock between p2m lock and ioreq server lock by using these locks
>>         in the same order - solved in patch 4;
> That is, until patch 4 there is deadlock potential? I think you want to
> re-order the patches if so. Or was it that the type can't really be used
> until the last patch of the series? (I'm sorry, it's been quite a while
> since the previous version.)

Oh. There's no deadlock potential in this version of the patch set.
But in v6 there was, and I used domain_pause/unpause() to avoid it.
Later on, I realized that if I take the different locks in the same
order, the deadlock potential can be avoided, so we do not need
domain_pause/unpause() in this version.
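
To make this concrete, here is a minimal sketch of the ordering
discipline (field and helper names are illustrative, not the exact
code from patch 4): every path that needs both locks takes them in
one fixed order, so a circular wait cannot form.

    /* Sketch only: always ioreq server lock first, p2m lock second. */
    spin_lock(&d->arch.hvm_domain.ioreq_server.lock);   /* 1st */
    p2m_lock(p2m_get_hostp2m(d));                       /* 2nd */

    /* ... update the p2m <-> ioreq server binding ... */

    p2m_unlock(p2m_get_hostp2m(d));
    spin_unlock(&d->arch.hvm_domain.ioreq_server.lock);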

>> @@ -365,6 +383,24 @@ static int dm_op(domid_t domid,
>>           break;
>>       }
>>   
>> +    case XEN_DMOP_map_mem_type_to_ioreq_server:
>> +    {
>> +        const struct xen_dm_op_map_mem_type_to_ioreq_server *data =
>> +            &op.u.map_mem_type_to_ioreq_server;
>> +
>> +        rc = -EINVAL;
>> +        if ( data->pad )
>> +            break;
>> +
>> +        /* Only support for HAP enabled hvm. */
>> +        if ( !hap_enabled(d) )
>> +            break;
> Perhaps better to give an error other than -EINVAL in this case?
> If so, then the same error should likely also be used in your
> set_mem_type() addition.
How about -ENOTSUP?
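
I.e. the check could become something like the below (just a sketch -
the exact errno is the open question; Xen's public errno.h defines
EOPNOTSUPP, which is what POSIX ENOTSUP maps to on most systems):

    /* Only supported for HAP-enabled HVM guests. */
    rc = -EOPNOTSUPP;
    if ( !hap_enabled(d) )
        break;

with set_mem_type() returning the same value for this case.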

>> --- a/xen/arch/x86/hvm/emulate.c
>> +++ b/xen/arch/x86/hvm/emulate.c
>> @@ -99,6 +99,7 @@ static int hvmemul_do_io(
>>       uint8_t dir, bool_t df, bool_t data_is_addr, uintptr_t data)
>>   {
>>       struct vcpu *curr = current;
>> +    struct domain *currd = curr->domain;
>>       struct hvm_vcpu_io *vio = &curr->arch.hvm_vcpu.hvm_io;
>>       ioreq_t p = {
>>           .type = is_mmio ? IOREQ_TYPE_COPY : IOREQ_TYPE_PIO,
>> @@ -140,7 +141,7 @@ static int hvmemul_do_io(
>>                (p.dir != dir) ||
>>                (p.df != df) ||
>>                (p.data_is_ptr != data_is_addr) )
>> -            domain_crash(curr->domain);
>> +            domain_crash(currd);
> If you mean to do this transformation here, then please do so
> consistently for the entire function.

OK. Thanks.

>> @@ -177,8 +178,65 @@ static int hvmemul_do_io(
>>           break;
>>       case X86EMUL_UNHANDLEABLE:
>>       {
>> -        struct hvm_ioreq_server *s =
>> -            hvm_select_ioreq_server(curr->domain, &p);
>> +        /*
>> +         * Xen isn't emulating the instruction internally, so see if
>> +         * there's an ioreq server that can handle it. Rules:
>> +         *
>> +         * - PIO and "normal" mmio run through hvm_select_ioreq_server()
>> +         * to choose the ioreq server by range. If no server is found,
>> +         * the access is ignored.
>> +         *
>> +         * - p2m_ioreq_server accesses are handled by the current
>> +         * ioreq_server for the domain, but there are some corner
>> +         * cases:
> Who or what is "the current ioreq_server for the domain"?
It means "the current ioreq_server which maps the p2m_ioreq_server
type for this domain"... I'd like to use a succinct phrase, but this
one does not seem accurate enough. Any preference?

>> +         *   - If the domain ioreq_server is NULL, assume this is a
>> +         *   race between the unbinding of ioreq server and guest fault
>> +         *   so re-try the instruction.
>> +         *
>> +         *   - If the IOREQ_MEM_ACCESS_WRITE flag is not set, treat it
>> +         *   like a normal PIO or MMIO that doesn't have an ioreq
>> +         *   server (i.e., by ignoring it).
>> +         */
>> +        struct hvm_ioreq_server *s = NULL;
>> +        p2m_type_t p2mt = p2m_invalid;
>> +
>> +        if ( is_mmio )
>> +        {
>> +            unsigned long gmfn = paddr_to_pfn(addr);
>> +
>> +            (void) get_gfn_query_unlocked(currd, gmfn, &p2mt);
> Stray cast.

OK. Will remove it.

>
>> +            if ( p2mt == p2m_ioreq_server )
>> +            {
>> +                unsigned int flags;
>> +
>> +                s = p2m_get_ioreq_server(currd, &flags);
>> +
>> +                /*
>> +                 * If p2mt is ioreq_server but ioreq_server is NULL,
>> +                 * we probably lost a race with unbinding of ioreq
>> +                 * server, just retry the access.
>> +                 */
>> +                if ( s == NULL )
>> +                {
>> +                    rc = X86EMUL_RETRY;
>> +                    vio->io_req.state = STATE_IOREQ_NONE;
>> +                    break;
>> +                }
>> +
>> +                /*
>> +                 * If the IOREQ_MEM_ACCESS_WRITE flag is not set,
>> +                 * we should set s to NULL, and just ignore such
>> +                 * access.
>> +                 */
>> +                if ( !(flags & XEN_DMOP_IOREQ_MEM_ACCESS_WRITE) )
>> +                    s = NULL;
> What is this about? You only allow WRITE registrations, so this looks
> to be dead code. Yet if it is meant to guard against future enabling
> of READ, then this clearly should not be done for reads.

It's to guard against future emulation of READ. We can remove it for now.
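
With that guard dropped, the branch would read roughly like this
(sketch of the next revision):

    if ( p2mt == p2m_ioreq_server )
    {
        unsigned int flags;

        s = p2m_get_ioreq_server(currd, &flags);

        /*
         * If p2mt is p2m_ioreq_server but no ioreq server is bound,
         * we probably lost a race with the unbinding of the ioreq
         * server; just retry the access.
         */
        if ( s == NULL )
        {
            rc = X86EMUL_RETRY;
            vio->io_req.state = STATE_IOREQ_NONE;
            break;
        }
    }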

>> --- a/xen/arch/x86/mm/p2m-ept.c
>> +++ b/xen/arch/x86/mm/p2m-ept.c
>> @@ -131,6 +131,13 @@ static void ept_p2m_type_to_flags(struct p2m_domain *p2m, ept_entry_t *entry,
>>               entry->r = entry->w = entry->x = 1;
>>               entry->a = entry->d = !!cpu_has_vmx_ept_ad;
>>               break;
>> +        case p2m_ioreq_server:
>> +            entry->r = 1;
>> +            entry->w = !(p2m->ioreq.flags & XEN_DMOP_IOREQ_MEM_ACCESS_WRITE);
> Along the lines of the previous comment - if you mean to have the
> code cope with READ, please also set ->r accordingly, or add a
> comment why this won't have the intended effect (yielding a not
> present EPTE).

How about we keep this code and do not support READ? I'll remove the
above dead code in hvmemul_do_io().
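
Keeping the EPT hunk as it is, plus a comment making the limitation
explicit, would give something like (sketch only):

    case p2m_ioreq_server:
        /*
         * Keep ->r set unconditionally: clearing it in order to
         * intercept reads would yield a not-present EPTE rather
         * than a read violation, so READ emulation is deliberately
         * left unsupported here.
         */
        entry->r = 1;
        entry->w = !(p2m->ioreq.flags & XEN_DMOP_IOREQ_MEM_ACCESS_WRITE);
        break;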

>> @@ -92,8 +94,13 @@ static unsigned long p2m_type_to_flags(p2m_type_t t, mfn_t mfn,
>>       default:
>>           return flags | _PAGE_NX_BIT;
>>       case p2m_grant_map_ro:
>> -    case p2m_ioreq_server:
>>           return flags | P2M_BASE_FLAGS | _PAGE_NX_BIT;
>> +    case p2m_ioreq_server:
>> +        flags |= P2M_BASE_FLAGS | _PAGE_RW | _PAGE_NX_BIT;
>> +        if ( p2m->ioreq.flags & XEN_DMOP_IOREQ_MEM_ACCESS_WRITE )
>> +            return flags & ~_PAGE_RW;
>> +        else
>> +            return flags;
> Stray else. But even better would imo be
>
>          if ( p2m->ioreq.flags & XEN_DMOP_IOREQ_MEM_ACCESS_WRITE )
>              flags &= ~_PAGE_RW;
>          return flags;
Oh. Thanks. :)
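
So the case would end up as:

    case p2m_ioreq_server:
        flags |= P2M_BASE_FLAGS | _PAGE_RW | _PAGE_NX_BIT;
        if ( p2m->ioreq.flags & XEN_DMOP_IOREQ_MEM_ACCESS_WRITE )
            flags &= ~_PAGE_RW;
        return flags;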

>> +struct hvm_ioreq_server *p2m_get_ioreq_server(struct domain *d,
>> +                                              unsigned int *flags)
>> +{
>> +    struct p2m_domain *p2m = p2m_get_hostp2m(d);
>> +    struct hvm_ioreq_server *s;
>> +
>> +    spin_lock(&p2m->ioreq.lock);
>> +
>> +    s = p2m->ioreq.server;
>> +    *flags = p2m->ioreq.flags;
>> +
>> +    spin_unlock(&p2m->ioreq.lock);
>> +    return s;
>> +}
> I'm afraid this question was asked before, but since there's no
> comment here or anywhere, I can't recall if there was a reason why
> s potentially being stale by the time the caller looks at it is not a
> problem.

Well, it is possible that s is stale. I did not take it as a problem
because the device model will later discard such an io request. And I
believe the current hvm_select_ioreq_server() also has the same issue
- the returned s should be considered stale if the MMIO/PIO address
is removed from the ioreq server's rangeset.

Another thought is, if you think it is inappropriate for the device
model to do the check, we can use spin_lock_recursive() on the ioreq
server lock to protect the whole ioreq server selection, and release
the lock after the ioreq is sent out.
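
Roughly like this, if we went that way (a sketch; the lock field name
is from memory and may not be exact):

    spin_lock_recursive(&currd->arch.hvm_domain.ioreq_server.lock);

    s = hvm_select_ioreq_server(currd, &p);
    if ( s )
        rc = hvm_send_ioreq(s, &p, 0);

    spin_unlock_recursive(&currd->arch.hvm_domain.ioreq_server.lock);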

>> --- a/xen/include/public/hvm/dm_op.h
>> +++ b/xen/include/public/hvm/dm_op.h
>> @@ -318,6 +318,30 @@ struct xen_dm_op_inject_msi {
>>       uint64_aligned_t addr;
>>   };
>>   
>> +/*
>> + * XEN_DMOP_map_mem_type_to_ioreq_server : map or unmap the IOREQ Server <id>
>> + *                                      to a specific memory type <type>
>> + *                                      for specific accesses <flags>
>> + *
>> + * For now, flags only accept the value of XEN_DMOP_IOREQ_MEM_ACCESS_WRITE,
>> + * which means only write operations are to be forwarded to an ioreq server.
>> + * Support for the emulation of read operations can be added when an ioreq
>> + * server has such requirement in future.
>> + */
>> +#define XEN_DMOP_map_mem_type_to_ioreq_server 15
>> +
>> +struct xen_dm_op_map_mem_type_to_ioreq_server {
>> +    ioservid_t id;      /* IN - ioreq server id */
>> +    uint16_t type;      /* IN - memory type */
>> +    uint16_t pad;
> Perhaps there was padding needed when this was a hvmop, but
> now the padding does exactly the wrong thing.

Right, the padding is useless in this new structure. I will remove it
if the other fields do not change (I proposed a change to the flags
field in reply to Andrew's comments on patch 5). Thank you, Jan.
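
For concreteness, the trimmed structure would look like this (sketch;
the flags field is the one under discussion in the patch 5 thread, so
its exact definition may still change):

    struct xen_dm_op_map_mem_type_to_ioreq_server {
        ioservid_t id;      /* IN - ioreq server id */
        uint16_t type;      /* IN - memory type */
        uint32_t flags;     /* IN - types of accesses to be forwarded
                               to the ioreq server */
    };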

Yu
>
> Jan
>

