Re: [PATCH v2 3/3] x86/ioreq server: Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server

From: "Jan Beulich" <JBeulich@suse.com>
To: Paul Durrant <paul.durrant@citrix.com>,
	Zhang Yu <yu.c.zhang@linux.intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>, Keir Fraser <keir@xen.org>,
	George Dunlap <george.dunlap@eu.citrix.com>,
	Andrew Cooper <andrew.cooper3@citrix.com>,
	Tim Deegan <tim@xen.org>,
	xen-devel@lists.xen.org, zhiyuan.lv@intel.com,
	Jun Nakajima <jun.nakajima@intel.com>
Subject: Re: [PATCH v2 3/3] x86/ioreq server: Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server
Date: Mon, 11 Apr 2016 10:31:58 -0600
Message-ID: <570BDF8E02000078000E63F1@prv-mh.provo.novell.com>
In-Reply-To: <570B871E.7040703@linux.intel.com>

>>> On 11.04.16 at 13:14, <yu.c.zhang@linux.intel.com> wrote:
> On 4/9/2016 6:28 AM, Jan Beulich wrote:
>>>>> On 31.03.16 at 12:53, <yu.c.zhang@linux.intel.com> wrote:
>>> @@ -168,13 +226,72 @@ static int hvmemul_do_io(
>>>           break;
>>>       case X86EMUL_UNHANDLEABLE:
>>>       {
>>> -        struct hvm_ioreq_server *s =
>>> -            hvm_select_ioreq_server(curr->domain, &p);
>>> +        struct hvm_ioreq_server *s;
>>> +        p2m_type_t p2mt;
>>> +
>>> +        if ( is_mmio )
>>> +        {
>>> +            unsigned long gmfn = paddr_to_pfn(addr);
>>> +
>>> +            (void) get_gfn_query_unlocked(currd, gmfn, &p2mt);
>>> +
>>> +            switch ( p2mt )
>>> +            {
>>> +                case p2m_ioreq_server:
>>> +                {
>>> +                    unsigned long flags;
>>> +
>>> +                    p2m_get_ioreq_server(currd, &flags, &s);
>>
>> As the function apparently returns no value right now, please avoid
>> the indirection on both values you're after - one of the two
>> (presumably s) can be the function's return value.
> 
> Well, the current implementation of p2m_get_ioreq_server() has spin_lock/
> spin_unlock around the reads of flags and s, but I believe
> we can also use s as the return value.

The use of a lock inside the function has nothing to do with how it
returns values to the caller.
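
A minimal sketch of what I mean, using hypothetical stand-in types and a
toy lock rather than the real Xen definitions: both fields are read under
the lock, the server pointer is the return value, and only the flags go
through an out-parameter.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the real Xen structures. */
struct ioreq_server { unsigned int id; };

struct p2m_ioreq {
    atomic_flag lock;               /* toy spinlock standing in for spinlock_t */
    struct ioreq_server *server;
    unsigned long flags;
};

static void toy_lock(atomic_flag *l)
{
    while ( atomic_flag_test_and_set(l) )
        ;                           /* spin until the flag was clear */
}

static void toy_unlock(atomic_flag *l)
{
    atomic_flag_clear(l);
}

/* Both values are read under the lock; the server is the return value. */
static struct ioreq_server *get_ioreq_server(struct p2m_ioreq *p,
                                             unsigned long *flags)
{
    struct ioreq_server *s;

    toy_lock(&p->lock);
    s = p->server;
    *flags = p->flags;
    toy_unlock(&p->lock);

    return s;
}
```

The caller then writes s = get_ioreq_server(p, &flags) and gets the same
consistent snapshot of both values as with two out-parameters.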

>>>           /* If there is no suitable backing DM, just ignore accesses */
>>>           if ( !s )
>>>           {
>>> -            rc = hvm_process_io_intercept(&null_handler, &p);
>>> +            switch ( p2mt )
>>> +            {
>>> +            case p2m_ioreq_server:
>>> +            /*
>>> +             * Race conditions may exist when access to a gfn with
>>> +             * p2m_ioreq_server is intercepted by hypervisor, during
>>> +             * which time p2m type of this gfn is recalculated back
>>> +             * to p2m_ram_rw. mem_handler is used to handle this
>>> +             * corner case.
>>> +             */
>>
>> Now if there is such a race condition, the race could also be with a
>> page changing first to ram_rw and then immediately further to e.g.
>> ram_ro. See the earlier comment about assuming the page to be
>> writable.
>>
> 
> Thanks, Jan. After rechecking the code, I suppose the race condition
> will not happen. In hvmemul_do_io(), get_gfn_query_unlocked() is used
> to peek at the p2mt for the gfn, but get_gfn_type_access() is called inside
> hvm_hap_nested_page_fault(), and this guarantees that no p2m change can
> occur during the emulation.
> Is this understanding correct?

Ah, yes, I think so. So the comment is misleading.

>>> +static int hvm_map_mem_type_to_ioreq_server(struct domain *d,
>>> +                                            ioservid_t id,
>>> +                                            hvmmem_type_t type,
>>> +                                            uint32_t flags)
>>> +{
>>> +    struct hvm_ioreq_server *s;
>>> +    int rc;
>>> +
>>> +    /* For now, only HVMMEM_ioreq_server is supported */
>>> +    if ( type != HVMMEM_ioreq_server )
>>> +        return -EINVAL;
>>> +
>>> +    if ( flags & ~(HVMOP_IOREQ_MEM_ACCESS_READ |
>>> +                   HVMOP_IOREQ_MEM_ACCESS_WRITE) )
>>> +        return -EINVAL;
>>> +
>>> +    spin_lock(&d->arch.hvm_domain.ioreq_server.lock);
>>> +
>>> +    rc = -ENOENT;
>>> +    list_for_each_entry ( s,
>>> +                          &d->arch.hvm_domain.ioreq_server.list,
>>> +                          list_entry )
>>> +    {
>>> +        if ( s == d->arch.hvm_domain.default_ioreq_server )
>>> +            continue;
>>> +
>>> +        if ( s->id == id )
>>> +        {
>>> +            rc = p2m_set_ioreq_server(d, flags, s);
>>> +            if ( rc == 0 )
>>> +                gdprintk(XENLOG_DEBUG, "%u %s type HVMMEM_ioreq_server.\n",
>>> +                         s->id, (flags != 0) ? "mapped to" : "unmapped from");
>>
>> Why gdprintk()? I don't think the current domain is of much
>> interest here. What would be of interest is the subject domain.
>>
> 
> s->id is not the domain_id, but id of the ioreq server.

That's understood. But gdprintk() itself logs the current domain,
which isn't as useful as the subject one.
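
As a rough sketch (format_map_msg() is a hypothetical helper invented for
illustration, not an existing Xen function), the message would carry the
subject domain's ID explicitly instead of relying on whatever prefix the
logging macro adds:

```c
#include <stdio.h>

/*
 * Hypothetical helper: builds the log line with the subject domain's ID
 * spelled out, next to the ioreq server ID.
 */
static int format_map_msg(char *buf, size_t len, unsigned int domid,
                          unsigned int server_id, unsigned int flags)
{
    return snprintf(buf, len,
                    "d%u: ioreq server %u %s type HVMMEM_ioreq_server\n",
                    domid, server_id,
                    flags ? "mapped to" : "unmapped from");
}
```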

>>> --- a/xen/arch/x86/mm/p2m-ept.c
>>> +++ b/xen/arch/x86/mm/p2m-ept.c
>>> @@ -132,6 +132,19 @@ static void ept_p2m_type_to_flags(struct p2m_domain *p2m, ept_entry_t *entry,
>>>               entry->r = entry->w = entry->x = 1;
>>>               entry->a = entry->d = !!cpu_has_vmx_ept_ad;
>>>               break;
>>> +        case p2m_ioreq_server:
>>> +            entry->r = !(p2m->ioreq.flags & P2M_IOREQ_HANDLE_READ_ACCESS);
>>> +	    /*
>>> +	     * write access right is disabled when entry->r is 0, but whether
>>> +	     * write accesses are emulated by hypervisor or forwarded to an
>>> +	     * ioreq server depends on the setting of p2m->ioreq.flags.
>>> +	     */
>>> +            entry->w = (entry->r &&
>>> +                        !(p2m->ioreq.flags & P2M_IOREQ_HANDLE_WRITE_ACCESS));
>>> +            entry->x = entry->r;
>>
>> Why would we want to allow instruction execution from such pages?
>> And with all three bits now possibly being clear, aren't we risking the
>> entries being mis-treated as not-present ones?
>>
> 
> Hah. You got me. Thanks! :)
> Now I realize it would be difficult to emulate read operations
> for HVM. According to the Intel manual, entry->r is to be
> cleared, and so must entry->w be if we do not want an EPT misconfig. And
> with both read and write permissions being forbidden, entry->x can be
> set only on processors with the EXECUTE_ONLY capability.
> To avoid any entry being mis-treated as not-present, we have several
> options:
> a> do not support read emulation for now - we have no such use
> case;
> b> add a check of the p2m type against p2m_ioreq_server in
> is_epte_present - a bit weird to me.
> Which one do you prefer? Or any other suggestions?

That question would also need to be put to the others who had
suggested supporting both. I'd be fine with a, but I also don't view
b as too awkward.
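
To illustrate what b would amount to, here is a toy model. The entry
layout, the type field, and both helper names are assumptions made for
this sketch and are not the actual EPT code: an entry with r, w and x all
clear looks not-present when only the permission bits are tested, and
stays recognizable once the recorded type is consulted as well.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for an EPT entry. */
enum toy_p2m_type { TOY_INVALID = 0, TOY_RAM_RW, TOY_IOREQ_SERVER };

struct toy_epte {
    unsigned int r:1, w:1, x:1;     /* permission bits */
    enum toy_p2m_type type;         /* software-recorded p2m type */
};

/* Presence judged from the permission bits alone. */
static bool present_by_bits(const struct toy_epte *e)
{
    return e->r || e->w || e->x;
}

/* Option b: an ioreq-server typed entry counts as present regardless. */
static bool present_with_type(const struct toy_epte *e)
{
    return present_by_bits(e) || e->type == TOY_IOREQ_SERVER;
}
```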

>>> +    /*
>>> +     * Each time we map/unmap an ioreq server to/from p2m_ioreq_server,
>>> +     * we mark the p2m table to be recalculated, so that gfns which were
>>> +     * previously marked with p2m_ioreq_server can be resynced.
>>> +     */
>>> +    p2m_change_entry_type_global(d, p2m_ioreq_server, p2m_ram_rw);
>>
>> What does "resynced" here mean? I.e. I can see why this is wanted
>> when unmapping a server, but when mapping a server there shouldn't
>> be any such pages in the first place.
>>
> 
> There shouldn't be. But if there are (due to misbehavior on the device
> model side), they can be recalculated back to p2m_ram_rw (which is not as
> necessary as in the unmapping case).

DM misbehavior should not result in such a problem - the hypervisor
should refuse any bad requests.
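
A sketch of the kind of checking I have in mind. All names and the exact
error codes here are hypothetical, and the page counter merely stands in
for the p2m entries: pages can only be given the type while a server is
mapped, and unmapping resets them, so no such pages can exist at the time
a server gets mapped.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical toy model; not the real p2m structures. */
struct toy_server { unsigned int id; };

struct toy_p2m {
    struct toy_server *server;      /* currently mapped server, if any */
    unsigned long ioreq_pages;      /* pages typed as ioreq_server */
};

/* Refuse to type a page as ioreq_server unless a server is mapped. */
static int toy_set_page_ioreq(struct toy_p2m *p2m)
{
    if ( p2m->server == NULL )
        return -EINVAL;
    p2m->ioreq_pages++;
    return 0;
}

/* Refuse to map a second server on top of an existing one. */
static int toy_map_server(struct toy_p2m *p2m, struct toy_server *s)
{
    if ( p2m->server != NULL )
        return -EBUSY;
    p2m->server = s;
    return 0;
}

/* Unmapping is the one place where pages need to be reset. */
static int toy_unmap_server(struct toy_p2m *p2m, struct toy_server *s)
{
    if ( p2m->server != s )
        return -EINVAL;
    p2m->server = NULL;
    p2m->ioreq_pages = 0;           /* stands in for the global type change */
    return 0;
}
```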

Jan
