xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
From: "Yu, Zhang" <yu.c.zhang@linux.intel.com>
To: Jan Beulich <JBeulich@suse.com>, Paul Durrant <paul.durrant@citrix.com>
Cc: Kevin Tian <kevin.tian@intel.com>, Keir Fraser <keir@xen.org>,
	George Dunlap <george.dunlap@eu.citrix.com>,
	Andrew Cooper <andrew.cooper3@citrix.com>,
	Tim Deegan <tim@xen.org>,
	xen-devel@lists.xen.org, zhiyuan.lv@intel.com,
	Jun Nakajima <jun.nakajima@intel.com>
Subject: Re: [PATCH v2 3/3] x86/ioreq server: Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server
Date: Tue, 12 Apr 2016 17:37:38 +0800	[thread overview]
Message-ID: <570CC1E2.1010408@linux.intel.com> (raw)
In-Reply-To: <570BDF8E02000078000E63F1@prv-mh.provo.novell.com>



On 4/12/2016 12:31 AM, Jan Beulich wrote:
>>>> On 11.04.16 at 13:14, <yu.c.zhang@linux.intel.com> wrote:
>> On 4/9/2016 6:28 AM, Jan Beulich wrote:
>>>>>> On 31.03.16 at 12:53, <yu.c.zhang@linux.intel.com> wrote:
>>>> @@ -168,13 +226,72 @@ static int hvmemul_do_io(
>>>>            break;
>>>>        case X86EMUL_UNHANDLEABLE:
>>>>        {
>>>> -        struct hvm_ioreq_server *s =
>>>> -            hvm_select_ioreq_server(curr->domain, &p);
>>>> +        struct hvm_ioreq_server *s;
>>>> +        p2m_type_t p2mt;
>>>> +
>>>> +        if ( is_mmio )
>>>> +        {
>>>> +            unsigned long gmfn = paddr_to_pfn(addr);
>>>> +
>>>> +            (void) get_gfn_query_unlocked(currd, gmfn, &p2mt);
>>>> +
>>>> +            switch ( p2mt )
>>>> +            {
>>>> +                case p2m_ioreq_server:
>>>> +                {
>>>> +                    unsigned long flags;
>>>> +
>>>> +                    p2m_get_ioreq_server(currd, &flags, &s);
>>>
>>> As the function apparently returns no value right now, please avoid
>>> the indirection on both values you're after - one of the two
>>> (presumably s) can be the function's return value.
>>
>> Well, current implementation of p2m_get_ioreq_server() has spin_lock/
>> spin_unlock surrounding the reading of flags and the s, but I believe
>> we can also use the s as return value.
>
> The use of a lock inside the function has nothing to do with how it
> returns values to the caller.
>

Agree. I'll use s as return value then.

>>>>            /* If there is no suitable backing DM, just ignore accesses */
>>>>            if ( !s )
>>>>            {
>>>> -            rc = hvm_process_io_intercept(&null_handler, &p);
>>>> +            switch ( p2mt )
>>>> +            {
>>>> +            case p2m_ioreq_server:
>>>> +            /*
>>>> +             * Race conditions may exist when access to a gfn with
>>>> +             * p2m_ioreq_server is intercepted by hypervisor, during
>>>> +             * which time p2m type of this gfn is recalculated back
>>>> +             * to p2m_ram_rw. mem_handler is used to handle this
>>>> +             * corner case.
>>>> +             */
>>>
>>> Now if there is such a race condition, the race could also be with a
>>> page changing first to ram_rw and then immediately further to e.g.
>>> ram_ro. See the earlier comment about assuming the page to be
>>> writable.
>>>
>>
>> Thanks, Jan. After rechecking the code, I suppose the race condition
>> will not happen. In hvmemul_do_io(), get_gfn_query_unlocked() is used
>> to peek the p2mt for the gfn, but get_gfn_type_access() is called inside
>> hvm_hap_nested_page_fault(), and this will guarantee no p2m change shall
>> occur during the emulation.
>> Is this understanding correct?
>
> Ah, yes, I think so. So the comment is misleading.
>

I'll remove the comment, together with the p2m_ram_rw case. Thanks. :)

>>>> +static int hvm_map_mem_type_to_ioreq_server(struct domain *d,
>>>> +                                            ioservid_t id,
>>>> +                                            hvmmem_type_t type,
>>>> +                                            uint32_t flags)
>>>> +{
>>>> +    struct hvm_ioreq_server *s;
>>>> +    int rc;
>>>> +
>>>> +    /* For now, only HVMMEM_ioreq_server is supported */
>>>> +    if ( type != HVMMEM_ioreq_server )
>>>> +        return -EINVAL;
>>>> +
>>>> +    if ( flags & ~(HVMOP_IOREQ_MEM_ACCESS_READ |
>>>> +                   HVMOP_IOREQ_MEM_ACCESS_WRITE) )
>>>> +        return -EINVAL;
>>>> +
>>>> +    spin_lock(&d->arch.hvm_domain.ioreq_server.lock);
>>>> +
>>>> +    rc = -ENOENT;
>>>> +    list_for_each_entry ( s,
>>>> +                          &d->arch.hvm_domain.ioreq_server.list,
>>>> +                          list_entry )
>>>> +    {
>>>> +        if ( s == d->arch.hvm_domain.default_ioreq_server )
>>>> +            continue;
>>>> +
>>>> +        if ( s->id == id )
>>>> +        {
>>>> +            rc = p2m_set_ioreq_server(d, flags, s);
>>>> +            if ( rc == 0 )
>>>> +                gdprintk(XENLOG_DEBUG, "%u %s type HVMMEM_ioreq_server.\n",
>>>> +                         s->id, (flags != 0) ? "mapped to" : "unmapped from");
>>>
>>> Why gdprintk()? I don't think the current domain is of much
>>> interest here. What would be of interest is the subject domain.
>>>
>>
>> s->id is not the domain_id, but id of the ioreq server.
>
> That's understood. But gdprintk() itself logs the current domain,
> which isn't as useful as the subject one.
>

Oh, I see. So the correct routine here should be dprintk(), right?

>>>> --- a/xen/arch/x86/mm/p2m-ept.c
>>>> +++ b/xen/arch/x86/mm/p2m-ept.c
>>>> @@ -132,6 +132,19 @@ static void ept_p2m_type_to_flags(struct p2m_domain
>>>> *p2m, ept_entry_t *entry,
>>>>                entry->r = entry->w = entry->x = 1;
>>>>                entry->a = entry->d = !!cpu_has_vmx_ept_ad;
>>>>                break;
>>>> +        case p2m_ioreq_server:
>>>> +            entry->r = !(p2m->ioreq.flags & P2M_IOREQ_HANDLE_READ_ACCESS);
>>>> +	    /*
>>>> +	     * write access right is disabled when entry->r is 0, but whether
>>>> +	     * write accesses are emulated by hypervisor or forwarded to an
>>>> +	     * ioreq server depends on the setting of p2m->ioreq.flags.
>>>> +	     */
>>>> +            entry->w = (entry->r &&
>>>> +                        !(p2m->ioreq.flags & P2M_IOREQ_HANDLE_WRITE_ACCESS));
>>>> +            entry->x = entry->r;
>>>
>>> Why would we want to allow instruction execution from such pages?
>>> And with all three bits now possibly being clear, aren't we risking the
>>> entries to be mis-treated as not-present ones?
>>>
>>
>> Hah. You got me. Thanks! :)
>> Now I realized it would be difficult if we wanna to emulate the read
>> operations for HVM. According to Intel mannual, entry->r is to be
>> cleared, so should entry->w if we do not want ept misconfig. And
>> with both read and write permissions being forbidden, entry->x can be
>> set only on processors with EXECUTE_ONLY capability.
>> To avoid any entry to be mis-treated as not-present. We have several
>> solutions:
>> a> do not support the read emulation for now - we have no such usage
>> case;
>> b> add the check of p2m_t against p2m_ioreq_server in is_epte_present -
>> a bit weird to me.
>> Which one do you prefer? or any other suggestions?
>
> That question would also need to be asked to others who had
> suggested supporting both. I'd be fine with a, but I also don't view
> b as too awkward.
>

According to Intel mannual, an entry is regarded as not present, if
bit0:2 is 0. So adding a p2m type check in is_epte_present() means we
will change its semantics, if this is acceptable(with no hurt to
hypervisor). I'd prefer option b>

Does anyone else have different suggestions other than b> ?


>>>> +    /*
>>>> +     * Each time we map/unmap an ioreq server to/from p2m_ioreq_server,
>>>> +     * we mark the p2m table to be recalculated, so that gfns which were
>>>> +     * previously marked with p2m_ioreq_server can be resynced.
>>>> +     */
>>>> +    p2m_change_entry_type_global(d, p2m_ioreq_server, p2m_ram_rw);
>>>
>>> What does "resynced" here mean? I.e. I can see why this is wanted
>>> when unmapping a server, but when mapping a server there shouldn't
>>> be any such pages in the first place.
>>>
>>
>> There shouldn't be. But if there is(misbehavior from the device model
>> side), it can be recalculated back to p2m_ram_rw(which is not quite
>> necessary as the unmapping case).
>
> DM misbehavior should not result in such a problem - the hypervisor
> should refuse any bad requests.
>
OK. I can add code to guarantee from hypervisor side no entries changed
to p2m_ioreq_server before the mapping happens.

B.R.
Yu

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

  reply	other threads:[~2016-04-12  9:37 UTC|newest]

Thread overview: 82+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-03-31 10:53 [PATCH v2 0/3] x86/ioreq server: introduce HVMMEM_ioreq_server mem type Yu Zhang
2016-03-31 10:53 ` [PATCH v2 1/3] x86/ioreq server: Add new functions to get/set memory types Yu Zhang
2016-04-05 13:57   ` George Dunlap
2016-04-05 14:08     ` George Dunlap
2016-04-08 13:25   ` Andrew Cooper
2016-03-31 10:53 ` [PATCH v2 2/3] x86/ioreq server: Rename p2m_mmio_write_dm to p2m_ioreq_server Yu Zhang
2016-04-05 14:38   ` George Dunlap
2016-04-08 13:26   ` Andrew Cooper
2016-04-08 21:48   ` Jan Beulich
2016-04-18  8:41     ` Paul Durrant
2016-04-18  9:10       ` George Dunlap
2016-04-18  9:14         ` Wei Liu
2016-04-18  9:45           ` Paul Durrant
2016-04-18 16:40       ` Jan Beulich
2016-04-18 16:45         ` Paul Durrant
2016-04-18 16:47           ` Jan Beulich
2016-04-18 16:58             ` Paul Durrant
2016-04-19 11:02               ` Yu, Zhang
2016-04-19 11:15                 ` Paul Durrant
2016-04-19 11:38                   ` Yu, Zhang
2016-04-19 11:50                     ` Paul Durrant
2016-04-19 16:51                     ` Jan Beulich
2016-04-20 14:59                       ` Wei Liu
2016-04-20 15:02                 ` George Dunlap
2016-04-20 16:30                   ` George Dunlap
2016-04-20 16:52                     ` Jan Beulich
2016-04-20 16:58                       ` Paul Durrant
2016-04-20 17:06                         ` George Dunlap
2016-04-20 17:09                           ` Paul Durrant
2016-04-21 12:24                           ` Yu, Zhang
2016-04-21 13:31                             ` Paul Durrant
2016-04-21 13:48                               ` Yu, Zhang
2016-04-21 13:56                                 ` Paul Durrant
2016-04-21 14:09                                   ` George Dunlap
2016-04-20 17:08                       ` George Dunlap
2016-04-21 12:04                       ` Yu, Zhang
2016-03-31 10:53 ` [PATCH v2 3/3] x86/ioreq server: Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server Yu Zhang
     [not found]   ` <20160404082556.GC28633@deinos.phlegethon.org>
2016-04-05  6:01     ` Yu, Zhang
2016-04-06 17:13   ` George Dunlap
2016-04-07  7:01     ` Yu, Zhang
     [not found]       ` <CAFLBxZbLp2zWzCzQTaJNWbanQSmTJ57ZyTh0qaD-+YUn8o8pyQ@mail.gmail.com>
2016-04-08 10:39         ` George Dunlap
     [not found]         ` <5707839F.9060803@linux.intel.com>
2016-04-08 11:01           ` George Dunlap
2016-04-11 11:15             ` Yu, Zhang
2016-04-14 10:45               ` Yu, Zhang
2016-04-18 15:57                 ` Paul Durrant
2016-04-19  9:11                   ` Yu, Zhang
2016-04-19  9:21                     ` Paul Durrant
2016-04-19  9:44                       ` Yu, Zhang
2016-04-19 10:05                         ` Paul Durrant
2016-04-19 11:17                           ` Yu, Zhang
2016-04-19 11:47                             ` Paul Durrant
2016-04-19 11:59                               ` Yu, Zhang
2016-04-20 14:50                                 ` George Dunlap
2016-04-20 14:57                                   ` Paul Durrant
2016-04-20 15:37                                     ` George Dunlap
2016-04-20 16:30                                       ` Paul Durrant
2016-04-20 16:58                                         ` George Dunlap
2016-04-21 13:28                                         ` Yu, Zhang
2016-04-21 13:21                                   ` Yu, Zhang
2016-04-22 11:27                                     ` Wei Liu
2016-04-22 11:30                                       ` George Dunlap
2016-04-19  4:37                 ` Tian, Kevin
2016-04-19  9:21                   ` Yu, Zhang
2016-04-08 13:33   ` Andrew Cooper
2016-04-11 11:14     ` Yu, Zhang
2016-04-11 12:20       ` Andrew Cooper
2016-04-11 16:25         ` Jan Beulich
2016-04-08 22:28   ` Jan Beulich
2016-04-11 11:14     ` Yu, Zhang
2016-04-11 16:31       ` Jan Beulich
2016-04-12  9:37         ` Yu, Zhang [this message]
2016-04-12 15:08           ` Jan Beulich
2016-04-14  9:56             ` Yu, Zhang
2016-04-19  4:50               ` Tian, Kevin
2016-04-19  8:46                 ` Paul Durrant
2016-04-19  9:27                   ` Yu, Zhang
2016-04-19  9:40                     ` Paul Durrant
2016-04-19  9:49                       ` Yu, Zhang
2016-04-19 10:01                         ` Paul Durrant
2016-04-19  9:54                           ` Yu, Zhang
2016-04-19  9:15                 ` Yu, Zhang
2016-04-19  9:23                   ` Paul Durrant

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=570CC1E2.1010408@linux.intel.com \
    --to=yu.c.zhang@linux.intel.com \
    --cc=JBeulich@suse.com \
    --cc=andrew.cooper3@citrix.com \
    --cc=george.dunlap@eu.citrix.com \
    --cc=jun.nakajima@intel.com \
    --cc=keir@xen.org \
    --cc=kevin.tian@intel.com \
    --cc=paul.durrant@citrix.com \
    --cc=tim@xen.org \
    --cc=xen-devel@lists.xen.org \
    --cc=zhiyuan.lv@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).