From: Paul Durrant <Paul.Durrant@citrix.com>
To: Yu Zhang <yu.c.zhang@linux.intel.com>,
"xen-devel@lists.xen.org" <xen-devel@lists.xen.org>
Cc: Kevin Tian <kevin.tian@intel.com>,
"Keir (Xen.org)" <keir@xen.org>,
"jbeulich@suse.com" <jbeulich@suse.com>,
Andrew Cooper <Andrew.Cooper3@citrix.com>,
"Tim (Xen.org)" <tim@xen.org>,
George Dunlap <George.Dunlap@citrix.com>,
"zhiyuan.lv@intel.com" <zhiyuan.lv@intel.com>,
"jun.nakajima@intel.com" <jun.nakajima@intel.com>
Subject: Re: [PATCH 3/3] Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server
Date: Tue, 22 Mar 2016 17:51:12 +0000 [thread overview]
Message-ID: <a52cfdf42b1945868ae157844abaf060@AMSPEX02CL03.citrite.net> (raw)
In-Reply-To: <56F1805C.6060604@citrix.com>
> -----Original Message-----
> From: George Dunlap [mailto:george.dunlap@citrix.com]
> Sent: 22 March 2016 17:27
> To: Yu Zhang; xen-devel@lists.xen.org
> Cc: Keir (Xen.org); jbeulich@suse.com; Andrew Cooper; George Dunlap;
> jun.nakajima@intel.com; Kevin Tian; Tim (Xen.org); Paul Durrant;
> zhiyuan.lv@intel.com
> Subject: Re: [PATCH 3/3] Add HVMOP to map guest ram with
> p2m_ioreq_server to an ioreq server
>
> On 16/03/16 12:22, Yu Zhang wrote:
> > A new HVMOP - HVMOP_map_mem_type_to_ioreq_server, is added to
> > let one ioreq server claim its responsibility for the handling
> > of guest pages with p2m type p2m_ioreq_server. Users of this
> > HVMOP can specify whether the p2m_ioreq_server is supposed to
> > handle write accesses or read ones in a flag. For now, we only
> > support one ioreq server for this p2m type, so once an ioreq
> > server has claimed ownership, subsequent calls to
> > HVMOP_map_mem_type_to_ioreq_server will fail.
> >
> > Note: this HVMOP does not change the p2m type of any guest ram
> > page, until the HVMOP_set_mem_type is triggered. So normally
> > the steps would be the backend driver first claims its ownership
> > of guest ram pages with p2m_ioreq_server type, and then sets the
> > memory type to p2m_ioreq_server for the specified guest ram pages.
>
> Yu, thanks for this work. I think it's heading in the right direction.
>
> A couple of comments:
>
> There's not much documentation in the code about how this is expected to
> be used.
>
> For instance, having separate flags seems to imply that you can
> independently select either read intercept, write intercept, or both;
> but [ept_]p2m_type_to_flags() seems to assume that READ_ACCESS implies
> WRITE_ACCESS. Do you plan to implement them separately in the future?
> If not, would it be better to make the interface an enum instead?
>
> At very least it should be documented that READ_ACCESS implies
> WRITE_ACCESS.
That's not true. If WRITE_ACCESS has not been requested, then writes are handled directly in Xen rather than being forwarded to the ioreq server. If the hardware allowed pages to be marked write-only, we wouldn't need to do that.
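To make the semantics concrete, here is a minimal sketch of the dispatch rule Paul describes. The flag values and helper name are illustrative only; the real check is the dir/flags test in the quoted hvmemul_do_io() hunk:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag values; the real definitions live in Xen headers. */
#define P2M_IOREQ_HANDLE_READ_ACCESS  (1u << 0)
#define P2M_IOREQ_HANDLE_WRITE_ACCESS (1u << 1)

enum dir { IOREQ_READ, IOREQ_WRITE };

/* Returns true if the access should be forwarded to the ioreq server;
 * otherwise Xen completes the access itself (e.g. a write to a page
 * whose server only asked to intercept reads). */
static bool forward_to_server(unsigned int flags, enum dir dir)
{
    if ( dir == IOREQ_READ )
        return flags & P2M_IOREQ_HANDLE_READ_ACCESS;
    return flags & P2M_IOREQ_HANDLE_WRITE_ACCESS;
}
```

So the two flags really are independent requests, not a hierarchy where one implies the other.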
>
> Calling the function with no flags appears to mean, "Detach the ioreq
> server from this type" -- this should also be documented.
>
That was documented in the design, but it's true that it does need to be pulled into the code comments.
> Furthermore, you appear to deal with p2m_ioreq_server types when there
> is no ioreq server by making the "flags=0" case RW in
> [ept_]p2m_type_to_flags. This should be spelled out somewhere (probably
> in the p2m structure definition).
>
> Finally, you call p2m_memory_type_changed() when changing the ioreq
> server; but this appears to be only implemented for ept; meaning that if
> you're either running HAP on AMD or running in shadow mode, if the
> caller isn't careful (i.e., doesn't remove all p2m_ioreq_server types
> before removing itself as a server), they may end up with bogus
> permissions left over in the p2m table.
>
> Maybe you should check for the existence of p2m->memory_type_changed
> and
> return -ENOTSUPP or something if it's not present?
>
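George's suggested guard could look something like the following sketch. The structure layout and helper name are hypothetical; in Xen, memory_type_changed is a per-implementation hook on struct p2m_domain, populated by EPT but not by shadow or AMD NPT:

```c
#include <assert.h>
#include <stddef.h>

#define EOPNOTSUPP 95

struct p2m_domain {
    /* NULL on p2m implementations that cannot flush cached
     * type-derived permissions (shadow, AMD NPT). */
    void (*memory_type_changed)(struct p2m_domain *p2m);
};

/* Stub standing in for the EPT implementation of the hook. */
static void ept_memory_type_changed_stub(struct p2m_domain *p2m)
{
    (void)p2m;
}

/* Hypothetical guard: refuse the mapping outright rather than leave
 * stale permissions behind when the server later detaches. */
static int check_type_change_supported(const struct p2m_domain *p2m)
{
    return p2m->memory_type_changed ? 0 : -EOPNOTSUPP;
}
```

That turns a silent correctness hazard into an explicit error at the hypercall boundary.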
> More comments inline...
>
> > Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> > Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
>
> > @@ -168,13 +226,65 @@ static int hvmemul_do_io(
> > break;
> > case X86EMUL_UNHANDLEABLE:
> > {
> > - struct hvm_ioreq_server *s =
> > - hvm_select_ioreq_server(curr->domain, &p);
> > + struct hvm_ioreq_server *s;
> > + p2m_type_t p2mt;
> > +
> > + if ( is_mmio )
> > + {
> > + unsigned long gmfn = paddr_to_pfn(addr);
> > +
> > + (void) get_gfn_query_unlocked(currd, gmfn, &p2mt);
> > +
> > + switch ( p2mt )
> > + {
> > + case p2m_ioreq_server:
> > + {
> > + unsigned long flags;
> > +
> > + p2m_get_ioreq_server(currd, p2mt, &flags, &s);
> > +
> > + if ( !s )
> > + break;
> > +
> > + if ( (dir == IOREQ_READ &&
> > + !(flags & P2M_IOREQ_HANDLE_READ_ACCESS)) ||
> > + (dir == IOREQ_WRITE &&
> > + !(flags & P2M_IOREQ_HANDLE_WRITE_ACCESS)) )
> > + s = NULL;
> > +
> > + break;
> > + }
> > + case p2m_ram_rw:
> > + s = NULL;
> > + break;
> > +
> > + default:
> > + s = hvm_select_ioreq_server(currd, &p);
> > + break;
> > + }
> > + }
> > + else
> > + {
> > + p2mt = p2m_invalid;
> > +
> > + s = hvm_select_ioreq_server(currd, &p);
> > + }
> >
> > /* If there is no suitable backing DM, just ignore accesses */
> > if ( !s )
> > {
> > - rc = hvm_process_io_intercept(&null_handler, &p);
> > + switch ( p2mt )
> > + {
> > + case p2m_ioreq_server:
> > + case p2m_ram_rw:
> > + rc = hvm_process_io_intercept(&mem_handler, &p);
> > + break;
> > +
> > + default:
> > + rc = hvm_process_io_intercept(&null_handler, &p);
> > + break;
> > + }
>
> Is it actually possible to get here with "is_mmio" true and p2mt ==
> p2m_ram_rw?
I think that's possible if the type change races with an access.
>
> If so, it would appear that this changes the handling in that case.
> Before this patch, on p2m_ram_rw, you'd call hvm_select_ioreq_server(),
> which would either give you a server (perhaps the default one), or you'd
> be called with null_handler.
>
> After this patch, on p2m_ram_rw, you'll always be called with mem_handler.
>
> If this is an intentional change, you need to explan why in the
> changelog (and probably also a comment here).
>
That is an intentional change.
> > @@ -1411,6 +1413,55 @@ static int
> hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id,
> > return rc;
> > }
> >
> > +static int hvm_map_mem_type_to_ioreq_server(struct domain *d,
> > + ioservid_t id,
> > + hvmmem_type_t type,
> > + uint32_t flags)
> > +{
> > + struct hvm_ioreq_server *s;
> > + p2m_type_t p2mt;
> > + int rc;
> > +
> > + switch ( type )
> > + {
> > + case HVMMEM_ioreq_server:
> > + p2mt = p2m_ioreq_server;
> > + break;
> > +
> > + default:
> > + return -EINVAL;
> > + }
> > +
> > + if ( flags & ~(HVMOP_IOREQ_MEM_ACCESS_READ |
> > + HVMOP_IOREQ_MEM_ACCESS_WRITE) )
> > + return -EINVAL;
> > +
> > + spin_lock(&d->arch.hvm_domain.ioreq_server.lock);
> > +
> > + rc = -ENOENT;
> > + list_for_each_entry ( s,
> > + &d->arch.hvm_domain.ioreq_server.list,
> > + list_entry )
> > + {
> > + if ( s == d->arch.hvm_domain.default_ioreq_server )
> > + continue;
> > +
> > + if ( s->id == id )
> > + {
> > + rc = p2m_set_ioreq_server(d, p2mt, flags, s);
> > + if ( rc )
> > + break;
> > +
> > + gdprintk(XENLOG_DEBUG, "%u claimed type p2m_ioreq_server\n",
> > + s->id);
>
> Don't we want to break out of the loop if p2m_set_ioreq_server()
> succeeds as well? Or do we really want to go check all the other ioreq
> servers to see if their ID matches too? :-)
No, the break should be unconditional.
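In other words, the loop should leave as soon as the matching id is found, whatever p2m_set_ioreq_server() returned. A stand-alone sketch of that shape (the list type, error values, and helper name are simplified stand-ins for the real ioreq-server list walk):

```c
#include <assert.h>

struct ioreq_server { unsigned int id; };

#define ENOENT_STANDIN 2

/* Corrected lookup: ids are unique, so once the matching server is
 * found there is nothing further to scan; break unconditionally and
 * return whatever the claim attempt produced. */
static int find_and_claim(const struct ioreq_server *list, unsigned int n,
                          unsigned int id, int claim_rc)
{
    int rc = -ENOENT_STANDIN;

    for ( unsigned int i = 0; i < n; i++ )
    {
        if ( list[i].id != id )
            continue;

        rc = claim_rc;   /* result of p2m_set_ioreq_server() */
        break;           /* unconditional, per Paul's reply */
    }

    return rc;
}
```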
>
> > diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
> > index 380ec25..21e04ce 100644
> > --- a/xen/arch/x86/mm/p2m-ept.c
> > +++ b/xen/arch/x86/mm/p2m-ept.c
> > @@ -132,6 +132,14 @@ static void ept_p2m_type_to_flags(struct
> p2m_domain *p2m, ept_entry_t *entry,
> > entry->r = entry->w = entry->x = 1;
> > entry->a = entry->d = !!cpu_has_vmx_ept_ad;
> > break;
> > + case p2m_ioreq_server:
> > + entry->r = !(p2m->ioreq.flags &
> P2M_IOREQ_HANDLE_READ_ACCESS);
> > + entry->w = (entry->r &
> > + !(p2m->ioreq.flags & P2M_IOREQ_HANDLE_WRITE_ACCESS));
> > + entry->x = entry->r;
> > + entry->a = !!cpu_has_vmx_ept_ad;
> > + entry->d = 0;
> > + break;
>
> In line with the other types, would it make sense for entry->d here to
> be entry->w && cpu_has_vmx_ept_ad?
>
>
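George's suggestion for the dirty bit, sketched as a runnable fragment. The entry layout is a simplified stand-in for ept_entry_t, and the helper is hypothetical; the point is just that the d bit should track writability, as the other type cases do:

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for ept_entry_t. */
struct ept_entry { bool r, w, x, a, d; };

/* Hypothetical helper applying the suggested rule for the
 * p2m_ioreq_server case: accessed follows EPT A/D support, dirty
 * only makes sense when the entry is also writable. */
static void set_ad_bits(struct ept_entry *e, bool cpu_has_vmx_ept_ad)
{
    e->a = cpu_has_vmx_ept_ad;
    e->d = e->w && cpu_has_vmx_ept_ad;
}
```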
> > @@ -289,6 +291,83 @@ void p2m_memory_type_changed(struct domain
> *d)
> > }
> > }
> >
> > +int p2m_set_ioreq_server(struct domain *d, p2m_type_t t,
> > + unsigned long flags,
> > + struct hvm_ioreq_server *s)
> > +{
> > + struct p2m_domain *p2m = p2m_get_hostp2m(d);
> > + int rc;
> > +
> > + spin_lock(&p2m->ioreq.lock);
> > +
> > + rc = -EINVAL;
> > + if ( t != p2m_ioreq_server )
> > + goto out;
>
> Since these are all internal interfaces, I'm not sure what the point is
> of passing this type in when we only accept one type anyway. If we're
> going to have a type in the interface but only accept one type, we
> should just check it at the top level and assume p2m_ioreq_server all
> the way through (until such time as we accept a second type).
>
OK. It was there for future expansion, but that could be added later.
> > +
> > + rc = -EBUSY;
> > + if ( ( p2m->ioreq.server != NULL ) && (flags != 0) )
> > + goto out;
> > +
> > + if ( flags == 0 )
> > + {
> > + p2m->ioreq.server = NULL;
> > + p2m->ioreq.flags = 0;
> > + }
> > + else
> > + {
> > + p2m->ioreq.server = s;
> > + p2m->ioreq.flags = flags;
> > + }
>
> I think it would be a bit more robust to
>
> 1) Require callers to always specify (their own) server ID when making
> this call, even if they're un-claiming the p2m type
>
> 2) When un-claiming, check to make sure that the server id of the caller
> matches the server id that's being detached.
>
Ok. Fair enough.
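A sketch of what the reworked claim/unclaim logic might look like under George's two points (caller always names itself; a detach must name the current owner). The struct and error values are simplified stand-ins for the real p2m state:

```c
#include <assert.h>
#include <stddef.h>

#define EINVAL 22
#define EBUSY  16

struct hvm_ioreq_server { unsigned int id; };

struct p2m_ioreq {
    struct hvm_ioreq_server *server;
    unsigned long flags;
};

/* Hypothetical reworking: flags == 0 means detach, and only the
 * currently attached server may detach itself; a non-zero-flags
 * claim fails while another server holds the type. */
static int set_ioreq_server(struct p2m_ioreq *ioreq,
                            struct hvm_ioreq_server *s,
                            unsigned long flags)
{
    if ( flags == 0 )
    {
        if ( ioreq->server != s )   /* detach must match the owner */
            return -EINVAL;
        ioreq->server = NULL;
        ioreq->flags = 0;
        return 0;
    }

    if ( ioreq->server != NULL )    /* only one server at a time */
        return -EBUSY;

    ioreq->server = s;
    ioreq->flags = flags;
    return 0;
}
```

This closes the hole where an unrelated server could detach someone else's claim by passing flags of zero.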
> > +
> > + p2m_memory_type_changed(d);
> > +
> > + rc = 0;
> > +
> > +out:
> > + spin_unlock(&p2m->ioreq.lock);
> > +
> > + return rc;
> > +}
> > +
> > +void p2m_get_ioreq_server(struct domain *d, p2m_type_t t,
> > + unsigned long *flags,
> > + struct hvm_ioreq_server **s)
> > +{
> > + struct p2m_domain *p2m = p2m_get_hostp2m(d);
> > +
> > + if ( t != p2m_ioreq_server )
> > + {
> > + *s = NULL;
> > + *flags = 0;
> > + return;
> > + }
> > +
> > + spin_lock(&p2m->ioreq.lock);
> > +
> > + *s = p2m->ioreq.server;
> > + *flags = p2m->ioreq.flags;
> > +
> > + spin_unlock(&p2m->ioreq.lock);
> > +}
> > +
> > +void p2m_destroy_ioreq_server(struct domain *d,
> > + struct hvm_ioreq_server *s)
> > +{
> > + struct p2m_domain *p2m = p2m_get_hostp2m(d);
> > +
> > + spin_lock(&p2m->ioreq.lock);
> > +
> > + if ( p2m->ioreq.server == s )
> > + {
> > + p2m->ioreq.server = NULL;
> > + p2m->ioreq.flags = 0;
> > + }
> > +
> > + p2m_memory_type_changed(d);
>
> Do we really want to call this unconditionally, even if the server being
> destroyed here isn't the server associated with p2m_ioreq_server? It
> shouldn't have a functional impact, but it will slow things down as the
> ept tables are unnecessarily rebuilt.
>
True. That should be conditional.
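The agreed fix amounts to moving the flush inside the ownership check. A stand-alone sketch, with a counter standing in for the (expensive) p2m_memory_type_changed() rebuild:

```c
#include <assert.h>
#include <stddef.h>

struct hvm_ioreq_server { int id; };

struct p2m_state {
    struct hvm_ioreq_server *server;
    unsigned long flags;
    unsigned int flush_count;   /* stand-in for p2m_memory_type_changed() */
};

/* Only trigger the EPT permission rebuild when the dying server
 * actually owned the p2m_ioreq_server type; tearing down an
 * unrelated server should be free. */
static void destroy_ioreq_server(struct p2m_state *p2m,
                                 struct hvm_ioreq_server *s)
{
    if ( p2m->server == s )
    {
        p2m->server = NULL;
        p2m->flags = 0;
        p2m->flush_count++;     /* conditional now */
    }
}
```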
> > diff --git a/xen/include/public/hvm/hvm_op.h
> b/xen/include/public/hvm/hvm_op.h
> > index a1eae52..e7fc75d 100644
> > --- a/xen/include/public/hvm/hvm_op.h
> > +++ b/xen/include/public/hvm/hvm_op.h
> > @@ -489,6 +489,30 @@ struct xen_hvm_altp2m_op {
> > typedef struct xen_hvm_altp2m_op xen_hvm_altp2m_op_t;
> > DEFINE_XEN_GUEST_HANDLE(xen_hvm_altp2m_op_t);
> >
> > +#if defined(__XEN__) || defined(__XEN_TOOLS__)
> > +
> > +#define HVMOP_map_mem_type_to_ioreq_server 26
> > +struct xen_hvm_map_mem_type_to_ioreq_server {
> > + domid_t domid; /* IN - domain to be serviced */
> > + ioservid_t id; /* IN - server id */
> > + hvmmem_type_t type; /* IN - memory type */
>
> What is this 'type' for? Do you expect to map other types of memory
> this way in the future?
>
As stated in the design, yes.
Paul
> -George
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Thread overview: 8+ messages
2016-03-16 12:22 [PATCH 3/3] Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server Yu Zhang
2016-03-22 17:26 ` George Dunlap
2016-03-22 17:51 ` Paul Durrant [this message]
2016-03-23 9:22 ` Jan Beulich
2016-03-23 9:44 ` Yu, Zhang
2016-03-23 10:10 ` Jan Beulich
2016-03-23 10:37 ` Andrew Cooper
2016-03-23 11:43 ` George Dunlap