xen-devel.lists.xenproject.org archive mirror
From: George Dunlap <george.dunlap@citrix.com>
To: Yu Zhang <yu.c.zhang@linux.intel.com>, xen-devel@lists.xen.org
Cc: kevin.tian@intel.com, keir@xen.org, jbeulich@suse.com,
	george.dunlap@eu.citrix.com, andrew.cooper3@citrix.com,
	tim@xen.org, Paul.Durrant@citrix.com, zhiyuan.lv@intel.com,
	jun.nakajima@intel.com
Subject: Re: [PATCH 3/3] Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server
Date: Tue, 22 Mar 2016 17:26:52 +0000	[thread overview]
Message-ID: <56F1805C.6060604@citrix.com> (raw)
In-Reply-To: <1458130960-30109-1-git-send-email-yu.c.zhang@linux.intel.com>

On 16/03/16 12:22, Yu Zhang wrote:
> A new HVMOP - HVMOP_map_mem_type_to_ioreq_server, is added to
> let one ioreq server claim its responsibility for the handling
> of guest pages with p2m type p2m_ioreq_server. Users of this
> HVMOP can specify whether the p2m_ioreq_server is supposed to
> handle write accesses or read ones in a flag. For now, we only
> support one ioreq server for this p2m type, so once an ioreq
> server has claimed ownership, subsequent calls to
> HVMOP_map_mem_type_to_ioreq_server will fail.
> 
> Note: this HVMOP does not change the p2m type of any guest ram
> page until HVMOP_set_mem_type is triggered. So normally the steps
> would be: the backend driver first claims ownership of guest ram
> pages with the p2m_ioreq_server type, and then sets the memory
> type to p2m_ioreq_server for the specified guest ram pages.

Yu, thanks for this work.  I think it's heading in the right direction.

A couple of comments:

There's not much documentation in the code about how this is expected to
be used.

For instance, having separate flags seems to imply that you can
independently select either read intercept, write intercept, or both;
but [ept_]p2m_type_to_flags() seems to assume that READ_ACCESS implies
WRITE_ACCESS.  Do you plan to implement them separately in the future?
If not, would it be better to make the interface an enum instead?

At the very least it should be documented that READ_ACCESS implies
WRITE_ACCESS.

Calling the function with no flags appears to mean, "Detach the ioreq
server from this type" -- this should also be documented.
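
Something along these lines is what I have in mind -- a rough sketch
only, with purely illustrative names, not a proposed ABI:

```c
/* Sketch: an enum makes the three supported states explicit, where
 * independent flag bits suggest combinations (e.g. a read-only
 * intercept) that [ept_]p2m_type_to_flags() doesn't actually
 * implement. */
typedef enum {
    IOREQ_MEM_ACCESS_NONE,       /* detach the server from the type */
    IOREQ_MEM_ACCESS_WRITE,      /* intercept writes only */
    IOREQ_MEM_ACCESS_READ_WRITE, /* intercept reads; implies writes */
} ioreq_mem_access_t;

/* The documented semantics, expressed as predicates. */
static int intercepts_write(ioreq_mem_access_t a)
{
    return a == IOREQ_MEM_ACCESS_WRITE ||
           a == IOREQ_MEM_ACCESS_READ_WRITE;
}

static int intercepts_read(ioreq_mem_access_t a)
{
    return a == IOREQ_MEM_ACCESS_READ_WRITE;
}
```

That way "READ_ACCESS implies WRITE_ACCESS" is structural rather than
something callers have to discover in a comment.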

Furthermore, you appear to deal with p2m_ioreq_server types when there
is no ioreq server by making the "flags=0" case RW in
[ept_]p2m_type_to_flags.  This should be spelled out somewhere (probably
in the p2m structure definition).

Finally, you call p2m_memory_type_changed() when changing the ioreq
server; but this appears to be implemented only for EPT, meaning that
if you're running HAP on AMD or running in shadow mode, a caller that
isn't careful (i.e., that doesn't remove all p2m_ioreq_server types
before detaching itself as a server) may end up with bogus permissions
left over in the p2m table.

Maybe you should check for the existence of p2m->memory_type_changed and
return -ENOTSUPP or something if it's not present?
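
I.e., something like this sketch (the helper and error value are
illustrative stand-ins, not from the patch):

```c
#include <stddef.h>

#define EOPNOTSUPP 95

/* Sketch of the suggested guard: if the p2m implementation provides
 * no memory_type_changed hook (HAP on AMD, shadow mode), stale
 * permissions can't be flushed, so fail the mapping request up
 * front rather than leave bogus p2m entries behind. */
static int require_memory_type_changed(void (*hook)(void))
{
    return hook != NULL ? 0 : -EOPNOTSUPP;
}

/* Stand-in for a p2m implementation that does provide the hook. */
static void dummy_hook(void) { }
```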

More comments inline...

> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>

> @@ -168,13 +226,65 @@ static int hvmemul_do_io(
>          break;
>      case X86EMUL_UNHANDLEABLE:
>      {
> -        struct hvm_ioreq_server *s =
> -            hvm_select_ioreq_server(curr->domain, &p);
> +        struct hvm_ioreq_server *s;
> +        p2m_type_t p2mt;
> +
> +        if ( is_mmio )
> +        {
> +            unsigned long gmfn = paddr_to_pfn(addr);
> +
> +            (void) get_gfn_query_unlocked(currd, gmfn, &p2mt);
> +
> +            switch ( p2mt )
> +            {
> +                case p2m_ioreq_server:
> +                {
> +                    unsigned long flags;
> +
> +                    p2m_get_ioreq_server(currd, p2mt, &flags, &s);
> +
> +                    if ( !s )
> +                        break;
> +
> +                    if ( (dir == IOREQ_READ &&
> +                          !(flags & P2M_IOREQ_HANDLE_READ_ACCESS)) ||
> +                         (dir == IOREQ_WRITE &&
> +                          !(flags & P2M_IOREQ_HANDLE_WRITE_ACCESS)) )
> +                        s = NULL;
> +
> +                    break;
> +                }
> +                case p2m_ram_rw:
> +                    s = NULL;
> +                    break;
> +
> +                default:
> +                    s = hvm_select_ioreq_server(currd, &p);
> +                    break;
> +            }
> +        }
> +        else
> +        {
> +            p2mt = p2m_invalid;
> +
> +            s = hvm_select_ioreq_server(currd, &p);
> +        }
>  
>          /* If there is no suitable backing DM, just ignore accesses */
>          if ( !s )
>          {
> -            rc = hvm_process_io_intercept(&null_handler, &p);
> +            switch ( p2mt )
> +            {
> +            case p2m_ioreq_server:
> +            case p2m_ram_rw:
> +                rc = hvm_process_io_intercept(&mem_handler, &p);
> +                break;
> +
> +            default:
> +                rc = hvm_process_io_intercept(&null_handler, &p);
> +                break;
> +            }

Is it actually possible to get here with "is_mmio" true and p2mt ==
p2m_ram_rw?

If so, it would appear that this changes the handling in that case.
Before this patch, on p2m_ram_rw, you'd call hvm_select_ioreq_server(),
which would either give you a server (perhaps the default one), or you'd
be called with null_handler.

After this patch, on p2m_ram_rw, you'll always be called with mem_handler.

If this is an intentional change, you need to explain why in the
changelog (and probably also add a comment here).

> @@ -1411,6 +1413,55 @@ static int hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id,
>      return rc;
>  }
>  
> +static int hvm_map_mem_type_to_ioreq_server(struct domain *d,
> +                                            ioservid_t id,
> +                                            hvmmem_type_t type,
> +                                            uint32_t flags)
> +{
> +    struct hvm_ioreq_server *s;
> +    p2m_type_t p2mt;
> +    int rc;
> +
> +    switch ( type )
> +    {
> +    case HVMMEM_ioreq_server:
> +        p2mt = p2m_ioreq_server;
> +        break;
> +
> +    default:
> +        return -EINVAL;
> +    }
> +
> +    if ( flags & ~(HVMOP_IOREQ_MEM_ACCESS_READ |
> +                   HVMOP_IOREQ_MEM_ACCESS_WRITE) )
> +        return -EINVAL;
> +
> +    spin_lock(&d->arch.hvm_domain.ioreq_server.lock);
> +
> +    rc = -ENOENT;
> +    list_for_each_entry ( s,
> +                          &d->arch.hvm_domain.ioreq_server.list,
> +                          list_entry )
> +    {
> +        if ( s == d->arch.hvm_domain.default_ioreq_server )
> +            continue;
> +
> +        if ( s->id == id )
> +        {
> +            rc = p2m_set_ioreq_server(d, p2mt, flags, s);
> +            if ( rc )
> +                break;
> +
> +            gdprintk(XENLOG_DEBUG, "%u claimed type p2m_ioreq_server\n",
> +                     s->id);

Don't we want to break out of the loop if p2m_set_ioreq_server()
succeeds as well?  Or do we really want to go check all the other ioreq
servers to see if their ID matches too? :-)
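
To spell out the shape I'd expect -- a toy model of the list walk, not
the Xen code itself (names and error values are stand-ins):

```c
#include <stddef.h>

/* Toy model: stop at the first id match, whether or not claiming the
 * type succeeded, since server ids are unique. */
struct server { unsigned id; };

static int claim_type(struct server *list, size_t n, unsigned id,
                      unsigned *visited)
{
    int rc = -2;                /* stand-in for -ENOENT: id not found */

    for ( size_t i = 0; i < n; i++ )
    {
        (*visited)++;
        if ( list[i].id == id )
        {
            rc = 0;             /* p2m_set_ioreq_server() would go here */
            break;              /* no point scanning the rest */
        }
    }
    return rc;
}
```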

> diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
> index 380ec25..21e04ce 100644
> --- a/xen/arch/x86/mm/p2m-ept.c
> +++ b/xen/arch/x86/mm/p2m-ept.c
> @@ -132,6 +132,14 @@ static void ept_p2m_type_to_flags(struct p2m_domain *p2m, ept_entry_t *entry,
>              entry->r = entry->w = entry->x = 1;
>              entry->a = entry->d = !!cpu_has_vmx_ept_ad;
>              break;
> +        case p2m_ioreq_server:
> +            entry->r = !(p2m->ioreq.flags & P2M_IOREQ_HANDLE_READ_ACCESS);
> +            entry->w = (entry->r &
> +                        !(p2m->ioreq.flags & P2M_IOREQ_HANDLE_WRITE_ACCESS));
> +            entry->x = entry->r;
> +            entry->a = !!cpu_has_vmx_ept_ad;
> +            entry->d = 0;
> +            break;

In line with the other types, would it make sense for entry->d here to
be entry->w && cpu_has_vmx_ept_ad?
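
Spelled out, what I mean is a toy model like this (the P2M_IOREQ_*
values and struct are illustrative stand-ins, not the real EPT entry):

```c
#define P2M_IOREQ_HANDLE_READ_ACCESS  (1u << 0)
#define P2M_IOREQ_HANDLE_WRITE_ACCESS (1u << 1)

struct ept_bits { int r, w, x, d; };

/* Model of the p2m_ioreq_server case in ept_p2m_type_to_flags(),
 * with the dirty bit derived from writability, matching how the
 * neighbouring p2m types handle it. */
static struct ept_bits ioreq_flags_to_ept(unsigned flags, int has_ept_ad)
{
    struct ept_bits e;

    e.r = !(flags & P2M_IOREQ_HANDLE_READ_ACCESS);
    e.w = e.r && !(flags & P2M_IOREQ_HANDLE_WRITE_ACCESS);
    e.x = e.r;
    e.d = e.w && has_ept_ad;    /* suggested: tie D to writability */
    return e;
}
```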


> @@ -289,6 +291,83 @@ void p2m_memory_type_changed(struct domain *d)
>      }
>  }
>  
> +int p2m_set_ioreq_server(struct domain *d, p2m_type_t t,
> +                         unsigned long flags,
> +                         struct hvm_ioreq_server *s)
> +{
> +    struct p2m_domain *p2m = p2m_get_hostp2m(d);
> +    int rc;
> +
> +    spin_lock(&p2m->ioreq.lock);
> +
> +    rc = -EINVAL;
> +    if ( t != p2m_ioreq_server )
> +        goto out;

Since these are all internal interfaces, I'm not sure what the point is
of passing this type in when we only accept one type anyway.  If we're
going to have a type in the interface but only accept one type, we
should just check it at the top level and assume p2m_ioreq_server all
the way through (until such time as we accept a second type).

> +
> +    rc = -EBUSY;
> +    if ( ( p2m->ioreq.server != NULL ) && (flags != 0) )
> +        goto out;
> +
> +    if ( flags == 0 )
> +    {
> +        p2m->ioreq.server = NULL;
> +        p2m->ioreq.flags = 0;
> +    }
> +    else
> +    {
> +        p2m->ioreq.server = s;
> +        p2m->ioreq.flags = flags;
> +    }

I think it would be a bit more robust to

1) Require callers to always specify (their own) server ID when making
this call, even if they're un-claiming the p2m type

2) When un-claiming, check to make sure that the server id of the caller
matches the server id that's being detached.
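
A toy model of the rule I'm suggesting (error values are stand-ins for
-EPERM and -EBUSY; the struct is illustrative, not the real p2m state):

```c
#include <stddef.h>

struct state { const void *server; unsigned flags; };

/* Callers always pass their own server; an un-claim (flags == 0)
 * only succeeds if that server is the current claimant, and a claim
 * fails while another server holds the type. */
static int set_ioreq_server(struct state *st, const void *s,
                            unsigned flags)
{
    if ( flags == 0 )
    {
        if ( st->server != s )
            return -1;          /* -EPERM stand-in: not the claimant */
        st->server = NULL;
        st->flags = 0;
        return 0;
    }

    if ( st->server != NULL )
        return -16;             /* -EBUSY stand-in: already claimed */

    st->server = s;
    st->flags = flags;
    return 0;
}
```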

> +
> +    p2m_memory_type_changed(d);
> +
> +    rc = 0;
> +
> +out:
> +    spin_unlock(&p2m->ioreq.lock);
> +
> +    return rc;
> +}
> +
> +void p2m_get_ioreq_server(struct domain *d, p2m_type_t t,
> +                          unsigned long *flags,
> +                          struct hvm_ioreq_server **s)
> +{
> +    struct p2m_domain *p2m = p2m_get_hostp2m(d);
> +
> +    if ( t != p2m_ioreq_server )
> +    {
> +        *s = NULL;
> +        *flags = 0;
> +        return;
> +    }
> +
> +    spin_lock(&p2m->ioreq.lock);
> +
> +    *s = p2m->ioreq.server;
> +    *flags = p2m->ioreq.flags;
> +
> +    spin_unlock(&p2m->ioreq.lock);
> +}
> +
> +void p2m_destroy_ioreq_server(struct domain *d,
> +                              struct hvm_ioreq_server *s)
> +{
> +    struct p2m_domain *p2m = p2m_get_hostp2m(d);
> +
> +    spin_lock(&p2m->ioreq.lock);
> +
> +    if ( p2m->ioreq.server == s )
> +    {
> +        p2m->ioreq.server = NULL;
> +        p2m->ioreq.flags = 0;
> +    }
> +
> +    p2m_memory_type_changed(d);

Do we really want to call this unconditionally, even if the server being
destroyed here isn't the server associated with p2m_ioreq_server?  It
shouldn't have a functional impact, but it will slow things down as the
ept tables are unnecessarily rebuilt.
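
I.e., only pay for the rebuild when something actually changed -- a toy
model of what I mean (illustrative names; flushes stands in for calls
to p2m_memory_type_changed()):

```c
#include <stddef.h>

struct state { const void *server; unsigned flushes; };

/* Only flush cached memory types when the server being destroyed
 * was actually the claimant of p2m_ioreq_server. */
static void destroy_server(struct state *st, const void *s)
{
    if ( st->server == s )
    {
        st->server = NULL;
        st->flushes++;          /* models p2m_memory_type_changed(d) */
    }
}
```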

> diff --git a/xen/include/public/hvm/hvm_op.h b/xen/include/public/hvm/hvm_op.h
> index a1eae52..e7fc75d 100644
> --- a/xen/include/public/hvm/hvm_op.h
> +++ b/xen/include/public/hvm/hvm_op.h
> @@ -489,6 +489,30 @@ struct xen_hvm_altp2m_op {
>  typedef struct xen_hvm_altp2m_op xen_hvm_altp2m_op_t;
>  DEFINE_XEN_GUEST_HANDLE(xen_hvm_altp2m_op_t);
>  
> +#if defined(__XEN__) || defined(__XEN_TOOLS__)
> +
> +#define HVMOP_map_mem_type_to_ioreq_server 26
> +struct xen_hvm_map_mem_type_to_ioreq_server {
> +    domid_t domid;      /* IN - domain to be serviced */
> +    ioservid_t id;      /* IN - server id */
> +    hvmmem_type_t type; /* IN - memory type */

What is this 'type' for?  Do you expect to map other types of memory
this way in the future?

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

