From: "Tian, Kevin" <kevin.tian@intel.com>
To: Yu Zhang <yu.c.zhang@linux.intel.com>,
	"xen-devel@lists.xen.org" <xen-devel@lists.xen.org>
Cc: "Nakajima, Jun" <jun.nakajima@intel.com>,
	George Dunlap <george.dunlap@eu.citrix.com>,
	Andrew Cooper <andrew.cooper3@citrix.com>,
	Tim Deegan <tim@xen.org>, Paul Durrant <paul.durrant@citrix.com>,
	"Lv, Zhiyuan" <zhiyuan.lv@intel.com>,
	Jan Beulich <jbeulich@suse.com>
Subject: Re: [PATCH v9 2/5] x86/ioreq server: Add DMOP to map guest ram with p2m_ioreq_server to an ioreq server.
Date: Fri, 24 Mar 2017 09:26:46 +0000	[thread overview]
Message-ID: <AADFC41AFE54684AB9EE6CBC0274A5D190C7D0FD@SHSMSX101.ccr.corp.intel.com> (raw)
In-Reply-To: <58D24E25.5010000@linux.intel.com>

> From: Yu Zhang [mailto:yu.c.zhang@linux.intel.com]
> Sent: Wednesday, March 22, 2017 6:13 PM
> 
> On 3/22/2017 3:49 PM, Tian, Kevin wrote:
> >> From: Yu Zhang [mailto:yu.c.zhang@linux.intel.com]
> >> Sent: Tuesday, March 21, 2017 10:53 AM
> >>
> >> A new DMOP - XEN_DMOP_map_mem_type_to_ioreq_server, is added to let
> >> one ioreq server claim/disclaim its responsibility for the handling
> >> of guest pages with p2m type p2m_ioreq_server. Users of this DMOP can
> >> specify which kind of operation is supposed to be emulated in a
> >> parameter named flags. Currently, this DMOP only supports the emulation
> >> of write operations.
> >> And it can be further extended to support the emulation of read ones
> >> if an ioreq server has such requirement in the future.
> > p2m_ioreq_server was already introduced before. Do you want to give
> > some background on the current state of that type? That would help
> > explain the purpose of this patch.
> 
> Sorry? I thought the background was described in the cover letter.
> Previously p2m_ioreq_server was only for write-protection and was tracked
> in an ioreq server's rangeset; this patch binds the p2m type to an ioreq
> server directly.

The cover letter will not be in the git repo. Better to include that
background here, to make this commit message complete on its own.

> 
> >> For now, we only support one ioreq server for this p2m type, so once
> >> an ioreq server has claimed its ownership, subsequent calls of the
> >> XEN_DMOP_map_mem_type_to_ioreq_server will fail. Users can also
> >> disclaim the ownership of guest ram pages with p2m_ioreq_server, by
> >> triggering this new DMOP, with ioreq server id set to the current
> >> owner's and flags parameter set to 0.
> >>
> >> Note both XEN_DMOP_map_mem_type_to_ioreq_server and p2m_ioreq_server
> >> are only supported for HVMs with HAP enabled.
> >>
> >> Also note that only after one ioreq server claims its ownership of
> >> p2m_ioreq_server, will the p2m type change to p2m_ioreq_server be
> >> allowed.
> >>
> >> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> >> Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
> >> Acked-by: Tim Deegan <tim@xen.org>
> >> ---
> >> Cc: Jan Beulich <jbeulich@suse.com>
> >> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> >> Cc: Paul Durrant <paul.durrant@citrix.com>
> >> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> >> Cc: Jun Nakajima <jun.nakajima@intel.com>
> >> Cc: Kevin Tian <kevin.tian@intel.com>
> >> Cc: Tim Deegan <tim@xen.org>
> >>
> >> changes in v8:
> >>    - According to comments from Jan & Paul: comments changes in
> >> hvmemul_do_io().
> >>    - According to comments from Jan: remove the redundant code which
> >> would only
> >>      be useful for read emulations.
> >>    - According to comments from Jan: change interface which maps mem
> >> type to
> >>      ioreq server, removed uint16_t pad and added an uint64_t opaque.
> >>    - Address other comments from Jan, i.e. correct return values; remove
> stray
> >>      cast.
> >>
> >> changes in v7:
> >>    - Use new ioreq server interface -
> >> XEN_DMOP_map_mem_type_to_ioreq_server.
> >>    - According to comments from George: removed
> >> domain_pause/unpause() in
> >>      hvm_map_mem_type_to_ioreq_server(), because it's too expensive,
> >>      and we can avoid the:
> >>      a> deadlock issue existed in v6 patch, between p2m lock and ioreq
> server
> >>         lock by using these locks in the same order - solved in patch 4;
> >>      b> for race condition between vm exit and ioreq server unbinding, we
> can
> >>         just retry this instruction.
> >>    - According to comments from Jan and George: continue to clarify logic
> in
> >>      hvmemul_do_io().
> >>    - According to comments from Jan: clarify comment in
> >> p2m_set_ioreq_server().
> >>
> >> changes in v6:
> >>    - Clarify logic in hvmemul_do_io().
> >>    - Use recursive lock for ioreq server lock.
> >>    - Remove debug print when mapping ioreq server.
> >>    - Clarify code in ept_p2m_type_to_flags() for consistency.
> >>    - Remove definition of P2M_IOREQ_HANDLE_WRITE_ACCESS.
> >>    - Add comments for HVMMEM_ioreq_server to note only changes
> >>      to/from HVMMEM_ram_rw are permitted.
> >>    - Add domain_pause/unpause() in
> hvm_map_mem_type_to_ioreq_server()
> >>      to avoid the race condition when a vm exit happens on a write-
> >>      protected page, just to find the ioreq server has been unmapped
> >>      already.
> >>    - Introduce a seperate patch to delay the release of p2m
> >>      lock to avoid the race condition.
> >>    - Introduce a seperate patch to handle the read-modify-write
> >>      operations on a write protected page.
> >>
> >> changes in v5:
> >>    - Simplify logic in hvmemul_do_io().
> >>    - Use natual width types instead of fixed width types when possible.
> >>    - Do not grant executable permission for p2m_ioreq_server entries.
> >>    - Clarify comments and commit message.
> >>    - Introduce a seperate patch to recalculate the p2m types after
> >>      the ioreq server unmaps the p2m_ioreq_server.
> >>
> >> changes in v4:
> >>    - According to Paul's advice, add comments around the definition
> >>      of HVMMEM_iore_server in hvm_op.h.
> >>    - According to Wei Liu's comments, change the format of the commit
> >>      message.
> >>
> >> changes in v3:
> >>    - Only support write emulation in this patch;
> >>    - Remove the code to handle race condition in hvmemul_do_io(),
> >>    - No need to reset the p2m type after an ioreq server has disclaimed
> >>      its ownership of p2m_ioreq_server;
> >>    - Only allow p2m type change to p2m_ioreq_server after an ioreq
> >>      server has claimed its ownership of p2m_ioreq_server;
> >>    - Only allow p2m type change to p2m_ioreq_server from pages with
> type
> >>      p2m_ram_rw, and vice versa;
> >>    - HVMOP_map_mem_type_to_ioreq_server interface change - use
> uint16,
> >>      instead of enum to specify the memory type;
> >>    - Function prototype change to p2m_get_ioreq_server();
> >>    - Coding style changes;
> >>    - Commit message changes;
> >>    - Add Tim's Acked-by.
> >>
> >> changes in v2:
> >>    - Only support HAP enabled HVMs;
> >>    - Replace p2m_mem_type_changed() with
> p2m_change_entry_type_global()
> >>      to reset the p2m type, when an ioreq server tries to claim/disclaim
> >>      its ownership of p2m_ioreq_server;
> >>    - Comments changes.
> >> ---
> >>   xen/arch/x86/hvm/dm.c            | 37 ++++++++++++++++++--
> >>   xen/arch/x86/hvm/emulate.c       | 65
> >> ++++++++++++++++++++++++++++++++---
> >>   xen/arch/x86/hvm/ioreq.c         | 38 +++++++++++++++++++++
> >>   xen/arch/x86/mm/hap/nested_hap.c |  2 +-
> >>   xen/arch/x86/mm/p2m-ept.c        |  8 ++++-
> >>   xen/arch/x86/mm/p2m-pt.c         | 19 +++++++----
> >>   xen/arch/x86/mm/p2m.c            | 74
> >> ++++++++++++++++++++++++++++++++++++++++
> >>   xen/arch/x86/mm/shadow/multi.c   |  3 +-
> >>   xen/include/asm-x86/hvm/ioreq.h  |  2 ++
> >>   xen/include/asm-x86/p2m.h        | 26 ++++++++++++--
> >>   xen/include/public/hvm/dm_op.h   | 28 +++++++++++++++
> >>   xen/include/public/hvm/hvm_op.h  |  8 ++++-
> >>   12 files changed, 290 insertions(+), 20 deletions(-)
> >>
> >> diff --git a/xen/arch/x86/hvm/dm.c b/xen/arch/x86/hvm/dm.c index
> >> 333c884..3f9484d 100644
> >> --- a/xen/arch/x86/hvm/dm.c
> >> +++ b/xen/arch/x86/hvm/dm.c
> >> @@ -173,9 +173,14 @@ static int modified_memory(struct domain *d,
> >>
> >>   static bool allow_p2m_type_change(p2m_type_t old, p2m_type_t new)
> >> {
> >> +    if ( new == p2m_ioreq_server )
> >> +        return old == p2m_ram_rw;
> >> +
> >> +    if ( old == p2m_ioreq_server )
> >> +        return new == p2m_ram_rw;
> >> +
> >>       return p2m_is_ram(old) ||
> >> -           (p2m_is_hole(old) && new == p2m_mmio_dm) ||
> >> -           (old == p2m_ioreq_server && new == p2m_ram_rw);
> >> +           (p2m_is_hole(old) && new == p2m_mmio_dm);
> >>   }
> >>
> >>   static int set_mem_type(struct domain *d, @@ -202,6 +207,19 @@
> >> static int set_mem_type(struct domain *d,
> >>            unlikely(data->mem_type == HVMMEM_unused) )
> >>           return -EINVAL;
> >>
> >> +    if ( data->mem_type  == HVMMEM_ioreq_server )
> >> +    {
> >> +        unsigned int flags;
> >> +
> >> +        /* HVMMEM_ioreq_server is only supported for HAP enabled hvm.
> */
> >> +        if ( !hap_enabled(d) )
> >> +            return -EOPNOTSUPP;
> >> +
> >> +        /* Do not change to HVMMEM_ioreq_server if no ioreq server
> mapped.
> >> */
> >> +        if ( !p2m_get_ioreq_server(d, &flags) )
> >> +            return -EINVAL;
> >> +    }
> >> +
> >>       while ( iter < data->nr )
> >>       {
> >>           unsigned long pfn = data->first_pfn + iter; @@ -365,6
> >> +383,21 @@ static int dm_op(domid_t domid,
> >>           break;
> >>       }
> >>
> >> +    case XEN_DMOP_map_mem_type_to_ioreq_server:
> >> +    {
> >> +        const struct xen_dm_op_map_mem_type_to_ioreq_server *data =
> >> +            &op.u.map_mem_type_to_ioreq_server;
> >> +
> >> +        rc = -EOPNOTSUPP;
> >> +        /* Only support for HAP enabled hvm. */
> > Isn't it obvious from code?
> 
> Yes. Can be removed.
> >> +        if ( !hap_enabled(d) )
> >> +            break;
> >> +
> >> +        rc = hvm_map_mem_type_to_ioreq_server(d, data->id,
> >> +                                              data->type, data->flags);
> >> +        break;
> >> +    }
> >> +
> >>       case XEN_DMOP_set_ioreq_server_state:
> >>       {
> >>           const struct xen_dm_op_set_ioreq_server_state *data = diff
> >> --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
> index
> >> f36d7c9..37139e6 100644
> >> --- a/xen/arch/x86/hvm/emulate.c
> >> +++ b/xen/arch/x86/hvm/emulate.c
> >> @@ -99,6 +99,7 @@ static int hvmemul_do_io(
> >>       uint8_t dir, bool_t df, bool_t data_is_addr, uintptr_t data)  {
> >>       struct vcpu *curr = current;
> >> +    struct domain *currd = curr->domain;
> >>       struct hvm_vcpu_io *vio = &curr->arch.hvm_vcpu.hvm_io;
> >>       ioreq_t p = {
> >>           .type = is_mmio ? IOREQ_TYPE_COPY : IOREQ_TYPE_PIO, @@
> >> -140,7
> >> +141,7 @@ static int hvmemul_do_io(
> >>                (p.dir != dir) ||
> >>                (p.df != df) ||
> >>                (p.data_is_ptr != data_is_addr) )
> >> -            domain_crash(curr->domain);
> >> +            domain_crash(currd);
> >>
> >>           if ( data_is_addr )
> >>               return X86EMUL_UNHANDLEABLE; @@ -177,8 +178,64 @@
> >> static int hvmemul_do_io(
> >>           break;
> >>       case X86EMUL_UNHANDLEABLE:
> >>       {
> >> -        struct hvm_ioreq_server *s =
> >> -            hvm_select_ioreq_server(curr->domain, &p);
> >> +        /*
> >> +         * Xen isn't emulating the instruction internally, so see if
> >> +         * there's an ioreq server that can handle it. Rules:
> >> +         *
> >> +         * - PIO and "normal" MMIO run through
> >> + hvm_select_ioreq_server()
> > why highlight "normal" here? What would an "abnormal" MMIO mean here -
> > the p2m_ioreq_server type?
> 
> Yes, it's just to differentiate the MMIO and the p2m_ioreq_server address,
> copied from George's previous comments.
> We can remove the "normal" here.

Then you need to add such an explanation; otherwise it's difficult for
other readers of the code to know its meaning.
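Something as small as this would do (wording illustrative only, not a
requirement):

	/*
	 * - PIO and "normal" MMIO (i.e. pages *not* of type
	 *   p2m_ioreq_server, e.g. p2m_mmio_dm or unmapped regions) run
	 *   through hvm_select_ioreq_server() to choose the ioreq server
	 *   by range. If no server is found, the access is ignored.
	 */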

> 
> 
> >> +         * to choose the ioreq server by range. If no server is found,
> >> +         * the access is ignored.
> >> +         *
> >> +         * - p2m_ioreq_server accesses are handled by the designated
> >> +         * ioreq_server for the domain, but there are some corner
> >> +         * cases:
> > since only one case is listed, "there is a corner case"
> 
> Another corner case is in patch 3/5 - handling the read-modify-write
> situations.
> Maybe the correct thing is to use word "case" in this patch and change
> it to "cases" in next patch. :-)

then leave it be. Not a big matter.

> 
> >> +         *
> >> +         *   - If the domain ioreq_server is NULL, assume there is a
> >> +         *   race between the unbinding of ioreq server and guest fault
> >> +         *   so re-try the instruction.
> >> +         */
> >> +        struct hvm_ioreq_server *s = NULL;
> >> +        p2m_type_t p2mt = p2m_invalid;
> >> +
> >> +        if ( is_mmio )
> >> +        {
> >> +            unsigned long gmfn = paddr_to_pfn(addr);
> >> +
> >> +            get_gfn_query_unlocked(currd, gmfn, &p2mt);
> >> +
> >> +            if ( p2mt == p2m_ioreq_server )
> >> +            {
> >> +                unsigned int flags;
> >> +
> >> +                /*
> >> +                 * Value of s could be stale, when we lost a race
> > better to describe it at a higher level, e.g. just "no ioreq server
> > is found".
> >
> > what's the meaning of "lost a race"? shouldn't it mean
> > "likely we suffer from a race with..."?
> >
> >> +                 * with dm_op which unmaps p2m_ioreq_server from the
> >> +                 * ioreq server. Yet there's no cheap way to avoid
> > again, not talking about specific code, focus on the operation,
> > e.g. "race with an unmap operation on the ioreq server"
> >
> >> +                 * this, so device model need to do the check.
> >> +                 */
> > How is above comment related to below line?
> 
> Well, the 's' returned by p2m_get_ioreq_server() can be stale - if the
> ioreq server is unmapped after p2m_get_ioreq_server() returns. The
> current rangeset code also has this issue, if the PIO/MMIO range is
> removed from the rangeset of the ioreq server after
> hvm_select_ioreq_server() returns.
>
> Since using a spinlock or domain_pause/unpause is too heavyweight, we
> suggest the device model side check whether the received ioreq is a
> valid one.
> 
> The above comments were added, according to Jan & Paul's suggestion in
> v7, to let developers know that we do not guarantee the validity of 's'
> returned by p2m_get_ioreq_server()/hvm_select_ioreq_server().
>
> "Value of s could be stale, when we lost a race with..." is not about s
> being NULL, it's about s being invalid. For a NULL return, it is...

Then it makes sense.
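btw regarding "device model side check" - I assume you mean something
like below on the device model side? (purely a hypothetical sketch;
dm_claims_addr()/emulate_access() are illustrative names, not part of
this series)

	static void handle_ioreq(ioreq_t *req)
	{
	    /*
	     * Xen may deliver a stale request when the DM raced with its
	     * own unmap of the p2m_ioreq_server type (or of a PIO/MMIO
	     * range), so validate the address before emulating. The
	     * response is still sent back by the caller as usual, just
	     * as a no-op.
	     */
	    if ( !dm_claims_addr(req->addr) )
	    {
	        if ( req->dir == IOREQ_READ )
	            req->data = ~0UL;
	        return;
	    }

	    emulate_access(req);
	}

If so, it might be worth spelling that expectation out in the public
header comment as well.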

> 
> >
> >> +                s = p2m_get_ioreq_server(currd, &flags);
> >> +
> >> +                /*
> >> +                 * If p2mt is ioreq_server but ioreq_server is NULL,
> > p2mt is definitely ioreq_server within this if condition.
> >
> >> +                 * we probably lost a race with unbinding of ioreq
> >> +                 * server, just retry the access.
> >> +                 */
> > looks redundant with the earlier comment. Or should the earlier one
> > just be removed?
> 
> ... described here, to just retry the access.
> 
> >> +                if ( s == NULL )
> >> +                {
> >> +                    rc = X86EMUL_RETRY;
> >> +                    vio->io_req.state = STATE_IOREQ_NONE;
> >> +                    break;
> >> +                }
> >> +            }
> >> +        }
> >> +
> >> +        /*
> >> +         * Value of s could be stale, when we lost a race with dm_op
> >> +         * which unmaps this PIO/MMIO address from the ioreq server.
> >> +         * The device model side need to do the check.
> >> +         */
> > another duplicated comment. The code below is actually for the
> > 'normal' MMIO case...
> 
> This is for another possible situation, where the 's' returned by
> hvm_select_ioreq_server() becomes stale later, when the PIO/MMIO range
> is removed from the rangeset.
>
> The logic in hvmemul_do_io() has always been a bit mixed - I mean, many
> corner cases and race conditions:
>   - between the mapping/unmapping of PIO/MMIO ranges in the rangeset
>   - between the mapping/unmapping of the ioreq server for
>     p2m_ioreq_server
> I tried to add as many comments as I could as this patchset evolved,
> yet found I just introduced more confusion...
> 
> Any suggestions?

Maybe describe the whole story once, before the whole p2m_ioreq_server
branch?

	/* comment */
	if ( p2mt == p2m_ioreq_server )

Then you don't need to duplicate so much at the individual code lines.
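Something like the below, for instance (only a sketch - exact wording
and placement are up to you):

	/*
	 * p2m_ioreq_server pages are emulated by the single ioreq server
	 * which claimed the type via
	 * XEN_DMOP_map_mem_type_to_ioreq_server. Both lookups below can
	 * race with an unmap of that server:
	 *  - p2m_get_ioreq_server() returning NULL means the type was
	 *    already unclaimed - retry the access, as we likely raced
	 *    with the unbinding of the server;
	 *  - a non-NULL server (from either lookup) may be unmapped
	 *    right afterwards - this is tolerated, and the device model
	 *    is expected to validate the ioreqs it receives.
	 */
	if ( is_mmio )
	{
	    ...
	    if ( p2mt == p2m_ioreq_server )
	    ...
	}

i.e. keep the rationale in one place and leave at most one-line pointers
at the individual checks.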

> 
> >> +        if ( !s )
> >> +            s = hvm_select_ioreq_server(currd, &p);
> >>
> >>           /* If there is no suitable backing DM, just ignore accesses */
> >>           if ( !s )
> >> @@ -189,7 +246,7 @@ static int hvmemul_do_io(
> >>           else
> >>           {
> >>               rc = hvm_send_ioreq(s, &p, 0);
> >> -            if ( rc != X86EMUL_RETRY || curr->domain->is_shutting_down )
> >> +            if ( rc != X86EMUL_RETRY || currd->is_shutting_down )
> >>                   vio->io_req.state = STATE_IOREQ_NONE;
> >>               else if ( data_is_addr )
> >>                   rc = X86EMUL_OKAY;
> >> diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c index
> >> ad2edad..746799f 100644
> >> --- a/xen/arch/x86/hvm/ioreq.c
> >> +++ b/xen/arch/x86/hvm/ioreq.c
> >> @@ -753,6 +753,8 @@ int hvm_destroy_ioreq_server(struct domain *d,
> >> ioservid_t id)
> >>
> >>           domain_pause(d);
> >>
> >> +        p2m_destroy_ioreq_server(d, s);
> >> +
> >>           hvm_ioreq_server_disable(s, 0);
> >>
> >>           list_del(&s->list_entry);
> >> @@ -914,6 +916,42 @@ int
> >> hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id,
> >>       return rc;
> >>   }
> >>
> >> +int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t
> id,
> >> +                                     uint32_t type, uint32_t flags) {
> >> +    struct hvm_ioreq_server *s;
> >> +    int rc;
> >> +
> >> +    /* For now, only HVMMEM_ioreq_server is supported. */
> > obvious comment
> 
> IIRC, this comment (and the one below) is another change made according
> to some review comments, to remind readers that we can add new mem types
> in the future.
> So how about we add a line - "For the future, we can support other mem
> types"?
> 
> But that also sounds redundant to me. :)
> So I am also OK to remove this and below comments.

Or add a general comment for the whole function, indicating that those
checks are extensible in the future.
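e.g. (illustrative wording only):

	/*
	 * Map guest pages with a given memory type to an ioreq server.
	 * For now only HVMMEM_ioreq_server and write emulation
	 * (XEN_DMOP_IOREQ_MEM_ACCESS_WRITE) are supported; the checks
	 * below can be relaxed when further types / access kinds get
	 * added.
	 */
	int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id,
	                                     uint32_t type, uint32_t flags)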

> 
> >> +    if ( type != HVMMEM_ioreq_server )
> >> +        return -EINVAL;
> >> +
> >> +    /* For now, only write emulation is supported. */
> > ditto.
> >
> >> +    if ( flags & ~(XEN_DMOP_IOREQ_MEM_ACCESS_WRITE) )
> >> +        return -EINVAL;
> >> +
> >> +    spin_lock_recursive(&d->arch.hvm_domain.ioreq_server.lock);
> >> +
> >> +    rc = -ENOENT;
> >> +    list_for_each_entry ( s,
> >> +                          &d->arch.hvm_domain.ioreq_server.list,
> >> +                          list_entry )
> >> +    {
> >> +        if ( s == d->arch.hvm_domain.default_ioreq_server )
> >> +            continue;
> > any reason why we cannot let the default server claim this new type?
> 
> Well, my understanding of the default ioreq server is that it exists
> only for legacy qemu and is not even created via the dm op hypercall.
> Current device models (including qemu) are all non-default ioreq servers
> now.

ah, didn't realize it. Thanks for letting me know.
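btw, just to confirm my understanding of the intended usage: a
non-default server would then claim/disclaim the type roughly like
below? (the xendevicemodel_* wrapper name is my assumption here, it is
not something this patch adds)

	#include <stdbool.h>
	#include <xendevicemodel.h>

	/* Claim (map == true) or disclaim (flags == 0) the type for 'id'. */
	static int set_ioreq_mem_type(xendevicemodel_handle *xdm,
	                              domid_t domid, ioservid_t id, bool map)
	{
	    return xendevicemodel_map_mem_type_to_ioreq_server(
	        xdm, domid, id, HVMMEM_ioreq_server,
	        map ? XEN_DMOP_IOREQ_MEM_ACCESS_WRITE : 0);
	}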

> 
> >> +
> >> +        if ( s->id == id )
> >> +        {
> >> +            rc = p2m_set_ioreq_server(d, flags, s);
> >> +            break;
> >> +        }
> >> +    }
> >> +
> >> +    spin_unlock_recursive(&d->arch.hvm_domain.ioreq_server.lock);
> >> +
> >> +    return rc;
> >> +}
> >> +
> >>   int hvm_set_ioreq_server_state(struct domain *d, ioservid_t id,
> >>                                  bool_t enabled)  { diff --git
> >> a/xen/arch/x86/mm/hap/nested_hap.c
> >> b/xen/arch/x86/mm/hap/nested_hap.c
> >> index 162afed..408ea7f 100644
> >> --- a/xen/arch/x86/mm/hap/nested_hap.c
> >> +++ b/xen/arch/x86/mm/hap/nested_hap.c
> >> @@ -172,7 +172,7 @@ nestedhap_walk_L0_p2m(struct p2m_domain
> *p2m,
> >> paddr_t L1_gpa, paddr_t *L0_gpa,
> >>       if ( *p2mt == p2m_mmio_direct )
> >>           goto direct_mmio_out;
> >>       rc = NESTEDHVM_PAGEFAULT_MMIO;
> >> -    if ( *p2mt == p2m_mmio_dm )
> >> +    if ( *p2mt == p2m_mmio_dm || *p2mt == p2m_ioreq_server )
> >>           goto out;
> >>
> >>       rc = NESTEDHVM_PAGEFAULT_L0_ERROR;
> >> diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
> >> index 568944f..cc1eb21 100644
> >> --- a/xen/arch/x86/mm/p2m-ept.c
> >> +++ b/xen/arch/x86/mm/p2m-ept.c
> >> @@ -131,6 +131,13 @@ static void ept_p2m_type_to_flags(struct
> >> p2m_domain *p2m, ept_entry_t *entry,
> >>               entry->r = entry->w = entry->x = 1;
> >>               entry->a = entry->d = !!cpu_has_vmx_ept_ad;
> >>               break;
> >> +        case p2m_ioreq_server:
> >> +            entry->r = 1;
> >> +            entry->w = !(p2m->ioreq.flags &
> >> XEN_DMOP_IOREQ_MEM_ACCESS_WRITE);
> >> +            entry->x = 0;
> >> +            entry->a = !!cpu_has_vmx_ept_ad;
> >> +            entry->d = entry->w && entry->a;
> >> +            break;
> >>           case p2m_mmio_direct:
> >>               entry->r = entry->x = 1;
> >>               entry->w = !rangeset_contains_singleton(mmio_ro_ranges,
> >> @@ -170,7 +177,6 @@ static void ept_p2m_type_to_flags(struct
> >> p2m_domain *p2m, ept_entry_t *entry,
> >>               entry->a = entry->d = !!cpu_has_vmx_ept_ad;
> >>               break;
> >>           case p2m_grant_map_ro:
> >> -        case p2m_ioreq_server:
> >>               entry->r = 1;
> >>               entry->w = entry->x = 0;
> >>               entry->a = !!cpu_has_vmx_ept_ad; diff --git
> >> a/xen/arch/x86/mm/p2m-pt.c b/xen/arch/x86/mm/p2m-pt.c index
> >> 07e2ccd..f6c45ec 100644
> >> --- a/xen/arch/x86/mm/p2m-pt.c
> >> +++ b/xen/arch/x86/mm/p2m-pt.c
> >> @@ -70,7 +70,9 @@ static const unsigned long pgt[] = {
> >>       PGT_l3_page_table
> >>   };
> >>
> >> -static unsigned long p2m_type_to_flags(p2m_type_t t, mfn_t mfn,
> >> +static unsigned long p2m_type_to_flags(const struct p2m_domain *p2m,
> >> +                                       p2m_type_t t,
> >> +                                       mfn_t mfn,
> >>                                          unsigned int level)  {
> >>       unsigned long flags;
> >> @@ -92,8 +94,12 @@ static unsigned long
> p2m_type_to_flags(p2m_type_t t,
> >> mfn_t mfn,
> >>       default:
> >>           return flags | _PAGE_NX_BIT;
> >>       case p2m_grant_map_ro:
> >> -    case p2m_ioreq_server:
> >>           return flags | P2M_BASE_FLAGS | _PAGE_NX_BIT;
> >> +    case p2m_ioreq_server:
> >> +        flags |= P2M_BASE_FLAGS | _PAGE_RW | _PAGE_NX_BIT;
> >> +        if ( p2m->ioreq.flags & XEN_DMOP_IOREQ_MEM_ACCESS_WRITE )
> >> +            return flags & ~_PAGE_RW;
> >> +        return flags;
> >>       case p2m_ram_ro:
> >>       case p2m_ram_logdirty:
> >>       case p2m_ram_shared:
> >> @@ -440,7 +446,8 @@ static int do_recalc(struct p2m_domain *p2m,
> >> unsigned long gfn)
> >>               p2m_type_t p2mt = p2m_is_logdirty_range(p2m, gfn & mask, gfn
> |
> >> ~mask)
> >>                                 ? p2m_ram_logdirty : p2m_ram_rw;
> >>               unsigned long mfn = l1e_get_pfn(e);
> >> -            unsigned long flags = p2m_type_to_flags(p2mt, _mfn(mfn), level);
> >> +            unsigned long flags = p2m_type_to_flags(p2m, p2mt,
> >> +                                                    _mfn(mfn), level);
> >>
> >>               if ( level )
> >>               {
> >> @@ -578,7 +585,7 @@ p2m_pt_set_entry(struct p2m_domain *p2m,
> >> unsigned long gfn, mfn_t mfn,
> >>           ASSERT(!mfn_valid(mfn) || p2mt != p2m_mmio_direct);
> >>           l3e_content = mfn_valid(mfn) || p2m_allows_invalid_mfn(p2mt)
> >>               ? l3e_from_pfn(mfn_x(mfn),
> >> -                           p2m_type_to_flags(p2mt, mfn, 2) | _PAGE_PSE)
> >> +                           p2m_type_to_flags(p2m, p2mt, mfn, 2) |
> >> + _PAGE_PSE)
> >>               : l3e_empty();
> >>           entry_content.l1 = l3e_content.l3;
> >>
> >> @@ -615,7 +622,7 @@ p2m_pt_set_entry(struct p2m_domain *p2m,
> >> unsigned long gfn, mfn_t mfn,
> >>
> >>           if ( mfn_valid(mfn) || p2m_allows_invalid_mfn(p2mt) )
> >>               entry_content = p2m_l1e_from_pfn(mfn_x(mfn),
> >> -                                             p2m_type_to_flags(p2mt, mfn, 0));
> >> +                                         p2m_type_to_flags(p2m, p2mt,
> >> + mfn, 0));
> >>           else
> >>               entry_content = l1e_empty();
> >>
> >> @@ -652,7 +659,7 @@ p2m_pt_set_entry(struct p2m_domain *p2m,
> >> unsigned long gfn, mfn_t mfn,
> >>           ASSERT(!mfn_valid(mfn) || p2mt != p2m_mmio_direct);
> >>           if ( mfn_valid(mfn) || p2m_allows_invalid_mfn(p2mt) )
> >>               l2e_content = l2e_from_pfn(mfn_x(mfn),
> >> -                                       p2m_type_to_flags(p2mt, mfn, 1) |
> >> +                                       p2m_type_to_flags(p2m, p2mt,
> >> + mfn, 1) |
> >>                                          _PAGE_PSE);
> >>           else
> >>               l2e_content = l2e_empty();
> >> diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c index
> >> a5651a3..dd4e477 100644
> >> --- a/xen/arch/x86/mm/p2m.c
> >> +++ b/xen/arch/x86/mm/p2m.c
> >> @@ -82,6 +82,8 @@ static int p2m_initialise(struct domain *d, struct
> >> p2m_domain *p2m)
> >>       else
> >>           p2m_pt_init(p2m);
> >>
> >> +    spin_lock_init(&p2m->ioreq.lock);
> >> +
> >>       return ret;
> >>   }
> >>
> >> @@ -286,6 +288,78 @@ void p2m_memory_type_changed(struct domain
> *d)
> >>       }
> >>   }
> >>
> >> +int p2m_set_ioreq_server(struct domain *d,
> >> +                         unsigned int flags,
> >> +                         struct hvm_ioreq_server *s) {
> >> +    struct p2m_domain *p2m = p2m_get_hostp2m(d);
> >> +    int rc;
> >> +
> >> +    /*
> >> +     * Use lock to prevent concurrent setting attempts
> >> +     * from multiple ioreq serers.
> > serers -> servers
> 
> Got it. Thanks.
> 
> >> +     */
> >> +    spin_lock(&p2m->ioreq.lock);
> >> +
> >> +    /* Unmap ioreq server from p2m type by passing flags with 0. */
> >> +    if ( flags == 0 )
> >> +    {
> >> +        rc = -EINVAL;
> >> +        if ( p2m->ioreq.server != s )
> >> +            goto out;
> >> +
> >> +        p2m->ioreq.server = NULL;
> >> +        p2m->ioreq.flags = 0;
> >> +    }
> >> +    else
> >> +    {
> >> +        rc = -EBUSY;
> >> +        if ( p2m->ioreq.server != NULL )
> >> +            goto out;
> >> +
> >> +        p2m->ioreq.server = s;
> >> +        p2m->ioreq.flags = flags;
> >> +    }
> >> +
> >> +    rc = 0;
> >> +
> >> + out:
> >> +    spin_unlock(&p2m->ioreq.lock);
> >> +
> >> +    return rc;
> >> +}
> >> +
> >> +struct hvm_ioreq_server *p2m_get_ioreq_server(struct domain *d,
> >> +                                              unsigned int *flags) {
> >> +    struct p2m_domain *p2m = p2m_get_hostp2m(d);
> >> +    struct hvm_ioreq_server *s;
> >> +
> >> +    spin_lock(&p2m->ioreq.lock);
> >> +
> >> +    s = p2m->ioreq.server;
> >> +    *flags = p2m->ioreq.flags;
> >> +
> >> +    spin_unlock(&p2m->ioreq.lock);
> >> +    return s;
> >> +}
> >> +
> >> +void p2m_destroy_ioreq_server(const struct domain *d,
> >> +                              const struct hvm_ioreq_server *s) {
> >> +    struct p2m_domain *p2m = p2m_get_hostp2m(d);
> >> +
> >> +    spin_lock(&p2m->ioreq.lock);
> >> +
> >> +    if ( p2m->ioreq.server == s )
> >> +    {
> >> +        p2m->ioreq.server = NULL;
> >> +        p2m->ioreq.flags = 0;
> >> +    }
> >> +
> >> +    spin_unlock(&p2m->ioreq.lock);
> >> +}
> >> +
> >>   void p2m_enable_hardware_log_dirty(struct domain *d)  {
> >>       struct p2m_domain *p2m = p2m_get_hostp2m(d); diff --git
> >> a/xen/arch/x86/mm/shadow/multi.c
> b/xen/arch/x86/mm/shadow/multi.c
> >> index 7ea9d81..521b639 100644
> >> --- a/xen/arch/x86/mm/shadow/multi.c
> >> +++ b/xen/arch/x86/mm/shadow/multi.c
> >> @@ -3269,8 +3269,7 @@ static int sh_page_fault(struct vcpu *v,
> >>       }
> >>
> >>       /* Need to hand off device-model MMIO to the device model */
> >> -    if ( p2mt == p2m_mmio_dm
> >> -         || (p2mt == p2m_ioreq_server && ft == ft_demand_write) )
> >> +    if ( p2mt == p2m_mmio_dm )
> >>       {
> >>           gpa = guest_walk_to_gpa(&gw);
> >>           goto mmio;
> >> diff --git a/xen/include/asm-x86/hvm/ioreq.h b/xen/include/asm-
> >> x86/hvm/ioreq.h index fbf2c74..b43667a 100644
> >> --- a/xen/include/asm-x86/hvm/ioreq.h
> >> +++ b/xen/include/asm-x86/hvm/ioreq.h
> >> @@ -37,6 +37,8 @@ int hvm_map_io_range_to_ioreq_server(struct
> domain
> >> *d, ioservid_t id,  int hvm_unmap_io_range_from_ioreq_server(struct
> >> domain *d, ioservid_t id,
> >>                                            uint32_t type, uint64_t start,
> >>                                            uint64_t end);
> >> +int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t
> id,
> >> +                                     uint32_t type, uint32_t flags);
> >>   int hvm_set_ioreq_server_state(struct domain *d, ioservid_t id,
> >>                                  bool_t enabled);
> >>
> >> diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
> index
> >> 470d29d..3786680 100644
> >> --- a/xen/include/asm-x86/p2m.h
> >> +++ b/xen/include/asm-x86/p2m.h
> >> @@ -89,7 +89,8 @@ typedef unsigned int p2m_query_t;
> >>                          | p2m_to_mask(p2m_ram_paging_out)      \
> >>                          | p2m_to_mask(p2m_ram_paged)           \
> >>                          | p2m_to_mask(p2m_ram_paging_in)       \
> >> -                       | p2m_to_mask(p2m_ram_shared))
> >> +                       | p2m_to_mask(p2m_ram_shared)          \
> >> +                       | p2m_to_mask(p2m_ioreq_server))
> >>
> >>   /* Types that represent a physmap hole that is ok to replace with a
> shared
> >>    * entry */
> >> @@ -111,8 +112,7 @@ typedef unsigned int p2m_query_t;
> >>   #define P2M_RO_TYPES (p2m_to_mask(p2m_ram_logdirty)     \
> >>                         | p2m_to_mask(p2m_ram_ro)         \
> >>                         | p2m_to_mask(p2m_grant_map_ro)   \
> >> -                      | p2m_to_mask(p2m_ram_shared)     \
> >> -                      | p2m_to_mask(p2m_ioreq_server))
> >> +                      | p2m_to_mask(p2m_ram_shared))
> >>
> >>   /* Write-discard types, which should discard the write operations */
> >>   #define P2M_DISCARD_WRITE_TYPES (p2m_to_mask(p2m_ram_ro)     \
> >> @@ -336,6 +336,20 @@ struct p2m_domain {
> >>           struct ept_data ept;
> >>           /* NPT-equivalent structure could be added here. */
> >>       };
> >> +
> >> +     struct {
> >> +         spinlock_t lock;
> >> +         /*
> >> +          * ioreq server who's responsible for the emulation of
> >> +          * gfns with specific p2m type(for now, p2m_ioreq_server).
> >> +          */
> >> +         struct hvm_ioreq_server *server;
> >> +         /*
> >> +          * flags specifies whether read, write or both operations
> >> +          * are to be emulated by an ioreq server.
> >> +          */
> >> +         unsigned int flags;
> >> +     } ioreq;
> >>   };
> >>
> >>   /* get host p2m table */
> >> @@ -827,6 +841,12 @@ static inline unsigned int
> >> p2m_get_iommu_flags(p2m_type_t p2mt, mfn_t mfn)
> >>       return flags;
> >>   }
> >>
> >> +int p2m_set_ioreq_server(struct domain *d, unsigned int flags,
> >> +                         struct hvm_ioreq_server *s); struct
> >> +hvm_ioreq_server *p2m_get_ioreq_server(struct domain *d,
> >> +                                              unsigned int *flags);
> >> +void p2m_destroy_ioreq_server(const struct domain *d, const struct
> >> +hvm_ioreq_server *s);
> >> +
> >>   #endif /* _XEN_ASM_X86_P2M_H */
> >>
> >>   /*
> >> diff --git a/xen/include/public/hvm/dm_op.h
> >> b/xen/include/public/hvm/dm_op.h index f54cece..2a36833 100644
> >> --- a/xen/include/public/hvm/dm_op.h
> >> +++ b/xen/include/public/hvm/dm_op.h
> >> @@ -318,6 +318,32 @@ struct xen_dm_op_inject_msi {
> >>       uint64_aligned_t addr;
> >>   };
> >>
> >> +/*
> >> + * XEN_DMOP_map_mem_type_to_ioreq_server : map or unmap the
> >> IOREQ Server <id>
> >> + *                                      to specific memroy type <type>
> > memroy->memory
> 
> Right. Thanks. :)
> 
> B.R.
> Yu
> >> + *                                      for specific accesses <flags>
> >> + *
> >> + * For now, flags only accept the value of
> >> +XEN_DMOP_IOREQ_MEM_ACCESS_WRITE,
> >> + * which means only write operations are to be forwarded to an ioreq
> >> server.
> >> + * Support for the emulation of read operations can be added when an
> >> +ioreq
> >> + * server has such requirement in future.
> >> + */
> >> +#define XEN_DMOP_map_mem_type_to_ioreq_server 15
> >> +
> >> +struct xen_dm_op_map_mem_type_to_ioreq_server {
> >> +    ioservid_t id;      /* IN - ioreq server id */
> >> +    uint16_t type;      /* IN - memory type */
> >> +    uint32_t flags;     /* IN - types of accesses to be forwarded to the
> >> +                           ioreq server. flags with 0 means to unmap the
> >> +                           ioreq server */
> >> +
> >> +#define XEN_DMOP_IOREQ_MEM_ACCESS_READ (1u << 0) #define
> >> +XEN_DMOP_IOREQ_MEM_ACCESS_WRITE (1u << 1)
> >> +
> >> +    uint64_t opaque;    /* IN/OUT - only used for hypercall continuation,
> >> +                           has to be set to zero by the caller */ };
> >> +
> >>   struct xen_dm_op {
> >>       uint32_t op;
> >>       uint32_t pad;
> >> @@ -336,6 +362,8 @@ struct xen_dm_op {
> >>           struct xen_dm_op_set_mem_type set_mem_type;
> >>           struct xen_dm_op_inject_event inject_event;
> >>           struct xen_dm_op_inject_msi inject_msi;
> >> +        struct xen_dm_op_map_mem_type_to_ioreq_server
> >> +                map_mem_type_to_ioreq_server;
> >>       } u;
> >>   };
> >>
> >> diff --git a/xen/include/public/hvm/hvm_op.h
> >> b/xen/include/public/hvm/hvm_op.h index bc00ef0..0bdafdf 100644
> >> --- a/xen/include/public/hvm/hvm_op.h
> >> +++ b/xen/include/public/hvm/hvm_op.h
> >> @@ -93,7 +93,13 @@ typedef enum {
> >>       HVMMEM_unused,             /* Placeholder; setting memory to this type
> >>                                     will fail for code after 4.7.0 */  #endif
> >> -    HVMMEM_ioreq_server
> >> +    HVMMEM_ioreq_server        /* Memory type claimed by an ioreq
> server;
> >> type
> >> +                                  changes to this value are only allowed after
> >> +                                  an ioreq server has claimed its ownership.
> >> +                                  Only pages with HVMMEM_ram_rw are allowed to
> >> +                                  change to this type; conversely, pages with
> >> +                                  this type are only allowed to be changed back
> >> +                                  to HVMMEM_ram_rw. */
> >>   } hvmmem_type_t;
> >>
> >>   /* Hint from PV drivers for pagetable destruction. */
> >> --
> >> 1.9.1
> >

