From: "Tian, Kevin" <kevin.tian@intel.com>
To: Yu Zhang <yu.c.zhang@linux.intel.com>,
	"xen-devel@lists.xen.org" <xen-devel@lists.xen.org>
Cc: "Nakajima, Jun" <jun.nakajima@intel.com>,
	George Dunlap <george.dunlap@eu.citrix.com>,
	Andrew Cooper <andrew.cooper3@citrix.com>,
	Tim Deegan <tim@xen.org>, Paul Durrant <paul.durrant@citrix.com>,
	"Lv, Zhiyuan" <zhiyuan.lv@intel.com>,
	Jan Beulich <jbeulich@suse.com>
Subject: Re: [PATCH v9 2/5] x86/ioreq server: Add DMOP to map guest ram with p2m_ioreq_server to an ioreq server.
Date: Fri, 24 Mar 2017 09:26:46 +0000	[thread overview]
Message-ID: <AADFC41AFE54684AB9EE6CBC0274A5D190C7D0FD@SHSMSX101.ccr.corp.intel.com> (raw)
In-Reply-To: <58D24E25.5010000@linux.intel.com>

> From: Yu Zhang [mailto:yu.c.zhang@linux.intel.com]
> Sent: Wednesday, March 22, 2017 6:13 PM
> 
> On 3/22/2017 3:49 PM, Tian, Kevin wrote:
> >> From: Yu Zhang [mailto:yu.c.zhang@linux.intel.com]
> >> Sent: Tuesday, March 21, 2017 10:53 AM
> >>
> >> A new DMOP - XEN_DMOP_map_mem_type_to_ioreq_server, is added to let
> >> one ioreq server claim/disclaim its responsibility for the handling
> >> of guest pages with p2m type p2m_ioreq_server. Users of this DMOP can
> >> specify which kind of operation is supposed to be emulated in a
> >> parameter named flags. Currently, this DMOP only supports the emulation
> >> of write operations.
> >> And it can be further extended to support the emulation of read ones
> >> if an ioreq server has such requirement in the future.
> > p2m_ioreq_server was already introduced before. Do you want to give
> > some background on the current state of that type? That would help
> > explain the purpose of this patch.
> 
> Sorry? I thought the background was described in the cover letter.
> Previously p2m_ioreq_server was only for write-protection and was tracked
> in an ioreq server's rangeset; this patch binds the p2m type to an ioreq
> server directly.

The cover letter will not be in the git repo. Better to include that
background here, to make this commit message complete on its own.

> 
> >> For now, we only support one ioreq server for this p2m type, so once
> >> an ioreq server has claimed its ownership, subsequent calls of the
> >> XEN_DMOP_map_mem_type_to_ioreq_server will fail. Users can also
> >> disclaim the ownership of guest ram pages with p2m_ioreq_server, by
> >> triggering this new DMOP, with ioreq server id set to the current
> >> owner's and flags parameter set to 0.
> >>
> >> Note both XEN_DMOP_map_mem_type_to_ioreq_server and p2m_ioreq_server
> >> are only supported for HVMs with HAP enabled.
> >>
> >> Also note that only after one ioreq server claims its ownership of
> >> p2m_ioreq_server, will the p2m type change to p2m_ioreq_server be
> >> allowed.
> >>
> >> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> >> Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
> >> Acked-by: Tim Deegan <tim@xen.org>
> >> ---
> >> Cc: Jan Beulich <jbeulich@suse.com>
> >> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> >> Cc: Paul Durrant <paul.durrant@citrix.com>
> >> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> >> Cc: Jun Nakajima <jun.nakajima@intel.com>
> >> Cc: Kevin Tian <kevin.tian@intel.com>
> >> Cc: Tim Deegan <tim@xen.org>
> >>
> >> changes in v8:
> >>    - According to comments from Jan & Paul: comments changes in
> >> hvmemul_do_io().
> >>    - According to comments from Jan: remove the redundant code which
> >> would only
> >>      be useful for read emulations.
> >>    - According to comments from Jan: change interface which maps mem
> >> type to
> >>      ioreq server, removed uint16_t pad and added an uint64_t opaque.
> >>    - Address other comments from Jan, i.e. correct return values; remove
> stray
> >>      cast.
> >>
> >> changes in v7:
> >>    - Use new ioreq server interface -
> >> XEN_DMOP_map_mem_type_to_ioreq_server.
> >>    - According to comments from George: removed
> >> domain_pause/unpause() in
> >>      hvm_map_mem_type_to_ioreq_server(), because it's too expensive,
> >>      and we can avoid the:
> >>      a> deadlock issue existed in v6 patch, between p2m lock and ioreq
> server
> >>         lock by using these locks in the same order - solved in patch 4;
> >>      b> for race condition between vm exit and ioreq server unbinding, we
> can
> >>         just retry this instruction.
> >>    - According to comments from Jan and George: continue to clarify logic
> in
> >>      hvmemul_do_io().
> >>    - According to comments from Jan: clarify comment in
> >> p2m_set_ioreq_server().
> >>
> >> changes in v6:
> >>    - Clarify logic in hvmemul_do_io().
> >>    - Use recursive lock for ioreq server lock.
> >>    - Remove debug print when mapping ioreq server.
> >>    - Clarify code in ept_p2m_type_to_flags() for consistency.
> >>    - Remove definition of P2M_IOREQ_HANDLE_WRITE_ACCESS.
> >>    - Add comments for HVMMEM_ioreq_server to note only changes
> >>      to/from HVMMEM_ram_rw are permitted.
> >>    - Add domain_pause/unpause() in
> hvm_map_mem_type_to_ioreq_server()
> >>      to avoid the race condition when a vm exit happens on a write-
> >>      protected page, just to find the ioreq server has been unmapped
> >>      already.
> >>    - Introduce a seperate patch to delay the release of p2m
> >>      lock to avoid the race condition.
> >>    - Introduce a seperate patch to handle the read-modify-write
> >>      operations on a write protected page.
> >>
> >> changes in v5:
> >>    - Simplify logic in hvmemul_do_io().
> >>    - Use natual width types instead of fixed width types when possible.
> >>    - Do not grant executable permission for p2m_ioreq_server entries.
> >>    - Clarify comments and commit message.
> >>    - Introduce a seperate patch to recalculate the p2m types after
> >>      the ioreq server unmaps the p2m_ioreq_server.
> >>
> >> changes in v4:
> >>    - According to Paul's advice, add comments around the definition
> >>      of HVMMEM_iore_server in hvm_op.h.
> >>    - According to Wei Liu's comments, change the format of the commit
> >>      message.
> >>
> >> changes in v3:
> >>    - Only support write emulation in this patch;
> >>    - Remove the code to handle race condition in hvmemul_do_io(),
> >>    - No need to reset the p2m type after an ioreq server has disclaimed
> >>      its ownership of p2m_ioreq_server;
> >>    - Only allow p2m type change to p2m_ioreq_server after an ioreq
> >>      server has claimed its ownership of p2m_ioreq_server;
> >>    - Only allow p2m type change to p2m_ioreq_server from pages with
> type
> >>      p2m_ram_rw, and vice versa;
> >>    - HVMOP_map_mem_type_to_ioreq_server interface change - use
> uint16,
> >>      instead of enum to specify the memory type;
> >>    - Function prototype change to p2m_get_ioreq_server();
> >>    - Coding style changes;
> >>    - Commit message changes;
> >>    - Add Tim's Acked-by.
> >>
> >> changes in v2:
> >>    - Only support HAP enabled HVMs;
> >>    - Replace p2m_mem_type_changed() with
> p2m_change_entry_type_global()
> >>      to reset the p2m type, when an ioreq server tries to claim/disclaim
> >>      its ownership of p2m_ioreq_server;
> >>    - Comments changes.
> >> ---
> >>   xen/arch/x86/hvm/dm.c            | 37 ++++++++++++++++++--
> >>   xen/arch/x86/hvm/emulate.c       | 65
> >> ++++++++++++++++++++++++++++++++---
> >>   xen/arch/x86/hvm/ioreq.c         | 38 +++++++++++++++++++++
> >>   xen/arch/x86/mm/hap/nested_hap.c |  2 +-
> >>   xen/arch/x86/mm/p2m-ept.c        |  8 ++++-
> >>   xen/arch/x86/mm/p2m-pt.c         | 19 +++++++----
> >>   xen/arch/x86/mm/p2m.c            | 74
> >> ++++++++++++++++++++++++++++++++++++++++
> >>   xen/arch/x86/mm/shadow/multi.c   |  3 +-
> >>   xen/include/asm-x86/hvm/ioreq.h  |  2 ++
> >>   xen/include/asm-x86/p2m.h        | 26 ++++++++++++--
> >>   xen/include/public/hvm/dm_op.h   | 28 +++++++++++++++
> >>   xen/include/public/hvm/hvm_op.h  |  8 ++++-
> >>   12 files changed, 290 insertions(+), 20 deletions(-)
> >>
> >> diff --git a/xen/arch/x86/hvm/dm.c b/xen/arch/x86/hvm/dm.c index
> >> 333c884..3f9484d 100644
> >> --- a/xen/arch/x86/hvm/dm.c
> >> +++ b/xen/arch/x86/hvm/dm.c
> >> @@ -173,9 +173,14 @@ static int modified_memory(struct domain *d,
> >>
> >>   static bool allow_p2m_type_change(p2m_type_t old, p2m_type_t new)
> >> {
> >> +    if ( new == p2m_ioreq_server )
> >> +        return old == p2m_ram_rw;
> >> +
> >> +    if ( old == p2m_ioreq_server )
> >> +        return new == p2m_ram_rw;
> >> +
> >>       return p2m_is_ram(old) ||
> >> -           (p2m_is_hole(old) && new == p2m_mmio_dm) ||
> >> -           (old == p2m_ioreq_server && new == p2m_ram_rw);
> >> +           (p2m_is_hole(old) && new == p2m_mmio_dm);
> >>   }
> >>
> >>   static int set_mem_type(struct domain *d, @@ -202,6 +207,19 @@
> >> static int set_mem_type(struct domain *d,
> >>            unlikely(data->mem_type == HVMMEM_unused) )
> >>           return -EINVAL;
> >>
> >> +    if ( data->mem_type  == HVMMEM_ioreq_server )
> >> +    {
> >> +        unsigned int flags;
> >> +
> >> +        /* HVMMEM_ioreq_server is only supported for HAP enabled hvm.
> */
> >> +        if ( !hap_enabled(d) )
> >> +            return -EOPNOTSUPP;
> >> +
> >> +        /* Do not change to HVMMEM_ioreq_server if no ioreq server
> mapped.
> >> */
> >> +        if ( !p2m_get_ioreq_server(d, &flags) )
> >> +            return -EINVAL;
> >> +    }
> >> +
> >>       while ( iter < data->nr )
> >>       {
> >>           unsigned long pfn = data->first_pfn + iter; @@ -365,6
> >> +383,21 @@ static int dm_op(domid_t domid,
> >>           break;
> >>       }
> >>
> >> +    case XEN_DMOP_map_mem_type_to_ioreq_server:
> >> +    {
> >> +        const struct xen_dm_op_map_mem_type_to_ioreq_server *data =
> >> +            &op.u.map_mem_type_to_ioreq_server;
> >> +
> >> +        rc = -EOPNOTSUPP;
> >> +        /* Only support for HAP enabled hvm. */
> > Isn't it obvious from code?
> 
> Yes. Can be removed.
> >> +        if ( !hap_enabled(d) )
> >> +            break;
> >> +
> >> +        rc = hvm_map_mem_type_to_ioreq_server(d, data->id,
> >> +                                              data->type, data->flags);
> >> +        break;
> >> +    }
> >> +
> >>       case XEN_DMOP_set_ioreq_server_state:
> >>       {
> >>           const struct xen_dm_op_set_ioreq_server_state *data = diff
> >> --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
> index
> >> f36d7c9..37139e6 100644
> >> --- a/xen/arch/x86/hvm/emulate.c
> >> +++ b/xen/arch/x86/hvm/emulate.c
> >> @@ -99,6 +99,7 @@ static int hvmemul_do_io(
> >>       uint8_t dir, bool_t df, bool_t data_is_addr, uintptr_t data)  {
> >>       struct vcpu *curr = current;
> >> +    struct domain *currd = curr->domain;
> >>       struct hvm_vcpu_io *vio = &curr->arch.hvm_vcpu.hvm_io;
> >>       ioreq_t p = {
> >>           .type = is_mmio ? IOREQ_TYPE_COPY : IOREQ_TYPE_PIO, @@
> >> -140,7
> >> +141,7 @@ static int hvmemul_do_io(
> >>                (p.dir != dir) ||
> >>                (p.df != df) ||
> >>                (p.data_is_ptr != data_is_addr) )
> >> -            domain_crash(curr->domain);
> >> +            domain_crash(currd);
> >>
> >>           if ( data_is_addr )
> >>               return X86EMUL_UNHANDLEABLE; @@ -177,8 +178,64 @@
> >> static int hvmemul_do_io(
> >>           break;
> >>       case X86EMUL_UNHANDLEABLE:
> >>       {
> >> -        struct hvm_ioreq_server *s =
> >> -            hvm_select_ioreq_server(curr->domain, &p);
> >> +        /*
> >> +         * Xen isn't emulating the instruction internally, so see if
> >> +         * there's an ioreq server that can handle it. Rules:
> >> +         *
> >> +         * - PIO and "normal" MMIO run through
> >> + hvm_select_ioreq_server()
> > why highlight "normal" here? What would an "abnormal" MMIO mean here -
> > the p2m_ioreq_server type?
> 
> Yes, it's just to differentiate the MMIO and the p2m_ioreq_server address,
> copied from George's previous comments.
> We can remove the "normal" here.

Then you need to add such an explanation; otherwise it's difficult for
other readers of the code to know its meaning.
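Something as small as this would do (wording illustrative only, not a
requirement):

	/*
	 * - PIO and "normal" MMIO (i.e. pages *not* of type
	 *   p2m_ioreq_server, e.g. p2m_mmio_dm or unmapped regions) run
	 *   through hvm_select_ioreq_server() to choose the ioreq server
	 *   by range. If no server is found, the access is ignored.
	 */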

> 
> 
> >> +         * to choose the ioreq server by range. If no server is found,
> >> +         * the access is ignored.
> >> +         *
> >> +         * - p2m_ioreq_server accesses are handled by the designated
> >> +         * ioreq_server for the domain, but there are some corner
> >> +         * cases:
> > since only one case is listed, "there is a corner case"
> 
> Another corner case is in patch 3/5 - handling the read-modify-write
> situations.
> Maybe the correct thing is to use word "case" in this patch and change
> it to "cases" in next patch. :-)

then leave it be. Not a big matter.

> 
> >> +         *
> >> +         *   - If the domain ioreq_server is NULL, assume there is a
> >> +         *   race between the unbinding of ioreq server and guest fault
> >> +         *   so re-try the instruction.
> >> +         */
> >> +        struct hvm_ioreq_server *s = NULL;
> >> +        p2m_type_t p2mt = p2m_invalid;
> >> +
> >> +        if ( is_mmio )
> >> +        {
> >> +            unsigned long gmfn = paddr_to_pfn(addr);
> >> +
> >> +            get_gfn_query_unlocked(currd, gmfn, &p2mt);
> >> +
> >> +            if ( p2mt == p2m_ioreq_server )
> >> +            {
> >> +                unsigned int flags;
> >> +
> >> +                /*
> >> +                 * Value of s could be stale, when we lost a race
> > better to describe it at a higher level, e.g. just "no ioreq server
> > is found".
> >
> > what's the meaning of "lost a race"? shouldn't it mean
> > "likely we suffer from a race with..."?
> >
> >> +                 * with dm_op which unmaps p2m_ioreq_server from the
> >> +                 * ioreq server. Yet there's no cheap way to avoid
> > again, not talking about specific code, focus on the operation,
> > e.g. "race with an unmap operation on the ioreq server"
> >
> >> +                 * this, so device model need to do the check.
> >> +                 */
> > How is above comment related to below line?
> 
> Well, the 's' returned by p2m_get_ioreq_server() can be stale - if the
> ioreq server is unmapped after p2m_get_ioreq_server() returns. The
> current rangeset code also has this issue, if the PIO/MMIO range is
> removed from the rangeset of the ioreq server after
> hvm_select_ioreq_server() returns.
>
> Since using a spinlock or domain_pause/unpause is too heavyweight, we
> suggest the device model side check whether the received ioreq is a
> valid one.
> 
> The above comments were added, according to Jan & Paul's suggestion in
> v7, to let developers know that we do not guarantee the validity of 's'
> returned by p2m_get_ioreq_server()/hvm_select_ioreq_server().
>
> "Value of s could be stale, when we lost a race with..." is not about s
> being NULL, it's about s being invalid. For a NULL return, it is...

Then it makes sense.
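btw regarding "device model side check" - I assume you mean something
like below on the device model side? (purely a hypothetical sketch;
dm_claims_addr()/emulate_access() are illustrative names, not part of
this series)

	static void handle_ioreq(ioreq_t *req)
	{
	    /*
	     * Xen may deliver a stale request when the DM raced with its
	     * own unmap of the p2m_ioreq_server type (or of a PIO/MMIO
	     * range), so validate the address before emulating. The
	     * response is still sent back by the caller as usual, just
	     * as a no-op.
	     */
	    if ( !dm_claims_addr(req->addr) )
	    {
	        if ( req->dir == IOREQ_READ )
	            req->data = ~0UL;
	        return;
	    }

	    emulate_access(req);
	}

If so, it might be worth spelling that expectation out in the public
header comment as well.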

> 
> >
> >> +                s = p2m_get_ioreq_server(currd, &flags);
> >> +
> >> +                /*
> >> +                 * If p2mt is ioreq_server but ioreq_server is NULL,
> > p2mt is definitely ioreq_server within this if condition.
> >
> >> +                 * we probably lost a race with unbinding of ioreq
> >> +                 * server, just retry the access.
> >> +                 */
> > looks redundant with the earlier comment. Or should the earlier one
> > just be removed?
> 
> ... described here, to just retry the access.
> 
> >> +                if ( s == NULL )
> >> +                {
> >> +                    rc = X86EMUL_RETRY;
> >> +                    vio->io_req.state = STATE_IOREQ_NONE;
> >> +                    break;
> >> +                }
> >> +            }
> >> +        }
> >> +
> >> +        /*
> >> +         * Value of s could be stale, when we lost a race with dm_op
> >> +         * which unmaps this PIO/MMIO address from the ioreq server.
> >> +         * The device model side need to do the check.
> >> +         */
> > another duplicated comment. The code below is actually for the
> > 'normal' MMIO case...
> 
> This is for another possible situation, where the 's' returned by
> hvm_select_ioreq_server() becomes stale later, when the PIO/MMIO range
> is removed from the rangeset.
>
> The logic in hvmemul_do_io() has always been a bit mixed - I mean, many
> corner cases and race conditions:
>   - between the mapping/unmapping of PIO/MMIO ranges in the rangeset
>   - between the mapping/unmapping of the ioreq server for
>     p2m_ioreq_server
> I tried to add as many comments as I could as this patchset evolved,
> yet found I just introduced more confusion...
> 
> Any suggestions?

Maybe describe the whole story once, before the whole p2m_ioreq_server
branch?

	/* comment */
	if ( p2mt == p2m_ioreq_server )

Then you don't need to duplicate so much at the individual code lines.
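Something like the below, for instance (only a sketch - exact wording
and placement are up to you):

	/*
	 * p2m_ioreq_server pages are emulated by the single ioreq server
	 * which claimed the type via
	 * XEN_DMOP_map_mem_type_to_ioreq_server. Both lookups below can
	 * race with an unmap of that server:
	 *  - p2m_get_ioreq_server() returning NULL means the type was
	 *    already unclaimed - retry the access, as we likely raced
	 *    with the unbinding of the server;
	 *  - a non-NULL server (from either lookup) may be unmapped
	 *    right afterwards - this is tolerated, and the device model
	 *    is expected to validate the ioreqs it receives.
	 */
	if ( is_mmio )
	{
	    ...
	    if ( p2mt == p2m_ioreq_server )
	    ...
	}

i.e. keep the rationale in one place and leave at most one-line pointers
at the individual checks.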

> 
> >> +        if ( !s )
> >> +            s = hvm_select_ioreq_server(currd, &p);
> >>
> >>           /* If there is no suitable backing DM, just ignore accesses */
> >>           if ( !s )
> >> @@ -189,7 +246,7 @@ static int hvmemul_do_io(
> >>           else
> >>           {
> >>               rc = hvm_send_ioreq(s, &p, 0);
> >> -            if ( rc != X86EMUL_RETRY || curr->domain->is_shutting_down )
> >> +            if ( rc != X86EMUL_RETRY || currd->is_shutting_down )
> >>                   vio->io_req.state = STATE_IOREQ_NONE;
> >>               else if ( data_is_addr )
> >>                   rc = X86EMUL_OKAY;
> >> diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c index
> >> ad2edad..746799f 100644
> >> --- a/xen/arch/x86/hvm/ioreq.c
> >> +++ b/xen/arch/x86/hvm/ioreq.c
> >> @@ -753,6 +753,8 @@ int hvm_destroy_ioreq_server(struct domain *d,
> >> ioservid_t id)
> >>
> >>           domain_pause(d);
> >>
> >> +        p2m_destroy_ioreq_server(d, s);
> >> +
> >>           hvm_ioreq_server_disable(s, 0);
> >>
> >>           list_del(&s->list_entry);
> >> @@ -914,6 +916,42 @@ int
> >> hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id,
> >>       return rc;
> >>   }
> >>
> >> +int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t
> id,
> >> +                                     uint32_t type, uint32_t flags) {
> >> +    struct hvm_ioreq_server *s;
> >> +    int rc;
> >> +
> >> +    /* For now, only HVMMEM_ioreq_server is supported. */
> > obvious comment
> 
> IIRC, this comment (and the one below) is another change made according
> to some review comments, to remind readers that we can add new mem types
> in the future.
> So how about we add a line - "For the future, we can support other mem
> types"?
> 
> But that also sounds redundant to me. :)
> So I am also OK to remove this and below comments.

Or add a general comment for the whole function, indicating that those
checks are extensible in the future.
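e.g. (illustrative wording only):

	/*
	 * Map guest pages with a given memory type to an ioreq server.
	 * For now only HVMMEM_ioreq_server and write emulation
	 * (XEN_DMOP_IOREQ_MEM_ACCESS_WRITE) are supported; the checks
	 * below can be relaxed when further types / access kinds get
	 * added.
	 */
	int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id,
	                                     uint32_t type, uint32_t flags)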

> 
> >> +    if ( type != HVMMEM_ioreq_server )
> >> +        return -EINVAL;
> >> +
> >> +    /* For now, only write emulation is supported. */
> > ditto.
> >
> >> +    if ( flags & ~(XEN_DMOP_IOREQ_MEM_ACCESS_WRITE) )
> >> +        return -EINVAL;
> >> +
> >> +    spin_lock_recursive(&d->arch.hvm_domain.ioreq_server.lock);
> >> +
> >> +    rc = -ENOENT;
> >> +    list_for_each_entry ( s,
> >> +                          &d->arch.hvm_domain.ioreq_server.list,
> >> +                          list_entry )
> >> +    {
> >> +        if ( s == d->arch.hvm_domain.default_ioreq_server )
> >> +            continue;
> > any reason why we cannot let the default server claim this new type?
> 
> Well, my understanding of the default ioreq server is that it exists
> only for legacy qemu and is not even created via the dm op hypercall.
> Current device models (including qemu) are all non-default ioreq servers
> now.

ah, didn't realize it. Thanks for letting me know.
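btw, just to confirm my understanding of the intended usage: a
non-default server would then claim/disclaim the type roughly like
below? (the xendevicemodel_* wrapper name is my assumption here, it is
not something this patch adds)

	#include <stdbool.h>
	#include <xendevicemodel.h>

	/* Claim (map == true) or disclaim (flags == 0) the type for 'id'. */
	static int set_ioreq_mem_type(xendevicemodel_handle *xdm,
	                              domid_t domid, ioservid_t id, bool map)
	{
	    return xendevicemodel_map_mem_type_to_ioreq_server(
	        xdm, domid, id, HVMMEM_ioreq_server,
	        map ? XEN_DMOP_IOREQ_MEM_ACCESS_WRITE : 0);
	}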

> 
> >> +
> >> +        if ( s->id == id )
> >> +        {
> >> +            rc = p2m_set_ioreq_server(d, flags, s);
> >> +            break;
> >> +        }
> >> +    }
> >> +
> >> +    spin_unlock_recursive(&d->arch.hvm_domain.ioreq_server.lock);
> >> +
> >> +    return rc;
> >> +}
> >> +
> >>   int hvm_set_ioreq_server_state(struct domain *d, ioservid_t id,
> >>                                  bool_t enabled)  { diff --git
> >> a/xen/arch/x86/mm/hap/nested_hap.c
> >> b/xen/arch/x86/mm/hap/nested_hap.c
> >> index 162afed..408ea7f 100644
> >> --- a/xen/arch/x86/mm/hap/nested_hap.c
> >> +++ b/xen/arch/x86/mm/hap/nested_hap.c
> >> @@ -172,7 +172,7 @@ nestedhap_walk_L0_p2m(struct p2m_domain
> *p2m,
> >> paddr_t L1_gpa, paddr_t *L0_gpa,
> >>       if ( *p2mt == p2m_mmio_direct )
> >>           goto direct_mmio_out;
> >>       rc = NESTEDHVM_PAGEFAULT_MMIO;
> >> -    if ( *p2mt == p2m_mmio_dm )
> >> +    if ( *p2mt == p2m_mmio_dm || *p2mt == p2m_ioreq_server )
> >>           goto out;
> >>
> >>       rc = NESTEDHVM_PAGEFAULT_L0_ERROR;
> >> diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
> >> index 568944f..cc1eb21 100644
> >> --- a/xen/arch/x86/mm/p2m-ept.c
> >> +++ b/xen/arch/x86/mm/p2m-ept.c
> >> @@ -131,6 +131,13 @@ static void ept_p2m_type_to_flags(struct
> >> p2m_domain *p2m, ept_entry_t *entry,
> >>               entry->r = entry->w = entry->x = 1;
> >>               entry->a = entry->d = !!cpu_has_vmx_ept_ad;
> >>               break;
> >> +        case p2m_ioreq_server:
> >> +            entry->r = 1;
> >> +            entry->w = !(p2m->ioreq.flags &
> >> XEN_DMOP_IOREQ_MEM_ACCESS_WRITE);
> >> +            entry->x = 0;
> >> +            entry->a = !!cpu_has_vmx_ept_ad;
> >> +            entry->d = entry->w && entry->a;
> >> +            break;
> >>           case p2m_mmio_direct:
> >>               entry->r = entry->x = 1;
> >>               entry->w = !rangeset_contains_singleton(mmio_ro_ranges,
> >> @@ -170,7 +177,6 @@ static void ept_p2m_type_to_flags(struct
> >> p2m_domain *p2m, ept_entry_t *entry,
> >>               entry->a = entry->d = !!cpu_has_vmx_ept_ad;
> >>               break;
> >>           case p2m_grant_map_ro:
> >> -        case p2m_ioreq_server:
> >>               entry->r = 1;
> >>               entry->w = entry->x = 0;
> >>               entry->a = !!cpu_has_vmx_ept_ad; diff --git
> >> a/xen/arch/x86/mm/p2m-pt.c b/xen/arch/x86/mm/p2m-pt.c index
> >> 07e2ccd..f6c45ec 100644
> >> --- a/xen/arch/x86/mm/p2m-pt.c
> >> +++ b/xen/arch/x86/mm/p2m-pt.c
> >> @@ -70,7 +70,9 @@ static const unsigned long pgt[] = {
> >>       PGT_l3_page_table
> >>   };
> >>
> >> -static unsigned long p2m_type_to_flags(p2m_type_t t, mfn_t mfn,
> >> +static unsigned long p2m_type_to_flags(const struct p2m_domain *p2m,
> >> +                                       p2m_type_t t,
> >> +                                       mfn_t mfn,
> >>                                          unsigned int level)  {
> >>       unsigned long flags;
> >> @@ -92,8 +94,12 @@ static unsigned long
> p2m_type_to_flags(p2m_type_t t,
> >> mfn_t mfn,
> >>       default:
> >>           return flags | _PAGE_NX_BIT;
> >>       case p2m_grant_map_ro:
> >> -    case p2m_ioreq_server:
> >>           return flags | P2M_BASE_FLAGS | _PAGE_NX_BIT;
> >> +    case p2m_ioreq_server:
> >> +        flags |= P2M_BASE_FLAGS | _PAGE_RW | _PAGE_NX_BIT;
> >> +        if ( p2m->ioreq.flags & XEN_DMOP_IOREQ_MEM_ACCESS_WRITE )
> >> +            return flags & ~_PAGE_RW;
> >> +        return flags;
> >>       case p2m_ram_ro:
> >>       case p2m_ram_logdirty:
> >>       case p2m_ram_shared:
> >> @@ -440,7 +446,8 @@ static int do_recalc(struct p2m_domain *p2m,
> >> unsigned long gfn)
> >>               p2m_type_t p2mt = p2m_is_logdirty_range(p2m, gfn & mask, gfn
> |
> >> ~mask)
> >>                                 ? p2m_ram_logdirty : p2m_ram_rw;
> >>               unsigned long mfn = l1e_get_pfn(e);
> >> -            unsigned long flags = p2m_type_to_flags(p2mt, _mfn(mfn), level);
> >> +            unsigned long flags = p2m_type_to_flags(p2m, p2mt,
> >> +                                                    _mfn(mfn), level);
> >>
> >>               if ( level )
> >>               {
> >> @@ -578,7 +585,7 @@ p2m_pt_set_entry(struct p2m_domain *p2m,
> >> unsigned long gfn, mfn_t mfn,
> >>           ASSERT(!mfn_valid(mfn) || p2mt != p2m_mmio_direct);
> >>           l3e_content = mfn_valid(mfn) || p2m_allows_invalid_mfn(p2mt)
> >>               ? l3e_from_pfn(mfn_x(mfn),
> >> -                           p2m_type_to_flags(p2mt, mfn, 2) | _PAGE_PSE)
> >> +                           p2m_type_to_flags(p2m, p2mt, mfn, 2) |
> >> + _PAGE_PSE)
> >>               : l3e_empty();
> >>           entry_content.l1 = l3e_content.l3;
> >>
> >> @@ -615,7 +622,7 @@ p2m_pt_set_entry(struct p2m_domain *p2m,
> >> unsigned long gfn, mfn_t mfn,
> >>
> >>           if ( mfn_valid(mfn) || p2m_allows_invalid_mfn(p2mt) )
> >>               entry_content = p2m_l1e_from_pfn(mfn_x(mfn),
> >> -                                             p2m_type_to_flags(p2mt, mfn, 0));
> >> +                                         p2m_type_to_flags(p2m, p2mt,
> >> + mfn, 0));
> >>           else
> >>               entry_content = l1e_empty();
> >>
> >> @@ -652,7 +659,7 @@ p2m_pt_set_entry(struct p2m_domain *p2m,
> >> unsigned long gfn, mfn_t mfn,
> >>           ASSERT(!mfn_valid(mfn) || p2mt != p2m_mmio_direct);
> >>           if ( mfn_valid(mfn) || p2m_allows_invalid_mfn(p2mt) )
> >>               l2e_content = l2e_from_pfn(mfn_x(mfn),
> >> -                                       p2m_type_to_flags(p2mt, mfn, 1) |
> >> +                                       p2m_type_to_flags(p2m, p2mt,
> >> + mfn, 1) |
> >>                                          _PAGE_PSE);
> >>           else
> >>               l2e_content = l2e_empty();
> >> diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c index
> >> a5651a3..dd4e477 100644
> >> --- a/xen/arch/x86/mm/p2m.c
> >> +++ b/xen/arch/x86/mm/p2m.c
> >> @@ -82,6 +82,8 @@ static int p2m_initialise(struct domain *d, struct
> >> p2m_domain *p2m)
> >>       else
> >>           p2m_pt_init(p2m);
> >>
> >> +    spin_lock_init(&p2m->ioreq.lock);
> >> +
> >>       return ret;
> >>   }
> >>
> >> @@ -286,6 +288,78 @@ void p2m_memory_type_changed(struct domain
> *d)
> >>       }
> >>   }
> >>
> >> +int p2m_set_ioreq_server(struct domain *d,
> >> +                         unsigned int flags,
> >> +                         struct hvm_ioreq_server *s) {
> >> +    struct p2m_domain *p2m = p2m_get_hostp2m(d);
> >> +    int rc;
> >> +
> >> +    /*
> >> +     * Use lock to prevent concurrent setting attempts
> >> +     * from multiple ioreq serers.
> > serers -> servers
> 
> Got it. Thanks.
> 
> >> +     */
> >> +    spin_lock(&p2m->ioreq.lock);
> >> +
> >> +    /* Unmap ioreq server from p2m type by passing flags with 0. */
> >> +    if ( flags == 0 )
> >> +    {
> >> +        rc = -EINVAL;
> >> +        if ( p2m->ioreq.server != s )
> >> +            goto out;
> >> +
> >> +        p2m->ioreq.server = NULL;
> >> +        p2m->ioreq.flags = 0;
> >> +    }
> >> +    else
> >> +    {
> >> +        rc = -EBUSY;
> >> +        if ( p2m->ioreq.server != NULL )
> >> +            goto out;
> >> +
> >> +        p2m->ioreq.server = s;
> >> +        p2m->ioreq.flags = flags;
> >> +    }
> >> +
> >> +    rc = 0;
> >> +
> >> + out:
> >> +    spin_unlock(&p2m->ioreq.lock);
> >> +
> >> +    return rc;
> >> +}
> >> +
> >> +struct hvm_ioreq_server *p2m_get_ioreq_server(struct domain *d,
> >> +                                              unsigned int *flags) {
> >> +    struct p2m_domain *p2m = p2m_get_hostp2m(d);
> >> +    struct hvm_ioreq_server *s;
> >> +
> >> +    spin_lock(&p2m->ioreq.lock);
> >> +
> >> +    s = p2m->ioreq.server;
> >> +    *flags = p2m->ioreq.flags;
> >> +
> >> +    spin_unlock(&p2m->ioreq.lock);
> >> +    return s;
> >> +}
> >> +
> >> +void p2m_destroy_ioreq_server(const struct domain *d,
> >> +                              const struct hvm_ioreq_server *s) {
> >> +    struct p2m_domain *p2m = p2m_get_hostp2m(d);
> >> +
> >> +    spin_lock(&p2m->ioreq.lock);
> >> +
> >> +    if ( p2m->ioreq.server == s )
> >> +    {
> >> +        p2m->ioreq.server = NULL;
> >> +        p2m->ioreq.flags = 0;
> >> +    }
> >> +
> >> +    spin_unlock(&p2m->ioreq.lock);
> >> +}
> >> +
> >>   void p2m_enable_hardware_log_dirty(struct domain *d)  {
> >>       struct p2m_domain *p2m = p2m_get_hostp2m(d); diff --git
> >> a/xen/arch/x86/mm/shadow/multi.c
> b/xen/arch/x86/mm/shadow/multi.c
> >> index 7ea9d81..521b639 100644
> >> --- a/xen/arch/x86/mm/shadow/multi.c
> >> +++ b/xen/arch/x86/mm/shadow/multi.c
> >> @@ -3269,8 +3269,7 @@ static int sh_page_fault(struct vcpu *v,
> >>       }
> >>
> >>       /* Need to hand off device-model MMIO to the device model */
> >> -    if ( p2mt == p2m_mmio_dm
> >> -         || (p2mt == p2m_ioreq_server && ft == ft_demand_write) )
> >> +    if ( p2mt == p2m_mmio_dm )
> >>       {
> >>           gpa = guest_walk_to_gpa(&gw);
> >>           goto mmio;
> >> diff --git a/xen/include/asm-x86/hvm/ioreq.h b/xen/include/asm-
> >> x86/hvm/ioreq.h index fbf2c74..b43667a 100644
> >> --- a/xen/include/asm-x86/hvm/ioreq.h
> >> +++ b/xen/include/asm-x86/hvm/ioreq.h
> >> @@ -37,6 +37,8 @@ int hvm_map_io_range_to_ioreq_server(struct
> domain
> >> *d, ioservid_t id,  int hvm_unmap_io_range_from_ioreq_server(struct
> >> domain *d, ioservid_t id,
> >>                                            uint32_t type, uint64_t start,
> >>                                            uint64_t end);
> >> +int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t
> id,
> >> +                                     uint32_t type, uint32_t flags);
> >>   int hvm_set_ioreq_server_state(struct domain *d, ioservid_t id,
> >>                                  bool_t enabled);
> >>
> >> diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
> index
> >> 470d29d..3786680 100644
> >> --- a/xen/include/asm-x86/p2m.h
> >> +++ b/xen/include/asm-x86/p2m.h
> >> @@ -89,7 +89,8 @@ typedef unsigned int p2m_query_t;
> >>                          | p2m_to_mask(p2m_ram_paging_out)      \
> >>                          | p2m_to_mask(p2m_ram_paged)           \
> >>                          | p2m_to_mask(p2m_ram_paging_in)       \
> >> -                       | p2m_to_mask(p2m_ram_shared))
> >> +                       | p2m_to_mask(p2m_ram_shared)          \
> >> +                       | p2m_to_mask(p2m_ioreq_server))
> >>
> >>   /* Types that represent a physmap hole that is ok to replace with a
> shared
> >>    * entry */
> >> @@ -111,8 +112,7 @@ typedef unsigned int p2m_query_t;
> >>   #define P2M_RO_TYPES (p2m_to_mask(p2m_ram_logdirty)     \
> >>                         | p2m_to_mask(p2m_ram_ro)         \
> >>                         | p2m_to_mask(p2m_grant_map_ro)   \
> >> -                      | p2m_to_mask(p2m_ram_shared)     \
> >> -                      | p2m_to_mask(p2m_ioreq_server))
> >> +                      | p2m_to_mask(p2m_ram_shared))
> >>
> >>   /* Write-discard types, which should discard the write operations */
> >>   #define P2M_DISCARD_WRITE_TYPES (p2m_to_mask(p2m_ram_ro)     \
> >> @@ -336,6 +336,20 @@ struct p2m_domain {
> >>           struct ept_data ept;
> >>           /* NPT-equivalent structure could be added here. */
> >>       };
> >> +
> >> +     struct {
> >> +         spinlock_t lock;
> >> +         /*
> >> +          * ioreq server who's responsible for the emulation of
> >> +          * gfns with specific p2m type(for now, p2m_ioreq_server).
> >> +          */
> >> +         struct hvm_ioreq_server *server;
> >> +         /*
> >> +          * flags specifies whether read, write or both operations
> >> +          * are to be emulated by an ioreq server.
> >> +          */
> >> +         unsigned int flags;
> >> +     } ioreq;
> >>   };
> >>
> >>   /* get host p2m table */
> >> @@ -827,6 +841,12 @@ static inline unsigned int
> >> p2m_get_iommu_flags(p2m_type_t p2mt, mfn_t mfn)
> >>       return flags;
> >>   }
> >>
> >> +int p2m_set_ioreq_server(struct domain *d, unsigned int flags,
> >> +                         struct hvm_ioreq_server *s); struct
> >> +hvm_ioreq_server *p2m_get_ioreq_server(struct domain *d,
> >> +                                              unsigned int *flags);
> >> +void p2m_destroy_ioreq_server(const struct domain *d, const struct
> >> +hvm_ioreq_server *s);
> >> +
> >>   #endif /* _XEN_ASM_X86_P2M_H */
> >>
> >>   /*
> >> diff --git a/xen/include/public/hvm/dm_op.h
> >> b/xen/include/public/hvm/dm_op.h index f54cece..2a36833 100644
> >> --- a/xen/include/public/hvm/dm_op.h
> >> +++ b/xen/include/public/hvm/dm_op.h
> >> @@ -318,6 +318,32 @@ struct xen_dm_op_inject_msi {
> >>       uint64_aligned_t addr;
> >>   };
> >>
> >> +/*
> >> + * XEN_DMOP_map_mem_type_to_ioreq_server : map or unmap the
> >> IOREQ Server <id>
> >> + *                                      to specific memroy type <type>
> > memroy->memory
> 
> Right. Thanks. :)
> 
> B.R.
> Yu
> >> + *                                      for specific accesses <flags>
> >> + *
> >> + * For now, flags only accept the value of
> >> +XEN_DMOP_IOREQ_MEM_ACCESS_WRITE,
> >> + * which means only write operations are to be forwarded to an ioreq
> >> server.
> >> + * Support for the emulation of read operations can be added when an
> >> +ioreq
> >> + * server has such requirement in future.
> >> + */
> >> +#define XEN_DMOP_map_mem_type_to_ioreq_server 15
> >> +
> >> +struct xen_dm_op_map_mem_type_to_ioreq_server {
> >> +    ioservid_t id;      /* IN - ioreq server id */
> >> +    uint16_t type;      /* IN - memory type */
> >> +    uint32_t flags;     /* IN - types of accesses to be forwarded to the
> >> +                           ioreq server. flags with 0 means to unmap the
> >> +                           ioreq server */
> >> +
> >> +#define XEN_DMOP_IOREQ_MEM_ACCESS_READ (1u << 0) #define
> >> +XEN_DMOP_IOREQ_MEM_ACCESS_WRITE (1u << 1)
> >> +
> >> +    uint64_t opaque;    /* IN/OUT - only used for hypercall continuation,
> >> +                           has to be set to zero by the caller */ };
> >> +
> >>   struct xen_dm_op {
> >>       uint32_t op;
> >>       uint32_t pad;
> >> @@ -336,6 +362,8 @@ struct xen_dm_op {
> >>           struct xen_dm_op_set_mem_type set_mem_type;
> >>           struct xen_dm_op_inject_event inject_event;
> >>           struct xen_dm_op_inject_msi inject_msi;
> >> +        struct xen_dm_op_map_mem_type_to_ioreq_server
> >> +                map_mem_type_to_ioreq_server;
> >>       } u;
> >>   };
> >>
> >> diff --git a/xen/include/public/hvm/hvm_op.h
> >> b/xen/include/public/hvm/hvm_op.h index bc00ef0..0bdafdf 100644
> >> --- a/xen/include/public/hvm/hvm_op.h
> >> +++ b/xen/include/public/hvm/hvm_op.h
> >> @@ -93,7 +93,13 @@ typedef enum {
> >>       HVMMEM_unused,             /* Placeholder; setting memory to this type
> >>                                     will fail for code after 4.7.0 */  #endif
> >> -    HVMMEM_ioreq_server
> >> +    HVMMEM_ioreq_server        /* Memory type claimed by an ioreq
> server;
> >> type
> >> +                                  changes to this value are only allowed after
> >> +                                  an ioreq server has claimed its ownership.
> >> +                                  Only pages with HVMMEM_ram_rw are allowed to
> >> +                                  change to this type; conversely, pages with
> >> +                                  this type are only allowed to be changed back
> >> +                                  to HVMMEM_ram_rw. */
> >>   } hvmmem_type_t;
> >>
> >>   /* Hint from PV drivers for pagetable destruction. */
> >> --
> >> 1.9.1
> >

