* [PATCH v9 0/5] x86/ioreq server: Introduce HVMMEM_ioreq_server mem type.
@ 2017-03-21  2:52 Yu Zhang
  2017-03-21  2:52 ` [PATCH v9 1/5] x86/ioreq server: Release the p2m lock after mmio is handled Yu Zhang
                   ` (4 more replies)
  0 siblings, 5 replies; 42+ messages in thread
From: Yu Zhang @ 2017-03-21  2:52 UTC (permalink / raw)
  To: xen-devel; +Cc: zhiyuan.lv

XenGT leverages the ioreq server to track and forward accesses to GPU
I/O resources, e.g. the PPGTT (per-process graphic translation tables).
Currently, the ioreq server uses a rangeset to track the BDF/PIO/MMIO
ranges to be emulated. To select an ioreq server, the rangeset is
searched to see whether the I/O range is recorded. However, the number
of RAM pages to be tracked may exceed the upper limit of the rangeset.

Previously, a solution was proposed to refactor the rangeset and extend
its upper limit. However, after 12 rounds of discussion, we have decided
to drop this approach due to security concerns. This new patch series
instead introduces a new memory type, HVMMEM_ioreq_server, and adds HVM
operations to let an ioreq server claim ownership of RAM pages of this
type. Accesses to a page of this type will be handled by the designated
ioreq server directly.
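
As an illustration, a device model would use the new type roughly as
below. This is only a sketch: the xendevicemodel_* wrapper names and
signatures are assumed tools-side support, not something introduced by
this series.

    #include <stdint.h>
    #include <xendevicemodel.h>      /* wrapper API assumed available */

    /*
     * Sketch: claim the HVMMEM_ioreq_server type for write emulation
     * and tag one guest page (e.g. a PPGTT page) with it.
     */
    static int claim_and_tag(domid_t domid, ioservid_t id, uint64_t gfn)
    {
        xendevicemodel_handle *dmod = xendevicemodel_open(NULL, 0);
        int rc;

        if ( !dmod )
            return -1;

        /* Claim ownership of the type; only write accesses are forwarded. */
        rc = xendevicemodel_map_mem_type_to_ioreq_server(
                 dmod, domid, id, HVMMEM_ioreq_server,
                 XEN_DMOP_IOREQ_MEM_ACCESS_WRITE);

        /* Mark the page so guest writes to it reach this ioreq server. */
        if ( rc == 0 )
            rc = xendevicemodel_set_mem_type(dmod, domid,
                                             HVMMEM_ioreq_server, gfn, 1);

        xendevicemodel_close(dmod);
        return rc;
    }

Issuing the same map call later with flags set to 0 disclaims the
ownership again (see patch 2).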

Yu Zhang (5):
  x86/ioreq server: Release the p2m lock after mmio is handled.
  x86/ioreq server: Add DMOP to map guest ram with p2m_ioreq_server to
    an ioreq server.
  x86/ioreq server: Handle read-modify-write cases for p2m_ioreq_server
    pages.
  x86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server
    entries.
  x86/ioreq server: Synchronously reset outstanding p2m_ioreq_server
    entries when an ioreq server unmaps.

 xen/arch/x86/hvm/dm.c            |  72 ++++++++++++++++++++++-
 xen/arch/x86/hvm/emulate.c       |  99 +++++++++++++++++++++++++++++--
 xen/arch/x86/hvm/hvm.c           |   7 +--
 xen/arch/x86/hvm/ioreq.c         |  46 +++++++++++++++
 xen/arch/x86/mm/hap/hap.c        |   9 +++
 xen/arch/x86/mm/hap/nested_hap.c |   2 +-
 xen/arch/x86/mm/p2m-ept.c        |  16 ++++-
 xen/arch/x86/mm/p2m-pt.c         |  32 +++++++---
 xen/arch/x86/mm/p2m.c            | 123 +++++++++++++++++++++++++++++++++++++++
 xen/arch/x86/mm/shadow/multi.c   |   3 +-
 xen/include/asm-x86/hvm/ioreq.h  |   2 +
 xen/include/asm-x86/p2m.h        |  42 +++++++++++--
 xen/include/public/hvm/dm_op.h   |  28 +++++++++
 xen/include/public/hvm/hvm_op.h  |   8 ++-
 14 files changed, 459 insertions(+), 30 deletions(-)

-- 
1.9.1




* [PATCH v9 1/5] x86/ioreq server: Release the p2m lock after mmio is handled.
  2017-03-21  2:52 [PATCH v9 0/5] x86/ioreq server: Introduce HVMMEM_ioreq_server mem type Yu Zhang
@ 2017-03-21  2:52 ` Yu Zhang
  2017-03-29 13:39   ` George Dunlap
  2017-03-21  2:52 ` [PATCH v9 2/5] x86/ioreq server: Add DMOP to map guest ram with p2m_ioreq_server to an ioreq server Yu Zhang
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 42+ messages in thread
From: Yu Zhang @ 2017-03-21  2:52 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Paul Durrant, zhiyuan.lv, Jan Beulich

Routine hvmemul_do_io() may need to peek at the p2m type of a gfn to
select the ioreq server. For example, operations on gfns with the
p2m_ioreq_server type will be delivered to the corresponding ioreq
server, and this requires that the p2m type not be switched back to
p2m_ram_rw during the emulation process. To avoid this race condition,
we delay the release of the p2m lock in hvm_hap_nested_page_fault()
until the mmio is handled.

Note: previously in hvm_hap_nested_page_fault(), put_gfn() was moved
before the handling of mmio, due to a deadlock risk between the p2m
lock and the event lock (see commit 77b8dfe). Later, a per-event-channel
lock was introduced in commit de6acb7 to send events, so we no longer
need to worry about that deadlock.
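
In other words, the change is purely one of ordering inside
hvm_hap_nested_page_fault(); a simplified sketch of the two variants
(not the literal code):

    /*
     * Before: the p2m lock is dropped before emulation, so the type of
     * the gfn can be switched away from p2m_ioreq_server while
     * handle_mmio_with_translation() is still running.
     */
    __put_gfn(p2m, gfn);
    handle_mmio_with_translation(gla, gpa >> PAGE_SHIFT, npfec);

    /*
     * After: emulate first, then release the gfn via the common
     * out_put_gfn path, so the p2m type stays stable during emulation.
     */
    handle_mmio_with_translation(gla, gpa >> PAGE_SHIFT, npfec);
    goto out_put_gfn;    /* __put_gfn() is done there */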

Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
Cc: Paul Durrant <paul.durrant@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>

changes in v4: 
  - According to comments from Jan: remove the redundant "rc = 0" code.
---
 xen/arch/x86/hvm/hvm.c | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 0282986..bd18d8e 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -1815,15 +1815,10 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long gla,
          (npfec.write_access &&
           (p2m_is_discard_write(p2mt) || (p2mt == p2m_ioreq_server))) )
     {
-        __put_gfn(p2m, gfn);
-        if ( ap2m_active )
-            __put_gfn(hostp2m, gfn);
-
-        rc = 0;
         if ( !handle_mmio_with_translation(gla, gpa >> PAGE_SHIFT, npfec) )
             hvm_inject_hw_exception(TRAP_gp_fault, 0);
         rc = 1;
-        goto out;
+        goto out_put_gfn;
     }
 
     /* Check if the page has been paged out */
-- 
1.9.1




* [PATCH v9 2/5] x86/ioreq server: Add DMOP to map guest ram with p2m_ioreq_server to an ioreq server.
  2017-03-21  2:52 [PATCH v9 0/5] x86/ioreq server: Introduce HVMMEM_ioreq_server mem type Yu Zhang
  2017-03-21  2:52 ` [PATCH v9 1/5] x86/ioreq server: Release the p2m lock after mmio is handled Yu Zhang
@ 2017-03-21  2:52 ` Yu Zhang
  2017-03-22  7:49   ` Tian, Kevin
  2017-03-22 14:21   ` Jan Beulich
  2017-03-21  2:52 ` [PATCH v9 3/5] x86/ioreq server: Handle read-modify-write cases for p2m_ioreq_server pages Yu Zhang
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 42+ messages in thread
From: Yu Zhang @ 2017-03-21  2:52 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Jun Nakajima, George Dunlap, Andrew Cooper,
	Tim Deegan, Paul Durrant, zhiyuan.lv, Jan Beulich

A new DMOP, XEN_DMOP_map_mem_type_to_ioreq_server, is added to let an
ioreq server claim/disclaim its responsibility for the handling of
guest pages with the p2m type p2m_ioreq_server. Users of this DMOP can
specify which kind of operation is to be emulated in a parameter named
flags. Currently, this DMOP only supports the emulation of write
operations; it can be further extended to support the emulation of
reads if an ioreq server has such a requirement in the future.

For now, we only support one ioreq server for this p2m type, so once
an ioreq server has claimed its ownership, subsequent calls of
XEN_DMOP_map_mem_type_to_ioreq_server will fail. Users can also
disclaim the ownership of guest RAM pages with p2m_ioreq_server by
triggering this new DMOP with the ioreq server id set to the current
owner's and the flags parameter set to 0.

Note that both XEN_DMOP_map_mem_type_to_ioreq_server and
p2m_ioreq_server are only supported for HVM guests with HAP enabled.

Also note that a p2m type change to p2m_ioreq_server is only allowed
after an ioreq server has claimed its ownership of p2m_ioreq_server.
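
A minimal sketch of how a caller fills in the new sub-op follows; the
dm_op buffer plumbing (e.g. in libxendevicemodel) is assumed and
elided, and the header paths are illustrative only.

    #include <string.h>
    #include <stdbool.h>
    #include <xen/hvm/dm_op.h>
    #include <xen/hvm/hvm_op.h>

    /* Prepare XEN_DMOP_map_mem_type_to_ioreq_server; claim == false unmaps. */
    static void prepare_map_op(struct xen_dm_op *op, ioservid_t id, bool claim)
    {
        memset(op, 0, sizeof(*op));
        op->op = XEN_DMOP_map_mem_type_to_ioreq_server;
        op->u.map_mem_type_to_ioreq_server.id = id;
        op->u.map_mem_type_to_ioreq_server.type = HVMMEM_ioreq_server;
        /* Only write emulation is supported for now; 0 disclaims ownership. */
        op->u.map_mem_type_to_ioreq_server.flags =
            claim ? XEN_DMOP_IOREQ_MEM_ACCESS_WRITE : 0;
        /* opaque is only used internally for hypercall continuation. */
        op->u.map_mem_type_to_ioreq_server.opaque = 0;
    }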

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
Acked-by: Tim Deegan <tim@xen.org>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Paul Durrant <paul.durrant@citrix.com>
Cc: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Jun Nakajima <jun.nakajima@intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Tim Deegan <tim@xen.org>

changes in v8:
  - According to comments from Jan & Paul: comments changes in hvmemul_do_io().
  - According to comments from Jan: remove the redundant code which would only
    be useful for read emulations.
  - According to comments from Jan: change interface which maps mem type to
    ioreq server, removed uint16_t pad and added an uint64_t opaque.
  - Address other comments from Jan, i.e. correct return values; remove stray
    cast.

changes in v7:
  - Use new ioreq server interface - XEN_DMOP_map_mem_type_to_ioreq_server.
  - According to comments from George: removed domain_pause/unpause() in
    hvm_map_mem_type_to_ioreq_server(), because it's too expensive,
    and we can avoid the:
    a> deadlock issue existed in v6 patch, between p2m lock and ioreq server
       lock by using these locks in the same order - solved in patch 4;
    b> for race condition between vm exit and ioreq server unbinding, we can
       just retry this instruction.
  - According to comments from Jan and George: continue to clarify logic in
    hvmemul_do_io().
  - According to comments from Jan: clarify comment in p2m_set_ioreq_server().

changes in v6:
  - Clarify logic in hvmemul_do_io().
  - Use recursive lock for ioreq server lock.
  - Remove debug print when mapping ioreq server.
  - Clarify code in ept_p2m_type_to_flags() for consistency.
  - Remove definition of P2M_IOREQ_HANDLE_WRITE_ACCESS.
  - Add comments for HVMMEM_ioreq_server to note only changes
    to/from HVMMEM_ram_rw are permitted.
  - Add domain_pause/unpause() in hvm_map_mem_type_to_ioreq_server()
    to avoid the race condition when a vm exit happens on a write-
    protected page, just to find the ioreq server has been unmapped
    already.
  - Introduce a separate patch to delay the release of the p2m
    lock to avoid the race condition.
  - Introduce a separate patch to handle read-modify-write
    operations on a write-protected page.

changes in v5:
  - Simplify logic in hvmemul_do_io().
  - Use natural width types instead of fixed width types when possible.
  - Do not grant executable permission for p2m_ioreq_server entries.
  - Clarify comments and commit message.
  - Introduce a separate patch to recalculate the p2m types after
    the ioreq server unmaps the p2m_ioreq_server.

changes in v4:
  - According to Paul's advice, add comments around the definition
    of HVMMEM_ioreq_server in hvm_op.h.
  - According to Wei Liu's comments, change the format of the commit
    message.

changes in v3:
  - Only support write emulation in this patch;
  - Remove the code to handle race condition in hvmemul_do_io(),
  - No need to reset the p2m type after an ioreq server has disclaimed
    its ownership of p2m_ioreq_server;
  - Only allow p2m type change to p2m_ioreq_server after an ioreq
    server has claimed its ownership of p2m_ioreq_server;
  - Only allow p2m type change to p2m_ioreq_server from pages with type
    p2m_ram_rw, and vice versa;
  - HVMOP_map_mem_type_to_ioreq_server interface change - use uint16,
    instead of enum to specify the memory type;
  - Function prototype change to p2m_get_ioreq_server();
  - Coding style changes;
  - Commit message changes;
  - Add Tim's Acked-by.

changes in v2:
  - Only support HAP enabled HVMs;
  - Replace p2m_mem_type_changed() with p2m_change_entry_type_global()
    to reset the p2m type, when an ioreq server tries to claim/disclaim
    its ownership of p2m_ioreq_server;
  - Comments changes.
---
 xen/arch/x86/hvm/dm.c            | 37 ++++++++++++++++++--
 xen/arch/x86/hvm/emulate.c       | 65 ++++++++++++++++++++++++++++++++---
 xen/arch/x86/hvm/ioreq.c         | 38 +++++++++++++++++++++
 xen/arch/x86/mm/hap/nested_hap.c |  2 +-
 xen/arch/x86/mm/p2m-ept.c        |  8 ++++-
 xen/arch/x86/mm/p2m-pt.c         | 19 +++++++----
 xen/arch/x86/mm/p2m.c            | 74 ++++++++++++++++++++++++++++++++++++++++
 xen/arch/x86/mm/shadow/multi.c   |  3 +-
 xen/include/asm-x86/hvm/ioreq.h  |  2 ++
 xen/include/asm-x86/p2m.h        | 26 ++++++++++++--
 xen/include/public/hvm/dm_op.h   | 28 +++++++++++++++
 xen/include/public/hvm/hvm_op.h  |  8 ++++-
 12 files changed, 290 insertions(+), 20 deletions(-)

diff --git a/xen/arch/x86/hvm/dm.c b/xen/arch/x86/hvm/dm.c
index 333c884..3f9484d 100644
--- a/xen/arch/x86/hvm/dm.c
+++ b/xen/arch/x86/hvm/dm.c
@@ -173,9 +173,14 @@ static int modified_memory(struct domain *d,
 
 static bool allow_p2m_type_change(p2m_type_t old, p2m_type_t new)
 {
+    if ( new == p2m_ioreq_server )
+        return old == p2m_ram_rw;
+
+    if ( old == p2m_ioreq_server )
+        return new == p2m_ram_rw;
+
     return p2m_is_ram(old) ||
-           (p2m_is_hole(old) && new == p2m_mmio_dm) ||
-           (old == p2m_ioreq_server && new == p2m_ram_rw);
+           (p2m_is_hole(old) && new == p2m_mmio_dm);
 }
 
 static int set_mem_type(struct domain *d,
@@ -202,6 +207,19 @@ static int set_mem_type(struct domain *d,
          unlikely(data->mem_type == HVMMEM_unused) )
         return -EINVAL;
 
+    if ( data->mem_type  == HVMMEM_ioreq_server )
+    {
+        unsigned int flags;
+
+        /* HVMMEM_ioreq_server is only supported for HAP enabled hvm. */
+        if ( !hap_enabled(d) )
+            return -EOPNOTSUPP;
+
+        /* Do not change to HVMMEM_ioreq_server if no ioreq server mapped. */
+        if ( !p2m_get_ioreq_server(d, &flags) )
+            return -EINVAL;
+    }
+
     while ( iter < data->nr )
     {
         unsigned long pfn = data->first_pfn + iter;
@@ -365,6 +383,21 @@ static int dm_op(domid_t domid,
         break;
     }
 
+    case XEN_DMOP_map_mem_type_to_ioreq_server:
+    {
+        const struct xen_dm_op_map_mem_type_to_ioreq_server *data =
+            &op.u.map_mem_type_to_ioreq_server;
+
+        rc = -EOPNOTSUPP;
+        /* Only support for HAP enabled hvm. */
+        if ( !hap_enabled(d) )
+            break;
+
+        rc = hvm_map_mem_type_to_ioreq_server(d, data->id,
+                                              data->type, data->flags);
+        break;
+    }
+
     case XEN_DMOP_set_ioreq_server_state:
     {
         const struct xen_dm_op_set_ioreq_server_state *data =
diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index f36d7c9..37139e6 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -99,6 +99,7 @@ static int hvmemul_do_io(
     uint8_t dir, bool_t df, bool_t data_is_addr, uintptr_t data)
 {
     struct vcpu *curr = current;
+    struct domain *currd = curr->domain;
     struct hvm_vcpu_io *vio = &curr->arch.hvm_vcpu.hvm_io;
     ioreq_t p = {
         .type = is_mmio ? IOREQ_TYPE_COPY : IOREQ_TYPE_PIO,
@@ -140,7 +141,7 @@ static int hvmemul_do_io(
              (p.dir != dir) ||
              (p.df != df) ||
              (p.data_is_ptr != data_is_addr) )
-            domain_crash(curr->domain);
+            domain_crash(currd);
 
         if ( data_is_addr )
             return X86EMUL_UNHANDLEABLE;
@@ -177,8 +178,64 @@ static int hvmemul_do_io(
         break;
     case X86EMUL_UNHANDLEABLE:
     {
-        struct hvm_ioreq_server *s =
-            hvm_select_ioreq_server(curr->domain, &p);
+        /*
+         * Xen isn't emulating the instruction internally, so see if
+         * there's an ioreq server that can handle it. Rules:
+         *
+         * - PIO and "normal" MMIO run through hvm_select_ioreq_server()
+         * to choose the ioreq server by range. If no server is found,
+         * the access is ignored.
+         *
+         * - p2m_ioreq_server accesses are handled by the designated
+         * ioreq_server for the domain, but there are some corner
+         * cases:
+         *
+         *   - If the domain ioreq_server is NULL, assume there is a
+         *   race between the unbinding of ioreq server and guest fault
+         *   so re-try the instruction.
+         */
+        struct hvm_ioreq_server *s = NULL;
+        p2m_type_t p2mt = p2m_invalid;
+
+        if ( is_mmio )
+        {
+            unsigned long gmfn = paddr_to_pfn(addr);
+
+            get_gfn_query_unlocked(currd, gmfn, &p2mt);
+
+            if ( p2mt == p2m_ioreq_server )
+            {
+                unsigned int flags;
+
+                /*
+                 * Value of s could be stale, when we lost a race
+                 * with dm_op which unmaps p2m_ioreq_server from the
+                 * ioreq server. Yet there's no cheap way to avoid
+                 * this, so the device model needs to do the check.
+                 */
+                s = p2m_get_ioreq_server(currd, &flags);
+
+                /*
+                 * If p2mt is ioreq_server but ioreq_server is NULL,
+                 * we probably lost a race with unbinding of ioreq
+                 * server, just retry the access.
+                 */
+                if ( s == NULL )
+                {
+                    rc = X86EMUL_RETRY;
+                    vio->io_req.state = STATE_IOREQ_NONE;
+                    break;
+                }
+            }
+        }
+
+        /*
+         * Value of s could be stale, when we lost a race with dm_op
+         * which unmaps this PIO/MMIO address from the ioreq server.
+         * The device model side needs to do the check.
+         */
+        if ( !s )
+            s = hvm_select_ioreq_server(currd, &p);
 
         /* If there is no suitable backing DM, just ignore accesses */
         if ( !s )
@@ -189,7 +246,7 @@ static int hvmemul_do_io(
         else
         {
             rc = hvm_send_ioreq(s, &p, 0);
-            if ( rc != X86EMUL_RETRY || curr->domain->is_shutting_down )
+            if ( rc != X86EMUL_RETRY || currd->is_shutting_down )
                 vio->io_req.state = STATE_IOREQ_NONE;
             else if ( data_is_addr )
                 rc = X86EMUL_OKAY;
diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c
index ad2edad..746799f 100644
--- a/xen/arch/x86/hvm/ioreq.c
+++ b/xen/arch/x86/hvm/ioreq.c
@@ -753,6 +753,8 @@ int hvm_destroy_ioreq_server(struct domain *d, ioservid_t id)
 
         domain_pause(d);
 
+        p2m_destroy_ioreq_server(d, s);
+
         hvm_ioreq_server_disable(s, 0);
 
         list_del(&s->list_entry);
@@ -914,6 +916,42 @@ int hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id,
     return rc;
 }
 
+int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id,
+                                     uint32_t type, uint32_t flags)
+{
+    struct hvm_ioreq_server *s;
+    int rc;
+
+    /* For now, only HVMMEM_ioreq_server is supported. */
+    if ( type != HVMMEM_ioreq_server )
+        return -EINVAL;
+
+    /* For now, only write emulation is supported. */
+    if ( flags & ~(XEN_DMOP_IOREQ_MEM_ACCESS_WRITE) )
+        return -EINVAL;
+
+    spin_lock_recursive(&d->arch.hvm_domain.ioreq_server.lock);
+
+    rc = -ENOENT;
+    list_for_each_entry ( s,
+                          &d->arch.hvm_domain.ioreq_server.list,
+                          list_entry )
+    {
+        if ( s == d->arch.hvm_domain.default_ioreq_server )
+            continue;
+
+        if ( s->id == id )
+        {
+            rc = p2m_set_ioreq_server(d, flags, s);
+            break;
+        }
+    }
+
+    spin_unlock_recursive(&d->arch.hvm_domain.ioreq_server.lock);
+
+    return rc;
+}
+
 int hvm_set_ioreq_server_state(struct domain *d, ioservid_t id,
                                bool_t enabled)
 {
diff --git a/xen/arch/x86/mm/hap/nested_hap.c b/xen/arch/x86/mm/hap/nested_hap.c
index 162afed..408ea7f 100644
--- a/xen/arch/x86/mm/hap/nested_hap.c
+++ b/xen/arch/x86/mm/hap/nested_hap.c
@@ -172,7 +172,7 @@ nestedhap_walk_L0_p2m(struct p2m_domain *p2m, paddr_t L1_gpa, paddr_t *L0_gpa,
     if ( *p2mt == p2m_mmio_direct )
         goto direct_mmio_out;
     rc = NESTEDHVM_PAGEFAULT_MMIO;
-    if ( *p2mt == p2m_mmio_dm )
+    if ( *p2mt == p2m_mmio_dm || *p2mt == p2m_ioreq_server )
         goto out;
 
     rc = NESTEDHVM_PAGEFAULT_L0_ERROR;
diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
index 568944f..cc1eb21 100644
--- a/xen/arch/x86/mm/p2m-ept.c
+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -131,6 +131,13 @@ static void ept_p2m_type_to_flags(struct p2m_domain *p2m, ept_entry_t *entry,
             entry->r = entry->w = entry->x = 1;
             entry->a = entry->d = !!cpu_has_vmx_ept_ad;
             break;
+        case p2m_ioreq_server:
+            entry->r = 1;
+            entry->w = !(p2m->ioreq.flags & XEN_DMOP_IOREQ_MEM_ACCESS_WRITE);
+            entry->x = 0;
+            entry->a = !!cpu_has_vmx_ept_ad;
+            entry->d = entry->w && entry->a;
+            break;
         case p2m_mmio_direct:
             entry->r = entry->x = 1;
             entry->w = !rangeset_contains_singleton(mmio_ro_ranges,
@@ -170,7 +177,6 @@ static void ept_p2m_type_to_flags(struct p2m_domain *p2m, ept_entry_t *entry,
             entry->a = entry->d = !!cpu_has_vmx_ept_ad;
             break;
         case p2m_grant_map_ro:
-        case p2m_ioreq_server:
             entry->r = 1;
             entry->w = entry->x = 0;
             entry->a = !!cpu_has_vmx_ept_ad;
diff --git a/xen/arch/x86/mm/p2m-pt.c b/xen/arch/x86/mm/p2m-pt.c
index 07e2ccd..f6c45ec 100644
--- a/xen/arch/x86/mm/p2m-pt.c
+++ b/xen/arch/x86/mm/p2m-pt.c
@@ -70,7 +70,9 @@ static const unsigned long pgt[] = {
     PGT_l3_page_table
 };
 
-static unsigned long p2m_type_to_flags(p2m_type_t t, mfn_t mfn,
+static unsigned long p2m_type_to_flags(const struct p2m_domain *p2m,
+                                       p2m_type_t t,
+                                       mfn_t mfn,
                                        unsigned int level)
 {
     unsigned long flags;
@@ -92,8 +94,12 @@ static unsigned long p2m_type_to_flags(p2m_type_t t, mfn_t mfn,
     default:
         return flags | _PAGE_NX_BIT;
     case p2m_grant_map_ro:
-    case p2m_ioreq_server:
         return flags | P2M_BASE_FLAGS | _PAGE_NX_BIT;
+    case p2m_ioreq_server:
+        flags |= P2M_BASE_FLAGS | _PAGE_RW | _PAGE_NX_BIT;
+        if ( p2m->ioreq.flags & XEN_DMOP_IOREQ_MEM_ACCESS_WRITE )
+            return flags & ~_PAGE_RW;
+        return flags;
     case p2m_ram_ro:
     case p2m_ram_logdirty:
     case p2m_ram_shared:
@@ -440,7 +446,8 @@ static int do_recalc(struct p2m_domain *p2m, unsigned long gfn)
             p2m_type_t p2mt = p2m_is_logdirty_range(p2m, gfn & mask, gfn | ~mask)
                               ? p2m_ram_logdirty : p2m_ram_rw;
             unsigned long mfn = l1e_get_pfn(e);
-            unsigned long flags = p2m_type_to_flags(p2mt, _mfn(mfn), level);
+            unsigned long flags = p2m_type_to_flags(p2m, p2mt,
+                                                    _mfn(mfn), level);
 
             if ( level )
             {
@@ -578,7 +585,7 @@ p2m_pt_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
         ASSERT(!mfn_valid(mfn) || p2mt != p2m_mmio_direct);
         l3e_content = mfn_valid(mfn) || p2m_allows_invalid_mfn(p2mt)
             ? l3e_from_pfn(mfn_x(mfn),
-                           p2m_type_to_flags(p2mt, mfn, 2) | _PAGE_PSE)
+                           p2m_type_to_flags(p2m, p2mt, mfn, 2) | _PAGE_PSE)
             : l3e_empty();
         entry_content.l1 = l3e_content.l3;
 
@@ -615,7 +622,7 @@ p2m_pt_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
 
         if ( mfn_valid(mfn) || p2m_allows_invalid_mfn(p2mt) )
             entry_content = p2m_l1e_from_pfn(mfn_x(mfn),
-                                             p2m_type_to_flags(p2mt, mfn, 0));
+                                         p2m_type_to_flags(p2m, p2mt, mfn, 0));
         else
             entry_content = l1e_empty();
 
@@ -652,7 +659,7 @@ p2m_pt_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
         ASSERT(!mfn_valid(mfn) || p2mt != p2m_mmio_direct);
         if ( mfn_valid(mfn) || p2m_allows_invalid_mfn(p2mt) )
             l2e_content = l2e_from_pfn(mfn_x(mfn),
-                                       p2m_type_to_flags(p2mt, mfn, 1) |
+                                       p2m_type_to_flags(p2m, p2mt, mfn, 1) |
                                        _PAGE_PSE);
         else
             l2e_content = l2e_empty();
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index a5651a3..dd4e477 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -82,6 +82,8 @@ static int p2m_initialise(struct domain *d, struct p2m_domain *p2m)
     else
         p2m_pt_init(p2m);
 
+    spin_lock_init(&p2m->ioreq.lock);
+
     return ret;
 }
 
@@ -286,6 +288,78 @@ void p2m_memory_type_changed(struct domain *d)
     }
 }
 
+int p2m_set_ioreq_server(struct domain *d,
+                         unsigned int flags,
+                         struct hvm_ioreq_server *s)
+{
+    struct p2m_domain *p2m = p2m_get_hostp2m(d);
+    int rc;
+
+    /*
+     * Use lock to prevent concurrent setting attempts
+     * from multiple ioreq servers.
+     */
+    spin_lock(&p2m->ioreq.lock);
+
+    /* Unmap ioreq server from p2m type by passing flags with 0. */
+    if ( flags == 0 )
+    {
+        rc = -EINVAL;
+        if ( p2m->ioreq.server != s )
+            goto out;
+
+        p2m->ioreq.server = NULL;
+        p2m->ioreq.flags = 0;
+    }
+    else
+    {
+        rc = -EBUSY;
+        if ( p2m->ioreq.server != NULL )
+            goto out;
+
+        p2m->ioreq.server = s;
+        p2m->ioreq.flags = flags;
+    }
+
+    rc = 0;
+
+ out:
+    spin_unlock(&p2m->ioreq.lock);
+
+    return rc;
+}
+
+struct hvm_ioreq_server *p2m_get_ioreq_server(struct domain *d,
+                                              unsigned int *flags)
+{
+    struct p2m_domain *p2m = p2m_get_hostp2m(d);
+    struct hvm_ioreq_server *s;
+
+    spin_lock(&p2m->ioreq.lock);
+
+    s = p2m->ioreq.server;
+    *flags = p2m->ioreq.flags;
+
+    spin_unlock(&p2m->ioreq.lock);
+    return s;
+}
+
+void p2m_destroy_ioreq_server(const struct domain *d,
+                              const struct hvm_ioreq_server *s)
+{
+    struct p2m_domain *p2m = p2m_get_hostp2m(d);
+
+    spin_lock(&p2m->ioreq.lock);
+
+    if ( p2m->ioreq.server == s )
+    {
+        p2m->ioreq.server = NULL;
+        p2m->ioreq.flags = 0;
+    }
+
+    spin_unlock(&p2m->ioreq.lock);
+}
+
 void p2m_enable_hardware_log_dirty(struct domain *d)
 {
     struct p2m_domain *p2m = p2m_get_hostp2m(d);
diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c
index 7ea9d81..521b639 100644
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -3269,8 +3269,7 @@ static int sh_page_fault(struct vcpu *v,
     }
 
     /* Need to hand off device-model MMIO to the device model */
-    if ( p2mt == p2m_mmio_dm
-         || (p2mt == p2m_ioreq_server && ft == ft_demand_write) )
+    if ( p2mt == p2m_mmio_dm )
     {
         gpa = guest_walk_to_gpa(&gw);
         goto mmio;
diff --git a/xen/include/asm-x86/hvm/ioreq.h b/xen/include/asm-x86/hvm/ioreq.h
index fbf2c74..b43667a 100644
--- a/xen/include/asm-x86/hvm/ioreq.h
+++ b/xen/include/asm-x86/hvm/ioreq.h
@@ -37,6 +37,8 @@ int hvm_map_io_range_to_ioreq_server(struct domain *d, ioservid_t id,
 int hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id,
                                          uint32_t type, uint64_t start,
                                          uint64_t end);
+int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id,
+                                     uint32_t type, uint32_t flags);
 int hvm_set_ioreq_server_state(struct domain *d, ioservid_t id,
                                bool_t enabled);
 
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index 470d29d..3786680 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -89,7 +89,8 @@ typedef unsigned int p2m_query_t;
                        | p2m_to_mask(p2m_ram_paging_out)      \
                        | p2m_to_mask(p2m_ram_paged)           \
                        | p2m_to_mask(p2m_ram_paging_in)       \
-                       | p2m_to_mask(p2m_ram_shared))
+                       | p2m_to_mask(p2m_ram_shared)          \
+                       | p2m_to_mask(p2m_ioreq_server))
 
 /* Types that represent a physmap hole that is ok to replace with a shared
  * entry */
@@ -111,8 +112,7 @@ typedef unsigned int p2m_query_t;
 #define P2M_RO_TYPES (p2m_to_mask(p2m_ram_logdirty)     \
                       | p2m_to_mask(p2m_ram_ro)         \
                       | p2m_to_mask(p2m_grant_map_ro)   \
-                      | p2m_to_mask(p2m_ram_shared)     \
-                      | p2m_to_mask(p2m_ioreq_server))
+                      | p2m_to_mask(p2m_ram_shared))
 
 /* Write-discard types, which should discard the write operations */
 #define P2M_DISCARD_WRITE_TYPES (p2m_to_mask(p2m_ram_ro)     \
@@ -336,6 +336,20 @@ struct p2m_domain {
         struct ept_data ept;
         /* NPT-equivalent structure could be added here. */
     };
+
+     struct {
+         spinlock_t lock;
+         /*
+          * ioreq server that is responsible for the emulation of
+          * gfns with a specific p2m type (for now, p2m_ioreq_server).
+          */
+         struct hvm_ioreq_server *server;
+         /*
+          * flags specifies whether read, write or both operations
+          * are to be emulated by an ioreq server.
+          */
+         unsigned int flags;
+     } ioreq;
 };
 
 /* get host p2m table */
@@ -827,6 +841,12 @@ static inline unsigned int p2m_get_iommu_flags(p2m_type_t p2mt, mfn_t mfn)
     return flags;
 }
 
+int p2m_set_ioreq_server(struct domain *d, unsigned int flags,
+                         struct hvm_ioreq_server *s);
+struct hvm_ioreq_server *p2m_get_ioreq_server(struct domain *d,
+                                              unsigned int *flags);
+void p2m_destroy_ioreq_server(const struct domain *d, const struct hvm_ioreq_server *s);
+
 #endif /* _XEN_ASM_X86_P2M_H */
 
 /*
diff --git a/xen/include/public/hvm/dm_op.h b/xen/include/public/hvm/dm_op.h
index f54cece..2a36833 100644
--- a/xen/include/public/hvm/dm_op.h
+++ b/xen/include/public/hvm/dm_op.h
@@ -318,6 +318,32 @@ struct xen_dm_op_inject_msi {
     uint64_aligned_t addr;
 };
 
+/*
+ * XEN_DMOP_map_mem_type_to_ioreq_server : map or unmap the IOREQ Server <id>
+ *                                      to a specific memory type <type>
+ *                                      for specific accesses <flags>
+ *
+ * For now, flags only accepts the value of XEN_DMOP_IOREQ_MEM_ACCESS_WRITE,
+ * which means only write operations are to be forwarded to an ioreq server.
+ * Support for the emulation of read operations can be added when an ioreq
+ * server has such a requirement in the future.
+ */
+#define XEN_DMOP_map_mem_type_to_ioreq_server 15
+
+struct xen_dm_op_map_mem_type_to_ioreq_server {
+    ioservid_t id;      /* IN - ioreq server id */
+    uint16_t type;      /* IN - memory type */
+    uint32_t flags;     /* IN - types of accesses to be forwarded to the
+                           ioreq server. flags with 0 means to unmap the
+                           ioreq server */
+
+#define XEN_DMOP_IOREQ_MEM_ACCESS_READ (1u << 0)
+#define XEN_DMOP_IOREQ_MEM_ACCESS_WRITE (1u << 1)
+
+    uint64_t opaque;    /* IN/OUT - only used for hypercall continuation,
+                           has to be set to zero by the caller */
+};
+
 struct xen_dm_op {
     uint32_t op;
     uint32_t pad;
@@ -336,6 +362,8 @@ struct xen_dm_op {
         struct xen_dm_op_set_mem_type set_mem_type;
         struct xen_dm_op_inject_event inject_event;
         struct xen_dm_op_inject_msi inject_msi;
+        struct xen_dm_op_map_mem_type_to_ioreq_server
+                map_mem_type_to_ioreq_server;
     } u;
 };
 
diff --git a/xen/include/public/hvm/hvm_op.h b/xen/include/public/hvm/hvm_op.h
index bc00ef0..0bdafdf 100644
--- a/xen/include/public/hvm/hvm_op.h
+++ b/xen/include/public/hvm/hvm_op.h
@@ -93,7 +93,13 @@ typedef enum {
     HVMMEM_unused,             /* Placeholder; setting memory to this type
                                   will fail for code after 4.7.0 */
 #endif
-    HVMMEM_ioreq_server
+    HVMMEM_ioreq_server        /* Memory type claimed by an ioreq server; type
+                                  changes to this value are only allowed after
+                                  an ioreq server has claimed its ownership.
+                                  Only pages with HVMMEM_ram_rw are allowed to
+                                  change to this type; conversely, pages with
+                                  this type are only allowed to be changed back
+                                  to HVMMEM_ram_rw. */
 } hvmmem_type_t;
 
 /* Hint from PV drivers for pagetable destruction. */
-- 
1.9.1




* [PATCH v9 3/5] x86/ioreq server: Handle read-modify-write cases for p2m_ioreq_server pages.
  2017-03-21  2:52 [PATCH v9 0/5] x86/ioreq server: Introduce HVMMEM_ioreq_server mem type Yu Zhang
  2017-03-21  2:52 ` [PATCH v9 1/5] x86/ioreq server: Release the p2m lock after mmio is handled Yu Zhang
  2017-03-21  2:52 ` [PATCH v9 2/5] x86/ioreq server: Add DMOP to map guest ram with p2m_ioreq_server to an ioreq server Yu Zhang
@ 2017-03-21  2:52 ` Yu Zhang
  2017-03-22 14:22   ` Jan Beulich
  2017-03-21  2:52 ` [PATCH v9 4/5] x86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries Yu Zhang
  2017-03-21  2:52 ` [PATCH v9 5/5] x86/ioreq server: Synchronously reset outstanding p2m_ioreq_server entries when an ioreq server unmaps Yu Zhang
  4 siblings, 1 reply; 42+ messages in thread
From: Yu Zhang @ 2017-03-21  2:52 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Paul Durrant, zhiyuan.lv, Jan Beulich

In ept_handle_violation(), write violations are also treated as read
violations. So when a VM accesses a write-protected address with a
read-modify-write instruction, the read emulation process is triggered
first.

For p2m_ioreq_server pages, the ioreq server currently only forwards
write operations to the device model. Therefore, when such a page is
accessed by a read-modify-write instruction, the read operation should
be emulated here in the hypervisor. This patch provides such a handler
to copy the data to the buffer.

Note: MMIOs with the p2m_mmio_dm type do not need such special
treatment, because both reads and writes will go to the device model.
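
To make the split concrete, consider a locked OR to a gfn of type
p2m_ioreq_server: the dispatch in hvmemul_do_io() then roughly becomes
the sketch below (a condensed view of the hunk further down, not the
literal code).

    if ( p2mt == p2m_ioreq_server && dir == IOREQ_READ )
    {
        /*
         * Read half of the read-modify-write: satisfy it locally via
         * ioreq_server_read() -> hvm_copy_from_guest_phys(), so the
         * emulator has the current value cached.
         */
        rc = hvm_process_io_intercept(&ioreq_server_handler, &p);
        vio->io_req.state = STATE_IOREQ_NONE;
    }
    else
    {
        /* Write half: forward to the ioreq server owning the type. */
        rc = hvm_send_ioreq(s, &p, 0);
    }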

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
---
Cc: Paul Durrant <paul.durrant@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>

changes in v3: 
  - According to comments from Jan: clarify comments in hvmemul_do_io().

changes in v2: 
  - According to comments from Jan: rename mem_ops to ioreq_server_ops.
  - According to comments from Jan: use hvm_copy_from_guest_phys() in
    ioreq_server_read(), instead of open-coding it.
---
 xen/arch/x86/hvm/emulate.c | 34 ++++++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 37139e6..52c726e 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -94,6 +94,26 @@ static const struct hvm_io_handler null_handler = {
     .ops = &null_ops
 };
 
+static int ioreq_server_read(const struct hvm_io_handler *io_handler,
+                    uint64_t addr,
+                    uint32_t size,
+                    uint64_t *data)
+{
+    if ( hvm_copy_from_guest_phys(data, addr, size) != HVMCOPY_okay )
+        return X86EMUL_UNHANDLEABLE;
+
+    return X86EMUL_OKAY;
+}
+
+static const struct hvm_io_ops ioreq_server_ops = {
+    .read = ioreq_server_read,
+    .write = null_write
+};
+
+static const struct hvm_io_handler ioreq_server_handler = {
+    .ops = &ioreq_server_ops
+};
+
 static int hvmemul_do_io(
     bool_t is_mmio, paddr_t addr, unsigned long *reps, unsigned int size,
     uint8_t dir, bool_t df, bool_t data_is_addr, uintptr_t data)
@@ -193,6 +213,9 @@ static int hvmemul_do_io(
          *   - If the domain ioreq_server is NULL, assume there is a
          *   race between the unbinding of ioreq server and guest fault
          *   so re-try the instruction.
+         *
+         *   - If the access is a read, this could be part of a
+         *   read-modify-write instruction, emulate the read first.
          */
         struct hvm_ioreq_server *s = NULL;
         p2m_type_t p2mt = p2m_invalid;
@@ -226,6 +249,17 @@ static int hvmemul_do_io(
                     vio->io_req.state = STATE_IOREQ_NONE;
                     break;
                 }
+
+                /*
+                 * This is part of a read-modify-write instruction.
+                 * Emulate the read part so we have the value cached.
+                 */
+                if ( dir == IOREQ_READ )
+                {
+                    rc = hvm_process_io_intercept(&ioreq_server_handler, &p);
+                    vio->io_req.state = STATE_IOREQ_NONE;
+                    break;
+                }
             }
         }
 
-- 
1.9.1




* [PATCH v9 4/5] x86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries.
  2017-03-21  2:52 [PATCH v9 0/5] x86/ioreq server: Introduce HVMMEM_ioreq_server mem type Yu Zhang
                   ` (2 preceding siblings ...)
  2017-03-21  2:52 ` [PATCH v9 3/5] x86/ioreq server: Handle read-modify-write cases for p2m_ioreq_server pages Yu Zhang
@ 2017-03-21  2:52 ` Yu Zhang
  2017-03-21 10:05   ` Paul Durrant
                     ` (2 more replies)
  2017-03-21  2:52 ` [PATCH v9 5/5] x86/ioreq server: Synchronously reset outstanding p2m_ioreq_server entries when an ioreq server unmaps Yu Zhang
  4 siblings, 3 replies; 42+ messages in thread
From: Yu Zhang @ 2017-03-21  2:52 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Jun Nakajima, George Dunlap, Andrew Cooper,
	Paul Durrant, zhiyuan.lv, Jan Beulich

After an ioreq server has unmapped, the remaining p2m_ioreq_server
entries need to be reset back to p2m_ram_rw. This patch does this
asynchronously with the current p2m_change_entry_type_global()
interface.

This patch also disallows live migration while there are still
outstanding p2m_ioreq_server entries left. The core reason is that our
current implementation of p2m_change_entry_type_global() cannot tell
the state of p2m_ioreq_server entries (it cannot decide whether an
entry is to be emulated or to be resynced).

Note: a new field, entry_count, is introduced in struct p2m_domain
to record the number of p2m_ioreq_server p2m page table entries. One
property of these entries is that they only point to 4K sized page
frames, because all p2m_ioreq_server entries originate from p2m_ram_rw
ones in p2m_change_type_one(), so we do not need to worry about
counting 2M/1G sized pages.
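
The "asynchronous" part relies on the existing recalc machinery:
p2m_change_entry_type_global() only marks entries for recalculation,
and each outstanding p2m_ioreq_server entry is converted back to
p2m_ram_rw (with entry_count decremented) when resolve_misconfig()/
do_recalc() next visits it. A rough sketch of the unmap path added to
hvm_map_mem_type_to_ioreq_server() in this patch:

    /* After a successful unmap (flags == 0) ... */
    if ( rc == 0 && flags == 0 )
    {
        struct p2m_domain *p2m = p2m_get_hostp2m(d);

        /* ... mark outstanding entries for recalc; they reset lazily. */
        if ( read_atomic(&p2m->ioreq.entry_count) )
            p2m_change_entry_type_global(d, p2m_ioreq_server, p2m_ram_rw);
    }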

Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
---
Cc: Paul Durrant <paul.durrant@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Jun Nakajima <jun.nakajima@intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>

changes in v4: 
  - According to comments from Jan: use ASSERT() instead of 'if'
    condition in p2m_change_type_one().
  - According to comments from Jan: commit message changes to mention
    the p2m_ioreq_server are all based on 4K sized pages.

changes in v3: 
  - Move the synchronously resetting logic into patch 5.
  - According to comments from Jan: introduce p2m_check_changeable()
    to clarify the p2m type change code.
  - According to comments from George: use locks in the same order
    to avoid deadlock, call p2m_change_entry_type_global() after unmap
    of the ioreq server is finished.

changes in v2: 
  - Move the calculation of the ioreq server page entry_count into
    p2m_change_type_one() so that we do not need a separate lock.
    Note: entry_count is also calculated in resolve_misconfig()/
    do_recalc(); fortunately, callers of both routines already hold
    the p2m lock.
  - Simplify logic in hvmop_set_mem_type().
  - Introduce routine p2m_finish_type_change() to walk the p2m 
    table and do the p2m reset.
---
 xen/arch/x86/hvm/ioreq.c  |  8 ++++++++
 xen/arch/x86/mm/hap/hap.c |  9 +++++++++
 xen/arch/x86/mm/p2m-ept.c |  8 +++++++-
 xen/arch/x86/mm/p2m-pt.c  | 13 +++++++++++--
 xen/arch/x86/mm/p2m.c     | 20 ++++++++++++++++++++
 xen/include/asm-x86/p2m.h |  9 ++++++++-
 6 files changed, 63 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c
index 746799f..102c6c2 100644
--- a/xen/arch/x86/hvm/ioreq.c
+++ b/xen/arch/x86/hvm/ioreq.c
@@ -949,6 +949,14 @@ int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id,
 
     spin_unlock_recursive(&d->arch.hvm_domain.ioreq_server.lock);
 
+    if ( rc == 0 && flags == 0 )
+    {
+        struct p2m_domain *p2m = p2m_get_hostp2m(d);
+
+        if ( read_atomic(&p2m->ioreq.entry_count) )
+            p2m_change_entry_type_global(d, p2m_ioreq_server, p2m_ram_rw);
+    }
+
     return rc;
 }
 
diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
index a57b385..6ec950a 100644
--- a/xen/arch/x86/mm/hap/hap.c
+++ b/xen/arch/x86/mm/hap/hap.c
@@ -187,6 +187,15 @@ out:
  */
 static int hap_enable_log_dirty(struct domain *d, bool_t log_global)
 {
+    struct p2m_domain *p2m = p2m_get_hostp2m(d);
+
+    /*
+     * Refuse to turn on global log-dirty mode if
+     * there are outstanding p2m_ioreq_server pages.
+     */
+    if ( log_global && read_atomic(&p2m->ioreq.entry_count) )
+        return -EBUSY;
+
     /* turn on PG_log_dirty bit in paging mode */
     paging_lock(d);
     d->arch.paging.mode |= PG_log_dirty;
diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
index cc1eb21..1df3d09 100644
--- a/xen/arch/x86/mm/p2m-ept.c
+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -544,6 +544,12 @@ static int resolve_misconfig(struct p2m_domain *p2m, unsigned long gfn)
                     e.ipat = ipat;
                     if ( e.recalc && p2m_is_changeable(e.sa_p2mt) )
                     {
+                         if ( e.sa_p2mt == p2m_ioreq_server )
+                         {
+                             p2m->ioreq.entry_count--;
+                             ASSERT(p2m->ioreq.entry_count >= 0);
+                         }
+
                          e.sa_p2mt = p2m_is_logdirty_range(p2m, gfn + i, gfn + i)
                                      ? p2m_ram_logdirty : p2m_ram_rw;
                          ept_p2m_type_to_flags(p2m, &e, e.sa_p2mt, e.access);
@@ -965,7 +971,7 @@ static mfn_t ept_get_entry(struct p2m_domain *p2m,
     if ( is_epte_valid(ept_entry) )
     {
         if ( (recalc || ept_entry->recalc) &&
-             p2m_is_changeable(ept_entry->sa_p2mt) )
+             p2m_check_changeable(ept_entry->sa_p2mt) )
             *t = p2m_is_logdirty_range(p2m, gfn, gfn) ? p2m_ram_logdirty
                                                       : p2m_ram_rw;
         else
diff --git a/xen/arch/x86/mm/p2m-pt.c b/xen/arch/x86/mm/p2m-pt.c
index f6c45ec..169de75 100644
--- a/xen/arch/x86/mm/p2m-pt.c
+++ b/xen/arch/x86/mm/p2m-pt.c
@@ -436,11 +436,13 @@ static int do_recalc(struct p2m_domain *p2m, unsigned long gfn)
          needs_recalc(l1, *pent) )
     {
         l1_pgentry_t e = *pent;
+        p2m_type_t p2mt_old;
 
         if ( !valid_recalc(l1, e) )
             P2M_DEBUG("bogus recalc leaf at d%d:%lx:%u\n",
                       p2m->domain->domain_id, gfn, level);
-        if ( p2m_is_changeable(p2m_flags_to_type(l1e_get_flags(e))) )
+        p2mt_old = p2m_flags_to_type(l1e_get_flags(e));
+        if ( p2m_is_changeable(p2mt_old) )
         {
             unsigned long mask = ~0UL << (level * PAGETABLE_ORDER);
             p2m_type_t p2mt = p2m_is_logdirty_range(p2m, gfn & mask, gfn | ~mask)
@@ -460,6 +462,13 @@ static int do_recalc(struct p2m_domain *p2m, unsigned long gfn)
                      mfn &= ~(_PAGE_PSE_PAT >> PAGE_SHIFT);
                 flags |= _PAGE_PSE;
             }
+
+            if ( p2mt_old == p2m_ioreq_server )
+            {
+                p2m->ioreq.entry_count--;
+                ASSERT(p2m->ioreq.entry_count >= 0);
+            }
+
             e = l1e_from_pfn(mfn, flags);
             p2m_add_iommu_flags(&e, level,
                                 (p2mt == p2m_ram_rw)
@@ -729,7 +738,7 @@ p2m_pt_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
 static inline p2m_type_t recalc_type(bool_t recalc, p2m_type_t t,
                                      struct p2m_domain *p2m, unsigned long gfn)
 {
-    if ( !recalc || !p2m_is_changeable(t) )
+    if ( !recalc || !p2m_check_changeable(t) )
         return t;
     return p2m_is_logdirty_range(p2m, gfn, gfn) ? p2m_ram_logdirty
                                                 : p2m_ram_rw;
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index dd4e477..e3e54f1 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -954,6 +954,26 @@ int p2m_change_type_one(struct domain *d, unsigned long gfn,
                          p2m->default_access)
          : -EBUSY;
 
+    if ( !rc )
+    {
+        switch ( nt )
+        {
+        case p2m_ram_rw:
+            if ( ot == p2m_ioreq_server )
+            {
+                p2m->ioreq.entry_count--;
+                ASSERT(p2m->ioreq.entry_count >= 0);
+            }
+            break;
+        case p2m_ioreq_server:
+            ASSERT(ot == p2m_ram_rw);
+            p2m->ioreq.entry_count++;
+            break;
+        default:
+            break;
+        }
+    }
+
     gfn_unlock(p2m, gfn, 0);
 
     return rc;
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index 3786680..395f125 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -120,7 +120,10 @@ typedef unsigned int p2m_query_t;
 
 /* Types that can be subject to bulk transitions. */
 #define P2M_CHANGEABLE_TYPES (p2m_to_mask(p2m_ram_rw) \
-                              | p2m_to_mask(p2m_ram_logdirty) )
+                              | p2m_to_mask(p2m_ram_logdirty) \
+                              | p2m_to_mask(p2m_ioreq_server) )
+
+#define P2M_IOREQ_TYPES (p2m_to_mask(p2m_ioreq_server))
 
 #define P2M_POD_TYPES (p2m_to_mask(p2m_populate_on_demand))
 
@@ -157,6 +160,7 @@ typedef unsigned int p2m_query_t;
 #define p2m_is_readonly(_t) (p2m_to_mask(_t) & P2M_RO_TYPES)
 #define p2m_is_discard_write(_t) (p2m_to_mask(_t) & P2M_DISCARD_WRITE_TYPES)
 #define p2m_is_changeable(_t) (p2m_to_mask(_t) & P2M_CHANGEABLE_TYPES)
+#define p2m_is_ioreq(_t) (p2m_to_mask(_t) & P2M_IOREQ_TYPES)
 #define p2m_is_pod(_t) (p2m_to_mask(_t) & P2M_POD_TYPES)
 #define p2m_is_grant(_t) (p2m_to_mask(_t) & P2M_GRANT_TYPES)
 /* Grant types are *not* considered valid, because they can be
@@ -178,6 +182,8 @@ typedef unsigned int p2m_query_t;
 
 #define p2m_allows_invalid_mfn(t) (p2m_to_mask(t) & P2M_INVALID_MFN_TYPES)
 
+#define p2m_check_changeable(t) (p2m_is_changeable(t) && !p2m_is_ioreq(t))
+
 typedef enum {
     p2m_host,
     p2m_nested,
@@ -349,6 +355,7 @@ struct p2m_domain {
           * are to be emulated by an ioreq server.
           */
          unsigned int flags;
+         long entry_count;
      } ioreq;
 };
 
-- 
1.9.1




* [PATCH v9 5/5] x86/ioreq server: Synchronously reset outstanding p2m_ioreq_server entries when an ioreq server unmaps.
  2017-03-21  2:52 [PATCH v9 0/5] x86/ioreq server: Introduce HVMMEM_ioreq_server mem type Yu Zhang
                   ` (3 preceding siblings ...)
  2017-03-21  2:52 ` [PATCH v9 4/5] x86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries Yu Zhang
@ 2017-03-21  2:52 ` Yu Zhang
  2017-03-21 10:00   ` Paul Durrant
                     ` (2 more replies)
  4 siblings, 3 replies; 42+ messages in thread
From: Yu Zhang @ 2017-03-21  2:52 UTC (permalink / raw)
  To: xen-devel
  Cc: George Dunlap, Andrew Cooper, Paul Durrant, zhiyuan.lv, Jan Beulich

After an ioreq server has unmapped, the remaining p2m_ioreq_server
entries need to be reset back to p2m_ram_rw. This patch does this
synchronously by iterating over the p2m table.

The synchronous reset is necessary because we need to guarantee that
the p2m table is clean before another ioreq server is mapped. And
since sweeping the p2m table can be time consuming, it is done with
hypercall continuation.
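
The continuation works the usual dm_op way: the gfn to resume from is
stashed in the (normally zero) opaque field and the op returns
-ERESTART, so Xen replays the hypercall until the sweep is complete.
In outline (a sketch of the dm.c hunk below, with bookkeeping trimmed):

    unsigned long first_gfn = data->opaque;    /* 0 on first entry */

    while ( read_atomic(&p2m->ioreq.entry_count) &&
            first_gfn <= p2m->max_mapped_pfn )
    {
        /* Sweep 256 gfns per pass, resetting p2m_ioreq_server entries. */
        p2m_finish_type_change(d, first_gfn, first_gfn + 0xff,
                               p2m_ioreq_server, p2m_ram_rw);
        first_gfn += 0x100;

        if ( first_gfn <= p2m->max_mapped_pfn && hypercall_preempt_check() )
        {
            data->opaque = first_gfn;          /* resume point            */
            rc = -ERESTART;                    /* Xen re-issues the dm_op */
            break;
        }
    }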

Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
---
Cc: Paul Durrant <paul.durrant@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <george.dunlap@eu.citrix.com>

changes in v2: 
  - According to comments from Jan and Andrew: do not use the
    HVMOP type hypercall continuation method. Instead, adding
    an opaque in xen_dm_op_map_mem_type_to_ioreq_server to
    store the gfn.
  - According to comments from Jan: change routine's comments
    and name of parameters of p2m_finish_type_change().

changes in v1: 
  - This patch is split from patch 4 of the last version.
  - According to comments from Jan: update gfn_start when using
    hypercall continuation to reset the p2m type.
  - According to comments from Jan: use min() to compare gfn_end
    and max mapped pfn in p2m_finish_type_change()
---
 xen/arch/x86/hvm/dm.c     | 41 ++++++++++++++++++++++++++++++++++++++---
 xen/arch/x86/mm/p2m.c     | 29 +++++++++++++++++++++++++++++
 xen/include/asm-x86/p2m.h |  7 +++++++
 3 files changed, 74 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/hvm/dm.c b/xen/arch/x86/hvm/dm.c
index 3f9484d..a24d0f8 100644
--- a/xen/arch/x86/hvm/dm.c
+++ b/xen/arch/x86/hvm/dm.c
@@ -385,16 +385,51 @@ static int dm_op(domid_t domid,
 
     case XEN_DMOP_map_mem_type_to_ioreq_server:
     {
-        const struct xen_dm_op_map_mem_type_to_ioreq_server *data =
+        struct xen_dm_op_map_mem_type_to_ioreq_server *data =
             &op.u.map_mem_type_to_ioreq_server;
+        unsigned long first_gfn = data->opaque;
+        unsigned long last_gfn;
+
+        const_op = false;
 
         rc = -EOPNOTSUPP;
         /* Only support for HAP enabled hvm. */
         if ( !hap_enabled(d) )
             break;
 
-        rc = hvm_map_mem_type_to_ioreq_server(d, data->id,
-                                              data->type, data->flags);
+        if ( first_gfn == 0 )
+            rc = hvm_map_mem_type_to_ioreq_server(d, data->id,
+                                                  data->type, data->flags);
+        /*
+         * Iterate p2m table when an ioreq server unmaps from p2m_ioreq_server,
+         * and reset the remaining p2m_ioreq_server entries back to p2m_ram_rw.
+         */
+        if ( (first_gfn > 0) || (data->flags == 0 && rc == 0) )
+        {
+            struct p2m_domain *p2m = p2m_get_hostp2m(d);
+
+            while ( read_atomic(&p2m->ioreq.entry_count) &&
+                    first_gfn <= p2m->max_mapped_pfn )
+            {
+                /* Iterate p2m table for 256 gfns each time. */
+                last_gfn = first_gfn + 0xff;
+
+                p2m_finish_type_change(d, first_gfn, last_gfn,
+                                       p2m_ioreq_server, p2m_ram_rw);
+
+                first_gfn = last_gfn + 1;
+
+                /* Check for continuation if it's not the last iteration. */
+                if ( first_gfn <= p2m->max_mapped_pfn &&
+                     hypercall_preempt_check() )
+                {
+                    rc = -ERESTART;
+                    data->opaque = first_gfn;
+                    break;
+                }
+            }
+        }
+
         break;
     }
 
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index e3e54f1..0a2f276 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -1038,6 +1038,35 @@ void p2m_change_type_range(struct domain *d,
     p2m_unlock(p2m);
 }
 
+/* Synchronously modify the p2m type for a range of gfns from ot to nt. */
+void p2m_finish_type_change(struct domain *d,
+                            unsigned long first_gfn, unsigned long last_gfn,
+                            p2m_type_t ot, p2m_type_t nt)
+{
+    struct p2m_domain *p2m = p2m_get_hostp2m(d);
+    p2m_type_t t;
+    unsigned long gfn = first_gfn;
+
+    ASSERT(first_gfn <= last_gfn);
+    ASSERT(ot != nt);
+    ASSERT(p2m_is_changeable(ot) && p2m_is_changeable(nt));
+
+    p2m_lock(p2m);
+
+    last_gfn = min(last_gfn, p2m->max_mapped_pfn);
+    while ( gfn <= last_gfn )
+    {
+        get_gfn_query_unlocked(d, gfn, &t);
+
+        if ( t == ot )
+            p2m_change_type_one(d, gfn, t, nt);
+
+        gfn++;
+    }
+
+    p2m_unlock(p2m);
+}
+
 /*
  * Returns:
  *    0              for success
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index 395f125..3d665e8 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -611,6 +611,13 @@ void p2m_change_type_range(struct domain *d,
 int p2m_change_type_one(struct domain *d, unsigned long gfn,
                         p2m_type_t ot, p2m_type_t nt);
 
+/* Synchronously change the p2m type for a range of gfns:
+ * [first_gfn ... last_gfn]. */
+void p2m_finish_type_change(struct domain *d,
+                            unsigned long first_gfn,
+                            unsigned long last_gfn,
+                            p2m_type_t ot, p2m_type_t nt);
+
 /* Report a change affecting memory types. */
 void p2m_memory_type_changed(struct domain *d);
 
-- 
1.9.1




* Re: [PATCH v9 5/5] x86/ioreq server: Synchronously reset outstanding p2m_ioreq_server entries when an ioreq server unmaps.
  2017-03-21  2:52 ` [PATCH v9 5/5] x86/ioreq server: Synchronously reset outstanding p2m_ioreq_server entries when an ioreq server unmaps Yu Zhang
@ 2017-03-21 10:00   ` Paul Durrant
  2017-03-21 11:15     ` Yu Zhang
  2017-03-22  8:28   ` Tian, Kevin
  2017-03-22 14:39   ` Jan Beulich
  2 siblings, 1 reply; 42+ messages in thread
From: Paul Durrant @ 2017-03-21 10:00 UTC (permalink / raw)
  To: 'Yu Zhang', xen-devel
  Cc: Andrew Cooper, zhiyuan.lv, Jan Beulich, George Dunlap

> -----Original Message-----
> From: Yu Zhang [mailto:yu.c.zhang@linux.intel.com]
> Sent: 21 March 2017 02:53
> To: xen-devel@lists.xen.org
> Cc: zhiyuan.lv@intel.com; Paul Durrant <Paul.Durrant@citrix.com>; Jan
> Beulich <jbeulich@suse.com>; Andrew Cooper
> <Andrew.Cooper3@citrix.com>; George Dunlap
> <George.Dunlap@citrix.com>
> Subject: [PATCH v9 5/5] x86/ioreq server: Synchronously reset outstanding
> p2m_ioreq_server entries when an ioreq server unmaps.
> 
> After an ioreq server has unmapped, the remaining p2m_ioreq_server
> entries need to be reset back to p2m_ram_rw. This patch does this
> synchronously by iterating over the p2m table.
> 
> The synchronous reset is necessary because we need to guarantee that
> the p2m table is clean before another ioreq server is mapped. And
> since sweeping the p2m table can be time consuming, it is done with
> hypercall continuation.
> 
> Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
> ---
> Cc: Paul Durrant <paul.durrant@citrix.com>
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> 
> changes in v2:
>   - According to comments from Jan and Andrew: do not use the
>     HVMOP type hypercall continuation method. Instead, adding
>     an opaque in xen_dm_op_map_mem_type_to_ioreq_server to
>     store the gfn.
>   - According to comments from Jan: change routine's comments
>     and name of parameters of p2m_finish_type_change().
> 
> changes in v1:
>   - This patch is split from patch 4 of the last version.
>   - According to comments from Jan: update gfn_start when using
>     hypercall continuation to reset the p2m type.
>   - According to comments from Jan: use min() to compare gfn_end
>     and max mapped pfn in p2m_finish_type_change()
> ---
>  xen/arch/x86/hvm/dm.c     | 41
> ++++++++++++++++++++++++++++++++++++++---
>  xen/arch/x86/mm/p2m.c     | 29 +++++++++++++++++++++++++++++
>  xen/include/asm-x86/p2m.h |  7 +++++++
>  3 files changed, 74 insertions(+), 3 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/dm.c b/xen/arch/x86/hvm/dm.c
> index 3f9484d..a24d0f8 100644
> --- a/xen/arch/x86/hvm/dm.c
> +++ b/xen/arch/x86/hvm/dm.c
> @@ -385,16 +385,51 @@ static int dm_op(domid_t domid,
> 
>      case XEN_DMOP_map_mem_type_to_ioreq_server:
>      {
> -        const struct xen_dm_op_map_mem_type_to_ioreq_server *data =
> +        struct xen_dm_op_map_mem_type_to_ioreq_server *data =
>              &op.u.map_mem_type_to_ioreq_server;
> +        unsigned long first_gfn = data->opaque;
> +        unsigned long last_gfn;
> +
> +        const_op = false;
> 
>          rc = -EOPNOTSUPP;
>          /* Only support for HAP enabled hvm. */
>          if ( !hap_enabled(d) )
>              break;
> 
> -        rc = hvm_map_mem_type_to_ioreq_server(d, data->id,
> -                                              data->type, data->flags);
> +        if ( first_gfn == 0 )
> +            rc = hvm_map_mem_type_to_ioreq_server(d, data->id,
> +                                                  data->type, data->flags);
> +        /*
> +         * Iterate p2m table when an ioreq server unmaps from
> p2m_ioreq_server,
> +         * and reset the remaining p2m_ioreq_server entries back to
> p2m_ram_rw.
> +         */
> +        if ( (first_gfn > 0) || (data->flags == 0 && rc == 0) )
> +        {
> +            struct p2m_domain *p2m = p2m_get_hostp2m(d);
> +
> +            while ( read_atomic(&p2m->ioreq.entry_count) &&
> +                    first_gfn <= p2m->max_mapped_pfn )
> +            {
> +                /* Iterate p2m table for 256 gfns each time. */
> +                last_gfn = first_gfn + 0xff;
> +

Might be worth a comment here to say that p2m_finish_type_change() limits last_gfn appropriately, because it kind of looks wrong to be blindly calling it with first_gfn + 0xff. Or perhaps, rather than passing last_gfn, pass a 'max_nr' parameter of 256 instead. Then you can drop last_gfn altogether. If you prefer the parameters as they are, then at least limit the scope of last_gfn to this while loop.

> +                p2m_finish_type_change(d, first_gfn, last_gfn,
> +                                       p2m_ioreq_server, p2m_ram_rw);
> +
> +                first_gfn = last_gfn + 1;
> +
> +                /* Check for continuation if it's not the last iteration. */
> +                if ( first_gfn <= p2m->max_mapped_pfn &&
> +                     hypercall_preempt_check() )
> +                {
> +                    rc = -ERESTART;
> +                    data->opaque = first_gfn;
> +                    break;
> +                }
> +            }
> +        }
> +
>          break;
>      }
> 
> diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
> index e3e54f1..0a2f276 100644
> --- a/xen/arch/x86/mm/p2m.c
> +++ b/xen/arch/x86/mm/p2m.c
> @@ -1038,6 +1038,35 @@ void p2m_change_type_range(struct domain *d,
>      p2m_unlock(p2m);
>  }
> 
> +/* Synchronously modify the p2m type for a range of gfns from ot to nt. */
> +void p2m_finish_type_change(struct domain *d,

As I said above, consider a 'max_nr' parameter here rather than last_gfn.

  Paul

> +                            unsigned long first_gfn, unsigned long last_gfn,
> +                            p2m_type_t ot, p2m_type_t nt)
> +{
> +    struct p2m_domain *p2m = p2m_get_hostp2m(d);
> +    p2m_type_t t;
> +    unsigned long gfn = first_gfn;
> +
> +    ASSERT(first_gfn <= last_gfn);
> +    ASSERT(ot != nt);
> +    ASSERT(p2m_is_changeable(ot) && p2m_is_changeable(nt));
> +
> +    p2m_lock(p2m);
> +
> +    last_gfn = min(last_gfn, p2m->max_mapped_pfn);
> +    while ( gfn <= last_gfn )
> +    {
> +        get_gfn_query_unlocked(d, gfn, &t);
> +
> +        if ( t == ot )
> +            p2m_change_type_one(d, gfn, t, nt);
> +
> +        gfn++;
> +    }
> +
> +    p2m_unlock(p2m);
> +}
> +
>  /*
>   * Returns:
>   *    0              for success
> diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
> index 395f125..3d665e8 100644
> --- a/xen/include/asm-x86/p2m.h
> +++ b/xen/include/asm-x86/p2m.h
> @@ -611,6 +611,13 @@ void p2m_change_type_range(struct domain *d,
>  int p2m_change_type_one(struct domain *d, unsigned long gfn,
>                          p2m_type_t ot, p2m_type_t nt);
> 
> +/* Synchronously change the p2m type for a range of gfns:
> + * [first_gfn ... last_gfn]. */
> +void p2m_finish_type_change(struct domain *d,
> +                            unsigned long first_gfn,
> +                            unsigned long last_gfn,
> +                            p2m_type_t ot, p2m_type_t nt);
> +
>  /* Report a change affecting memory types. */
>  void p2m_memory_type_changed(struct domain *d);
> 
> --
> 1.9.1



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v9 4/5] x86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries.
  2017-03-21  2:52 ` [PATCH v9 4/5] x86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries Yu Zhang
@ 2017-03-21 10:05   ` Paul Durrant
  2017-03-22  8:10   ` Tian, Kevin
  2017-03-22 14:29   ` Jan Beulich
  2 siblings, 0 replies; 42+ messages in thread
From: Paul Durrant @ 2017-03-21 10:05 UTC (permalink / raw)
  To: 'Yu Zhang', xen-devel
  Cc: Kevin Tian, Jan Beulich, Andrew Cooper, George Dunlap,
	zhiyuan.lv, Jun Nakajima

> -----Original Message-----
> From: Yu Zhang [mailto:yu.c.zhang@linux.intel.com]
> Sent: 21 March 2017 02:53
> To: xen-devel@lists.xen.org
> Cc: zhiyuan.lv@intel.com; Paul Durrant <Paul.Durrant@citrix.com>; Jan
> Beulich <jbeulich@suse.com>; Andrew Cooper
> <Andrew.Cooper3@citrix.com>; George Dunlap
> <George.Dunlap@citrix.com>; Jun Nakajima <jun.nakajima@intel.com>;
> Kevin Tian <kevin.tian@intel.com>
> Subject: [PATCH v9 4/5] x86/ioreq server: Asynchronously reset outstanding
> p2m_ioreq_server entries.
> 
> After an ioreq server has unmapped, the remaining p2m_ioreq_server
> entries need to be reset back to p2m_ram_rw. This patch does this
> asynchronously with the current p2m_change_entry_type_global()
> interface.
> 
> This patch also disallows live migration when there's still any
> outstanding p2m_ioreq_server entry left. The core reason is that our
> current implementation of p2m_change_entry_type_global() cannot
> tell the state of p2m_ioreq_server entries (it cannot decide whether
> an entry is to be emulated or to be resynced).
> 
> Note: a new field, entry_count, is introduced in struct p2m_domain
> to record the number of p2m_ioreq_server p2m page table entries.
> One property of these entries is that they only point to 4K-sized
> page frames, because all p2m_ioreq_server entries originate
> from p2m_ram_rw ones in p2m_change_type_one(). We do not need to
> worry about the counting for 2M/1G-sized pages.
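Put differently, the counter only moves on 4K leaf transitions between p2m_ram_rw and p2m_ioreq_server, under the p2m/gfn lock the callers already hold. A sketch of the accounting rule, mirroring the p2m_change_type_one() hunk further down (illustration only):

    if ( !rc )
    {
        if ( nt == p2m_ioreq_server )        /* p2m_ram_rw -> p2m_ioreq_server */
        {
            ASSERT(ot == p2m_ram_rw);
            p2m->ioreq.entry_count++;
        }
        else if ( ot == p2m_ioreq_server &&  /* p2m_ioreq_server -> p2m_ram_rw */
                  nt == p2m_ram_rw )
        {
            p2m->ioreq.entry_count--;
            ASSERT(p2m->ioreq.entry_count >= 0);
        }
    }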
> 
> Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>

Reviewed-by: Paul Durrant <paul.durrant@citrix.com>

> ---
> Cc: Paul Durrant <paul.durrant@citrix.com>
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> Cc: Jun Nakajima <jun.nakajima@intel.com>
> Cc: Kevin Tian <kevin.tian@intel.com>
> 
> changes in v4:
>   - According to comments from Jan: use ASSERT() instead of 'if'
>     condition in p2m_change_type_one().
>   - According to comments from Jan: commit message changes to mention
>     the p2m_ioreq_server are all based on 4K sized pages.
> 
> changes in v3:
>   - Move the synchronously resetting logic into patch 5.
>   - According to comments from Jan: introduce p2m_check_changeable()
>     to clarify the p2m type change code.
>   - According to comments from George: use locks in the same order
>     to avoid deadlock, call p2m_change_entry_type_global() after unmap
>     of the ioreq server is finished.
> 
> changes in v2:
>   - Move the calculation of the ioreq server page entry_count into
>     p2m_change_type_one() so that we do not need a separate lock.
>     Note: entry_count is also calculated in resolve_misconfig()/
>     do_recalc(); fortunately, callers of both routines already hold
>     the p2m lock.
>   - Simplify logic in hvmop_set_mem_type().
>   - Introduce routine p2m_finish_type_change() to walk the p2m
>     table and do the p2m reset.
> ---
>  xen/arch/x86/hvm/ioreq.c  |  8 ++++++++
>  xen/arch/x86/mm/hap/hap.c |  9 +++++++++
>  xen/arch/x86/mm/p2m-ept.c |  8 +++++++-
>  xen/arch/x86/mm/p2m-pt.c  | 13 +++++++++++--
>  xen/arch/x86/mm/p2m.c     | 20 ++++++++++++++++++++
>  xen/include/asm-x86/p2m.h |  9 ++++++++-
>  6 files changed, 63 insertions(+), 4 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c
> index 746799f..102c6c2 100644
> --- a/xen/arch/x86/hvm/ioreq.c
> +++ b/xen/arch/x86/hvm/ioreq.c
> @@ -949,6 +949,14 @@ int hvm_map_mem_type_to_ioreq_server(struct
> domain *d, ioservid_t id,
> 
>      spin_unlock_recursive(&d->arch.hvm_domain.ioreq_server.lock);
> 
> +    if ( rc == 0 && flags == 0 )
> +    {
> +        struct p2m_domain *p2m = p2m_get_hostp2m(d);
> +
> +        if ( read_atomic(&p2m->ioreq.entry_count) )
> +            p2m_change_entry_type_global(d, p2m_ioreq_server,
> p2m_ram_rw);
> +    }
> +
>      return rc;
>  }
> 
> diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
> index a57b385..6ec950a 100644
> --- a/xen/arch/x86/mm/hap/hap.c
> +++ b/xen/arch/x86/mm/hap/hap.c
> @@ -187,6 +187,15 @@ out:
>   */
>  static int hap_enable_log_dirty(struct domain *d, bool_t log_global)
>  {
> +    struct p2m_domain *p2m = p2m_get_hostp2m(d);
> +
> +    /*
> +     * Refuse to turn on global log-dirty mode if
> +     * there's outstanding p2m_ioreq_server pages.
> +     */
> +    if ( log_global && read_atomic(&p2m->ioreq.entry_count) )
> +        return -EBUSY;
> +
>      /* turn on PG_log_dirty bit in paging mode */
>      paging_lock(d);
>      d->arch.paging.mode |= PG_log_dirty;
> diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
> index cc1eb21..1df3d09 100644
> --- a/xen/arch/x86/mm/p2m-ept.c
> +++ b/xen/arch/x86/mm/p2m-ept.c
> @@ -544,6 +544,12 @@ static int resolve_misconfig(struct p2m_domain
> *p2m, unsigned long gfn)
>                      e.ipat = ipat;
>                      if ( e.recalc && p2m_is_changeable(e.sa_p2mt) )
>                      {
> +                         if ( e.sa_p2mt == p2m_ioreq_server )
> +                         {
> +                             p2m->ioreq.entry_count--;
> +                             ASSERT(p2m->ioreq.entry_count >= 0);
> +                         }
> +
>                           e.sa_p2mt = p2m_is_logdirty_range(p2m, gfn + i, gfn + i)
>                                       ? p2m_ram_logdirty : p2m_ram_rw;
>                           ept_p2m_type_to_flags(p2m, &e, e.sa_p2mt, e.access);
> @@ -965,7 +971,7 @@ static mfn_t ept_get_entry(struct p2m_domain
> *p2m,
>      if ( is_epte_valid(ept_entry) )
>      {
>          if ( (recalc || ept_entry->recalc) &&
> -             p2m_is_changeable(ept_entry->sa_p2mt) )
> +             p2m_check_changeable(ept_entry->sa_p2mt) )
>              *t = p2m_is_logdirty_range(p2m, gfn, gfn) ? p2m_ram_logdirty
>                                                        : p2m_ram_rw;
>          else
> diff --git a/xen/arch/x86/mm/p2m-pt.c b/xen/arch/x86/mm/p2m-pt.c
> index f6c45ec..169de75 100644
> --- a/xen/arch/x86/mm/p2m-pt.c
> +++ b/xen/arch/x86/mm/p2m-pt.c
> @@ -436,11 +436,13 @@ static int do_recalc(struct p2m_domain *p2m,
> unsigned long gfn)
>           needs_recalc(l1, *pent) )
>      {
>          l1_pgentry_t e = *pent;
> +        p2m_type_t p2mt_old;
> 
>          if ( !valid_recalc(l1, e) )
>              P2M_DEBUG("bogus recalc leaf at d%d:%lx:%u\n",
>                        p2m->domain->domain_id, gfn, level);
> -        if ( p2m_is_changeable(p2m_flags_to_type(l1e_get_flags(e))) )
> +        p2mt_old = p2m_flags_to_type(l1e_get_flags(e));
> +        if ( p2m_is_changeable(p2mt_old) )
>          {
>              unsigned long mask = ~0UL << (level * PAGETABLE_ORDER);
>              p2m_type_t p2mt = p2m_is_logdirty_range(p2m, gfn & mask, gfn |
> ~mask)
> @@ -460,6 +462,13 @@ static int do_recalc(struct p2m_domain *p2m,
> unsigned long gfn)
>                       mfn &= ~(_PAGE_PSE_PAT >> PAGE_SHIFT);
>                  flags |= _PAGE_PSE;
>              }
> +
> +            if ( p2mt_old == p2m_ioreq_server )
> +            {
> +                p2m->ioreq.entry_count--;
> +                ASSERT(p2m->ioreq.entry_count >= 0);
> +            }
> +
>              e = l1e_from_pfn(mfn, flags);
>              p2m_add_iommu_flags(&e, level,
>                                  (p2mt == p2m_ram_rw)
> @@ -729,7 +738,7 @@ p2m_pt_set_entry(struct p2m_domain *p2m,
> unsigned long gfn, mfn_t mfn,
>  static inline p2m_type_t recalc_type(bool_t recalc, p2m_type_t t,
>                                       struct p2m_domain *p2m, unsigned long gfn)
>  {
> -    if ( !recalc || !p2m_is_changeable(t) )
> +    if ( !recalc || !p2m_check_changeable(t) )
>          return t;
>      return p2m_is_logdirty_range(p2m, gfn, gfn) ? p2m_ram_logdirty
>                                                  : p2m_ram_rw;
> diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
> index dd4e477..e3e54f1 100644
> --- a/xen/arch/x86/mm/p2m.c
> +++ b/xen/arch/x86/mm/p2m.c
> @@ -954,6 +954,26 @@ int p2m_change_type_one(struct domain *d,
> unsigned long gfn,
>                           p2m->default_access)
>           : -EBUSY;
> 
> +    if ( !rc )
> +    {
> +        switch ( nt )
> +        {
> +        case p2m_ram_rw:
> +            if ( ot == p2m_ioreq_server )
> +            {
> +                p2m->ioreq.entry_count--;
> +                ASSERT(p2m->ioreq.entry_count >= 0);
> +            }
> +            break;
> +        case p2m_ioreq_server:
> +            ASSERT(ot == p2m_ram_rw);
> +            p2m->ioreq.entry_count++;
> +            break;
> +        default:
> +            break;
> +        }
> +    }
> +
>      gfn_unlock(p2m, gfn, 0);
> 
>      return rc;
> diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
> index 3786680..395f125 100644
> --- a/xen/include/asm-x86/p2m.h
> +++ b/xen/include/asm-x86/p2m.h
> @@ -120,7 +120,10 @@ typedef unsigned int p2m_query_t;
> 
>  /* Types that can be subject to bulk transitions. */
>  #define P2M_CHANGEABLE_TYPES (p2m_to_mask(p2m_ram_rw) \
> -                              | p2m_to_mask(p2m_ram_logdirty) )
> +                              | p2m_to_mask(p2m_ram_logdirty) \
> +                              | p2m_to_mask(p2m_ioreq_server) )
> +
> +#define P2M_IOREQ_TYPES (p2m_to_mask(p2m_ioreq_server))
> 
>  #define P2M_POD_TYPES (p2m_to_mask(p2m_populate_on_demand))
> 
> @@ -157,6 +160,7 @@ typedef unsigned int p2m_query_t;
>  #define p2m_is_readonly(_t) (p2m_to_mask(_t) & P2M_RO_TYPES)
>  #define p2m_is_discard_write(_t) (p2m_to_mask(_t) &
> P2M_DISCARD_WRITE_TYPES)
>  #define p2m_is_changeable(_t) (p2m_to_mask(_t) &
> P2M_CHANGEABLE_TYPES)
> +#define p2m_is_ioreq(_t) (p2m_to_mask(_t) & P2M_IOREQ_TYPES)
>  #define p2m_is_pod(_t) (p2m_to_mask(_t) & P2M_POD_TYPES)
>  #define p2m_is_grant(_t) (p2m_to_mask(_t) & P2M_GRANT_TYPES)
>  /* Grant types are *not* considered valid, because they can be
> @@ -178,6 +182,8 @@ typedef unsigned int p2m_query_t;
> 
>  #define p2m_allows_invalid_mfn(t) (p2m_to_mask(t) &
> P2M_INVALID_MFN_TYPES)
> 
> +#define p2m_check_changeable(t) (p2m_is_changeable(t) &&
> !p2m_is_ioreq(t))
> +
>  typedef enum {
>      p2m_host,
>      p2m_nested,
> @@ -349,6 +355,7 @@ struct p2m_domain {
>            * are to be emulated by an ioreq server.
>            */
>           unsigned int flags;
> +         long entry_count;
>       } ioreq;
>  };
> 
> --
> 1.9.1



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v9 5/5] x86/ioreq server: Synchronously reset outstanding p2m_ioreq_server entries when an ioreq server unmaps.
  2017-03-21 10:00   ` Paul Durrant
@ 2017-03-21 11:15     ` Yu Zhang
  2017-03-21 13:49       ` Paul Durrant
  0 siblings, 1 reply; 42+ messages in thread
From: Yu Zhang @ 2017-03-21 11:15 UTC (permalink / raw)
  To: Paul Durrant, xen-devel
  Cc: Andrew Cooper, zhiyuan.lv, Jan Beulich, George Dunlap



On 3/21/2017 6:00 PM, Paul Durrant wrote:
>> -----Original Message-----
>> From: Yu Zhang [mailto:yu.c.zhang@linux.intel.com]
>> Sent: 21 March 2017 02:53
>> To: xen-devel@lists.xen.org
>> Cc: zhiyuan.lv@intel.com; Paul Durrant <Paul.Durrant@citrix.com>; Jan
>> Beulich <jbeulich@suse.com>; Andrew Cooper
>> <Andrew.Cooper3@citrix.com>; George Dunlap
>> <George.Dunlap@citrix.com>
>> Subject: [PATCH v9 5/5] x86/ioreq server: Synchronously reset outstanding
>> p2m_ioreq_server entries when an ioreq server unmaps.
>>
>> After an ioreq server has unmapped, the remaining p2m_ioreq_server
>> entries need to be reset back to p2m_ram_rw. This patch does this
>> synchronously by iterating the p2m table.
>>
>> The synchronous resetting is necessary because we need to guarantee
>> the p2m table is clean before another ioreq server is mapped. And
>> since the sweeping of p2m table could be time consuming, it is done
>> with hypercall continuation.
>>
>> Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
>> ---
>> Cc: Paul Durrant <paul.durrant@citrix.com>
>> Cc: Jan Beulich <jbeulich@suse.com>
>> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
>> Cc: George Dunlap <george.dunlap@eu.citrix.com>
>>
>> changes in v2:
>>    - According to comments from Jan and Andrew: do not use the
>>      HVMOP type hypercall continuation method. Instead, adding
>>      an opaque in xen_dm_op_map_mem_type_to_ioreq_server to
>>      store the gfn.
>>    - According to comments from Jan: change routine's comments
>>      and name of parameters of p2m_finish_type_change().
>>
>> changes in v1:
>>    - This patch is splitted from patch 4 of last version.
>>    - According to comments from Jan: update the gfn_start for
>>      when use hypercall continuation to reset the p2m type.
>>    - According to comments from Jan: use min() to compare gfn_end
>>      and max mapped pfn in p2m_finish_type_change()
>> ---
>>   xen/arch/x86/hvm/dm.c     | 41
>> ++++++++++++++++++++++++++++++++++++++---
>>   xen/arch/x86/mm/p2m.c     | 29 +++++++++++++++++++++++++++++
>>   xen/include/asm-x86/p2m.h |  7 +++++++
>>   3 files changed, 74 insertions(+), 3 deletions(-)
>>
>> diff --git a/xen/arch/x86/hvm/dm.c b/xen/arch/x86/hvm/dm.c
>> index 3f9484d..a24d0f8 100644
>> --- a/xen/arch/x86/hvm/dm.c
>> +++ b/xen/arch/x86/hvm/dm.c
>> @@ -385,16 +385,51 @@ static int dm_op(domid_t domid,
>>
>>       case XEN_DMOP_map_mem_type_to_ioreq_server:
>>       {
>> -        const struct xen_dm_op_map_mem_type_to_ioreq_server *data =
>> +        struct xen_dm_op_map_mem_type_to_ioreq_server *data =
>>               &op.u.map_mem_type_to_ioreq_server;
>> +        unsigned long first_gfn = data->opaque;
>> +        unsigned long last_gfn;
>> +
>> +        const_op = false;
>>
>>           rc = -EOPNOTSUPP;
>>           /* Only support for HAP enabled hvm. */
>>           if ( !hap_enabled(d) )
>>               break;
>>
>> -        rc = hvm_map_mem_type_to_ioreq_server(d, data->id,
>> -                                              data->type, data->flags);
>> +        if ( first_gfn == 0 )
>> +            rc = hvm_map_mem_type_to_ioreq_server(d, data->id,
>> +                                                  data->type, data->flags);
>> +        /*
>> +         * Iterate p2m table when an ioreq server unmaps from
>> p2m_ioreq_server,
>> +         * and reset the remaining p2m_ioreq_server entries back to
>> p2m_ram_rw.
>> +         */
>> +        if ( (first_gfn > 0) || (data->flags == 0 && rc == 0) )
>> +        {
>> +            struct p2m_domain *p2m = p2m_get_hostp2m(d);
>> +
>> +            while ( read_atomic(&p2m->ioreq.entry_count) &&
>> +                    first_gfn <= p2m->max_mapped_pfn )
>> +            {
>> +                /* Iterate p2m table for 256 gfns each time. */
>> +                last_gfn = first_gfn + 0xff;
>> +
> Might be worth a comment here to sat that p2m_finish_type_change() limits last_gfn appropriately because it kind of looks wrong to be blindly calling it with first_gfn + 0xff. Or perhaps, rather than passing last_gfn, pass a 'max_nr' parameter of 256 instead. Then you can drop last_gfn altogether. If you prefer the parameters as they are then at least limit the scope of last_gfn to this while loop.
Thanks for your comments, Paul. :)
Well, setting last_gfn to first_gfn+0xff looks a bit awkward. But why would
using a 'max_nr' with a magic number, say 256, look better? Are there any
other benefits? :-)

Yu
>
>> +                p2m_finish_type_change(d, first_gfn, last_gfn,
>> +                                       p2m_ioreq_server, p2m_ram_rw);
>> +
>> +                first_gfn = last_gfn + 1;
>> +
>> +                /* Check for continuation if it's not the last iteration. */
>> +                if ( first_gfn <= p2m->max_mapped_pfn &&
>> +                     hypercall_preempt_check() )
>> +                {
>> +                    rc = -ERESTART;
>> +                    data->opaque = first_gfn;
>> +                    break;
>> +                }
>> +            }
>> +        }
>> +
>>           break;
>>       }
>>
>> diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
>> index e3e54f1..0a2f276 100644
>> --- a/xen/arch/x86/mm/p2m.c
>> +++ b/xen/arch/x86/mm/p2m.c
>> @@ -1038,6 +1038,35 @@ void p2m_change_type_range(struct domain *d,
>>       p2m_unlock(p2m);
>>   }
>>
>> +/* Synchronously modify the p2m type for a range of gfns from ot to nt. */
>> +void p2m_finish_type_change(struct domain *d,
> As I said above, consider a 'max_nr' parameter here rather than last_gfn.
>
>    Paul
>
>> +                            unsigned long first_gfn, unsigned long last_gfn,
>> +                            p2m_type_t ot, p2m_type_t nt)
>> +{
>> +    struct p2m_domain *p2m = p2m_get_hostp2m(d);
>> +    p2m_type_t t;
>> +    unsigned long gfn = first_gfn;
>> +
>> +    ASSERT(first_gfn <= last_gfn);
>> +    ASSERT(ot != nt);
>> +    ASSERT(p2m_is_changeable(ot) && p2m_is_changeable(nt));
>> +
>> +    p2m_lock(p2m);
>> +
>> +    last_gfn = min(last_gfn, p2m->max_mapped_pfn);
>> +    while ( gfn <= last_gfn )
>> +    {
>> +        get_gfn_query_unlocked(d, gfn, &t);
>> +
>> +        if ( t == ot )
>> +            p2m_change_type_one(d, gfn, t, nt);
>> +
>> +        gfn++;
>> +    }
>> +
>> +    p2m_unlock(p2m);
>> +}
>> +
>>   /*
>>    * Returns:
>>    *    0              for success
>> diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
>> index 395f125..3d665e8 100644
>> --- a/xen/include/asm-x86/p2m.h
>> +++ b/xen/include/asm-x86/p2m.h
>> @@ -611,6 +611,13 @@ void p2m_change_type_range(struct domain *d,
>>   int p2m_change_type_one(struct domain *d, unsigned long gfn,
>>                           p2m_type_t ot, p2m_type_t nt);
>>
>> +/* Synchronously change the p2m type for a range of gfns:
>> + * [first_gfn ... last_gfn]. */
>> +void p2m_finish_type_change(struct domain *d,
>> +                            unsigned long first_gfn,
>> +                            unsigned long last_gfn,
>> +                            p2m_type_t ot, p2m_type_t nt);
>> +
>>   /* Report a change affecting memory types. */
>>   void p2m_memory_type_changed(struct domain *d);
>>
>> --
>> 1.9.1
>



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v9 5/5] x86/ioreq server: Synchronously reset outstanding p2m_ioreq_server entries when an ioreq server unmaps.
  2017-03-21 11:15     ` Yu Zhang
@ 2017-03-21 13:49       ` Paul Durrant
  2017-03-21 14:14         ` Yu Zhang
  0 siblings, 1 reply; 42+ messages in thread
From: Paul Durrant @ 2017-03-21 13:49 UTC (permalink / raw)
  To: 'Yu Zhang', xen-devel
  Cc: Andrew Cooper, zhiyuan.lv, Jan Beulich, George Dunlap

> -----Original Message-----
[snip]
> >> +        if ( (first_gfn > 0) || (data->flags == 0 && rc == 0) )
> >> +        {
> >> +            struct p2m_domain *p2m = p2m_get_hostp2m(d);
> >> +
> >> +            while ( read_atomic(&p2m->ioreq.entry_count) &&
> >> +                    first_gfn <= p2m->max_mapped_pfn )
> >> +            {
> >> +                /* Iterate p2m table for 256 gfns each time. */
> >> +                last_gfn = first_gfn + 0xff;
> >> +
> > Might be worth a comment here to sat that p2m_finish_type_change()
> limits last_gfn appropriately because it kind of looks wrong to be blindly
> calling it with first_gfn + 0xff. Or perhaps, rather than passing last_gfn, pass a
> 'max_nr' parameter of 256 instead. Then you can drop last_gfn altogether. If
> you prefer the parameters as they are then at least limit the scope of
> last_gfn to this while loop.
> Thanks for your comments, Paul. :)
> Well, setting last_gfn with first_gfn+0xff looks a bit awkward. But why
> using a 'max_nr' with a magic number, say 256, looks better? Or any
> other benefits? :-)
> 

Well, to my eyes calling it max_nr in the function would make it clear that it's a limit rather than a definite count, and then passing 256 in the call would make it clear that it is the chosen batch size.

Does that make sense?

  Paul
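A minimal sketch of the 'max_nr' variant being suggested (hypothetical signature, not the code posted in this version of the series):

    /* Hypothetical: bound the sweep by a batch size instead of last_gfn. */
    void p2m_finish_type_change(struct domain *d,
                                unsigned long first_gfn, unsigned long max_nr,
                                p2m_type_t ot, p2m_type_t nt)
    {
        struct p2m_domain *p2m = p2m_get_hostp2m(d);
        unsigned long gfn = first_gfn;
        unsigned long last_gfn = min(first_gfn + max_nr - 1,
                                     p2m->max_mapped_pfn);
        p2m_type_t t;

        ASSERT(ot != nt);
        ASSERT(p2m_is_changeable(ot) && p2m_is_changeable(nt));

        p2m_lock(p2m);

        while ( gfn <= last_gfn )
        {
            get_gfn_query_unlocked(d, gfn, &t);

            if ( t == ot )
                p2m_change_type_one(d, gfn, t, nt);

            gfn++;
        }

        p2m_unlock(p2m);
    }

The dm_op() call site then simply passes the chosen batch size, e.g. p2m_finish_type_change(d, first_gfn, 256, p2m_ioreq_server, p2m_ram_rw), and advances first_gfn by 256.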



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v9 5/5] x86/ioreq server: Synchronously reset outstanding p2m_ioreq_server entries when an ioreq server unmaps.
  2017-03-21 13:49       ` Paul Durrant
@ 2017-03-21 14:14         ` Yu Zhang
  0 siblings, 0 replies; 42+ messages in thread
From: Yu Zhang @ 2017-03-21 14:14 UTC (permalink / raw)
  To: Paul Durrant, xen-devel
  Cc: Andrew Cooper, zhiyuan.lv, Jan Beulich, George Dunlap



On 3/21/2017 9:49 PM, Paul Durrant wrote:
>> -----Original Message-----
> [snip]
>>>> +        if ( (first_gfn > 0) || (data->flags == 0 && rc == 0) )
>>>> +        {
>>>> +            struct p2m_domain *p2m = p2m_get_hostp2m(d);
>>>> +
>>>> +            while ( read_atomic(&p2m->ioreq.entry_count) &&
>>>> +                    first_gfn <= p2m->max_mapped_pfn )
>>>> +            {
>>>> +                /* Iterate p2m table for 256 gfns each time. */
>>>> +                last_gfn = first_gfn + 0xff;
>>>> +
>>> Might be worth a comment here to sat that p2m_finish_type_change()
>> limits last_gfn appropriately because it kind of looks wrong to be blindly
>> calling it with first_gfn + 0xff. Or perhaps, rather than passing last_gfn, pass a
>> 'max_nr' parameter of 256 instead. Then you can drop last_gfn altogether. If
>> you prefer the parameters as they are then at least limit the scope of
>> last_gfn to this while loop.
>> Thanks for your comments, Paul. :)
>> Well, setting last_gfn with first_gfn+0xff looks a bit awkward. But why
>> using a 'max_nr' with a magic number, say 256, looks better? Or any
>> other benefits? :-)
>>
> Well, to my eyes calling it max_nr in the function would make it clear it's a limit rather than a definite count and then passing 256 in the call would make it clear that it is the chosen batch size.
>
> Does that make sense?

Sounds reasonable. Thanks! :-)
Yu
>    Paul
>
>



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v9 2/5] x86/ioreq server: Add DMOP to map guest ram with p2m_ioreq_server to an ioreq server.
  2017-03-21  2:52 ` [PATCH v9 2/5] x86/ioreq server: Add DMOP to map guest ram with p2m_ioreq_server to an ioreq server Yu Zhang
@ 2017-03-22  7:49   ` Tian, Kevin
  2017-03-22 10:12     ` Yu Zhang
  2017-03-22 14:21   ` Jan Beulich
  1 sibling, 1 reply; 42+ messages in thread
From: Tian, Kevin @ 2017-03-22  7:49 UTC (permalink / raw)
  To: Yu Zhang, xen-devel
  Cc: Nakajima, Jun, George Dunlap, Andrew Cooper, Tim Deegan,
	Paul Durrant, Lv, Zhiyuan, Jan Beulich

> From: Yu Zhang [mailto:yu.c.zhang@linux.intel.com]
> Sent: Tuesday, March 21, 2017 10:53 AM
> 
> A new DMOP, XEN_DMOP_map_mem_type_to_ioreq_server, is added to let
> one ioreq server claim/disclaim its responsibility for the handling of guest
> pages with p2m type p2m_ioreq_server. Users of this DMOP can specify
> which kind of operation is supposed to be emulated in a parameter named
> flags. Currently, this DMOP only supports the emulation of write operations;
> it can be further extended to support the emulation of read operations if an
> ioreq server has such a requirement in the future.

p2m_ioreq_server was already introduced earlier. Do you want to
give some background on the current state of that type? That would
help explain the purpose of this patch.

> 
> For now, we only support one ioreq server for this p2m type, so once an
> ioreq server has claimed its ownership, subsequent calls of
> XEN_DMOP_map_mem_type_to_ioreq_server will fail. Users can also
> disclaim the ownership of guest ram pages with p2m_ioreq_server by
> issuing this new DMOP with the ioreq server id set to the current owner's
> and the flags parameter set to 0.
> 
> Note both XEN_DMOP_map_mem_type_to_ioreq_server and
> p2m_ioreq_server are only supported for HVMs with HAP enabled.
> 
> Also note that only after one ioreq server claims its ownership of
> p2m_ioreq_server, will the p2m type change to p2m_ioreq_server be
> allowed.
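For context, a device model claiming the type (and later disclaiming it) would issue something along these lines. The do_dm_op() wrapper, domid and ioreq_server_id below are placeholders for whatever mechanism the emulator already uses to issue XEN_DMOP_* hypercalls; the structure and flag names are taken from the hunk further down:

    struct xen_dm_op op = {
        .op = XEN_DMOP_map_mem_type_to_ioreq_server,
        .u.map_mem_type_to_ioreq_server = {
            .id     = ioreq_server_id,                 /* previously created ioreq server */
            .type   = HVMMEM_ioreq_server,
            .flags  = XEN_DMOP_IOREQ_MEM_ACCESS_WRITE, /* claim: forward writes */
            .opaque = 0,                               /* must be zero on entry */
        },
    };

    rc = do_dm_op(domid, &op);   /* hypothetical wrapper around the dm_op hypercall */

    /* To disclaim ownership later, re-issue the same op with .flags = 0. */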
> 
> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
> Acked-by: Tim Deegan <tim@xen.org>
> ---
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> Cc: Paul Durrant <paul.durrant@citrix.com>
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> Cc: Jun Nakajima <jun.nakajima@intel.com>
> Cc: Kevin Tian <kevin.tian@intel.com>
> Cc: Tim Deegan <tim@xen.org>
> 
> changes in v8:
>   - According to comments from Jan & Paul: comments changes in
> hvmemul_do_io().
>   - According to comments from Jan: remove the redundant code which
> would only
>     be useful for read emulations.
>   - According to comments from Jan: change interface which maps mem type
> to
>     ioreq server, removed uint16_t pad and added an uint64_t opaque.
>   - Address other comments from Jan, i.e. correct return values; remove stray
>     cast.
> 
> changes in v7:
>   - Use new ioreq server interface -
> XEN_DMOP_map_mem_type_to_ioreq_server.
>   - According to comments from George: removed domain_pause/unpause()
> in
>     hvm_map_mem_type_to_ioreq_server(), because it's too expensive,
>     and we can avoid the:
>     a> deadlock issue existed in v6 patch, between p2m lock and ioreq server
>        lock by using these locks in the same order - solved in patch 4;
>     b> for race condition between vm exit and ioreq server unbinding, we can
>        just retry this instruction.
>   - According to comments from Jan and George: continue to clarify logic in
>     hvmemul_do_io().
>   - According to comments from Jan: clarify comment in
> p2m_set_ioreq_server().
> 
> changes in v6:
>   - Clarify logic in hvmemul_do_io().
>   - Use recursive lock for ioreq server lock.
>   - Remove debug print when mapping ioreq server.
>   - Clarify code in ept_p2m_type_to_flags() for consistency.
>   - Remove definition of P2M_IOREQ_HANDLE_WRITE_ACCESS.
>   - Add comments for HVMMEM_ioreq_server to note only changes
>     to/from HVMMEM_ram_rw are permitted.
>   - Add domain_pause/unpause() in hvm_map_mem_type_to_ioreq_server()
>     to avoid the race condition when a vm exit happens on a write-
>     protected page, just to find the ioreq server has been unmapped
>     already.
>   - Introduce a separate patch to delay the release of the p2m
>     lock to avoid the race condition.
>   - Introduce a separate patch to handle the read-modify-write
>     operations on a write-protected page.
> 
> changes in v5:
>   - Simplify logic in hvmemul_do_io().
>   - Use natural width types instead of fixed width types when possible.
>   - Do not grant executable permission for p2m_ioreq_server entries.
>   - Clarify comments and commit message.
>   - Introduce a separate patch to recalculate the p2m types after
>     the ioreq server unmaps the p2m_ioreq_server.
> 
> changes in v4:
>   - According to Paul's advice, add comments around the definition
>     of HVMMEM_ioreq_server in hvm_op.h.
>   - According to Wei Liu's comments, change the format of the commit
>     message.
> 
> changes in v3:
>   - Only support write emulation in this patch;
>   - Remove the code to handle race condition in hvmemul_do_io(),
>   - No need to reset the p2m type after an ioreq server has disclaimed
>     its ownership of p2m_ioreq_server;
>   - Only allow p2m type change to p2m_ioreq_server after an ioreq
>     server has claimed its ownership of p2m_ioreq_server;
>   - Only allow p2m type change to p2m_ioreq_server from pages with type
>     p2m_ram_rw, and vice versa;
>   - HVMOP_map_mem_type_to_ioreq_server interface change - use uint16,
>     instead of enum to specify the memory type;
>   - Function prototype change to p2m_get_ioreq_server();
>   - Coding style changes;
>   - Commit message changes;
>   - Add Tim's Acked-by.
> 
> changes in v2:
>   - Only support HAP enabled HVMs;
>   - Replace p2m_mem_type_changed() with p2m_change_entry_type_global()
>     to reset the p2m type, when an ioreq server tries to claim/disclaim
>     its ownership of p2m_ioreq_server;
>   - Comments changes.
> ---
>  xen/arch/x86/hvm/dm.c            | 37 ++++++++++++++++++--
>  xen/arch/x86/hvm/emulate.c       | 65
> ++++++++++++++++++++++++++++++++---
>  xen/arch/x86/hvm/ioreq.c         | 38 +++++++++++++++++++++
>  xen/arch/x86/mm/hap/nested_hap.c |  2 +-
>  xen/arch/x86/mm/p2m-ept.c        |  8 ++++-
>  xen/arch/x86/mm/p2m-pt.c         | 19 +++++++----
>  xen/arch/x86/mm/p2m.c            | 74
> ++++++++++++++++++++++++++++++++++++++++
>  xen/arch/x86/mm/shadow/multi.c   |  3 +-
>  xen/include/asm-x86/hvm/ioreq.h  |  2 ++
>  xen/include/asm-x86/p2m.h        | 26 ++++++++++++--
>  xen/include/public/hvm/dm_op.h   | 28 +++++++++++++++
>  xen/include/public/hvm/hvm_op.h  |  8 ++++-
>  12 files changed, 290 insertions(+), 20 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/dm.c b/xen/arch/x86/hvm/dm.c index
> 333c884..3f9484d 100644
> --- a/xen/arch/x86/hvm/dm.c
> +++ b/xen/arch/x86/hvm/dm.c
> @@ -173,9 +173,14 @@ static int modified_memory(struct domain *d,
> 
>  static bool allow_p2m_type_change(p2m_type_t old, p2m_type_t new)  {
> +    if ( new == p2m_ioreq_server )
> +        return old == p2m_ram_rw;
> +
> +    if ( old == p2m_ioreq_server )
> +        return new == p2m_ram_rw;
> +
>      return p2m_is_ram(old) ||
> -           (p2m_is_hole(old) && new == p2m_mmio_dm) ||
> -           (old == p2m_ioreq_server && new == p2m_ram_rw);
> +           (p2m_is_hole(old) && new == p2m_mmio_dm);
>  }
> 
>  static int set_mem_type(struct domain *d, @@ -202,6 +207,19 @@ static int
> set_mem_type(struct domain *d,
>           unlikely(data->mem_type == HVMMEM_unused) )
>          return -EINVAL;
> 
> +    if ( data->mem_type  == HVMMEM_ioreq_server )
> +    {
> +        unsigned int flags;
> +
> +        /* HVMMEM_ioreq_server is only supported for HAP enabled hvm. */
> +        if ( !hap_enabled(d) )
> +            return -EOPNOTSUPP;
> +
> +        /* Do not change to HVMMEM_ioreq_server if no ioreq server mapped.
> */
> +        if ( !p2m_get_ioreq_server(d, &flags) )
> +            return -EINVAL;
> +    }
> +
>      while ( iter < data->nr )
>      {
>          unsigned long pfn = data->first_pfn + iter; @@ -365,6 +383,21 @@
> static int dm_op(domid_t domid,
>          break;
>      }
> 
> +    case XEN_DMOP_map_mem_type_to_ioreq_server:
> +    {
> +        const struct xen_dm_op_map_mem_type_to_ioreq_server *data =
> +            &op.u.map_mem_type_to_ioreq_server;
> +
> +        rc = -EOPNOTSUPP;
> +        /* Only support for HAP enabled hvm. */

Isn't it obvious from code?

> +        if ( !hap_enabled(d) )
> +            break;
> +
> +        rc = hvm_map_mem_type_to_ioreq_server(d, data->id,
> +                                              data->type, data->flags);
> +        break;
> +    }
> +
>      case XEN_DMOP_set_ioreq_server_state:
>      {
>          const struct xen_dm_op_set_ioreq_server_state *data = diff --git
> a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c index
> f36d7c9..37139e6 100644
> --- a/xen/arch/x86/hvm/emulate.c
> +++ b/xen/arch/x86/hvm/emulate.c
> @@ -99,6 +99,7 @@ static int hvmemul_do_io(
>      uint8_t dir, bool_t df, bool_t data_is_addr, uintptr_t data)  {
>      struct vcpu *curr = current;
> +    struct domain *currd = curr->domain;
>      struct hvm_vcpu_io *vio = &curr->arch.hvm_vcpu.hvm_io;
>      ioreq_t p = {
>          .type = is_mmio ? IOREQ_TYPE_COPY : IOREQ_TYPE_PIO, @@ -140,7
> +141,7 @@ static int hvmemul_do_io(
>               (p.dir != dir) ||
>               (p.df != df) ||
>               (p.data_is_ptr != data_is_addr) )
> -            domain_crash(curr->domain);
> +            domain_crash(currd);
> 
>          if ( data_is_addr )
>              return X86EMUL_UNHANDLEABLE; @@ -177,8 +178,64 @@ static int
> hvmemul_do_io(
>          break;
>      case X86EMUL_UNHANDLEABLE:
>      {
> -        struct hvm_ioreq_server *s =
> -            hvm_select_ioreq_server(curr->domain, &p);
> +        /*
> +         * Xen isn't emulating the instruction internally, so see if
> +         * there's an ioreq server that can handle it. Rules:
> +         *
> +         * - PIO and "normal" MMIO run through hvm_select_ioreq_server()

why highlight "normal" here? What would an "abnormal" MMIO
be here? The p2m_ioreq_server type?

> +         * to choose the ioreq server by range. If no server is found,
> +         * the access is ignored.
> +         *
> +         * - p2m_ioreq_server accesses are handled by the designated
> +         * ioreq_server for the domain, but there are some corner
> +         * cases:

since only one case is listed, this should read "there is a corner case"

> +         *
> +         *   - If the domain ioreq_server is NULL, assume there is a
> +         *   race between the unbinding of ioreq server and guest fault
> +         *   so re-try the instruction.
> +         */
> +        struct hvm_ioreq_server *s = NULL;
> +        p2m_type_t p2mt = p2m_invalid;
> +
> +        if ( is_mmio )
> +        {
> +            unsigned long gmfn = paddr_to_pfn(addr);
> +
> +            get_gfn_query_unlocked(currd, gmfn, &p2mt);
> +
> +            if ( p2mt == p2m_ioreq_server )
> +            {
> +                unsigned int flags;
> +
> +                /*
> +                 * Value of s could be stale, when we lost a race

better to describe it at a higher level, e.g. just "no ioreq server is
found".

what's the meaning of "lost a race"? Shouldn't it be
"likely we suffer from a race with..."?

> +                 * with dm_op which unmaps p2m_ioreq_server from the
> +                 * ioreq server. Yet there's no cheap way to avoid

again, rather than talking about specific code, focus on the operation,
e.g. "a race with an unmap operation on the ioreq server"

> +                 * this, so device model need to do the check.
> +                 */

How is the above comment related to the line below?

> +                s = p2m_get_ioreq_server(currd, &flags);
> +
> +                /*
> +                 * If p2mt is ioreq_server but ioreq_server is NULL,

p2mt is definitely ioreq_server within this if condition.

> +                 * we probably lost a race with unbinding of ioreq
> +                 * server, just retry the access.
> +                 */

This looks redundant with the earlier comment. Or should the earlier
one just be removed?

> +                if ( s == NULL )
> +                {
> +                    rc = X86EMUL_RETRY;
> +                    vio->io_req.state = STATE_IOREQ_NONE;
> +                    break;
> +                }
> +            }
> +        }
> +
> +        /*
> +         * Value of s could be stale, when we lost a race with dm_op
> +         * which unmaps this PIO/MMIO address from the ioreq server.
> +         * The device model side need to do the check.
> +         */

Another duplicated comment. The code below is actually for the 'normal'
MMIO case...

> +        if ( !s )
> +            s = hvm_select_ioreq_server(currd, &p);
> 
>          /* If there is no suitable backing DM, just ignore accesses */
>          if ( !s )
> @@ -189,7 +246,7 @@ static int hvmemul_do_io(
>          else
>          {
>              rc = hvm_send_ioreq(s, &p, 0);
> -            if ( rc != X86EMUL_RETRY || curr->domain->is_shutting_down )
> +            if ( rc != X86EMUL_RETRY || currd->is_shutting_down )
>                  vio->io_req.state = STATE_IOREQ_NONE;
>              else if ( data_is_addr )
>                  rc = X86EMUL_OKAY;
> diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c index
> ad2edad..746799f 100644
> --- a/xen/arch/x86/hvm/ioreq.c
> +++ b/xen/arch/x86/hvm/ioreq.c
> @@ -753,6 +753,8 @@ int hvm_destroy_ioreq_server(struct domain *d,
> ioservid_t id)
> 
>          domain_pause(d);
> 
> +        p2m_destroy_ioreq_server(d, s);
> +
>          hvm_ioreq_server_disable(s, 0);
> 
>          list_del(&s->list_entry);
> @@ -914,6 +916,42 @@ int
> hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id,
>      return rc;
>  }
> 
> +int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id,
> +                                     uint32_t type, uint32_t flags) {
> +    struct hvm_ioreq_server *s;
> +    int rc;
> +
> +    /* For now, only HVMMEM_ioreq_server is supported. */

obvious comment

> +    if ( type != HVMMEM_ioreq_server )
> +        return -EINVAL;
> +
> +    /* For now, only write emulation is supported. */

ditto. 

> +    if ( flags & ~(XEN_DMOP_IOREQ_MEM_ACCESS_WRITE) )
> +        return -EINVAL;
> +
> +    spin_lock_recursive(&d->arch.hvm_domain.ioreq_server.lock);
> +
> +    rc = -ENOENT;
> +    list_for_each_entry ( s,
> +                          &d->arch.hvm_domain.ioreq_server.list,
> +                          list_entry )
> +    {
> +        if ( s == d->arch.hvm_domain.default_ioreq_server )
> +            continue;

any reason why we cannot let the default server claim this
new type?

> +
> +        if ( s->id == id )
> +        {
> +            rc = p2m_set_ioreq_server(d, flags, s);
> +            break;
> +        }
> +    }
> +
> +    spin_unlock_recursive(&d->arch.hvm_domain.ioreq_server.lock);
> +
> +    return rc;
> +}
> +
>  int hvm_set_ioreq_server_state(struct domain *d, ioservid_t id,
>                                 bool_t enabled)  { diff --git
> a/xen/arch/x86/mm/hap/nested_hap.c
> b/xen/arch/x86/mm/hap/nested_hap.c
> index 162afed..408ea7f 100644
> --- a/xen/arch/x86/mm/hap/nested_hap.c
> +++ b/xen/arch/x86/mm/hap/nested_hap.c
> @@ -172,7 +172,7 @@ nestedhap_walk_L0_p2m(struct p2m_domain *p2m,
> paddr_t L1_gpa, paddr_t *L0_gpa,
>      if ( *p2mt == p2m_mmio_direct )
>          goto direct_mmio_out;
>      rc = NESTEDHVM_PAGEFAULT_MMIO;
> -    if ( *p2mt == p2m_mmio_dm )
> +    if ( *p2mt == p2m_mmio_dm || *p2mt == p2m_ioreq_server )
>          goto out;
> 
>      rc = NESTEDHVM_PAGEFAULT_L0_ERROR;
> diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
> index 568944f..cc1eb21 100644
> --- a/xen/arch/x86/mm/p2m-ept.c
> +++ b/xen/arch/x86/mm/p2m-ept.c
> @@ -131,6 +131,13 @@ static void ept_p2m_type_to_flags(struct
> p2m_domain *p2m, ept_entry_t *entry,
>              entry->r = entry->w = entry->x = 1;
>              entry->a = entry->d = !!cpu_has_vmx_ept_ad;
>              break;
> +        case p2m_ioreq_server:
> +            entry->r = 1;
> +            entry->w = !(p2m->ioreq.flags &
> XEN_DMOP_IOREQ_MEM_ACCESS_WRITE);
> +            entry->x = 0;
> +            entry->a = !!cpu_has_vmx_ept_ad;
> +            entry->d = entry->w && entry->a;
> +            break;
>          case p2m_mmio_direct:
>              entry->r = entry->x = 1;
>              entry->w = !rangeset_contains_singleton(mmio_ro_ranges,
> @@ -170,7 +177,6 @@ static void ept_p2m_type_to_flags(struct
> p2m_domain *p2m, ept_entry_t *entry,
>              entry->a = entry->d = !!cpu_has_vmx_ept_ad;
>              break;
>          case p2m_grant_map_ro:
> -        case p2m_ioreq_server:
>              entry->r = 1;
>              entry->w = entry->x = 0;
>              entry->a = !!cpu_has_vmx_ept_ad; diff --git
> a/xen/arch/x86/mm/p2m-pt.c b/xen/arch/x86/mm/p2m-pt.c index
> 07e2ccd..f6c45ec 100644
> --- a/xen/arch/x86/mm/p2m-pt.c
> +++ b/xen/arch/x86/mm/p2m-pt.c
> @@ -70,7 +70,9 @@ static const unsigned long pgt[] = {
>      PGT_l3_page_table
>  };
> 
> -static unsigned long p2m_type_to_flags(p2m_type_t t, mfn_t mfn,
> +static unsigned long p2m_type_to_flags(const struct p2m_domain *p2m,
> +                                       p2m_type_t t,
> +                                       mfn_t mfn,
>                                         unsigned int level)  {
>      unsigned long flags;
> @@ -92,8 +94,12 @@ static unsigned long p2m_type_to_flags(p2m_type_t t,
> mfn_t mfn,
>      default:
>          return flags | _PAGE_NX_BIT;
>      case p2m_grant_map_ro:
> -    case p2m_ioreq_server:
>          return flags | P2M_BASE_FLAGS | _PAGE_NX_BIT;
> +    case p2m_ioreq_server:
> +        flags |= P2M_BASE_FLAGS | _PAGE_RW | _PAGE_NX_BIT;
> +        if ( p2m->ioreq.flags & XEN_DMOP_IOREQ_MEM_ACCESS_WRITE )
> +            return flags & ~_PAGE_RW;
> +        return flags;
>      case p2m_ram_ro:
>      case p2m_ram_logdirty:
>      case p2m_ram_shared:
> @@ -440,7 +446,8 @@ static int do_recalc(struct p2m_domain *p2m,
> unsigned long gfn)
>              p2m_type_t p2mt = p2m_is_logdirty_range(p2m, gfn & mask, gfn |
> ~mask)
>                                ? p2m_ram_logdirty : p2m_ram_rw;
>              unsigned long mfn = l1e_get_pfn(e);
> -            unsigned long flags = p2m_type_to_flags(p2mt, _mfn(mfn), level);
> +            unsigned long flags = p2m_type_to_flags(p2m, p2mt,
> +                                                    _mfn(mfn), level);
> 
>              if ( level )
>              {
> @@ -578,7 +585,7 @@ p2m_pt_set_entry(struct p2m_domain *p2m,
> unsigned long gfn, mfn_t mfn,
>          ASSERT(!mfn_valid(mfn) || p2mt != p2m_mmio_direct);
>          l3e_content = mfn_valid(mfn) || p2m_allows_invalid_mfn(p2mt)
>              ? l3e_from_pfn(mfn_x(mfn),
> -                           p2m_type_to_flags(p2mt, mfn, 2) | _PAGE_PSE)
> +                           p2m_type_to_flags(p2m, p2mt, mfn, 2) |
> + _PAGE_PSE)
>              : l3e_empty();
>          entry_content.l1 = l3e_content.l3;
> 
> @@ -615,7 +622,7 @@ p2m_pt_set_entry(struct p2m_domain *p2m,
> unsigned long gfn, mfn_t mfn,
> 
>          if ( mfn_valid(mfn) || p2m_allows_invalid_mfn(p2mt) )
>              entry_content = p2m_l1e_from_pfn(mfn_x(mfn),
> -                                             p2m_type_to_flags(p2mt, mfn, 0));
> +                                         p2m_type_to_flags(p2m, p2mt,
> + mfn, 0));
>          else
>              entry_content = l1e_empty();
> 
> @@ -652,7 +659,7 @@ p2m_pt_set_entry(struct p2m_domain *p2m,
> unsigned long gfn, mfn_t mfn,
>          ASSERT(!mfn_valid(mfn) || p2mt != p2m_mmio_direct);
>          if ( mfn_valid(mfn) || p2m_allows_invalid_mfn(p2mt) )
>              l2e_content = l2e_from_pfn(mfn_x(mfn),
> -                                       p2m_type_to_flags(p2mt, mfn, 1) |
> +                                       p2m_type_to_flags(p2m, p2mt,
> + mfn, 1) |
>                                         _PAGE_PSE);
>          else
>              l2e_content = l2e_empty();
> diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c index
> a5651a3..dd4e477 100644
> --- a/xen/arch/x86/mm/p2m.c
> +++ b/xen/arch/x86/mm/p2m.c
> @@ -82,6 +82,8 @@ static int p2m_initialise(struct domain *d, struct
> p2m_domain *p2m)
>      else
>          p2m_pt_init(p2m);
> 
> +    spin_lock_init(&p2m->ioreq.lock);
> +
>      return ret;
>  }
> 
> @@ -286,6 +288,78 @@ void p2m_memory_type_changed(struct domain *d)
>      }
>  }
> 
> +int p2m_set_ioreq_server(struct domain *d,
> +                         unsigned int flags,
> +                         struct hvm_ioreq_server *s) {
> +    struct p2m_domain *p2m = p2m_get_hostp2m(d);
> +    int rc;
> +
> +    /*
> +     * Use lock to prevent concurrent setting attempts
> +     * from multiple ioreq serers.

serers -> servers

> +     */
> +    spin_lock(&p2m->ioreq.lock);
> +
> +    /* Unmap ioreq server from p2m type by passing flags with 0. */
> +    if ( flags == 0 )
> +    {
> +        rc = -EINVAL;
> +        if ( p2m->ioreq.server != s )
> +            goto out;
> +
> +        p2m->ioreq.server = NULL;
> +        p2m->ioreq.flags = 0;
> +    }
> +    else
> +    {
> +        rc = -EBUSY;
> +        if ( p2m->ioreq.server != NULL )
> +            goto out;
> +
> +        p2m->ioreq.server = s;
> +        p2m->ioreq.flags = flags;
> +    }
> +
> +    rc = 0;
> +
> + out:
> +    spin_unlock(&p2m->ioreq.lock);
> +
> +    return rc;
> +}
> +
> +struct hvm_ioreq_server *p2m_get_ioreq_server(struct domain *d,
> +                                              unsigned int *flags) {
> +    struct p2m_domain *p2m = p2m_get_hostp2m(d);
> +    struct hvm_ioreq_server *s;
> +
> +    spin_lock(&p2m->ioreq.lock);
> +
> +    s = p2m->ioreq.server;
> +    *flags = p2m->ioreq.flags;
> +
> +    spin_unlock(&p2m->ioreq.lock);
> +    return s;
> +}
> +
> +void p2m_destroy_ioreq_server(const struct domain *d,
> +                              const struct hvm_ioreq_server *s) {
> +    struct p2m_domain *p2m = p2m_get_hostp2m(d);
> +
> +    spin_lock(&p2m->ioreq.lock);
> +
> +    if ( p2m->ioreq.server == s )
> +    {
> +        p2m->ioreq.server = NULL;
> +        p2m->ioreq.flags = 0;
> +    }
> +
> +    spin_unlock(&p2m->ioreq.lock);
> +}
> +
>  void p2m_enable_hardware_log_dirty(struct domain *d)  {
>      struct p2m_domain *p2m = p2m_get_hostp2m(d); diff --git
> a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c
> index 7ea9d81..521b639 100644
> --- a/xen/arch/x86/mm/shadow/multi.c
> +++ b/xen/arch/x86/mm/shadow/multi.c
> @@ -3269,8 +3269,7 @@ static int sh_page_fault(struct vcpu *v,
>      }
> 
>      /* Need to hand off device-model MMIO to the device model */
> -    if ( p2mt == p2m_mmio_dm
> -         || (p2mt == p2m_ioreq_server && ft == ft_demand_write) )
> +    if ( p2mt == p2m_mmio_dm )
>      {
>          gpa = guest_walk_to_gpa(&gw);
>          goto mmio;
> diff --git a/xen/include/asm-x86/hvm/ioreq.h b/xen/include/asm-
> x86/hvm/ioreq.h index fbf2c74..b43667a 100644
> --- a/xen/include/asm-x86/hvm/ioreq.h
> +++ b/xen/include/asm-x86/hvm/ioreq.h
> @@ -37,6 +37,8 @@ int hvm_map_io_range_to_ioreq_server(struct domain
> *d, ioservid_t id,  int hvm_unmap_io_range_from_ioreq_server(struct
> domain *d, ioservid_t id,
>                                           uint32_t type, uint64_t start,
>                                           uint64_t end);
> +int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id,
> +                                     uint32_t type, uint32_t flags);
>  int hvm_set_ioreq_server_state(struct domain *d, ioservid_t id,
>                                 bool_t enabled);
> 
> diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h index
> 470d29d..3786680 100644
> --- a/xen/include/asm-x86/p2m.h
> +++ b/xen/include/asm-x86/p2m.h
> @@ -89,7 +89,8 @@ typedef unsigned int p2m_query_t;
>                         | p2m_to_mask(p2m_ram_paging_out)      \
>                         | p2m_to_mask(p2m_ram_paged)           \
>                         | p2m_to_mask(p2m_ram_paging_in)       \
> -                       | p2m_to_mask(p2m_ram_shared))
> +                       | p2m_to_mask(p2m_ram_shared)          \
> +                       | p2m_to_mask(p2m_ioreq_server))
> 
>  /* Types that represent a physmap hole that is ok to replace with a shared
>   * entry */
> @@ -111,8 +112,7 @@ typedef unsigned int p2m_query_t;
>  #define P2M_RO_TYPES (p2m_to_mask(p2m_ram_logdirty)     \
>                        | p2m_to_mask(p2m_ram_ro)         \
>                        | p2m_to_mask(p2m_grant_map_ro)   \
> -                      | p2m_to_mask(p2m_ram_shared)     \
> -                      | p2m_to_mask(p2m_ioreq_server))
> +                      | p2m_to_mask(p2m_ram_shared))
> 
>  /* Write-discard types, which should discard the write operations */
>  #define P2M_DISCARD_WRITE_TYPES (p2m_to_mask(p2m_ram_ro)     \
> @@ -336,6 +336,20 @@ struct p2m_domain {
>          struct ept_data ept;
>          /* NPT-equivalent structure could be added here. */
>      };
> +
> +     struct {
> +         spinlock_t lock;
> +         /*
> +          * ioreq server who's responsible for the emulation of
> +          * gfns with specific p2m type(for now, p2m_ioreq_server).
> +          */
> +         struct hvm_ioreq_server *server;
> +         /*
> +          * flags specifies whether read, write or both operations
> +          * are to be emulated by an ioreq server.
> +          */
> +         unsigned int flags;
> +     } ioreq;
>  };
> 
>  /* get host p2m table */
> @@ -827,6 +841,12 @@ static inline unsigned int
> p2m_get_iommu_flags(p2m_type_t p2mt, mfn_t mfn)
>      return flags;
>  }
> 
> +int p2m_set_ioreq_server(struct domain *d, unsigned int flags,
> +                         struct hvm_ioreq_server *s); struct
> +hvm_ioreq_server *p2m_get_ioreq_server(struct domain *d,
> +                                              unsigned int *flags);
> +void p2m_destroy_ioreq_server(const struct domain *d, const struct
> +hvm_ioreq_server *s);
> +
>  #endif /* _XEN_ASM_X86_P2M_H */
> 
>  /*
> diff --git a/xen/include/public/hvm/dm_op.h
> b/xen/include/public/hvm/dm_op.h index f54cece..2a36833 100644
> --- a/xen/include/public/hvm/dm_op.h
> +++ b/xen/include/public/hvm/dm_op.h
> @@ -318,6 +318,32 @@ struct xen_dm_op_inject_msi {
>      uint64_aligned_t addr;
>  };
> 
> +/*
> + * XEN_DMOP_map_mem_type_to_ioreq_server : map or unmap the
> IOREQ Server <id>
> + *                                      to specific memroy type <type>

memroy->memory

> + *                                      for specific accesses <flags>
> + *
> + * For now, flags only accept the value of
> +XEN_DMOP_IOREQ_MEM_ACCESS_WRITE,
> + * which means only write operations are to be forwarded to an ioreq
> server.
> + * Support for the emulation of read operations can be added when an
> +ioreq
> + * server has such requirement in future.
> + */
> +#define XEN_DMOP_map_mem_type_to_ioreq_server 15
> +
> +struct xen_dm_op_map_mem_type_to_ioreq_server {
> +    ioservid_t id;      /* IN - ioreq server id */
> +    uint16_t type;      /* IN - memory type */
> +    uint32_t flags;     /* IN - types of accesses to be forwarded to the
> +                           ioreq server. flags with 0 means to unmap the
> +                           ioreq server */
> +
> +#define XEN_DMOP_IOREQ_MEM_ACCESS_READ (1u << 0) #define
> +XEN_DMOP_IOREQ_MEM_ACCESS_WRITE (1u << 1)
> +
> +    uint64_t opaque;    /* IN/OUT - only used for hypercall continuation,
> +                           has to be set to zero by the caller */ };
> +
>  struct xen_dm_op {
>      uint32_t op;
>      uint32_t pad;
> @@ -336,6 +362,8 @@ struct xen_dm_op {
>          struct xen_dm_op_set_mem_type set_mem_type;
>          struct xen_dm_op_inject_event inject_event;
>          struct xen_dm_op_inject_msi inject_msi;
> +        struct xen_dm_op_map_mem_type_to_ioreq_server
> +                map_mem_type_to_ioreq_server;
>      } u;
>  };
> 
> diff --git a/xen/include/public/hvm/hvm_op.h
> b/xen/include/public/hvm/hvm_op.h index bc00ef0..0bdafdf 100644
> --- a/xen/include/public/hvm/hvm_op.h
> +++ b/xen/include/public/hvm/hvm_op.h
> @@ -93,7 +93,13 @@ typedef enum {
>      HVMMEM_unused,             /* Placeholder; setting memory to this type
>                                    will fail for code after 4.7.0 */  #endif
> -    HVMMEM_ioreq_server
> +    HVMMEM_ioreq_server        /* Memory type claimed by an ioreq server;
> type
> +                                  changes to this value are only allowed after
> +                                  an ioreq server has claimed its ownership.
> +                                  Only pages with HVMMEM_ram_rw are allowed to
> +                                  change to this type; conversely, pages with
> +                                  this type are only allowed to be changed back
> +                                  to HVMMEM_ram_rw. */
>  } hvmmem_type_t;
> 
>  /* Hint from PV drivers for pagetable destruction. */
> --
> 1.9.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v9 4/5] x86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries.
  2017-03-21  2:52 ` [PATCH v9 4/5] x86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries Yu Zhang
  2017-03-21 10:05   ` Paul Durrant
@ 2017-03-22  8:10   ` Tian, Kevin
  2017-03-22 10:12     ` Yu Zhang
  2017-03-22 14:29   ` Jan Beulich
  2 siblings, 1 reply; 42+ messages in thread
From: Tian, Kevin @ 2017-03-22  8:10 UTC (permalink / raw)
  To: Yu Zhang, xen-devel
  Cc: Nakajima, Jun, George Dunlap, Andrew Cooper, Paul Durrant, Lv,
	Zhiyuan, Jan Beulich

> From: Yu Zhang [mailto:yu.c.zhang@linux.intel.com]
> Sent: Tuesday, March 21, 2017 10:53 AM
> 
> After an ioreq server has unmapped, the remaining p2m_ioreq_server
> entries need to be reset back to p2m_ram_rw. This patch does this
> asynchronously with the current p2m_change_entry_type_global() interface.
> 
> This patch also disallows live migration, when there's still any outstanding
> p2m_ioreq_server entry left. The core reason is our current implementation
> of p2m_change_entry_type_global() can not tell the state of
> p2m_ioreq_server entries(can not decide if an entry is to be emulated or to
> be resynced).

Don't quite get this point. change_global is triggered only upon
unmap. At that point there is no ioreq server to emulate the
write operations on those entries. All the things required is
just to change the type. What's the exact decision required here?

btw does it mean that live migration can be still supported as long as
device model proactively unmaps write-protected pages before
starting live migration?


> 
> Note: new field entry_count is introduced in struct p2m_domain, to record
> the number of p2m_ioreq_server p2m page table entries.
> One nature of these entries is that they only point to 4K sized page frames,
> because all p2m_ioreq_server entries are originated from p2m_ram_rw ones
> in p2m_change_type_one(). We do not need to worry about the counting for
> 2M/1G sized pages.
> 
> Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
> ---
> Cc: Paul Durrant <paul.durrant@citrix.com>
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> Cc: Jun Nakajima <jun.nakajima@intel.com>
> Cc: Kevin Tian <kevin.tian@intel.com>
> 
> changes in v4:
>   - According to comments from Jan: use ASSERT() instead of 'if'
>     condition in p2m_change_type_one().
>   - According to comments from Jan: commit message changes to mention
>     the p2m_ioreq_server are all based on 4K sized pages.
> 
> changes in v3:
>   - Move the synchronously resetting logic into patch 5.
>   - According to comments from Jan: introduce p2m_check_changeable()
>     to clarify the p2m type change code.
>   - According to comments from George: use locks in the same order
>     to avoid deadlock, call p2m_change_entry_type_global() after unmap
>     of the ioreq server is finished.
> 
> changes in v2:
>   - Move the calculation of ioreq server page entry_cout into
>     p2m_change_type_one() so that we do not need a seperate lock.
>     Note: entry_count is also calculated in resolve_misconfig()/
>     do_recalc(), fortunately callers of both routines have p2m
>     lock protected already.
>   - Simplify logic in hvmop_set_mem_type().
>   - Introduce routine p2m_finish_type_change() to walk the p2m
>     table and do the p2m reset.
> ---
>  xen/arch/x86/hvm/ioreq.c  |  8 ++++++++
>  xen/arch/x86/mm/hap/hap.c |  9 +++++++++
>  xen/arch/x86/mm/p2m-ept.c |  8 +++++++-
>  xen/arch/x86/mm/p2m-pt.c  | 13 +++++++++++--
>  xen/arch/x86/mm/p2m.c     | 20 ++++++++++++++++++++
>  xen/include/asm-x86/p2m.h |  9 ++++++++-
>  6 files changed, 63 insertions(+), 4 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c
> index 746799f..102c6c2 100644
> --- a/xen/arch/x86/hvm/ioreq.c
> +++ b/xen/arch/x86/hvm/ioreq.c
> @@ -949,6 +949,14 @@ int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id,
> 
>      spin_unlock_recursive(&d->arch.hvm_domain.ioreq_server.lock);
> 
> +    if ( rc == 0 && flags == 0 )
> +    {
> +        struct p2m_domain *p2m = p2m_get_hostp2m(d);
> +
> +        if ( read_atomic(&p2m->ioreq.entry_count) )
> +            p2m_change_entry_type_global(d, p2m_ioreq_server, p2m_ram_rw);
> +    }
> +
>      return rc;
>  }
> 
> diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
> index a57b385..6ec950a 100644
> --- a/xen/arch/x86/mm/hap/hap.c
> +++ b/xen/arch/x86/mm/hap/hap.c
> @@ -187,6 +187,15 @@ out:
>   */
>  static int hap_enable_log_dirty(struct domain *d, bool_t log_global)
>  {
> +    struct p2m_domain *p2m = p2m_get_hostp2m(d);
> +
> +    /*
> +     * Refuse to turn on global log-dirty mode if
> +     * there's outstanding p2m_ioreq_server pages.
> +     */
> +    if ( log_global && read_atomic(&p2m->ioreq.entry_count) )
> +        return -EBUSY;

I know this has been discussed before, but didn't remember
the detail reason - why cannot allow log-dirty mode when 
there are still outstanding p2m_ioreq_server pages? Cannot
we mark related page as dirty when forwarding write emulation
request to corresponding ioreq server?

> +
>      /* turn on PG_log_dirty bit in paging mode */
>      paging_lock(d);
>      d->arch.paging.mode |= PG_log_dirty;
> diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
> index cc1eb21..1df3d09 100644
> --- a/xen/arch/x86/mm/p2m-ept.c
> +++ b/xen/arch/x86/mm/p2m-ept.c
> @@ -544,6 +544,12 @@ static int resolve_misconfig(struct p2m_domain *p2m, unsigned long gfn)
>                      e.ipat = ipat;
>                      if ( e.recalc && p2m_is_changeable(e.sa_p2mt) )
>                      {
> +                         if ( e.sa_p2mt == p2m_ioreq_server )
> +                         {
> +                             p2m->ioreq.entry_count--;
> +                             ASSERT(p2m->ioreq.entry_count >= 0);
> +                         }
> +
>                           e.sa_p2mt = p2m_is_logdirty_range(p2m, gfn + i, gfn + i)
>                                       ? p2m_ram_logdirty : p2m_ram_rw;
>                           ept_p2m_type_to_flags(p2m, &e, e.sa_p2mt, e.access);
> @@ -965,7 +971,7 @@ static mfn_t ept_get_entry(struct p2m_domain *p2m,
>      if ( is_epte_valid(ept_entry) )
>      {
>          if ( (recalc || ept_entry->recalc) &&
> -             p2m_is_changeable(ept_entry->sa_p2mt) )
> +             p2m_check_changeable(ept_entry->sa_p2mt) )
>              *t = p2m_is_logdirty_range(p2m, gfn, gfn) ? p2m_ram_logdirty
>                                                        : p2m_ram_rw;
>          else
> diff --git a/xen/arch/x86/mm/p2m-pt.c b/xen/arch/x86/mm/p2m-pt.c
> index f6c45ec..169de75 100644
> --- a/xen/arch/x86/mm/p2m-pt.c
> +++ b/xen/arch/x86/mm/p2m-pt.c
> @@ -436,11 +436,13 @@ static int do_recalc(struct p2m_domain *p2m, unsigned long gfn)
>           needs_recalc(l1, *pent) )
>      {
>          l1_pgentry_t e = *pent;
> +        p2m_type_t p2mt_old;
> 
>          if ( !valid_recalc(l1, e) )
>              P2M_DEBUG("bogus recalc leaf at d%d:%lx:%u\n",
>                        p2m->domain->domain_id, gfn, level);
> -        if ( p2m_is_changeable(p2m_flags_to_type(l1e_get_flags(e))) )
> +        p2mt_old = p2m_flags_to_type(l1e_get_flags(e));
> +        if ( p2m_is_changeable(p2mt_old) )
>          {
>              unsigned long mask = ~0UL << (level * PAGETABLE_ORDER);
>              p2m_type_t p2mt = p2m_is_logdirty_range(p2m, gfn & mask, gfn | ~mask)
> @@ -460,6 +462,13 @@ static int do_recalc(struct p2m_domain *p2m, unsigned long gfn)
>                       mfn &= ~(_PAGE_PSE_PAT >> PAGE_SHIFT);
>                  flags |= _PAGE_PSE;
>              }
> +
> +            if ( p2mt_old == p2m_ioreq_server )
> +            {
> +                p2m->ioreq.entry_count--;
> +                ASSERT(p2m->ioreq.entry_count >= 0);
> +            }
> +
>              e = l1e_from_pfn(mfn, flags);
>              p2m_add_iommu_flags(&e, level,
>                                  (p2mt == p2m_ram_rw)
> @@ -729,7 +738,7 @@ p2m_pt_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
>  static inline p2m_type_t recalc_type(bool_t recalc, p2m_type_t t,
>                                       struct p2m_domain *p2m, unsigned long gfn)
>  {
> -    if ( !recalc || !p2m_is_changeable(t) )
> +    if ( !recalc || !p2m_check_changeable(t) )
>          return t;
>      return p2m_is_logdirty_range(p2m, gfn, gfn) ? p2m_ram_logdirty
>                                                  : p2m_ram_rw;
> diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
> index dd4e477..e3e54f1 100644
> --- a/xen/arch/x86/mm/p2m.c
> +++ b/xen/arch/x86/mm/p2m.c
> @@ -954,6 +954,26 @@ int p2m_change_type_one(struct domain *d, unsigned long gfn,
>                           p2m->default_access)
>           : -EBUSY;
> 
> +    if ( !rc )
> +    {
> +        switch ( nt )
> +        {
> +        case p2m_ram_rw:
> +            if ( ot == p2m_ioreq_server )
> +            {
> +                p2m->ioreq.entry_count--;
> +                ASSERT(p2m->ioreq.entry_count >= 0);
> +            }
> +            break;
> +        case p2m_ioreq_server:
> +            ASSERT(ot == p2m_ram_rw);
> +            p2m->ioreq.entry_count++;
> +            break;
> +        default:
> +            break;
> +        }
> +    }
> +
>      gfn_unlock(p2m, gfn, 0);
> 
>      return rc;
> diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
> index 3786680..395f125 100644
> --- a/xen/include/asm-x86/p2m.h
> +++ b/xen/include/asm-x86/p2m.h
> @@ -120,7 +120,10 @@ typedef unsigned int p2m_query_t;
> 
>  /* Types that can be subject to bulk transitions. */
>  #define P2M_CHANGEABLE_TYPES (p2m_to_mask(p2m_ram_rw) \
> -                              | p2m_to_mask(p2m_ram_logdirty) )
> +                              | p2m_to_mask(p2m_ram_logdirty) \
> +                              | p2m_to_mask(p2m_ioreq_server) )
> +
> +#define P2M_IOREQ_TYPES (p2m_to_mask(p2m_ioreq_server))
> 
>  #define P2M_POD_TYPES (p2m_to_mask(p2m_populate_on_demand))
> 
> @@ -157,6 +160,7 @@ typedef unsigned int p2m_query_t;
>  #define p2m_is_readonly(_t) (p2m_to_mask(_t) & P2M_RO_TYPES)
>  #define p2m_is_discard_write(_t) (p2m_to_mask(_t) & P2M_DISCARD_WRITE_TYPES)
>  #define p2m_is_changeable(_t) (p2m_to_mask(_t) & P2M_CHANGEABLE_TYPES)
> +#define p2m_is_ioreq(_t) (p2m_to_mask(_t) & P2M_IOREQ_TYPES)
>  #define p2m_is_pod(_t) (p2m_to_mask(_t) & P2M_POD_TYPES)
>  #define p2m_is_grant(_t) (p2m_to_mask(_t) & P2M_GRANT_TYPES)
>  /* Grant types are *not* considered valid, because they can be
> @@ -178,6 +182,8 @@ typedef unsigned int p2m_query_t;
> 
>  #define p2m_allows_invalid_mfn(t) (p2m_to_mask(t) & P2M_INVALID_MFN_TYPES)
> 
> +#define p2m_check_changeable(t) (p2m_is_changeable(t) && !p2m_is_ioreq(t))
> +
>  typedef enum {
>      p2m_host,
>      p2m_nested,
> @@ -349,6 +355,7 @@ struct p2m_domain {
>            * are to be emulated by an ioreq server.
>            */
>           unsigned int flags;
> +         long entry_count;
>       } ioreq;
>  };
> 
> --
> 1.9.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v9 5/5] x86/ioreq server: Synchronously reset outstanding p2m_ioreq_server entries when an ioreq server unmaps.
  2017-03-21  2:52 ` [PATCH v9 5/5] x86/ioreq server: Synchronously reset outstanding p2m_ioreq_server entries when an ioreq server unmaps Yu Zhang
  2017-03-21 10:00   ` Paul Durrant
@ 2017-03-22  8:28   ` Tian, Kevin
  2017-03-22  8:54     ` Jan Beulich
  2017-03-22 14:39   ` Jan Beulich
  2 siblings, 1 reply; 42+ messages in thread
From: Tian, Kevin @ 2017-03-22  8:28 UTC (permalink / raw)
  To: Yu Zhang, xen-devel
  Cc: George Dunlap, Andrew Cooper, Paul Durrant, Lv, Zhiyuan, Jan Beulich

> From: Yu Zhang
> Sent: Tuesday, March 21, 2017 10:53 AM
> 
> After an ioreq server has unmapped, the remaining p2m_ioreq_server
> entries need to be reset back to p2m_ram_rw. This patch does this
> synchronously by iterating the p2m table.
> 
> The synchronous resetting is necessary because we need to guarantee the
> p2m table is clean before another ioreq server is mapped. And since the
> sweeping of p2m table could be time consuming, it is done with hypercall
> continuation.
> 
> Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
> ---
> Cc: Paul Durrant <paul.durrant@citrix.com>
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> 
> changes in v2:
>   - According to comments from Jan and Andrew: do not use the
>     HVMOP type hypercall continuation method. Instead, adding
>     an opaque in xen_dm_op_map_mem_type_to_ioreq_server to
>     store the gfn.
>   - According to comments from Jan: change routine's comments
>     and name of parameters of p2m_finish_type_change().
> 
> changes in v1:
>   - This patch is splitted from patch 4 of last version.
>   - According to comments from Jan: update the gfn_start for
>     when use hypercall continuation to reset the p2m type.
>   - According to comments from Jan: use min() to compare gfn_end
>     and max mapped pfn in p2m_finish_type_change()
> ---
>  xen/arch/x86/hvm/dm.c     | 41 ++++++++++++++++++++++++++++++++++++++---
>  xen/arch/x86/mm/p2m.c     | 29 +++++++++++++++++++++++++++++
>  xen/include/asm-x86/p2m.h |  7 +++++++
>  3 files changed, 74 insertions(+), 3 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/dm.c b/xen/arch/x86/hvm/dm.c
> index 3f9484d..a24d0f8 100644
> --- a/xen/arch/x86/hvm/dm.c
> +++ b/xen/arch/x86/hvm/dm.c
> @@ -385,16 +385,51 @@ static int dm_op(domid_t domid,
> 
>      case XEN_DMOP_map_mem_type_to_ioreq_server:
>      {
> -        const struct xen_dm_op_map_mem_type_to_ioreq_server *data =
> +        struct xen_dm_op_map_mem_type_to_ioreq_server *data =
>              &op.u.map_mem_type_to_ioreq_server;
> +        unsigned long first_gfn = data->opaque;
> +        unsigned long last_gfn;
> +
> +        const_op = false;
> 
>          rc = -EOPNOTSUPP;
>          /* Only support for HAP enabled hvm. */
>          if ( !hap_enabled(d) )
>              break;
> 
> -        rc = hvm_map_mem_type_to_ioreq_server(d, data->id,
> -                                              data->type, data->flags);
> +        if ( first_gfn == 0 )
> +            rc = hvm_map_mem_type_to_ioreq_server(d, data->id,
> +                                                  data->type, data->flags);
> +        /*
> +         * Iterate p2m table when an ioreq server unmaps from p2m_ioreq_server,
> +         * and reset the remaining p2m_ioreq_server entries back to p2m_ram_rw.
> +         */

can you elaborate how device model is expected to use this
new extension, i.e. on deciding first_gfn?

> +        if ( (first_gfn > 0) || (data->flags == 0 && rc == 0) )
> +        {
> +            struct p2m_domain *p2m = p2m_get_hostp2m(d);
> +
> +            while ( read_atomic(&p2m->ioreq.entry_count) &&
> +                    first_gfn <= p2m->max_mapped_pfn )
> +            {
> +                /* Iterate p2m table for 256 gfns each time. */
> +                last_gfn = first_gfn + 0xff;
> +
> +                p2m_finish_type_change(d, first_gfn, last_gfn,
> +                                       p2m_ioreq_server, p2m_ram_rw);
> +
> +                first_gfn = last_gfn + 1;
> +
> +                /* Check for continuation if it's not the last iteration. */
> +                if ( first_gfn <= p2m->max_mapped_pfn &&
> +                     hypercall_preempt_check() )
> +                {
> +                    rc = -ERESTART;
> +                    data->opaque = first_gfn;
> +                    break;
> +                }
> +            }
> +        }
> +
>          break;
>      }
> 
> diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
> index e3e54f1..0a2f276 100644
> --- a/xen/arch/x86/mm/p2m.c
> +++ b/xen/arch/x86/mm/p2m.c
> @@ -1038,6 +1038,35 @@ void p2m_change_type_range(struct domain *d,
>      p2m_unlock(p2m);
>  }
> 
> +/* Synchronously modify the p2m type for a range of gfns from ot to nt. */
> +void p2m_finish_type_change(struct domain *d,
> +                            unsigned long first_gfn, unsigned long last_gfn,
> +                            p2m_type_t ot, p2m_type_t nt)
> +{
> +    struct p2m_domain *p2m = p2m_get_hostp2m(d);
> +    p2m_type_t t;
> +    unsigned long gfn = first_gfn;
> +
> +    ASSERT(first_gfn <= last_gfn);
> +    ASSERT(ot != nt);
> +    ASSERT(p2m_is_changeable(ot) && p2m_is_changeable(nt));
> +
> +    p2m_lock(p2m);
> +
> +    last_gfn = min(last_gfn, p2m->max_mapped_pfn);
> +    while ( gfn <= last_gfn )
> +    {
> +        get_gfn_query_unlocked(d, gfn, &t);
> +
> +        if ( t == ot )
> +            p2m_change_type_one(d, gfn, t, nt);
> +
> +        gfn++;
> +    }
> +
> +    p2m_unlock(p2m);
> +}
> +
>  /*
>   * Returns:
>   *    0              for success
> diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
> index 395f125..3d665e8 100644
> --- a/xen/include/asm-x86/p2m.h
> +++ b/xen/include/asm-x86/p2m.h
> @@ -611,6 +611,13 @@ void p2m_change_type_range(struct domain *d,
> int p2m_change_type_one(struct domain *d, unsigned long gfn,
>                          p2m_type_t ot, p2m_type_t nt);
> 
> +/* Synchronously change the p2m type for a range of gfns:
> + * [first_gfn ... last_gfn]. */
> +void p2m_finish_type_change(struct domain *d,
> +                            unsigned long first_gfn,
> +                            unsigned long last_gfn,
> +                            p2m_type_t ot, p2m_type_t nt);
> +
>  /* Report a change affecting memory types. */
>  void p2m_memory_type_changed(struct domain *d);
> 
> --
> 1.9.1
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> https://lists.xen.org/xen-devel
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v9 5/5] x86/ioreq server: Synchronously reset outstanding p2m_ioreq_server entries when an ioreq server unmaps.
  2017-03-22  8:28   ` Tian, Kevin
@ 2017-03-22  8:54     ` Jan Beulich
  2017-03-22  9:02       ` Tian, Kevin
  0 siblings, 1 reply; 42+ messages in thread
From: Jan Beulich @ 2017-03-22  8:54 UTC (permalink / raw)
  To: Kevin Tian
  Cc: George Dunlap, Andrew Cooper, xen-devel, Paul Durrant, Yu Zhang,
	Zhiyuan Lv

>>> On 22.03.17 at 09:28, <kevin.tian@intel.com> wrote:
>>  From: Yu Zhang
>> Sent: Tuesday, March 21, 2017 10:53 AM
>> --- a/xen/arch/x86/hvm/dm.c
>> +++ b/xen/arch/x86/hvm/dm.c
>> @@ -385,16 +385,51 @@ static int dm_op(domid_t domid,
>> 
>>      case XEN_DMOP_map_mem_type_to_ioreq_server:
>>      {
>> -        const struct xen_dm_op_map_mem_type_to_ioreq_server *data =
>> +        struct xen_dm_op_map_mem_type_to_ioreq_server *data =
>>              &op.u.map_mem_type_to_ioreq_server;
>> +        unsigned long first_gfn = data->opaque;
>> +        unsigned long last_gfn;
>> +
>> +        const_op = false;
>> 
>>          rc = -EOPNOTSUPP;
>>          /* Only support for HAP enabled hvm. */
>>          if ( !hap_enabled(d) )
>>              break;
>> 
>> -        rc = hvm_map_mem_type_to_ioreq_server(d, data->id,
>> -                                              data->type, data->flags);
>> +        if ( first_gfn == 0 )
>> +            rc = hvm_map_mem_type_to_ioreq_server(d, data->id,
>> +                                                  data->type, data->flags);
>> +        /*
>> +         * Iterate p2m table when an ioreq server unmaps from
>> p2m_ioreq_server,
>> +         * and reset the remaining p2m_ioreq_server entries back to
>> p2m_ram_rw.
>> +         */
> 
> can you elaborate how device model is expected to use this
> new extension, i.e. on deciding first_gfn?

The device model doesn't decide anything here (hence the field's
name being "opaque"), it simply has to pass zero for correct
operation.
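
Just to illustrate the calling convention (a sketch only, not taken
from any existing toolstack code; server_id is a placeholder):

    struct xen_dm_op op = {
        .op = XEN_DMOP_map_mem_type_to_ioreq_server,
        .u.map_mem_type_to_ioreq_server = {
            .id     = server_id,        /* ioreq server to (un)map */
            .type   = HVMMEM_ioreq_server,
            .flags  = 0,                /* 0 == unmap, triggers the sweep */
            .opaque = 0,                /* always zero when issued */
        },
    };

On -ERESTART Xen stores the next gfn to process in 'opaque' and the
hypercall is restarted by the continuation machinery, so the device
model never needs to look at the field.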

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v9 5/5] x86/ioreq server: Synchronously reset outstanding p2m_ioreq_server entries when an ioreq server unmaps.
  2017-03-22  8:54     ` Jan Beulich
@ 2017-03-22  9:02       ` Tian, Kevin
  0 siblings, 0 replies; 42+ messages in thread
From: Tian, Kevin @ 2017-03-22  9:02 UTC (permalink / raw)
  To: Jan Beulich
  Cc: George Dunlap, Andrew Cooper, xen-devel, Paul Durrant, Yu Zhang,
	Lv, Zhiyuan

> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Wednesday, March 22, 2017 4:54 PM
> 
> >>> On 22.03.17 at 09:28, <kevin.tian@intel.com> wrote:
> >>  From: Yu Zhang
> >> Sent: Tuesday, March 21, 2017 10:53 AM
> >> --- a/xen/arch/x86/hvm/dm.c
> >> +++ b/xen/arch/x86/hvm/dm.c
> >> @@ -385,16 +385,51 @@ static int dm_op(domid_t domid,
> >>
> >>      case XEN_DMOP_map_mem_type_to_ioreq_server:
> >>      {
> >> -        const struct xen_dm_op_map_mem_type_to_ioreq_server *data =
> >> +        struct xen_dm_op_map_mem_type_to_ioreq_server *data =
> >>              &op.u.map_mem_type_to_ioreq_server;
> >> +        unsigned long first_gfn = data->opaque;
> >> +        unsigned long last_gfn;
> >> +
> >> +        const_op = false;
> >>
> >>          rc = -EOPNOTSUPP;
> >>          /* Only support for HAP enabled hvm. */
> >>          if ( !hap_enabled(d) )
> >>              break;
> >>
> >> -        rc = hvm_map_mem_type_to_ioreq_server(d, data->id,
> >> -                                              data->type, data->flags);
> >> +        if ( first_gfn == 0 )
> >> +            rc = hvm_map_mem_type_to_ioreq_server(d, data->id,
> >> +                                                  data->type, data->flags);
> >> +        /*
> >> +         * Iterate p2m table when an ioreq server unmaps from
> >> p2m_ioreq_server,
> >> +         * and reset the remaining p2m_ioreq_server entries back to
> >> p2m_ram_rw.
> >> +         */
> >
> > can you elaborate how device model is expected to use this new
> > extension, i.e. on deciding first_gfn?
> 
> The device model doesn't decide anything here (hence the field's name being
> "opaque"), it simply has to pass zero for correct operation.
> 

Got it. It's for hypercall continuation. :-)

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v9 4/5] x86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries.
  2017-03-22  8:10   ` Tian, Kevin
@ 2017-03-22 10:12     ` Yu Zhang
  2017-03-24  9:37       ` Tian, Kevin
  0 siblings, 1 reply; 42+ messages in thread
From: Yu Zhang @ 2017-03-22 10:12 UTC (permalink / raw)
  To: Tian, Kevin, xen-devel
  Cc: Nakajima, Jun, George Dunlap, Andrew Cooper, Paul Durrant, Lv,
	Zhiyuan, Jan Beulich



On 3/22/2017 4:10 PM, Tian, Kevin wrote:
>> From: Yu Zhang [mailto:yu.c.zhang@linux.intel.com]
>> Sent: Tuesday, March 21, 2017 10:53 AM
>>
>> After an ioreq server has unmapped, the remaining p2m_ioreq_server
>> entries need to be reset back to p2m_ram_rw. This patch does this
>> asynchronously with the current p2m_change_entry_type_global() interface.
>>
>> This patch also disallows live migration, when there's still any outstanding
>> p2m_ioreq_server entry left. The core reason is our current implementation
>> of p2m_change_entry_type_global() can not tell the state of
>> p2m_ioreq_server entries(can not decide if an entry is to be emulated or to
>> be resynced).
> Don't quite get this point. change_global is triggered only upon
> unmap. At that point there is no ioreq server to emulate the
> write operations on those entries. All the things required is
> just to change the type. What's the exact decision required here?

Well, one situation I can recall is when another ioreq server maps to
this type and live migration happens later. The resolve_misconfig()
code cannot differentiate whether a p2m_ioreq_server page is an
obsolete entry that needs to be resynced, or a new one that is only to
be emulated.

I gave some explanation on this issue in discussion during Jun 20 - 22 last
year.

http://lists.xenproject.org/archives/html/xen-devel/2016-06/msg02426.html
on Jun 20
and
http://lists.xenproject.org/archives/html/xen-devel/2016-06/msg02575.html
on Jun 21

> btw does it mean that live migration can be still supported as long as
> device model proactively unmaps write-protected pages before
> starting live migration?
>

Yes.

>> Note: new field entry_count is introduced in struct p2m_domain, to record
>> the number of p2m_ioreq_server p2m page table entries.
>> One nature of these entries is that they only point to 4K sized page frames,
>> because all p2m_ioreq_server entries are originated from p2m_ram_rw ones
>> in p2m_change_type_one(). We do not need to worry about the counting for
>> 2M/1G sized pages.
>>
>> Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
>> ---
>> Cc: Paul Durrant <paul.durrant@citrix.com>
>> Cc: Jan Beulich <jbeulich@suse.com>
>> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
>> Cc: George Dunlap <george.dunlap@eu.citrix.com>
>> Cc: Jun Nakajima <jun.nakajima@intel.com>
>> Cc: Kevin Tian <kevin.tian@intel.com>
>>
>> changes in v4:
>>    - According to comments from Jan: use ASSERT() instead of 'if'
>>      condition in p2m_change_type_one().
>>    - According to comments from Jan: commit message changes to mention
>>      the p2m_ioreq_server are all based on 4K sized pages.
>>
>> changes in v3:
>>    - Move the synchronously resetting logic into patch 5.
>>    - According to comments from Jan: introduce p2m_check_changeable()
>>      to clarify the p2m type change code.
>>    - According to comments from George: use locks in the same order
>>      to avoid deadlock, call p2m_change_entry_type_global() after unmap
>>      of the ioreq server is finished.
>>
>> changes in v2:
>>    - Move the calculation of ioreq server page entry_cout into
>>      p2m_change_type_one() so that we do not need a seperate lock.
>>      Note: entry_count is also calculated in resolve_misconfig()/
>>      do_recalc(), fortunately callers of both routines have p2m
>>      lock protected already.
>>    - Simplify logic in hvmop_set_mem_type().
>>    - Introduce routine p2m_finish_type_change() to walk the p2m
>>      table and do the p2m reset.
>> ---
>>   xen/arch/x86/hvm/ioreq.c  |  8 ++++++++
>>   xen/arch/x86/mm/hap/hap.c |  9 +++++++++
>>   xen/arch/x86/mm/p2m-ept.c |  8 +++++++-
>>   xen/arch/x86/mm/p2m-pt.c  | 13 +++++++++++--
>>   xen/arch/x86/mm/p2m.c     | 20 ++++++++++++++++++++
>>   xen/include/asm-x86/p2m.h |  9 ++++++++-
>>   6 files changed, 63 insertions(+), 4 deletions(-)
>>
>> diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c index
>> 746799f..102c6c2 100644
>> --- a/xen/arch/x86/hvm/ioreq.c
>> +++ b/xen/arch/x86/hvm/ioreq.c
>> @@ -949,6 +949,14 @@ int hvm_map_mem_type_to_ioreq_server(struct
>> domain *d, ioservid_t id,
>>
>>       spin_unlock_recursive(&d->arch.hvm_domain.ioreq_server.lock);
>>
>> +    if ( rc == 0 && flags == 0 )
>> +    {
>> +        struct p2m_domain *p2m = p2m_get_hostp2m(d);
>> +
>> +        if ( read_atomic(&p2m->ioreq.entry_count) )
>> +            p2m_change_entry_type_global(d, p2m_ioreq_server, p2m_ram_rw);
>> +    }
>> +
>>       return rc;
>>   }
>>
>> diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
>> index a57b385..6ec950a 100644
>> --- a/xen/arch/x86/mm/hap/hap.c
>> +++ b/xen/arch/x86/mm/hap/hap.c
>> @@ -187,6 +187,15 @@ out:
>>    */
>>   static int hap_enable_log_dirty(struct domain *d, bool_t log_global)  {
>> +    struct p2m_domain *p2m = p2m_get_hostp2m(d);
>> +
>> +    /*
>> +     * Refuse to turn on global log-dirty mode if
>> +     * there's outstanding p2m_ioreq_server pages.
>> +     */
>> +    if ( log_global && read_atomic(&p2m->ioreq.entry_count) )
>> +        return -EBUSY;
> I know this has been discussed before, but didn't remember
> the detail reason - why cannot allow log-dirty mode when
> there are still outstanding p2m_ioreq_server pages? Cannot
> we mark related page as dirty when forwarding write emulation
> request to corresponding ioreq server?

IIUC, changing a page to p2m_ram_logdirty will only mark it read-only
once; after it is marked dirty, it will be changed back to p2m_ram_rw.
Also, the handling of a log-dirty page is quite different: we'd like
the p2m_ioreq_server ones to be handled like p2m_mmio_dm.

Thanks
Yu

[snip]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v9 2/5] x86/ioreq server: Add DMOP to map guest ram with p2m_ioreq_server to an ioreq server.
  2017-03-22  7:49   ` Tian, Kevin
@ 2017-03-22 10:12     ` Yu Zhang
  2017-03-24  9:26       ` Tian, Kevin
  0 siblings, 1 reply; 42+ messages in thread
From: Yu Zhang @ 2017-03-22 10:12 UTC (permalink / raw)
  To: Tian, Kevin, xen-devel
  Cc: Nakajima, Jun, George Dunlap, Andrew Cooper, Tim Deegan,
	Paul Durrant, Lv, Zhiyuan, Jan Beulich



On 3/22/2017 3:49 PM, Tian, Kevin wrote:
>> From: Yu Zhang [mailto:yu.c.zhang@linux.intel.com]
>> Sent: Tuesday, March 21, 2017 10:53 AM
>>
>> A new DMOP - XEN_DMOP_map_mem_type_to_ioreq_server, is added to let
>> one ioreq server claim/disclaim its responsibility for the handling of guest
>> pages with p2m type p2m_ioreq_server. Users of this DMOP can specify
>> which kind of operation is supposed to be emulated in a parameter named
>> flags. Currently, this DMOP only support the emulation of write operations.
>> And it can be further extended to support the emulation of read ones if an
>> ioreq server has such requirement in the future.
> p2m_ioreq_server was already introduced before. Do you want to
> give some background how current state is around that type which
> will be helpful about purpose of this patch?

Sorry? I thought the background was described in the cover letter.
Previously p2m_ioreq_server was only used for write-protection and was
tracked in an ioreq server's rangeset; this patch binds the p2m type to
an ioreq server directly.
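
To sketch the intended flow on the device model side (pseudo-code only:
the dmop-issuing wrapper, server_id and gfn are placeholders, and
XEN_DMOP_set_mem_type refers to the pre-existing set-mem-type op):

    struct xen_dm_op op = { .op = XEN_DMOP_map_mem_type_to_ioreq_server };

    /* 1. Claim p2m_ioreq_server for this server, write accesses only. */
    op.u.map_mem_type_to_ioreq_server.id    = server_id;
    op.u.map_mem_type_to_ioreq_server.type  = HVMMEM_ioreq_server;
    op.u.map_mem_type_to_ioreq_server.flags = XEN_DMOP_IOREQ_MEM_ACCESS_WRITE;
    /* ... issue the dmop ... */

    /* 2. Only now may pages be switched from HVMMEM_ram_rw to
     *    HVMMEM_ioreq_server; writes to them are forwarded to us. */
    op.op = XEN_DMOP_set_mem_type;
    op.u.set_mem_type.mem_type  = HVMMEM_ioreq_server;
    op.u.set_mem_type.first_pfn = gfn;
    op.u.set_mem_type.nr        = 1;
    /* ... issue the dmop ... */

    /* 3. To disclaim, issue step 1 again with flags = 0; outstanding
     *    entries are then reset to HVMMEM_ram_rw (patches 4 and 5). */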

>> For now, we only support one ioreq server for this p2m type, so once an
>> ioreq server has claimed its ownership, subsequent calls of the
>> XEN_DMOP_map_mem_type_to_ioreq_server will fail. Users can also
>> disclaim the ownership of guest ram pages with p2m_ioreq_server, by
>> triggering this new DMOP, with ioreq server id set to the current owner's and
>> flags parameter set to 0.
>>
>> Note both XEN_DMOP_map_mem_type_to_ioreq_server and
>> p2m_ioreq_server are only supported for HVMs with HAP enabled.
>>
>> Also note that only after one ioreq server claims its ownership of
>> p2m_ioreq_server, will the p2m type change to p2m_ioreq_server be
>> allowed.
>>
>> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
>> Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
>> Acked-by: Tim Deegan <tim@xen.org>
>> ---
>> Cc: Jan Beulich <jbeulich@suse.com>
>> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
>> Cc: Paul Durrant <paul.durrant@citrix.com>
>> Cc: George Dunlap <george.dunlap@eu.citrix.com>
>> Cc: Jun Nakajima <jun.nakajima@intel.com>
>> Cc: Kevin Tian <kevin.tian@intel.com>
>> Cc: Tim Deegan <tim@xen.org>
>>
>> changes in v8:
>>    - According to comments from Jan & Paul: comments changes in
>> hvmemul_do_io().
>>    - According to comments from Jan: remove the redundant code which
>> would only
>>      be useful for read emulations.
>>    - According to comments from Jan: change interface which maps mem type
>> to
>>      ioreq server, removed uint16_t pad and added an uint64_t opaque.
>>    - Address other comments from Jan, i.e. correct return values; remove stray
>>      cast.
>>
>> changes in v7:
>>    - Use new ioreq server interface -
>> XEN_DMOP_map_mem_type_to_ioreq_server.
>>    - According to comments from George: removed domain_pause/unpause()
>> in
>>      hvm_map_mem_type_to_ioreq_server(), because it's too expensive,
>>      and we can avoid the:
>>      a> deadlock issue existed in v6 patch, between p2m lock and ioreq server
>>         lock by using these locks in the same order - solved in patch 4;
>>      b> for race condition between vm exit and ioreq server unbinding, we can
>>         just retry this instruction.
>>    - According to comments from Jan and George: continue to clarify logic in
>>      hvmemul_do_io().
>>    - According to comments from Jan: clarify comment in
>> p2m_set_ioreq_server().
>>
>> changes in v6:
>>    - Clarify logic in hvmemul_do_io().
>>    - Use recursive lock for ioreq server lock.
>>    - Remove debug print when mapping ioreq server.
>>    - Clarify code in ept_p2m_type_to_flags() for consistency.
>>    - Remove definition of P2M_IOREQ_HANDLE_WRITE_ACCESS.
>>    - Add comments for HVMMEM_ioreq_server to note only changes
>>      to/from HVMMEM_ram_rw are permitted.
>>    - Add domain_pause/unpause() in hvm_map_mem_type_to_ioreq_server()
>>      to avoid the race condition when a vm exit happens on a write-
>>      protected page, just to find the ioreq server has been unmapped
>>      already.
>>    - Introduce a seperate patch to delay the release of p2m
>>      lock to avoid the race condition.
>>    - Introduce a seperate patch to handle the read-modify-write
>>      operations on a write protected page.
>>
>> changes in v5:
>>    - Simplify logic in hvmemul_do_io().
>>    - Use natual width types instead of fixed width types when possible.
>>    - Do not grant executable permission for p2m_ioreq_server entries.
>>    - Clarify comments and commit message.
>>    - Introduce a seperate patch to recalculate the p2m types after
>>      the ioreq server unmaps the p2m_ioreq_server.
>>
>> changes in v4:
>>    - According to Paul's advice, add comments around the definition
>>      of HVMMEM_iore_server in hvm_op.h.
>>    - According to Wei Liu's comments, change the format of the commit
>>      message.
>>
>> changes in v3:
>>    - Only support write emulation in this patch;
>>    - Remove the code to handle race condition in hvmemul_do_io(),
>>    - No need to reset the p2m type after an ioreq server has disclaimed
>>      its ownership of p2m_ioreq_server;
>>    - Only allow p2m type change to p2m_ioreq_server after an ioreq
>>      server has claimed its ownership of p2m_ioreq_server;
>>    - Only allow p2m type change to p2m_ioreq_server from pages with type
>>      p2m_ram_rw, and vice versa;
>>    - HVMOP_map_mem_type_to_ioreq_server interface change - use uint16,
>>      instead of enum to specify the memory type;
>>    - Function prototype change to p2m_get_ioreq_server();
>>    - Coding style changes;
>>    - Commit message changes;
>>    - Add Tim's Acked-by.
>>
>> changes in v2:
>>    - Only support HAP enabled HVMs;
>>    - Replace p2m_mem_type_changed() with p2m_change_entry_type_global()
>>      to reset the p2m type, when an ioreq server tries to claim/disclaim
>>      its ownership of p2m_ioreq_server;
>>    - Comments changes.
>> ---
>>   xen/arch/x86/hvm/dm.c            | 37 ++++++++++++++++++--
>>   xen/arch/x86/hvm/emulate.c       | 65
>> ++++++++++++++++++++++++++++++++---
>>   xen/arch/x86/hvm/ioreq.c         | 38 +++++++++++++++++++++
>>   xen/arch/x86/mm/hap/nested_hap.c |  2 +-
>>   xen/arch/x86/mm/p2m-ept.c        |  8 ++++-
>>   xen/arch/x86/mm/p2m-pt.c         | 19 +++++++----
>>   xen/arch/x86/mm/p2m.c            | 74
>> ++++++++++++++++++++++++++++++++++++++++
>>   xen/arch/x86/mm/shadow/multi.c   |  3 +-
>>   xen/include/asm-x86/hvm/ioreq.h  |  2 ++
>>   xen/include/asm-x86/p2m.h        | 26 ++++++++++++--
>>   xen/include/public/hvm/dm_op.h   | 28 +++++++++++++++
>>   xen/include/public/hvm/hvm_op.h  |  8 ++++-
>>   12 files changed, 290 insertions(+), 20 deletions(-)
>>
>> diff --git a/xen/arch/x86/hvm/dm.c b/xen/arch/x86/hvm/dm.c index
>> 333c884..3f9484d 100644
>> --- a/xen/arch/x86/hvm/dm.c
>> +++ b/xen/arch/x86/hvm/dm.c
>> @@ -173,9 +173,14 @@ static int modified_memory(struct domain *d,
>>
>>   static bool allow_p2m_type_change(p2m_type_t old, p2m_type_t new)  {
>> +    if ( new == p2m_ioreq_server )
>> +        return old == p2m_ram_rw;
>> +
>> +    if ( old == p2m_ioreq_server )
>> +        return new == p2m_ram_rw;
>> +
>>       return p2m_is_ram(old) ||
>> -           (p2m_is_hole(old) && new == p2m_mmio_dm) ||
>> -           (old == p2m_ioreq_server && new == p2m_ram_rw);
>> +           (p2m_is_hole(old) && new == p2m_mmio_dm);
>>   }
>>
>>   static int set_mem_type(struct domain *d, @@ -202,6 +207,19 @@ static int
>> set_mem_type(struct domain *d,
>>            unlikely(data->mem_type == HVMMEM_unused) )
>>           return -EINVAL;
>>
>> +    if ( data->mem_type  == HVMMEM_ioreq_server )
>> +    {
>> +        unsigned int flags;
>> +
>> +        /* HVMMEM_ioreq_server is only supported for HAP enabled hvm. */
>> +        if ( !hap_enabled(d) )
>> +            return -EOPNOTSUPP;
>> +
>> +        /* Do not change to HVMMEM_ioreq_server if no ioreq server mapped.
>> */
>> +        if ( !p2m_get_ioreq_server(d, &flags) )
>> +            return -EINVAL;
>> +    }
>> +
>>       while ( iter < data->nr )
>>       {
>>           unsigned long pfn = data->first_pfn + iter; @@ -365,6 +383,21 @@
>> static int dm_op(domid_t domid,
>>           break;
>>       }
>>
>> +    case XEN_DMOP_map_mem_type_to_ioreq_server:
>> +    {
>> +        const struct xen_dm_op_map_mem_type_to_ioreq_server *data =
>> +            &op.u.map_mem_type_to_ioreq_server;
>> +
>> +        rc = -EOPNOTSUPP;
>> +        /* Only support for HAP enabled hvm. */
> Isn't it obvious from code?

Yes. Can be removed.
>> +        if ( !hap_enabled(d) )
>> +            break;
>> +
>> +        rc = hvm_map_mem_type_to_ioreq_server(d, data->id,
>> +                                              data->type, data->flags);
>> +        break;
>> +    }
>> +
>>       case XEN_DMOP_set_ioreq_server_state:
>>       {
>>           const struct xen_dm_op_set_ioreq_server_state *data = diff --git
>> a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c index
>> f36d7c9..37139e6 100644
>> --- a/xen/arch/x86/hvm/emulate.c
>> +++ b/xen/arch/x86/hvm/emulate.c
>> @@ -99,6 +99,7 @@ static int hvmemul_do_io(
>>       uint8_t dir, bool_t df, bool_t data_is_addr, uintptr_t data)  {
>>       struct vcpu *curr = current;
>> +    struct domain *currd = curr->domain;
>>       struct hvm_vcpu_io *vio = &curr->arch.hvm_vcpu.hvm_io;
>>       ioreq_t p = {
>>           .type = is_mmio ? IOREQ_TYPE_COPY : IOREQ_TYPE_PIO, @@ -140,7
>> +141,7 @@ static int hvmemul_do_io(
>>                (p.dir != dir) ||
>>                (p.df != df) ||
>>                (p.data_is_ptr != data_is_addr) )
>> -            domain_crash(curr->domain);
>> +            domain_crash(currd);
>>
>>           if ( data_is_addr )
>>               return X86EMUL_UNHANDLEABLE; @@ -177,8 +178,64 @@ static int
>> hvmemul_do_io(
>>           break;
>>       case X86EMUL_UNHANDLEABLE:
>>       {
>> -        struct hvm_ioreq_server *s =
>> -            hvm_select_ioreq_server(curr->domain, &p);
>> +        /*
>> +         * Xen isn't emulating the instruction internally, so see if
>> +         * there's an ioreq server that can handle it. Rules:
>> +         *
>> +         * - PIO and "normal" MMIO run through hvm_select_ioreq_server()
> why highlights "normal" here? What does a "abnormal" MMIO
> mean here? p2m_ioreq_server type?

Yes, it's just to differentiate plain MMIO from the p2m_ioreq_server
addresses, copied from George's previous comments.
We can remove the "normal" here.


>> +         * to choose the ioreq server by range. If no server is found,
>> +         * the access is ignored.
>> +         *
>> +         * - p2m_ioreq_server accesses are handled by the designated
>> +         * ioreq_server for the domain, but there are some corner
>> +         * cases:
> since only one case is listed, "there is a corner case"

Another corner case is in patch 3/5 - handling the read-modify-write
situations. Maybe the correct thing is to use the word "case" in this
patch and change it to "cases" in the next patch. :-)

>> +         *
>> +         *   - If the domain ioreq_server is NULL, assume there is a
>> +         *   race between the unbinding of ioreq server and guest fault
>> +         *   so re-try the instruction.
>> +         */
>> +        struct hvm_ioreq_server *s = NULL;
>> +        p2m_type_t p2mt = p2m_invalid;
>> +
>> +        if ( is_mmio )
>> +        {
>> +            unsigned long gmfn = paddr_to_pfn(addr);
>> +
>> +            get_gfn_query_unlocked(currd, gmfn, &p2mt);
>> +
>> +            if ( p2mt == p2m_ioreq_server )
>> +            {
>> +                unsigned int flags;
>> +
>> +                /*
>> +                 * Value of s could be stale, when we lost a race
> better describe it in higher level, e.g. just "no ioreq server is
> found".
>
> what's the meaning of "lost a race"? shouldn't it mean
> "likely we suffer from a race with..."?
>
>> +                 * with dm_op which unmaps p2m_ioreq_server from the
>> +                 * ioreq server. Yet there's no cheap way to avoid
> again, not talking about specific code, focus on the operation,
> e.g. "race with an unmap operation on the ioreq server"
>
>> +                 * this, so device model need to do the check.
>> +                 */
> How is above comment related to below line?

Well, the 's' returned by p2m_get_ioreq_server() can be stale - if the
ioreq server is unmapped after p2m_get_ioreq_server() returns. The
current rangeset code has a similar issue if the PIO/MMIO range is
removed from the ioreq server's rangeset after
hvm_select_ioreq_server() returns.

Since using a spinlock or domain_pause/unpause is too heavyweight, we
suggest the device model side check whether the received ioreq is a
valid one.

The above comments were added, according to Jan & Paul's suggestions on
v7, to let developers know that we do not guarantee the validity of the
's' returned by p2m_get_ioreq_server()/hvm_select_ioreq_server().

"Value of s could be stale, when we lost a race with..." is not about s
being NULL, it's about s not being valid. For a NULL return, it is...
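
To illustrate the device-model-side check mentioned above (only a
sketch; dm_claims_gfn() stands for whatever bookkeeping the device
model already keeps of the gfns it has claimed - it is not part of the
patch):

    /*
     * Before emulating a forwarded request, verify we still claim the
     * gfn; if not, the request raced with our own unmap and can simply
     * be completed without side effects.
     */
    static bool ioreq_is_valid(const ioreq_t *req)
    {
        if ( req->type != IOREQ_TYPE_COPY )
            return true;                 /* PIO etc. handled as before */

        return dm_claims_gfn(req->addr >> XC_PAGE_SHIFT);
    }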

>
>> +                s = p2m_get_ioreq_server(currd, &flags);
>> +
>> +                /*
>> +                 * If p2mt is ioreq_server but ioreq_server is NULL,
> p2mt is definitely ioreq_server within this if condition.
>
>> +                 * we probably lost a race with unbinding of ioreq
>> +                 * server, just retry the access.
>> +                 */
> looks redundant to earlier comment. Or earlier one should
> be just removed?

... described here, to just retry the access.

>> +                if ( s == NULL )
>> +                {
>> +                    rc = X86EMUL_RETRY;
>> +                    vio->io_req.state = STATE_IOREQ_NONE;
>> +                    break;
>> +                }
>> +            }
>> +        }
>> +
>> +        /*
>> +         * Value of s could be stale, when we lost a race with dm_op
>> +         * which unmaps this PIO/MMIO address from the ioreq server.
>> +         * The device model side need to do the check.
>> +         */
> another duplicated comment. below code is actually for 'normal'
> MMIO case...

This is for another possible situation, when the 's' returned by
hvm_select_ioreq_server() becomes stale later, i.e. when the PIO/MMIO
is removed from the rangeset.

The logic in hvmemul_do_io() has always been a bit mixed - there are
many corner cases and race conditions:
  - with the mapping/unmapping of a PIO/MMIO range in the rangeset
  - with the mapping/unmapping of an ioreq server to/from p2m_ioreq_server
I tried to add as many comments as I could as this patchset evolved,
only to find I just introduced more confusion...

Any suggestions?

>> +        if ( !s )
>> +            s = hvm_select_ioreq_server(currd, &p);
>>
>>           /* If there is no suitable backing DM, just ignore accesses */
>>           if ( !s )
>> @@ -189,7 +246,7 @@ static int hvmemul_do_io(
>>           else
>>           {
>>               rc = hvm_send_ioreq(s, &p, 0);
>> -            if ( rc != X86EMUL_RETRY || curr->domain->is_shutting_down )
>> +            if ( rc != X86EMUL_RETRY || currd->is_shutting_down )
>>                   vio->io_req.state = STATE_IOREQ_NONE;
>>               else if ( data_is_addr )
>>                   rc = X86EMUL_OKAY;
>> diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c index
>> ad2edad..746799f 100644
>> --- a/xen/arch/x86/hvm/ioreq.c
>> +++ b/xen/arch/x86/hvm/ioreq.c
>> @@ -753,6 +753,8 @@ int hvm_destroy_ioreq_server(struct domain *d,
>> ioservid_t id)
>>
>>           domain_pause(d);
>>
>> +        p2m_destroy_ioreq_server(d, s);
>> +
>>           hvm_ioreq_server_disable(s, 0);
>>
>>           list_del(&s->list_entry);
>> @@ -914,6 +916,42 @@ int
>> hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id,
>>       return rc;
>>   }
>>
>> +int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id,
>> +                                     uint32_t type, uint32_t flags) {
>> +    struct hvm_ioreq_server *s;
>> +    int rc;
>> +
>> +    /* For now, only HVMMEM_ioreq_server is supported. */
> obvious comment

IIRC, this comment (and the one below) is another change made according
to some review comments, to remind readers that we can add new mem
types in the future. So how about we add a line - "For the future, we
can support other mem types"?

But that also sounds redundant to me. :)
So I am also OK with removing this comment and the one below.

>> +    if ( type != HVMMEM_ioreq_server )
>> +        return -EINVAL;
>> +
>> +    /* For now, only write emulation is supported. */
> ditto.
>
>> +    if ( flags & ~(XEN_DMOP_IOREQ_MEM_ACCESS_WRITE) )
>> +        return -EINVAL;
>> +
>> +    spin_lock_recursive(&d->arch.hvm_domain.ioreq_server.lock);
>> +
>> +    rc = -ENOENT;
>> +    list_for_each_entry ( s,
>> +                          &d->arch.hvm_domain.ioreq_server.list,
>> +                          list_entry )
>> +    {
>> +        if ( s == d->arch.hvm_domain.default_ioreq_server )
>> +            continue;
> any reason why we cannot let default server to claim this
> new type?

Well, my understanding of the default ioreq server is that it is only
for legacy qemu and is not even created via the dm_op hypercall. Recent
device models (including qemu) are no longer default ioreq servers.

>> +
>> +        if ( s->id == id )
>> +        {
>> +            rc = p2m_set_ioreq_server(d, flags, s);
>> +            break;
>> +        }
>> +    }
>> +
>> +    spin_unlock_recursive(&d->arch.hvm_domain.ioreq_server.lock);
>> +
>> +    return rc;
>> +}
>> +
>>   int hvm_set_ioreq_server_state(struct domain *d, ioservid_t id,
>>                                  bool_t enabled)  { diff --git
>> a/xen/arch/x86/mm/hap/nested_hap.c
>> b/xen/arch/x86/mm/hap/nested_hap.c
>> index 162afed..408ea7f 100644
>> --- a/xen/arch/x86/mm/hap/nested_hap.c
>> +++ b/xen/arch/x86/mm/hap/nested_hap.c
>> @@ -172,7 +172,7 @@ nestedhap_walk_L0_p2m(struct p2m_domain *p2m,
>> paddr_t L1_gpa, paddr_t *L0_gpa,
>>       if ( *p2mt == p2m_mmio_direct )
>>           goto direct_mmio_out;
>>       rc = NESTEDHVM_PAGEFAULT_MMIO;
>> -    if ( *p2mt == p2m_mmio_dm )
>> +    if ( *p2mt == p2m_mmio_dm || *p2mt == p2m_ioreq_server )
>>           goto out;
>>
>>       rc = NESTEDHVM_PAGEFAULT_L0_ERROR;
>> diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
>> index 568944f..cc1eb21 100644
>> --- a/xen/arch/x86/mm/p2m-ept.c
>> +++ b/xen/arch/x86/mm/p2m-ept.c
>> @@ -131,6 +131,13 @@ static void ept_p2m_type_to_flags(struct
>> p2m_domain *p2m, ept_entry_t *entry,
>>               entry->r = entry->w = entry->x = 1;
>>               entry->a = entry->d = !!cpu_has_vmx_ept_ad;
>>               break;
>> +        case p2m_ioreq_server:
>> +            entry->r = 1;
>> +            entry->w = !(p2m->ioreq.flags &
>> XEN_DMOP_IOREQ_MEM_ACCESS_WRITE);
>> +            entry->x = 0;
>> +            entry->a = !!cpu_has_vmx_ept_ad;
>> +            entry->d = entry->w && entry->a;
>> +            break;
>>           case p2m_mmio_direct:
>>               entry->r = entry->x = 1;
>>               entry->w = !rangeset_contains_singleton(mmio_ro_ranges,
>> @@ -170,7 +177,6 @@ static void ept_p2m_type_to_flags(struct
>> p2m_domain *p2m, ept_entry_t *entry,
>>               entry->a = entry->d = !!cpu_has_vmx_ept_ad;
>>               break;
>>           case p2m_grant_map_ro:
>> -        case p2m_ioreq_server:
>>               entry->r = 1;
>>               entry->w = entry->x = 0;
>>               entry->a = !!cpu_has_vmx_ept_ad; diff --git
>> a/xen/arch/x86/mm/p2m-pt.c b/xen/arch/x86/mm/p2m-pt.c index
>> 07e2ccd..f6c45ec 100644
>> --- a/xen/arch/x86/mm/p2m-pt.c
>> +++ b/xen/arch/x86/mm/p2m-pt.c
>> @@ -70,7 +70,9 @@ static const unsigned long pgt[] = {
>>       PGT_l3_page_table
>>   };
>>
>> -static unsigned long p2m_type_to_flags(p2m_type_t t, mfn_t mfn,
>> +static unsigned long p2m_type_to_flags(const struct p2m_domain *p2m,
>> +                                       p2m_type_t t,
>> +                                       mfn_t mfn,
>>                                          unsigned int level)  {
>>       unsigned long flags;
>> @@ -92,8 +94,12 @@ static unsigned long p2m_type_to_flags(p2m_type_t t,
>> mfn_t mfn,
>>       default:
>>           return flags | _PAGE_NX_BIT;
>>       case p2m_grant_map_ro:
>> -    case p2m_ioreq_server:
>>           return flags | P2M_BASE_FLAGS | _PAGE_NX_BIT;
>> +    case p2m_ioreq_server:
>> +        flags |= P2M_BASE_FLAGS | _PAGE_RW | _PAGE_NX_BIT;
>> +        if ( p2m->ioreq.flags & XEN_DMOP_IOREQ_MEM_ACCESS_WRITE )
>> +            return flags & ~_PAGE_RW;
>> +        return flags;
>>       case p2m_ram_ro:
>>       case p2m_ram_logdirty:
>>       case p2m_ram_shared:
>> @@ -440,7 +446,8 @@ static int do_recalc(struct p2m_domain *p2m,
>> unsigned long gfn)
>>               p2m_type_t p2mt = p2m_is_logdirty_range(p2m, gfn & mask, gfn |
>> ~mask)
>>                                 ? p2m_ram_logdirty : p2m_ram_rw;
>>               unsigned long mfn = l1e_get_pfn(e);
>> -            unsigned long flags = p2m_type_to_flags(p2mt, _mfn(mfn), level);
>> +            unsigned long flags = p2m_type_to_flags(p2m, p2mt,
>> +                                                    _mfn(mfn), level);
>>
>>               if ( level )
>>               {
>> @@ -578,7 +585,7 @@ p2m_pt_set_entry(struct p2m_domain *p2m,
>> unsigned long gfn, mfn_t mfn,
>>           ASSERT(!mfn_valid(mfn) || p2mt != p2m_mmio_direct);
>>           l3e_content = mfn_valid(mfn) || p2m_allows_invalid_mfn(p2mt)
>>               ? l3e_from_pfn(mfn_x(mfn),
>> -                           p2m_type_to_flags(p2mt, mfn, 2) | _PAGE_PSE)
>> +                           p2m_type_to_flags(p2m, p2mt, mfn, 2) |
>> + _PAGE_PSE)
>>               : l3e_empty();
>>           entry_content.l1 = l3e_content.l3;
>>
>> @@ -615,7 +622,7 @@ p2m_pt_set_entry(struct p2m_domain *p2m,
>> unsigned long gfn, mfn_t mfn,
>>
>>           if ( mfn_valid(mfn) || p2m_allows_invalid_mfn(p2mt) )
>>               entry_content = p2m_l1e_from_pfn(mfn_x(mfn),
>> -                                             p2m_type_to_flags(p2mt, mfn, 0));
>> +                                         p2m_type_to_flags(p2m, p2mt,
>> + mfn, 0));
>>           else
>>               entry_content = l1e_empty();
>>
>> @@ -652,7 +659,7 @@ p2m_pt_set_entry(struct p2m_domain *p2m,
>> unsigned long gfn, mfn_t mfn,
>>           ASSERT(!mfn_valid(mfn) || p2mt != p2m_mmio_direct);
>>           if ( mfn_valid(mfn) || p2m_allows_invalid_mfn(p2mt) )
>>               l2e_content = l2e_from_pfn(mfn_x(mfn),
>> -                                       p2m_type_to_flags(p2mt, mfn, 1) |
>> +                                       p2m_type_to_flags(p2m, p2mt,
>> + mfn, 1) |
>>                                          _PAGE_PSE);
>>           else
>>               l2e_content = l2e_empty();
>> diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c index
>> a5651a3..dd4e477 100644
>> --- a/xen/arch/x86/mm/p2m.c
>> +++ b/xen/arch/x86/mm/p2m.c
>> @@ -82,6 +82,8 @@ static int p2m_initialise(struct domain *d, struct
>> p2m_domain *p2m)
>>       else
>>           p2m_pt_init(p2m);
>>
>> +    spin_lock_init(&p2m->ioreq.lock);
>> +
>>       return ret;
>>   }
>>
>> @@ -286,6 +288,78 @@ void p2m_memory_type_changed(struct domain *d)
>>       }
>>   }
>>
>> +int p2m_set_ioreq_server(struct domain *d,
>> +                         unsigned int flags,
>> +                         struct hvm_ioreq_server *s) {
>> +    struct p2m_domain *p2m = p2m_get_hostp2m(d);
>> +    int rc;
>> +
>> +    /*
>> +     * Use lock to prevent concurrent setting attempts
>> +     * from multiple ioreq serers.
> serers -> servers

Got it. Thanks.

>> +     */
>> +    spin_lock(&p2m->ioreq.lock);
>> +
>> +    /* Unmap ioreq server from p2m type by passing flags with 0. */
>> +    if ( flags == 0 )
>> +    {
>> +        rc = -EINVAL;
>> +        if ( p2m->ioreq.server != s )
>> +            goto out;
>> +
>> +        p2m->ioreq.server = NULL;
>> +        p2m->ioreq.flags = 0;
>> +    }
>> +    else
>> +    {
>> +        rc = -EBUSY;
>> +        if ( p2m->ioreq.server != NULL )
>> +            goto out;
>> +
>> +        p2m->ioreq.server = s;
>> +        p2m->ioreq.flags = flags;
>> +    }
>> +
>> +    rc = 0;
>> +
>> + out:
>> +    spin_unlock(&p2m->ioreq.lock);
>> +
>> +    return rc;
>> +}
>> +
>> +struct hvm_ioreq_server *p2m_get_ioreq_server(struct domain *d,
>> +                                              unsigned int *flags) {
>> +    struct p2m_domain *p2m = p2m_get_hostp2m(d);
>> +    struct hvm_ioreq_server *s;
>> +
>> +    spin_lock(&p2m->ioreq.lock);
>> +
>> +    s = p2m->ioreq.server;
>> +    *flags = p2m->ioreq.flags;
>> +
>> +    spin_unlock(&p2m->ioreq.lock);
>> +    return s;
>> +}
>> +
>> +void p2m_destroy_ioreq_server(const struct domain *d,
>> +                              const struct hvm_ioreq_server *s) {
>> +    struct p2m_domain *p2m = p2m_get_hostp2m(d);
>> +
>> +    spin_lock(&p2m->ioreq.lock);
>> +
>> +    if ( p2m->ioreq.server == s )
>> +    {
>> +        p2m->ioreq.server = NULL;
>> +        p2m->ioreq.flags = 0;
>> +    }
>> +
>> +    spin_unlock(&p2m->ioreq.lock);
>> +}
>> +
>>   void p2m_enable_hardware_log_dirty(struct domain *d)  {
>>       struct p2m_domain *p2m = p2m_get_hostp2m(d); diff --git
>> a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c
>> index 7ea9d81..521b639 100644
>> --- a/xen/arch/x86/mm/shadow/multi.c
>> +++ b/xen/arch/x86/mm/shadow/multi.c
>> @@ -3269,8 +3269,7 @@ static int sh_page_fault(struct vcpu *v,
>>       }
>>
>>       /* Need to hand off device-model MMIO to the device model */
>> -    if ( p2mt == p2m_mmio_dm
>> -         || (p2mt == p2m_ioreq_server && ft == ft_demand_write) )
>> +    if ( p2mt == p2m_mmio_dm )
>>       {
>>           gpa = guest_walk_to_gpa(&gw);
>>           goto mmio;
>> diff --git a/xen/include/asm-x86/hvm/ioreq.h b/xen/include/asm-
>> x86/hvm/ioreq.h index fbf2c74..b43667a 100644
>> --- a/xen/include/asm-x86/hvm/ioreq.h
>> +++ b/xen/include/asm-x86/hvm/ioreq.h
>> @@ -37,6 +37,8 @@ int hvm_map_io_range_to_ioreq_server(struct domain
>> *d, ioservid_t id,  int hvm_unmap_io_range_from_ioreq_server(struct
>> domain *d, ioservid_t id,
>>                                            uint32_t type, uint64_t start,
>>                                            uint64_t end);
>> +int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id,
>> +                                     uint32_t type, uint32_t flags);
>>   int hvm_set_ioreq_server_state(struct domain *d, ioservid_t id,
>>                                  bool_t enabled);
>>
>> diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h index
>> 470d29d..3786680 100644
>> --- a/xen/include/asm-x86/p2m.h
>> +++ b/xen/include/asm-x86/p2m.h
>> @@ -89,7 +89,8 @@ typedef unsigned int p2m_query_t;
>>                          | p2m_to_mask(p2m_ram_paging_out)      \
>>                          | p2m_to_mask(p2m_ram_paged)           \
>>                          | p2m_to_mask(p2m_ram_paging_in)       \
>> -                       | p2m_to_mask(p2m_ram_shared))
>> +                       | p2m_to_mask(p2m_ram_shared)          \
>> +                       | p2m_to_mask(p2m_ioreq_server))
>>
>>   /* Types that represent a physmap hole that is ok to replace with a shared
>>    * entry */
>> @@ -111,8 +112,7 @@ typedef unsigned int p2m_query_t;
>>   #define P2M_RO_TYPES (p2m_to_mask(p2m_ram_logdirty)     \
>>                         | p2m_to_mask(p2m_ram_ro)         \
>>                         | p2m_to_mask(p2m_grant_map_ro)   \
>> -                      | p2m_to_mask(p2m_ram_shared)     \
>> -                      | p2m_to_mask(p2m_ioreq_server))
>> +                      | p2m_to_mask(p2m_ram_shared))
>>
>>   /* Write-discard types, which should discard the write operations */
>>   #define P2M_DISCARD_WRITE_TYPES (p2m_to_mask(p2m_ram_ro)     \
>> @@ -336,6 +336,20 @@ struct p2m_domain {
>>           struct ept_data ept;
>>           /* NPT-equivalent structure could be added here. */
>>       };
>> +
>> +     struct {
>> +         spinlock_t lock;
>> +         /*
>> +          * ioreq server who's responsible for the emulation of
>> +          * gfns with specific p2m type(for now, p2m_ioreq_server).
>> +          */
>> +         struct hvm_ioreq_server *server;
>> +         /*
>> +          * flags specifies whether read, write or both operations
>> +          * are to be emulated by an ioreq server.
>> +          */
>> +         unsigned int flags;
>> +     } ioreq;
>>   };
>>
>>   /* get host p2m table */
>> @@ -827,6 +841,12 @@ static inline unsigned int p2m_get_iommu_flags(p2m_type_t p2mt, mfn_t mfn)
>>       return flags;
>>   }
>>
>> +int p2m_set_ioreq_server(struct domain *d, unsigned int flags,
>> +                         struct hvm_ioreq_server *s);
>> +struct hvm_ioreq_server *p2m_get_ioreq_server(struct domain *d,
>> +                                              unsigned int *flags);
>> +void p2m_destroy_ioreq_server(const struct domain *d,
>> +                              const struct hvm_ioreq_server *s);
>> +
>>   #endif /* _XEN_ASM_X86_P2M_H */
>>
>>   /*
>> diff --git a/xen/include/public/hvm/dm_op.h b/xen/include/public/hvm/dm_op.h
>> index f54cece..2a36833 100644
>> --- a/xen/include/public/hvm/dm_op.h
>> +++ b/xen/include/public/hvm/dm_op.h
>> @@ -318,6 +318,32 @@ struct xen_dm_op_inject_msi {
>>       uint64_aligned_t addr;
>>   };
>>
>> +/*
>> + * XEN_DMOP_map_mem_type_to_ioreq_server : map or unmap the IOREQ Server <id>
>> + *                                      to specific memroy type <type>
> memroy->memory

Right. Thanks. :)

B.R.
Yu
>> + *                                      for specific accesses <flags>
>> + *
>> + * For now, flags only accept the value of XEN_DMOP_IOREQ_MEM_ACCESS_WRITE,
>> + * which means only write operations are to be forwarded to an ioreq server.
>> + * Support for the emulation of read operations can be added when an ioreq
>> + * server has such requirement in future.
>> + */
>> +#define XEN_DMOP_map_mem_type_to_ioreq_server 15
>> +
>> +struct xen_dm_op_map_mem_type_to_ioreq_server {
>> +    ioservid_t id;      /* IN - ioreq server id */
>> +    uint16_t type;      /* IN - memory type */
>> +    uint32_t flags;     /* IN - types of accesses to be forwarded to the
>> +                           ioreq server. flags with 0 means to unmap the
>> +                           ioreq server */
>> +
>> +#define XEN_DMOP_IOREQ_MEM_ACCESS_READ (1u << 0)
>> +#define XEN_DMOP_IOREQ_MEM_ACCESS_WRITE (1u << 1)
>> +
>> +    uint64_t opaque;    /* IN/OUT - only used for hypercall continuation,
>> +                           has to be set to zero by the caller */
>> +};
>> +
>>   struct xen_dm_op {
>>       uint32_t op;
>>       uint32_t pad;
>> @@ -336,6 +362,8 @@ struct xen_dm_op {
>>           struct xen_dm_op_set_mem_type set_mem_type;
>>           struct xen_dm_op_inject_event inject_event;
>>           struct xen_dm_op_inject_msi inject_msi;
>> +        struct xen_dm_op_map_mem_type_to_ioreq_server
>> +                map_mem_type_to_ioreq_server;
>>       } u;
>>   };
>>
>> diff --git a/xen/include/public/hvm/hvm_op.h b/xen/include/public/hvm/hvm_op.h
>> index bc00ef0..0bdafdf 100644
>> --- a/xen/include/public/hvm/hvm_op.h
>> +++ b/xen/include/public/hvm/hvm_op.h
>> @@ -93,7 +93,13 @@ typedef enum {
>>       HVMMEM_unused,             /* Placeholder; setting memory to this type
>>                                     will fail for code after 4.7.0 */
>>  #endif
>> -    HVMMEM_ioreq_server
>> +    HVMMEM_ioreq_server        /* Memory type claimed by an ioreq server; type
>> +                                  changes to this value are only allowed after
>> +                                  an ioreq server has claimed its ownership.
>> +                                  Only pages with HVMMEM_ram_rw are allowed to
>> +                                  change to this type; conversely, pages with
>> +                                  this type are only allowed to be changed back
>> +                                  to HVMMEM_ram_rw. */
>>   } hvmmem_type_t;
>>
>>   /* Hint from PV drivers for pagetable destruction. */
>> --
>> 1.9.1
>


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v9 2/5] x86/ioreq server: Add DMOP to map guest ram with p2m_ioreq_server to an ioreq server.
  2017-03-21  2:52 ` [PATCH v9 2/5] x86/ioreq server: Add DMOP to map guest ram with p2m_ioreq_server to an ioreq server Yu Zhang
  2017-03-22  7:49   ` Tian, Kevin
@ 2017-03-22 14:21   ` Jan Beulich
  2017-03-23  3:23     ` Yu Zhang
  1 sibling, 1 reply; 42+ messages in thread
From: Jan Beulich @ 2017-03-22 14:21 UTC (permalink / raw)
  To: Yu Zhang
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, Tim Deegan, xen-devel,
	Paul Durrant, zhiyuan.lv, Jun Nakajima

>>> On 21.03.17 at 03:52, <yu.c.zhang@linux.intel.com> wrote:
> ---
>  xen/arch/x86/hvm/dm.c            | 37 ++++++++++++++++++--
>  xen/arch/x86/hvm/emulate.c       | 65 ++++++++++++++++++++++++++++++++---
>  xen/arch/x86/hvm/ioreq.c         | 38 +++++++++++++++++++++
>  xen/arch/x86/mm/hap/nested_hap.c |  2 +-
>  xen/arch/x86/mm/p2m-ept.c        |  8 ++++-
>  xen/arch/x86/mm/p2m-pt.c         | 19 +++++++----
>  xen/arch/x86/mm/p2m.c            | 74 ++++++++++++++++++++++++++++++++++++++++
>  xen/arch/x86/mm/shadow/multi.c   |  3 +-
>  xen/include/asm-x86/hvm/ioreq.h  |  2 ++
>  xen/include/asm-x86/p2m.h        | 26 ++++++++++++--
>  xen/include/public/hvm/dm_op.h   | 28 +++++++++++++++
>  xen/include/public/hvm/hvm_op.h  |  8 ++++-
>  12 files changed, 290 insertions(+), 20 deletions(-)

Btw., isn't there a libdevicemodel wrapper missing here for this new
sub-op?

> @@ -177,8 +178,64 @@ static int hvmemul_do_io(
>          break;
>      case X86EMUL_UNHANDLEABLE:
>      {
> -        struct hvm_ioreq_server *s =
> -            hvm_select_ioreq_server(curr->domain, &p);
> +        /*
> +         * Xen isn't emulating the instruction internally, so see if
> +         * there's an ioreq server that can handle it. Rules:
> +         *
> +         * - PIO and "normal" MMIO run through hvm_select_ioreq_server()
> +         * to choose the ioreq server by range. If no server is found,
> +         * the access is ignored.
> +         *
> +         * - p2m_ioreq_server accesses are handled by the designated
> +         * ioreq_server for the domain, but there are some corner
> +         * cases:
> +         *
> +         *   - If the domain ioreq_server is NULL, assume there is a
> +         *   race between the unbinding of ioreq server and guest fault
> +         *   so re-try the instruction.

And that retry won't come back here because of? (The answer
should not include any behavior added by subsequent patches.)

> +         */
> +        struct hvm_ioreq_server *s = NULL;
> +        p2m_type_t p2mt = p2m_invalid;
> +
> +        if ( is_mmio )
> +        {
> +            unsigned long gmfn = paddr_to_pfn(addr);
> +
> +            get_gfn_query_unlocked(currd, gmfn, &p2mt);
> +
> +            if ( p2mt == p2m_ioreq_server )
> +            {
> +                unsigned int flags;
> +
> +                /*
> +                 * Value of s could be stale, when we lost a race
> +                 * with dm_op which unmaps p2m_ioreq_server from the
> +                 * ioreq server. Yet there's no cheap way to avoid
> +                 * this, so device model need to do the check.
> +                 */
> +                s = p2m_get_ioreq_server(currd, &flags);
> +
> +                /*
> +                 * If p2mt is ioreq_server but ioreq_server is NULL,
> +                 * we probably lost a race with unbinding of ioreq
> +                 * server, just retry the access.
> +                 */

This repeats the earlier comment - please settle on where to state
this, but don't say the exact same thing twice within a few lines of
code.

> +                if ( s == NULL )
> +                {
> +                    rc = X86EMUL_RETRY;
> +                    vio->io_req.state = STATE_IOREQ_NONE;
> +                    break;
> +                }
> +            }
> +        }
> +
> +        /*
> +         * Value of s could be stale, when we lost a race with dm_op
> +         * which unmaps this PIO/MMIO address from the ioreq server.
> +         * The device model side need to do the check.

I think "will do" would be more natural here, or add "anyway" to
the end of the sentence.

> @@ -914,6 +916,42 @@ int hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id,
>      return rc;
>  }
>  
> +int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id,
> +                                     uint32_t type, uint32_t flags)
> +{
> +    struct hvm_ioreq_server *s;
> +    int rc;
> +
> +    /* For now, only HVMMEM_ioreq_server is supported. */
> +    if ( type != HVMMEM_ioreq_server )
> +        return -EINVAL;
> +
> +    /* For now, only write emulation is supported. */
> +    if ( flags & ~(XEN_DMOP_IOREQ_MEM_ACCESS_WRITE) )

Stray parentheses.

> --- a/xen/arch/x86/mm/hap/nested_hap.c
> +++ b/xen/arch/x86/mm/hap/nested_hap.c
> @@ -172,7 +172,7 @@ nestedhap_walk_L0_p2m(struct p2m_domain *p2m, paddr_t L1_gpa, paddr_t *L0_gpa,
>      if ( *p2mt == p2m_mmio_direct )
>          goto direct_mmio_out;
>      rc = NESTEDHVM_PAGEFAULT_MMIO;
> -    if ( *p2mt == p2m_mmio_dm )
> +    if ( *p2mt == p2m_mmio_dm || *p2mt == p2m_ioreq_server )

Btw., how does this addition match up with the rc value being
assigned right before the if()?

> --- a/xen/arch/x86/mm/p2m-ept.c
> +++ b/xen/arch/x86/mm/p2m-ept.c
> @@ -131,6 +131,13 @@ static void ept_p2m_type_to_flags(struct p2m_domain *p2m, ept_entry_t *entry,
>              entry->r = entry->w = entry->x = 1;
>              entry->a = entry->d = !!cpu_has_vmx_ept_ad;
>              break;
> +        case p2m_ioreq_server:
> +            entry->r = 1;
> +            entry->w = !(p2m->ioreq.flags & XEN_DMOP_IOREQ_MEM_ACCESS_WRITE);

Is this effectively open coded p2m_get_ioreq_server() actually
okay? If so, why does the function need to be used elsewhere,
instead of doing direct, lock-free accesses?

> +void p2m_destroy_ioreq_server(const struct domain *d,
> +                              const struct hvm_ioreq_server *s)
> +{
> +    struct p2m_domain *p2m = p2m_get_hostp2m(d);
> +
> +    spin_lock(&p2m->ioreq.lock);
> +
> +    if ( p2m->ioreq.server == s )
> +    {
> +        p2m->ioreq.server = NULL;
> +        p2m->ioreq.flags = 0;
> +    }
> +
> +    spin_unlock(&p2m->ioreq.lock);
> +}

Is this function really needed? I.e. can't the caller simply call
p2m_set_ioreq_server(d, 0, s) instead?

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v9 3/5] x86/ioreq server: Handle read-modify-write cases for p2m_ioreq_server pages.
  2017-03-21  2:52 ` [PATCH v9 3/5] x86/ioreq server: Handle read-modify-write cases for p2m_ioreq_server pages Yu Zhang
@ 2017-03-22 14:22   ` Jan Beulich
  0 siblings, 0 replies; 42+ messages in thread
From: Jan Beulich @ 2017-03-22 14:22 UTC (permalink / raw)
  To: Yu Zhang; +Cc: Andrew Cooper, Paul Durrant, zhiyuan.lv, xen-devel

>>> On 21.03.17 at 03:52, <yu.c.zhang@linux.intel.com> wrote:
> @@ -226,6 +249,17 @@ static int hvmemul_do_io(
>                      vio->io_req.state = STATE_IOREQ_NONE;
>                      break;
>                  }
> +
> +                /*
> +                 * This is part of a read-modify-write instruction.
> +                 * Emulate the read part so we have the value cached.

s/cached/available/ ?

Other than that
Reviewed-by: Jan Beulich <jbeulich@suse.com>

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v9 4/5] x86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries.
  2017-03-21  2:52 ` [PATCH v9 4/5] x86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries Yu Zhang
  2017-03-21 10:05   ` Paul Durrant
  2017-03-22  8:10   ` Tian, Kevin
@ 2017-03-22 14:29   ` Jan Beulich
  2017-03-23  3:23     ` Yu Zhang
  2 siblings, 1 reply; 42+ messages in thread
From: Jan Beulich @ 2017-03-22 14:29 UTC (permalink / raw)
  To: Yu Zhang
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, xen-devel,
	Paul Durrant, zhiyuan.lv, Jun Nakajima

>>> On 21.03.17 at 03:52, <yu.c.zhang@linux.intel.com> wrote:
> --- a/xen/arch/x86/hvm/ioreq.c
> +++ b/xen/arch/x86/hvm/ioreq.c
> @@ -949,6 +949,14 @@ int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id,
>  
>      spin_unlock_recursive(&d->arch.hvm_domain.ioreq_server.lock);
>  
> +    if ( rc == 0 && flags == 0 )
> +    {
> +        struct p2m_domain *p2m = p2m_get_hostp2m(d);
> +
> +        if ( read_atomic(&p2m->ioreq.entry_count) )
> +            p2m_change_entry_type_global(d, p2m_ioreq_server, p2m_ram_rw);
> +    }

If you do this after dropping the lock, don't you risk a race with
another server mapping the type to itself?

> --- a/xen/arch/x86/mm/p2m-ept.c
> +++ b/xen/arch/x86/mm/p2m-ept.c
> @@ -544,6 +544,12 @@ static int resolve_misconfig(struct p2m_domain *p2m, unsigned long gfn)
>                      e.ipat = ipat;
>                      if ( e.recalc && p2m_is_changeable(e.sa_p2mt) )
>                      {
> +                         if ( e.sa_p2mt == p2m_ioreq_server )
> +                         {
> +                             p2m->ioreq.entry_count--;
> +                             ASSERT(p2m->ioreq.entry_count >= 0);

If you did the ASSERT() first (using > 0), you wouldn't need the
type to be a signed one, doubling the valid value range (even if
right now the full 64 bits can't be used anyway, but it would be
one less thing to worry about once we get 6-level page tables).
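
I.e. something like this (just to illustrate, re-using the names from
the hunk above, with entry_count made unsigned):

                         if ( e.sa_p2mt == p2m_ioreq_server )
                         {
                             ASSERT(p2m->ioreq.entry_count > 0);
                             p2m->ioreq.entry_count--;
                         }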

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v9 5/5] x86/ioreq server: Synchronously reset outstanding p2m_ioreq_server entries when an ioreq server unmaps.
  2017-03-21  2:52 ` [PATCH v9 5/5] x86/ioreq server: Synchronously reset outstanding p2m_ioreq_server entries when an ioreq server unmaps Yu Zhang
  2017-03-21 10:00   ` Paul Durrant
  2017-03-22  8:28   ` Tian, Kevin
@ 2017-03-22 14:39   ` Jan Beulich
  2017-03-23  3:23     ` Yu Zhang
  2 siblings, 1 reply; 42+ messages in thread
From: Jan Beulich @ 2017-03-22 14:39 UTC (permalink / raw)
  To: Yu Zhang
  Cc: George Dunlap, Andrew Cooper, Paul Durrant, zhiyuan.lv, xen-devel

>>> On 21.03.17 at 03:52, <yu.c.zhang@linux.intel.com> wrote:
> --- a/xen/arch/x86/hvm/dm.c
> +++ b/xen/arch/x86/hvm/dm.c
> @@ -385,16 +385,51 @@ static int dm_op(domid_t domid,
>  
>      case XEN_DMOP_map_mem_type_to_ioreq_server:
>      {
> -        const struct xen_dm_op_map_mem_type_to_ioreq_server *data =
> +        struct xen_dm_op_map_mem_type_to_ioreq_server *data =
>              &op.u.map_mem_type_to_ioreq_server;
> +        unsigned long first_gfn = data->opaque;
> +        unsigned long last_gfn;
> +
> +        const_op = false;
>  
>          rc = -EOPNOTSUPP;
>          /* Only support for HAP enabled hvm. */
>          if ( !hap_enabled(d) )
>              break;
>  
> -        rc = hvm_map_mem_type_to_ioreq_server(d, data->id,
> -                                              data->type, data->flags);
> +        if ( first_gfn == 0 )
> +            rc = hvm_map_mem_type_to_ioreq_server(d, data->id,
> +                                                  data->type, data->flags);
> +        /*
> +         * Iterate p2m table when an ioreq server unmaps from p2m_ioreq_server,
> +         * and reset the remaining p2m_ioreq_server entries back to p2m_ram_rw.
> +         */
> +        if ( (first_gfn > 0) || (data->flags == 0 && rc == 0) )

Instead of putting the rc check on the right side, please do

        if ( rc == 0 && ((first_gfn > 0) || data->flags == 0) )

That'll require setting rc to zero in an else to the previous if(),
but that's needed anyway afaics in order to not return
-EOPNOTSUPP once no further continuation is necessary.

I further wonder why the if() here needs to look at first_gfn at
all - data->flags is supposed to remain at zero for continuations
(unless we have a misbehaving caller, in which case it'll harm
the guest only afaict). It seems to me, however, that this may
have been discussed once already, a long time ago. I'm sorry
for not remembering the outcome, if so.

> --- a/xen/arch/x86/mm/p2m.c
> +++ b/xen/arch/x86/mm/p2m.c
> @@ -1038,6 +1038,35 @@ void p2m_change_type_range(struct domain *d,
>      p2m_unlock(p2m);
>  }
>  
> +/* Synchronously modify the p2m type for a range of gfns from ot to nt. */
> +void p2m_finish_type_change(struct domain *d,
> +                            unsigned long first_gfn, unsigned long last_gfn,

I think we'd prefer new functions to properly use gfn_t.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v9 2/5] x86/ioreq server: Add DMOP to map guest ram with p2m_ioreq_server to an ioreq server.
  2017-03-22 14:21   ` Jan Beulich
@ 2017-03-23  3:23     ` Yu Zhang
  2017-03-23  8:57       ` Jan Beulich
  0 siblings, 1 reply; 42+ messages in thread
From: Yu Zhang @ 2017-03-23  3:23 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, Tim Deegan, xen-devel,
	Paul Durrant, zhiyuan.lv, Jun Nakajima



On 3/22/2017 10:21 PM, Jan Beulich wrote:
>>>> On 21.03.17 at 03:52, <yu.c.zhang@linux.intel.com> wrote:
>> ---
>>   xen/arch/x86/hvm/dm.c            | 37 ++++++++++++++++++--
>>   xen/arch/x86/hvm/emulate.c       | 65 ++++++++++++++++++++++++++++++++---
>>   xen/arch/x86/hvm/ioreq.c         | 38 +++++++++++++++++++++
>>   xen/arch/x86/mm/hap/nested_hap.c |  2 +-
>>   xen/arch/x86/mm/p2m-ept.c        |  8 ++++-
>>   xen/arch/x86/mm/p2m-pt.c         | 19 +++++++----
>>   xen/arch/x86/mm/p2m.c            | 74 ++++++++++++++++++++++++++++++++++++++++
>>   xen/arch/x86/mm/shadow/multi.c   |  3 +-
>>   xen/include/asm-x86/hvm/ioreq.h  |  2 ++
>>   xen/include/asm-x86/p2m.h        | 26 ++++++++++++--
>>   xen/include/public/hvm/dm_op.h   | 28 +++++++++++++++
>>   xen/include/public/hvm/hvm_op.h  |  8 ++++-
>>   12 files changed, 290 insertions(+), 20 deletions(-)
> Btw., isn't there a libdevicemodel wrapper missing here for this new
> sub-op?

Yes. I planned to add the wrapper code in another patch, after this series
is accepted.
Is this a must for this patchset?

>> @@ -177,8 +178,64 @@ static int hvmemul_do_io(
>>           break;
>>       case X86EMUL_UNHANDLEABLE:
>>       {
>> -        struct hvm_ioreq_server *s =
>> -            hvm_select_ioreq_server(curr->domain, &p);
>> +        /*
>> +         * Xen isn't emulating the instruction internally, so see if
>> +         * there's an ioreq server that can handle it. Rules:
>> +         *
>> +         * - PIO and "normal" MMIO run through hvm_select_ioreq_server()
>> +         * to choose the ioreq server by range. If no server is found,
>> +         * the access is ignored.
>> +         *
>> +         * - p2m_ioreq_server accesses are handled by the designated
>> +         * ioreq_server for the domain, but there are some corner
>> +         * cases:
>> +         *
>> +         *   - If the domain ioreq_server is NULL, assume there is a
>> +         *   race between the unbinding of ioreq server and guest fault
>> +         *   so re-try the instruction.
> And that retry won't come back here because of? (The answer
> should not include any behavior added by subsequent patches.)

You got me. :)
In this patch, the retry will come back here. It is only after patch 4 or
patch 5 that the retry will be ignored (the p2m type is changed back to
p2m_ram_rw after the unbinding).

>> +         */
>> +        struct hvm_ioreq_server *s = NULL;
>> +        p2m_type_t p2mt = p2m_invalid;
>> +
>> +        if ( is_mmio )
>> +        {
>> +            unsigned long gmfn = paddr_to_pfn(addr);
>> +
>> +            get_gfn_query_unlocked(currd, gmfn, &p2mt);
>> +
>> +            if ( p2mt == p2m_ioreq_server )
>> +            {
>> +                unsigned int flags;
>> +
>> +                /*
>> +                 * Value of s could be stale, when we lost a race
>> +                 * with dm_op which unmaps p2m_ioreq_server from the
>> +                 * ioreq server. Yet there's no cheap way to avoid
>> +                 * this, so device model need to do the check.
>> +                 */
>> +                s = p2m_get_ioreq_server(currd, &flags);
>> +
>> +                /*
>> +                 * If p2mt is ioreq_server but ioreq_server is NULL,
>> +                 * we probably lost a race with unbinding of ioreq
>> +                 * server, just retry the access.
>> +                 */
> This repeats the earlier comment - please settle on where to state
> this, but don't say the exact same thing twice within a few lines of
> code.

Thanks, will remove this comment.

>> +                if ( s == NULL )
>> +                {
>> +                    rc = X86EMUL_RETRY;
>> +                    vio->io_req.state = STATE_IOREQ_NONE;
>> +                    break;
>> +                }
>> +            }
>> +        }
>> +
>> +        /*
>> +         * Value of s could be stale, when we lost a race with dm_op
>> +         * which unmaps this PIO/MMIO address from the ioreq server.
>> +         * The device model side need to do the check.
> I think "will do" would be more natural here, or add "anyway" to
> the end of the sentence.
>

Got it. Thanks.

>> @@ -914,6 +916,42 @@ int hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id,
>>       return rc;
>>   }
>>   
>> +int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id,
>> +                                     uint32_t type, uint32_t flags)
>> +{
>> +    struct hvm_ioreq_server *s;
>> +    int rc;
>> +
>> +    /* For now, only HVMMEM_ioreq_server is supported. */
>> +    if ( type != HVMMEM_ioreq_server )
>> +        return -EINVAL;
>> +
>> +    /* For now, only write emulation is supported. */
>> +    if ( flags & ~(XEN_DMOP_IOREQ_MEM_ACCESS_WRITE) )
> Stray parentheses.

Got it.
>> --- a/xen/arch/x86/mm/hap/nested_hap.c
>> +++ b/xen/arch/x86/mm/hap/nested_hap.c
>> @@ -172,7 +172,7 @@ nestedhap_walk_L0_p2m(struct p2m_domain *p2m, paddr_t L1_gpa, paddr_t *L0_gpa,
>>       if ( *p2mt == p2m_mmio_direct )
>>           goto direct_mmio_out;
>>       rc = NESTEDHVM_PAGEFAULT_MMIO;
>> -    if ( *p2mt == p2m_mmio_dm )
>> +    if ( *p2mt == p2m_mmio_dm || *p2mt == p2m_ioreq_server )
> Btw., how does this addition match up with the rc value being
> assigned right before the if()?

Well, returning NESTEDHVM_PAGEFAULT_MMIO in such a case will trigger
handle_mmio() later in hvm_hap_nested_page_fault(), which I guess is what
we expect.

>> --- a/xen/arch/x86/mm/p2m-ept.c
>> +++ b/xen/arch/x86/mm/p2m-ept.c
>> @@ -131,6 +131,13 @@ static void ept_p2m_type_to_flags(struct p2m_domain *p2m, ept_entry_t *entry,
>>               entry->r = entry->w = entry->x = 1;
>>               entry->a = entry->d = !!cpu_has_vmx_ept_ad;
>>               break;
>> +        case p2m_ioreq_server:
>> +            entry->r = 1;
>> +            entry->w = !(p2m->ioreq.flags & XEN_DMOP_IOREQ_MEM_ACCESS_WRITE);
> Is this effectively open coded p2m_get_ioreq_server() actually
> okay? If so, why does the function need to be used elsewhere,
> instead of doing direct, lock-free accesses?

Maybe your comment is about whether it is necessary to use the lock in
p2m_get_ioreq_server()?
I still believe it is: the lock protects not only the ioreq server pointer
but also the flags that go with it.

Besides, the routine is used not only in the emulation path, but also in
the hypercall that sets the mem type. So the lock can still provide some
kind of protection against p2m_set_ioreq_server() - even if it does not
always do so.

>> +void p2m_destroy_ioreq_server(const struct domain *d,
>> +                              const struct hvm_ioreq_server *s)
>> +{
>> +    struct p2m_domain *p2m = p2m_get_hostp2m(d);
>> +
>> +    spin_lock(&p2m->ioreq.lock);
>> +
>> +    if ( p2m->ioreq.server == s )
>> +    {
>> +        p2m->ioreq.server = NULL;
>> +        p2m->ioreq.flags = 0;
>> +    }
>> +
>> +    spin_unlock(&p2m->ioreq.lock);
>> +}
> Is this function really needed? I.e. can't the caller simply call
> p2m_set_ioreq_server(d, 0, s) instead?

You are right, we can use p2m_set_ioreq_server(d, 0, s). :)

Yu
> Jan
>


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v9 4/5] x86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries.
  2017-03-22 14:29   ` Jan Beulich
@ 2017-03-23  3:23     ` Yu Zhang
  2017-03-23  9:00       ` Jan Beulich
  0 siblings, 1 reply; 42+ messages in thread
From: Yu Zhang @ 2017-03-23  3:23 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, xen-devel,
	Paul Durrant, zhiyuan.lv, Jun Nakajima



On 3/22/2017 10:29 PM, Jan Beulich wrote:
>>>> On 21.03.17 at 03:52, <yu.c.zhang@linux.intel.com> wrote:
>> --- a/xen/arch/x86/hvm/ioreq.c
>> +++ b/xen/arch/x86/hvm/ioreq.c
>> @@ -949,6 +949,14 @@ int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id,
>>   
>>       spin_unlock_recursive(&d->arch.hvm_domain.ioreq_server.lock);
>>   
>> +    if ( rc == 0 && flags == 0 )
>> +    {
>> +        struct p2m_domain *p2m = p2m_get_hostp2m(d);
>> +
>> +        if ( read_atomic(&p2m->ioreq.entry_count) )
>> +            p2m_change_entry_type_global(d, p2m_ioreq_server, p2m_ram_rw);
>> +    }
> If you do this after dropping the lock, don't you risk a race with
> another server mapping the type to itself?

I believe it's OK. The remaining p2m_ioreq_server entries still need to be
cleaned up anyway.

>> --- a/xen/arch/x86/mm/p2m-ept.c
>> +++ b/xen/arch/x86/mm/p2m-ept.c
>> @@ -544,6 +544,12 @@ static int resolve_misconfig(struct p2m_domain *p2m, unsigned long gfn)
>>                       e.ipat = ipat;
>>                       if ( e.recalc && p2m_is_changeable(e.sa_p2mt) )
>>                       {
>> +                         if ( e.sa_p2mt == p2m_ioreq_server )
>> +                         {
>> +                             p2m->ioreq.entry_count--;
>> +                             ASSERT(p2m->ioreq.entry_count >= 0);
> If you did the ASSERT() first (using > 0), you wouldn't need the
> type be a signed one, doubling the valid value range (even if
> right now the full 64 bits can't be used anyway, but it would be
> one less thing to worry about once we get 6-level page tables).

Well, entry_count only counts 4K pages, so even if the guest physical
address width is extended to 64 bits in the future, entry_count will not
exceed 2^52 (2^64/2^12).

Yu
> Jan
>
>


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v9 5/5] x86/ioreq server: Synchronously reset outstanding p2m_ioreq_server entries when an ioreq server unmaps.
  2017-03-22 14:39   ` Jan Beulich
@ 2017-03-23  3:23     ` Yu Zhang
  2017-03-23  9:02       ` Jan Beulich
  0 siblings, 1 reply; 42+ messages in thread
From: Yu Zhang @ 2017-03-23  3:23 UTC (permalink / raw)
  To: Jan Beulich
  Cc: George Dunlap, Andrew Cooper, Paul Durrant, zhiyuan.lv, xen-devel



On 3/22/2017 10:39 PM, Jan Beulich wrote:
>>>> On 21.03.17 at 03:52, <yu.c.zhang@linux.intel.com> wrote:
>> --- a/xen/arch/x86/hvm/dm.c
>> +++ b/xen/arch/x86/hvm/dm.c
>> @@ -385,16 +385,51 @@ static int dm_op(domid_t domid,
>>   
>>       case XEN_DMOP_map_mem_type_to_ioreq_server:
>>       {
>> -        const struct xen_dm_op_map_mem_type_to_ioreq_server *data =
>> +        struct xen_dm_op_map_mem_type_to_ioreq_server *data =
>>               &op.u.map_mem_type_to_ioreq_server;
>> +        unsigned long first_gfn = data->opaque;
>> +        unsigned long last_gfn;
>> +
>> +        const_op = false;
>>   
>>           rc = -EOPNOTSUPP;
>>           /* Only support for HAP enabled hvm. */
>>           if ( !hap_enabled(d) )
>>               break;
>>   
>> -        rc = hvm_map_mem_type_to_ioreq_server(d, data->id,
>> -                                              data->type, data->flags);
>> +        if ( first_gfn == 0 )
>> +            rc = hvm_map_mem_type_to_ioreq_server(d, data->id,
>> +                                                  data->type, data->flags);
>> +        /*
>> +         * Iterate p2m table when an ioreq server unmaps from p2m_ioreq_server,
>> +         * and reset the remaining p2m_ioreq_server entries back to p2m_ram_rw.
>> +         */
>> +        if ( (first_gfn > 0) || (data->flags == 0 && rc == 0) )
> Instead of putting the rc check on the right side, please do
>
>          if ( rc == 0 && (first_gfn > 0) || data->flags == 0) )
>
> That'll require setting rc to zero in an else to the previous if(),
> but that's needed anyway afaics in order to not return
> -EOPNOTSUPP once no further continuation is necessary.
>
> I further wonder why the if() here needs to look at first_gfn at
> all - data->flags is supposed to remain at zero for continuations
> (unless we have a misbehaving caller, in which case it'll harm
> the guest only afaict). It seems to me, however, that this may
> have been discussed once already, a long time ago. I'm sorry
> for not remembering the outcome, if so.

We have not discussed this. Our previous discussion was about the if
condition before calling hvm_map_mem_type_to_ioreq_server(). :-)

Maybe the above code should be changed to:
@@ -400,11 +400,14 @@ static int dm_op(domid_t domid,
          if ( first_gfn == 0 )
              rc = hvm_map_mem_type_to_ioreq_server(d, data->id,
                                                    data->type, data->flags);
+       else
+           rc = 0;
+
          /*
           * Iterate p2m table when an ioreq server unmaps from p2m_ioreq_server,
           * and reset the remaining p2m_ioreq_server entries back to p2m_ram_rw.
           */
-        if ( (first_gfn > 0) || (data->flags == 0 && rc == 0) )
+        if ( data->flags == 0 && rc == 0 )
          {
              struct p2m_domain *p2m = p2m_get_hostp2m(d);

>> --- a/xen/arch/x86/mm/p2m.c
>> +++ b/xen/arch/x86/mm/p2m.c
>> @@ -1038,6 +1038,35 @@ void p2m_change_type_range(struct domain *d,
>>       p2m_unlock(p2m);
>>   }
>>   
>> +/* Synchronously modify the p2m type for a range of gfns from ot to nt. */
>> +void p2m_finish_type_change(struct domain *d,
>> +                            unsigned long first_gfn, unsigned long last_gfn,
> I think we'd prefer new functions to properly use gfn_t.
Sorry? I do not get it.
Paul suggested we replace last_gfn with max_nr, which sounds reasonable
to me. I guess you mean something else?

Thanks
Yu

> Jan
>
>


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v9 2/5] x86/ioreq server: Add DMOP to map guest ram with p2m_ioreq_server to an ioreq server.
  2017-03-23  3:23     ` Yu Zhang
@ 2017-03-23  8:57       ` Jan Beulich
  2017-03-24  9:05         ` Yu Zhang
  0 siblings, 1 reply; 42+ messages in thread
From: Jan Beulich @ 2017-03-23  8:57 UTC (permalink / raw)
  To: Yu Zhang
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, Tim Deegan, xen-devel,
	Paul Durrant, zhiyuan.lv, Jun Nakajima

>>> On 23.03.17 at 04:23, <yu.c.zhang@linux.intel.com> wrote:
> On 3/22/2017 10:21 PM, Jan Beulich wrote:
>>>>> On 21.03.17 at 03:52, <yu.c.zhang@linux.intel.com> wrote:
>>> ---
>>>   xen/arch/x86/hvm/dm.c            | 37 ++++++++++++++++++--
>>>   xen/arch/x86/hvm/emulate.c       | 65 ++++++++++++++++++++++++++++++++---
>>>   xen/arch/x86/hvm/ioreq.c         | 38 +++++++++++++++++++++
>>>   xen/arch/x86/mm/hap/nested_hap.c |  2 +-
>>>   xen/arch/x86/mm/p2m-ept.c        |  8 ++++-
>>>   xen/arch/x86/mm/p2m-pt.c         | 19 +++++++----
>>>   xen/arch/x86/mm/p2m.c            | 74 ++++++++++++++++++++++++++++++++++++++++
>>>   xen/arch/x86/mm/shadow/multi.c   |  3 +-
>>>   xen/include/asm-x86/hvm/ioreq.h  |  2 ++
>>>   xen/include/asm-x86/p2m.h        | 26 ++++++++++++--
>>>   xen/include/public/hvm/dm_op.h   | 28 +++++++++++++++
>>>   xen/include/public/hvm/hvm_op.h  |  8 ++++-
>>>   12 files changed, 290 insertions(+), 20 deletions(-)
>> Btw., isn't there a libdevicemodel wrapper missing here for this new
>> sub-op?
> 
> Yes. I planed to add the wrapper code in another patch after this series 
> is accepted.
> Is this a must in this patchset?

I think so, or else the code you add is effectively dead. We should
avoid encouraging people to bypass libxc.

>>> @@ -177,8 +178,64 @@ static int hvmemul_do_io(
>>>           break;
>>>       case X86EMUL_UNHANDLEABLE:
>>>       {
>>> -        struct hvm_ioreq_server *s =
>>> -            hvm_select_ioreq_server(curr->domain, &p);
>>> +        /*
>>> +         * Xen isn't emulating the instruction internally, so see if
>>> +         * there's an ioreq server that can handle it. Rules:
>>> +         *
>>> +         * - PIO and "normal" MMIO run through hvm_select_ioreq_server()
>>> +         * to choose the ioreq server by range. If no server is found,
>>> +         * the access is ignored.
>>> +         *
>>> +         * - p2m_ioreq_server accesses are handled by the designated
>>> +         * ioreq_server for the domain, but there are some corner
>>> +         * cases:
>>> +         *
>>> +         *   - If the domain ioreq_server is NULL, assume there is a
>>> +         *   race between the unbinding of ioreq server and guest fault
>>> +         *   so re-try the instruction.
>> And that retry won't come back here because of? (The answer
>> should not include any behavior added by subsequent patches.)
> 
> You got me. :)
> In this patch, retry will come back here. It should be after patch 4 or 
> patch 5 that the retry
> will be ignored(p2m type changed back to p2m_ram_rw after the unbinding).

In which case I think we shouldn't insist on you to change things, but
you should spell out very clearly that this patch should not go in
without the others going in at the same time.

>>> --- a/xen/arch/x86/mm/hap/nested_hap.c
>>> +++ b/xen/arch/x86/mm/hap/nested_hap.c
>>> @@ -172,7 +172,7 @@ nestedhap_walk_L0_p2m(struct p2m_domain *p2m, paddr_t L1_gpa, paddr_t *L0_gpa,
>>>       if ( *p2mt == p2m_mmio_direct )
>>>           goto direct_mmio_out;
>>>       rc = NESTEDHVM_PAGEFAULT_MMIO;
>>> -    if ( *p2mt == p2m_mmio_dm )
>>> +    if ( *p2mt == p2m_mmio_dm || *p2mt == p2m_ioreq_server )
>> Btw., how does this addition match up with the rc value being
>> assigned right before the if()?
> 
> Well returning a NESTEDHVM_PAGEFAULT_MMIO in such case will trigger 
> handle_mmio() later in
> hvm_hap_nested_page_fault(). Guess that is what we expected.

That's probably what is expected, but it's not MMIO that we're
doing in that case. And note that we stopped abusing
handle_mmio() for non-MMIO purposes a little while ago (commit
3dd00f7b56 ["x86/HVM: restrict permitted instructions during
special purpose emulation"]).

>>> --- a/xen/arch/x86/mm/p2m-ept.c
>>> +++ b/xen/arch/x86/mm/p2m-ept.c
>>> @@ -131,6 +131,13 @@ static void ept_p2m_type_to_flags(struct p2m_domain *p2m, ept_entry_t *entry,
>>>               entry->r = entry->w = entry->x = 1;
>>>               entry->a = entry->d = !!cpu_has_vmx_ept_ad;
>>>               break;
>>> +        case p2m_ioreq_server:
>>> +            entry->r = 1;
>>> +            entry->w = !(p2m->ioreq.flags & XEN_DMOP_IOREQ_MEM_ACCESS_WRITE);
>> Is this effectively open coded p2m_get_ioreq_server() actually
>> okay? If so, why does the function need to be used elsewhere,
>> instead of doing direct, lock-free accesses?
> 
> Maybe your comments is about whether it is necessary to use the lock in 
> p2m_get_ioreq_server()?
> I still believe so, it does not only protect the value of ioreq server, 
> but also the flag together with it.
> 
> Besides, it is used not only in the emulation process, but also the 
> hypercall to set the mem type.
> So the lock can still provide some kind protection against the 
> p2m_set_ioreq_server() - even it does
> not always do so.

The question, fundamentally, is about consistency: The same
access model should be followed universally, unless there is an
explicit reason for an exception.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v9 4/5] x86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries.
  2017-03-23  3:23     ` Yu Zhang
@ 2017-03-23  9:00       ` Jan Beulich
  2017-03-24  9:05         ` Yu Zhang
  0 siblings, 1 reply; 42+ messages in thread
From: Jan Beulich @ 2017-03-23  9:00 UTC (permalink / raw)
  To: Yu Zhang
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, xen-devel,
	Paul Durrant, zhiyuan.lv, Jun Nakajima

>>> On 23.03.17 at 04:23, <yu.c.zhang@linux.intel.com> wrote:
> On 3/22/2017 10:29 PM, Jan Beulich wrote:
>>>>> On 21.03.17 at 03:52, <yu.c.zhang@linux.intel.com> wrote:
>>> --- a/xen/arch/x86/hvm/ioreq.c
>>> +++ b/xen/arch/x86/hvm/ioreq.c
>>> @@ -949,6 +949,14 @@ int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id,
>>>   
>>>       spin_unlock_recursive(&d->arch.hvm_domain.ioreq_server.lock);
>>>   
>>> +    if ( rc == 0 && flags == 0 )
>>> +    {
>>> +        struct p2m_domain *p2m = p2m_get_hostp2m(d);
>>> +
>>> +        if ( read_atomic(&p2m->ioreq.entry_count) )
>>> +            p2m_change_entry_type_global(d, p2m_ioreq_server, p2m_ram_rw);
>>> +    }
>> If you do this after dropping the lock, don't you risk a race with
>> another server mapping the type to itself?
> 
> I believe it's OK. Remaining p2m_ioreq_server entries still needs to be 
> cleaned anyway.

Are you refusing a new server mapping the type before being
done with the cleanup?

>>> --- a/xen/arch/x86/mm/p2m-ept.c
>>> +++ b/xen/arch/x86/mm/p2m-ept.c
>>> @@ -544,6 +544,12 @@ static int resolve_misconfig(struct p2m_domain *p2m, unsigned long gfn)
>>>                       e.ipat = ipat;
>>>                       if ( e.recalc && p2m_is_changeable(e.sa_p2mt) )
>>>                       {
>>> +                         if ( e.sa_p2mt == p2m_ioreq_server )
>>> +                         {
>>> +                             p2m->ioreq.entry_count--;
>>> +                             ASSERT(p2m->ioreq.entry_count >= 0);
>> If you did the ASSERT() first (using > 0), you wouldn't need the
>> type be a signed one, doubling the valid value range (even if
>> right now the full 64 bits can't be used anyway, but it would be
>> one less thing to worry about once we get 6-level page tables).
> 
> Well, entry_count counts only for 4K pages, so even if the guest 
> physical address
> width is extended up to 64 bit in the future, entry_count will not 
> exceed 2^52(
> 2^64/2^12).

Oh, true. Still I'd prefer if you used an unsigned type for a count
when that's easily possible.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v9 5/5] x86/ioreq server: Synchronously reset outstanding p2m_ioreq_server entries when an ioreq server unmaps.
  2017-03-23  3:23     ` Yu Zhang
@ 2017-03-23  9:02       ` Jan Beulich
  2017-03-24  9:05         ` Yu Zhang
  0 siblings, 1 reply; 42+ messages in thread
From: Jan Beulich @ 2017-03-23  9:02 UTC (permalink / raw)
  To: Yu Zhang
  Cc: George Dunlap, Andrew Cooper, Paul Durrant, zhiyuan.lv, xen-devel

>>> On 23.03.17 at 04:23, <yu.c.zhang@linux.intel.com> wrote:

> 
> On 3/22/2017 10:39 PM, Jan Beulich wrote:
>>>>> On 21.03.17 at 03:52, <yu.c.zhang@linux.intel.com> wrote:
>>> --- a/xen/arch/x86/hvm/dm.c
>>> +++ b/xen/arch/x86/hvm/dm.c
>>> @@ -385,16 +385,51 @@ static int dm_op(domid_t domid,
>>>   
>>>       case XEN_DMOP_map_mem_type_to_ioreq_server:
>>>       {
>>> -        const struct xen_dm_op_map_mem_type_to_ioreq_server *data =
>>> +        struct xen_dm_op_map_mem_type_to_ioreq_server *data =
>>>               &op.u.map_mem_type_to_ioreq_server;
>>> +        unsigned long first_gfn = data->opaque;
>>> +        unsigned long last_gfn;
>>> +
>>> +        const_op = false;
>>>   
>>>           rc = -EOPNOTSUPP;
>>>           /* Only support for HAP enabled hvm. */
>>>           if ( !hap_enabled(d) )
>>>               break;
>>>   
>>> -        rc = hvm_map_mem_type_to_ioreq_server(d, data->id,
>>> -                                              data->type, data->flags);
>>> +        if ( first_gfn == 0 )
>>> +            rc = hvm_map_mem_type_to_ioreq_server(d, data->id,
>>> +                                                  data->type, data->flags);
>>> +        /*
>>> +         * Iterate p2m table when an ioreq server unmaps from p2m_ioreq_server,
>>> +         * and reset the remaining p2m_ioreq_server entries back to p2m_ram_rw.
>>> +         */
>>> +        if ( (first_gfn > 0) || (data->flags == 0 && rc == 0) )
>> Instead of putting the rc check on the right side, please do
>>
>>          if ( rc == 0 && (first_gfn > 0) || data->flags == 0) )
>>
>> That'll require setting rc to zero in an else to the previous if(),
>> but that's needed anyway afaics in order to not return
>> -EOPNOTSUPP once no further continuation is necessary.
>>
>> I further wonder why the if() here needs to look at first_gfn at
>> all - data->flags is supposed to remain at zero for continuations
>> (unless we have a misbehaving caller, in which case it'll harm
>> the guest only afaict). It seems to me, however, that this may
>> have been discussed once already, a long time ago. I'm sorry
>> for not remembering the outcome, if so.
> 
> We have not discussed this. Our previous discussion is about the if 
> condition before
> calling hvm_map_mem_type_to_ioreq_server(). :-)
> 
> Maybe above code should be changed to:
> @@ -400,11 +400,14 @@ static int dm_op(domid_t domid,
>           if ( first_gfn == 0 )
>               rc = hvm_map_mem_type_to_ioreq_server(d, data->id,
>                                                     data->type, 
> data->flags);
> +       else
> +           rc = 0;
> +
>           /*
>            * Iterate p2m table when an ioreq server unmaps from p2m_ioreq_server,
>            * and reset the remaining p2m_ioreq_server entries back to p2m_ram_rw.
>            */
> -        if ( (first_gfn > 0) || (data->flags == 0 && rc == 0) )
> +        if ( data->flags == 0 && rc == 0 )
>           {
>               struct p2m_domain *p2m = p2m_get_hostp2m(d);

Yes, that's what I was trying to hint at.

>>> --- a/xen/arch/x86/mm/p2m.c
>>> +++ b/xen/arch/x86/mm/p2m.c
>>> @@ -1038,6 +1038,35 @@ void p2m_change_type_range(struct domain *d,
>>>       p2m_unlock(p2m);
>>>   }
>>>   
>>> +/* Synchronously modify the p2m type for a range of gfns from ot to nt. */
>>> +void p2m_finish_type_change(struct domain *d,
>>> +                            unsigned long first_gfn, unsigned long last_gfn,
>> I think we'd prefer new functions to properly use gfn_t.
> Sorry? I do not get it.
> Paul suggested we replace last_gfn with max_nr, which sounds reasonable 
> to me. Guess you mean
> something else?

Indeed - even with Paul's suggestion, first_gfn would remain as a
parameter, and it should be of type gfn_t.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v9 2/5] x86/ioreq server: Add DMOP to map guest ram with p2m_ioreq_server to an ioreq server.
  2017-03-23  8:57       ` Jan Beulich
@ 2017-03-24  9:05         ` Yu Zhang
  2017-03-24 10:19           ` Jan Beulich
  0 siblings, 1 reply; 42+ messages in thread
From: Yu Zhang @ 2017-03-24  9:05 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, Tim Deegan, xen-devel,
	Paul Durrant, zhiyuan.lv, Jun Nakajima



On 3/23/2017 4:57 PM, Jan Beulich wrote:
>>>> On 23.03.17 at 04:23, <yu.c.zhang@linux.intel.com> wrote:
>> On 3/22/2017 10:21 PM, Jan Beulich wrote:
>>>>>> On 21.03.17 at 03:52, <yu.c.zhang@linux.intel.com> wrote:
>>>> ---
>>>>    xen/arch/x86/hvm/dm.c            | 37 ++++++++++++++++++--
>>>>    xen/arch/x86/hvm/emulate.c       | 65 ++++++++++++++++++++++++++++++++---
>>>>    xen/arch/x86/hvm/ioreq.c         | 38 +++++++++++++++++++++
>>>>    xen/arch/x86/mm/hap/nested_hap.c |  2 +-
>>>>    xen/arch/x86/mm/p2m-ept.c        |  8 ++++-
>>>>    xen/arch/x86/mm/p2m-pt.c         | 19 +++++++----
>>>>    xen/arch/x86/mm/p2m.c            | 74 ++++++++++++++++++++++++++++++++++++++++
>>>>    xen/arch/x86/mm/shadow/multi.c   |  3 +-
>>>>    xen/include/asm-x86/hvm/ioreq.h  |  2 ++
>>>>    xen/include/asm-x86/p2m.h        | 26 ++++++++++++--
>>>>    xen/include/public/hvm/dm_op.h   | 28 +++++++++++++++
>>>>    xen/include/public/hvm/hvm_op.h  |  8 ++++-
>>>>    12 files changed, 290 insertions(+), 20 deletions(-)
>>> Btw., isn't there a libdevicemodel wrapper missing here for this new
>>> sub-op?
>> Yes. I planed to add the wrapper code in another patch after this series
>> is accepted.
>> Is this a must in this patchset?
> I think so, or else the code you add is effectively dead. We should
> avoid encouraging people to bypass libxc.

OK. I'll try to add another patch to do so, along with the existing 
ones. Thanks.
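
Roughly what I have in mind (only a sketch - the final naming and placement
would follow the existing libxendevicemodel wrappers such as
xendevicemodel_map_io_range_to_ioreq_server(), and the internal
xendevicemodel_op() dispatch helper is assumed here):

int xendevicemodel_map_mem_type_to_ioreq_server(
    xendevicemodel_handle *dmod, domid_t domid, ioservid_t id,
    uint16_t type, uint32_t flags)
{
    struct xen_dm_op op;
    struct xen_dm_op_map_mem_type_to_ioreq_server *data;

    memset(&op, 0, sizeof(op));

    /* Fill in the new sub-op introduced by this series. */
    op.op = XEN_DMOP_map_mem_type_to_ioreq_server;
    data = &op.u.map_mem_type_to_ioreq_server;

    data->id = id;
    data->type = type;
    data->flags = flags;

    /* Issue the dm_op the same way the other wrappers do (assumed helper). */
    return xendevicemodel_op(dmod, domid, 1, &op, sizeof(op));
}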
>>>> @@ -177,8 +178,64 @@ static int hvmemul_do_io(
>>>>            break;
>>>>        case X86EMUL_UNHANDLEABLE:
>>>>        {
>>>> -        struct hvm_ioreq_server *s =
>>>> -            hvm_select_ioreq_server(curr->domain, &p);
>>>> +        /*
>>>> +         * Xen isn't emulating the instruction internally, so see if
>>>> +         * there's an ioreq server that can handle it. Rules:
>>>> +         *
>>>> +         * - PIO and "normal" MMIO run through hvm_select_ioreq_server()
>>>> +         * to choose the ioreq server by range. If no server is found,
>>>> +         * the access is ignored.
>>>> +         *
>>>> +         * - p2m_ioreq_server accesses are handled by the designated
>>>> +         * ioreq_server for the domain, but there are some corner
>>>> +         * cases:
>>>> +         *
>>>> +         *   - If the domain ioreq_server is NULL, assume there is a
>>>> +         *   race between the unbinding of ioreq server and guest fault
>>>> +         *   so re-try the instruction.
>>> And that retry won't come back here because of? (The answer
>>> should not include any behavior added by subsequent patches.)
>> You got me. :)
>> In this patch, retry will come back here. It should be after patch 4 or
>> patch 5 that the retry
>> will be ignored(p2m type changed back to p2m_ram_rw after the unbinding).
> In which case I think we shouldn't insist on you to change things, but
> you should spell out very clearly that this patch should not go in
> without the others going in at the same time.

So maybe it would be better to leave the retry part to a later patch,
say patch 4/5 or 5/5, and return unhandleable in this patch?
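
I.e. something like this in this patch (sketch only):

                if ( s == NULL )
                {
                    rc = X86EMUL_UNHANDLEABLE;
                    break;
                }

and only switch this to the X86EMUL_RETRY handling in patch 4/5, once the
outstanding entries get reset back to p2m_ram_rw after an unbinding.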

>>>> --- a/xen/arch/x86/mm/hap/nested_hap.c
>>>> +++ b/xen/arch/x86/mm/hap/nested_hap.c
>>>> @@ -172,7 +172,7 @@ nestedhap_walk_L0_p2m(struct p2m_domain *p2m, paddr_t L1_gpa, paddr_t *L0_gpa,
>>>>        if ( *p2mt == p2m_mmio_direct )
>>>>            goto direct_mmio_out;
>>>>        rc = NESTEDHVM_PAGEFAULT_MMIO;
>>>> -    if ( *p2mt == p2m_mmio_dm )
>>>> +    if ( *p2mt == p2m_mmio_dm || *p2mt == p2m_ioreq_server )
>>> Btw., how does this addition match up with the rc value being
>>> assigned right before the if()?
>> Well returning a NESTEDHVM_PAGEFAULT_MMIO in such case will trigger
>> handle_mmio() later in
>> hvm_hap_nested_page_fault(). Guess that is what we expected.
> That's probably what is expected, but it's no MMIO which we're
> doing in that case. And note that we've stopped abusing
> handle_mmio() for non-MMIO purposes a little while ago (commit
> 3dd00f7b56 ["x86/HVM: restrict permitted instructions during
> special purpose emulation"]).

OK. So what if we just remove this "*p2mt == p2m_ioreq_server"?

>>>> --- a/xen/arch/x86/mm/p2m-ept.c
>>>> +++ b/xen/arch/x86/mm/p2m-ept.c
>>>> @@ -131,6 +131,13 @@ static void ept_p2m_type_to_flags(struct p2m_domain *p2m, ept_entry_t *entry,
>>>>                entry->r = entry->w = entry->x = 1;
>>>>                entry->a = entry->d = !!cpu_has_vmx_ept_ad;
>>>>                break;
>>>> +        case p2m_ioreq_server:
>>>> +            entry->r = 1;
>>>> +            entry->w = !(p2m->ioreq.flags & XEN_DMOP_IOREQ_MEM_ACCESS_WRITE);
>>> Is this effectively open coded p2m_get_ioreq_server() actually
>>> okay? If so, why does the function need to be used elsewhere,
>>> instead of doing direct, lock-free accesses?
>> Maybe your comments is about whether it is necessary to use the lock in
>> p2m_get_ioreq_server()?
>> I still believe so, it does not only protect the value of ioreq server,
>> but also the flag together with it.
>>
>> Besides, it is used not only in the emulation process, but also the
>> hypercall to set the mem type.
>> So the lock can still provide some kind protection against the
>> p2m_set_ioreq_server() - even it does
>> not always do so.
> The question, fundamentally, is about consistency: The same
> access model should be followed universally, unless there is an
> explicit reason for an exception.

Sorry, I do not quite understand. Why is the consistency broken?
I think this lock at least protects the ioreq server and the flags. The
only exception is the one you mentioned - s could become stale, which we
agreed to let the device model check for. Without this lock, things would
become more complex - more race conditions...
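
Just to illustrate the kind of race I am worried about (hypothetical
lock-free reader, not code from the patch):

    s = p2m->ioreq.server;
    /* p2m_set_ioreq_server() could run here and update both fields */
    flags = p2m->ioreq.flags;

i.e. the reader could pair one server with the other server's flags,
whereas p2m_get_ioreq_server() returns the two as one consistent snapshot
taken under p2m->ioreq.lock.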

Thanks
Yu
> Jan
>


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v9 4/5] x86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries.
  2017-03-23  9:00       ` Jan Beulich
@ 2017-03-24  9:05         ` Yu Zhang
  2017-03-24 10:37           ` Jan Beulich
  0 siblings, 1 reply; 42+ messages in thread
From: Yu Zhang @ 2017-03-24  9:05 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, xen-devel,
	Paul Durrant, zhiyuan.lv, Jun Nakajima



On 3/23/2017 5:00 PM, Jan Beulich wrote:
>>>> On 23.03.17 at 04:23, <yu.c.zhang@linux.intel.com> wrote:
>> On 3/22/2017 10:29 PM, Jan Beulich wrote:
>>>>>> On 21.03.17 at 03:52, <yu.c.zhang@linux.intel.com> wrote:
>>>> --- a/xen/arch/x86/hvm/ioreq.c
>>>> +++ b/xen/arch/x86/hvm/ioreq.c
>>>> @@ -949,6 +949,14 @@ int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id,
>>>>    
>>>>        spin_unlock_recursive(&d->arch.hvm_domain.ioreq_server.lock);
>>>>    
>>>> +    if ( rc == 0 && flags == 0 )
>>>> +    {
>>>> +        struct p2m_domain *p2m = p2m_get_hostp2m(d);
>>>> +
>>>> +        if ( read_atomic(&p2m->ioreq.entry_count) )
>>>> +            p2m_change_entry_type_global(d, p2m_ioreq_server, p2m_ram_rw);
>>>> +    }
>>> If you do this after dropping the lock, don't you risk a race with
>>> another server mapping the type to itself?
>> I believe it's OK. Remaining p2m_ioreq_server entries still needs to be
>> cleaned anyway.
> Are you refusing a new server mapping the type before being
> done with the cleanup?

No. I meant that even if a new server is mapped, we can still sweep the
p2m table later asynchronously.
But this reminds me of another point - can a dm op be interrupted by
another one, and should it be?
Since we have patch 5/5, which sweeps the p2m table right after the unmap
happens, maybe we should refuse any mapping request while there are still
remaining p2m_ioreq_server entries.
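
E.g. something like this in hvm_map_mem_type_to_ioreq_server() (just a
thought, not tested):

    struct p2m_domain *p2m = p2m_get_hostp2m(d);

    /* Refuse a new mapping while outstanding entries are not reset yet. */
    if ( flags != 0 && read_atomic(&p2m->ioreq.entry_count) )
        return -EBUSY;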

>
>>>> --- a/xen/arch/x86/mm/p2m-ept.c
>>>> +++ b/xen/arch/x86/mm/p2m-ept.c
>>>> @@ -544,6 +544,12 @@ static int resolve_misconfig(struct p2m_domain *p2m, unsigned long gfn)
>>>>                        e.ipat = ipat;
>>>>                        if ( e.recalc && p2m_is_changeable(e.sa_p2mt) )
>>>>                        {
>>>> +                         if ( e.sa_p2mt == p2m_ioreq_server )
>>>> +                         {
>>>> +                             p2m->ioreq.entry_count--;
>>>> +                             ASSERT(p2m->ioreq.entry_count >= 0);
>>> If you did the ASSERT() first (using > 0), you wouldn't need the
>>> type be a signed one, doubling the valid value range (even if
>>> right now the full 64 bits can't be used anyway, but it would be
>>> one less thing to worry about once we get 6-level page tables).
>> Well, entry_count counts only for 4K pages, so even if the guest
>> physical address
>> width is extended up to 64 bit in the future, entry_count will not
>> exceed 2^52(
>> 2^64/2^12).
> Oh, true. Still I'd prefer if you used an unsigned type for a count
> when that's easily possible.

Got it. :)

Yu
> Jan
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> https://lists.xen.org/xen-devel


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v9 5/5] x86/ioreq server: Synchronously reset outstanding p2m_ioreq_server entries when an ioreq server unmaps.
  2017-03-23  9:02       ` Jan Beulich
@ 2017-03-24  9:05         ` Yu Zhang
  0 siblings, 0 replies; 42+ messages in thread
From: Yu Zhang @ 2017-03-24  9:05 UTC (permalink / raw)
  To: Jan Beulich
  Cc: George Dunlap, Andrew Cooper, Paul Durrant, zhiyuan.lv, xen-devel



On 3/23/2017 5:02 PM, Jan Beulich wrote:
>>>> On 23.03.17 at 04:23, <yu.c.zhang@linux.intel.com> wrote:
>> On 3/22/2017 10:39 PM, Jan Beulich wrote:
>>>>>> On 21.03.17 at 03:52, <yu.c.zhang@linux.intel.com> wrote:
>>>> --- a/xen/arch/x86/hvm/dm.c
>>>> +++ b/xen/arch/x86/hvm/dm.c
>>>> @@ -385,16 +385,51 @@ static int dm_op(domid_t domid,
>>>>    
>>>>        case XEN_DMOP_map_mem_type_to_ioreq_server:
>>>>        {
>>>> -        const struct xen_dm_op_map_mem_type_to_ioreq_server *data =
>>>> +        struct xen_dm_op_map_mem_type_to_ioreq_server *data =
>>>>                &op.u.map_mem_type_to_ioreq_server;
>>>> +        unsigned long first_gfn = data->opaque;
>>>> +        unsigned long last_gfn;
>>>> +
>>>> +        const_op = false;
>>>>    
>>>>            rc = -EOPNOTSUPP;
>>>>            /* Only support for HAP enabled hvm. */
>>>>            if ( !hap_enabled(d) )
>>>>                break;
>>>>    
>>>> -        rc = hvm_map_mem_type_to_ioreq_server(d, data->id,
>>>> -                                              data->type, data->flags);
>>>> +        if ( first_gfn == 0 )
>>>> +            rc = hvm_map_mem_type_to_ioreq_server(d, data->id,
>>>> +                                                  data->type, data->flags);
>>>> +        /*
>>>> +         * Iterate p2m table when an ioreq server unmaps from p2m_ioreq_server,
>>>> +         * and reset the remaining p2m_ioreq_server entries back to p2m_ram_rw.
>>>> +         */
>>>> +        if ( (first_gfn > 0) || (data->flags == 0 && rc == 0) )
>>> Instead of putting the rc check on the right side, please do
>>>
>>>           if ( rc == 0 && (first_gfn > 0) || data->flags == 0) )
>>>
>>> That'll require setting rc to zero in an else to the previous if(),
>>> but that's needed anyway afaics in order to not return
>>> -EOPNOTSUPP once no further continuation is necessary.
>>>
>>> I further wonder why the if() here needs to look at first_gfn at
>>> all - data->flags is supposed to remain at zero for continuations
>>> (unless we have a misbehaving caller, in which case it'll harm
>>> the guest only afaict). It seems to me, however, that this may
>>> have been discussed once already, a long time ago. I'm sorry
>>> for not remembering the outcome, if so.
>> We have not discussed this. Our previous discussion is about the if
>> condition before
>> calling hvm_map_mem_type_to_ioreq_server(). :-)
>>
>> Maybe above code should be changed to:
>> @@ -400,11 +400,14 @@ static int dm_op(domid_t domid,
>>            if ( first_gfn == 0 )
>>                rc = hvm_map_mem_type_to_ioreq_server(d, data->id,
>>                                                      data->type,
>> data->flags);
>> +       else
>> +           rc = 0;
>> +
>>            /*
>>             * Iterate p2m table when an ioreq server unmaps from p2m_ioreq_server,
>>             * and reset the remaining p2m_ioreq_server entries back to p2m_ram_rw.
>>             */
>> -        if ( (first_gfn > 0) || (data->flags == 0 && rc == 0) )
>> +        if ( data->flags == 0 && rc == 0 )
>>            {
>>                struct p2m_domain *p2m = p2m_get_hostp2m(d);
> Yes, that's what I was trying to hint at.

Great. Thanks.
>>>> --- a/xen/arch/x86/mm/p2m.c
>>>> +++ b/xen/arch/x86/mm/p2m.c
>>>> @@ -1038,6 +1038,35 @@ void p2m_change_type_range(struct domain *d,
>>>>        p2m_unlock(p2m);
>>>>    }
>>>>    
>>>> +/* Synchronously modify the p2m type for a range of gfns from ot to nt. */
>>>> +void p2m_finish_type_change(struct domain *d,
>>>> +                            unsigned long first_gfn, unsigned long last_gfn,
>>> I think we'd prefer new functions to properly use gfn_t.
>> Sorry? I do not get it.
>> Paul suggested we replace last_gfn with max_nr, which sounds reasonable
>> to me. Guess you mean
>> something else?
> Indeed - even with Paul's suggestion, first_gfn would remain as a
> parameter, and it should be of type gfn_t.

Oh. I see, you mean change the type of first_gfn to gfn_t.
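
So the prototype would presumably become something like the below (also
folding in Paul's max_nr suggestion; exact form to be confirmed in the next
version):

    void p2m_finish_type_change(struct domain *d,
                                gfn_t first_gfn, unsigned long max_nr,
                                p2m_type_t ot, p2m_type_t nt);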

Thanks
Yu
> Jan
>
>


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v9 2/5] x86/ioreq server: Add DMOP to map guest ram with p2m_ioreq_server to an ioreq server.
  2017-03-22 10:12     ` Yu Zhang
@ 2017-03-24  9:26       ` Tian, Kevin
  2017-03-24 12:34         ` Yu Zhang
  0 siblings, 1 reply; 42+ messages in thread
From: Tian, Kevin @ 2017-03-24  9:26 UTC (permalink / raw)
  To: Yu Zhang, xen-devel
  Cc: Nakajima, Jun, George Dunlap, Andrew Cooper, Tim Deegan,
	Paul Durrant, Lv, Zhiyuan, Jan Beulich

> From: Yu Zhang [mailto:yu.c.zhang@linux.intel.com]
> Sent: Wednesday, March 22, 2017 6:13 PM
> 
> On 3/22/2017 3:49 PM, Tian, Kevin wrote:
> >> From: Yu Zhang [mailto:yu.c.zhang@linux.intel.com]
> >> Sent: Tuesday, March 21, 2017 10:53 AM
> >>
> >> A new DMOP - XEN_DMOP_map_mem_type_to_ioreq_server, is added to
> let
> >> one ioreq server claim/disclaim its responsibility for the handling
> >> of guest pages with p2m type p2m_ioreq_server. Users of this DMOP can
> >> specify which kind of operation is supposed to be emulated in a
> >> parameter named flags. Currently, this DMOP only support the emulation
> of write operations.
> >> And it can be further extended to support the emulation of read ones
> >> if an ioreq server has such requirement in the future.
> > p2m_ioreq_server was already introduced before. Do you want to give
> > some background how current state is around that type which will be
> > helpful about purpose of this patch?
> 
> Sorry? I thought the background was described in the cover letter.
> Previously p2m_ioreq_server was only used for write-protection and was
> tracked in an ioreq server's rangeset; this patch binds the p2m type to an
> ioreq server directly.

The cover letter will not be in the git repo. Better to include that
background here so that this commit message is complete on its own.

> 
> >> For now, we only support one ioreq server for this p2m type, so once
> >> an ioreq server has claimed its ownership, subsequent calls of the
> >> XEN_DMOP_map_mem_type_to_ioreq_server will fail. Users can also
> >> disclaim the ownership of guest ram pages with p2m_ioreq_server, by
> >> triggering this new DMOP, with ioreq server id set to the current
> >> owner's and flags parameter set to 0.
> >>
> >> Note both XEN_DMOP_map_mem_type_to_ioreq_server and
> p2m_ioreq_server
> >> are only supported for HVMs with HAP enabled.
> >>
> >> Also note that only after one ioreq server claims its ownership of
> >> p2m_ioreq_server, will the p2m type change to p2m_ioreq_server be
> >> allowed.
> >>
> >> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> >> Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
> >> Acked-by: Tim Deegan <tim@xen.org>
> >> ---
> >> Cc: Jan Beulich <jbeulich@suse.com>
> >> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> >> Cc: Paul Durrant <paul.durrant@citrix.com>
> >> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> >> Cc: Jun Nakajima <jun.nakajima@intel.com>
> >> Cc: Kevin Tian <kevin.tian@intel.com>
> >> Cc: Tim Deegan <tim@xen.org>
> >>
> >> changes in v8:
> >>    - According to comments from Jan & Paul: comments changes in
> >> hvmemul_do_io().
> >>    - According to comments from Jan: remove the redundant code which
> >> would only
> >>      be useful for read emulations.
> >>    - According to comments from Jan: change interface which maps mem
> >> type to
> >>      ioreq server, removed uint16_t pad and added an uint64_t opaque.
> >>    - Address other comments from Jan, i.e. correct return values; remove
> stray
> >>      cast.
> >>
> >> changes in v7:
> >>    - Use new ioreq server interface -
> >> XEN_DMOP_map_mem_type_to_ioreq_server.
> >>    - According to comments from George: removed
> >> domain_pause/unpause() in
> >>      hvm_map_mem_type_to_ioreq_server(), because it's too expensive,
> >>      and we can avoid the:
> >>      a> deadlock issue existed in v6 patch, between p2m lock and ioreq
> server
> >>         lock by using these locks in the same order - solved in patch 4;
> >>      b> for race condition between vm exit and ioreq server unbinding, we
> can
> >>         just retry this instruction.
> >>    - According to comments from Jan and George: continue to clarify logic
> in
> >>      hvmemul_do_io().
> >>    - According to comments from Jan: clarify comment in
> >> p2m_set_ioreq_server().
> >>
> >> changes in v6:
> >>    - Clarify logic in hvmemul_do_io().
> >>    - Use recursive lock for ioreq server lock.
> >>    - Remove debug print when mapping ioreq server.
> >>    - Clarify code in ept_p2m_type_to_flags() for consistency.
> >>    - Remove definition of P2M_IOREQ_HANDLE_WRITE_ACCESS.
> >>    - Add comments for HVMMEM_ioreq_server to note only changes
> >>      to/from HVMMEM_ram_rw are permitted.
> >>    - Add domain_pause/unpause() in
> hvm_map_mem_type_to_ioreq_server()
> >>      to avoid the race condition when a vm exit happens on a write-
> >>      protected page, just to find the ioreq server has been unmapped
> >>      already.
> >>    - Introduce a seperate patch to delay the release of p2m
> >>      lock to avoid the race condition.
> >>    - Introduce a seperate patch to handle the read-modify-write
> >>      operations on a write protected page.
> >>
> >> changes in v5:
> >>    - Simplify logic in hvmemul_do_io().
> >>    - Use natual width types instead of fixed width types when possible.
> >>    - Do not grant executable permission for p2m_ioreq_server entries.
> >>    - Clarify comments and commit message.
> >>    - Introduce a seperate patch to recalculate the p2m types after
> >>      the ioreq server unmaps the p2m_ioreq_server.
> >>
> >> changes in v4:
> >>    - According to Paul's advice, add comments around the definition
> >>      of HVMMEM_iore_server in hvm_op.h.
> >>    - According to Wei Liu's comments, change the format of the commit
> >>      message.
> >>
> >> changes in v3:
> >>    - Only support write emulation in this patch;
> >>    - Remove the code to handle race condition in hvmemul_do_io(),
> >>    - No need to reset the p2m type after an ioreq server has disclaimed
> >>      its ownership of p2m_ioreq_server;
> >>    - Only allow p2m type change to p2m_ioreq_server after an ioreq
> >>      server has claimed its ownership of p2m_ioreq_server;
> >>    - Only allow p2m type change to p2m_ioreq_server from pages with
> type
> >>      p2m_ram_rw, and vice versa;
> >>    - HVMOP_map_mem_type_to_ioreq_server interface change - use
> uint16,
> >>      instead of enum to specify the memory type;
> >>    - Function prototype change to p2m_get_ioreq_server();
> >>    - Coding style changes;
> >>    - Commit message changes;
> >>    - Add Tim's Acked-by.
> >>
> >> changes in v2:
> >>    - Only support HAP enabled HVMs;
> >>    - Replace p2m_mem_type_changed() with
> p2m_change_entry_type_global()
> >>      to reset the p2m type, when an ioreq server tries to claim/disclaim
> >>      its ownership of p2m_ioreq_server;
> >>    - Comments changes.
> >> ---
> >>   xen/arch/x86/hvm/dm.c            | 37 ++++++++++++++++++--
> >>   xen/arch/x86/hvm/emulate.c       | 65
> >> ++++++++++++++++++++++++++++++++---
> >>   xen/arch/x86/hvm/ioreq.c         | 38 +++++++++++++++++++++
> >>   xen/arch/x86/mm/hap/nested_hap.c |  2 +-
> >>   xen/arch/x86/mm/p2m-ept.c        |  8 ++++-
> >>   xen/arch/x86/mm/p2m-pt.c         | 19 +++++++----
> >>   xen/arch/x86/mm/p2m.c            | 74
> >> ++++++++++++++++++++++++++++++++++++++++
> >>   xen/arch/x86/mm/shadow/multi.c   |  3 +-
> >>   xen/include/asm-x86/hvm/ioreq.h  |  2 ++
> >>   xen/include/asm-x86/p2m.h        | 26 ++++++++++++--
> >>   xen/include/public/hvm/dm_op.h   | 28 +++++++++++++++
> >>   xen/include/public/hvm/hvm_op.h  |  8 ++++-
> >>   12 files changed, 290 insertions(+), 20 deletions(-)
> >>
> >> diff --git a/xen/arch/x86/hvm/dm.c b/xen/arch/x86/hvm/dm.c index
> >> 333c884..3f9484d 100644
> >> --- a/xen/arch/x86/hvm/dm.c
> >> +++ b/xen/arch/x86/hvm/dm.c
> >> @@ -173,9 +173,14 @@ static int modified_memory(struct domain *d,
> >>
> >>   static bool allow_p2m_type_change(p2m_type_t old, p2m_type_t new)
> >> {
> >> +    if ( new == p2m_ioreq_server )
> >> +        return old == p2m_ram_rw;
> >> +
> >> +    if ( old == p2m_ioreq_server )
> >> +        return new == p2m_ram_rw;
> >> +
> >>       return p2m_is_ram(old) ||
> >> -           (p2m_is_hole(old) && new == p2m_mmio_dm) ||
> >> -           (old == p2m_ioreq_server && new == p2m_ram_rw);
> >> +           (p2m_is_hole(old) && new == p2m_mmio_dm);
> >>   }
> >>
> >>   static int set_mem_type(struct domain *d, @@ -202,6 +207,19 @@
> >> static int set_mem_type(struct domain *d,
> >>            unlikely(data->mem_type == HVMMEM_unused) )
> >>           return -EINVAL;
> >>
> >> +    if ( data->mem_type  == HVMMEM_ioreq_server )
> >> +    {
> >> +        unsigned int flags;
> >> +
> >> +        /* HVMMEM_ioreq_server is only supported for HAP enabled hvm.
> */
> >> +        if ( !hap_enabled(d) )
> >> +            return -EOPNOTSUPP;
> >> +
> >> +        /* Do not change to HVMMEM_ioreq_server if no ioreq server
> mapped.
> >> */
> >> +        if ( !p2m_get_ioreq_server(d, &flags) )
> >> +            return -EINVAL;
> >> +    }
> >> +
> >>       while ( iter < data->nr )
> >>       {
> >>           unsigned long pfn = data->first_pfn + iter; @@ -365,6
> >> +383,21 @@ static int dm_op(domid_t domid,
> >>           break;
> >>       }
> >>
> >> +    case XEN_DMOP_map_mem_type_to_ioreq_server:
> >> +    {
> >> +        const struct xen_dm_op_map_mem_type_to_ioreq_server *data =
> >> +            &op.u.map_mem_type_to_ioreq_server;
> >> +
> >> +        rc = -EOPNOTSUPP;
> >> +        /* Only support for HAP enabled hvm. */
> > Isn't it obvious from code?
> 
> Yes. Can be removed.
> >> +        if ( !hap_enabled(d) )
> >> +            break;
> >> +
> >> +        rc = hvm_map_mem_type_to_ioreq_server(d, data->id,
> >> +                                              data->type, data->flags);
> >> +        break;
> >> +    }
> >> +
> >>       case XEN_DMOP_set_ioreq_server_state:
> >>       {
> >>           const struct xen_dm_op_set_ioreq_server_state *data = diff
> >> --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
> index
> >> f36d7c9..37139e6 100644
> >> --- a/xen/arch/x86/hvm/emulate.c
> >> +++ b/xen/arch/x86/hvm/emulate.c
> >> @@ -99,6 +99,7 @@ static int hvmemul_do_io(
> >>       uint8_t dir, bool_t df, bool_t data_is_addr, uintptr_t data)  {
> >>       struct vcpu *curr = current;
> >> +    struct domain *currd = curr->domain;
> >>       struct hvm_vcpu_io *vio = &curr->arch.hvm_vcpu.hvm_io;
> >>       ioreq_t p = {
> >>           .type = is_mmio ? IOREQ_TYPE_COPY : IOREQ_TYPE_PIO, @@
> >> -140,7
> >> +141,7 @@ static int hvmemul_do_io(
> >>                (p.dir != dir) ||
> >>                (p.df != df) ||
> >>                (p.data_is_ptr != data_is_addr) )
> >> -            domain_crash(curr->domain);
> >> +            domain_crash(currd);
> >>
> >>           if ( data_is_addr )
> >>               return X86EMUL_UNHANDLEABLE; @@ -177,8 +178,64 @@
> >> static int hvmemul_do_io(
> >>           break;
> >>       case X86EMUL_UNHANDLEABLE:
> >>       {
> >> -        struct hvm_ioreq_server *s =
> >> -            hvm_select_ioreq_server(curr->domain, &p);
> >> +        /*
> >> +         * Xen isn't emulating the instruction internally, so see if
> >> +         * there's an ioreq server that can handle it. Rules:
> >> +         *
> >> +         * - PIO and "normal" MMIO run through
> >> + hvm_select_ioreq_server()
> > why highlights "normal" here? What does a "abnormal" MMIO mean here?
> > p2m_ioreq_server type?
> 
> Yes, it's just to differentiate the MMIO and the p2m_ioreq_server address,
> copied from George's previous comments.
> We can remove the "normal" here.

then you need to add such an explanation; otherwise it's difficult for
other code readers to know its meaning.

> 
> 
> >> +         * to choose the ioreq server by range. If no server is found,
> >> +         * the access is ignored.
> >> +         *
> >> +         * - p2m_ioreq_server accesses are handled by the designated
> >> +         * ioreq_server for the domain, but there are some corner
> >> +         * cases:
> > since only one case is listed, "there is a corner case"
> 
> Another corner case is in patch 3/5 - handling the read-modify-write
> situations.
> Maybe the correct thing is to use word "case" in this patch and change
> it to "cases" in next patch. :-)

then leave it be. Not a big matter.

> 
> >> +         *
> >> +         *   - If the domain ioreq_server is NULL, assume there is a
> >> +         *   race between the unbinding of ioreq server and guest fault
> >> +         *   so re-try the instruction.
> >> +         */
> >> +        struct hvm_ioreq_server *s = NULL;
> >> +        p2m_type_t p2mt = p2m_invalid;
> >> +
> >> +        if ( is_mmio )
> >> +        {
> >> +            unsigned long gmfn = paddr_to_pfn(addr);
> >> +
> >> +            get_gfn_query_unlocked(currd, gmfn, &p2mt);
> >> +
> >> +            if ( p2mt == p2m_ioreq_server )
> >> +            {
> >> +                unsigned int flags;
> >> +
> >> +                /*
> >> +                 * Value of s could be stale, when we lost a race
> > better describe it in higher level, e.g. just "no ioreq server is
> > found".
> >
> > what's the meaning of "lost a race"? shouldn't it mean
> > "likely we suffer from a race with..."?
> >
> >> +                 * with dm_op which unmaps p2m_ioreq_server from the
> >> +                 * ioreq server. Yet there's no cheap way to avoid
> > again, not talking about specific code, focus on the operation,
> > e.g. "race with an unmap operation on the ioreq server"
> >
> >> +                 * this, so device model need to do the check.
> >> +                 */
> > How is above comment related to below line?
> 
> Well, the 's' returned by p2m_get_ioreq_server() can be stale - if the
> ioreq server is unmapped after p2m_get_ioreq_server() returns. The current
> rangeset code has a similar issue if the PIO/MMIO range is removed from the
> rangeset of the ioreq server after hvm_select_ioreq_server() returns.
> 
> Since using a spinlock or domain_pause/unpause is too heavyweight, we
> suggest the device model side check whether the received ioreq is a valid
> one.
> 
> The above comments were added, according to Jan & Paul's suggestions on v7,
> to let developers know we do not guarantee the validity of the 's' returned
> by p2m_get_ioreq_server()/hvm_select_ioreq_server().
> 
> "Value of s could be stale, when we lost a race with..." is not about s
> being NULL, it's about s being invalid. For a NULL return, it is...

Then it makes sense.

> 
> >
> >> +                s = p2m_get_ioreq_server(currd, &flags);
> >> +
> >> +                /*
> >> +                 * If p2mt is ioreq_server but ioreq_server is NULL,
> > p2mt is definitely ioreq_server within this if condition.
> >
> >> +                 * we probably lost a race with unbinding of ioreq
> >> +                 * server, just retry the access.
> >> +                 */
> > looks redundant to earlier comment. Or earlier one should
> > be just removed?
> 
> ... described here, to just retry the access.
> 
> >> +                if ( s == NULL )
> >> +                {
> >> +                    rc = X86EMUL_RETRY;
> >> +                    vio->io_req.state = STATE_IOREQ_NONE;
> >> +                    break;
> >> +                }
> >> +            }
> >> +        }
> >> +
> >> +        /*
> >> +         * Value of s could be stale, when we lost a race with dm_op
> >> +         * which unmaps this PIO/MMIO address from the ioreq server.
> >> +         * The device model side need to do the check.
> >> +         */
> > another duplicated comment. below code is actually for 'normal'
> > MMIO case...
> 
> This is for another possible situation, where the 's' returned by
> hvm_select_ioreq_server() becomes stale later, when the PIO/MMIO range is
> removed from the rangeset.
> 
> The logic in hvmemul_do_io() has always been a bit tangled - there are many
> corner cases and race conditions:
>   - between the mapping/unmapping of PIO/MMIO ranges in the rangeset
>   - between the mapping/unmapping of an ioreq server from p2m_ioreq_server
> I tried to add as many comments as I could while this patchset evolved,
> only to find I just introduced more confusion...
> 
> Any suggestions?

maybe describe the whole story once, before the p2m_ioreq_server
branch?

	/* comment */
	if ( p2mt == p2m_ioreq_server )

Then you don't need to duplicate so much at each specific code line?
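
For example, something roughly like the below (the wording is only an
illustration, drawn from your existing comments, to be refined):

    /*
     * p2m_ioreq_server pages are handled by the ioreq server that has
     * claimed the type.  Note the server looked up below may already be
     * stale (e.g. unmapped right after the lookup) - the device model is
     * expected to validate the ioreqs it receives.  A NULL server most
     * likely means we raced with an unmap, so simply retry the access.
     */
    if ( p2mt == p2m_ioreq_server )
    {
        ...
    }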

> 
> >> +        if ( !s )
> >> +            s = hvm_select_ioreq_server(currd, &p);
> >>
> >>           /* If there is no suitable backing DM, just ignore accesses */
> >>           if ( !s )
> >> @@ -189,7 +246,7 @@ static int hvmemul_do_io(
> >>           else
> >>           {
> >>               rc = hvm_send_ioreq(s, &p, 0);
> >> -            if ( rc != X86EMUL_RETRY || curr->domain->is_shutting_down )
> >> +            if ( rc != X86EMUL_RETRY || currd->is_shutting_down )
> >>                   vio->io_req.state = STATE_IOREQ_NONE;
> >>               else if ( data_is_addr )
> >>                   rc = X86EMUL_OKAY;
> >> diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c index
> >> ad2edad..746799f 100644
> >> --- a/xen/arch/x86/hvm/ioreq.c
> >> +++ b/xen/arch/x86/hvm/ioreq.c
> >> @@ -753,6 +753,8 @@ int hvm_destroy_ioreq_server(struct domain *d,
> >> ioservid_t id)
> >>
> >>           domain_pause(d);
> >>
> >> +        p2m_destroy_ioreq_server(d, s);
> >> +
> >>           hvm_ioreq_server_disable(s, 0);
> >>
> >>           list_del(&s->list_entry);
> >> @@ -914,6 +916,42 @@ int
> >> hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id,
> >>       return rc;
> >>   }
> >>
> >> +int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t
> id,
> >> +                                     uint32_t type, uint32_t flags) {
> >> +    struct hvm_ioreq_server *s;
> >> +    int rc;
> >> +
> >> +    /* For now, only HVMMEM_ioreq_server is supported. */
> > obvious comment
> 
> IIRC, this comment (and the one below) is another change made according to
> some review comments, to remind readers that we can add new mem types in
> the future.
> So how about we add a line - "In the future, we can support other mem
> types"?
> 
> But that also sounds redundant to me. :)
> So I am also OK with removing this comment and the one below.

or add a general comment for the whole function, indicating that those
checks are extensible in the future.

> 
> >> +    if ( type != HVMMEM_ioreq_server )
> >> +        return -EINVAL;
> >> +
> >> +    /* For now, only write emulation is supported. */
> > ditto.
> >
> >> +    if ( flags & ~(XEN_DMOP_IOREQ_MEM_ACCESS_WRITE) )
> >> +        return -EINVAL;
> >> +
> >> +    spin_lock_recursive(&d->arch.hvm_domain.ioreq_server.lock);
> >> +
> >> +    rc = -ENOENT;
> >> +    list_for_each_entry ( s,
> >> +                          &d->arch.hvm_domain.ioreq_server.list,
> >> +                          list_entry )
> >> +    {
> >> +        if ( s == d->arch.hvm_domain.default_ioreq_server )
> >> +            continue;
> > any reason why we cannot let default server to claim this
> > new type?
> 
> Well, my understanding of the default ioreq server is that it exists only
> for legacy qemu and is not even created via the dm op hypercall. Recent
> device models (including qemu) no longer act as the default ioreq server.

ah, didn't realize it. Thanks for letting me know.

> 
> >> +
> >> +        if ( s->id == id )
> >> +        {
> >> +            rc = p2m_set_ioreq_server(d, flags, s);
> >> +            break;
> >> +        }
> >> +    }
> >> +
> >> +    spin_unlock_recursive(&d->arch.hvm_domain.ioreq_server.lock);
> >> +
> >> +    return rc;
> >> +}
> >> +
> >>   int hvm_set_ioreq_server_state(struct domain *d, ioservid_t id,
> >>                                  bool_t enabled)  { diff --git
> >> a/xen/arch/x86/mm/hap/nested_hap.c
> >> b/xen/arch/x86/mm/hap/nested_hap.c
> >> index 162afed..408ea7f 100644
> >> --- a/xen/arch/x86/mm/hap/nested_hap.c
> >> +++ b/xen/arch/x86/mm/hap/nested_hap.c
> >> @@ -172,7 +172,7 @@ nestedhap_walk_L0_p2m(struct p2m_domain
> *p2m,
> >> paddr_t L1_gpa, paddr_t *L0_gpa,
> >>       if ( *p2mt == p2m_mmio_direct )
> >>           goto direct_mmio_out;
> >>       rc = NESTEDHVM_PAGEFAULT_MMIO;
> >> -    if ( *p2mt == p2m_mmio_dm )
> >> +    if ( *p2mt == p2m_mmio_dm || *p2mt == p2m_ioreq_server )
> >>           goto out;
> >>
> >>       rc = NESTEDHVM_PAGEFAULT_L0_ERROR;
> >> diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
> >> index 568944f..cc1eb21 100644
> >> --- a/xen/arch/x86/mm/p2m-ept.c
> >> +++ b/xen/arch/x86/mm/p2m-ept.c
> >> @@ -131,6 +131,13 @@ static void ept_p2m_type_to_flags(struct
> >> p2m_domain *p2m, ept_entry_t *entry,
> >>               entry->r = entry->w = entry->x = 1;
> >>               entry->a = entry->d = !!cpu_has_vmx_ept_ad;
> >>               break;
> >> +        case p2m_ioreq_server:
> >> +            entry->r = 1;
> >> +            entry->w = !(p2m->ioreq.flags &
> >> XEN_DMOP_IOREQ_MEM_ACCESS_WRITE);
> >> +            entry->x = 0;
> >> +            entry->a = !!cpu_has_vmx_ept_ad;
> >> +            entry->d = entry->w && entry->a;
> >> +            break;
> >>           case p2m_mmio_direct:
> >>               entry->r = entry->x = 1;
> >>               entry->w = !rangeset_contains_singleton(mmio_ro_ranges,
> >> @@ -170,7 +177,6 @@ static void ept_p2m_type_to_flags(struct
> >> p2m_domain *p2m, ept_entry_t *entry,
> >>               entry->a = entry->d = !!cpu_has_vmx_ept_ad;
> >>               break;
> >>           case p2m_grant_map_ro:
> >> -        case p2m_ioreq_server:
> >>               entry->r = 1;
> >>               entry->w = entry->x = 0;
> >>               entry->a = !!cpu_has_vmx_ept_ad; diff --git
> >> a/xen/arch/x86/mm/p2m-pt.c b/xen/arch/x86/mm/p2m-pt.c index
> >> 07e2ccd..f6c45ec 100644
> >> --- a/xen/arch/x86/mm/p2m-pt.c
> >> +++ b/xen/arch/x86/mm/p2m-pt.c
> >> @@ -70,7 +70,9 @@ static const unsigned long pgt[] = {
> >>       PGT_l3_page_table
> >>   };
> >>
> >> -static unsigned long p2m_type_to_flags(p2m_type_t t, mfn_t mfn,
> >> +static unsigned long p2m_type_to_flags(const struct p2m_domain *p2m,
> >> +                                       p2m_type_t t,
> >> +                                       mfn_t mfn,
> >>                                          unsigned int level)  {
> >>       unsigned long flags;
> >> @@ -92,8 +94,12 @@ static unsigned long
> p2m_type_to_flags(p2m_type_t t,
> >> mfn_t mfn,
> >>       default:
> >>           return flags | _PAGE_NX_BIT;
> >>       case p2m_grant_map_ro:
> >> -    case p2m_ioreq_server:
> >>           return flags | P2M_BASE_FLAGS | _PAGE_NX_BIT;
> >> +    case p2m_ioreq_server:
> >> +        flags |= P2M_BASE_FLAGS | _PAGE_RW | _PAGE_NX_BIT;
> >> +        if ( p2m->ioreq.flags & XEN_DMOP_IOREQ_MEM_ACCESS_WRITE )
> >> +            return flags & ~_PAGE_RW;
> >> +        return flags;
> >>       case p2m_ram_ro:
> >>       case p2m_ram_logdirty:
> >>       case p2m_ram_shared:
> >> @@ -440,7 +446,8 @@ static int do_recalc(struct p2m_domain *p2m,
> >> unsigned long gfn)
> >>               p2m_type_t p2mt = p2m_is_logdirty_range(p2m, gfn & mask, gfn
> |
> >> ~mask)
> >>                                 ? p2m_ram_logdirty : p2m_ram_rw;
> >>               unsigned long mfn = l1e_get_pfn(e);
> >> -            unsigned long flags = p2m_type_to_flags(p2mt, _mfn(mfn), level);
> >> +            unsigned long flags = p2m_type_to_flags(p2m, p2mt,
> >> +                                                    _mfn(mfn), level);
> >>
> >>               if ( level )
> >>               {
> >> @@ -578,7 +585,7 @@ p2m_pt_set_entry(struct p2m_domain *p2m,
> >> unsigned long gfn, mfn_t mfn,
> >>           ASSERT(!mfn_valid(mfn) || p2mt != p2m_mmio_direct);
> >>           l3e_content = mfn_valid(mfn) || p2m_allows_invalid_mfn(p2mt)
> >>               ? l3e_from_pfn(mfn_x(mfn),
> >> -                           p2m_type_to_flags(p2mt, mfn, 2) | _PAGE_PSE)
> >> +                           p2m_type_to_flags(p2m, p2mt, mfn, 2) |
> >> + _PAGE_PSE)
> >>               : l3e_empty();
> >>           entry_content.l1 = l3e_content.l3;
> >>
> >> @@ -615,7 +622,7 @@ p2m_pt_set_entry(struct p2m_domain *p2m,
> >> unsigned long gfn, mfn_t mfn,
> >>
> >>           if ( mfn_valid(mfn) || p2m_allows_invalid_mfn(p2mt) )
> >>               entry_content = p2m_l1e_from_pfn(mfn_x(mfn),
> >> -                                             p2m_type_to_flags(p2mt, mfn, 0));
> >> +                                         p2m_type_to_flags(p2m, p2mt,
> >> + mfn, 0));
> >>           else
> >>               entry_content = l1e_empty();
> >>
> >> @@ -652,7 +659,7 @@ p2m_pt_set_entry(struct p2m_domain *p2m,
> >> unsigned long gfn, mfn_t mfn,
> >>           ASSERT(!mfn_valid(mfn) || p2mt != p2m_mmio_direct);
> >>           if ( mfn_valid(mfn) || p2m_allows_invalid_mfn(p2mt) )
> >>               l2e_content = l2e_from_pfn(mfn_x(mfn),
> >> -                                       p2m_type_to_flags(p2mt, mfn, 1) |
> >> +                                       p2m_type_to_flags(p2m, p2mt,
> >> + mfn, 1) |
> >>                                          _PAGE_PSE);
> >>           else
> >>               l2e_content = l2e_empty();
> >> diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c index
> >> a5651a3..dd4e477 100644
> >> --- a/xen/arch/x86/mm/p2m.c
> >> +++ b/xen/arch/x86/mm/p2m.c
> >> @@ -82,6 +82,8 @@ static int p2m_initialise(struct domain *d, struct
> >> p2m_domain *p2m)
> >>       else
> >>           p2m_pt_init(p2m);
> >>
> >> +    spin_lock_init(&p2m->ioreq.lock);
> >> +
> >>       return ret;
> >>   }
> >>
> >> @@ -286,6 +288,78 @@ void p2m_memory_type_changed(struct domain
> *d)
> >>       }
> >>   }
> >>
> >> +int p2m_set_ioreq_server(struct domain *d,
> >> +                         unsigned int flags,
> >> +                         struct hvm_ioreq_server *s) {
> >> +    struct p2m_domain *p2m = p2m_get_hostp2m(d);
> >> +    int rc;
> >> +
> >> +    /*
> >> +     * Use lock to prevent concurrent setting attempts
> >> +     * from multiple ioreq serers.
> > serers -> servers
> 
> Got it. Thanks.
> 
> >> +     */
> >> +    spin_lock(&p2m->ioreq.lock);
> >> +
> >> +    /* Unmap ioreq server from p2m type by passing flags with 0. */
> >> +    if ( flags == 0 )
> >> +    {
> >> +        rc = -EINVAL;
> >> +        if ( p2m->ioreq.server != s )
> >> +            goto out;
> >> +
> >> +        p2m->ioreq.server = NULL;
> >> +        p2m->ioreq.flags = 0;
> >> +    }
> >> +    else
> >> +    {
> >> +        rc = -EBUSY;
> >> +        if ( p2m->ioreq.server != NULL )
> >> +            goto out;
> >> +
> >> +        p2m->ioreq.server = s;
> >> +        p2m->ioreq.flags = flags;
> >> +    }
> >> +
> >> +    rc = 0;
> >> +
> >> + out:
> >> +    spin_unlock(&p2m->ioreq.lock);
> >> +
> >> +    return rc;
> >> +}
> >> +
> >> +struct hvm_ioreq_server *p2m_get_ioreq_server(struct domain *d,
> >> +                                              unsigned int *flags) {
> >> +    struct p2m_domain *p2m = p2m_get_hostp2m(d);
> >> +    struct hvm_ioreq_server *s;
> >> +
> >> +    spin_lock(&p2m->ioreq.lock);
> >> +
> >> +    s = p2m->ioreq.server;
> >> +    *flags = p2m->ioreq.flags;
> >> +
> >> +    spin_unlock(&p2m->ioreq.lock);
> >> +    return s;
> >> +}
> >> +
> >> +void p2m_destroy_ioreq_server(const struct domain *d,
> >> +                              const struct hvm_ioreq_server *s) {
> >> +    struct p2m_domain *p2m = p2m_get_hostp2m(d);
> >> +
> >> +    spin_lock(&p2m->ioreq.lock);
> >> +
> >> +    if ( p2m->ioreq.server == s )
> >> +    {
> >> +        p2m->ioreq.server = NULL;
> >> +        p2m->ioreq.flags = 0;
> >> +    }
> >> +
> >> +    spin_unlock(&p2m->ioreq.lock);
> >> +}
> >> +
> >>   void p2m_enable_hardware_log_dirty(struct domain *d)  {
> >>       struct p2m_domain *p2m = p2m_get_hostp2m(d); diff --git
> >> a/xen/arch/x86/mm/shadow/multi.c
> b/xen/arch/x86/mm/shadow/multi.c
> >> index 7ea9d81..521b639 100644
> >> --- a/xen/arch/x86/mm/shadow/multi.c
> >> +++ b/xen/arch/x86/mm/shadow/multi.c
> >> @@ -3269,8 +3269,7 @@ static int sh_page_fault(struct vcpu *v,
> >>       }
> >>
> >>       /* Need to hand off device-model MMIO to the device model */
> >> -    if ( p2mt == p2m_mmio_dm
> >> -         || (p2mt == p2m_ioreq_server && ft == ft_demand_write) )
> >> +    if ( p2mt == p2m_mmio_dm )
> >>       {
> >>           gpa = guest_walk_to_gpa(&gw);
> >>           goto mmio;
> >> diff --git a/xen/include/asm-x86/hvm/ioreq.h b/xen/include/asm-
> >> x86/hvm/ioreq.h index fbf2c74..b43667a 100644
> >> --- a/xen/include/asm-x86/hvm/ioreq.h
> >> +++ b/xen/include/asm-x86/hvm/ioreq.h
> >> @@ -37,6 +37,8 @@ int hvm_map_io_range_to_ioreq_server(struct
> domain
> >> *d, ioservid_t id,  int hvm_unmap_io_range_from_ioreq_server(struct
> >> domain *d, ioservid_t id,
> >>                                            uint32_t type, uint64_t start,
> >>                                            uint64_t end);
> >> +int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t
> id,
> >> +                                     uint32_t type, uint32_t flags);
> >>   int hvm_set_ioreq_server_state(struct domain *d, ioservid_t id,
> >>                                  bool_t enabled);
> >>
> >> diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
> index
> >> 470d29d..3786680 100644
> >> --- a/xen/include/asm-x86/p2m.h
> >> +++ b/xen/include/asm-x86/p2m.h
> >> @@ -89,7 +89,8 @@ typedef unsigned int p2m_query_t;
> >>                          | p2m_to_mask(p2m_ram_paging_out)      \
> >>                          | p2m_to_mask(p2m_ram_paged)           \
> >>                          | p2m_to_mask(p2m_ram_paging_in)       \
> >> -                       | p2m_to_mask(p2m_ram_shared))
> >> +                       | p2m_to_mask(p2m_ram_shared)          \
> >> +                       | p2m_to_mask(p2m_ioreq_server))
> >>
> >>   /* Types that represent a physmap hole that is ok to replace with a
> shared
> >>    * entry */
> >> @@ -111,8 +112,7 @@ typedef unsigned int p2m_query_t;
> >>   #define P2M_RO_TYPES (p2m_to_mask(p2m_ram_logdirty)     \
> >>                         | p2m_to_mask(p2m_ram_ro)         \
> >>                         | p2m_to_mask(p2m_grant_map_ro)   \
> >> -                      | p2m_to_mask(p2m_ram_shared)     \
> >> -                      | p2m_to_mask(p2m_ioreq_server))
> >> +                      | p2m_to_mask(p2m_ram_shared))
> >>
> >>   /* Write-discard types, which should discard the write operations */
> >>   #define P2M_DISCARD_WRITE_TYPES (p2m_to_mask(p2m_ram_ro)     \
> >> @@ -336,6 +336,20 @@ struct p2m_domain {
> >>           struct ept_data ept;
> >>           /* NPT-equivalent structure could be added here. */
> >>       };
> >> +
> >> +     struct {
> >> +         spinlock_t lock;
> >> +         /*
> >> +          * ioreq server who's responsible for the emulation of
> >> +          * gfns with specific p2m type(for now, p2m_ioreq_server).
> >> +          */
> >> +         struct hvm_ioreq_server *server;
> >> +         /*
> >> +          * flags specifies whether read, write or both operations
> >> +          * are to be emulated by an ioreq server.
> >> +          */
> >> +         unsigned int flags;
> >> +     } ioreq;
> >>   };
> >>
> >>   /* get host p2m table */
> >> @@ -827,6 +841,12 @@ static inline unsigned int
> >> p2m_get_iommu_flags(p2m_type_t p2mt, mfn_t mfn)
> >>       return flags;
> >>   }
> >>
> >> +int p2m_set_ioreq_server(struct domain *d, unsigned int flags,
> >> +                         struct hvm_ioreq_server *s); struct
> >> +hvm_ioreq_server *p2m_get_ioreq_server(struct domain *d,
> >> +                                              unsigned int *flags);
> >> +void p2m_destroy_ioreq_server(const struct domain *d, const struct
> >> +hvm_ioreq_server *s);
> >> +
> >>   #endif /* _XEN_ASM_X86_P2M_H */
> >>
> >>   /*
> >> diff --git a/xen/include/public/hvm/dm_op.h
> >> b/xen/include/public/hvm/dm_op.h index f54cece..2a36833 100644
> >> --- a/xen/include/public/hvm/dm_op.h
> >> +++ b/xen/include/public/hvm/dm_op.h
> >> @@ -318,6 +318,32 @@ struct xen_dm_op_inject_msi {
> >>       uint64_aligned_t addr;
> >>   };
> >>
> >> +/*
> >> + * XEN_DMOP_map_mem_type_to_ioreq_server : map or unmap the
> >> IOREQ Server <id>
> >> + *                                      to specific memroy type <type>
> > memroy->memory
> 
> Right. Thanks. :)
> 
> B.R.
> Yu
> >> + *                                      for specific accesses <flags>
> >> + *
> >> + * For now, flags only accept the value of
> >> +XEN_DMOP_IOREQ_MEM_ACCESS_WRITE,
> >> + * which means only write operations are to be forwarded to an ioreq
> >> server.
> >> + * Support for the emulation of read operations can be added when an
> >> +ioreq
> >> + * server has such requirement in future.
> >> + */
> >> +#define XEN_DMOP_map_mem_type_to_ioreq_server 15
> >> +
> >> +struct xen_dm_op_map_mem_type_to_ioreq_server {
> >> +    ioservid_t id;      /* IN - ioreq server id */
> >> +    uint16_t type;      /* IN - memory type */
> >> +    uint32_t flags;     /* IN - types of accesses to be forwarded to the
> >> +                           ioreq server. flags with 0 means to unmap the
> >> +                           ioreq server */
> >> +
> >> +#define XEN_DMOP_IOREQ_MEM_ACCESS_READ (1u << 0) #define
> >> +XEN_DMOP_IOREQ_MEM_ACCESS_WRITE (1u << 1)
> >> +
> >> +    uint64_t opaque;    /* IN/OUT - only used for hypercall continuation,
> >> +                           has to be set to zero by the caller */ };
> >> +
> >>   struct xen_dm_op {
> >>       uint32_t op;
> >>       uint32_t pad;
> >> @@ -336,6 +362,8 @@ struct xen_dm_op {
> >>           struct xen_dm_op_set_mem_type set_mem_type;
> >>           struct xen_dm_op_inject_event inject_event;
> >>           struct xen_dm_op_inject_msi inject_msi;
> >> +        struct xen_dm_op_map_mem_type_to_ioreq_server
> >> +                map_mem_type_to_ioreq_server;
> >>       } u;
> >>   };
> >>
> >> diff --git a/xen/include/public/hvm/hvm_op.h
> >> b/xen/include/public/hvm/hvm_op.h index bc00ef0..0bdafdf 100644
> >> --- a/xen/include/public/hvm/hvm_op.h
> >> +++ b/xen/include/public/hvm/hvm_op.h
> >> @@ -93,7 +93,13 @@ typedef enum {
> >>       HVMMEM_unused,             /* Placeholder; setting memory to this type
> >>                                     will fail for code after 4.7.0 */  #endif
> >> -    HVMMEM_ioreq_server
> >> +    HVMMEM_ioreq_server        /* Memory type claimed by an ioreq
> server;
> >> type
> >> +                                  changes to this value are only allowed after
> >> +                                  an ioreq server has claimed its ownership.
> >> +                                  Only pages with HVMMEM_ram_rw are allowed to
> >> +                                  change to this type; conversely, pages with
> >> +                                  this type are only allowed to be changed back
> >> +                                  to HVMMEM_ram_rw. */
> >>   } hvmmem_type_t;
> >>
> >>   /* Hint from PV drivers for pagetable destruction. */
> >> --
> >> 1.9.1
> >


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v9 4/5] x86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries.
  2017-03-22 10:12     ` Yu Zhang
@ 2017-03-24  9:37       ` Tian, Kevin
  2017-03-24 12:45         ` Yu Zhang
  0 siblings, 1 reply; 42+ messages in thread
From: Tian, Kevin @ 2017-03-24  9:37 UTC (permalink / raw)
  To: Yu Zhang, xen-devel
  Cc: Nakajima, Jun, George Dunlap, Andrew Cooper, Paul Durrant, Lv,
	Zhiyuan, Jan Beulich

> From: Yu Zhang [mailto:yu.c.zhang@linux.intel.com]
> Sent: Wednesday, March 22, 2017 6:12 PM
> 
> On 3/22/2017 4:10 PM, Tian, Kevin wrote:
> >> From: Yu Zhang [mailto:yu.c.zhang@linux.intel.com]
> >> Sent: Tuesday, March 21, 2017 10:53 AM
> >>
> >> After an ioreq server has unmapped, the remaining p2m_ioreq_server
> >> entries need to be reset back to p2m_ram_rw. This patch does this
> >> asynchronously with the current p2m_change_entry_type_global()
> interface.
> >>
> >> This patch also disallows live migration, when there's still any
> >> outstanding p2m_ioreq_server entry left. The core reason is our
> >> current implementation of p2m_change_entry_type_global() can not tell
> >> the state of p2m_ioreq_server entries(can not decide if an entry is
> >> to be emulated or to be resynced).
> > Don't quite get this point. change_global is triggered only upon
> > unmap. At that point there is no ioreq server to emulate the write
> > operations on those entries. All the things required is just to change
> > the type. What's the exact decision required here?
> 
> Well, one situation I can recall is when another ioreq server maps to this
> type and live migration happens later. The resolve_misconfig() code cannot
> differentiate whether a p2m_ioreq_server page is an obsolete one to be
> synced, or a new one that is only to be emulated.

so if you disallow another mapping before the obsolete pages are synced,
as you just replied in another mail, then that limitation would be gone?

> 
> I gave some explanation on this issue in discussion during Jun 20 - 22 last
> year.
> 
> http://lists.xenproject.org/archives/html/xen-devel/2016-06/msg02426.html
> on Jun 20
> and
> http://lists.xenproject.org/archives/html/xen-devel/2016-06/msg02575.html
> on Jun 21
> 
> > btw does it mean that live migration can be still supported as long as
> > device model proactively unmaps write-protected pages before starting
> > live migration?
> >
> 
> Yes.

I'm not sure whether there'll be a sequencing issue. I assume the toolstack
will first request entering logdirty mode, do the iterative memory copy,
then stop the VM including its virtual devices, and then build the final
image (including vGPU state). XenGT supports live migration today.
The vGPU device model is notified to do its state save/restore only in the
last step of that flow (as part of the Qemu save/restore). If your design
requires the vGPU device model to first unmap write-protected pages
(which means it can no longer serve requests from the guest, which is
equivalent to stopping the vGPU) before the toolstack enters logdirty mode,
I'm worried about the required changes to the whole live migration flow...

Thanks
Kevin
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v9 2/5] x86/ioreq server: Add DMOP to map guest ram with p2m_ioreq_server to an ioreq server.
  2017-03-24  9:05         ` Yu Zhang
@ 2017-03-24 10:19           ` Jan Beulich
  2017-03-24 12:35             ` Yu Zhang
  0 siblings, 1 reply; 42+ messages in thread
From: Jan Beulich @ 2017-03-24 10:19 UTC (permalink / raw)
  To: Yu Zhang
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, Tim Deegan, xen-devel,
	Paul Durrant, zhiyuan.lv, Jun Nakajima

>>> On 24.03.17 at 10:05, <yu.c.zhang@linux.intel.com> wrote:
> On 3/23/2017 4:57 PM, Jan Beulich wrote:
>>>>> On 23.03.17 at 04:23, <yu.c.zhang@linux.intel.com> wrote:
>>> On 3/22/2017 10:21 PM, Jan Beulich wrote:
>>>>>>> On 21.03.17 at 03:52, <yu.c.zhang@linux.intel.com> wrote:
>>>>> @@ -177,8 +178,64 @@ static int hvmemul_do_io(
>>>>>            break;
>>>>>        case X86EMUL_UNHANDLEABLE:
>>>>>        {
>>>>> -        struct hvm_ioreq_server *s =
>>>>> -            hvm_select_ioreq_server(curr->domain, &p);
>>>>> +        /*
>>>>> +         * Xen isn't emulating the instruction internally, so see if
>>>>> +         * there's an ioreq server that can handle it. Rules:
>>>>> +         *
>>>>> +         * - PIO and "normal" MMIO run through hvm_select_ioreq_server()
>>>>> +         * to choose the ioreq server by range. If no server is found,
>>>>> +         * the access is ignored.
>>>>> +         *
>>>>> +         * - p2m_ioreq_server accesses are handled by the designated
>>>>> +         * ioreq_server for the domain, but there are some corner
>>>>> +         * cases:
>>>>> +         *
>>>>> +         *   - If the domain ioreq_server is NULL, assume there is a
>>>>> +         *   race between the unbinding of ioreq server and guest fault
>>>>> +         *   so re-try the instruction.
>>>> And that retry won't come back here because of? (The answer
>>>> should not include any behavior added by subsequent patches.)
>>> You got me. :)
>>> In this patch, retry will come back here. It should be after patch 4 or
>>> patch 5 that the retry
>>> will be ignored(p2m type changed back to p2m_ram_rw after the unbinding).
>> In which case I think we shouldn't insist on you to change things, but
>> you should spell out very clearly that this patch should not go in
>> without the others going in at the same time.
> 
> So maybe it would be better if we leave the retry part to a later patch,
> say patch 4/5 or patch 5/5,
> and return unhandleable in this patch?

I don't follow. I've specifically suggested that you don't change
the code, but simply state clearly the requirement that patches
2...5 of this series should all go in at the same time. I don't mind
you making changes, but the risk then is that further round trips
may be required because of there being new issues with the
changes you may do.

>>>>> --- a/xen/arch/x86/mm/hap/nested_hap.c
>>>>> +++ b/xen/arch/x86/mm/hap/nested_hap.c
>>>>> @@ -172,7 +172,7 @@ nestedhap_walk_L0_p2m(struct p2m_domain *p2m, paddr_t L1_gpa, paddr_t *L0_gpa,
>>>>>        if ( *p2mt == p2m_mmio_direct )
>>>>>            goto direct_mmio_out;
>>>>>        rc = NESTEDHVM_PAGEFAULT_MMIO;
>>>>> -    if ( *p2mt == p2m_mmio_dm )
>>>>> +    if ( *p2mt == p2m_mmio_dm || *p2mt == p2m_ioreq_server )
>>>> Btw., how does this addition match up with the rc value being
>>>> assigned right before the if()?
>>> Well returning a NESTEDHVM_PAGEFAULT_MMIO in such case will trigger
>>> handle_mmio() later in
>>> hvm_hap_nested_page_fault(). Guess that is what we expected.
>> That's probably what is expected, but it's no MMIO which we're
>> doing in that case. And note that we've stopped abusing
>> handle_mmio() for non-MMIO purposes a little while ago (commit
>> 3dd00f7b56 ["x86/HVM: restrict permitted instructions during
>> special purpose emulation"]).
> 
> OK. So what about we just remove this "*p2mt == p2m_ioreq_server"?

Well, you must have had a reason to add it. To be honest, I don't
care too much about the nested code (as it's far from production
ready anyway), so leaving the code above untouched would be
fine with me, but taking care of adjustments to nested code where
they're actually needed would be even better. So the preferred
option is for you to explain why you've done the change above,
and why you think it's correct/needed. The next best option might
be to drop the change.

>>>>> --- a/xen/arch/x86/mm/p2m-ept.c
>>>>> +++ b/xen/arch/x86/mm/p2m-ept.c
>>>>> @@ -131,6 +131,13 @@ static void ept_p2m_type_to_flags(struct p2m_domain *p2m, ept_entry_t *entry,
>>>>>                entry->r = entry->w = entry->x = 1;
>>>>>                entry->a = entry->d = !!cpu_has_vmx_ept_ad;
>>>>>                break;
>>>>> +        case p2m_ioreq_server:
>>>>> +            entry->r = 1;
>>>>> +            entry->w = !(p2m->ioreq.flags & XEN_DMOP_IOREQ_MEM_ACCESS_WRITE);
>>>> Is this effectively open coded p2m_get_ioreq_server() actually
>>>> okay? If so, why does the function need to be used elsewhere,
>>>> instead of doing direct, lock-free accesses?
>>> Maybe your comments is about whether it is necessary to use the lock in
>>> p2m_get_ioreq_server()?
>>> I still believe so, it does not only protect the value of ioreq server,
>>> but also the flag together with it.
>>>
>>> Besides, it is used not only in the emulation process, but also the
>>> hypercall to set the mem type.
>>> So the lock can still provide some kind protection against the
>>> p2m_set_ioreq_server() - even it does
>>> not always do so.
>> The question, fundamentally, is about consistency: The same
>> access model should be followed universally, unless there is an
>> explicit reason for an exception.
> 
> Sorry, I do not quite understand. Why is the consistency broken?

Because you don't call p2m_get_ioreq_server() here (discarding
the return value, but using the flags).

> I think this lock at least protects the ioreq server and the flag. The
> only exception is the one you mentioned - s could become stale, for which
> we agreed to let the device model do the check. Without this lock, things
> would become more complex - more race conditions...

Sure, all understood. I wasn't really suggesting to drop the locked
accesses, but instead I was using this to illustrate the non-locked
access (and hence the inconsistency with other code) here. As
said - if there's a good reason not to call the function here, I'm all
ears.
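
Purely for illustration (untested, and the returned server pointer would
simply be discarded here):

    case p2m_ioreq_server:
    {
        unsigned int flags;

        /* Go through the locked accessor, for consistency with other users. */
        p2m_get_ioreq_server(p2m->domain, &flags);

        entry->r = 1;
        entry->w = !(flags & XEN_DMOP_IOREQ_MEM_ACCESS_WRITE);
        entry->x = 0;
        entry->a = !!cpu_has_vmx_ept_ad;
        entry->d = entry->w && entry->a;
        break;
    }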

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v9 4/5] x86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries.
  2017-03-24  9:05         ` Yu Zhang
@ 2017-03-24 10:37           ` Jan Beulich
  2017-03-24 12:36             ` Yu Zhang
  0 siblings, 1 reply; 42+ messages in thread
From: Jan Beulich @ 2017-03-24 10:37 UTC (permalink / raw)
  To: Yu Zhang
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, xen-devel,
	Paul Durrant, zhiyuan.lv, Jun Nakajima

>>> On 24.03.17 at 10:05, <yu.c.zhang@linux.intel.com> wrote:

> 
> On 3/23/2017 5:00 PM, Jan Beulich wrote:
>>>>> On 23.03.17 at 04:23, <yu.c.zhang@linux.intel.com> wrote:
>>> On 3/22/2017 10:29 PM, Jan Beulich wrote:
>>>>>>> On 21.03.17 at 03:52, <yu.c.zhang@linux.intel.com> wrote:
>>>>> --- a/xen/arch/x86/hvm/ioreq.c
>>>>> +++ b/xen/arch/x86/hvm/ioreq.c
>>>>> @@ -949,6 +949,14 @@ int hvm_map_mem_type_to_ioreq_server(struct domain *d, 
> ioservid_t id,
>>>>>    
>>>>>        spin_unlock_recursive(&d->arch.hvm_domain.ioreq_server.lock);
>>>>>    
>>>>> +    if ( rc == 0 && flags == 0 )
>>>>> +    {
>>>>> +        struct p2m_domain *p2m = p2m_get_hostp2m(d);
>>>>> +
>>>>> +        if ( read_atomic(&p2m->ioreq.entry_count) )
>>>>> +            p2m_change_entry_type_global(d, p2m_ioreq_server, p2m_ram_rw);
>>>>> +    }
>>>> If you do this after dropping the lock, don't you risk a race with
>>>> another server mapping the type to itself?
>>> I believe it's OK. Remaining p2m_ioreq_server entries still needs to be
>>> cleaned anyway.
>> Are you refusing a new server mapping the type before being
>> done with the cleanup?
> 
> No. I meant that even if a new server is mapped, we can still sweep the
> p2m table later asynchronously.
> But this reminds me of another point - will a dm op be interrupted by
> another one, or should it be?

Interrupted? Two of them may run in parallel on different CPUs,
against the same target domain.

> Since we have patch 5/5, which sweeps the p2m table right after the unmap
> happens, maybe we should refuse any new mapping request while there are
> remaining p2m_ioreq_server entries.

That's what I've tried to hint at with my question.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v9 2/5] x86/ioreq server: Add DMOP to map guest ram with p2m_ioreq_server to an ioreq server.
  2017-03-24  9:26       ` Tian, Kevin
@ 2017-03-24 12:34         ` Yu Zhang
  0 siblings, 0 replies; 42+ messages in thread
From: Yu Zhang @ 2017-03-24 12:34 UTC (permalink / raw)
  To: Tian, Kevin, xen-devel
  Cc: Nakajima, Jun, George Dunlap, Andrew Cooper, Tim Deegan,
	Paul Durrant, Lv, Zhiyuan, Jan Beulich



On 3/24/2017 5:26 PM, Tian, Kevin wrote:
>> From: Yu Zhang [mailto:yu.c.zhang@linux.intel.com]
>> Sent: Wednesday, March 22, 2017 6:13 PM
>>
>> On 3/22/2017 3:49 PM, Tian, Kevin wrote:
>>>> From: Yu Zhang [mailto:yu.c.zhang@linux.intel.com]
>>>> Sent: Tuesday, March 21, 2017 10:53 AM
>>>>
>>>> A new DMOP - XEN_DMOP_map_mem_type_to_ioreq_server, is added to
>> let
>>>> one ioreq server claim/disclaim its responsibility for the handling
>>>> of guest pages with p2m type p2m_ioreq_server. Users of this DMOP can
>>>> specify which kind of operation is supposed to be emulated in a
>>>> parameter named flags. Currently, this DMOP only support the emulation
>> of write operations.
>>>> And it can be further extended to support the emulation of read ones
>>>> if an ioreq server has such requirement in the future.
>>> p2m_ioreq_server was already introduced before. Do you want to give
>>> some background how current state is around that type which will be
>>> helpful about purpose of this patch?
>> Sorry? I thought the background is described in the cover letter.
>> Previously p2m_ioreq_server is only for write-protection, and is tracked in an
>> ioreq server's rangeset, this patch is to bind the p2m type with an ioreq
>> server directly.
> cover letter will not be in git repo. Better you can include it to make
> this commit along complete.

OK. Thanks.

>>>> For now, we only support one ioreq server for this p2m type, so once
>>>> an ioreq server has claimed its ownership, subsequent calls of the
>>>> XEN_DMOP_map_mem_type_to_ioreq_server will fail. Users can also
>>>> disclaim the ownership of guest ram pages with p2m_ioreq_server, by
>>>> triggering this new DMOP, with ioreq server id set to the current
>>>> owner's and flags parameter set to 0.
>>>>
>>>> Note both XEN_DMOP_map_mem_type_to_ioreq_server and
>> p2m_ioreq_server
>>>> are only supported for HVMs with HAP enabled.
>>>>
>>>> Also note that only after one ioreq server claims its ownership of
>>>> p2m_ioreq_server, will the p2m type change to p2m_ioreq_server be
>>>> allowed.
>>>>
>>>> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
>>>> Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
>>>> Acked-by: Tim Deegan <tim@xen.org>
>>>> ---
>>>> Cc: Jan Beulich <jbeulich@suse.com>
>>>> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
>>>> Cc: Paul Durrant <paul.durrant@citrix.com>
>>>> Cc: George Dunlap <george.dunlap@eu.citrix.com>
>>>> Cc: Jun Nakajima <jun.nakajima@intel.com>
>>>> Cc: Kevin Tian <kevin.tian@intel.com>
>>>> Cc: Tim Deegan <tim@xen.org>
>>>>
>>>> changes in v8:
>>>>     - According to comments from Jan & Paul: comments changes in
>>>> hvmemul_do_io().
>>>>     - According to comments from Jan: remove the redundant code which
>>>> would only
>>>>       be useful for read emulations.
>>>>     - According to comments from Jan: change interface which maps mem
>>>> type to
>>>>       ioreq server, removed uint16_t pad and added an uint64_t opaque.
>>>>     - Address other comments from Jan, i.e. correct return values; remove
>> stray
>>>>       cast.
>>>>
>>>> changes in v7:
>>>>     - Use new ioreq server interface -
>>>> XEN_DMOP_map_mem_type_to_ioreq_server.
>>>>     - According to comments from George: removed
>>>> domain_pause/unpause() in
>>>>       hvm_map_mem_type_to_ioreq_server(), because it's too expensive,
>>>>       and we can avoid the:
>>>>       a> deadlock issue existed in v6 patch, between p2m lock and ioreq
>> server
>>>>          lock by using these locks in the same order - solved in patch 4;
>>>>       b> for race condition between vm exit and ioreq server unbinding, we
>> can
>>>>          just retry this instruction.
>>>>     - According to comments from Jan and George: continue to clarify logic
>> in
>>>>       hvmemul_do_io().
>>>>     - According to comments from Jan: clarify comment in
>>>> p2m_set_ioreq_server().
>>>>
>>>> changes in v6:
>>>>     - Clarify logic in hvmemul_do_io().
>>>>     - Use recursive lock for ioreq server lock.
>>>>     - Remove debug print when mapping ioreq server.
>>>>     - Clarify code in ept_p2m_type_to_flags() for consistency.
>>>>     - Remove definition of P2M_IOREQ_HANDLE_WRITE_ACCESS.
>>>>     - Add comments for HVMMEM_ioreq_server to note only changes
>>>>       to/from HVMMEM_ram_rw are permitted.
>>>>     - Add domain_pause/unpause() in
>> hvm_map_mem_type_to_ioreq_server()
>>>>       to avoid the race condition when a vm exit happens on a write-
>>>>       protected page, just to find the ioreq server has been unmapped
>>>>       already.
>>>>     - Introduce a seperate patch to delay the release of p2m
>>>>       lock to avoid the race condition.
>>>>     - Introduce a seperate patch to handle the read-modify-write
>>>>       operations on a write protected page.
>>>>
>>>> changes in v5:
>>>>     - Simplify logic in hvmemul_do_io().
>>>>     - Use natual width types instead of fixed width types when possible.
>>>>     - Do not grant executable permission for p2m_ioreq_server entries.
>>>>     - Clarify comments and commit message.
>>>>     - Introduce a seperate patch to recalculate the p2m types after
>>>>       the ioreq server unmaps the p2m_ioreq_server.
>>>>
>>>> changes in v4:
>>>>     - According to Paul's advice, add comments around the definition
>>>>       of HVMMEM_iore_server in hvm_op.h.
>>>>     - According to Wei Liu's comments, change the format of the commit
>>>>       message.
>>>>
>>>> changes in v3:
>>>>     - Only support write emulation in this patch;
>>>>     - Remove the code to handle race condition in hvmemul_do_io(),
>>>>     - No need to reset the p2m type after an ioreq server has disclaimed
>>>>       its ownership of p2m_ioreq_server;
>>>>     - Only allow p2m type change to p2m_ioreq_server after an ioreq
>>>>       server has claimed its ownership of p2m_ioreq_server;
>>>>     - Only allow p2m type change to p2m_ioreq_server from pages with
>> type
>>>>       p2m_ram_rw, and vice versa;
>>>>     - HVMOP_map_mem_type_to_ioreq_server interface change - use
>> uint16,
>>>>       instead of enum to specify the memory type;
>>>>     - Function prototype change to p2m_get_ioreq_server();
>>>>     - Coding style changes;
>>>>     - Commit message changes;
>>>>     - Add Tim's Acked-by.
>>>>
>>>> changes in v2:
>>>>     - Only support HAP enabled HVMs;
>>>>     - Replace p2m_mem_type_changed() with p2m_change_entry_type_global()
>>>>       to reset the p2m type, when an ioreq server tries to claim/disclaim
>>>>       its ownership of p2m_ioreq_server;
>>>>     - Comments changes.
>>>> ---
>>>>    xen/arch/x86/hvm/dm.c            | 37 ++++++++++++++++++--
>>>>    xen/arch/x86/hvm/emulate.c       | 65 ++++++++++++++++++++++++++++++++---
>>>>    xen/arch/x86/hvm/ioreq.c         | 38 +++++++++++++++++++++
>>>>    xen/arch/x86/mm/hap/nested_hap.c |  2 +-
>>>>    xen/arch/x86/mm/p2m-ept.c        |  8 ++++-
>>>>    xen/arch/x86/mm/p2m-pt.c         | 19 +++++++----
>>>>    xen/arch/x86/mm/p2m.c            | 74 ++++++++++++++++++++++++++++++++++++++++
>>>>    xen/arch/x86/mm/shadow/multi.c   |  3 +-
>>>>    xen/include/asm-x86/hvm/ioreq.h  |  2 ++
>>>>    xen/include/asm-x86/p2m.h        | 26 ++++++++++++--
>>>>    xen/include/public/hvm/dm_op.h   | 28 +++++++++++++++
>>>>    xen/include/public/hvm/hvm_op.h  |  8 ++++-
>>>>    12 files changed, 290 insertions(+), 20 deletions(-)
>>>>
>>>> diff --git a/xen/arch/x86/hvm/dm.c b/xen/arch/x86/hvm/dm.c
>>>> index 333c884..3f9484d 100644
>>>> --- a/xen/arch/x86/hvm/dm.c
>>>> +++ b/xen/arch/x86/hvm/dm.c
>>>> @@ -173,9 +173,14 @@ static int modified_memory(struct domain *d,
>>>>
>>>>    static bool allow_p2m_type_change(p2m_type_t old, p2m_type_t new)
>>>> {
>>>> +    if ( new == p2m_ioreq_server )
>>>> +        return old == p2m_ram_rw;
>>>> +
>>>> +    if ( old == p2m_ioreq_server )
>>>> +        return new == p2m_ram_rw;
>>>> +
>>>>        return p2m_is_ram(old) ||
>>>> -           (p2m_is_hole(old) && new == p2m_mmio_dm) ||
>>>> -           (old == p2m_ioreq_server && new == p2m_ram_rw);
>>>> +           (p2m_is_hole(old) && new == p2m_mmio_dm);
>>>>    }
>>>>
>>>>    static int set_mem_type(struct domain *d,
>>>> @@ -202,6 +207,19 @@ static int set_mem_type(struct domain *d,
>>>>             unlikely(data->mem_type == HVMMEM_unused) )
>>>>            return -EINVAL;
>>>>
>>>> +    if ( data->mem_type  == HVMMEM_ioreq_server )
>>>> +    {
>>>> +        unsigned int flags;
>>>> +
>>>> +        /* HVMMEM_ioreq_server is only supported for HAP enabled hvm. */
>>>> +        if ( !hap_enabled(d) )
>>>> +            return -EOPNOTSUPP;
>>>> +
>>>> +        /* Do not change to HVMMEM_ioreq_server if no ioreq server mapped. */
>>>> +        if ( !p2m_get_ioreq_server(d, &flags) )
>>>> +            return -EINVAL;
>>>> +    }
>>>> +
>>>>        while ( iter < data->nr )
>>>>        {
>>>>            unsigned long pfn = data->first_pfn + iter;
>>>> @@ -365,6 +383,21 @@ static int dm_op(domid_t domid,
>>>>            break;
>>>>        }
>>>>
>>>> +    case XEN_DMOP_map_mem_type_to_ioreq_server:
>>>> +    {
>>>> +        const struct xen_dm_op_map_mem_type_to_ioreq_server *data =
>>>> +            &op.u.map_mem_type_to_ioreq_server;
>>>> +
>>>> +        rc = -EOPNOTSUPP;
>>>> +        /* Only support for HAP enabled hvm. */
>>> Isn't it obvious from code?
>> Yes. Can be removed.
>>>> +        if ( !hap_enabled(d) )
>>>> +            break;
>>>> +
>>>> +        rc = hvm_map_mem_type_to_ioreq_server(d, data->id,
>>>> +                                              data->type, data->flags);
>>>> +        break;
>>>> +    }
>>>> +
>>>>        case XEN_DMOP_set_ioreq_server_state:
>>>>        {
>>>>            const struct xen_dm_op_set_ioreq_server_state *data =
>>>> diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
>>>> index f36d7c9..37139e6 100644
>>>> --- a/xen/arch/x86/hvm/emulate.c
>>>> +++ b/xen/arch/x86/hvm/emulate.c
>>>> @@ -99,6 +99,7 @@ static int hvmemul_do_io(
>>>>        uint8_t dir, bool_t df, bool_t data_is_addr, uintptr_t data)  {
>>>>        struct vcpu *curr = current;
>>>> +    struct domain *currd = curr->domain;
>>>>        struct hvm_vcpu_io *vio = &curr->arch.hvm_vcpu.hvm_io;
>>>>        ioreq_t p = {
>>>>            .type = is_mmio ? IOREQ_TYPE_COPY : IOREQ_TYPE_PIO, @@
>>>> -140,7
>>>> +141,7 @@ static int hvmemul_do_io(
>>>>                 (p.dir != dir) ||
>>>>                 (p.df != df) ||
>>>>                 (p.data_is_ptr != data_is_addr) )
>>>> -            domain_crash(curr->domain);
>>>> +            domain_crash(currd);
>>>>
>>>>            if ( data_is_addr )
>>>>                return X86EMUL_UNHANDLEABLE;
>>>> @@ -177,8 +178,64 @@ static int hvmemul_do_io(
>>>>            break;
>>>>        case X86EMUL_UNHANDLEABLE:
>>>>        {
>>>> -        struct hvm_ioreq_server *s =
>>>> -            hvm_select_ioreq_server(curr->domain, &p);
>>>> +        /*
>>>> +         * Xen isn't emulating the instruction internally, so see if
>>>> +         * there's an ioreq server that can handle it. Rules:
>>>> +         *
>>>> +         * - PIO and "normal" MMIO run through hvm_select_ioreq_server()
>>> why highlights "normal" here? What does a "abnormal" MMIO mean here?
>>> p2m_ioreq_server type?
>> Yes, it's just to differentiate MMIO from p2m_ioreq_server addresses,
>> copied from George's previous comments.
>> We can remove the "normal" here.
> then you need to add such an explanation, otherwise it's difficult for
> other code readers to know its meaning.

Maybe I should just remove the "normal"; p2m_ioreq_server pages are ram
pages after all.

>>
>>>> +         * to choose the ioreq server by range. If no server is found,
>>>> +         * the access is ignored.
>>>> +         *
>>>> +         * - p2m_ioreq_server accesses are handled by the designated
>>>> +         * ioreq_server for the domain, but there are some corner
>>>> +         * cases:
>>> since only one case is listed, "there is a corner case"
>> Another corner case is in patch 3/5 - handling the read-modify-write
>> situations.
>> Maybe the correct thing is to use the word "case" in this patch and change
>> it to "cases" in the next patch. :-)
> then leave it be. Not a big matter.

OK. Thanks!

>>>> +         *
>>>> +         *   - If the domain ioreq_server is NULL, assume there is a
>>>> +         *   race between the unbinding of ioreq server and guest fault
>>>> +         *   so re-try the instruction.
>>>> +         */
>>>> +        struct hvm_ioreq_server *s = NULL;
>>>> +        p2m_type_t p2mt = p2m_invalid;
>>>> +
>>>> +        if ( is_mmio )
>>>> +        {
>>>> +            unsigned long gmfn = paddr_to_pfn(addr);
>>>> +
>>>> +            get_gfn_query_unlocked(currd, gmfn, &p2mt);
>>>> +
>>>> +            if ( p2mt == p2m_ioreq_server )
>>>> +            {
>>>> +                unsigned int flags;
>>>> +
>>>> +                /*
>>>> +                 * Value of s could be stale, when we lost a race
>>> better describe it in higher level, e.g. just "no ioreq server is
>>> found".
>>>
>>> what's the meaning of "lost a race"? shouldn't it mean
>>> "likely we suffer from a race with..."?
>>>
>>>> +                 * with dm_op which unmaps p2m_ioreq_server from the
>>>> +                 * ioreq server. Yet there's no cheap way to avoid
>>> again, not talking about specific code, focus on the operation,
>>> e.g. "race with an unmap operation on the ioreq server"
>>>
>>>> +                 * this, so device model need to do the check.
>>>> +                 */
>>> How is above comment related to below line?
>> Well, the 's' returned by p2m_get_ioreq_server() can be stale - if the
>> ioreq server is unmapped
>> after p2m_get_ioreq_server() returns. Current rangeset code also has
>> such issue if the PIO/MMIO
>> is removed from the rangeset of the ioreq server after
>> hvm_select_ioreq_server() returns.
>>
>> Since using a spinlock or domain_pause/unpause() is too heavyweight, we
>> suggest the device model side check whether the received ioreq is a
>> valid one.
>>
>> Above comments are added, according to Jan & Paul's suggestion in v7, to
>> let developers know that we do not guarantee the validity of 's' returned
>> by p2m_get_ioreq_server()/hvm_select_ioreq_server().
>>
>> "Value of s could be stale, when we lost a race with..." is not for s
>> being NULL, it's about s being
>> not valid. For a NULL returned, it is...
> Then it makes sense.
>
>>>> +                s = p2m_get_ioreq_server(currd, &flags);
>>>> +
>>>> +                /*
>>>> +                 * If p2mt is ioreq_server but ioreq_server is NULL,
>>> p2mt is definitely ioreq_server within this if condition.
>>>
>>>> +                 * we probably lost a race with unbinding of ioreq
>>>> +                 * server, just retry the access.
>>>> +                 */
>>> looks redundant to earlier comment. Or earlier one should
>>> be just removed?
>> ... described here, to just retry the access.
>>
>>>> +                if ( s == NULL )
>>>> +                {
>>>> +                    rc = X86EMUL_RETRY;
>>>> +                    vio->io_req.state = STATE_IOREQ_NONE;
>>>> +                    break;
>>>> +                }
>>>> +            }
>>>> +        }
>>>> +
>>>> +        /*
>>>> +         * Value of s could be stale, when we lost a race with dm_op
>>>> +         * which unmaps this PIO/MMIO address from the ioreq server.
>>>> +         * The device model side need to do the check.
>>>> +         */
>>> another duplicated comment. below code is actually for 'normal'
>>> MMIO case...
>> This is for another possible situation when 's' returned by
>> hvm_select_ioreq_server()
>> becomes stale later, when the PIO/MMIO is removed from the rangeset.
>>
>> Logic in this hvmemul_do_io() has always been a bit mixed. I mean, many
>> corner cases
>> and race conditions:
>>    - between the mapping/unmapping of PIO/MMIO from rangeset
>>    - between mapping/unmapping of ioreq server from p2m_ioreq_server
>> I tried to give as many comments as I could as this patchset evolved, only
>> to find I just introduced more confusion...
>>
>> Any suggestions?
> maybe describe the whole story before the whole p2m_ioreq_server
> branch?
>
> 	/* comment */
> 	if ( p2mt == p2m_ioreq_server )
>
> Then you don't need to duplicate a lot at specific code lines?

Yep. Let me try. :)
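Something along these lines, maybe - one combined comment ahead of the whole
branch, with the code itself unchanged from this patch (rough sketch only,
wording not final):

    /*
     * Xen isn't emulating the instruction internally, so see if there's
     * an ioreq server that can handle it.
     *
     * Rules:
     * A> PIO or MMIO accesses run through hvm_select_ioreq_server() to
     * choose the ioreq server by range. If no server is found, the access
     * is ignored.
     *
     * B> p2m_ioreq_server accesses are handled by the designated
     * ioreq server for the domain, but there are some corner cases:
     *
     *   - If the domain ioreq server is NULL, it's likely we suffer from
     *   a race with an unmap operation on the ioreq server, so re-try
     *   the instruction.
     *
     * Note: Even when an ioreq server is found, the value of s could
     * become stale afterwards, e.g. when the PIO/MMIO address is removed
     * from the rangeset of the server, or when p2m_ioreq_server is
     * unmapped from it. There's no cheap way to avoid this in Xen, so
     * the device model side needs to check the incoming ioreq event.
     */
    struct hvm_ioreq_server *s = NULL;
    p2m_type_t p2mt = p2m_invalid;

    if ( is_mmio )
    {
        unsigned long gmfn = paddr_to_pfn(addr);

        get_gfn_query_unlocked(currd, gmfn, &p2mt);

        if ( p2mt == p2m_ioreq_server )
        {
            unsigned int flags;

            s = p2m_get_ioreq_server(currd, &flags);
            if ( s == NULL )
            {
                rc = X86EMUL_RETRY;
                vio->io_req.state = STATE_IOREQ_NONE;
                break;
            }
        }
    }

    if ( !s )
        s = hvm_select_ioreq_server(currd, &p);

That way the retry case and the "device model must check" note live in one
place, and the individual lines can stay uncommented.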

>>>> +        if ( !s )
>>>> +            s = hvm_select_ioreq_server(currd, &p);
>>>>
>>>>            /* If there is no suitable backing DM, just ignore accesses */
>>>>            if ( !s )
>>>> @@ -189,7 +246,7 @@ static int hvmemul_do_io(
>>>>            else
>>>>            {
>>>>                rc = hvm_send_ioreq(s, &p, 0);
>>>> -            if ( rc != X86EMUL_RETRY || curr->domain->is_shutting_down )
>>>> +            if ( rc != X86EMUL_RETRY || currd->is_shutting_down )
>>>>                    vio->io_req.state = STATE_IOREQ_NONE;
>>>>                else if ( data_is_addr )
>>>>                    rc = X86EMUL_OKAY;
>>>> diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c
>>>> index ad2edad..746799f 100644
>>>> --- a/xen/arch/x86/hvm/ioreq.c
>>>> +++ b/xen/arch/x86/hvm/ioreq.c
>>>> @@ -753,6 +753,8 @@ int hvm_destroy_ioreq_server(struct domain *d,
>>>> ioservid_t id)
>>>>
>>>>            domain_pause(d);
>>>>
>>>> +        p2m_destroy_ioreq_server(d, s);
>>>> +
>>>>            hvm_ioreq_server_disable(s, 0);
>>>>
>>>>            list_del(&s->list_entry);
>>>> @@ -914,6 +916,42 @@ int hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id,
>>>>        return rc;
>>>>    }
>>>>
>>>> +int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id,
>>>> +                                     uint32_t type, uint32_t flags)
>>>> +{
>>>> +    struct hvm_ioreq_server *s;
>>>> +    int rc;
>>>> +
>>>> +    /* For now, only HVMMEM_ioreq_server is supported. */
>>> obvious comment
>> IIRC, this comment (and the one below) is another change made according
>> to some review comments, to remind readers that we can add new mem types
>> in the future.
>> So how about we add a line - "In the future, we can support other mem
>> types"?
>>
>> But that also sounds redundant to me. :)
>> So I am also OK with removing this and the comments below.
> or add a general comment for the whole function, indicating those
> checks are extensible in the future.

OK. A general comment for the function sounds more reasonable.
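E.g. something like this above the function (draft wording only, to be
polished in the next version):

    /*
     * Map or unmap an ioreq server to a specific memory type. For now only
     * HVMMEM_ioreq_server is supported, and the checks below can simply be
     * extended when further memory types are added in the future.
     */
    int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id,
                                         uint32_t type, uint32_t flags)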

Thanks
Yu

[snip]


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v9 2/5] x86/ioreq server: Add DMOP to map guest ram with p2m_ioreq_server to an ioreq server.
  2017-03-24 10:19           ` Jan Beulich
@ 2017-03-24 12:35             ` Yu Zhang
  2017-03-24 13:09               ` Jan Beulich
  0 siblings, 1 reply; 42+ messages in thread
From: Yu Zhang @ 2017-03-24 12:35 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, Tim Deegan, xen-devel,
	Paul Durrant, zhiyuan.lv, Jun Nakajima



On 3/24/2017 6:19 PM, Jan Beulich wrote:
>>>> On 24.03.17 at 10:05, <yu.c.zhang@linux.intel.com> wrote:
>> On 3/23/2017 4:57 PM, Jan Beulich wrote:
>>>>>> On 23.03.17 at 04:23, <yu.c.zhang@linux.intel.com> wrote:
>>>> On 3/22/2017 10:21 PM, Jan Beulich wrote:
>>>>>>>> On 21.03.17 at 03:52, <yu.c.zhang@linux.intel.com> wrote:
>>>>>> @@ -177,8 +178,64 @@ static int hvmemul_do_io(
>>>>>>             break;
>>>>>>         case X86EMUL_UNHANDLEABLE:
>>>>>>         {
>>>>>> -        struct hvm_ioreq_server *s =
>>>>>> -            hvm_select_ioreq_server(curr->domain, &p);
>>>>>> +        /*
>>>>>> +         * Xen isn't emulating the instruction internally, so see if
>>>>>> +         * there's an ioreq server that can handle it. Rules:
>>>>>> +         *
>>>>>> +         * - PIO and "normal" MMIO run through hvm_select_ioreq_server()
>>>>>> +         * to choose the ioreq server by range. If no server is found,
>>>>>> +         * the access is ignored.
>>>>>> +         *
>>>>>> +         * - p2m_ioreq_server accesses are handled by the designated
>>>>>> +         * ioreq_server for the domain, but there are some corner
>>>>>> +         * cases:
>>>>>> +         *
>>>>>> +         *   - If the domain ioreq_server is NULL, assume there is a
>>>>>> +         *   race between the unbinding of ioreq server and guest fault
>>>>>> +         *   so re-try the instruction.
>>>>> And that retry won't come back here because of? (The answer
>>>>> should not include any behavior added by subsequent patches.)
>>>> You got me. :)
>>>> In this patch, retry will come back here. It should be after patch 4 or
>>>> patch 5 that the retry
>>>> will be ignored(p2m type changed back to p2m_ram_rw after the unbinding).
>>> In which case I think we shouldn't insist on you to change things, but
>>> you should spell out very clearly that this patch should not go in
>>> without the others going in at the same time.
>> So maybe it would be better to leave the retry part to a later patch,
>> say patch 4/5 or patch 5/5,
>> and return unhandleable in this patch?
> I don't follow. I've specifically suggested that you don't change
> the code, but simply state clearly the requirement that patches
> 2...5 of this series should all go in at the same time. I don't mind
> you making changes, but the risk then is that further round trips
> may be required because of there being new issues with the
> changes you may do.

Thanks, Jan. I'll keep the code, and add a note in the commit message of 
this patch.

>>>>>> --- a/xen/arch/x86/mm/hap/nested_hap.c
>>>>>> +++ b/xen/arch/x86/mm/hap/nested_hap.c
>>>>>> @@ -172,7 +172,7 @@ nestedhap_walk_L0_p2m(struct p2m_domain *p2m, paddr_t L1_gpa, paddr_t *L0_gpa,
>>>>>>         if ( *p2mt == p2m_mmio_direct )
>>>>>>             goto direct_mmio_out;
>>>>>>         rc = NESTEDHVM_PAGEFAULT_MMIO;
>>>>>> -    if ( *p2mt == p2m_mmio_dm )
>>>>>> +    if ( *p2mt == p2m_mmio_dm || *p2mt == p2m_ioreq_server )
>>>>> Btw., how does this addition match up with the rc value being
>>>>> assigned right before the if()?
>>>> Well, returning NESTEDHVM_PAGEFAULT_MMIO in such a case will trigger
>>>> handle_mmio() later in
>>>> hvm_hap_nested_page_fault(). I guess that is what we expected.
>>> That's probably what is expected, but it's no MMIO which we're
>>> doing in that case. And note that we've stopped abusing
>>> handle_mmio() for non-MMIO purposes a little while ago (commit
>>> 3dd00f7b56 ["x86/HVM: restrict permitted instructions during
>>> special purpose emulation"]).
>> OK. So what about we just remove this "*p2mt == p2m_ioreq_server"?
> Well, you must have had a reason to add it. To be honest, I don't
> care too much about the nested code (as it's far from production
> ready anyway), so leaving the code above untouched would be
> fine with me, but taking care of adjustments to nested code where
> they're actually needed would be even better. So the preferred
> option is for you to explain why you've done the change above,
> and why you think it's correct/needed. The next best option might
> be to drop the change.

Got it. I now prefer to drop the change. This code was added at an early
stage of this patchset, when we hoped p2m_ioreq_server could always trigger
a handle_mmio(); but frankly we do not, and probably will not, use the
nested case in the foreseeable future.
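That is, the next version will simply revert this hunk and leave
nestedhap_walk_L0_p2m() as it is today (only the lines from the quoted hunk
shown):

     rc = NESTEDHVM_PAGEFAULT_MMIO;
-    if ( *p2mt == p2m_mmio_dm || *p2mt == p2m_ioreq_server )
+    if ( *p2mt == p2m_mmio_dm )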

>>>>>> --- a/xen/arch/x86/mm/p2m-ept.c
>>>>>> +++ b/xen/arch/x86/mm/p2m-ept.c
>>>>>> @@ -131,6 +131,13 @@ static void ept_p2m_type_to_flags(struct p2m_domain *p2m, ept_entry_t *entry,
>>>>>>                 entry->r = entry->w = entry->x = 1;
>>>>>>                 entry->a = entry->d = !!cpu_has_vmx_ept_ad;
>>>>>>                 break;
>>>>>> +        case p2m_ioreq_server:
>>>>>> +            entry->r = 1;
>>>>>> +            entry->w = !(p2m->ioreq.flags & XEN_DMOP_IOREQ_MEM_ACCESS_WRITE);
>>>>> Is this effectively open coded p2m_get_ioreq_server() actually
>>>>> okay? If so, why does the function need to be used elsewhere,
>>>>> instead of doing direct, lock-free accesses?
>>>> Maybe your comment is about whether it is necessary to use the lock in
>>>> p2m_get_ioreq_server()?
>>>> I still believe so; it protects not only the value of the ioreq server,
>>>> but also the flag together with it.
>>>>
>>>> Besides, it is used not only in the emulation process, but also in the
>>>> hypercall to set the mem type.
>>>> So the lock can still provide some kind of protection against
>>>> p2m_set_ioreq_server() - even if it does not always do so.
>>> The question, fundamentally, is about consistency: The same
>>> access model should be followed universally, unless there is an
>>> explicit reason for an exception.
>> Sorry, I do not quite understand. Why is the consistency broken?
> Because you don't call p2m_get_ioreq_server() here (discarding
> the return value, but using the flags).

Oh. I see. You are worrying about p2m->ioreq.server/flag being cleared 
due to an unmap.
Below are some situations I can think of...

>> I think this lock at least protects the ioreq server and the flag. The
>> only exception
>> is the one you mentioned - s could become stale which we agreed to let
>> the device
>> model do the check. Without this lock, things would become more complex
>> - more
>> race conditions...
> Sure, all understood. I wasn't really suggesting to drop the locked
> accesses, but instead I was using this to illustrate the non-locked
> access (and hence the inconsistency with other code) here. As
> said - if there's a good reason not to call the function here, I'm all
> ears.

... ept_p2m_type_to_flags() is used in 2 cases:
1> in resolve_misconfig(), to do the recalculation for a p2m entry - in
that case we won't meet a p2m_ioreq_server type in ept_p2m_type_to_flags(),
because it has already been recalculated back to p2m_ram_rw in the caller;

2> triggered by p2m_set_entry(), which is trying to set the mem type for
some gfns. The only scenario I can imagine (which is also an extreme one)
that has racing potential is: during the mem type setting process, the
ioreq server unmapping is triggered on another cpu, which then invalidates
the value of p2m->ioreq.flags; in that case ept_p2m_type_to_flags() will
return an entry with writable permission. But right after the mem type
setting is done, the p2m lock will be released and the unmapping hypercall
will get the opportunity to reset this p2m entry. We do not need a
write-protected entry in that case anyway.

Besides, even if p2m_get_ioreq_server() were used here in
ept_p2m_type_to_flags(), it could only provide limited protection - there
is still a chance that the flags it returns are outdated in the above
situation.

So, since we do not really care about the p2m entry permission in such an
extreme situation, and we cannot 100% guarantee the lock will protect it,
I do not think we need to use the lock here.
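Just to make the comparison concrete (sketch only; the locked variant below
is hypothetical, not part of the patch, and assumes the domain can be
reached via p2m->domain). The patch currently reads the flags directly:

        case p2m_ioreq_server:
            entry->r = 1;
            entry->w = !(p2m->ioreq.flags & XEN_DMOP_IOREQ_MEM_ACCESS_WRITE);

while going through the helper would look roughly like:

        case p2m_ioreq_server:
        {
            unsigned int flags;

            /* Assumed: takes the same lock p2m_set_ioreq_server() uses. */
            p2m_get_ioreq_server(p2m->domain, &flags);
            entry->r = 1;
            entry->w = !(flags & XEN_DMOP_IOREQ_MEM_ACCESS_WRITE);
        }

and, as said, the flags could still go stale the moment that lock is
dropped, so the extra locking would not really buy anything here.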

I am not sure if this explanation is convincing to you, but I'm also
open to being convinced. :-)

Thanks
Yu


> Jan
>


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v9 4/5] x86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries.
  2017-03-24 10:37           ` Jan Beulich
@ 2017-03-24 12:36             ` Yu Zhang
  0 siblings, 0 replies; 42+ messages in thread
From: Yu Zhang @ 2017-03-24 12:36 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, xen-devel,
	Paul Durrant, zhiyuan.lv, Jun Nakajima



On 3/24/2017 6:37 PM, Jan Beulich wrote:
>>>> On 24.03.17 at 10:05, <yu.c.zhang@linux.intel.com> wrote:
>> On 3/23/2017 5:00 PM, Jan Beulich wrote:
>>>>>> On 23.03.17 at 04:23, <yu.c.zhang@linux.intel.com> wrote:
>>>> On 3/22/2017 10:29 PM, Jan Beulich wrote:
>>>>>>>> On 21.03.17 at 03:52, <yu.c.zhang@linux.intel.com> wrote:
>>>>>> --- a/xen/arch/x86/hvm/ioreq.c
>>>>>> +++ b/xen/arch/x86/hvm/ioreq.c
>>>>>> @@ -949,6 +949,14 @@ int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id,
>>>>>>     
>>>>>>         spin_unlock_recursive(&d->arch.hvm_domain.ioreq_server.lock);
>>>>>>     
>>>>>> +    if ( rc == 0 && flags == 0 )
>>>>>> +    {
>>>>>> +        struct p2m_domain *p2m = p2m_get_hostp2m(d);
>>>>>> +
>>>>>> +        if ( read_atomic(&p2m->ioreq.entry_count) )
>>>>>> +            p2m_change_entry_type_global(d, p2m_ioreq_server, p2m_ram_rw);
>>>>>> +    }
>>>>> If you do this after dropping the lock, don't you risk a race with
>>>>> another server mapping the type to itself?
>>>> I believe it's OK. Remaining p2m_ioreq_server entries still need to be
>>>> cleaned anyway.
>>> Are you refusing a new server mapping the type before being
>>> done with the cleanup?
>> No. I meant even if a new server is mapped, we can still sweep the p2m
>> table later asynchronously.
>> But this reminds me of another point - will a dm op be interrupted by
>> another one, or should it be?
> Interrupted? Two of them may run in parallel on different CPUs,
> against the same target domain.

Right. That's possible.

>> Since we have patch 5/5, which sweeps the p2m table right after the unmap
>> happens, maybe we should refuse any mapping request if there are remaining
>> p2m_ioreq_server entries.
> That's what I've tried to hint at with my question.

Oh. I see. Thank you, Jan. :-)
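So in the next version I will refuse the map attempt while outstanding
entries remain. Roughly like the fragment below, placed where the new
server gets installed and while the lock protecting p2m->ioreq is still
held (just a sketch; -EBUSY is only a placeholder error code):

    /*
     * Sketch only: do not let an ioreq server claim p2m_ioreq_server
     * while entries from a previous mapping still await the sweep back
     * to p2m_ram_rw.
     */
    if ( flags != 0 && read_atomic(&p2m->ioreq.entry_count) )
        rc = -EBUSY;    /* placeholder error code */
    else
    {
        p2m->ioreq.server = s;
        p2m->ioreq.flags = flags;
        rc = 0;
    }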

Yu
> Jan
>
>


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v9 4/5] x86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries.
  2017-03-24  9:37       ` Tian, Kevin
@ 2017-03-24 12:45         ` Yu Zhang
  0 siblings, 0 replies; 42+ messages in thread
From: Yu Zhang @ 2017-03-24 12:45 UTC (permalink / raw)
  To: Tian, Kevin, xen-devel
  Cc: Nakajima, Jun, George Dunlap, Andrew Cooper, Paul Durrant, Lv,
	Zhiyuan, Jan Beulich



On 3/24/2017 5:37 PM, Tian, Kevin wrote:
>> From: Yu Zhang [mailto:yu.c.zhang@linux.intel.com]
>> Sent: Wednesday, March 22, 2017 6:12 PM
>>
>> On 3/22/2017 4:10 PM, Tian, Kevin wrote:
>>>> From: Yu Zhang [mailto:yu.c.zhang@linux.intel.com]
>>>> Sent: Tuesday, March 21, 2017 10:53 AM
>>>>
>>>> After an ioreq server has unmapped, the remaining p2m_ioreq_server
>>>> entries need to be reset back to p2m_ram_rw. This patch does this
>>>> asynchronously with the current p2m_change_entry_type_global()
>> interface.
>>>> This patch also disallows live migration, when there's still any
>>>> outstanding p2m_ioreq_server entry left. The core reason is our
>>>> current implementation of p2m_change_entry_type_global() can not tell
>>>> the state of p2m_ioreq_server entries(can not decide if an entry is
>>>> to be emulated or to be resynced).
>>> Don't quite get this point. change_global is triggered only upon
>>> unmap. At that point there is no ioreq server to emulate the write
>>> operations on those entries. All the things required is just to change
>>> the type. What's the exact decision required here?
>> Well, one situation I can recall is when another ioreq server maps to this
>> type, and live migration happens later. The resolve_misconfig() code cannot
>> differentiate whether a p2m_ioreq_server page is an obsolete one to be
>> synced, or a new one only to be emulated.
> so if you disallow another mapping before obsolete pages are synced
> as you just replied in another mail, then such limitation would be gone?

Well, it may still have problems.

Even if we know the remaining p2m_ioreq_server entries are definitely
outdated ones, resolve_misconfig() still lacks the information to decide
whether such an entry is supposed to be reset to p2m_ram_rw (during a p2m
sweep), or marked as log-dirty (during a live migration process).

I mean, we surely can reset these pages to log-dirty directly if
global_logdirty is on. But I do not think that is the correct thing to do:
although these pages will be reset back to p2m_ram_rw later in the EPT
violation handling process, it might cause some clean pages (which were
write-protected once, but no longer are) to be tracked and sent to the
target later if live migration is triggered.
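To put it concretely, resolve_misconfig() would need a decision like the
pseudo-code below for an outstanding p2m_ioreq_server entry, and today
nothing in the entry lets it tell the first two cases apart (sketch only,
placeholder variable names, not the real recalc logic):

    bool entry_is_stale;    /* leftover from an already unmapped server? */
    bool in_logdirty_mode;  /* global log-dirty active for live migration? */

    if ( recalc_type == p2m_ioreq_server )
    {
        if ( entry_is_stale )
            recalc_type = p2m_ram_rw;          /* sweep it back */
        else if ( in_logdirty_mode )
            recalc_type = p2m_ram_logdirty;    /* track writes instead */
        /* else keep p2m_ioreq_server for the currently mapped server */
    }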

>> I gave some explanation on this issue in discussion during Jun 20 - 22 last
>> year.
>>
>> http://lists.xenproject.org/archives/html/xen-devel/2016-06/msg02426.html
>> on Jun 20
>> and
>> http://lists.xenproject.org/archives/html/xen-devel/2016-06/msg02575.html
>> on Jun 21
>>
>>> btw does it mean that live migration can be still supported as long as
>>> device model proactively unmaps write-protected pages before starting
>>> live migration?
>>>
>> Yes.
> I'm not sure whether there'll be a sequencing issue. I assume the toolstack
> will first request entering logdirty mode, do the iterative memory copy,
> then stop the VM including its virtual devices, and then build the final
> image (including vGPU state). XenGT supports live migration today. The
> vGPU device model is notified to do state save/restore only in the
> last step of that flow (as part of the Qemu save/restore). If your design
> requires the vGPU device model to first unmap write-protected pages
> (which means it can no longer serve requests from the guest, which is
> equivalent to stopping the vGPU) before the toolstack enters logdirty mode,
> I'm worried about the required changes to the whole live migration flow...

Well, George previously wrote a draft patch to solve this issue and make
the lazy p2m change code more generic (with more information added in the
p2m structure). We believed it might also lift the live migration
restriction (with no changes in the interface between the hypervisor and
the device model). But there are still some bugs, and neither of us has
had enough time to debug them. So I'd like to submit our current code
first; once that solution matures, we can remove the live migration
restriction. :-)

Thanks
Yu

> Thanks
> Kevin


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v9 2/5] x86/ioreq server: Add DMOP to map guest ram with p2m_ioreq_server to an ioreq server.
  2017-03-24 12:35             ` Yu Zhang
@ 2017-03-24 13:09               ` Jan Beulich
  0 siblings, 0 replies; 42+ messages in thread
From: Jan Beulich @ 2017-03-24 13:09 UTC (permalink / raw)
  To: Yu Zhang
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, Tim Deegan, xen-devel,
	Paul Durrant, zhiyuan.lv, Jun Nakajima

>>> On 24.03.17 at 13:35, <yu.c.zhang@linux.intel.com> wrote:
> Besides, even if p2m_get_ioreq_server() were used here in
> ept_p2m_type_to_flags(), it could only provide limited protection - there
> is still a chance that the flags it returns are outdated in the above
> situation.
> 
> So, since we do not really care about the p2m entry permission in such an
> extreme situation, and we cannot 100% guarantee the lock will protect it,
> I do not think we need to use the lock here.
> 
> I am not sure if this explanation is convincing to you, but I'm also 
> open to be convinced. :-)

Well, okay, keep it as it is then. I'm not fully convinced, but the
change wouldn't buy us much.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v9 1/5] x86/ioreq server: Release the p2m lock after mmio is handled.
  2017-03-21  2:52 ` [PATCH v9 1/5] x86/ioreq server: Release the p2m lock after mmio is handled Yu Zhang
@ 2017-03-29 13:39   ` George Dunlap
  2017-03-29 13:50     ` Jan Beulich
  0 siblings, 1 reply; 42+ messages in thread
From: George Dunlap @ 2017-03-29 13:39 UTC (permalink / raw)
  To: Yu Zhang; +Cc: Andrew Cooper, Paul Durrant, Lv, Zhiyuan, Jan Beulich, xen-devel

On Tue, Mar 21, 2017 at 2:52 AM, Yu Zhang <yu.c.zhang@linux.intel.com> wrote:
> Routine hvmemul_do_io() may need to peek the p2m type of a gfn to
> select the ioreq server. For example, operations on gfns with
> p2m_ioreq_server type will be delivered to a corresponding ioreq
> server, and this requires that the p2m type not be switched back
> to p2m_ram_rw during the emulation process. To avoid this race
> condition, we delay the release of p2m lock in hvm_hap_nested_page_fault()
> until mmio is handled.
>
> Note: previously in hvm_hap_nested_page_fault(), put_gfn() was moved
> before the handling of mmio, due to a deadlock risk between the p2m
> lock and the event lock(in commit 77b8dfe). Later, a per-event channel
> lock was introduced in commit de6acb7, to send events. So we do not
> need to worry about the deadlock issue.
>
> Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>

Who else's ack does this need?  It seems like this is a general
improvement that can go in without the rest of the series.

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v9 1/5] x86/ioreq server: Release the p2m lock after mmio is handled.
  2017-03-29 13:39   ` George Dunlap
@ 2017-03-29 13:50     ` Jan Beulich
  0 siblings, 0 replies; 42+ messages in thread
From: Jan Beulich @ 2017-03-29 13:50 UTC (permalink / raw)
  To: Yu Zhang, George Dunlap
  Cc: Andrew Cooper, Paul Durrant, Zhiyuan Lv, xen-devel

>>> On 29.03.17 at 15:39, <dunlapg@umich.edu> wrote:
> On Tue, Mar 21, 2017 at 2:52 AM, Yu Zhang <yu.c.zhang@linux.intel.com> wrote:
>> Routine hvmemul_do_io() may need to peek the p2m type of a gfn to
>> select the ioreq server. For example, operations on gfns with
>> p2m_ioreq_server type will be delivered to a corresponding ioreq
>> server, and this requires that the p2m type not be switched back
>> to p2m_ram_rw during the emulation process. To avoid this race
>> condition, we delay the release of p2m lock in hvm_hap_nested_page_fault()
>> until mmio is handled.
>>
>> Note: previously in hvm_hap_nested_page_fault(), put_gfn() was moved
>> before the handling of mmio, due to a deadlock risk between the p2m
>> lock and the event lock(in commit 77b8dfe). Later, a per-event channel
>> lock was introduced in commit de6acb7, to send events. So we do not
>> need to worry about the deadlock issue.
>>
>> Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> 
> Who else's ack does this need?  It seems like this is a general
> improvement that can go in without the rest of the series.

I didn't put it in on its own because it didn't really seem useful to
me without the rest of the series.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 42+ messages in thread

end of thread, other threads:[~2017-03-29 13:50 UTC | newest]

Thread overview: 42+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-03-21  2:52 [PATCH v9 0/5] x86/ioreq server: Introduce HVMMEM_ioreq_server mem type Yu Zhang
2017-03-21  2:52 ` [PATCH v9 1/5] x86/ioreq server: Release the p2m lock after mmio is handled Yu Zhang
2017-03-29 13:39   ` George Dunlap
2017-03-29 13:50     ` Jan Beulich
2017-03-21  2:52 ` [PATCH v9 2/5] x86/ioreq server: Add DMOP to map guest ram with p2m_ioreq_server to an ioreq server Yu Zhang
2017-03-22  7:49   ` Tian, Kevin
2017-03-22 10:12     ` Yu Zhang
2017-03-24  9:26       ` Tian, Kevin
2017-03-24 12:34         ` Yu Zhang
2017-03-22 14:21   ` Jan Beulich
2017-03-23  3:23     ` Yu Zhang
2017-03-23  8:57       ` Jan Beulich
2017-03-24  9:05         ` Yu Zhang
2017-03-24 10:19           ` Jan Beulich
2017-03-24 12:35             ` Yu Zhang
2017-03-24 13:09               ` Jan Beulich
2017-03-21  2:52 ` [PATCH v9 3/5] x86/ioreq server: Handle read-modify-write cases for p2m_ioreq_server pages Yu Zhang
2017-03-22 14:22   ` Jan Beulich
2017-03-21  2:52 ` [PATCH v9 4/5] x86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries Yu Zhang
2017-03-21 10:05   ` Paul Durrant
2017-03-22  8:10   ` Tian, Kevin
2017-03-22 10:12     ` Yu Zhang
2017-03-24  9:37       ` Tian, Kevin
2017-03-24 12:45         ` Yu Zhang
2017-03-22 14:29   ` Jan Beulich
2017-03-23  3:23     ` Yu Zhang
2017-03-23  9:00       ` Jan Beulich
2017-03-24  9:05         ` Yu Zhang
2017-03-24 10:37           ` Jan Beulich
2017-03-24 12:36             ` Yu Zhang
2017-03-21  2:52 ` [PATCH v9 5/5] x86/ioreq server: Synchronously reset outstanding p2m_ioreq_server entries when an ioreq server unmaps Yu Zhang
2017-03-21 10:00   ` Paul Durrant
2017-03-21 11:15     ` Yu Zhang
2017-03-21 13:49       ` Paul Durrant
2017-03-21 14:14         ` Yu Zhang
2017-03-22  8:28   ` Tian, Kevin
2017-03-22  8:54     ` Jan Beulich
2017-03-22  9:02       ` Tian, Kevin
2017-03-22 14:39   ` Jan Beulich
2017-03-23  3:23     ` Yu Zhang
2017-03-23  9:02       ` Jan Beulich
2017-03-24  9:05         ` Yu Zhang
