* [PATCH v12 0/6] x86/ioreq server: Introduce HVMMEM_ioreq_server mem type.
@ 2017-04-06 15:53 Yu Zhang
  2017-04-06 15:53 ` [PATCH v12 1/6] x86/ioreq server: Release the p2m lock after mmio is handled Yu Zhang
                   ` (5 more replies)
  0 siblings, 6 replies; 29+ messages in thread
From: Yu Zhang @ 2017-04-06 15:53 UTC (permalink / raw)
  To: xen-devel; +Cc: zhiyuan.lv

XenGT leverages the ioreq server to track and forward accesses to GPU
I/O resources, e.g. the PPGTT (per-process graphics translation tables).
Currently, the ioreq server uses a rangeset to track the BDF/PIO/MMIO ranges
to be emulated. To select an ioreq server, the rangeset is searched to
see if the I/O range is recorded. However, the number of ram pages to be
tracked may exceed the upper limit of the rangeset.

Previously, a solution was proposed to refactor the rangeset and extend
its upper limit. However, after 12 rounds of discussion, we have decided
to drop this approach due to security concerns. This new patch series
instead introduces a new memory type, HVMMEM_ioreq_server, and adds HVM
operations to let one ioreq server claim ownership of ram pages with this
type. Accesses to a page of this type will be handled by the specified
ioreq server directly.
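
As an illustration only (not code from this series), a device model could
use the new interface roughly as follows, assuming the wrapper added in
patch 3 plus the pre-existing libxendevicemodel/libxc helpers
(xc_hvm_set_mem_type is assumed here to be the libxc path to set the
HVMMEM_ioreq_server type on guest pages):

    #include <xenctrl.h>
    #include <xendevicemodel.h>

    /* Hypothetical helper: claim write emulation for HVMMEM_ioreq_server
     * and write-protect a range of guest PPGTT pages. */
    static int claim_ppgtt_pages(domid_t domid, uint64_t first_gfn, uint32_t nr)
    {
        xendevicemodel_handle *dmod = xendevicemodel_open(NULL, 0);
        xc_interface *xch = xc_interface_open(NULL, NULL, 0);
        ioservid_t id;
        int rc = -1;

        if ( !dmod || !xch )
            goto out;

        /* Create an ioreq server and claim the new memory type for it. */
        rc = xendevicemodel_create_ioreq_server(dmod, domid, 0, &id);
        if ( !rc )
            rc = xendevicemodel_map_mem_type_to_ioreq_server(
                     dmod, domid, id, HVMMEM_ioreq_server,
                     XEN_DMOP_IOREQ_MEM_ACCESS_WRITE);

        /* Mark the PPGTT pages; write faults on them are now forwarded
         * to this ioreq server instead of being looked up in rangesets. */
        if ( !rc )
            rc = xc_hvm_set_mem_type(xch, domid, HVMMEM_ioreq_server,
                                     first_gfn, nr);

     out:
        if ( xch )
            xc_interface_close(xch);
        if ( dmod )
            xendevicemodel_close(dmod);
        return rc;
    }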


Yu Zhang (6):
  x86/ioreq server: Release the p2m lock after mmio is handled.
  x86/ioreq server: Add DMOP to map guest ram with p2m_ioreq_server to
    an ioreq server.
  x86/ioreq server: Add device model wrappers for new DMOP
  x86/ioreq server: Handle read-modify-write cases for p2m_ioreq_server
    pages.
  x86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server
    entries.
  x86/ioreq server: Synchronously reset outstanding p2m_ioreq_server
    entries when an ioreq server unmaps.

 tools/libs/devicemodel/core.c                   | 25 +++++++
 tools/libs/devicemodel/include/xendevicemodel.h | 18 +++++
 tools/libs/devicemodel/libxendevicemodel.map    |  1 +
 xen/arch/x86/hvm/dm.c                           | 70 +++++++++++++++++-
 xen/arch/x86/hvm/emulate.c                      | 95 ++++++++++++++++++++++--
 xen/arch/x86/hvm/hvm.c                          |  7 +-
 xen/arch/x86/hvm/ioreq.c                        | 52 ++++++++++++++
 xen/arch/x86/mm/hap/hap.c                       |  9 +++
 xen/arch/x86/mm/p2m-ept.c                       | 32 ++++++++-
 xen/arch/x86/mm/p2m-pt.c                        | 49 ++++++++++---
 xen/arch/x86/mm/p2m.c                           | 96 +++++++++++++++++++++++++
 xen/arch/x86/mm/shadow/multi.c                  |  3 +-
 xen/include/asm-x86/hvm/ioreq.h                 |  2 +
 xen/include/asm-x86/p2m.h                       | 40 +++++++++--
 xen/include/public/hvm/dm_op.h                  | 28 ++++++++
 xen/include/public/hvm/hvm_op.h                 |  8 ++-
 16 files changed, 506 insertions(+), 29 deletions(-)

-- 
1.9.1



* [PATCH v12 1/6] x86/ioreq server: Release the p2m lock after mmio is handled.
  2017-04-06 15:53 [PATCH v12 0/6] x86/ioreq server: Introduce HVMMEM_ioreq_server mem type Yu Zhang
@ 2017-04-06 15:53 ` Yu Zhang
  2017-04-06 15:53 ` [PATCH v12 2/6] x86/ioreq server: Add DMOP to map guest ram with p2m_ioreq_server to an ioreq server Yu Zhang
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 29+ messages in thread
From: Yu Zhang @ 2017-04-06 15:53 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Paul Durrant, zhiyuan.lv, Jan Beulich

Routine hvmemul_do_io() may need to peek at the p2m type of a gfn to
select the ioreq server. For example, operations on gfns with the
p2m_ioreq_server type will be delivered to a corresponding ioreq
server, and this requires that the p2m type not be switched back
to p2m_ram_rw during the emulation process. To avoid this race
condition, we delay the release of the p2m lock in
hvm_hap_nested_page_fault() until the mmio is handled.

Note: previously in hvm_hap_nested_page_fault(), put_gfn() was moved
before the handling of mmio, due to a deadlock risk between the p2m
lock and the event lock (in commit 77b8dfe). Later, a per-event-channel
lock was introduced in commit de6acb7 to send events, so we no longer
need to worry about that deadlock.
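
In outline, the resulting flow in hvm_hap_nested_page_fault() is roughly
as follows (a simplified sketch, not the literal code;
is_ioreq_or_mmio_write() stands in for the actual p2m-type/npfec checks):

    mfn = get_gfn_type_access(p2m, gfn, &p2mt, &p2ma, 0, NULL); /* takes the p2m lock */

    if ( is_ioreq_or_mmio_write(p2mt, npfec) )          /* hypothetical check */
    {
        if ( !handle_mmio_with_translation(gla, gpa >> PAGE_SHIFT, npfec) )
            hvm_inject_hw_exception(TRAP_gp_fault, 0);
        rc = 1;
        goto out_put_gfn;
    }
    ...
 out_put_gfn:
    __put_gfn(p2m, gfn);   /* the lock is only dropped here, after the mmio
                              handling, so the p2m type cannot change
                              underneath the emulation */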

Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
Cc: Paul Durrant <paul.durrant@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>

changes in v4: 
  - According to comments from Jan: remove the redundant "rc = 0" code.
---
 xen/arch/x86/hvm/hvm.c | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index eba6e9d..ac7deff 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -1829,15 +1829,10 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long gla,
          (npfec.write_access &&
           (p2m_is_discard_write(p2mt) || (p2mt == p2m_ioreq_server))) )
     {
-        __put_gfn(p2m, gfn);
-        if ( ap2m_active )
-            __put_gfn(hostp2m, gfn);
-
-        rc = 0;
         if ( !handle_mmio_with_translation(gla, gpa >> PAGE_SHIFT, npfec) )
             hvm_inject_hw_exception(TRAP_gp_fault, 0);
         rc = 1;
-        goto out;
+        goto out_put_gfn;
     }
 
     /* Check if the page has been paged out */
-- 
1.9.1



* [PATCH v12 2/6] x86/ioreq server: Add DMOP to map guest ram with p2m_ioreq_server to an ioreq server.
  2017-04-06 15:53 [PATCH v12 0/6] x86/ioreq server: Introduce HVMMEM_ioreq_server mem type Yu Zhang
  2017-04-06 15:53 ` [PATCH v12 1/6] x86/ioreq server: Release the p2m lock after mmio is handled Yu Zhang
@ 2017-04-06 15:53 ` Yu Zhang
  2017-04-07  7:33   ` Tian, Kevin
  2017-04-06 15:53 ` [PATCH v12 3/6] x86/ioreq server: Add device model wrappers for new DMOP Yu Zhang
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 29+ messages in thread
From: Yu Zhang @ 2017-04-06 15:53 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Jun Nakajima, George Dunlap, Andrew Cooper,
	Tim Deegan, Paul Durrant, zhiyuan.lv, Jan Beulich

Previously, p2m_ioreq_server was used to write-protect guest ram
pages, which were tracked with the ioreq server's rangeset. However,
the number of ram pages to be tracked may exceed the upper limit of
the rangeset.

Now a new DMOP, XEN_DMOP_map_mem_type_to_ioreq_server, is added
to let one ioreq server claim/disclaim its responsibility for the
handling of guest pages with the p2m type p2m_ioreq_server. Users of
this DMOP can specify which kind of operation is supposed to be
emulated in a parameter named flags. Currently, this DMOP only
supports the emulation of write operations; it can be further
extended to support the emulation of read operations if an ioreq
server has such a requirement in the future.

For now, only one ioreq server is supported for this p2m type, so
once an ioreq server has claimed ownership, subsequent calls to
XEN_DMOP_map_mem_type_to_ioreq_server will fail. Users can also
disclaim ownership of guest ram pages with p2m_ioreq_server by
issuing this new DMOP with the ioreq server id set to the current
owner's and the flags parameter set to 0.

Note:
a> Both XEN_DMOP_map_mem_type_to_ioreq_server and p2m_ioreq_server
are only supported for HVM guests with HAP enabled.

b> Only after one ioreq server has claimed ownership of
p2m_ioreq_server will p2m type changes to p2m_ioreq_server be allowed.
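
From the hypervisor's point of view, the claim/disclaim semantics added
here boil down to the following (a sketch using the interfaces introduced
by this patch):

    /* Claim: only one ioreq server may own the type at a time. */
    rc = hvm_map_mem_type_to_ioreq_server(d, id, HVMMEM_ioreq_server,
                                          XEN_DMOP_IOREQ_MEM_ACCESS_WRITE);
    /* A second claim, by any server, now fails with -EBUSY
     * (see p2m_set_ioreq_server()). */

    /* Disclaim: same call with flags == 0, issued by the current owner;
     * a mismatched owner gets -EINVAL. */
    rc = hvm_map_mem_type_to_ioreq_server(d, id, HVMMEM_ioreq_server, 0);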

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
Acked-by: Tim Deegan <tim@xen.org>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
---
Note: this patch shall be accepted together with the following ones in
this series.

Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Paul Durrant <paul.durrant@citrix.com>
Cc: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Jun Nakajima <jun.nakajima@intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Tim Deegan <tim@xen.org>

changes in v11:
  - According to comments from Jan: commit message change.
  - Added "Reviewed-by: Jan Beulich <jbeulich@suse.com>"
  - Added "Reviewed-by: George Dunlap <george.dunlap@citrix.com>"

changes in v10:
  - According to comments from Jan: add a new patch for the libdevicemodel and 
    libxc interface.
  - According to comments from Jan: remove p2m_destroy_ioreq_server(), use 
    p2m_set_ioreq_server(d, 0, s) instead.
  - According to comments from Jan & Kevin: comments changes in hvmemul_do_io().
  - According to comments from Jan & Kevin: commit message changes.

changes in v8:
  - According to comments from Jan & Paul: comments changes in hvmemul_do_io().
  - According to comments from Jan: remove the redundant code which would only
    be useful for read emulations.
  - According to comments from Jan: change interface which maps mem type to
    ioreq server, removed uint16_t pad and added an uint64_t opaque.
  - Address other comments from Jan, i.e. correct return values; remove stray
    cast.

changes in v7:
  - Use new ioreq server interface - XEN_DMOP_map_mem_type_to_ioreq_server.
  - According to comments from George: removed domain_pause/unpause() in
    hvm_map_mem_type_to_ioreq_server(), because it's too expensive,
    and we can avoid the:
    a> deadlock issue existed in v6 patch, between p2m lock and ioreq server
       lock by using these locks in the same order - solved in patch 4;
    b> for race condition between vm exit and ioreq server unbinding, we can
       just retry this instruction.
  - According to comments from Jan and George: continue to clarify logic in
    hvmemul_do_io().
  - According to comments from Jan: clarify comment in p2m_set_ioreq_server().

changes in v6:
  - Clarify logic in hvmemul_do_io().
  - Use recursive lock for ioreq server lock.
  - Remove debug print when mapping ioreq server.
  - Clarify code in ept_p2m_type_to_flags() for consistency.
  - Remove definition of P2M_IOREQ_HANDLE_WRITE_ACCESS.
  - Add comments for HVMMEM_ioreq_server to note only changes
    to/from HVMMEM_ram_rw are permitted.
  - Add domain_pause/unpause() in hvm_map_mem_type_to_ioreq_server()
    to avoid the race condition when a vm exit happens on a write-
    protected page, just to find the ioreq server has been unmapped
    already.
  - Introduce a separate patch to delay the release of the p2m
    lock to avoid the race condition.
  - Introduce a separate patch to handle the read-modify-write
    operations on a write-protected page.

changes in v5:
  - Simplify logic in hvmemul_do_io().
  - Use natural-width types instead of fixed-width types when possible.
  - Do not grant executable permission for p2m_ioreq_server entries.
  - Clarify comments and commit message.
  - Introduce a separate patch to recalculate the p2m types after
    the ioreq server unmaps the p2m_ioreq_server.

changes in v4:
  - According to Paul's advice, add comments around the definition
    of HVMMEM_ioreq_server in hvm_op.h.
  - According to Wei Liu's comments, change the format of the commit
    message.

changes in v3:
  - Only support write emulation in this patch;
  - Remove the code to handle race condition in hvmemul_do_io(),
  - No need to reset the p2m type after an ioreq server has disclaimed
    its ownership of p2m_ioreq_server;
  - Only allow p2m type change to p2m_ioreq_server after an ioreq
    server has claimed its ownership of p2m_ioreq_server;
  - Only allow p2m type change to p2m_ioreq_server from pages with type
    p2m_ram_rw, and vice versa;
  - HVMOP_map_mem_type_to_ioreq_server interface change - use uint16,
    instead of enum to specify the memory type;
  - Function prototype change to p2m_get_ioreq_server();
  - Coding style changes;
  - Commit message changes;
  - Add Tim's Acked-by.

changes in v2:
  - Only support HAP enabled HVMs;
  - Replace p2m_mem_type_changed() with p2m_change_entry_type_global()
    to reset the p2m type, when an ioreq server tries to claim/disclaim
    its ownership of p2m_ioreq_server;
  - Comments changes.
---
 xen/arch/x86/hvm/dm.c           | 35 +++++++++++++++++++++--
 xen/arch/x86/hvm/emulate.c      | 61 ++++++++++++++++++++++++++++++++++++++---
 xen/arch/x86/hvm/ioreq.c        | 44 +++++++++++++++++++++++++++++
 xen/arch/x86/mm/p2m-ept.c       |  8 +++++-
 xen/arch/x86/mm/p2m-pt.c        | 19 +++++++++----
 xen/arch/x86/mm/p2m.c           | 58 +++++++++++++++++++++++++++++++++++++++
 xen/arch/x86/mm/shadow/multi.c  |  3 +-
 xen/include/asm-x86/hvm/ioreq.h |  2 ++
 xen/include/asm-x86/p2m.h       | 25 +++++++++++++++--
 xen/include/public/hvm/dm_op.h  | 28 +++++++++++++++++++
 xen/include/public/hvm/hvm_op.h |  8 +++++-
 11 files changed, 272 insertions(+), 19 deletions(-)

diff --git a/xen/arch/x86/hvm/dm.c b/xen/arch/x86/hvm/dm.c
index 333c884..7e0da81 100644
--- a/xen/arch/x86/hvm/dm.c
+++ b/xen/arch/x86/hvm/dm.c
@@ -173,9 +173,14 @@ static int modified_memory(struct domain *d,
 
 static bool allow_p2m_type_change(p2m_type_t old, p2m_type_t new)
 {
+    if ( new == p2m_ioreq_server )
+        return old == p2m_ram_rw;
+
+    if ( old == p2m_ioreq_server )
+        return new == p2m_ram_rw;
+
     return p2m_is_ram(old) ||
-           (p2m_is_hole(old) && new == p2m_mmio_dm) ||
-           (old == p2m_ioreq_server && new == p2m_ram_rw);
+           (p2m_is_hole(old) && new == p2m_mmio_dm);
 }
 
 static int set_mem_type(struct domain *d,
@@ -202,6 +207,18 @@ static int set_mem_type(struct domain *d,
          unlikely(data->mem_type == HVMMEM_unused) )
         return -EINVAL;
 
+    if ( data->mem_type  == HVMMEM_ioreq_server )
+    {
+        unsigned int flags;
+
+        if ( !hap_enabled(d) )
+            return -EOPNOTSUPP;
+
+        /* Do not change to HVMMEM_ioreq_server if no ioreq server mapped. */
+        if ( !p2m_get_ioreq_server(d, &flags) )
+            return -EINVAL;
+    }
+
     while ( iter < data->nr )
     {
         unsigned long pfn = data->first_pfn + iter;
@@ -365,6 +382,20 @@ static int dm_op(domid_t domid,
         break;
     }
 
+    case XEN_DMOP_map_mem_type_to_ioreq_server:
+    {
+        const struct xen_dm_op_map_mem_type_to_ioreq_server *data =
+            &op.u.map_mem_type_to_ioreq_server;
+
+        rc = -EOPNOTSUPP;
+        if ( !hap_enabled(d) )
+            break;
+
+        rc = hvm_map_mem_type_to_ioreq_server(d, data->id,
+                                              data->type, data->flags);
+        break;
+    }
+
     case XEN_DMOP_set_ioreq_server_state:
     {
         const struct xen_dm_op_set_ioreq_server_state *data =
diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 2d92957..dc6f1f2 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -100,6 +100,7 @@ static int hvmemul_do_io(
     uint8_t dir, bool_t df, bool_t data_is_addr, uintptr_t data)
 {
     struct vcpu *curr = current;
+    struct domain *currd = curr->domain;
     struct hvm_vcpu_io *vio = &curr->arch.hvm_vcpu.hvm_io;
     ioreq_t p = {
         .type = is_mmio ? IOREQ_TYPE_COPY : IOREQ_TYPE_PIO,
@@ -141,7 +142,7 @@ static int hvmemul_do_io(
              (p.dir != dir) ||
              (p.df != df) ||
              (p.data_is_ptr != data_is_addr) )
-            domain_crash(curr->domain);
+            domain_crash(currd);
 
         if ( data_is_addr )
             return X86EMUL_UNHANDLEABLE;
@@ -178,8 +179,60 @@ static int hvmemul_do_io(
         break;
     case X86EMUL_UNHANDLEABLE:
     {
-        struct hvm_ioreq_server *s =
-            hvm_select_ioreq_server(curr->domain, &p);
+        /*
+         * Xen isn't emulating the instruction internally, so see if there's
+         * an ioreq server that can handle it.
+         *
+         * Rules:
+         * A> PIO or MMIO accesses run through hvm_select_ioreq_server() to
+         * choose the ioreq server by range. If no server is found, the access
+         * is ignored.
+         *
+         * B> p2m_ioreq_server accesses are handled by the designated
+         * ioreq server for the domain, but there are some corner cases:
+         *
+         *   - If the domain ioreq server is NULL, it's likely we suffer from
+         *   a race with an unmap operation on the ioreq server, so re-try the
+         *   instruction.
+         *
+         * Note: Even when an ioreq server is found, its value could become
+         * stale later, because it is possible that
+         *
+         *   - the PIO or MMIO address is removed from the rangeset of the
+         *   ioreq server, before the event is delivered to the device model.
+         *
+         *   - the p2m_ioreq_server type is unmapped from the ioreq server,
+         *   before the event is delivered to the device model.
+         *
+         * However, there's no cheap approach to avoid above situations in xen,
+         * so the device model side needs to check the incoming ioreq event.
+         */
+        struct hvm_ioreq_server *s = NULL;
+        p2m_type_t p2mt = p2m_invalid;
+
+        if ( is_mmio )
+        {
+            unsigned long gmfn = paddr_to_pfn(addr);
+
+            get_gfn_query_unlocked(currd, gmfn, &p2mt);
+
+            if ( p2mt == p2m_ioreq_server )
+            {
+                unsigned int flags;
+
+                s = p2m_get_ioreq_server(currd, &flags);
+
+                if ( s == NULL )
+                {
+                    rc = X86EMUL_RETRY;
+                    vio->io_req.state = STATE_IOREQ_NONE;
+                    break;
+                }
+            }
+        }
+
+        if ( !s )
+            s = hvm_select_ioreq_server(currd, &p);
 
         /* If there is no suitable backing DM, just ignore accesses */
         if ( !s )
@@ -190,7 +243,7 @@ static int hvmemul_do_io(
         else
         {
             rc = hvm_send_ioreq(s, &p, 0);
-            if ( rc != X86EMUL_RETRY || curr->domain->is_shutting_down )
+            if ( rc != X86EMUL_RETRY || currd->is_shutting_down )
                 vio->io_req.state = STATE_IOREQ_NONE;
             else if ( data_is_addr )
                 rc = X86EMUL_OKAY;
diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c
index ad2edad..5bf3b6d 100644
--- a/xen/arch/x86/hvm/ioreq.c
+++ b/xen/arch/x86/hvm/ioreq.c
@@ -753,6 +753,8 @@ int hvm_destroy_ioreq_server(struct domain *d, ioservid_t id)
 
         domain_pause(d);
 
+        p2m_set_ioreq_server(d, 0, s);
+
         hvm_ioreq_server_disable(s, 0);
 
         list_del(&s->list_entry);
@@ -914,6 +916,48 @@ int hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id,
     return rc;
 }
 
+/*
+ * Map or unmap an ioreq server to specific memory type. For now, only
+ * HVMMEM_ioreq_server is supported, and in the future new types can be
+ * introduced, e.g. HVMMEM_ioreq_serverX mapped to ioreq server X. And
+ * currently, only write operations are to be forwarded to an ioreq server.
+ * Support for the emulation of read operations can be added when an ioreq
+ * server has such requirement in the future.
+ */
+int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id,
+                                     uint32_t type, uint32_t flags)
+{
+    struct hvm_ioreq_server *s;
+    int rc;
+
+    if ( type != HVMMEM_ioreq_server )
+        return -EINVAL;
+
+    if ( flags & ~XEN_DMOP_IOREQ_MEM_ACCESS_WRITE )
+        return -EINVAL;
+
+    spin_lock_recursive(&d->arch.hvm_domain.ioreq_server.lock);
+
+    rc = -ENOENT;
+    list_for_each_entry ( s,
+                          &d->arch.hvm_domain.ioreq_server.list,
+                          list_entry )
+    {
+        if ( s == d->arch.hvm_domain.default_ioreq_server )
+            continue;
+
+        if ( s->id == id )
+        {
+            rc = p2m_set_ioreq_server(d, flags, s);
+            break;
+        }
+    }
+
+    spin_unlock_recursive(&d->arch.hvm_domain.ioreq_server.lock);
+
+    return rc;
+}
+
 int hvm_set_ioreq_server_state(struct domain *d, ioservid_t id,
                                bool_t enabled)
 {
diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
index 568944f..cc1eb21 100644
--- a/xen/arch/x86/mm/p2m-ept.c
+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -131,6 +131,13 @@ static void ept_p2m_type_to_flags(struct p2m_domain *p2m, ept_entry_t *entry,
             entry->r = entry->w = entry->x = 1;
             entry->a = entry->d = !!cpu_has_vmx_ept_ad;
             break;
+        case p2m_ioreq_server:
+            entry->r = 1;
+            entry->w = !(p2m->ioreq.flags & XEN_DMOP_IOREQ_MEM_ACCESS_WRITE);
+            entry->x = 0;
+            entry->a = !!cpu_has_vmx_ept_ad;
+            entry->d = entry->w && entry->a;
+            break;
         case p2m_mmio_direct:
             entry->r = entry->x = 1;
             entry->w = !rangeset_contains_singleton(mmio_ro_ranges,
@@ -170,7 +177,6 @@ static void ept_p2m_type_to_flags(struct p2m_domain *p2m, ept_entry_t *entry,
             entry->a = entry->d = !!cpu_has_vmx_ept_ad;
             break;
         case p2m_grant_map_ro:
-        case p2m_ioreq_server:
             entry->r = 1;
             entry->w = entry->x = 0;
             entry->a = !!cpu_has_vmx_ept_ad;
diff --git a/xen/arch/x86/mm/p2m-pt.c b/xen/arch/x86/mm/p2m-pt.c
index 268b232..c0055f3 100644
--- a/xen/arch/x86/mm/p2m-pt.c
+++ b/xen/arch/x86/mm/p2m-pt.c
@@ -70,7 +70,9 @@ static const unsigned long pgt[] = {
     PGT_l3_page_table
 };
 
-static unsigned long p2m_type_to_flags(p2m_type_t t, mfn_t mfn,
+static unsigned long p2m_type_to_flags(const struct p2m_domain *p2m,
+                                       p2m_type_t t,
+                                       mfn_t mfn,
                                        unsigned int level)
 {
     unsigned long flags;
@@ -92,8 +94,12 @@ static unsigned long p2m_type_to_flags(p2m_type_t t, mfn_t mfn,
     default:
         return flags | _PAGE_NX_BIT;
     case p2m_grant_map_ro:
-    case p2m_ioreq_server:
         return flags | P2M_BASE_FLAGS | _PAGE_NX_BIT;
+    case p2m_ioreq_server:
+        flags |= P2M_BASE_FLAGS | _PAGE_RW | _PAGE_NX_BIT;
+        if ( p2m->ioreq.flags & XEN_DMOP_IOREQ_MEM_ACCESS_WRITE )
+            return flags & ~_PAGE_RW;
+        return flags;
     case p2m_ram_ro:
     case p2m_ram_logdirty:
     case p2m_ram_shared:
@@ -440,7 +446,8 @@ static int do_recalc(struct p2m_domain *p2m, unsigned long gfn)
             p2m_type_t p2mt = p2m_is_logdirty_range(p2m, gfn & mask, gfn | ~mask)
                               ? p2m_ram_logdirty : p2m_ram_rw;
             unsigned long mfn = l1e_get_pfn(e);
-            unsigned long flags = p2m_type_to_flags(p2mt, _mfn(mfn), level);
+            unsigned long flags = p2m_type_to_flags(p2m, p2mt,
+                                                    _mfn(mfn), level);
 
             if ( level )
             {
@@ -578,7 +585,7 @@ p2m_pt_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
         ASSERT(!mfn_valid(mfn) || p2mt != p2m_mmio_direct);
         l3e_content = mfn_valid(mfn) || p2m_allows_invalid_mfn(p2mt)
             ? l3e_from_pfn(mfn_x(mfn),
-                           p2m_type_to_flags(p2mt, mfn, 2) | _PAGE_PSE)
+                           p2m_type_to_flags(p2m, p2mt, mfn, 2) | _PAGE_PSE)
             : l3e_empty();
         entry_content.l1 = l3e_content.l3;
 
@@ -615,7 +622,7 @@ p2m_pt_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
 
         if ( mfn_valid(mfn) || p2m_allows_invalid_mfn(p2mt) )
             entry_content = p2m_l1e_from_pfn(mfn_x(mfn),
-                                             p2m_type_to_flags(p2mt, mfn, 0));
+                                         p2m_type_to_flags(p2m, p2mt, mfn, 0));
         else
             entry_content = l1e_empty();
 
@@ -652,7 +659,7 @@ p2m_pt_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
         ASSERT(!mfn_valid(mfn) || p2mt != p2m_mmio_direct);
         if ( mfn_valid(mfn) || p2m_allows_invalid_mfn(p2mt) )
             l2e_content = l2e_from_pfn(mfn_x(mfn),
-                                       p2m_type_to_flags(p2mt, mfn, 1) |
+                                       p2m_type_to_flags(p2m, p2mt, mfn, 1) |
                                        _PAGE_PSE);
         else
             l2e_content = l2e_empty();
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index d38004c..b84add0 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -82,6 +82,8 @@ static int p2m_initialise(struct domain *d, struct p2m_domain *p2m)
     else
         p2m_pt_init(p2m);
 
+    spin_lock_init(&p2m->ioreq.lock);
+
     return ret;
 }
 
@@ -286,6 +288,62 @@ void p2m_memory_type_changed(struct domain *d)
     }
 }
 
+int p2m_set_ioreq_server(struct domain *d,
+                         unsigned int flags,
+                         struct hvm_ioreq_server *s)
+{
+    struct p2m_domain *p2m = p2m_get_hostp2m(d);
+    int rc;
+
+    /*
+     * Use lock to prevent concurrent setting attempts
+     * from multiple ioreq servers.
+     */
+    spin_lock(&p2m->ioreq.lock);
+
+    /* Unmap ioreq server from p2m type by passing flags with 0. */
+    if ( flags == 0 )
+    {
+        rc = -EINVAL;
+        if ( p2m->ioreq.server != s )
+            goto out;
+
+        p2m->ioreq.server = NULL;
+        p2m->ioreq.flags = 0;
+    }
+    else
+    {
+        rc = -EBUSY;
+        if ( p2m->ioreq.server != NULL )
+            goto out;
+
+        p2m->ioreq.server = s;
+        p2m->ioreq.flags = flags;
+    }
+
+    rc = 0;
+
+ out:
+    spin_unlock(&p2m->ioreq.lock);
+
+    return rc;
+}
+
+struct hvm_ioreq_server *p2m_get_ioreq_server(struct domain *d,
+                                              unsigned int *flags)
+{
+    struct p2m_domain *p2m = p2m_get_hostp2m(d);
+    struct hvm_ioreq_server *s;
+
+    spin_lock(&p2m->ioreq.lock);
+
+    s = p2m->ioreq.server;
+    *flags = p2m->ioreq.flags;
+
+    spin_unlock(&p2m->ioreq.lock);
+    return s;
+}
+
 void p2m_enable_hardware_log_dirty(struct domain *d)
 {
     struct p2m_domain *p2m = p2m_get_hostp2m(d);
diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c
index 4798c93..5195d61 100644
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -3306,8 +3306,7 @@ static int sh_page_fault(struct vcpu *v,
     }
 
     /* Need to hand off device-model MMIO to the device model */
-    if ( p2mt == p2m_mmio_dm
-         || (p2mt == p2m_ioreq_server && ft == ft_demand_write) )
+    if ( p2mt == p2m_mmio_dm )
     {
         gpa = guest_walk_to_gpa(&gw);
         goto mmio;
diff --git a/xen/include/asm-x86/hvm/ioreq.h b/xen/include/asm-x86/hvm/ioreq.h
index fbf2c74..b43667a 100644
--- a/xen/include/asm-x86/hvm/ioreq.h
+++ b/xen/include/asm-x86/hvm/ioreq.h
@@ -37,6 +37,8 @@ int hvm_map_io_range_to_ioreq_server(struct domain *d, ioservid_t id,
 int hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id,
                                          uint32_t type, uint64_t start,
                                          uint64_t end);
+int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id,
+                                     uint32_t type, uint32_t flags);
 int hvm_set_ioreq_server_state(struct domain *d, ioservid_t id,
                                bool_t enabled);
 
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index bc189d1..4521620 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -89,7 +89,8 @@ typedef unsigned int p2m_query_t;
                        | p2m_to_mask(p2m_ram_paging_out)      \
                        | p2m_to_mask(p2m_ram_paged)           \
                        | p2m_to_mask(p2m_ram_paging_in)       \
-                       | p2m_to_mask(p2m_ram_shared))
+                       | p2m_to_mask(p2m_ram_shared)          \
+                       | p2m_to_mask(p2m_ioreq_server))
 
 /* Types that represent a physmap hole that is ok to replace with a shared
  * entry */
@@ -111,8 +112,7 @@ typedef unsigned int p2m_query_t;
 #define P2M_RO_TYPES (p2m_to_mask(p2m_ram_logdirty)     \
                       | p2m_to_mask(p2m_ram_ro)         \
                       | p2m_to_mask(p2m_grant_map_ro)   \
-                      | p2m_to_mask(p2m_ram_shared)     \
-                      | p2m_to_mask(p2m_ioreq_server))
+                      | p2m_to_mask(p2m_ram_shared))
 
 /* Write-discard types, which should discard the write operations */
 #define P2M_DISCARD_WRITE_TYPES (p2m_to_mask(p2m_ram_ro)     \
@@ -336,6 +336,20 @@ struct p2m_domain {
         struct ept_data ept;
         /* NPT-equivalent structure could be added here. */
     };
+
+     struct {
+         spinlock_t lock;
+         /*
+          * ioreq server who's responsible for the emulation of
+          * gfns with specific p2m type(for now, p2m_ioreq_server).
+          */
+         struct hvm_ioreq_server *server;
+         /*
+          * flags specifies whether read, write or both operations
+          * are to be emulated by an ioreq server.
+          */
+         unsigned int flags;
+     } ioreq;
 };
 
 /* get host p2m table */
@@ -827,6 +841,11 @@ static inline unsigned int p2m_get_iommu_flags(p2m_type_t p2mt, mfn_t mfn)
     return flags;
 }
 
+int p2m_set_ioreq_server(struct domain *d, unsigned int flags,
+                         struct hvm_ioreq_server *s);
+struct hvm_ioreq_server *p2m_get_ioreq_server(struct domain *d,
+                                              unsigned int *flags);
+
 #endif /* _XEN_ASM_X86_P2M_H */
 
 /*
diff --git a/xen/include/public/hvm/dm_op.h b/xen/include/public/hvm/dm_op.h
index f54cece..5ea79ef 100644
--- a/xen/include/public/hvm/dm_op.h
+++ b/xen/include/public/hvm/dm_op.h
@@ -318,6 +318,32 @@ struct xen_dm_op_inject_msi {
     uint64_aligned_t addr;
 };
 
+/*
+ * XEN_DMOP_map_mem_type_to_ioreq_server : map or unmap the IOREQ Server <id>
+ *                                      to specific memory type <type>
+ *                                      for specific accesses <flags>
+ *
+ * For now, flags only accept the value of XEN_DMOP_IOREQ_MEM_ACCESS_WRITE,
+ * which means only write operations are to be forwarded to an ioreq server.
+ * Support for the emulation of read operations can be added when an ioreq
+ * server has such requirement in future.
+ */
+#define XEN_DMOP_map_mem_type_to_ioreq_server 15
+
+struct xen_dm_op_map_mem_type_to_ioreq_server {
+    ioservid_t id;      /* IN - ioreq server id */
+    uint16_t type;      /* IN - memory type */
+    uint32_t flags;     /* IN - types of accesses to be forwarded to the
+                           ioreq server. flags with 0 means to unmap the
+                           ioreq server */
+
+#define XEN_DMOP_IOREQ_MEM_ACCESS_READ (1u << 0)
+#define XEN_DMOP_IOREQ_MEM_ACCESS_WRITE (1u << 1)
+
+    uint64_t opaque;    /* IN/OUT - only used for hypercall continuation,
+                           has to be set to zero by the caller */
+};
+
 struct xen_dm_op {
     uint32_t op;
     uint32_t pad;
@@ -336,6 +362,8 @@ struct xen_dm_op {
         struct xen_dm_op_set_mem_type set_mem_type;
         struct xen_dm_op_inject_event inject_event;
         struct xen_dm_op_inject_msi inject_msi;
+        struct xen_dm_op_map_mem_type_to_ioreq_server
+                map_mem_type_to_ioreq_server;
     } u;
 };
 
diff --git a/xen/include/public/hvm/hvm_op.h b/xen/include/public/hvm/hvm_op.h
index bc00ef0..0bdafdf 100644
--- a/xen/include/public/hvm/hvm_op.h
+++ b/xen/include/public/hvm/hvm_op.h
@@ -93,7 +93,13 @@ typedef enum {
     HVMMEM_unused,             /* Placeholder; setting memory to this type
                                   will fail for code after 4.7.0 */
 #endif
-    HVMMEM_ioreq_server
+    HVMMEM_ioreq_server        /* Memory type claimed by an ioreq server; type
+                                  changes to this value are only allowed after
+                                  an ioreq server has claimed its ownership.
+                                  Only pages with HVMMEM_ram_rw are allowed to
+                                  change to this type; conversely, pages with
+                                  this type are only allowed to be changed back
+                                  to HVMMEM_ram_rw. */
 } hvmmem_type_t;
 
 /* Hint from PV drivers for pagetable destruction. */
-- 
1.9.1



* [PATCH v12 3/6] x86/ioreq server: Add device model wrappers for new DMOP
  2017-04-06 15:53 [PATCH v12 0/6] x86/ioreq server: Introduce HVMMEM_ioreq_server mem type Yu Zhang
  2017-04-06 15:53 ` [PATCH v12 1/6] x86/ioreq server: Release the p2m lock after mmio is handled Yu Zhang
  2017-04-06 15:53 ` [PATCH v12 2/6] x86/ioreq server: Add DMOP to map guest ram with p2m_ioreq_server to an ioreq server Yu Zhang
@ 2017-04-06 15:53 ` Yu Zhang
  2017-04-06 15:53 ` [PATCH v12 4/6] x86/ioreq server: Handle read-modify-write cases for p2m_ioreq_server pages Yu Zhang
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 29+ messages in thread
From: Yu Zhang @ 2017-04-06 15:53 UTC (permalink / raw)
  To: xen-devel; +Cc: Wei Liu, Paul Durrant, Ian Jackson, zhiyuan.lv

A new device model wrapper is added for the newly introduced
DMOP - XEN_DMOP_map_mem_type_to_ioreq_server.

Since currently this DMOP only supports the emulation of write
operations, attempts to trigger the DMOP with values other than
XEN_DMOP_IOREQ_MEM_ACCESS_WRITE or 0 (to unmap the ioreq server)
shall fail. The wrapper shall be updated once read operations
are also to be emulated in the future.

Also note that currently this DMOP only supports one memory type,
and can be extended in the future to map multiple memory types
to multiple ioreq servers, e.g. mapping HVMMEM_ioreq_serverX to
ioreq server X. This wrapper shall be updated when such a change
is made.
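
A minimal usage sketch of the new wrapper (error handling trimmed; dmod,
domid and id are assumed to come from the usual xendevicemodel_open() and
xendevicemodel_create_ioreq_server() calls):

    /* Claim write emulation for the HVMMEM_ioreq_server type. */
    if ( xendevicemodel_map_mem_type_to_ioreq_server(
             dmod, domid, id, HVMMEM_ioreq_server,
             XEN_DMOP_IOREQ_MEM_ACCESS_WRITE) < 0 )
        perror("map_mem_type_to_ioreq_server");

    /* Any flag other than XEN_DMOP_IOREQ_MEM_ACCESS_WRITE (or 0) is
     * rejected with EINVAL in the wrapper, before the DMOP is issued. */

    /* flags == 0 unmaps the ioreq server from the memory type again. */
    if ( xendevicemodel_map_mem_type_to_ioreq_server(
             dmod, domid, id, HVMMEM_ioreq_server, 0) < 0 )
        perror("unmap_mem_type_from_ioreq_server");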

Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
---
Cc: Paul Durrant <paul.durrant@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>

changes in v3: 
  - Added "Acked-by: Wei Liu <wei.liu2@citrix.com>".

changes in v2: 
  - According to Paul and Wei's comments: drop the compat wrapper changes.
  - Added "Reviewed-by: Paul Durrant <paul.durrant@citrix.com>".
---
 tools/libs/devicemodel/core.c                   | 25 +++++++++++++++++++++++++
 tools/libs/devicemodel/include/xendevicemodel.h | 18 ++++++++++++++++++
 tools/libs/devicemodel/libxendevicemodel.map    |  1 +
 3 files changed, 44 insertions(+)

diff --git a/tools/libs/devicemodel/core.c b/tools/libs/devicemodel/core.c
index a85cb49..ff09819 100644
--- a/tools/libs/devicemodel/core.c
+++ b/tools/libs/devicemodel/core.c
@@ -244,6 +244,31 @@ int xendevicemodel_unmap_io_range_from_ioreq_server(
     return xendevicemodel_op(dmod, domid, 1, &op, sizeof(op));
 }
 
+int xendevicemodel_map_mem_type_to_ioreq_server(
+    xendevicemodel_handle *dmod, domid_t domid, ioservid_t id, uint16_t type,
+    uint32_t flags)
+{
+    struct xen_dm_op op;
+    struct xen_dm_op_map_mem_type_to_ioreq_server *data;
+
+    if (type != HVMMEM_ioreq_server ||
+        flags & ~XEN_DMOP_IOREQ_MEM_ACCESS_WRITE) {
+        errno = EINVAL;
+        return -1;
+    }
+
+    memset(&op, 0, sizeof(op));
+
+    op.op = XEN_DMOP_map_mem_type_to_ioreq_server;
+    data = &op.u.map_mem_type_to_ioreq_server;
+
+    data->id = id;
+    data->type = type;
+    data->flags = flags;
+
+    return xendevicemodel_op(dmod, domid, 1, &op, sizeof(op));
+}
+
 int xendevicemodel_map_pcidev_to_ioreq_server(
     xendevicemodel_handle *dmod, domid_t domid, ioservid_t id,
     uint16_t segment, uint8_t bus, uint8_t device, uint8_t function)
diff --git a/tools/libs/devicemodel/include/xendevicemodel.h b/tools/libs/devicemodel/include/xendevicemodel.h
index b3f600e..1da216f 100644
--- a/tools/libs/devicemodel/include/xendevicemodel.h
+++ b/tools/libs/devicemodel/include/xendevicemodel.h
@@ -104,6 +104,24 @@ int xendevicemodel_unmap_io_range_from_ioreq_server(
     uint64_t start, uint64_t end);
 
 /**
+ * This function registers/deregisters a memory type for emulation.
+ *
+ * @parm dmod a handle to an open devicemodel interface.
+ * @parm domid the domain id to be serviced.
+ * @parm id the IOREQ Server id.
+ * @parm type the memory type to be emulated. For now, only HVMMEM_ioreq_server
+ *            is supported, and in the future new types can be introduced, e.g.
+ *            HVMMEM_ioreq_serverX mapped to ioreq server X.
+ * @parm flags operations to be emulated; 0 for unmap. For now, only write
+ *             operations will be emulated and can be extended to emulate
+ *             read ones in the future.
+ * @return 0 on success, -1 on failure.
+ */
+int xendevicemodel_map_mem_type_to_ioreq_server(
+    xendevicemodel_handle *dmod, domid_t domid, ioservid_t id, uint16_t type,
+    uint32_t flags);
+
+/**
  * This function registers a PCI device for config space emulation.
  *
  * @parm dmod a handle to an open devicemodel interface.
diff --git a/tools/libs/devicemodel/libxendevicemodel.map b/tools/libs/devicemodel/libxendevicemodel.map
index 45c773e..130222c 100644
--- a/tools/libs/devicemodel/libxendevicemodel.map
+++ b/tools/libs/devicemodel/libxendevicemodel.map
@@ -5,6 +5,7 @@ VERS_1.0 {
 		xendevicemodel_get_ioreq_server_info;
 		xendevicemodel_map_io_range_to_ioreq_server;
 		xendevicemodel_unmap_io_range_from_ioreq_server;
+		xendevicemodel_map_mem_type_to_ioreq_server;
 		xendevicemodel_map_pcidev_to_ioreq_server;
 		xendevicemodel_unmap_pcidev_from_ioreq_server;
 		xendevicemodel_destroy_ioreq_server;
-- 
1.9.1



* [PATCH v12 4/6] x86/ioreq server: Handle read-modify-write cases for p2m_ioreq_server pages.
  2017-04-06 15:53 [PATCH v12 0/6] x86/ioreq server: Introduce HVMMEM_ioreq_server mem type Yu Zhang
                   ` (2 preceding siblings ...)
  2017-04-06 15:53 ` [PATCH v12 3/6] x86/ioreq server: Add device model wrappers for new DMOP Yu Zhang
@ 2017-04-06 15:53 ` Yu Zhang
  2017-04-06 15:53 ` [PATCH v12 5/6] x86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries Yu Zhang
  2017-04-06 15:53 ` [PATCH v12 6/6] x86/ioreq server: Synchronously reset outstanding p2m_ioreq_server entries when an ioreq server unmaps Yu Zhang
  5 siblings, 0 replies; 29+ messages in thread
From: Yu Zhang @ 2017-04-06 15:53 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Paul Durrant, zhiyuan.lv, Jan Beulich

In ept_handle_violation(), write violations are also treated as
read violations. So when a VM accesses a write-protected address
with a read-modify-write instruction, the read emulation process
is triggered first.

For p2m_ioreq_server pages, the ioreq server currently only forwards
write operations to the device model. Therefore, when such a page
is accessed by a read-modify-write instruction, the read operation
should be emulated here in the hypervisor. This patch provides
such a handler to copy the data to the buffer.

Note: MMIOs with the p2m_mmio_dm type do not need such special treatment
because both reads and writes go to the device model.
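
For illustration only (hypothetical guest-side code, not part of this
patch), this is the kind of access that needs the read handler: a locked
read-modify-write on a write-protected PPGTT entry, which is decoded as a
read followed by a write:

    /* The read half is now satisfied in the hypervisor via
     * ioreq_server_read()/hvm_copy_from_guest_phys(); the write half is
     * still forwarded to the ioreq server. */
    volatile uint64_t *pte = ppgtt_entry;            /* hypothetical mapping */
    __atomic_fetch_or(pte, 1ULL, __ATOMIC_SEQ_CST);  /* e.g. set a valid bit */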

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
Cc: Paul Durrant <paul.durrant@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>

changes in v4: 
  - Added "Reviewed-by: Jan Beulich <jbeulich@suse.com>" with one comment
    change in hvmemul_do_io().

changes in v3: 
  - According to comments from Jan: clarify comments in hvmemul_do_io().

changes in v2: 
  - According to comments from Jan: rename mem_ops to ioreq_server_ops.
  - According to comments from Jan: use hvm_copy_from_guest_phys() in
    ioreq_server_read(), instead of do it by myself.
---
 xen/arch/x86/hvm/emulate.c | 34 ++++++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index dc6f1f2..4de3936 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -95,6 +95,26 @@ static const struct hvm_io_handler null_handler = {
     .ops = &null_ops
 };
 
+static int ioreq_server_read(const struct hvm_io_handler *io_handler,
+                    uint64_t addr,
+                    uint32_t size,
+                    uint64_t *data)
+{
+    if ( hvm_copy_from_guest_phys(data, addr, size) != HVMCOPY_okay )
+        return X86EMUL_UNHANDLEABLE;
+
+    return X86EMUL_OKAY;
+}
+
+static const struct hvm_io_ops ioreq_server_ops = {
+    .read = ioreq_server_read,
+    .write = null_write
+};
+
+static const struct hvm_io_handler ioreq_server_handler = {
+    .ops = &ioreq_server_ops
+};
+
 static int hvmemul_do_io(
     bool_t is_mmio, paddr_t addr, unsigned long *reps, unsigned int size,
     uint8_t dir, bool_t df, bool_t data_is_addr, uintptr_t data)
@@ -195,6 +215,9 @@ static int hvmemul_do_io(
          *   a race with an unmap operation on the ioreq server, so re-try the
          *   instruction.
          *
+         *   - If the access is a read, this could be part of a
+         *   read-modify-write instruction, emulate the read first.
+         *
          * Note: Even when an ioreq server is found, its value could become
          * stale later, because it is possible that
          *
@@ -228,6 +251,17 @@ static int hvmemul_do_io(
                     vio->io_req.state = STATE_IOREQ_NONE;
                     break;
                 }
+
+                /*
+                 * This is part of a read-modify-write instruction.
+                 * Emulate the read part so we have the value available.
+                 */
+                if ( dir == IOREQ_READ )
+                {
+                    rc = hvm_process_io_intercept(&ioreq_server_handler, &p);
+                    vio->io_req.state = STATE_IOREQ_NONE;
+                    break;
+                }
             }
         }
 
-- 
1.9.1



* [PATCH v12 5/6] x86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries.
  2017-04-06 15:53 [PATCH v12 0/6] x86/ioreq server: Introduce HVMMEM_ioreq_server mem type Yu Zhang
                   ` (3 preceding siblings ...)
  2017-04-06 15:53 ` [PATCH v12 4/6] x86/ioreq server: Handle read-modify-write cases for p2m_ioreq_server pages Yu Zhang
@ 2017-04-06 15:53 ` Yu Zhang
  2017-04-06 16:45   ` George Dunlap
                     ` (2 more replies)
  2017-04-06 15:53 ` [PATCH v12 6/6] x86/ioreq server: Synchronously reset outstanding p2m_ioreq_server entries when an ioreq server unmaps Yu Zhang
  5 siblings, 3 replies; 29+ messages in thread
From: Yu Zhang @ 2017-04-06 15:53 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Jun Nakajima, George Dunlap, Andrew Cooper,
	Paul Durrant, zhiyuan.lv, Jan Beulich

After an ioreq server has unmapped, the remaining p2m_ioreq_server
entries need to be reset back to p2m_ram_rw. This patch does this
asynchronously with the current p2m_change_entry_type_global()
interface.

A new field, entry_count, is introduced in struct p2m_domain to record
the number of p2m_ioreq_server p2m page table entries. One property of
these entries is that they only point to 4K-sized page frames, because
all p2m_ioreq_server entries originate from p2m_ram_rw ones in
p2m_change_type_one(); we therefore do not need to worry about counting
2M/1G-sized pages.

This patch disallows mapping of an ioreq server while p2m_ioreq_server
entries are still left in the p2m table, in case another mapping occurs
right after the current one has been unmapped and has released its lock,
with the p2m table not yet synced.

This patch also disallows live migration when p2m_ioreq_server entries
remain in the p2m table. The core reason is that our current
implementation of p2m_change_entry_type_global() lacks the information
to resync p2m_ioreq_server entries correctly if global_logdirty is
on.
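
The accounting rule applied in {ept,p2m_pt}_set_entry() and the recalc
paths can be summarised as follows (a sketch of the hunks below):

    /* Only 4K (level-0) leaves can carry p2m_ioreq_server, hence the
     * ASSERT()s on the level/index in the real code. */
    if ( new_type == p2m_ioreq_server )
        p2m->ioreq.entry_count++;
    if ( old_type == p2m_ioreq_server )
    {
        ASSERT(p2m->ioreq.entry_count > 0);
        p2m->ioreq.entry_count--;
    }

    /* A non-zero entry_count then makes p2m_set_ioreq_server() refuse a
     * new mapping and hap_enable_log_dirty() refuse global log-dirty. */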

Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
---
Cc: Paul Durrant <paul.durrant@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Jun Nakajima <jun.nakajima@intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>

changes in v7:
  - According to comments from George: add code to increase the
    entry_count.
  - Comment changes in {ept,p2m_pt}_set_entry.

changes in v6:
  - According to comments from Jan & George: move the count from
    p2m_change_type_one() to {ept,p2m_pt}_set_entry.
  - According to comments from George: comments change.

changes in v5:
  - According to comments from Jan: use unsigned long for entry_count;
  - According to comments from Jan: refuse mapping requirement when
    there's p2m_ioreq_server remained in p2m table.
  - Added "Reviewed-by: Paul Durrant <paul.durrant@citrix.com>"

changes in v4:
  - According to comments from Jan: use ASSERT() instead of 'if'
    condition in p2m_change_type_one().
  - According to comments from Jan: commit message changes to mention
    the p2m_ioreq_server are all based on 4K sized pages.

changes in v3:
  - Move the synchronously resetting logic into patch 5.
  - According to comments from Jan: introduce p2m_check_changeable()
    to clarify the p2m type change code.
  - According to comments from George: use locks in the same order
    to avoid deadlock, call p2m_change_entry_type_global() after unmap
    of the ioreq server is finished.

changes in v2:
  - Move the calculation of the ioreq server page entry_count into
    p2m_change_type_one() so that we do not need a separate lock.
    Note: entry_count is also calculated in resolve_misconfig()/
    do_recalc(); fortunately, callers of both routines already hold
    the p2m lock.
  - Simplify logic in hvmop_set_mem_type().
  - Introduce routine p2m_finish_type_change() to walk the p2m
    table and do the p2m reset.
---
 xen/arch/x86/hvm/ioreq.c  |  8 ++++++++
 xen/arch/x86/mm/hap/hap.c |  9 +++++++++
 xen/arch/x86/mm/p2m-ept.c | 24 +++++++++++++++++++++++-
 xen/arch/x86/mm/p2m-pt.c  | 30 ++++++++++++++++++++++++++++--
 xen/arch/x86/mm/p2m.c     |  9 +++++++++
 xen/include/asm-x86/p2m.h |  9 ++++++++-
 6 files changed, 85 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c
index 5bf3b6d..07a6c26 100644
--- a/xen/arch/x86/hvm/ioreq.c
+++ b/xen/arch/x86/hvm/ioreq.c
@@ -955,6 +955,14 @@ int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id,
 
     spin_unlock_recursive(&d->arch.hvm_domain.ioreq_server.lock);
 
+    if ( rc == 0 && flags == 0 )
+    {
+        struct p2m_domain *p2m = p2m_get_hostp2m(d);
+
+        if ( read_atomic(&p2m->ioreq.entry_count) )
+            p2m_change_entry_type_global(d, p2m_ioreq_server, p2m_ram_rw);
+    }
+
     return rc;
 }
 
diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
index a57b385..4b591fe 100644
--- a/xen/arch/x86/mm/hap/hap.c
+++ b/xen/arch/x86/mm/hap/hap.c
@@ -187,6 +187,15 @@ out:
  */
 static int hap_enable_log_dirty(struct domain *d, bool_t log_global)
 {
+    struct p2m_domain *p2m = p2m_get_hostp2m(d);
+
+    /*
+     * Refuse to turn on global log-dirty mode if
+     * there are outstanding p2m_ioreq_server pages.
+     */
+    if ( log_global && read_atomic(&p2m->ioreq.entry_count) )
+        return -EBUSY;
+
     /* turn on PG_log_dirty bit in paging mode */
     paging_lock(d);
     d->arch.paging.mode |= PG_log_dirty;
diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
index cc1eb21..ecd5ceb 100644
--- a/xen/arch/x86/mm/p2m-ept.c
+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -544,6 +544,12 @@ static int resolve_misconfig(struct p2m_domain *p2m, unsigned long gfn)
                     e.ipat = ipat;
                     if ( e.recalc && p2m_is_changeable(e.sa_p2mt) )
                     {
+                         if ( e.sa_p2mt == p2m_ioreq_server )
+                         {
+                             ASSERT(p2m->ioreq.entry_count > 0);
+                             p2m->ioreq.entry_count--;
+                         }
+
                          e.sa_p2mt = p2m_is_logdirty_range(p2m, gfn + i, gfn + i)
                                      ? p2m_ram_logdirty : p2m_ram_rw;
                          ept_p2m_type_to_flags(p2m, &e, e.sa_p2mt, e.access);
@@ -816,6 +822,22 @@ ept_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
         new_entry.suppress_ve = is_epte_valid(&old_entry) ?
                                     old_entry.suppress_ve : 1;
 
+    /*
+     * p2m_ioreq_server is only used for 4K pages, so the
+     * count shall only happen on ept page table entries.
+     */
+    if ( p2mt == p2m_ioreq_server )
+    {
+        ASSERT(i == 0);
+        p2m->ioreq.entry_count++;
+    }
+
+    if ( ept_entry->sa_p2mt == p2m_ioreq_server )
+    {
+        ASSERT(p2m->ioreq.entry_count > 0 && i == 0);
+        p2m->ioreq.entry_count--;
+    }
+
     rc = atomic_write_ept_entry(ept_entry, new_entry, target);
     if ( unlikely(rc) )
         old_entry.epte = 0;
@@ -965,7 +987,7 @@ static mfn_t ept_get_entry(struct p2m_domain *p2m,
     if ( is_epte_valid(ept_entry) )
     {
         if ( (recalc || ept_entry->recalc) &&
-             p2m_is_changeable(ept_entry->sa_p2mt) )
+             p2m_check_changeable(ept_entry->sa_p2mt) )
             *t = p2m_is_logdirty_range(p2m, gfn, gfn) ? p2m_ram_logdirty
                                                       : p2m_ram_rw;
         else
diff --git a/xen/arch/x86/mm/p2m-pt.c b/xen/arch/x86/mm/p2m-pt.c
index c0055f3..4b2ff9e 100644
--- a/xen/arch/x86/mm/p2m-pt.c
+++ b/xen/arch/x86/mm/p2m-pt.c
@@ -436,11 +436,13 @@ static int do_recalc(struct p2m_domain *p2m, unsigned long gfn)
          needs_recalc(l1, *pent) )
     {
         l1_pgentry_t e = *pent;
+        p2m_type_t p2mt_old;
 
         if ( !valid_recalc(l1, e) )
             P2M_DEBUG("bogus recalc leaf at d%d:%lx:%u\n",
                       p2m->domain->domain_id, gfn, level);
-        if ( p2m_is_changeable(p2m_flags_to_type(l1e_get_flags(e))) )
+        p2mt_old = p2m_flags_to_type(l1e_get_flags(e));
+        if ( p2m_is_changeable(p2mt_old) )
         {
             unsigned long mask = ~0UL << (level * PAGETABLE_ORDER);
             p2m_type_t p2mt = p2m_is_logdirty_range(p2m, gfn & mask, gfn | ~mask)
@@ -460,6 +462,13 @@ static int do_recalc(struct p2m_domain *p2m, unsigned long gfn)
                      mfn &= ~((unsigned long)_PAGE_PSE_PAT >> PAGE_SHIFT);
                 flags |= _PAGE_PSE;
             }
+
+            if ( p2mt_old == p2m_ioreq_server )
+            {
+                ASSERT(p2m->ioreq.entry_count > 0);
+                p2m->ioreq.entry_count--;
+            }
+
             e = l1e_from_pfn(mfn, flags);
             p2m_add_iommu_flags(&e, level,
                                 (p2mt == p2m_ram_rw)
@@ -606,6 +615,8 @@ p2m_pt_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
 
     if ( page_order == PAGE_ORDER_4K )
     {
+        p2m_type_t p2mt_old;
+
         rc = p2m_next_level(p2m, &table, &gfn_remainder, gfn,
                             L2_PAGETABLE_SHIFT - PAGE_SHIFT,
                             L2_PAGETABLE_ENTRIES, PGT_l1_page_table, 1);
@@ -629,6 +640,21 @@ p2m_pt_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
         if ( entry_content.l1 != 0 )
             p2m_add_iommu_flags(&entry_content, 0, iommu_pte_flags);
 
+        p2mt_old = p2m_flags_to_type(l1e_get_flags(*p2m_entry));
+
+        /*
+         * p2m_ioreq_server is only used for 4K pages, so
+         * the count shall only be performed for level 1 entries.
+         */
+        if ( p2mt == p2m_ioreq_server )
+            p2m->ioreq.entry_count++;
+
+        if ( p2mt_old == p2m_ioreq_server )
+        {
+            ASSERT(p2m->ioreq.entry_count > 0);
+            p2m->ioreq.entry_count--;
+        }
+
         /* level 1 entry */
         p2m->write_p2m_entry(p2m, gfn, p2m_entry, entry_content, 1);
         /* NB: paging_write_p2m_entry() handles tlb flushes properly */
@@ -729,7 +755,7 @@ p2m_pt_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
 static inline p2m_type_t recalc_type(bool_t recalc, p2m_type_t t,
                                      struct p2m_domain *p2m, unsigned long gfn)
 {
-    if ( !recalc || !p2m_is_changeable(t) )
+    if ( !recalc || !p2m_check_changeable(t) )
         return t;
     return p2m_is_logdirty_range(p2m, gfn, gfn) ? p2m_ram_logdirty
                                                 : p2m_ram_rw;
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index b84add0..4169d18 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -317,6 +317,15 @@ int p2m_set_ioreq_server(struct domain *d,
         if ( p2m->ioreq.server != NULL )
             goto out;
 
+        /*
+         * It is possible that an ioreq server has just been unmapped,
+         * released the spin lock, with some p2m_ioreq_server entries
+         * in p2m table remained. We shall refuse another ioreq server
+         * mapping request in such case.
+         */
+        if ( read_atomic(&p2m->ioreq.entry_count) )
+            goto out;
+
         p2m->ioreq.server = s;
         p2m->ioreq.flags = flags;
     }
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index 4521620..e7e390d 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -120,7 +120,10 @@ typedef unsigned int p2m_query_t;
 
 /* Types that can be subject to bulk transitions. */
 #define P2M_CHANGEABLE_TYPES (p2m_to_mask(p2m_ram_rw) \
-                              | p2m_to_mask(p2m_ram_logdirty) )
+                              | p2m_to_mask(p2m_ram_logdirty) \
+                              | p2m_to_mask(p2m_ioreq_server) )
+
+#define P2M_IOREQ_TYPES (p2m_to_mask(p2m_ioreq_server))
 
 #define P2M_POD_TYPES (p2m_to_mask(p2m_populate_on_demand))
 
@@ -157,6 +160,7 @@ typedef unsigned int p2m_query_t;
 #define p2m_is_readonly(_t) (p2m_to_mask(_t) & P2M_RO_TYPES)
 #define p2m_is_discard_write(_t) (p2m_to_mask(_t) & P2M_DISCARD_WRITE_TYPES)
 #define p2m_is_changeable(_t) (p2m_to_mask(_t) & P2M_CHANGEABLE_TYPES)
+#define p2m_is_ioreq(_t) (p2m_to_mask(_t) & P2M_IOREQ_TYPES)
 #define p2m_is_pod(_t) (p2m_to_mask(_t) & P2M_POD_TYPES)
 #define p2m_is_grant(_t) (p2m_to_mask(_t) & P2M_GRANT_TYPES)
 /* Grant types are *not* considered valid, because they can be
@@ -178,6 +182,8 @@ typedef unsigned int p2m_query_t;
 
 #define p2m_allows_invalid_mfn(t) (p2m_to_mask(t) & P2M_INVALID_MFN_TYPES)
 
+#define p2m_check_changeable(t) (p2m_is_changeable(t) && !p2m_is_ioreq(t))
+
 typedef enum {
     p2m_host,
     p2m_nested,
@@ -349,6 +355,7 @@ struct p2m_domain {
           * are to be emulated by an ioreq server.
           */
          unsigned int flags;
+         unsigned long entry_count;
      } ioreq;
 };
 
-- 
1.9.1



* [PATCH v12 6/6] x86/ioreq server: Synchronously reset outstanding p2m_ioreq_server entries when an ioreq server unmaps.
  2017-04-06 15:53 [PATCH v12 0/6] x86/ioreq server: Introduce HVMMEM_ioreq_server mem type Yu Zhang
                   ` (4 preceding siblings ...)
  2017-04-06 15:53 ` [PATCH v12 5/6] x86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries Yu Zhang
@ 2017-04-06 15:53 ` Yu Zhang
  5 siblings, 0 replies; 29+ messages in thread
From: Yu Zhang @ 2017-04-06 15:53 UTC (permalink / raw)
  To: xen-devel
  Cc: George Dunlap, Andrew Cooper, Paul Durrant, zhiyuan.lv, Jan Beulich

After an ioreq server has unmapped, the remaining p2m_ioreq_server
entries need to be reset back to p2m_ram_rw. This patch does this
synchronously by iterating the p2m table.

The synchronous resetting is necessary because we need to guarantee
the p2m table is clean before another ioreq server is mapped. And
since sweeping the p2m table could be time-consuming, it is done
with hypercall continuation.

Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
---
Cc: Paul Durrant <paul.durrant@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <george.dunlap@eu.citrix.com>

changes in v4: 
  - Added "Reviewed-by: Paul Durrant <paul.durrant@citrix.com>"
  - Added "Reviewed-by: Jan Beulich <jbeulich@suse.com>"
  - Added "Reviewed-by: George Dunlap <george.dunlap@citrix.com>"

changes in v3: 
  - According to comments from Paul: use max_nr, instead of
    last_gfn, for p2m_finish_type_change().
  - According to comments from Jan: use gfn_t as type of
    first_gfn in p2m_finish_type_change().
  - According to comments from Jan: simplify the if condition
    before using p2m_finish_type_change().

changes in v2: 
  - According to comments from Jan and Andrew: do not use the 
    HVMOP type hypercall continuation method. Instead, adding
    an opaque in xen_dm_op_map_mem_type_to_ioreq_server to
    store the gfn.
  - According to comments from Jan: change routine's comments
    and name of parameters of p2m_finish_type_change().

changes in v1:
  - This patch is split from patch 4 of the last version.
  - According to comments from Jan: update the gfn_start
    when using hypercall continuation to reset the p2m type.
  - According to comments from Jan: use min() to compare gfn_end
    and max mapped pfn in p2m_finish_type_change()
---
 xen/arch/x86/hvm/dm.c     | 41 ++++++++++++++++++++++++++++++++++++++---
 xen/arch/x86/mm/p2m.c     | 29 +++++++++++++++++++++++++++++
 xen/include/asm-x86/p2m.h |  6 ++++++
 3 files changed, 73 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/hvm/dm.c b/xen/arch/x86/hvm/dm.c
index 7e0da81..d72b7bd 100644
--- a/xen/arch/x86/hvm/dm.c
+++ b/xen/arch/x86/hvm/dm.c
@@ -384,15 +384,50 @@ static int dm_op(domid_t domid,
 
     case XEN_DMOP_map_mem_type_to_ioreq_server:
     {
-        const struct xen_dm_op_map_mem_type_to_ioreq_server *data =
+        struct xen_dm_op_map_mem_type_to_ioreq_server *data =
             &op.u.map_mem_type_to_ioreq_server;
+        unsigned long first_gfn = data->opaque;
+
+        const_op = false;
 
         rc = -EOPNOTSUPP;
         if ( !hap_enabled(d) )
             break;
 
-        rc = hvm_map_mem_type_to_ioreq_server(d, data->id,
-                                              data->type, data->flags);
+        if ( first_gfn == 0 )
+            rc = hvm_map_mem_type_to_ioreq_server(d, data->id,
+                                                  data->type, data->flags);
+        else
+            rc = 0;
+
+        /*
+         * Iterate p2m table when an ioreq server unmaps from p2m_ioreq_server,
+         * and reset the remaining p2m_ioreq_server entries back to p2m_ram_rw.
+         */
+        if ( rc == 0 && data->flags == 0 )
+        {
+            struct p2m_domain *p2m = p2m_get_hostp2m(d);
+
+            while ( read_atomic(&p2m->ioreq.entry_count) &&
+                    first_gfn <= p2m->max_mapped_pfn )
+            {
+                /* Iterate p2m table for 256 gfns each time. */
+                p2m_finish_type_change(d, _gfn(first_gfn), 256,
+                                       p2m_ioreq_server, p2m_ram_rw);
+
+                first_gfn += 256;
+
+                /* Check for continuation if it's not the last iteration. */
+                if ( first_gfn <= p2m->max_mapped_pfn &&
+                     hypercall_preempt_check() )
+                {
+                    rc = -ERESTART;
+                    data->opaque = first_gfn;
+                    break;
+                }
+            }
+        }
+
         break;
     }
 
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 4169d18..1d57e5c 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -1011,6 +1011,35 @@ void p2m_change_type_range(struct domain *d,
     p2m_unlock(p2m);
 }
 
+/* Synchronously modify the p2m type for a range of gfns from ot to nt. */
+void p2m_finish_type_change(struct domain *d,
+                            gfn_t first_gfn, unsigned long max_nr,
+                            p2m_type_t ot, p2m_type_t nt)
+{
+    struct p2m_domain *p2m = p2m_get_hostp2m(d);
+    p2m_type_t t;
+    unsigned long gfn = gfn_x(first_gfn);
+    unsigned long last_gfn = gfn + max_nr - 1;
+
+    ASSERT(ot != nt);
+    ASSERT(p2m_is_changeable(ot) && p2m_is_changeable(nt));
+
+    p2m_lock(p2m);
+
+    last_gfn = min(last_gfn, p2m->max_mapped_pfn);
+    while ( gfn <= last_gfn )
+    {
+        get_gfn_query_unlocked(d, gfn, &t);
+
+        if ( t == ot )
+            p2m_change_type_one(d, gfn, t, nt);
+
+        gfn++;
+    }
+
+    p2m_unlock(p2m);
+}
+
 /*
  * Returns:
  *    0              for success
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index e7e390d..0e670af 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -611,6 +611,12 @@ void p2m_change_type_range(struct domain *d,
 int p2m_change_type_one(struct domain *d, unsigned long gfn,
                         p2m_type_t ot, p2m_type_t nt);
 
+/* Synchronously change the p2m type for a range of gfns */
+void p2m_finish_type_change(struct domain *d,
+                            gfn_t first_gfn,
+                            unsigned long max_nr,
+                            p2m_type_t ot, p2m_type_t nt);
+
 /* Report a change affecting memory types. */
 void p2m_memory_type_changed(struct domain *d);
 
-- 
1.9.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH v12 5/6] x86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries.
  2017-04-06 15:53 ` [PATCH v12 5/6] x86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries Yu Zhang
@ 2017-04-06 16:45   ` George Dunlap
  2017-04-07  7:34   ` Tian, Kevin
  2017-04-07  9:40   ` Jan Beulich
  2 siblings, 0 replies; 29+ messages in thread
From: George Dunlap @ 2017-04-06 16:45 UTC (permalink / raw)
  To: Yu Zhang, xen-devel
  Cc: Kevin Tian, Jun Nakajima, George Dunlap, Andrew Cooper,
	Paul Durrant, zhiyuan.lv, Jan Beulich

On 06/04/17 16:53, Yu Zhang wrote:
> After an ioreq server has unmapped, the remaining p2m_ioreq_server
> entries need to be reset back to p2m_ram_rw. This patch does this
> asynchronously with the current p2m_change_entry_type_global()
> interface.
> 
> New field entry_count is introduced in struct p2m_domain, to record
> the number of p2m_ioreq_server p2m page table entries. One nature of
> these entries is that they only point to 4K sized page frames, because
> all p2m_ioreq_server entries are originated from p2m_ram_rw ones in
> p2m_change_type_one(). We do not need to worry about the counting for
> 2M/1G sized pages.
> 
> This patch disallows mapping of an ioreq server, when there's still
> p2m_ioreq_server entry left, in case another mapping occurs right after
> the current one being unmapped, releases its lock, with p2m table not
> synced yet.
> 
> This patch also disallows live migration, when there's remaining
> p2m_ioreq_server entry in p2m table. The core reason is our current
> implementation of p2m_change_entry_type_global() lacks information
> to resync p2m_ioreq_server entries correctly if global_logdirty is
> on.
> 
> Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
> Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
> ---
> Cc: Paul Durrant <paul.durrant@citrix.com>
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> Cc: Jun Nakajima <jun.nakajima@intel.com>
> Cc: Kevin Tian <kevin.tian@intel.com>
> 
> changes in v7:
>   - According to comments from George: add code to increase the
>     entry_count.
>   - Comment changes in {ept,p2m_pt}_set_entry.
> 
> changes in v6:
>   - According to comments from Jan & George: move the count from
>     p2m_change_type_one() to {ept,p2m_pt}_set_entry.
>   - According to comments from George: comments change.
> 
> changes in v5:
>   - According to comments from Jan: use unsigned long for entry_count;
>   - According to comments from Jan: refuse mapping requirement when
>     there's p2m_ioreq_server remained in p2m table.
>   - Added "Reviewed-by: Paul Durrant <paul.durrant@citrix.com>"
> 
> changes in v4:
>   - According to comments from Jan: use ASSERT() instead of 'if'
>     condition in p2m_change_type_one().
>   - According to comments from Jan: commit message changes to mention
>     the p2m_ioreq_server are all based on 4K sized pages.
> 
> changes in v3:
>   - Move the synchronously resetting logic into patch 5.
>   - According to comments from Jan: introduce p2m_check_changeable()
>     to clarify the p2m type change code.
>   - According to comments from George: use locks in the same order
>     to avoid deadlock, call p2m_change_entry_type_global() after unmap
>     of the ioreq server is finished.
> 
> changes in v2:
>   - Move the calculation of ioreq server page entry_cout into
>     p2m_change_type_one() so that we do not need a seperate lock.
>     Note: entry_count is also calculated in resolve_misconfig()/
>     do_recalc(), fortunately callers of both routines have p2m
>     lock protected already.
>   - Simplify logic in hvmop_set_mem_type().
>   - Introduce routine p2m_finish_type_change() to walk the p2m
>     table and do the p2m reset.
> ---
>  xen/arch/x86/hvm/ioreq.c  |  8 ++++++++
>  xen/arch/x86/mm/hap/hap.c |  9 +++++++++
>  xen/arch/x86/mm/p2m-ept.c | 24 +++++++++++++++++++++++-
>  xen/arch/x86/mm/p2m-pt.c  | 30 ++++++++++++++++++++++++++++--
>  xen/arch/x86/mm/p2m.c     |  9 +++++++++
>  xen/include/asm-x86/p2m.h |  9 ++++++++-
>  6 files changed, 85 insertions(+), 4 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c
> index 5bf3b6d..07a6c26 100644
> --- a/xen/arch/x86/hvm/ioreq.c
> +++ b/xen/arch/x86/hvm/ioreq.c
> @@ -955,6 +955,14 @@ int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id,
>  
>      spin_unlock_recursive(&d->arch.hvm_domain.ioreq_server.lock);
>  
> +    if ( rc == 0 && flags == 0 )
> +    {
> +        struct p2m_domain *p2m = p2m_get_hostp2m(d);
> +
> +        if ( read_atomic(&p2m->ioreq.entry_count) )
> +            p2m_change_entry_type_global(d, p2m_ioreq_server, p2m_ram_rw);
> +    }
> +
>      return rc;
>  }
>  
> diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
> index a57b385..4b591fe 100644
> --- a/xen/arch/x86/mm/hap/hap.c
> +++ b/xen/arch/x86/mm/hap/hap.c
> @@ -187,6 +187,15 @@ out:
>   */
>  static int hap_enable_log_dirty(struct domain *d, bool_t log_global)
>  {
> +    struct p2m_domain *p2m = p2m_get_hostp2m(d);
> +
> +    /*
> +     * Refuse to turn on global log-dirty mode if
> +     * there are outstanding p2m_ioreq_server pages.
> +     */
> +    if ( log_global && read_atomic(&p2m->ioreq.entry_count) )
> +        return -EBUSY;
> +
>      /* turn on PG_log_dirty bit in paging mode */
>      paging_lock(d);
>      d->arch.paging.mode |= PG_log_dirty;
> diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
> index cc1eb21..ecd5ceb 100644
> --- a/xen/arch/x86/mm/p2m-ept.c
> +++ b/xen/arch/x86/mm/p2m-ept.c
> @@ -544,6 +544,12 @@ static int resolve_misconfig(struct p2m_domain *p2m, unsigned long gfn)
>                      e.ipat = ipat;
>                      if ( e.recalc && p2m_is_changeable(e.sa_p2mt) )
>                      {
> +                         if ( e.sa_p2mt == p2m_ioreq_server )
> +                         {
> +                             ASSERT(p2m->ioreq.entry_count > 0);
> +                             p2m->ioreq.entry_count--;
> +                         }
> +
>                           e.sa_p2mt = p2m_is_logdirty_range(p2m, gfn + i, gfn + i)
>                                       ? p2m_ram_logdirty : p2m_ram_rw;
>                           ept_p2m_type_to_flags(p2m, &e, e.sa_p2mt, e.access);
> @@ -816,6 +822,22 @@ ept_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
>          new_entry.suppress_ve = is_epte_valid(&old_entry) ?
>                                      old_entry.suppress_ve : 1;
>  
> +    /*
> +     * p2m_ioreq_server is only used for 4K pages, so the
> +     * count shall only happen on ept page table entries.

Actually, I liked the previous comment ("We only do x...") better. :-)

FYI normally if you're trying to avoid saying "we" you use the passive
voice, like this: "...so the count *is only done* on ept page table
entries".

But I don't think that's a big deal at this point:

Reviewed-by: George Dunlap <george.dunlap@citrix.com>


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v12 2/6] x86/ioreq server: Add DMOP to map guest ram with p2m_ioreq_server to an ioreq server.
  2017-04-06 15:53 ` [PATCH v12 2/6] x86/ioreq server: Add DMOP to map guest ram with p2m_ioreq_server to an ioreq server Yu Zhang
@ 2017-04-07  7:33   ` Tian, Kevin
  0 siblings, 0 replies; 29+ messages in thread
From: Tian, Kevin @ 2017-04-07  7:33 UTC (permalink / raw)
  To: Yu Zhang, xen-devel
  Cc: Nakajima, Jun, George Dunlap, Andrew Cooper, Tim Deegan,
	Paul Durrant, Lv, Zhiyuan, Jan Beulich

> From: Yu Zhang [mailto:yu.c.zhang@linux.intel.com]
> Sent: Thursday, April 6, 2017 11:54 PM
> 
> Previously, p2m_ioreq_server is used to write-protect guest ram
> pages, which are tracked with ioreq server's rangeset. However,
> number of ram pages to be tracked may exceed the upper limit of
> rangeset.
> 
> Now, a new DMOP - XEN_DMOP_map_mem_type_to_ioreq_server, is added
> to let one ioreq server claim/disclaim its responsibility for the
> handling of guest pages with p2m type p2m_ioreq_server. Users of
> this DMOP can specify which kind of operation is supposed to be
> emulated in a parameter named flags. Currently, this DMOP only
> support the emulation of write operations. And it can be further
> extended to support the emulation of read ones if an ioreq server
> has such requirement in the future.
> 
> For now, we only support one ioreq server for this p2m type, so
> once an ioreq server has claimed its ownership, subsequent calls
> of the XEN_DMOP_map_mem_type_to_ioreq_server will fail. Users can
> also disclaim the ownership of guest ram pages with p2m_ioreq_server,
> by triggering this new DMOP, with ioreq server id set to the current
> owner's and flags parameter set to 0.
> 
> Note:
> a> both XEN_DMOP_map_mem_type_to_ioreq_server and
> p2m_ioreq_server
> are only supported for HVMs with HAP enabled.
> 
> b> only after one ioreq server claims its ownership of p2m_ioreq_server,
> will the p2m type change to p2m_ioreq_server be allowed.
> 
> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
> Acked-by: Tim Deegan <tim@xen.org>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> Reviewed-by: George Dunlap <george.dunlap@citrix.com>

Reviewed-by: Kevin Tian <kevin.tian@intel.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v12 5/6] x86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries.
  2017-04-06 15:53 ` [PATCH v12 5/6] x86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries Yu Zhang
  2017-04-06 16:45   ` George Dunlap
@ 2017-04-07  7:34   ` Tian, Kevin
  2017-04-07  9:40   ` Jan Beulich
  2 siblings, 0 replies; 29+ messages in thread
From: Tian, Kevin @ 2017-04-07  7:34 UTC (permalink / raw)
  To: Yu Zhang, xen-devel
  Cc: Nakajima, Jun, George Dunlap, Andrew Cooper, Paul Durrant, Lv,
	Zhiyuan, Jan Beulich

> From: Yu Zhang [mailto:yu.c.zhang@linux.intel.com]
> Sent: Thursday, April 6, 2017 11:54 PM
> 
> After an ioreq server has unmapped, the remaining p2m_ioreq_server
> entries need to be reset back to p2m_ram_rw. This patch does this
> asynchronously with the current p2m_change_entry_type_global()
> interface.
> 
> New field entry_count is introduced in struct p2m_domain, to record
> the number of p2m_ioreq_server p2m page table entries. One nature of
> these entries is that they only point to 4K sized page frames, because
> all p2m_ioreq_server entries are originated from p2m_ram_rw ones in
> p2m_change_type_one(). We do not need to worry about the counting for
> 2M/1G sized pages.
> 
> This patch disallows mapping of an ioreq server, when there's still
> p2m_ioreq_server entry left, in case another mapping occurs right after
> the current one being unmapped, releases its lock, with p2m table not
> synced yet.
> 
> This patch also disallows live migration, when there's remaining
> p2m_ioreq_server entry in p2m table. The core reason is our current
> implementation of p2m_change_entry_type_global() lacks information
> to resync p2m_ioreq_server entries correctly if global_logdirty is
> on.
> 
> Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
> Reviewed-by: Paul Durrant <paul.durrant@citrix.com>

Reviewed-by: Kevin Tian <kevin.tian@intel.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v12 5/6] x86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries.
  2017-04-06 15:53 ` [PATCH v12 5/6] x86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries Yu Zhang
  2017-04-06 16:45   ` George Dunlap
  2017-04-07  7:34   ` Tian, Kevin
@ 2017-04-07  9:40   ` Jan Beulich
  2017-04-07  9:53     ` Yu Zhang
  2017-04-07 10:14     ` Yu Zhang
  2 siblings, 2 replies; 29+ messages in thread
From: Jan Beulich @ 2017-04-07  9:40 UTC (permalink / raw)
  To: Yu Zhang
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, xen-devel,
	Paul Durrant, zhiyuan.lv, Jun Nakajima

>>> On 06.04.17 at 17:53, <yu.c.zhang@linux.intel.com> wrote:
> --- a/xen/arch/x86/mm/p2m-ept.c
> +++ b/xen/arch/x86/mm/p2m-ept.c
> @@ -544,6 +544,12 @@ static int resolve_misconfig(struct p2m_domain *p2m, unsigned long gfn)
>                      e.ipat = ipat;
>                      if ( e.recalc && p2m_is_changeable(e.sa_p2mt) )
>                      {
> +                         if ( e.sa_p2mt == p2m_ioreq_server )
> +                         {
> +                             ASSERT(p2m->ioreq.entry_count > 0);
> +                             p2m->ioreq.entry_count--;
> +                         }
> +
>                           e.sa_p2mt = p2m_is_logdirty_range(p2m, gfn + i, gfn + i)
>                                       ? p2m_ram_logdirty : p2m_ram_rw;

I don't think this can be right: Why would it be valid to change the
type from p2m_ioreq_server to p2m_ram_rw (or p2m_ram_logdirty)
here, without taking into account further information? This code
can run at any time, not just when you want to reset things. So at
the very least there is a check missing whether a suitable ioreq
server still exists (and only if it doesn't you want to do the type
reset).

> @@ -816,6 +822,22 @@ ept_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
>          new_entry.suppress_ve = is_epte_valid(&old_entry) ?
>                                      old_entry.suppress_ve : 1;
>  
> +    /*
> +     * p2m_ioreq_server is only used for 4K pages, so the
> +     * count shall only happen on ept page table entries.
> +     */
> +    if ( p2mt == p2m_ioreq_server )
> +    {
> +        ASSERT(i == 0);
> +        p2m->ioreq.entry_count++;
> +    }
> +
> +    if ( ept_entry->sa_p2mt == p2m_ioreq_server )
> +    {
> +        ASSERT(p2m->ioreq.entry_count > 0 && i == 0);

I think this would better be two ASSERT()s, so if one triggers it's
clear what problem it was right away. The two conditions aren't
really related to one another.
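
For illustration, the split would simply be (both conditions taken
from the quoted hunk, nothing new added):

    ASSERT(p2m->ioreq.entry_count > 0);
    ASSERT(i == 0);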

> @@ -965,7 +987,7 @@ static mfn_t ept_get_entry(struct p2m_domain *p2m,
>      if ( is_epte_valid(ept_entry) )
>      {
>          if ( (recalc || ept_entry->recalc) &&
> -             p2m_is_changeable(ept_entry->sa_p2mt) )
> +             p2m_check_changeable(ept_entry->sa_p2mt) )

I think the distinction between these two is rather arbitrary, and I
also think this is part of the problem above: Distinguishing log-dirty
from ram-rw requires auxiliary data to be consulted. The same
ought to apply to ioreq-server, and then there wouldn't be a need
to have two p2m_*_changeable() flavors.

Of course the subsequent use p2m_is_logdirty_range() may then
need amending.

In the end it looks like you have the inverse problem here compared
to above: You should return ram-rw when the reset was already
initiated. At least that's how I would see the logic to match up with
the log-dirty handling (where the _effective_ rather than the last
stored type is being returned).

> @@ -606,6 +615,8 @@ p2m_pt_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
>  
>      if ( page_order == PAGE_ORDER_4K )
>      {
> +        p2m_type_t p2mt_old;
> +
>          rc = p2m_next_level(p2m, &table, &gfn_remainder, gfn,
>                              L2_PAGETABLE_SHIFT - PAGE_SHIFT,
>                              L2_PAGETABLE_ENTRIES, PGT_l1_page_table, 1);
> @@ -629,6 +640,21 @@ p2m_pt_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
>          if ( entry_content.l1 != 0 )
>              p2m_add_iommu_flags(&entry_content, 0, iommu_pte_flags);
>  
> +        p2mt_old = p2m_flags_to_type(l1e_get_flags(*p2m_entry));
> +
> +        /*
> +         * p2m_ioreq_server is only used for 4K pages, so
> +         * the count shall only be performed for level 1 entries.
> +         */
> +        if ( p2mt == p2m_ioreq_server )
> +            p2m->ioreq.entry_count++;
> +
> +        if ( p2mt_old == p2m_ioreq_server )
> +        {
> +            ASSERT(p2m->ioreq.entry_count > 0);
> +            p2m->ioreq.entry_count--;
> +        }
> +
>          /* level 1 entry */
>          p2m->write_p2m_entry(p2m, gfn, p2m_entry, entry_content, 1);

I think to match up with EPT you also want to add

    ASSERT(p2mt_old != p2m_ioreq_server);

to the 2M and 1G paths.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v12 5/6] x86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries.
  2017-04-07  9:40   ` Jan Beulich
@ 2017-04-07  9:53     ` Yu Zhang
  2017-04-07 10:22       ` George Dunlap
  2017-04-07 10:26       ` Jan Beulich
  2017-04-07 10:14     ` Yu Zhang
  1 sibling, 2 replies; 29+ messages in thread
From: Yu Zhang @ 2017-04-07  9:53 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, xen-devel,
	Paul Durrant, zhiyuan.lv, Jun Nakajima



On 4/7/2017 5:40 PM, Jan Beulich wrote:
>>>> On 06.04.17 at 17:53, <yu.c.zhang@linux.intel.com> wrote:
>> --- a/xen/arch/x86/mm/p2m-ept.c
>> +++ b/xen/arch/x86/mm/p2m-ept.c
>> @@ -544,6 +544,12 @@ static int resolve_misconfig(struct p2m_domain *p2m, unsigned long gfn)
>>                       e.ipat = ipat;
>>                       if ( e.recalc && p2m_is_changeable(e.sa_p2mt) )
>>                       {
>> +                         if ( e.sa_p2mt == p2m_ioreq_server )
>> +                         {
>> +                             ASSERT(p2m->ioreq.entry_count > 0);
>> +                             p2m->ioreq.entry_count--;
>> +                         }
>> +
>>                            e.sa_p2mt = p2m_is_logdirty_range(p2m, gfn + i, gfn + i)
>>                                        ? p2m_ram_logdirty : p2m_ram_rw;
> I don't think this can be right: Why would it be valid to change the
> type from p2m_ioreq_server to p2m_ram_rw (or p2m_ram_logdirty)
> here, without taking into account further information? This code
> can run at any time, not just when you want to reset things. So at
> the very least there is a check missing whether a suitable ioreq
> server still exists (and only if it doesn't you want to do the type
> reset).

Sorry, Jan. I think we have discussed this quite long ago.
Indeed, there's information lacking here, and that's why global_logdirty
is disallowed when there are remaining p2m_ioreq_server entries. :-)

>
>> @@ -816,6 +822,22 @@ ept_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
>>           new_entry.suppress_ve = is_epte_valid(&old_entry) ?
>>                                       old_entry.suppress_ve : 1;
>>   
>> +    /*
>> +     * p2m_ioreq_server is only used for 4K pages, so the
>> +     * count shall only happen on ept page table entries.
>> +     */
>> +    if ( p2mt == p2m_ioreq_server )
>> +    {
>> +        ASSERT(i == 0);
>> +        p2m->ioreq.entry_count++;
>> +    }
>> +
>> +    if ( ept_entry->sa_p2mt == p2m_ioreq_server )
>> +    {
>> +        ASSERT(p2m->ioreq.entry_count > 0 && i == 0);
> I think this would better be two ASSERT()s, so if one triggers it's
> clear what problem it was right away. The two conditions aren't
> really related to one another.
>
>> @@ -965,7 +987,7 @@ static mfn_t ept_get_entry(struct p2m_domain *p2m,
>>       if ( is_epte_valid(ept_entry) )
>>       {
>>           if ( (recalc || ept_entry->recalc) &&
>> -             p2m_is_changeable(ept_entry->sa_p2mt) )
>> +             p2m_check_changeable(ept_entry->sa_p2mt) )
> I think the distinction between these two is rather arbitrary, and I
> also think this is part of the problem above: Distinguishing log-dirty
> from ram-rw requires auxiliary data to be consulted. The same
> ought to apply to ioreq-server, and then there wouldn't be a need
> to have two p2m_*_changeable() flavors.

Well, I think we have also discussed this quite long ago, here is the link.
https://lists.xen.org/archives/html/xen-devel/2016-09/msg01017.html

> Of course the subsequent use p2m_is_logdirty_range() may then
> need amending.
>
> In the end it looks like you have the inverse problem here compared
> to above: You should return ram-rw when the reset was already
> initiated. At least that's how I would see the logic to match up with
> the log-dirty handling (where the _effective_ rather than the last
> stored type is being returned).
>
>> @@ -606,6 +615,8 @@ p2m_pt_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
>>   
>>       if ( page_order == PAGE_ORDER_4K )
>>       {
>> +        p2m_type_t p2mt_old;
>> +
>>           rc = p2m_next_level(p2m, &table, &gfn_remainder, gfn,
>>                               L2_PAGETABLE_SHIFT - PAGE_SHIFT,
>>                               L2_PAGETABLE_ENTRIES, PGT_l1_page_table, 1);
>> @@ -629,6 +640,21 @@ p2m_pt_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
>>           if ( entry_content.l1 != 0 )
>>               p2m_add_iommu_flags(&entry_content, 0, iommu_pte_flags);
>>   
>> +        p2mt_old = p2m_flags_to_type(l1e_get_flags(*p2m_entry));
>> +
>> +        /*
>> +         * p2m_ioreq_server is only used for 4K pages, so
>> +         * the count shall only be performed for level 1 entries.
>> +         */
>> +        if ( p2mt == p2m_ioreq_server )
>> +            p2m->ioreq.entry_count++;
>> +
>> +        if ( p2mt_old == p2m_ioreq_server )
>> +        {
>> +            ASSERT(p2m->ioreq.entry_count > 0);
>> +            p2m->ioreq.entry_count--;
>> +        }
>> +
>>           /* level 1 entry */
>>           p2m->write_p2m_entry(p2m, gfn, p2m_entry, entry_content, 1);
> I think to match up with EPT you also want to add
>
>      ASSERT(p2mt_old != p2m_ioreq_server);
>
> to the 2M and 1G paths.

Is this really necessary? The 2M and 1G paths do not have a p2mt_old;
defining one and peeking at the p2m type just to have an ASSERT does
not seem quite useful - and will hurt performance.

As to EPT, there's already a variable 'i', which may be greater
than 0 - so I added an ASSERT there.

Yu
> Jan
>


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v12 5/6] x86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries.
  2017-04-07  9:40   ` Jan Beulich
  2017-04-07  9:53     ` Yu Zhang
@ 2017-04-07 10:14     ` Yu Zhang
  2017-04-07 10:28       ` Jan Beulich
  2017-04-07 10:28       ` George Dunlap
  1 sibling, 2 replies; 29+ messages in thread
From: Yu Zhang @ 2017-04-07 10:14 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, xen-devel,
	Paul Durrant, zhiyuan.lv, Jun Nakajima



On 4/7/2017 5:40 PM, Jan Beulich wrote:
>>>> On 06.04.17 at 17:53, <yu.c.zhang@linux.intel.com> wrote:
>> --- a/xen/arch/x86/mm/p2m-ept.c
>> +++ b/xen/arch/x86/mm/p2m-ept.c
>> @@ -544,6 +544,12 @@ static int resolve_misconfig(struct p2m_domain *p2m, unsigned long gfn)
>>                       e.ipat = ipat;
>>                       if ( e.recalc && p2m_is_changeable(e.sa_p2mt) )
>>                       {
>> +                         if ( e.sa_p2mt == p2m_ioreq_server )
>> +                         {
>> +                             ASSERT(p2m->ioreq.entry_count > 0);
>> +                             p2m->ioreq.entry_count--;
>> +                         }
>> +
>>                            e.sa_p2mt = p2m_is_logdirty_range(p2m, gfn + i, gfn + i)
>>                                        ? p2m_ram_logdirty : p2m_ram_rw;
> I don't think this can be right: Why would it be valid to change the
> type from p2m_ioreq_server to p2m_ram_rw (or p2m_ram_logdirty)
> here, without taking into account further information? This code
> can run at any time, not just when you want to reset things. So at
> the very least there is a check missing whether a suitable ioreq
> server still exists (and only if it doesn't you want to do the type
> reset).

Also I do not think we need to check if a suitable ioreq server still 
exists. We have guaranteed
in our patch that no new ioreq server will be mapped as long as the p2m 
table is not clean. :)
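
The check being referred to is the one added to p2m_set_ioreq_server()
earlier in this series:

    /* refuse a new mapping while p2m_ioreq_server entries remain */
    if ( read_atomic(&p2m->ioreq.entry_count) )
        goto out;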

Yu

[snip]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v12 5/6] x86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries.
  2017-04-07  9:53     ` Yu Zhang
@ 2017-04-07 10:22       ` George Dunlap
  2017-04-07 10:22         ` Yu Zhang
  2017-04-07 10:26       ` Jan Beulich
  1 sibling, 1 reply; 29+ messages in thread
From: George Dunlap @ 2017-04-07 10:22 UTC (permalink / raw)
  To: Yu Zhang, Jan Beulich
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, xen-devel,
	Paul Durrant, zhiyuan.lv, Jun Nakajima

On 07/04/17 10:53, Yu Zhang wrote:
> 
> 
> On 4/7/2017 5:40 PM, Jan Beulich wrote:
>>>>> On 06.04.17 at 17:53, <yu.c.zhang@linux.intel.com> wrote:
>>> --- a/xen/arch/x86/mm/p2m-ept.c
>>> +++ b/xen/arch/x86/mm/p2m-ept.c
>>> @@ -544,6 +544,12 @@ static int resolve_misconfig(struct p2m_domain
>>> *p2m, unsigned long gfn)
>>>                       e.ipat = ipat;
>>>                       if ( e.recalc && p2m_is_changeable(e.sa_p2mt) )
>>>                       {
>>> +                         if ( e.sa_p2mt == p2m_ioreq_server )
>>> +                         {
>>> +                             ASSERT(p2m->ioreq.entry_count > 0);
>>> +                             p2m->ioreq.entry_count--;
>>> +                         }
>>> +
>>>                            e.sa_p2mt = p2m_is_logdirty_range(p2m, gfn
>>> + i, gfn + i)
>>>                                        ? p2m_ram_logdirty : p2m_ram_rw;
>> I don't think this can be right: Why would it be valid to change the
>> type from p2m_ioreq_server to p2m_ram_rw (or p2m_ram_logdirty)
>> here, without taking into account further information? This code
>> can run at any time, not just when you want to reset things. So at
>> the very least there is a check missing whether a suitable ioreq
>> server still exists (and only if it doesn't you want to do the type
>> reset).
> 
> Sorry, Jan. I think we have discussed this quite long ago.
> Indeed, there's information lacked here, and that's why global_logdirty
> is disallowed
> when there's remaining p2m_ioreq_server entries. :-)
> 
>>
>>> @@ -816,6 +822,22 @@ ept_set_entry(struct p2m_domain *p2m, unsigned
>>> long gfn, mfn_t mfn,
>>>           new_entry.suppress_ve = is_epte_valid(&old_entry) ?
>>>                                       old_entry.suppress_ve : 1;
>>>   +    /*
>>> +     * p2m_ioreq_server is only used for 4K pages, so the
>>> +     * count shall only happen on ept page table entries.
>>> +     */
>>> +    if ( p2mt == p2m_ioreq_server )
>>> +    {
>>> +        ASSERT(i == 0);
>>> +        p2m->ioreq.entry_count++;
>>> +    }
>>> +
>>> +    if ( ept_entry->sa_p2mt == p2m_ioreq_server )
>>> +    {
>>> +        ASSERT(p2m->ioreq.entry_count > 0 && i == 0);
>> I think this would better be two ASSERT()s, so if one triggers it's
>> clear what problem it was right away. The two conditions aren't
>> really related to one another.
>>
>>> @@ -965,7 +987,7 @@ static mfn_t ept_get_entry(struct p2m_domain *p2m,
>>>       if ( is_epte_valid(ept_entry) )
>>>       {
>>>           if ( (recalc || ept_entry->recalc) &&
>>> -             p2m_is_changeable(ept_entry->sa_p2mt) )
>>> +             p2m_check_changeable(ept_entry->sa_p2mt) )
>> I think the distinction between these two is rather arbitrary, and I
>> also think this is part of the problem above: Distinguishing log-dirty
>> from ram-rw requires auxiliary data to be consulted. The same
>> ought to apply to ioreq-server, and then there wouldn't be a need
>> to have two p2m_*_changeable() flavors.
> 
> Well, I think we have also discussed this quite long ago, here is the link.
> https://lists.xen.org/archives/html/xen-devel/2016-09/msg01017.html
> 
>> Of course the subsequent use p2m_is_logdirty_range() may then
>> need amending.
>>
>> In the end it looks like you have the inverse problem here compared
>> to above: You should return ram-rw when the reset was already
>> initiated. At least that's how I would see the logic to match up with
>> the log-dirty handling (where the _effective_ rather than the last
>> stored type is being returned).
>>
>>> @@ -606,6 +615,8 @@ p2m_pt_set_entry(struct p2m_domain *p2m, unsigned
>>> long gfn, mfn_t mfn,
>>>         if ( page_order == PAGE_ORDER_4K )
>>>       {
>>> +        p2m_type_t p2mt_old;
>>> +
>>>           rc = p2m_next_level(p2m, &table, &gfn_remainder, gfn,
>>>                               L2_PAGETABLE_SHIFT - PAGE_SHIFT,
>>>                               L2_PAGETABLE_ENTRIES,
>>> PGT_l1_page_table, 1);
>>> @@ -629,6 +640,21 @@ p2m_pt_set_entry(struct p2m_domain *p2m,
>>> unsigned long gfn, mfn_t mfn,
>>>           if ( entry_content.l1 != 0 )
>>>               p2m_add_iommu_flags(&entry_content, 0, iommu_pte_flags);
>>>   +        p2mt_old = p2m_flags_to_type(l1e_get_flags(*p2m_entry));
>>> +
>>> +        /*
>>> +         * p2m_ioreq_server is only used for 4K pages, so
>>> +         * the count shall only be performed for level 1 entries.
>>> +         */
>>> +        if ( p2mt == p2m_ioreq_server )
>>> +            p2m->ioreq.entry_count++;
>>> +
>>> +        if ( p2mt_old == p2m_ioreq_server )
>>> +        {
>>> +            ASSERT(p2m->ioreq.entry_count > 0);
>>> +            p2m->ioreq.entry_count--;
>>> +        }
>>> +
>>>           /* level 1 entry */
>>>           p2m->write_p2m_entry(p2m, gfn, p2m_entry, entry_content, 1);
>> I think to match up with EPT you also want to add
>>
>>      ASSERT(p2mt_old != p2m_ioreq_server);
>>
>> to the 2M and 1G paths.
> 
> Is this really necessary? 2M and 1G page does not have p2mt_old,
> defining one and peek the p2m type just
> to have an ASSERT does not seem quite useful - and will hurt the
> performance.
> 
> As to ept, since there's already a variable 'i', which may be greater
> than 0 - so I added an ASSERT.

Yes, that's Jan's point -- that for EPT, there is effectively ASSERT()
that 2M and 1G entries are not p2m_ioreq_server; but for SVM, because of
the code duplication, there is not.

ASSERT()s are:
1. There to double-check that the assumptions you're making (i.e., "2M
and 1G entries can never be of type p2m_ioreq_server") are valid
2. Only enabled when debug=y, and so are generally not a performance
consideration.

You're making an assumption, so an ASSERT is useful; and it's only a
one-line check that will be removed for non-debug builds, so the
performance is not a consideration.
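
For reference, Xen's ASSERT() is roughly the following (a sketch from
memory rather than the exact definition), so the condition is never
evaluated at run time in non-debug builds:

    #ifndef NDEBUG
    #define ASSERT(p) \
        do { if ( unlikely(!(p)) ) assert_failed(#p); } while (0)
    #else
    #define ASSERT(p) do { if ( 0 && (p) ) {} } while (0)
    #endif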

 -George


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v12 5/6] x86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries.
  2017-04-07 10:22       ` George Dunlap
@ 2017-04-07 10:22         ` Yu Zhang
  2017-04-07 10:41           ` Jan Beulich
  0 siblings, 1 reply; 29+ messages in thread
From: Yu Zhang @ 2017-04-07 10:22 UTC (permalink / raw)
  To: George Dunlap, Jan Beulich
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, xen-devel,
	Paul Durrant, zhiyuan.lv, Jun Nakajima



On 4/7/2017 6:22 PM, George Dunlap wrote:
> On 07/04/17 10:53, Yu Zhang wrote:
>>
>> On 4/7/2017 5:40 PM, Jan Beulich wrote:
>>>>>> On 06.04.17 at 17:53, <yu.c.zhang@linux.intel.com> wrote:
>>>> --- a/xen/arch/x86/mm/p2m-ept.c
>>>> +++ b/xen/arch/x86/mm/p2m-ept.c
>>>> @@ -544,6 +544,12 @@ static int resolve_misconfig(struct p2m_domain
>>>> *p2m, unsigned long gfn)
>>>>                        e.ipat = ipat;
>>>>                        if ( e.recalc && p2m_is_changeable(e.sa_p2mt) )
>>>>                        {
>>>> +                         if ( e.sa_p2mt == p2m_ioreq_server )
>>>> +                         {
>>>> +                             ASSERT(p2m->ioreq.entry_count > 0);
>>>> +                             p2m->ioreq.entry_count--;
>>>> +                         }
>>>> +
>>>>                             e.sa_p2mt = p2m_is_logdirty_range(p2m, gfn
>>>> + i, gfn + i)
>>>>                                         ? p2m_ram_logdirty : p2m_ram_rw;
>>> I don't think this can be right: Why would it be valid to change the
>>> type from p2m_ioreq_server to p2m_ram_rw (or p2m_ram_logdirty)
>>> here, without taking into account further information? This code
>>> can run at any time, not just when you want to reset things. So at
>>> the very least there is a check missing whether a suitable ioreq
>>> server still exists (and only if it doesn't you want to do the type
>>> reset).
>> Sorry, Jan. I think we have discussed this quite long ago.
>> Indeed, there's information lacked here, and that's why global_logdirty
>> is disallowed
>> when there's remaining p2m_ioreq_server entries. :-)
>>
>>>> @@ -816,6 +822,22 @@ ept_set_entry(struct p2m_domain *p2m, unsigned
>>>> long gfn, mfn_t mfn,
>>>>            new_entry.suppress_ve = is_epte_valid(&old_entry) ?
>>>>                                        old_entry.suppress_ve : 1;
>>>>    +    /*
>>>> +     * p2m_ioreq_server is only used for 4K pages, so the
>>>> +     * count shall only happen on ept page table entries.
>>>> +     */
>>>> +    if ( p2mt == p2m_ioreq_server )
>>>> +    {
>>>> +        ASSERT(i == 0);
>>>> +        p2m->ioreq.entry_count++;
>>>> +    }
>>>> +
>>>> +    if ( ept_entry->sa_p2mt == p2m_ioreq_server )
>>>> +    {
>>>> +        ASSERT(p2m->ioreq.entry_count > 0 && i == 0);
>>> I think this would better be two ASSERT()s, so if one triggers it's
>>> clear what problem it was right away. The two conditions aren't
>>> really related to one another.
>>>
>>>> @@ -965,7 +987,7 @@ static mfn_t ept_get_entry(struct p2m_domain *p2m,
>>>>        if ( is_epte_valid(ept_entry) )
>>>>        {
>>>>            if ( (recalc || ept_entry->recalc) &&
>>>> -             p2m_is_changeable(ept_entry->sa_p2mt) )
>>>> +             p2m_check_changeable(ept_entry->sa_p2mt) )
>>> I think the distinction between these two is rather arbitrary, and I
>>> also think this is part of the problem above: Distinguishing log-dirty
>>> from ram-rw requires auxiliary data to be consulted. The same
>>> ought to apply to ioreq-server, and then there wouldn't be a need
>>> to have two p2m_*_changeable() flavors.
>> Well, I think we have also discussed this quite long ago, here is the link.
>> https://lists.xen.org/archives/html/xen-devel/2016-09/msg01017.html
>>
>>> Of course the subsequent use p2m_is_logdirty_range() may then
>>> need amending.
>>>
>>> In the end it looks like you have the inverse problem here compared
>>> to above: You should return ram-rw when the reset was already
>>> initiated. At least that's how I would see the logic to match up with
>>> the log-dirty handling (where the _effective_ rather than the last
>>> stored type is being returned).
>>>
>>>> @@ -606,6 +615,8 @@ p2m_pt_set_entry(struct p2m_domain *p2m, unsigned
>>>> long gfn, mfn_t mfn,
>>>>          if ( page_order == PAGE_ORDER_4K )
>>>>        {
>>>> +        p2m_type_t p2mt_old;
>>>> +
>>>>            rc = p2m_next_level(p2m, &table, &gfn_remainder, gfn,
>>>>                                L2_PAGETABLE_SHIFT - PAGE_SHIFT,
>>>>                                L2_PAGETABLE_ENTRIES,
>>>> PGT_l1_page_table, 1);
>>>> @@ -629,6 +640,21 @@ p2m_pt_set_entry(struct p2m_domain *p2m,
>>>> unsigned long gfn, mfn_t mfn,
>>>>            if ( entry_content.l1 != 0 )
>>>>                p2m_add_iommu_flags(&entry_content, 0, iommu_pte_flags);
>>>>    +        p2mt_old = p2m_flags_to_type(l1e_get_flags(*p2m_entry));
>>>> +
>>>> +        /*
>>>> +         * p2m_ioreq_server is only used for 4K pages, so
>>>> +         * the count shall only be performed for level 1 entries.
>>>> +         */
>>>> +        if ( p2mt == p2m_ioreq_server )
>>>> +            p2m->ioreq.entry_count++;
>>>> +
>>>> +        if ( p2mt_old == p2m_ioreq_server )
>>>> +        {
>>>> +            ASSERT(p2m->ioreq.entry_count > 0);
>>>> +            p2m->ioreq.entry_count--;
>>>> +        }
>>>> +
>>>>            /* level 1 entry */
>>>>            p2m->write_p2m_entry(p2m, gfn, p2m_entry, entry_content, 1);
>>> I think to match up with EPT you also want to add
>>>
>>>       ASSERT(p2mt_old != p2m_ioreq_server);
>>>
>>> to the 2M and 1G paths.
>> Is this really necessary? 2M and 1G page does not have p2mt_old,
>> defining one and peek the p2m type just
>> to have an ASSERT does not seem quite useful - and will hurt the
>> performance.
>>
>> As to ept, since there's already a variable 'i', which may be greater
>> than 0 - so I added an ASSERT.
> Yes, that's Jan's point -- that for EPT, there is effectively ASSERT()
> that 2M and 1G entries are not p2m_ioreq_server; but for SVM, because of
> the code duplication, there is not.
>
> ASSERT()s are:
> 1. There to double-check that the assumptions you're making (i.e., "2M
> and 1G entries can never be of type p2m_ioreq_server") are valid
> 2. Only enabled when debug=y, and so are generally not a performance
> consideration.
>
> You're making an assumption, so an ASSERT is useful; and it's only a
> one-line check that will be removed for non-debug builds, so the
> performance is not a consideration.

Thanks, George.
I do not worry about the cost of the ASSERT() itself, but the cost of
peeking at the p2m type:
p2m_flags_to_type(l1e_get_flags(*p2m_entry));
And this cannot be removed even in non-debug builds.

Yu

>   -George
>
>


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v12 5/6] x86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries.
  2017-04-07  9:53     ` Yu Zhang
  2017-04-07 10:22       ` George Dunlap
@ 2017-04-07 10:26       ` Jan Beulich
  2017-04-07 10:55         ` Yu Zhang
  1 sibling, 1 reply; 29+ messages in thread
From: Jan Beulich @ 2017-04-07 10:26 UTC (permalink / raw)
  To: Yu Zhang
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, xen-devel,
	Paul Durrant, zhiyuan.lv, Jun Nakajima

>>> On 07.04.17 at 11:53, <yu.c.zhang@linux.intel.com> wrote:
> On 4/7/2017 5:40 PM, Jan Beulich wrote:
>>>>> On 06.04.17 at 17:53, <yu.c.zhang@linux.intel.com> wrote:
>>> --- a/xen/arch/x86/mm/p2m-ept.c
>>> +++ b/xen/arch/x86/mm/p2m-ept.c
>>> @@ -544,6 +544,12 @@ static int resolve_misconfig(struct p2m_domain *p2m, unsigned long gfn)
>>>                       e.ipat = ipat;
>>>                       if ( e.recalc && p2m_is_changeable(e.sa_p2mt) )
>>>                       {
>>> +                         if ( e.sa_p2mt == p2m_ioreq_server )
>>> +                         {
>>> +                             ASSERT(p2m->ioreq.entry_count > 0);
>>> +                             p2m->ioreq.entry_count--;
>>> +                         }
>>> +
>>>                            e.sa_p2mt = p2m_is_logdirty_range(p2m, gfn + i, gfn + i)
>>>                                        ? p2m_ram_logdirty : p2m_ram_rw;
>> I don't think this can be right: Why would it be valid to change the
>> type from p2m_ioreq_server to p2m_ram_rw (or p2m_ram_logdirty)
>> here, without taking into account further information? This code
>> can run at any time, not just when you want to reset things. So at
>> the very least there is a check missing whether a suitable ioreq
>> server still exists (and only if it doesn't you want to do the type
>> reset).
> 
> Sorry, Jan. I think we have discussed this quite long ago.
> Indeed, there's information lacked here, and that's why global_logdirty 
> is disallowed
> when there's remaining p2m_ioreq_server entries. :-)

log-dirty isn't the interesting part (which is why I did put it in
parentheses). You change ioreq-server to ram-rw here
regardless of whether we're actually cleaning up after a
detached server.
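
In other words, something along these lines (a rough, hypothetical
sketch of the missing check, not a concrete proposal; p2m->ioreq.server
is the field the series already uses to track the owning server):

    if ( e.sa_p2mt == p2m_ioreq_server && p2m->ioreq.server == NULL )
    {
        /* No owning server left: account for the entry going away
         * and let it be folded back to ram_rw / ram_logdirty. */
        ASSERT(p2m->ioreq.entry_count > 0);
        p2m->ioreq.entry_count--;
    }
    /* whereas with p2m->ioreq.server still set, the entry would be
     * left as p2m_ioreq_server rather than reset here */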

>>> @@ -965,7 +987,7 @@ static mfn_t ept_get_entry(struct p2m_domain *p2m,
>>>       if ( is_epte_valid(ept_entry) )
>>>       {
>>>           if ( (recalc || ept_entry->recalc) &&
>>> -             p2m_is_changeable(ept_entry->sa_p2mt) )
>>> +             p2m_check_changeable(ept_entry->sa_p2mt) )
>> I think the distinction between these two is rather arbitrary, and I
>> also think this is part of the problem above: Distinguishing log-dirty
>> from ram-rw requires auxiliary data to be consulted. The same
>> ought to apply to ioreq-server, and then there wouldn't be a need
>> to have two p2m_*_changeable() flavors.
> 
> Well, I think we have also discussed this quite long ago, here is the link.
> https://lists.xen.org/archives/html/xen-devel/2016-09/msg01017.html 

That was a different discussion, namely not considering this ...

>> Of course the subsequent use p2m_is_logdirty_range() may then
>> need amending.
>>
>> In the end it looks like you have the inverse problem here compared
>> to above: You should return ram-rw when the reset was already
>> initiated. At least that's how I would see the logic to match up with
>> the log-dirty handling (where the _effective_ rather than the last
>> stored type is being returned).

... at all.

>>> @@ -629,6 +640,21 @@ p2m_pt_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
>>>           if ( entry_content.l1 != 0 )
>>>               p2m_add_iommu_flags(&entry_content, 0, iommu_pte_flags);
>>>   
>>> +        p2mt_old = p2m_flags_to_type(l1e_get_flags(*p2m_entry));
>>> +
>>> +        /*
>>> +         * p2m_ioreq_server is only used for 4K pages, so
>>> +         * the count shall only be performed for level 1 entries.
>>> +         */
>>> +        if ( p2mt == p2m_ioreq_server )
>>> +            p2m->ioreq.entry_count++;
>>> +
>>> +        if ( p2mt_old == p2m_ioreq_server )
>>> +        {
>>> +            ASSERT(p2m->ioreq.entry_count > 0);
>>> +            p2m->ioreq.entry_count--;
>>> +        }
>>> +
>>>           /* level 1 entry */
>>>           p2m->write_p2m_entry(p2m, gfn, p2m_entry, entry_content, 1);
>> I think to match up with EPT you also want to add
>>
>>      ASSERT(p2mt_old != p2m_ioreq_server);
>>
>> to the 2M and 1G paths.
> 
> Is this really necessary? 2M and 1G page does not have p2mt_old, 
> defining one and peek the p2m type just
> to have an ASSERT does not seem quite useful - and will hurt the 
> performance.

I don't follow. Matching up with the 4k case, what you'd add would
be

    ASSERT(p2m_flags_to_type(flags) != p2m_ioreq_server);

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v12 5/6] x86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries.
  2017-04-07 10:14     ` Yu Zhang
@ 2017-04-07 10:28       ` Jan Beulich
  2017-04-07 10:28       ` George Dunlap
  1 sibling, 0 replies; 29+ messages in thread
From: Jan Beulich @ 2017-04-07 10:28 UTC (permalink / raw)
  To: Yu Zhang
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, xen-devel,
	Paul Durrant, zhiyuan.lv, Jun Nakajima

>>> On 07.04.17 at 12:14, <yu.c.zhang@linux.intel.com> wrote:
> On 4/7/2017 5:40 PM, Jan Beulich wrote:
>>>>> On 06.04.17 at 17:53, <yu.c.zhang@linux.intel.com> wrote:
>>> --- a/xen/arch/x86/mm/p2m-ept.c
>>> +++ b/xen/arch/x86/mm/p2m-ept.c
>>> @@ -544,6 +544,12 @@ static int resolve_misconfig(struct p2m_domain *p2m, unsigned long gfn)
>>>                       e.ipat = ipat;
>>>                       if ( e.recalc && p2m_is_changeable(e.sa_p2mt) )
>>>                       {
>>> +                         if ( e.sa_p2mt == p2m_ioreq_server )
>>> +                         {
>>> +                             ASSERT(p2m->ioreq.entry_count > 0);
>>> +                             p2m->ioreq.entry_count--;
>>> +                         }
>>> +
>>>                            e.sa_p2mt = p2m_is_logdirty_range(p2m, gfn + i, gfn + i)
>>>                                        ? p2m_ram_logdirty : p2m_ram_rw;
>> I don't think this can be right: Why would it be valid to change the
>> type from p2m_ioreq_server to p2m_ram_rw (or p2m_ram_logdirty)
>> here, without taking into account further information? This code
>> can run at any time, not just when you want to reset things. So at
>> the very least there is a check missing whether a suitable ioreq
>> server still exists (and only if it doesn't you want to do the type
>> reset).
> 
> Also I do not think we need to check if a suitable ioreq server still 
> exists. We have guaranteed
> in our patch that no new ioreq server will be mapped as long as the p2m 
> table is not clean. :)

You didn't get my point then: The problem is when there still _is_
an ioreq server. You're changing the type even in that case.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v12 5/6] x86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries.
  2017-04-07 10:14     ` Yu Zhang
  2017-04-07 10:28       ` Jan Beulich
@ 2017-04-07 10:28       ` George Dunlap
  2017-04-07 10:50         ` Yu Zhang
  1 sibling, 1 reply; 29+ messages in thread
From: George Dunlap @ 2017-04-07 10:28 UTC (permalink / raw)
  To: Yu Zhang, Jan Beulich
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, xen-devel,
	Paul Durrant, zhiyuan.lv, Jun Nakajima

On 07/04/17 11:14, Yu Zhang wrote:
> 
> 
> On 4/7/2017 5:40 PM, Jan Beulich wrote:
>>>>> On 06.04.17 at 17:53, <yu.c.zhang@linux.intel.com> wrote:
>>> --- a/xen/arch/x86/mm/p2m-ept.c
>>> +++ b/xen/arch/x86/mm/p2m-ept.c
>>> @@ -544,6 +544,12 @@ static int resolve_misconfig(struct p2m_domain
>>> *p2m, unsigned long gfn)
>>>                       e.ipat = ipat;
>>>                       if ( e.recalc && p2m_is_changeable(e.sa_p2mt) )
>>>                       {
>>> +                         if ( e.sa_p2mt == p2m_ioreq_server )
>>> +                         {
>>> +                             ASSERT(p2m->ioreq.entry_count > 0);
>>> +                             p2m->ioreq.entry_count--;
>>> +                         }
>>> +
>>>                            e.sa_p2mt = p2m_is_logdirty_range(p2m, gfn
>>> + i, gfn + i)
>>>                                        ? p2m_ram_logdirty : p2m_ram_rw;
>> I don't think this can be right: Why would it be valid to change the
>> type from p2m_ioreq_server to p2m_ram_rw (or p2m_ram_logdirty)
>> here, without taking into account further information? This code
>> can run at any time, not just when you want to reset things. So at
>> the very least there is a check missing whether a suitable ioreq
>> server still exists (and only if it doesn't you want to do the type
>> reset).
> 
> Also I do not think we need to check if a suitable ioreq server still
> exists. We have guaranteed
> in our patch that no new ioreq server will be mapped as long as the p2m
> table is not clean. :)

Jan is saying that you should only change ioreq_server -> ram if there
is *not* an ioreq server; and that if this is called with an ioreq
server still active, then it must be some other change you're looking at.

The problem, though, is that misconfiguration happens in many
circumstances.  Grep for "memory_type_changed()" -- each of those
results in a recalculation of the entire p2m, which will (in the current
code) wipe out any ioreq_server entries.

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v12 5/6] x86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries.
  2017-04-07 10:22         ` Yu Zhang
@ 2017-04-07 10:41           ` Jan Beulich
  0 siblings, 0 replies; 29+ messages in thread
From: Jan Beulich @ 2017-04-07 10:41 UTC (permalink / raw)
  To: Yu Zhang
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, George Dunlap,
	xen-devel, Paul Durrant, zhiyuan.lv, Jun Nakajima

>>> On 07.04.17 at 12:22, <yu.c.zhang@linux.intel.com> wrote:

> 
> On 4/7/2017 6:22 PM, George Dunlap wrote:
>> On 07/04/17 10:53, Yu Zhang wrote:
>>>
>>> On 4/7/2017 5:40 PM, Jan Beulich wrote:
>>>>>>> On 06.04.17 at 17:53, <yu.c.zhang@linux.intel.com> wrote:
>>>>> --- a/xen/arch/x86/mm/p2m-ept.c
>>>>> +++ b/xen/arch/x86/mm/p2m-ept.c
>>>>> @@ -544,6 +544,12 @@ static int resolve_misconfig(struct p2m_domain
>>>>> *p2m, unsigned long gfn)
>>>>>                        e.ipat = ipat;
>>>>>                        if ( e.recalc && p2m_is_changeable(e.sa_p2mt) )
>>>>>                        {
>>>>> +                         if ( e.sa_p2mt == p2m_ioreq_server )
>>>>> +                         {
>>>>> +                             ASSERT(p2m->ioreq.entry_count > 0);
>>>>> +                             p2m->ioreq.entry_count--;
>>>>> +                         }
>>>>> +
>>>>>                             e.sa_p2mt = p2m_is_logdirty_range(p2m, gfn
>>>>> + i, gfn + i)
>>>>>                                         ? p2m_ram_logdirty : p2m_ram_rw;
>>>> I don't think this can be right: Why would it be valid to change the
>>>> type from p2m_ioreq_server to p2m_ram_rw (or p2m_ram_logdirty)
>>>> here, without taking into account further information? This code
>>>> can run at any time, not just when you want to reset things. So at
>>>> the very least there is a check missing whether a suitable ioreq
>>>> server still exists (and only if it doesn't you want to do the type
>>>> reset).
>>> Sorry, Jan. I think we have discussed this quite long ago.
>>> Indeed, there's information lacked here, and that's why global_logdirty
>>> is disallowed
>>> when there's remaining p2m_ioreq_server entries. :-)
>>>
>>>>> @@ -816,6 +822,22 @@ ept_set_entry(struct p2m_domain *p2m, unsigned
>>>>> long gfn, mfn_t mfn,
>>>>>            new_entry.suppress_ve = is_epte_valid(&old_entry) ?
>>>>>                                        old_entry.suppress_ve : 1;
>>>>>    +    /*
>>>>> +     * p2m_ioreq_server is only used for 4K pages, so the
>>>>> +     * count shall only happen on ept page table entries.
>>>>> +     */
>>>>> +    if ( p2mt == p2m_ioreq_server )
>>>>> +    {
>>>>> +        ASSERT(i == 0);
>>>>> +        p2m->ioreq.entry_count++;
>>>>> +    }
>>>>> +
>>>>> +    if ( ept_entry->sa_p2mt == p2m_ioreq_server )
>>>>> +    {
>>>>> +        ASSERT(p2m->ioreq.entry_count > 0 && i == 0);
>>>> I think this would better be two ASSERT()s, so if one triggers it's
>>>> clear what problem it was right away. The two conditions aren't
>>>> really related to one another.
>>>>
>>>>> @@ -965,7 +987,7 @@ static mfn_t ept_get_entry(struct p2m_domain *p2m,
>>>>>        if ( is_epte_valid(ept_entry) )
>>>>>        {
>>>>>            if ( (recalc || ept_entry->recalc) &&
>>>>> -             p2m_is_changeable(ept_entry->sa_p2mt) )
>>>>> +             p2m_check_changeable(ept_entry->sa_p2mt) )
>>>> I think the distinction between these two is rather arbitrary, and I
>>>> also think this is part of the problem above: Distinguishing log-dirty
>>>> from ram-rw requires auxiliary data to be consulted. The same
>>>> ought to apply to ioreq-server, and then there wouldn't be a need
>>>> to have two p2m_*_changeable() flavors.
>>> Well, I think we have also discussed this quite long ago, here is the link.
>>> https://lists.xen.org/archives/html/xen-devel/2016-09/msg01017.html 
>>>
>>>> Of course the subsequent use p2m_is_logdirty_range() may then
>>>> need amending.
>>>>
>>>> In the end it looks like you have the inverse problem here compared
>>>> to above: You should return ram-rw when the reset was already
>>>> initiated. At least that's how I would see the logic to match up with
>>>> the log-dirty handling (where the _effective_ rather than the last
>>>> stored type is being returned).
>>>>
>>>>> @@ -606,6 +615,8 @@ p2m_pt_set_entry(struct p2m_domain *p2m, unsigned
>>>>> long gfn, mfn_t mfn,
>>>>>          if ( page_order == PAGE_ORDER_4K )
>>>>>        {
>>>>> +        p2m_type_t p2mt_old;
>>>>> +
>>>>>            rc = p2m_next_level(p2m, &table, &gfn_remainder, gfn,
>>>>>                                L2_PAGETABLE_SHIFT - PAGE_SHIFT,
>>>>>                                L2_PAGETABLE_ENTRIES,
>>>>> PGT_l1_page_table, 1);
>>>>> @@ -629,6 +640,21 @@ p2m_pt_set_entry(struct p2m_domain *p2m,
>>>>> unsigned long gfn, mfn_t mfn,
>>>>>            if ( entry_content.l1 != 0 )
>>>>>                p2m_add_iommu_flags(&entry_content, 0, iommu_pte_flags);
>>>>>    +        p2mt_old = p2m_flags_to_type(l1e_get_flags(*p2m_entry));
>>>>> +
>>>>> +        /*
>>>>> +         * p2m_ioreq_server is only used for 4K pages, so
>>>>> +         * the count shall only be performed for level 1 entries.
>>>>> +         */
>>>>> +        if ( p2mt == p2m_ioreq_server )
>>>>> +            p2m->ioreq.entry_count++;
>>>>> +
>>>>> +        if ( p2mt_old == p2m_ioreq_server )
>>>>> +        {
>>>>> +            ASSERT(p2m->ioreq.entry_count > 0);
>>>>> +            p2m->ioreq.entry_count--;
>>>>> +        }
>>>>> +
>>>>>            /* level 1 entry */
>>>>>            p2m->write_p2m_entry(p2m, gfn, p2m_entry, entry_content, 1);
>>>> I think to match up with EPT you also want to add
>>>>
>>>>       ASSERT(p2mt_old != p2m_ioreq_server);
>>>>
>>>> to the 2M and 1G paths.
>>> Is this really necessary? 2M and 1G page does not have p2mt_old,
>>> defining one and peek the p2m type just
>>> to have an ASSERT does not seem quite useful - and will hurt the
>>> performance.
>>>
>>> As to ept, since there's already a variable 'i', which may be greater
>>> than 0 - so I added an ASSERT.
>> Yes, that's Jan's point -- that for EPT, there is effectively ASSERT()
>> that 2M and 1G entries are not p2m_ioreq_server; but for SVM, because of
>> the code duplication, there is not.
>>
>> ASSERT()s are:
>> 1. There to double-check that the assumptions you're making (i.e., "2M
>> and 1G entries can never be of type p2m_ioreq_server") are valid
>> 2. Only enabled when debug=y, and so are generally not a performance
>> consideration.
>>
>> You're making an assumption, so an ASSERT is useful; and it's only a
>> one-line check that will be removed for non-debug builds, so the
>> performance is not a consideration.
> 
> Thanks George.
> I do not worry about the cost of the ASSERT() itself, but the effort of 
> peeking a p2m:
> p2m_flags_to_type(l1e_get_flags(*p2m_entry));
> And this cannot be  removed during runtime.

The l1e_get_flags() is not needed - both paths latch that into
"flags" already. And p2m_flags_to_type() is a simple inline
function, for which the compiler should be able to see that its
result is not used, and hence all code generated from it can be
deleted.
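
I.e. in the 2M path that would simply be (a sketch, re-using the
already latched "flags"):

    /* No extra page table read; with debug=n this compiles to nothing. */
    ASSERT(p2m_flags_to_type(flags) != p2m_ioreq_server);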

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v12 5/6] x86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries.
  2017-04-07 10:28       ` George Dunlap
@ 2017-04-07 10:50         ` Yu Zhang
  2017-04-07 11:28           ` Jan Beulich
  0 siblings, 1 reply; 29+ messages in thread
From: Yu Zhang @ 2017-04-07 10:50 UTC (permalink / raw)
  To: George Dunlap, Jan Beulich
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, xen-devel,
	Paul Durrant, zhiyuan.lv, Jun Nakajima



On 4/7/2017 6:28 PM, George Dunlap wrote:
> On 07/04/17 11:14, Yu Zhang wrote:
>>
>> On 4/7/2017 5:40 PM, Jan Beulich wrote:
>>>>>> On 06.04.17 at 17:53, <yu.c.zhang@linux.intel.com> wrote:
>>>> --- a/xen/arch/x86/mm/p2m-ept.c
>>>> +++ b/xen/arch/x86/mm/p2m-ept.c
>>>> @@ -544,6 +544,12 @@ static int resolve_misconfig(struct p2m_domain
>>>> *p2m, unsigned long gfn)
>>>>                        e.ipat = ipat;
>>>>                        if ( e.recalc && p2m_is_changeable(e.sa_p2mt) )
>>>>                        {
>>>> +                         if ( e.sa_p2mt == p2m_ioreq_server )
>>>> +                         {
>>>> +                             ASSERT(p2m->ioreq.entry_count > 0);
>>>> +                             p2m->ioreq.entry_count--;
>>>> +                         }
>>>> +
>>>>                             e.sa_p2mt = p2m_is_logdirty_range(p2m, gfn
>>>> + i, gfn + i)
>>>>                                         ? p2m_ram_logdirty : p2m_ram_rw;
>>> I don't think this can be right: Why would it be valid to change the
>>> type from p2m_ioreq_server to p2m_ram_rw (or p2m_ram_logdirty)
>>> here, without taking into account further information? This code
>>> can run at any time, not just when you want to reset things. So at
>>> the very least there is a check missing whether a suitable ioreq
>>> server still exists (and only if it doesn't you want to do the type
>>> reset).
>> Also I do not think we need to check if a suitable ioreq server still
>> exists. We have guaranteed
>> in our patch that no new ioreq server will be mapped as long as the p2m
>> table is not clean. :)
> Jan is saying that you should only change ioreq_server -> ram if there
> is *not* an ioreq server; and that if this is called with an ioreq
> server still active, then it must be some other change you're looking at.
>
> The problem, though, is that misconfiguration happens in many
> circumstances.  Grep for "memory_type_changed()" -- each of those
> results in a recalculation of the entire p2m, which will (in the current
> code) wipe out any ioreq_server entries.

Well, I'm not aware of other actions besides log-dirty that would
cause the reset.
But if that can happen, would the change below solve it?

@@ -546,12 +546,16 @@ static int resolve_misconfig(struct p2m_domain *p2m, unsigned long gfn)
                      {
                           if ( e.sa_p2mt == p2m_ioreq_server )
                           {
-                             ASSERT(p2m->ioreq.entry_count > 0);
-                             p2m->ioreq.entry_count--;
+                             if ( p2m->ioreq.server == NULL )
+                             {
+                                 ASSERT(p2m->ioreq.entry_count > 0);
+                                 p2m->ioreq.entry_count--;
+                                 e.sa_p2mt = p2m_ram_rw;
+                             }
                           }
-
-                         e.sa_p2mt = p2m_is_logdirty_range(p2m, gfn + i, gfn + i)
-                                     ? p2m_ram_logdirty : p2m_ram_rw;
+                         else
+                             e.sa_p2mt = p2m_is_logdirty_range(p2m, gfn + i, gfn + i)
+                                         ? p2m_ram_logdirty : p2m_ram_rw;
                          ept_p2m_type_to_flags(p2m, &e, e.sa_p2mt, e.access);
                      }
                      e.recalc = 0;

Yu
>
>   -George
>


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v12 5/6] x86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries.
  2017-04-07 10:26       ` Jan Beulich
@ 2017-04-07 10:55         ` Yu Zhang
  2017-04-07 11:31           ` Jan Beulich
  0 siblings, 1 reply; 29+ messages in thread
From: Yu Zhang @ 2017-04-07 10:55 UTC (permalink / raw)
  To: Jan Beulich, George Dunlap
  Cc: Kevin Tian, Andrew Cooper, xen-devel, Paul Durrant, zhiyuan.lv,
	Jun Nakajima



On 4/7/2017 6:26 PM, Jan Beulich wrote:
>>>> On 07.04.17 at 11:53, <yu.c.zhang@linux.intel.com> wrote:
>> On 4/7/2017 5:40 PM, Jan Beulich wrote:
>>>>>> On 06.04.17 at 17:53, <yu.c.zhang@linux.intel.com> wrote:
>>>> --- a/xen/arch/x86/mm/p2m-ept.c
>>>> +++ b/xen/arch/x86/mm/p2m-ept.c
>>>> @@ -544,6 +544,12 @@ static int resolve_misconfig(struct p2m_domain *p2m, unsigned long gfn)
>>>>                        e.ipat = ipat;
>>>>                        if ( e.recalc && p2m_is_changeable(e.sa_p2mt) )
>>>>                        {
>>>> +                         if ( e.sa_p2mt == p2m_ioreq_server )
>>>> +                         {
>>>> +                             ASSERT(p2m->ioreq.entry_count > 0);
>>>> +                             p2m->ioreq.entry_count--;
>>>> +                         }
>>>> +
>>>>                             e.sa_p2mt = p2m_is_logdirty_range(p2m, gfn + i, gfn + i)
>>>>                                         ? p2m_ram_logdirty : p2m_ram_rw;
>>> I don't think this can be right: Why would it be valid to change the
>>> type from p2m_ioreq_server to p2m_ram_rw (or p2m_ram_logdirty)
>>> here, without taking into account further information? This code
>>> can run at any time, not just when you want to reset things. So at
>>> the very least there is a check missing whether a suitable ioreq
>>> server still exists (and only if it doesn't you want to do the type
>>> reset).
>> Sorry, Jan. I think we have discussed this quite long ago.
>> Indeed, there's information lacked here, and that's why global_logdirty
>> is disallowed
>> when there's remaining p2m_ioreq_server entries. :-)
> log-dirty isn't the interesting part (which is why I did put it in
> parentheses). You change ioreq-server to ram-rw here
> regardless of whether we're actually cleaning up after a
> detached server.
>
>>>> @@ -965,7 +987,7 @@ static mfn_t ept_get_entry(struct p2m_domain *p2m,
>>>>        if ( is_epte_valid(ept_entry) )
>>>>        {
>>>>            if ( (recalc || ept_entry->recalc) &&
>>>> -             p2m_is_changeable(ept_entry->sa_p2mt) )
>>>> +             p2m_check_changeable(ept_entry->sa_p2mt) )
>>> I think the distinction between these two is rather arbitrary, and I
>>> also think this is part of the problem above: Distinguishing log-dirty
>>> from ram-rw requires auxiliary data to be consulted. The same
>>> ought to apply to ioreq-server, and then there wouldn't be a need
>>> to have two p2m_*_changeable() flavors.
>> Well, I think we have also discussed this quite long ago, here is the link.
>> https://lists.xen.org/archives/html/xen-devel/2016-09/msg01017.html
> That was a different discussion, namely not considering this ...
>
>>> Of course the subsequent use p2m_is_logdirty_range() may then
>>> need amending.
>>>
>>> In the end it looks like you have the inverse problem here compared
>>> to above: You should return ram-rw when the reset was already
>>> initiated. At least that's how I would see the logic to match up with
>>> the log-dirty handling (where the _effective_ rather than the last
>>> stored type is being returned).
> ... at all.

Sorry, I still do not get it.
George, do you have any thoughts on this?

>>>> @@ -629,6 +640,21 @@ p2m_pt_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
>>>>            if ( entry_content.l1 != 0 )
>>>>                p2m_add_iommu_flags(&entry_content, 0, iommu_pte_flags);
>>>>    
>>>> +        p2mt_old = p2m_flags_to_type(l1e_get_flags(*p2m_entry));
>>>> +
>>>> +        /*
>>>> +         * p2m_ioreq_server is only used for 4K pages, so
>>>> +         * the count shall only be performed for level 1 entries.
>>>> +         */
>>>> +        if ( p2mt == p2m_ioreq_server )
>>>> +            p2m->ioreq.entry_count++;
>>>> +
>>>> +        if ( p2mt_old == p2m_ioreq_server )
>>>> +        {
>>>> +            ASSERT(p2m->ioreq.entry_count > 0);
>>>> +            p2m->ioreq.entry_count--;
>>>> +        }
>>>> +
>>>>            /* level 1 entry */
>>>>            p2m->write_p2m_entry(p2m, gfn, p2m_entry, entry_content, 1);
>>> I think to match up with EPT you also want to add
>>>
>>>       ASSERT(p2mt_old != p2m_ioreq_server);
>>>
>>> to the 2M and 1G paths.
>> Is this really necessary? 2M and 1G page does not have p2mt_old,
>> defining one and peek the p2m type just
>> to have an ASSERT does not seem quite useful - and will hurt the
>> performance.
> I don't follow. Matching up with the 4k case, what you'd add would
> be
>
>      ASSERT(p2m_flags_to_type(flags) != p2m_ioreq_server);

Yeah, that's a good point.

Yu

> Jan
>


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v12 5/6] x86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries.
  2017-04-07 10:50         ` Yu Zhang
@ 2017-04-07 11:28           ` Jan Beulich
  2017-04-07 12:17             ` Yu Zhang
  0 siblings, 1 reply; 29+ messages in thread
From: Jan Beulich @ 2017-04-07 11:28 UTC (permalink / raw)
  To: Yu Zhang
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, George Dunlap,
	xen-devel, Paul Durrant, zhiyuan.lv, Jun Nakajima

>>> On 07.04.17 at 12:50, <yu.c.zhang@linux.intel.com> wrote:
> On 4/7/2017 6:28 PM, George Dunlap wrote:
>> On 07/04/17 11:14, Yu Zhang wrote:
>>>
>>> On 4/7/2017 5:40 PM, Jan Beulich wrote:
>>>>>>> On 06.04.17 at 17:53, <yu.c.zhang@linux.intel.com> wrote:
>>>>> --- a/xen/arch/x86/mm/p2m-ept.c
>>>>> +++ b/xen/arch/x86/mm/p2m-ept.c
>>>>> @@ -544,6 +544,12 @@ static int resolve_misconfig(struct p2m_domain
>>>>> *p2m, unsigned long gfn)
>>>>>                        e.ipat = ipat;
>>>>>                        if ( e.recalc && p2m_is_changeable(e.sa_p2mt) )
>>>>>                        {
>>>>> +                         if ( e.sa_p2mt == p2m_ioreq_server )
>>>>> +                         {
>>>>> +                             ASSERT(p2m->ioreq.entry_count > 0);
>>>>> +                             p2m->ioreq.entry_count--;
>>>>> +                         }
>>>>> +
>>>>>                             e.sa_p2mt = p2m_is_logdirty_range(p2m, gfn
>>>>> + i, gfn + i)
>>>>>                                         ? p2m_ram_logdirty : p2m_ram_rw;
>>>> I don't think this can be right: Why would it be valid to change the
>>>> type from p2m_ioreq_server to p2m_ram_rw (or p2m_ram_logdirty)
>>>> here, without taking into account further information? This code
>>>> can run at any time, not just when you want to reset things. So at
>>>> the very least there is a check missing whether a suitable ioreq
>>>> server still exists (and only if it doesn't you want to do the type
>>>> reset).
>>> Also I do not think we need to check if a suitable ioreq server still
>>> exists. We have guaranteed
>>> in our patch that no new ioreq server will be mapped as long as the p2m
>>> table is not clean. :)
>> Jan is saying that you should only change ioreq_server -> ram if there
>> is *not* an ioreq server; and that if this is called with an ioreq
>> server still active, then it must be some other change you're looking at.
>>
>> The problem, though, is that misconfiguration happens in many
>> circumstances.  Grep for "memory_type_changed()" -- each of those
>> results in a recalculation of the entire p2m, which will (in the current
>> code) wipe out any ioreq_server entries.
> 
> Well, I'm not aware that other actions besides the logdirty will cause the reset.
> But if that would happen, will below change solve this?
> 
> @@ -546,12 +546,16 @@ static int resolve_misconfig(struct p2m_domain *p2m, unsigned long gfn)
>                       {
>                            if ( e.sa_p2mt == p2m_ioreq_server )
>                            {
> -                             ASSERT(p2m->ioreq.entry_count > 0);
> -                             p2m->ioreq.entry_count--;
> +                             if ( p2m->ioreq.server == NULL )
> +                             {
> +                                 ASSERT(p2m->ioreq.entry_count > 0);
> +                                 p2m->ioreq.entry_count--;
> +                                 e.sa_p2mt = p2m_ram_rw;
> +                             }
>                            }
> -
> -                         e.sa_p2mt = p2m_is_logdirty_range(p2m, gfn + i, gfn + i)
> -                                     ? p2m_ram_logdirty : p2m_ram_rw;
> +                         else
> +                             e.sa_p2mt = p2m_is_logdirty_range(p2m, gfn + i, gfn + i)
> +                                         ? p2m_ram_logdirty : p2m_ram_rw;

Now you _never_ change away from ioreq-server, you only adjust
the counter.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v12 5/6] x86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries.
  2017-04-07 10:55         ` Yu Zhang
@ 2017-04-07 11:31           ` Jan Beulich
  2017-04-07 13:56             ` George Dunlap
  0 siblings, 1 reply; 29+ messages in thread
From: Jan Beulich @ 2017-04-07 11:31 UTC (permalink / raw)
  To: Yu Zhang
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, xen-devel,
	Paul Durrant, zhiyuan.lv, Jun Nakajima

>>> On 07.04.17 at 12:55, <yu.c.zhang@linux.intel.com> wrote:
> On 4/7/2017 6:26 PM, Jan Beulich wrote:
>>>>> On 07.04.17 at 11:53, <yu.c.zhang@linux.intel.com> wrote:
>>> On 4/7/2017 5:40 PM, Jan Beulich wrote:
>>>>>>> On 06.04.17 at 17:53, <yu.c.zhang@linux.intel.com> wrote:
>>>>> @@ -965,7 +987,7 @@ static mfn_t ept_get_entry(struct p2m_domain *p2m,
>>>>>        if ( is_epte_valid(ept_entry) )
>>>>>        {
>>>>>            if ( (recalc || ept_entry->recalc) &&
>>>>> -             p2m_is_changeable(ept_entry->sa_p2mt) )
>>>>> +             p2m_check_changeable(ept_entry->sa_p2mt) )
>>>> I think the distinction between these two is rather arbitrary, and I
>>>> also think this is part of the problem above: Distinguishing log-dirty
>>>> from ram-rw requires auxiliary data to be consulted. The same
>>>> ought to apply to ioreq-server, and then there wouldn't be a need
>>>> to have two p2m_*_changeable() flavors.
>>> Well, I think we have also discussed this quite long ago, here is the link.
>>> https://lists.xen.org/archives/html/xen-devel/2016-09/msg01017.html 
>> That was a different discussion, namely not considering this ...
>>
>>>> Of course the subsequent use p2m_is_logdirty_range() may then
>>>> need amending.
>>>>
>>>> In the end it looks like you have the inverse problem here compared
>>>> to above: You should return ram-rw when the reset was already
>>>> initiated. At least that's how I would see the logic to match up with
>>>> the log-dirty handling (where the _effective_ rather than the last
>>>> stored type is being returned).
>> ... at all.
> 
> Sorry I still do not get it.

Leaving ioreq-server out of the picture, what the function returns
to the caller is the type as it would be if a re-calculation had
already been done on the entry, even if that hasn't happened yet (the
function here shouldn't change any entries after all). I think we
want the same behavior here for what have been ioreq-server
entries (and which would become ram-rw after the next re-calc).
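
Something along these lines for the lookup path, as a sketch only (the
helper name is made up, it assumes ioreq-server stays in the changeable
set, and nothing gets written back to the entry):

    static inline p2m_type_t effective_type(bool_t recalc, p2m_type_t t,
                                            struct p2m_domain *p2m,
                                            unsigned long gfn)
    {
        if ( !recalc || !p2m_is_changeable(t) )
            return t;
        /* Still owned by an ioreq server -> report it as such. */
        if ( t == p2m_ioreq_server && p2m->ioreq.server != NULL )
            return t;
        /* Reset already initiated (or log-dirty) -> report the effective type. */
        return p2m_is_logdirty_range(p2m, gfn, gfn) ? p2m_ram_logdirty
                                                    : p2m_ram_rw;
    }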

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v12 5/6] x86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries.
  2017-04-07 11:28           ` Jan Beulich
@ 2017-04-07 12:17             ` Yu Zhang
  2017-04-07 12:36               ` Jan Beulich
  0 siblings, 1 reply; 29+ messages in thread
From: Yu Zhang @ 2017-04-07 12:17 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, George Dunlap,
	xen-devel, Paul Durrant, zhiyuan.lv, Jun Nakajima



On 4/7/2017 7:28 PM, Jan Beulich wrote:
>>>> On 07.04.17 at 12:50, <yu.c.zhang@linux.intel.com> wrote:
>> On 4/7/2017 6:28 PM, George Dunlap wrote:
>>> On 07/04/17 11:14, Yu Zhang wrote:
>>>> On 4/7/2017 5:40 PM, Jan Beulich wrote:
>>>>>>>> On 06.04.17 at 17:53, <yu.c.zhang@linux.intel.com> wrote:
>>>>>> --- a/xen/arch/x86/mm/p2m-ept.c
>>>>>> +++ b/xen/arch/x86/mm/p2m-ept.c
>>>>>> @@ -544,6 +544,12 @@ static int resolve_misconfig(struct p2m_domain
>>>>>> *p2m, unsigned long gfn)
>>>>>>                         e.ipat = ipat;
>>>>>>                         if ( e.recalc && p2m_is_changeable(e.sa_p2mt) )
>>>>>>                         {
>>>>>> +                         if ( e.sa_p2mt == p2m_ioreq_server )
>>>>>> +                         {
>>>>>> +                             ASSERT(p2m->ioreq.entry_count > 0);
>>>>>> +                             p2m->ioreq.entry_count--;
>>>>>> +                         }
>>>>>> +
>>>>>>                              e.sa_p2mt = p2m_is_logdirty_range(p2m, gfn
>>>>>> + i, gfn + i)
>>>>>>                                          ? p2m_ram_logdirty : p2m_ram_rw;
>>>>> I don't think this can be right: Why would it be valid to change the
>>>>> type from p2m_ioreq_server to p2m_ram_rw (or p2m_ram_logdirty)
>>>>> here, without taking into account further information? This code
>>>>> can run at any time, not just when you want to reset things. So at
>>>>> the very least there is a check missing whether a suitable ioreq
>>>>> server still exists (and only if it doesn't you want to do the type
>>>>> reset).
>>>> Also I do not think we need to check if a suitable ioreq server still
>>>> exists. We have guaranteed
>>>> in our patch that no new ioreq server will be mapped as long as the p2m
>>>> table is not clean. :)
>>> Jan is saying that you should only change ioreq_server -> ram if there
>>> is *not* an ioreq server; and that if this is called with an ioreq
>>> server still active, then it must be some other change you're looking at.
>>>
>>> The problem, though, is that misconfiguration happens in many
>>> circumstances.  Grep for "memory_type_changed()" -- each of those
>>> results in a recalculation of the entire p2m, which will (in the current
>>> code) wipe out any ioreq_server entries.
>> Well, I'm not aware that other actions besides the logdirty will cause the reset.
>> But if that would happen, will below change solve this?
>>
>> @@ -546,12 +546,16 @@ static int resolve_misconfig(struct p2m_domain *p2m, unsigned long gfn)
>>                        {
>>                             if ( e.sa_p2mt == p2m_ioreq_server )
>>                             {
>> -                             ASSERT(p2m->ioreq.entry_count > 0);
>> -                             p2m->ioreq.entry_count--;
>> +                             if ( p2m->ioreq.server == NULL )
>> +                             {
>> +                                 ASSERT(p2m->ioreq.entry_count > 0);
>> +                                 p2m->ioreq.entry_count--;
>> +                                 e.sa_p2mt = p2m_ram_rw;
>> +                             }
>>                             }
>> -
>> -                         e.sa_p2mt = p2m_is_logdirty_range(p2m, gfn + i, gfn + i)
>> -                                     ? p2m_ram_logdirty : p2m_ram_rw;
>> +                         else
>> +                             e.sa_p2mt = p2m_is_logdirty_range(p2m, gfn + i, gfn + i)
>> +                                         ? p2m_ram_logdirty : p2m_ram_rw;
> Now you _never_ change away from ioreq-server, you only adjust
> the counter.

Oh right.

Then something like:

diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
index ecd5ceb..b960a1d 100644
--- a/xen/arch/x86/mm/p2m-ept.c
+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -542,7 +542,10 @@ static int resolve_misconfig(struct p2m_domain *p2m, unsigned long gfn)
                                                _mfn(e.mfn), 0, &ipat,
                                                e.sa_p2mt == p2m_mmio_direct);
                      e.ipat = ipat;
-                    if ( e.recalc && p2m_is_changeable(e.sa_p2mt) )
+                    if ( e.recalc &&
+                         (p2m_check_changeable(e.sa_p2mt) ||
+                          (e.sa_p2mt == p2m_ioreq_server &&
+                           p2m->ioreq.server == NULL)) )
                      {
                           if ( e.sa_p2mt == p2m_ioreq_server )
                           {
diff --git a/xen/arch/x86/mm/p2m-pt.c b/xen/arch/x86/mm/p2m-pt.c
index 4b2ff9e..1c600b1 100644
--- a/xen/arch/x86/mm/p2m-pt.c
+++ b/xen/arch/x86/mm/p2m-pt.c
@@ -442,7 +442,8 @@ static int do_recalc(struct p2m_domain *p2m, unsigned long gfn)
              P2M_DEBUG("bogus recalc leaf at d%d:%lx:%u\n",
                        p2m->domain->domain_id, gfn, level);
          p2mt_old = p2m_flags_to_type(l1e_get_flags(e));
-        if ( p2m_is_changeable(p2mt_old) )
+        if ( p2m_check_changeable(p2mt_old) ||
+             (p2mt_old == p2m_ioreq_server && p2m->ioreq.server == NULL) )
          {
              unsigned long mask = ~0UL << (level * PAGETABLE_ORDER);
             p2m_type_t p2mt = p2m_is_logdirty_range(p2m, gfn & mask, gfn | ~mask)

But as you can see, I used p2m_check_changeable() here.

Yu
> Jan
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> https://lists.xen.org/xen-devel


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH v12 5/6] x86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries.
  2017-04-07 12:17             ` Yu Zhang
@ 2017-04-07 12:36               ` Jan Beulich
  0 siblings, 0 replies; 29+ messages in thread
From: Jan Beulich @ 2017-04-07 12:36 UTC (permalink / raw)
  To: Yu Zhang
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, George Dunlap,
	xen-devel, Paul Durrant, zhiyuan.lv, Jun Nakajima

>>> On 07.04.17 at 14:17, <yu.c.zhang@linux.intel.com> wrote:
> On 4/7/2017 7:28 PM, Jan Beulich wrote:
>>>>> On 07.04.17 at 12:50, <yu.c.zhang@linux.intel.com> wrote:
>>> @@ -546,12 +546,16 @@ static int resolve_misconfig(struct p2m_domain *p2m, unsigned long gfn)
>>>                        {
>>>                             if ( e.sa_p2mt == p2m_ioreq_server )
>>>                             {
>>> -                             ASSERT(p2m->ioreq.entry_count > 0);
>>> -                             p2m->ioreq.entry_count--;
>>> +                             if ( p2m->ioreq.server == NULL )
>>> +                             {
>>> +                                 ASSERT(p2m->ioreq.entry_count > 0);
>>> +                                 p2m->ioreq.entry_count--;
>>> +                                 e.sa_p2mt = p2m_ram_rw;
>>> +                             }
>>>                             }
>>> -
>>> -                         e.sa_p2mt = p2m_is_logdirty_range(p2m, gfn + i, gfn + i)
>>> -                                     ? p2m_ram_logdirty : p2m_ram_rw;
>>> +                         else
>>> +                             e.sa_p2mt = p2m_is_logdirty_range(p2m, gfn + i, gfn + i)
>>> +                                         ? p2m_ram_logdirty : p2m_ram_rw;
>> Now you _never_ change away from ioreq-server, you only adjust
>> the counter.
> 
> Oh right.

Oh, wrong (i.e. I was wrong): you do change the type, and I did
overlook that extra new line. So your first suggestion (still visible
above) seems right to me now.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v12 5/6] x86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries.
  2017-04-07 11:31           ` Jan Beulich
@ 2017-04-07 13:56             ` George Dunlap
  2017-04-07 14:05               ` Yu Zhang
  2017-04-07 14:22               ` Jan Beulich
  0 siblings, 2 replies; 29+ messages in thread
From: George Dunlap @ 2017-04-07 13:56 UTC (permalink / raw)
  To: Jan Beulich, Yu Zhang
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, xen-devel,
	Paul Durrant, zhiyuan.lv, Jun Nakajima

[-- Attachment #1: Type: text/plain, Size: 18776 bytes --]

On 07/04/17 12:31, Jan Beulich wrote:
>>>> On 07.04.17 at 12:55, <yu.c.zhang@linux.intel.com> wrote:
>> On 4/7/2017 6:26 PM, Jan Beulich wrote:
>>>>>> On 07.04.17 at 11:53, <yu.c.zhang@linux.intel.com> wrote:
>>>> On 4/7/2017 5:40 PM, Jan Beulich wrote:
>>>>>>>> On 06.04.17 at 17:53, <yu.c.zhang@linux.intel.com> wrote:
>>>>>> @@ -965,7 +987,7 @@ static mfn_t ept_get_entry(struct p2m_domain *p2m,
>>>>>>        if ( is_epte_valid(ept_entry) )
>>>>>>        {
>>>>>>            if ( (recalc || ept_entry->recalc) &&
>>>>>> -             p2m_is_changeable(ept_entry->sa_p2mt) )
>>>>>> +             p2m_check_changeable(ept_entry->sa_p2mt) )
>>>>> I think the distinction between these two is rather arbitrary, and I
>>>>> also think this is part of the problem above: Distinguishing log-dirty
>>>>> from ram-rw requires auxiliary data to be consulted. The same
>>>>> ought to apply to ioreq-server, and then there wouldn't be a need
>>>>> to have two p2m_*_changeable() flavors.
>>>> Well, I think we have also discussed this quite long ago, here is the link.
>>>> https://lists.xen.org/archives/html/xen-devel/2016-09/msg01017.html 
>>> That was a different discussion, namely not considering this ...
>>>
>>>>> Of course the subsequent use p2m_is_logdirty_range() may then
>>>>> need amending.
>>>>>
>>>>> In the end it looks like you have the inverse problem here compared
>>>>> to above: You should return ram-rw when the reset was already
>>>>> initiated. At least that's how I would see the logic to match up with
>>>>> the log-dirty handling (where the _effective_ rather than the last
>>>>> stored type is being returned).
>>> ... at all.
>>
>> Sorry I still do not get it.
> 
> Leaving ioreq-server out of the picture, what the function returns
> to the caller is the type as it would be if a re-calculation would have
> been done on the entry, even if that hasn't happened yet (the
> function here shouldn't change any entries after all). I think we
> want the same behavior here for what have been ioreq-server
> entries (and which would become ram-rw after the next re-calc).

How about something like the attached? (In-lined for convenience.)

 -George

After an ioreq server has unmapped, the remaining p2m_ioreq_server
entries need to be reset back to p2m_ram_rw. This patch does this
asynchronously with the current p2m_change_entry_type_global()
interface.

A new field, entry_count, is introduced in struct p2m_domain to record
the number of p2m_ioreq_server p2m page table entries. One property of
these entries is that they only point to 4K-sized page frames, because
all p2m_ioreq_server entries originate from p2m_ram_rw ones in
p2m_change_type_one(). So we do not need to worry about counting
2M/1G-sized pages.

This patch disallows mapping of an ioreq server while there are still
p2m_ioreq_server entries left, in case another mapping occurs right
after the current one is unmapped and releases its lock, with the p2m
table not yet synced.

This patch also disallows live migration when there are remaining
p2m_ioreq_server entries in the p2m table. The core reason is that our
current implementation of p2m_change_entry_type_global() lacks the
information to resync p2m_ioreq_server entries correctly if
global_logdirty is on.

We still need to handle other recalculations, however; this means
that when doing a recalculation, if the current type is
p2m_ioreq_server, we check whether p2m->ioreq.server is still valid.
If it is, we leave the type as p2m_ioreq_server; if not, we reset it
to p2m_ram as appropriate.

To avoid code duplication, lift recalc_type() out of p2m-pt.c and use
it for all type recalculations (both in p2m-pt.c and p2m-ept.c).

Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
---
 xen/arch/x86/hvm/ioreq.c  |  8 ++++++
 xen/arch/x86/mm/hap/hap.c |  9 ++++++
 xen/arch/x86/mm/p2m-ept.c | 73 +++++++++++++++++++++++++++++------------------
 xen/arch/x86/mm/p2m-pt.c  | 58 ++++++++++++++++++++++++-------------
 xen/arch/x86/mm/p2m.c     |  9 ++++++
 xen/include/asm-x86/p2m.h | 26 ++++++++++++++++-
 6 files changed, 135 insertions(+), 48 deletions(-)

diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c
index 5bf3b6d..07a6c26 100644
--- a/xen/arch/x86/hvm/ioreq.c
+++ b/xen/arch/x86/hvm/ioreq.c
@@ -955,6 +955,14 @@ int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id,

     spin_unlock_recursive(&d->arch.hvm_domain.ioreq_server.lock);

+    if ( rc == 0 && flags == 0 )
+    {
+        struct p2m_domain *p2m = p2m_get_hostp2m(d);
+
+        if ( read_atomic(&p2m->ioreq.entry_count) )
+            p2m_change_entry_type_global(d, p2m_ioreq_server, p2m_ram_rw);
+    }
+
     return rc;
 }

diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
index a57b385..4b591fe 100644
--- a/xen/arch/x86/mm/hap/hap.c
+++ b/xen/arch/x86/mm/hap/hap.c
@@ -187,6 +187,15 @@ out:
  */
 static int hap_enable_log_dirty(struct domain *d, bool_t log_global)
 {
+    struct p2m_domain *p2m = p2m_get_hostp2m(d);
+
+    /*
+     * Refuse to turn on global log-dirty mode if
+     * there are outstanding p2m_ioreq_server pages.
+     */
+    if ( log_global && read_atomic(&p2m->ioreq.entry_count) )
+        return -EBUSY;
+
     /* turn on PG_log_dirty bit in paging mode */
     paging_lock(d);
     d->arch.paging.mode |= PG_log_dirty;
diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
index cc1eb21..9f41067 100644
--- a/xen/arch/x86/mm/p2m-ept.c
+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -533,6 +533,7 @@ static int resolve_misconfig(struct p2m_domain *p2m, unsigned long gfn)
             {
                 for ( gfn -= i, i = 0; i < EPT_PAGETABLE_ENTRIES; ++i )
                 {
+                    p2m_type_t nt;
                     e = atomic_read_ept_entry(&epte[i]);
                     if ( e.emt == MTRR_NUM_TYPES )
                         e.emt = 0;
@@ -542,10 +543,15 @@ static int resolve_misconfig(struct p2m_domain *p2m, unsigned long gfn)
                                               _mfn(e.mfn), 0, &ipat,
                                               e.sa_p2mt == p2m_mmio_direct);
                     e.ipat = ipat;
-                    if ( e.recalc && p2m_is_changeable(e.sa_p2mt) )
-                    {
-                         e.sa_p2mt = p2m_is_logdirty_range(p2m, gfn + i, gfn + i)
-                                     ? p2m_ram_logdirty : p2m_ram_rw;
+                    nt = p2m_recalc_type(e.recalc, e.sa_p2mt, p2m, gfn + i);
+                    if ( nt != e.sa_p2mt ) {
+                         if ( e.sa_p2mt == p2m_ioreq_server )
+                         {
+                             ASSERT(p2m->ioreq.entry_count > 0);
+                             p2m->ioreq.entry_count--;
+                         }
+
+                         e.sa_p2mt = nt;
                          ept_p2m_type_to_flags(p2m, &e, e.sa_p2mt, e.access);
                     }
                     e.recalc = 0;
@@ -562,23 +568,24 @@ static int resolve_misconfig(struct p2m_domain *p2m, unsigned long gfn)

                 if ( recalc && p2m_is_changeable(e.sa_p2mt) )
                 {
-                     unsigned long mask = ~0UL << (level * EPT_TABLE_ORDER);
-
-                     switch ( p2m_is_logdirty_range(p2m, gfn & mask,
-                                                    gfn | ~mask) )
-                     {
-                     case 0:
-                          e.sa_p2mt = p2m_ram_rw;
-                          e.recalc = 0;
-                          break;
-                     case 1:
-                          e.sa_p2mt = p2m_ram_logdirty;
-                          e.recalc = 0;
-                          break;
-                     default: /* Force split. */
-                          emt = -1;
-                          break;
-                     }
+                    unsigned long mask = ~0UL << (level * EPT_TABLE_ORDER);
+
+                    ASSERT(e.sa_p2mt != p2m_ioreq_server);
+                    switch ( p2m_is_logdirty_range(p2m, gfn & mask,
+                                                   gfn | ~mask) )
+                    {
+                    case 0:
+                        e.sa_p2mt = p2m_ram_rw;
+                        e.recalc = 0;
+                        break;
+                    case 1:
+                        e.sa_p2mt = p2m_ram_logdirty;
+                        e.recalc = 0;
+                        break;
+                    default: /* Force split. */
+                        emt = -1;
+                        break;
+                    }
                 }
                 if ( unlikely(emt < 0) )
                 {
@@ -816,6 +823,22 @@ ept_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
         new_entry.suppress_ve = is_epte_valid(&old_entry) ?
                                     old_entry.suppress_ve : 1;

+    /*
+     * p2m_ioreq_server is only used for 4K pages, so the
+     * count shall only happen on ept page table entries.
+     */
+    if ( p2mt == p2m_ioreq_server )
+    {
+        ASSERT(i == 0);
+        p2m->ioreq.entry_count++;
+    }
+
+    if ( ept_entry->sa_p2mt == p2m_ioreq_server )
+    {
+        ASSERT(p2m->ioreq.entry_count > 0 && i == 0);
+        p2m->ioreq.entry_count--;
+    }
+
     rc = atomic_write_ept_entry(ept_entry, new_entry, target);
     if ( unlikely(rc) )
         old_entry.epte = 0;
@@ -964,12 +987,8 @@ static mfn_t ept_get_entry(struct p2m_domain *p2m,

     if ( is_epte_valid(ept_entry) )
     {
-        if ( (recalc || ept_entry->recalc) &&
-             p2m_is_changeable(ept_entry->sa_p2mt) )
-            *t = p2m_is_logdirty_range(p2m, gfn, gfn) ? p2m_ram_logdirty
-                                                      : p2m_ram_rw;
-        else
-            *t = ept_entry->sa_p2mt;
+        *t = p2m_recalc_type(recalc || ept_entry->recalc,
+                             ept_entry->sa_p2mt, p2m, gfn);
         *a = ept_entry->access;
         if ( sve )
             *sve = ept_entry->suppress_ve;
diff --git a/xen/arch/x86/mm/p2m-pt.c b/xen/arch/x86/mm/p2m-pt.c
index c0055f3..e481f53 100644
--- a/xen/arch/x86/mm/p2m-pt.c
+++ b/xen/arch/x86/mm/p2m-pt.c
@@ -389,6 +389,7 @@ static int do_recalc(struct p2m_domain *p2m, unsigned long gfn)
         {
             unsigned long mask = ~0UL << (level * PAGETABLE_ORDER);

+            ASSERT(p2m_flags_to_type(l1e_get_flags(*pent)) != p2m_ioreq_server);
             if ( !needs_recalc(l1, *pent) ||
                  !p2m_is_changeable(p2m_flags_to_type(l1e_get_flags(*pent))) ||
                  p2m_is_logdirty_range(p2m, gfn & mask, gfn | ~mask) >= 0 )
@@ -436,17 +437,18 @@ static int do_recalc(struct p2m_domain *p2m, unsigned long gfn)
          needs_recalc(l1, *pent) )
     {
         l1_pgentry_t e = *pent;
+        p2m_type_t ot, nt;
+        unsigned long mask = ~0UL << (level * PAGETABLE_ORDER);

         if ( !valid_recalc(l1, e) )
             P2M_DEBUG("bogus recalc leaf at d%d:%lx:%u\n",
                       p2m->domain->domain_id, gfn, level);
-        if ( p2m_is_changeable(p2m_flags_to_type(l1e_get_flags(e))) )
+        ot = p2m_flags_to_type(l1e_get_flags(e));
+        nt = p2m_recalc_type_range(true, ot, p2m, gfn & mask, gfn | ~mask);
+        if ( nt != ot )
         {
-            unsigned long mask = ~0UL << (level * PAGETABLE_ORDER);
-            p2m_type_t p2mt = p2m_is_logdirty_range(p2m, gfn & mask, gfn | ~mask)
-                              ? p2m_ram_logdirty : p2m_ram_rw;
             unsigned long mfn = l1e_get_pfn(e);
-            unsigned long flags = p2m_type_to_flags(p2m, p2mt,
+            unsigned long flags = p2m_type_to_flags(p2m, nt,
                                                     _mfn(mfn), level);

             if ( level )
@@ -460,9 +462,17 @@ static int do_recalc(struct p2m_domain *p2m, unsigned long gfn)
                      mfn &= ~((unsigned long)_PAGE_PSE_PAT >> PAGE_SHIFT);
                 flags |= _PAGE_PSE;
             }
+
+            if ( ot == p2m_ioreq_server )
+            {
+                ASSERT(p2m->ioreq.entry_count > 0);
+                ASSERT(level == 0);
+                p2m->ioreq.entry_count--;
+            }
+
             e = l1e_from_pfn(mfn, flags);
             p2m_add_iommu_flags(&e, level,
-                                (p2mt == p2m_ram_rw)
+                                (nt == p2m_ram_rw)
                                 ? IOMMUF_readable|IOMMUF_writable : 0);
             ASSERT(!needs_recalc(l1, e));
         }
@@ -606,6 +616,8 @@ p2m_pt_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,

     if ( page_order == PAGE_ORDER_4K )
     {
+        p2m_type_t p2mt_old;
+
         rc = p2m_next_level(p2m, &table, &gfn_remainder, gfn,
                             L2_PAGETABLE_SHIFT - PAGE_SHIFT,
                             L2_PAGETABLE_ENTRIES, PGT_l1_page_table, 1);
@@ -629,6 +641,21 @@ p2m_pt_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
         if ( entry_content.l1 != 0 )
             p2m_add_iommu_flags(&entry_content, 0, iommu_pte_flags);

+        p2mt_old = p2m_flags_to_type(l1e_get_flags(*p2m_entry));
+
+        /*
+         * p2m_ioreq_server is only used for 4K pages, so
+         * the count shall only be performed for level 1 entries.
+         */
+        if ( p2mt == p2m_ioreq_server )
+            p2m->ioreq.entry_count++;
+
+        if ( p2mt_old == p2m_ioreq_server )
+        {
+            ASSERT(p2m->ioreq.entry_count > 0);
+            p2m->ioreq.entry_count--;
+        }
+
         /* level 1 entry */
         p2m->write_p2m_entry(p2m, gfn, p2m_entry, entry_content, 1);
         /* NB: paging_write_p2m_entry() handles tlb flushes properly */
@@ -726,15 +753,6 @@ p2m_pt_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
     return rc;
 }

-static inline p2m_type_t recalc_type(bool_t recalc, p2m_type_t t,
-                                     struct p2m_domain *p2m, unsigned long gfn)
-{
-    if ( !recalc || !p2m_is_changeable(t) )
-        return t;
-    return p2m_is_logdirty_range(p2m, gfn, gfn) ? p2m_ram_logdirty
-                                                : p2m_ram_rw;
-}
-
 static mfn_t
 p2m_pt_get_entry(struct p2m_domain *p2m, unsigned long gfn,
                  p2m_type_t *t, p2m_access_t *a, p2m_query_t q,
@@ -820,8 +838,8 @@ pod_retry_l3:
             mfn = _mfn(l3e_get_pfn(*l3e) +
                        l2_table_offset(addr) * L1_PAGETABLE_ENTRIES +
                        l1_table_offset(addr));
-            *t = recalc_type(recalc || _needs_recalc(flags),
-                             p2m_flags_to_type(flags), p2m, gfn);
+            *t = p2m_recalc_type(recalc || _needs_recalc(flags),
+                                 p2m_flags_to_type(flags), p2m, gfn);
             unmap_domain_page(l3e);

             ASSERT(mfn_valid(mfn) || !p2m_is_ram(*t));
@@ -859,8 +877,8 @@ pod_retry_l2:
     if ( flags & _PAGE_PSE )
     {
         mfn = _mfn(l2e_get_pfn(*l2e) + l1_table_offset(addr));
-        *t = recalc_type(recalc || _needs_recalc(flags),
-                         p2m_flags_to_type(flags), p2m, gfn);
+        *t = p2m_recalc_type(recalc || _needs_recalc(flags),
+                             p2m_flags_to_type(flags), p2m, gfn);
         unmap_domain_page(l2e);

         ASSERT(mfn_valid(mfn) || !p2m_is_ram(*t));
@@ -896,7 +914,7 @@ pod_retry_l1:
         return INVALID_MFN;
     }
     mfn = _mfn(l1e_get_pfn(*l1e));
-    *t = recalc_type(recalc || _needs_recalc(flags), l1t, p2m, gfn);
+    *t = p2m_recalc_type(recalc || _needs_recalc(flags), l1t, p2m, gfn);
     unmap_domain_page(l1e);

     ASSERT(mfn_valid(mfn) || !p2m_is_ram(*t) || p2m_is_paging(*t));
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index b84add0..4169d18 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -317,6 +317,15 @@ int p2m_set_ioreq_server(struct domain *d,
         if ( p2m->ioreq.server != NULL )
             goto out;

+        /*
+         * It is possible that an ioreq server has just been unmapped,
+         * released the spin lock, with some p2m_ioreq_server entries
+         * in p2m table remained. We shall refuse another ioreq server
+         * mapping request in such case.
+         */
+        if ( read_atomic(&p2m->ioreq.entry_count) )
+            goto out;
+
         p2m->ioreq.server = s;
         p2m->ioreq.flags = flags;
     }
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index 4521620..8077d0d 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -120,7 +120,10 @@ typedef unsigned int p2m_query_t;

 /* Types that can be subject to bulk transitions. */
 #define P2M_CHANGEABLE_TYPES (p2m_to_mask(p2m_ram_rw) \
-                              | p2m_to_mask(p2m_ram_logdirty) )
+                              | p2m_to_mask(p2m_ram_logdirty) \
+                              | p2m_to_mask(p2m_ioreq_server) )
+
+#define P2M_IOREQ_TYPES (p2m_to_mask(p2m_ioreq_server))

 #define P2M_POD_TYPES (p2m_to_mask(p2m_populate_on_demand))

@@ -157,6 +160,7 @@ typedef unsigned int p2m_query_t;
 #define p2m_is_readonly(_t) (p2m_to_mask(_t) & P2M_RO_TYPES)
 #define p2m_is_discard_write(_t) (p2m_to_mask(_t) & P2M_DISCARD_WRITE_TYPES)
 #define p2m_is_changeable(_t) (p2m_to_mask(_t) & P2M_CHANGEABLE_TYPES)
+#define p2m_is_ioreq(_t) (p2m_to_mask(_t) & P2M_IOREQ_TYPES)
 #define p2m_is_pod(_t) (p2m_to_mask(_t) & P2M_POD_TYPES)
 #define p2m_is_grant(_t) (p2m_to_mask(_t) & P2M_GRANT_TYPES)
 /* Grant types are *not* considered valid, because they can be
@@ -349,6 +353,7 @@ struct p2m_domain {
           * are to be emulated by an ioreq server.
           */
          unsigned int flags;
+         unsigned long entry_count;
      } ioreq;
 };

@@ -744,6 +749,25 @@ static inline p2m_type_t p2m_flags_to_type(unsigned long flags)
     return (flags >> 12) & 0x7f;
 }

+static inline p2m_type_t p2m_recalc_type_range(bool_t recalc, p2m_type_t t,
+                                               struct p2m_domain *p2m,
+                                               unsigned long gfn_start,
+                                               unsigned long gfn_end)
+{
+    if ( !recalc || !p2m_is_changeable(t) )
+        return t;
+    if ( t == p2m_ioreq_server && p2m->ioreq.server != NULL )
+        return t;
+    return p2m_is_logdirty_range(p2m, gfn_start, gfn_end) ? p2m_ram_logdirty
+        : p2m_ram_rw;
+}
+
+static inline p2m_type_t p2m_recalc_type(bool_t recalc, p2m_type_t t,
+                                         struct p2m_domain *p2m, unsigned long gfn)
+{
+    return p2m_recalc_type_range(recalc, t, p2m, gfn, gfn);
+}
+
 int p2m_pt_handle_deferred_changes(uint64_t gpa);

 /*
-- 
2.1.4



[-- Attachment #2: 0001-x86-ioreq-server-Asynchronously-reset-outstanding-p2.patch --]
[-- Type: text/x-diff, Size: 16923 bytes --]

>From 75929db08097a7a11d2c586afc86c18b6154f7b1 Mon Sep 17 00:00:00 2001
From: Yu Zhang <yu.c.zhang@linux.intel.com>
Date: Thu, 6 Apr 2017 23:53:36 +0800
Subject: [PATCH] x86/ioreq server: Asynchronously reset outstanding
 p2m_ioreq_server entries.

After an ioreq server has unmapped, the remaining p2m_ioreq_server
entries need to be reset back to p2m_ram_rw. This patch does this
asynchronously with the current p2m_change_entry_type_global()
interface.

A new field, entry_count, is introduced in struct p2m_domain to record
the number of p2m_ioreq_server p2m page table entries. One property of
these entries is that they only point to 4K-sized page frames, because
all p2m_ioreq_server entries originate from p2m_ram_rw ones in
p2m_change_type_one(). So we do not need to worry about counting
2M/1G-sized pages.

This patch disallows mapping of an ioreq server while there are still
p2m_ioreq_server entries left, in case another mapping occurs right
after the current one is unmapped and releases its lock, with the p2m
table not yet synced.

This patch also disallows live migration when there are remaining
p2m_ioreq_server entries in the p2m table. The core reason is that our
current implementation of p2m_change_entry_type_global() lacks the
information to resync p2m_ioreq_server entries correctly if
global_logdirty is on.

We still need to handle other recalculations, however; this means
that when doing a recalculation, if the current type is
p2m_ioreq_server, we check whether p2m->ioreq.server is still valid.
If it is, we leave the type as p2m_ioreq_server; if not, we reset it
to p2m_ram as appropriate.

To avoid code duplication, lift recalc_type() out of p2m-pt.c and use
it for all type recalculations (both in p2m-pt.c and p2m-ept.c).

Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
---
 xen/arch/x86/hvm/ioreq.c  |  8 ++++++
 xen/arch/x86/mm/hap/hap.c |  9 ++++++
 xen/arch/x86/mm/p2m-ept.c | 73 +++++++++++++++++++++++++++++------------------
 xen/arch/x86/mm/p2m-pt.c  | 58 ++++++++++++++++++++++++-------------
 xen/arch/x86/mm/p2m.c     |  9 ++++++
 xen/include/asm-x86/p2m.h | 26 ++++++++++++++++-
 6 files changed, 135 insertions(+), 48 deletions(-)

diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c
index 5bf3b6d..07a6c26 100644
--- a/xen/arch/x86/hvm/ioreq.c
+++ b/xen/arch/x86/hvm/ioreq.c
@@ -955,6 +955,14 @@ int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id,
 
     spin_unlock_recursive(&d->arch.hvm_domain.ioreq_server.lock);
 
+    if ( rc == 0 && flags == 0 )
+    {
+        struct p2m_domain *p2m = p2m_get_hostp2m(d);
+
+        if ( read_atomic(&p2m->ioreq.entry_count) )
+            p2m_change_entry_type_global(d, p2m_ioreq_server, p2m_ram_rw);
+    }
+
     return rc;
 }
 
diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
index a57b385..4b591fe 100644
--- a/xen/arch/x86/mm/hap/hap.c
+++ b/xen/arch/x86/mm/hap/hap.c
@@ -187,6 +187,15 @@ out:
  */
 static int hap_enable_log_dirty(struct domain *d, bool_t log_global)
 {
+    struct p2m_domain *p2m = p2m_get_hostp2m(d);
+
+    /*
+     * Refuse to turn on global log-dirty mode if
+     * there are outstanding p2m_ioreq_server pages.
+     */
+    if ( log_global && read_atomic(&p2m->ioreq.entry_count) )
+        return -EBUSY;
+
     /* turn on PG_log_dirty bit in paging mode */
     paging_lock(d);
     d->arch.paging.mode |= PG_log_dirty;
diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
index cc1eb21..9f41067 100644
--- a/xen/arch/x86/mm/p2m-ept.c
+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -533,6 +533,7 @@ static int resolve_misconfig(struct p2m_domain *p2m, unsigned long gfn)
             {
                 for ( gfn -= i, i = 0; i < EPT_PAGETABLE_ENTRIES; ++i )
                 {
+                    p2m_type_t nt;
                     e = atomic_read_ept_entry(&epte[i]);
                     if ( e.emt == MTRR_NUM_TYPES )
                         e.emt = 0;
@@ -542,10 +543,15 @@ static int resolve_misconfig(struct p2m_domain *p2m, unsigned long gfn)
                                                _mfn(e.mfn), 0, &ipat,
                                                e.sa_p2mt == p2m_mmio_direct);
                     e.ipat = ipat;
-                    if ( e.recalc && p2m_is_changeable(e.sa_p2mt) )
-                    {
-                         e.sa_p2mt = p2m_is_logdirty_range(p2m, gfn + i, gfn + i)
-                                     ? p2m_ram_logdirty : p2m_ram_rw;
+                    nt = p2m_recalc_type(e.recalc, e.sa_p2mt, p2m, gfn + i);
+                    if ( nt != e.sa_p2mt ) {
+                         if ( e.sa_p2mt == p2m_ioreq_server )
+                         {
+                             ASSERT(p2m->ioreq.entry_count > 0);
+                             p2m->ioreq.entry_count--;
+                         }
+
+                         e.sa_p2mt = nt;
                          ept_p2m_type_to_flags(p2m, &e, e.sa_p2mt, e.access);
                     }
                     e.recalc = 0;
@@ -562,23 +568,24 @@ static int resolve_misconfig(struct p2m_domain *p2m, unsigned long gfn)
 
                 if ( recalc && p2m_is_changeable(e.sa_p2mt) )
                 {
-                     unsigned long mask = ~0UL << (level * EPT_TABLE_ORDER);
-
-                     switch ( p2m_is_logdirty_range(p2m, gfn & mask,
-                                                    gfn | ~mask) )
-                     {
-                     case 0:
-                          e.sa_p2mt = p2m_ram_rw;
-                          e.recalc = 0;
-                          break;
-                     case 1:
-                          e.sa_p2mt = p2m_ram_logdirty;
-                          e.recalc = 0;
-                          break;
-                     default: /* Force split. */
-                          emt = -1;
-                          break;
-                     }
+                    unsigned long mask = ~0UL << (level * EPT_TABLE_ORDER);
+                    
+                    ASSERT(e.sa_p2mt != p2m_ioreq_server);
+                    switch ( p2m_is_logdirty_range(p2m, gfn & mask,
+                                                   gfn | ~mask) )
+                    {
+                    case 0:
+                        e.sa_p2mt = p2m_ram_rw;
+                        e.recalc = 0;
+                        break;
+                    case 1:
+                        e.sa_p2mt = p2m_ram_logdirty;
+                        e.recalc = 0;
+                        break;
+                    default: /* Force split. */
+                        emt = -1;
+                        break;
+                    }
                 }
                 if ( unlikely(emt < 0) )
                 {
@@ -816,6 +823,22 @@ ept_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
         new_entry.suppress_ve = is_epte_valid(&old_entry) ?
                                     old_entry.suppress_ve : 1;
 
+    /*
+     * p2m_ioreq_server is only used for 4K pages, so the
+     * count shall only happen on ept page table entries.
+     */
+    if ( p2mt == p2m_ioreq_server )
+    {
+        ASSERT(i == 0);
+        p2m->ioreq.entry_count++;
+    }
+
+    if ( ept_entry->sa_p2mt == p2m_ioreq_server )
+    {
+        ASSERT(p2m->ioreq.entry_count > 0 && i == 0);
+        p2m->ioreq.entry_count--;
+    }
+
     rc = atomic_write_ept_entry(ept_entry, new_entry, target);
     if ( unlikely(rc) )
         old_entry.epte = 0;
@@ -964,12 +987,8 @@ static mfn_t ept_get_entry(struct p2m_domain *p2m,
 
     if ( is_epte_valid(ept_entry) )
     {
-        if ( (recalc || ept_entry->recalc) &&
-             p2m_is_changeable(ept_entry->sa_p2mt) )
-            *t = p2m_is_logdirty_range(p2m, gfn, gfn) ? p2m_ram_logdirty
-                                                      : p2m_ram_rw;
-        else
-            *t = ept_entry->sa_p2mt;
+        *t = p2m_recalc_type(recalc || ept_entry->recalc,
+                             ept_entry->sa_p2mt, p2m, gfn);
         *a = ept_entry->access;
         if ( sve )
             *sve = ept_entry->suppress_ve;
diff --git a/xen/arch/x86/mm/p2m-pt.c b/xen/arch/x86/mm/p2m-pt.c
index c0055f3..e481f53 100644
--- a/xen/arch/x86/mm/p2m-pt.c
+++ b/xen/arch/x86/mm/p2m-pt.c
@@ -389,6 +389,7 @@ static int do_recalc(struct p2m_domain *p2m, unsigned long gfn)
         {
             unsigned long mask = ~0UL << (level * PAGETABLE_ORDER);
 
+            ASSERT(p2m_flags_to_type(l1e_get_flags(*pent)) != p2m_ioreq_server);
             if ( !needs_recalc(l1, *pent) ||
                  !p2m_is_changeable(p2m_flags_to_type(l1e_get_flags(*pent))) ||
                  p2m_is_logdirty_range(p2m, gfn & mask, gfn | ~mask) >= 0 )
@@ -436,17 +437,18 @@ static int do_recalc(struct p2m_domain *p2m, unsigned long gfn)
          needs_recalc(l1, *pent) )
     {
         l1_pgentry_t e = *pent;
+        p2m_type_t ot, nt;
+        unsigned long mask = ~0UL << (level * PAGETABLE_ORDER);
 
         if ( !valid_recalc(l1, e) )
             P2M_DEBUG("bogus recalc leaf at d%d:%lx:%u\n",
                       p2m->domain->domain_id, gfn, level);
-        if ( p2m_is_changeable(p2m_flags_to_type(l1e_get_flags(e))) )
+        ot = p2m_flags_to_type(l1e_get_flags(e));
+        nt = p2m_recalc_type_range(true, ot, p2m, gfn & mask, gfn | ~mask);
+        if ( nt != ot )
         {
-            unsigned long mask = ~0UL << (level * PAGETABLE_ORDER);
-            p2m_type_t p2mt = p2m_is_logdirty_range(p2m, gfn & mask, gfn | ~mask)
-                              ? p2m_ram_logdirty : p2m_ram_rw;
             unsigned long mfn = l1e_get_pfn(e);
-            unsigned long flags = p2m_type_to_flags(p2m, p2mt,
+            unsigned long flags = p2m_type_to_flags(p2m, nt,
                                                     _mfn(mfn), level);
 
             if ( level )
@@ -460,9 +462,17 @@ static int do_recalc(struct p2m_domain *p2m, unsigned long gfn)
                      mfn &= ~((unsigned long)_PAGE_PSE_PAT >> PAGE_SHIFT);
                 flags |= _PAGE_PSE;
             }
+
+            if ( ot == p2m_ioreq_server )
+            {
+                ASSERT(p2m->ioreq.entry_count > 0);
+                ASSERT(level == 0);
+                p2m->ioreq.entry_count--;
+            }
+
             e = l1e_from_pfn(mfn, flags);
             p2m_add_iommu_flags(&e, level,
-                                (p2mt == p2m_ram_rw)
+                                (nt == p2m_ram_rw)
                                 ? IOMMUF_readable|IOMMUF_writable : 0);
             ASSERT(!needs_recalc(l1, e));
         }
@@ -606,6 +616,8 @@ p2m_pt_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
 
     if ( page_order == PAGE_ORDER_4K )
     {
+        p2m_type_t p2mt_old;
+
         rc = p2m_next_level(p2m, &table, &gfn_remainder, gfn,
                             L2_PAGETABLE_SHIFT - PAGE_SHIFT,
                             L2_PAGETABLE_ENTRIES, PGT_l1_page_table, 1);
@@ -629,6 +641,21 @@ p2m_pt_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
         if ( entry_content.l1 != 0 )
             p2m_add_iommu_flags(&entry_content, 0, iommu_pte_flags);
 
+        p2mt_old = p2m_flags_to_type(l1e_get_flags(*p2m_entry));
+
+        /*
+         * p2m_ioreq_server is only used for 4K pages, so
+         * the count shall only be performed for level 1 entries.
+         */
+        if ( p2mt == p2m_ioreq_server )
+            p2m->ioreq.entry_count++;
+
+        if ( p2mt_old == p2m_ioreq_server )
+        {
+            ASSERT(p2m->ioreq.entry_count > 0);
+            p2m->ioreq.entry_count--;
+        }
+
         /* level 1 entry */
         p2m->write_p2m_entry(p2m, gfn, p2m_entry, entry_content, 1);
         /* NB: paging_write_p2m_entry() handles tlb flushes properly */
@@ -726,15 +753,6 @@ p2m_pt_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
     return rc;
 }
 
-static inline p2m_type_t recalc_type(bool_t recalc, p2m_type_t t,
-                                     struct p2m_domain *p2m, unsigned long gfn)
-{
-    if ( !recalc || !p2m_is_changeable(t) )
-        return t;
-    return p2m_is_logdirty_range(p2m, gfn, gfn) ? p2m_ram_logdirty
-                                                : p2m_ram_rw;
-}
-
 static mfn_t
 p2m_pt_get_entry(struct p2m_domain *p2m, unsigned long gfn,
                  p2m_type_t *t, p2m_access_t *a, p2m_query_t q,
@@ -820,8 +838,8 @@ pod_retry_l3:
             mfn = _mfn(l3e_get_pfn(*l3e) +
                        l2_table_offset(addr) * L1_PAGETABLE_ENTRIES +
                        l1_table_offset(addr));
-            *t = recalc_type(recalc || _needs_recalc(flags),
-                             p2m_flags_to_type(flags), p2m, gfn);
+            *t = p2m_recalc_type(recalc || _needs_recalc(flags),
+                                 p2m_flags_to_type(flags), p2m, gfn);
             unmap_domain_page(l3e);
 
             ASSERT(mfn_valid(mfn) || !p2m_is_ram(*t));
@@ -859,8 +877,8 @@ pod_retry_l2:
     if ( flags & _PAGE_PSE )
     {
         mfn = _mfn(l2e_get_pfn(*l2e) + l1_table_offset(addr));
-        *t = recalc_type(recalc || _needs_recalc(flags),
-                         p2m_flags_to_type(flags), p2m, gfn);
+        *t = p2m_recalc_type(recalc || _needs_recalc(flags),
+                             p2m_flags_to_type(flags), p2m, gfn);
         unmap_domain_page(l2e);
         
         ASSERT(mfn_valid(mfn) || !p2m_is_ram(*t));
@@ -896,7 +914,7 @@ pod_retry_l1:
         return INVALID_MFN;
     }
     mfn = _mfn(l1e_get_pfn(*l1e));
-    *t = recalc_type(recalc || _needs_recalc(flags), l1t, p2m, gfn);
+    *t = p2m_recalc_type(recalc || _needs_recalc(flags), l1t, p2m, gfn);
     unmap_domain_page(l1e);
 
     ASSERT(mfn_valid(mfn) || !p2m_is_ram(*t) || p2m_is_paging(*t));
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index b84add0..4169d18 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -317,6 +317,15 @@ int p2m_set_ioreq_server(struct domain *d,
         if ( p2m->ioreq.server != NULL )
             goto out;
 
+        /*
+         * It is possible that an ioreq server has just been unmapped,
+         * released the spin lock, but left some p2m_ioreq_server entries
+         * in the p2m table. We should refuse another ioreq server
+         * mapping request in such a case.
+         */
+        if ( read_atomic(&p2m->ioreq.entry_count) )
+            goto out;
+
         p2m->ioreq.server = s;
         p2m->ioreq.flags = flags;
     }
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index 4521620..8077d0d 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -120,7 +120,10 @@ typedef unsigned int p2m_query_t;
 
 /* Types that can be subject to bulk transitions. */
 #define P2M_CHANGEABLE_TYPES (p2m_to_mask(p2m_ram_rw) \
-                              | p2m_to_mask(p2m_ram_logdirty) )
+                              | p2m_to_mask(p2m_ram_logdirty) \
+                              | p2m_to_mask(p2m_ioreq_server) )
+
+#define P2M_IOREQ_TYPES (p2m_to_mask(p2m_ioreq_server))
 
 #define P2M_POD_TYPES (p2m_to_mask(p2m_populate_on_demand))
 
@@ -157,6 +160,7 @@ typedef unsigned int p2m_query_t;
 #define p2m_is_readonly(_t) (p2m_to_mask(_t) & P2M_RO_TYPES)
 #define p2m_is_discard_write(_t) (p2m_to_mask(_t) & P2M_DISCARD_WRITE_TYPES)
 #define p2m_is_changeable(_t) (p2m_to_mask(_t) & P2M_CHANGEABLE_TYPES)
+#define p2m_is_ioreq(_t) (p2m_to_mask(_t) & P2M_IOREQ_TYPES)
 #define p2m_is_pod(_t) (p2m_to_mask(_t) & P2M_POD_TYPES)
 #define p2m_is_grant(_t) (p2m_to_mask(_t) & P2M_GRANT_TYPES)
 /* Grant types are *not* considered valid, because they can be
@@ -349,6 +353,7 @@ struct p2m_domain {
           * are to be emulated by an ioreq server.
           */
          unsigned int flags;
+         unsigned long entry_count;
      } ioreq;
 };
 
@@ -744,6 +749,25 @@ static inline p2m_type_t p2m_flags_to_type(unsigned long flags)
     return (flags >> 12) & 0x7f;
 }
 
+static inline p2m_type_t p2m_recalc_type_range(bool_t recalc, p2m_type_t t,
+                                               struct p2m_domain *p2m,
+                                               unsigned long gfn_start,
+                                               unsigned long gfn_end)
+{
+    if ( !recalc || !p2m_is_changeable(t) )
+        return t;
+    if ( t == p2m_ioreq_server && p2m->ioreq.server != NULL )
+        return t;
+    return p2m_is_logdirty_range(p2m, gfn_start, gfn_end) ? p2m_ram_logdirty
+        : p2m_ram_rw;
+}
+
+static inline p2m_type_t p2m_recalc_type(bool_t recalc, p2m_type_t t,
+                                         struct p2m_domain *p2m, unsigned long gfn)
+{
+    return p2m_recalc_type_range(recalc, t, p2m, gfn, gfn);
+}
+
 int p2m_pt_handle_deferred_changes(uint64_t gpa);
 
 /*
-- 
2.1.4


[-- Attachment #3: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH v12 5/6] x86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries.
  2017-04-07 13:56             ` George Dunlap
@ 2017-04-07 14:05               ` Yu Zhang
  2017-04-07 14:22                 ` George Dunlap
  2017-04-07 14:22               ` Jan Beulich
  1 sibling, 1 reply; 29+ messages in thread
From: Yu Zhang @ 2017-04-07 14:05 UTC (permalink / raw)
  To: George Dunlap, Jan Beulich
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, xen-devel,
	Paul Durrant, zhiyuan.lv, Jun Nakajima



On 4/7/2017 9:56 PM, George Dunlap wrote:
> On 07/04/17 12:31, Jan Beulich wrote:
>>>>> On 07.04.17 at 12:55, <yu.c.zhang@linux.intel.com> wrote:
>>> On 4/7/2017 6:26 PM, Jan Beulich wrote:
>>>>>>> On 07.04.17 at 11:53, <yu.c.zhang@linux.intel.com> wrote:
>>>>> On 4/7/2017 5:40 PM, Jan Beulich wrote:
>>>>>>>>> On 06.04.17 at 17:53, <yu.c.zhang@linux.intel.com> wrote:
>>>>>>> @@ -965,7 +987,7 @@ static mfn_t ept_get_entry(struct p2m_domain *p2m,
>>>>>>>         if ( is_epte_valid(ept_entry) )
>>>>>>>         {
>>>>>>>             if ( (recalc || ept_entry->recalc) &&
>>>>>>> -             p2m_is_changeable(ept_entry->sa_p2mt) )
>>>>>>> +             p2m_check_changeable(ept_entry->sa_p2mt) )
>>>>>> I think the distinction between these two is rather arbitrary, and I
>>>>>> also think this is part of the problem above: Distinguishing log-dirty
>>>>>> from ram-rw requires auxiliary data to be consulted. The same
>>>>>> ought to apply to ioreq-server, and then there wouldn't be a need
>>>>>> to have two p2m_*_changeable() flavors.
>>>>> Well, I think we have also discussed this quite long ago, here is the link.
>>>>> https://lists.xen.org/archives/html/xen-devel/2016-09/msg01017.html
>>>> That was a different discussion, namely not considering this ...
>>>>
>>>>>> Of course the subsequent use p2m_is_logdirty_range() may then
>>>>>> need amending.
>>>>>>
>>>>>> In the end it looks like you have the inverse problem here compared
>>>>>> to above: You should return ram-rw when the reset was already
>>>>>> initiated. At least that's how I would see the logic to match up with
>>>>>> the log-dirty handling (where the _effective_ rather than the last
>>>>>> stored type is being returned).
>>>> ... at all.
>>> Sorry I still do not get it.
>> Leaving ioreq-server out of the picture, what the function returns
>> to the caller is the type as it would be if a re-calculation would have
>> been done on the entry, even if that hasn't happened yet (the
>> function here shouldn't change any entries after all). I think we
>> want the same behavior here for what have been ioreq-server
>> entries (and which would become ram-rw after the next re-calc).
> How about something like the attached? (In-lined for convenience.)
>
>   -George
>
> After an ioreq server has unmapped, the remaining p2m_ioreq_server
> entries need to be reset back to p2m_ram_rw. This patch does this
> asynchronously with the current p2m_change_entry_type_global()
> interface.
>
> New field entry_count is introduced in struct p2m_domain, to record
> the number of p2m_ioreq_server p2m page table entries. One nature of
> these entries is that they only point to 4K sized page frames, because
> all p2m_ioreq_server entries are originated from p2m_ram_rw ones in
> p2m_change_type_one(). We do not need to worry about the counting for
> 2M/1G sized pages.
>
> This patch disallows mapping of an ioreq server, when there's still
> p2m_ioreq_server entry left, in case another mapping occurs right after
> the current one being unmapped, releases its lock, with p2m table not
> synced yet.
>
> This patch also disallows live migration, when there's remaining
> p2m_ioreq_server entry in p2m table. The core reason is our current
> implementation of p2m_change_entry_type_global() lacks information
> to resync p2m_ioreq_server entries correctly if global_logdirty is
> on.
>
> We still need to handle other recalculations, however; which means
> that when doing a recalculation, if the current type is
> p2m_ioreq_server, we check to see if p2m->ioreq.server is valid or
> not.  If it is, we leave it as type p2m_ioreq_server; if not, we reset
> it to p2m_ram as appropriate.
>
> To avoid code duplication, lift recalc_type() out of p2m-pt.c and use
> it for all type recalculations (both in p2m-pt.c and p2m-ept.c).
>
> Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
> Signed-off-by: George Dunlap <george.dunlap@citrix.com>
> Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
> ---
>   xen/arch/x86/hvm/ioreq.c  |  8 ++++++
>   xen/arch/x86/mm/hap/hap.c |  9 ++++++
>   xen/arch/x86/mm/p2m-ept.c | 73
> +++++++++++++++++++++++++++++------------------
>   xen/arch/x86/mm/p2m-pt.c  | 58 ++++++++++++++++++++++++-------------
>   xen/arch/x86/mm/p2m.c     |  9 ++++++
>   xen/include/asm-x86/p2m.h | 26 ++++++++++++++++-
>   6 files changed, 135 insertions(+), 48 deletions(-)
>
> diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c
> index 5bf3b6d..07a6c26 100644
> --- a/xen/arch/x86/hvm/ioreq.c
> +++ b/xen/arch/x86/hvm/ioreq.c
> @@ -955,6 +955,14 @@ int hvm_map_mem_type_to_ioreq_server(struct domain
> *d, ioservid_t id,
>
>       spin_unlock_recursive(&d->arch.hvm_domain.ioreq_server.lock);
>
> +    if ( rc == 0 && flags == 0 )
> +    {
> +        struct p2m_domain *p2m = p2m_get_hostp2m(d);
> +
> +        if ( read_atomic(&p2m->ioreq.entry_count) )
> +            p2m_change_entry_type_global(d, p2m_ioreq_server, p2m_ram_rw);
> +    }
> +
>       return rc;
>   }
>
> diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
> index a57b385..4b591fe 100644
> --- a/xen/arch/x86/mm/hap/hap.c
> +++ b/xen/arch/x86/mm/hap/hap.c
> @@ -187,6 +187,15 @@ out:
>    */
>   static int hap_enable_log_dirty(struct domain *d, bool_t log_global)
>   {
> +    struct p2m_domain *p2m = p2m_get_hostp2m(d);
> +
> +    /*
> +     * Refuse to turn on global log-dirty mode if
> +     * there are outstanding p2m_ioreq_server pages.
> +     */
> +    if ( log_global && read_atomic(&p2m->ioreq.entry_count) )
> +        return -EBUSY;
> +
>       /* turn on PG_log_dirty bit in paging mode */
>       paging_lock(d);
>       d->arch.paging.mode |= PG_log_dirty;
> diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
> index cc1eb21..9f41067 100644
> --- a/xen/arch/x86/mm/p2m-ept.c
> +++ b/xen/arch/x86/mm/p2m-ept.c
> @@ -533,6 +533,7 @@ static int resolve_misconfig(struct p2m_domain *p2m,
> unsigned long gfn)
>               {
>                   for ( gfn -= i, i = 0; i < EPT_PAGETABLE_ENTRIES; ++i )
>                   {
> +                    p2m_type_t nt;
>                       e = atomic_read_ept_entry(&epte[i]);
>                       if ( e.emt == MTRR_NUM_TYPES )
>                           e.emt = 0;
> @@ -542,10 +543,15 @@ static int resolve_misconfig(struct p2m_domain
> *p2m, unsigned long gfn)
>                                                  _mfn(e.mfn), 0, &ipat,
>                                                  e.sa_p2mt ==
> p2m_mmio_direct);
>                       e.ipat = ipat;
> -                    if ( e.recalc && p2m_is_changeable(e.sa_p2mt) )
> -                    {
> -                         e.sa_p2mt = p2m_is_logdirty_range(p2m, gfn +
> i, gfn + i)
> -                                     ? p2m_ram_logdirty : p2m_ram_rw;
> +                    nt = p2m_recalc_type(e.recalc, e.sa_p2mt, p2m, gfn
> + i);
> +                    if ( nt != e.sa_p2mt ) {
> +                         if ( e.sa_p2mt == p2m_ioreq_server )
> +                         {
> +                             ASSERT(p2m->ioreq.entry_count > 0);
> +                             p2m->ioreq.entry_count--;
> +                         }
> +
> +                         e.sa_p2mt = nt;
>                            ept_p2m_type_to_flags(p2m, &e, e.sa_p2mt,
> e.access);
>                       }
>                       e.recalc = 0;
> @@ -562,23 +568,24 @@ static int resolve_misconfig(struct p2m_domain
> *p2m, unsigned long gfn)
>
>                   if ( recalc && p2m_is_changeable(e.sa_p2mt) )
>                   {
> -                     unsigned long mask = ~0UL << (level *
> EPT_TABLE_ORDER);
> -
> -                     switch ( p2m_is_logdirty_range(p2m, gfn & mask,
> -                                                    gfn | ~mask) )
> -                     {
> -                     case 0:
> -                          e.sa_p2mt = p2m_ram_rw;
> -                          e.recalc = 0;
> -                          break;
> -                     case 1:
> -                          e.sa_p2mt = p2m_ram_logdirty;
> -                          e.recalc = 0;
> -                          break;
> -                     default: /* Force split. */
> -                          emt = -1;
> -                          break;
> -                     }
> +                    unsigned long mask = ~0UL << (level * EPT_TABLE_ORDER);
> +
> +                    ASSERT(e.sa_p2mt != p2m_ioreq_server);
> +                    switch ( p2m_is_logdirty_range(p2m, gfn & mask,
> +                                                   gfn | ~mask) )
> +                    {
> +                    case 0:
> +                        e.sa_p2mt = p2m_ram_rw;
> +                        e.recalc = 0;
> +                        break;
> +                    case 1:
> +                        e.sa_p2mt = p2m_ram_logdirty;
> +                        e.recalc = 0;
> +                        break;
> +                    default: /* Force split. */
> +                        emt = -1;
> +                        break;
> +                    }


So here I guess only the "ASSERT(e.sa_p2mt != p2m_ioreq_server);" is
new, right?
Thanks, George. It's great to know your helper routine also works for
NPT.
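
In essence, the helper boils down to the decision below (a simplified,
self-contained sketch for illustration only, using stand-in types rather
than the real Xen definitions):

/*
 * Toy model of the effective-type decision made by the new
 * p2m_recalc_type_range() helper.  Not the actual Xen code.
 */
#include <assert.h>
#include <stdbool.h>

typedef enum { ram_rw, ram_logdirty, ioreq_server, mmio_direct } toy_type;

static bool is_changeable(toy_type t)
{
    return t == ram_rw || t == ram_logdirty || t == ioreq_server;
}

static toy_type effective_type(bool recalc, toy_type t,
                               bool server_mapped, bool in_logdirty_range)
{
    if ( !recalc || !is_changeable(t) )
        return t;                      /* nothing to re-derive */
    if ( t == ioreq_server && server_mapped )
        return t;                      /* the ioreq server still owns it */
    return in_logdirty_range ? ram_logdirty : ram_rw;
}

int main(void)
{
    /* Outstanding ioreq_server entry after the server unmapped: ram_rw. */
    assert(effective_type(true, ioreq_server, false, false) == ram_rw);
    /* Server still mapped: the stored type is preserved. */
    assert(effective_type(true, ioreq_server, true, false) == ioreq_server);
    /* Plain log-dirty recalculation behaves as before. */
    assert(effective_type(true, ram_rw, false, true) == ram_logdirty);
    /* Non-changeable types are left alone. */
    assert(effective_type(true, mmio_direct, false, true) == mmio_direct);
    return 0;
}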

Yu

>                   }
>                   if ( unlikely(emt < 0) )
>                   {
> @@ -816,6 +823,22 @@ ept_set_entry(struct p2m_domain *p2m, unsigned long
> gfn, mfn_t mfn,
>           new_entry.suppress_ve = is_epte_valid(&old_entry) ?
>                                       old_entry.suppress_ve : 1;
>
> +    /*
> +     * p2m_ioreq_server is only used for 4K pages, so the
> +     * count shall only happen on ept page table entries.
> +     */
> +    if ( p2mt == p2m_ioreq_server )
> +    {
> +        ASSERT(i == 0);
> +        p2m->ioreq.entry_count++;
> +    }
> +
> +    if ( ept_entry->sa_p2mt == p2m_ioreq_server )
> +    {
> +        ASSERT(p2m->ioreq.entry_count > 0 && i == 0);
> +        p2m->ioreq.entry_count--;
> +    }
> +
>       rc = atomic_write_ept_entry(ept_entry, new_entry, target);
>       if ( unlikely(rc) )
>           old_entry.epte = 0;
> @@ -964,12 +987,8 @@ static mfn_t ept_get_entry(struct p2m_domain *p2m,
>
>       if ( is_epte_valid(ept_entry) )
>       {
> -        if ( (recalc || ept_entry->recalc) &&
> -             p2m_is_changeable(ept_entry->sa_p2mt) )
> -            *t = p2m_is_logdirty_range(p2m, gfn, gfn) ? p2m_ram_logdirty
> -                                                      : p2m_ram_rw;
> -        else
> -            *t = ept_entry->sa_p2mt;
> +        *t = p2m_recalc_type(recalc || ept_entry->recalc,
> +                             ept_entry->sa_p2mt, p2m, gfn);
>           *a = ept_entry->access;
>           if ( sve )
>               *sve = ept_entry->suppress_ve;
> diff --git a/xen/arch/x86/mm/p2m-pt.c b/xen/arch/x86/mm/p2m-pt.c
> index c0055f3..e481f53 100644
> --- a/xen/arch/x86/mm/p2m-pt.c
> +++ b/xen/arch/x86/mm/p2m-pt.c
> @@ -389,6 +389,7 @@ static int do_recalc(struct p2m_domain *p2m,
> unsigned long gfn)
>           {
>               unsigned long mask = ~0UL << (level * PAGETABLE_ORDER);
>
> +            ASSERT(p2m_flags_to_type(l1e_get_flags(*pent)) !=
> p2m_ioreq_server);
>               if ( !needs_recalc(l1, *pent) ||
>
> !p2m_is_changeable(p2m_flags_to_type(l1e_get_flags(*pent))) ||
>                    p2m_is_logdirty_range(p2m, gfn & mask, gfn | ~mask) >= 0 )
> @@ -436,17 +437,18 @@ static int do_recalc(struct p2m_domain *p2m,
> unsigned long gfn)
>            needs_recalc(l1, *pent) )
>       {
>           l1_pgentry_t e = *pent;
> +        p2m_type_t ot, nt;
> +        unsigned long mask = ~0UL << (level * PAGETABLE_ORDER);
>
>           if ( !valid_recalc(l1, e) )
>               P2M_DEBUG("bogus recalc leaf at d%d:%lx:%u\n",
>                         p2m->domain->domain_id, gfn, level);
> -        if ( p2m_is_changeable(p2m_flags_to_type(l1e_get_flags(e))) )
> +        ot = p2m_flags_to_type(l1e_get_flags(e));
> +        nt = p2m_recalc_type_range(true, ot, p2m, gfn & mask, gfn | ~mask);
> +        if ( nt != ot )
>           {
> -            unsigned long mask = ~0UL << (level * PAGETABLE_ORDER);
> -            p2m_type_t p2mt = p2m_is_logdirty_range(p2m, gfn & mask,
> gfn | ~mask)
> -                              ? p2m_ram_logdirty : p2m_ram_rw;
>               unsigned long mfn = l1e_get_pfn(e);
> -            unsigned long flags = p2m_type_to_flags(p2m, p2mt,
> +            unsigned long flags = p2m_type_to_flags(p2m, nt,
>                                                       _mfn(mfn), level);
>
>               if ( level )
> @@ -460,9 +462,17 @@ static int do_recalc(struct p2m_domain *p2m,
> unsigned long gfn)
>                        mfn &= ~((unsigned long)_PAGE_PSE_PAT >> PAGE_SHIFT);
>                   flags |= _PAGE_PSE;
>               }
> +
> +            if ( ot == p2m_ioreq_server )
> +            {
> +                ASSERT(p2m->ioreq.entry_count > 0);
> +                ASSERT(level == 0);
> +                p2m->ioreq.entry_count--;
> +            }
> +
>               e = l1e_from_pfn(mfn, flags);
>               p2m_add_iommu_flags(&e, level,
> -                                (p2mt == p2m_ram_rw)
> +                                (nt == p2m_ram_rw)
>                                   ? IOMMUF_readable|IOMMUF_writable : 0);
>               ASSERT(!needs_recalc(l1, e));
>           }
> @@ -606,6 +616,8 @@ p2m_pt_set_entry(struct p2m_domain *p2m, unsigned
> long gfn, mfn_t mfn,
>
>       if ( page_order == PAGE_ORDER_4K )
>       {
> +        p2m_type_t p2mt_old;
> +
>           rc = p2m_next_level(p2m, &table, &gfn_remainder, gfn,
>                               L2_PAGETABLE_SHIFT - PAGE_SHIFT,
>                               L2_PAGETABLE_ENTRIES, PGT_l1_page_table, 1);
> @@ -629,6 +641,21 @@ p2m_pt_set_entry(struct p2m_domain *p2m, unsigned
> long gfn, mfn_t mfn,
>           if ( entry_content.l1 != 0 )
>               p2m_add_iommu_flags(&entry_content, 0, iommu_pte_flags);
>
> +        p2mt_old = p2m_flags_to_type(l1e_get_flags(*p2m_entry));
> +
> +        /*
> +         * p2m_ioreq_server is only used for 4K pages, so
> +         * the count shall only be performed for level 1 entries.
> +         */
> +        if ( p2mt == p2m_ioreq_server )
> +            p2m->ioreq.entry_count++;
> +
> +        if ( p2mt_old == p2m_ioreq_server )
> +        {
> +            ASSERT(p2m->ioreq.entry_count > 0);
> +            p2m->ioreq.entry_count--;
> +        }
> +
>           /* level 1 entry */
>           p2m->write_p2m_entry(p2m, gfn, p2m_entry, entry_content, 1);
>           /* NB: paging_write_p2m_entry() handles tlb flushes properly */
> @@ -726,15 +753,6 @@ p2m_pt_set_entry(struct p2m_domain *p2m, unsigned
> long gfn, mfn_t mfn,
>       return rc;
>   }
>
> -static inline p2m_type_t recalc_type(bool_t recalc, p2m_type_t t,
> -                                     struct p2m_domain *p2m, unsigned
> long gfn)
> -{
> -    if ( !recalc || !p2m_is_changeable(t) )
> -        return t;
> -    return p2m_is_logdirty_range(p2m, gfn, gfn) ? p2m_ram_logdirty
> -                                                : p2m_ram_rw;
> -}
> -
>   static mfn_t
>   p2m_pt_get_entry(struct p2m_domain *p2m, unsigned long gfn,
>                    p2m_type_t *t, p2m_access_t *a, p2m_query_t q,
> @@ -820,8 +838,8 @@ pod_retry_l3:
>               mfn = _mfn(l3e_get_pfn(*l3e) +
>                          l2_table_offset(addr) * L1_PAGETABLE_ENTRIES +
>                          l1_table_offset(addr));
> -            *t = recalc_type(recalc || _needs_recalc(flags),
> -                             p2m_flags_to_type(flags), p2m, gfn);
> +            *t = p2m_recalc_type(recalc || _needs_recalc(flags),
> +                                 p2m_flags_to_type(flags), p2m, gfn);
>               unmap_domain_page(l3e);
>
>               ASSERT(mfn_valid(mfn) || !p2m_is_ram(*t));
> @@ -859,8 +877,8 @@ pod_retry_l2:
>       if ( flags & _PAGE_PSE )
>       {
>           mfn = _mfn(l2e_get_pfn(*l2e) + l1_table_offset(addr));
> -        *t = recalc_type(recalc || _needs_recalc(flags),
> -                         p2m_flags_to_type(flags), p2m, gfn);
> +        *t = p2m_recalc_type(recalc || _needs_recalc(flags),
> +                             p2m_flags_to_type(flags), p2m, gfn);
>           unmap_domain_page(l2e);
>
>           ASSERT(mfn_valid(mfn) || !p2m_is_ram(*t));
> @@ -896,7 +914,7 @@ pod_retry_l1:
>           return INVALID_MFN;
>       }
>       mfn = _mfn(l1e_get_pfn(*l1e));
> -    *t = recalc_type(recalc || _needs_recalc(flags), l1t, p2m, gfn);
> +    *t = p2m_recalc_type(recalc || _needs_recalc(flags), l1t, p2m, gfn);
>       unmap_domain_page(l1e);
>
>       ASSERT(mfn_valid(mfn) || !p2m_is_ram(*t) || p2m_is_paging(*t));
> diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
> index b84add0..4169d18 100644
> --- a/xen/arch/x86/mm/p2m.c
> +++ b/xen/arch/x86/mm/p2m.c
> @@ -317,6 +317,15 @@ int p2m_set_ioreq_server(struct domain *d,
>           if ( p2m->ioreq.server != NULL )
>               goto out;
>
> +        /*
> +         * It is possible that an ioreq server has just been unmapped,
> +         * released the spin lock, with some p2m_ioreq_server entries
> +         * in p2m table remained. We shall refuse another ioreq server
> +         * mapping request in such case.
> +         */
> +        if ( read_atomic(&p2m->ioreq.entry_count) )
> +            goto out;
> +
>           p2m->ioreq.server = s;
>           p2m->ioreq.flags = flags;
>       }
> diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
> index 4521620..8077d0d 100644
> --- a/xen/include/asm-x86/p2m.h
> +++ b/xen/include/asm-x86/p2m.h
> @@ -120,7 +120,10 @@ typedef unsigned int p2m_query_t;
>
>   /* Types that can be subject to bulk transitions. */
>   #define P2M_CHANGEABLE_TYPES (p2m_to_mask(p2m_ram_rw) \
> -                              | p2m_to_mask(p2m_ram_logdirty) )
> +                              | p2m_to_mask(p2m_ram_logdirty) \
> +                              | p2m_to_mask(p2m_ioreq_server) )
> +
> +#define P2M_IOREQ_TYPES (p2m_to_mask(p2m_ioreq_server))
>
>   #define P2M_POD_TYPES (p2m_to_mask(p2m_populate_on_demand))
>
> @@ -157,6 +160,7 @@ typedef unsigned int p2m_query_t;
>   #define p2m_is_readonly(_t) (p2m_to_mask(_t) & P2M_RO_TYPES)
>   #define p2m_is_discard_write(_t) (p2m_to_mask(_t) &
> P2M_DISCARD_WRITE_TYPES)
>   #define p2m_is_changeable(_t) (p2m_to_mask(_t) & P2M_CHANGEABLE_TYPES)
> +#define p2m_is_ioreq(_t) (p2m_to_mask(_t) & P2M_IOREQ_TYPES)
>   #define p2m_is_pod(_t) (p2m_to_mask(_t) & P2M_POD_TYPES)
>   #define p2m_is_grant(_t) (p2m_to_mask(_t) & P2M_GRANT_TYPES)
>   /* Grant types are *not* considered valid, because they can be
> @@ -349,6 +353,7 @@ struct p2m_domain {
>             * are to be emulated by an ioreq server.
>             */
>            unsigned int flags;
> +         unsigned long entry_count;
>        } ioreq;
>   };
>
> @@ -744,6 +749,25 @@ static inline p2m_type_t p2m_flags_to_type(unsigned
> long flags)
>       return (flags >> 12) & 0x7f;
>   }
>
> +static inline p2m_type_t p2m_recalc_type_range(bool_t recalc, p2m_type_t t,
> +                                               struct p2m_domain *p2m,
> +                                               unsigned long gfn_start,
> +                                               unsigned long gfn_end)
> +{
> +    if ( !recalc || !p2m_is_changeable(t) )
> +        return t;
> +    if ( t == p2m_ioreq_server && p2m->ioreq.server != NULL )
> +        return t;
> +    return p2m_is_logdirty_range(p2m, gfn_start, gfn_end) ?
> p2m_ram_logdirty
> +        : p2m_ram_rw;
> +}
> +
> +static inline p2m_type_t p2m_recalc_type(bool_t recalc, p2m_type_t t,
> +                                         struct p2m_domain *p2m,
> unsigned long gfn)
> +{
> +    return p2m_recalc_type_range(recalc, t, p2m, gfn, gfn);
> +}
> +
>   int p2m_pt_handle_deferred_changes(uint64_t gpa);
>
>   /*
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> https://lists.xen.org/xen-devel


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v12 5/6] x86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries.
  2017-04-07 14:05               ` Yu Zhang
@ 2017-04-07 14:22                 ` George Dunlap
  0 siblings, 0 replies; 29+ messages in thread
From: George Dunlap @ 2017-04-07 14:22 UTC (permalink / raw)
  To: Yu Zhang, Jan Beulich
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, xen-devel,
	Paul Durrant, zhiyuan.lv, Jun Nakajima

On 07/04/17 15:05, Yu Zhang wrote:
> 
> 
> On 4/7/2017 9:56 PM, George Dunlap wrote:
>> On 07/04/17 12:31, Jan Beulich wrote:
>>>>>> On 07.04.17 at 12:55, <yu.c.zhang@linux.intel.com> wrote:
>>>> On 4/7/2017 6:26 PM, Jan Beulich wrote:
>>>>>>>> On 07.04.17 at 11:53, <yu.c.zhang@linux.intel.com> wrote:
>>>>>> On 4/7/2017 5:40 PM, Jan Beulich wrote:
>>>>>>>>>> On 06.04.17 at 17:53, <yu.c.zhang@linux.intel.com> wrote:
>>>>>>>> @@ -965,7 +987,7 @@ static mfn_t ept_get_entry(struct p2m_domain
>>>>>>>> *p2m,
>>>>>>>>         if ( is_epte_valid(ept_entry) )
>>>>>>>>         {
>>>>>>>>             if ( (recalc || ept_entry->recalc) &&
>>>>>>>> -             p2m_is_changeable(ept_entry->sa_p2mt) )
>>>>>>>> +             p2m_check_changeable(ept_entry->sa_p2mt) )
>>>>>>> I think the distinction between these two is rather arbitrary, and I
>>>>>>> also think this is part of the problem above: Distinguishing
>>>>>>> log-dirty
>>>>>>> from ram-rw requires auxiliary data to be consulted. The same
>>>>>>> ought to apply to ioreq-server, and then there wouldn't be a need
>>>>>>> to have two p2m_*_changeable() flavors.
>>>>>> Well, I think we have also discussed this quite long ago, here is
>>>>>> the link.
>>>>>> https://lists.xen.org/archives/html/xen-devel/2016-09/msg01017.html
>>>>> That was a different discussion, namely not considering this ...
>>>>>
>>>>>>> Of course the subsequent use p2m_is_logdirty_range() may then
>>>>>>> need amending.
>>>>>>>
>>>>>>> In the end it looks like you have the inverse problem here compared
>>>>>>> to above: You should return ram-rw when the reset was already
>>>>>>> initiated. At least that's how I would see the logic to match up
>>>>>>> with
>>>>>>> the log-dirty handling (where the _effective_ rather than the last
>>>>>>> stored type is being returned).
>>>>> ... at all.
>>>> Sorry I still do not get it.
>>> Leaving ioreq-server out of the picture, what the function returns
>>> to the caller is the type as it would be if a re-calculation would have
>>> been done on the entry, even if that hasn't happened yet (the
>>> function here shouldn't change any entries after all). I think we
>>> want the same behavior here for what have been ioreq-server
>>> entries (and which would become ram-rw after the next re-calc).
>> How about something like the attached? (In-lined for convenience.)
>>
>>   -George
>>
>> After an ioreq server has unmapped, the remaining p2m_ioreq_server
>> entries need to be reset back to p2m_ram_rw. This patch does this
>> asynchronously with the current p2m_change_entry_type_global()
>> interface.
>>
>> New field entry_count is introduced in struct p2m_domain, to record
>> the number of p2m_ioreq_server p2m page table entries. One nature of
>> these entries is that they only point to 4K sized page frames, because
>> all p2m_ioreq_server entries are originated from p2m_ram_rw ones in
>> p2m_change_type_one(). We do not need to worry about the counting for
>> 2M/1G sized pages.
>>
>> This patch disallows mapping of an ioreq server, when there's still
>> p2m_ioreq_server entry left, in case another mapping occurs right after
>> the current one being unmapped, releases its lock, with p2m table not
>> synced yet.
>>
>> This patch also disallows live migration, when there's remaining
>> p2m_ioreq_server entry in p2m table. The core reason is our current
>> implementation of p2m_change_entry_type_global() lacks information
>> to resync p2m_ioreq_server entries correctly if global_logdirty is
>> on.
>>
>> We still need to handle other recalculations, however; which means
>> that when doing a recalculation, if the current type is
>> p2m_ioreq_server, we check to see if p2m->ioreq.server is valid or
>> not.  If it is, we leave it as type p2m_ioreq_server; if not, we reset
>> it to p2m_ram as appropriate.
>>
>> To avoid code duplication, lift recalc_type() out of p2m-pt.c and use
>> it for all type recalculations (both in p2m-pt.c and p2m-ept.c).
>>
>> Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
>> Signed-off-by: George Dunlap <george.dunlap@citrix.com>
>> Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
>> ---
>>   xen/arch/x86/hvm/ioreq.c  |  8 ++++++
>>   xen/arch/x86/mm/hap/hap.c |  9 ++++++
>>   xen/arch/x86/mm/p2m-ept.c | 73
>> +++++++++++++++++++++++++++++------------------
>>   xen/arch/x86/mm/p2m-pt.c  | 58 ++++++++++++++++++++++++-------------
>>   xen/arch/x86/mm/p2m.c     |  9 ++++++
>>   xen/include/asm-x86/p2m.h | 26 ++++++++++++++++-
>>   6 files changed, 135 insertions(+), 48 deletions(-)
>>
>> diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c
>> index 5bf3b6d..07a6c26 100644
>> --- a/xen/arch/x86/hvm/ioreq.c
>> +++ b/xen/arch/x86/hvm/ioreq.c
>> @@ -955,6 +955,14 @@ int hvm_map_mem_type_to_ioreq_server(struct domain
>> *d, ioservid_t id,
>>
>>       spin_unlock_recursive(&d->arch.hvm_domain.ioreq_server.lock);
>>
>> +    if ( rc == 0 && flags == 0 )
>> +    {
>> +        struct p2m_domain *p2m = p2m_get_hostp2m(d);
>> +
>> +        if ( read_atomic(&p2m->ioreq.entry_count) )
>> +            p2m_change_entry_type_global(d, p2m_ioreq_server,
>> p2m_ram_rw);
>> +    }
>> +
>>       return rc;
>>   }
>>
>> diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
>> index a57b385..4b591fe 100644
>> --- a/xen/arch/x86/mm/hap/hap.c
>> +++ b/xen/arch/x86/mm/hap/hap.c
>> @@ -187,6 +187,15 @@ out:
>>    */
>>   static int hap_enable_log_dirty(struct domain *d, bool_t log_global)
>>   {
>> +    struct p2m_domain *p2m = p2m_get_hostp2m(d);
>> +
>> +    /*
>> +     * Refuse to turn on global log-dirty mode if
>> +     * there are outstanding p2m_ioreq_server pages.
>> +     */
>> +    if ( log_global && read_atomic(&p2m->ioreq.entry_count) )
>> +        return -EBUSY;
>> +
>>       /* turn on PG_log_dirty bit in paging mode */
>>       paging_lock(d);
>>       d->arch.paging.mode |= PG_log_dirty;
>> diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
>> index cc1eb21..9f41067 100644
>> --- a/xen/arch/x86/mm/p2m-ept.c
>> +++ b/xen/arch/x86/mm/p2m-ept.c
>> @@ -533,6 +533,7 @@ static int resolve_misconfig(struct p2m_domain *p2m,
>> unsigned long gfn)
>>               {
>>                   for ( gfn -= i, i = 0; i < EPT_PAGETABLE_ENTRIES; ++i )
>>                   {
>> +                    p2m_type_t nt;
>>                       e = atomic_read_ept_entry(&epte[i]);
>>                       if ( e.emt == MTRR_NUM_TYPES )
>>                           e.emt = 0;
>> @@ -542,10 +543,15 @@ static int resolve_misconfig(struct p2m_domain
>> *p2m, unsigned long gfn)
>>                                                  _mfn(e.mfn), 0, &ipat,
>>                                                  e.sa_p2mt ==
>> p2m_mmio_direct);
>>                       e.ipat = ipat;
>> -                    if ( e.recalc && p2m_is_changeable(e.sa_p2mt) )
>> -                    {
>> -                         e.sa_p2mt = p2m_is_logdirty_range(p2m, gfn +
>> i, gfn + i)
>> -                                     ? p2m_ram_logdirty : p2m_ram_rw;
>> +                    nt = p2m_recalc_type(e.recalc, e.sa_p2mt, p2m, gfn
>> + i);
>> +                    if ( nt != e.sa_p2mt ) {
>> +                         if ( e.sa_p2mt == p2m_ioreq_server )
>> +                         {
>> +                             ASSERT(p2m->ioreq.entry_count > 0);
>> +                             p2m->ioreq.entry_count--;
>> +                         }
>> +
>> +                         e.sa_p2mt = nt;
>>                            ept_p2m_type_to_flags(p2m, &e, e.sa_p2mt,
>> e.access);
>>                       }
>>                       e.recalc = 0;
>> @@ -562,23 +568,24 @@ static int resolve_misconfig(struct p2m_domain
>> *p2m, unsigned long gfn)
>>
>>                   if ( recalc && p2m_is_changeable(e.sa_p2mt) )
>>                   {
>> -                     unsigned long mask = ~0UL << (level *
>> EPT_TABLE_ORDER);
>> -
>> -                     switch ( p2m_is_logdirty_range(p2m, gfn & mask,
>> -                                                    gfn | ~mask) )
>> -                     {
>> -                     case 0:
>> -                          e.sa_p2mt = p2m_ram_rw;
>> -                          e.recalc = 0;
>> -                          break;
>> -                     case 1:
>> -                          e.sa_p2mt = p2m_ram_logdirty;
>> -                          e.recalc = 0;
>> -                          break;
>> -                     default: /* Force split. */
>> -                          emt = -1;
>> -                          break;
>> -                     }
>> +                    unsigned long mask = ~0UL << (level *
>> EPT_TABLE_ORDER);
>> +
>> +                    ASSERT(e.sa_p2mt != p2m_ioreq_server);
>> +                    switch ( p2m_is_logdirty_range(p2m, gfn & mask,
>> +                                                   gfn | ~mask) )
>> +                    {
>> +                    case 0:
>> +                        e.sa_p2mt = p2m_ram_rw;
>> +                        e.recalc = 0;
>> +                        break;
>> +                    case 1:
>> +                        e.sa_p2mt = p2m_ram_logdirty;
>> +                        e.recalc = 0;
>> +                        break;
>> +                    default: /* Force split. */
>> +                        emt = -1;
>> +                        break;
>> +                    }
> 
> 
> So here I guess only the "ASSERT(e.sa_p2mt != p2m_ioreq_server);" is
> new, right?

Yes, that's right -- the whole block was indented wrong; when I added
the ASSERT() my editor aligned it correctly, and I couldn't very well
leave it that way. :-)

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v12 5/6] x86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries.
  2017-04-07 13:56             ` George Dunlap
  2017-04-07 14:05               ` Yu Zhang
@ 2017-04-07 14:22               ` Jan Beulich
  1 sibling, 0 replies; 29+ messages in thread
From: Jan Beulich @ 2017-04-07 14:22 UTC (permalink / raw)
  To: George Dunlap
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, xen-devel,
	Paul Durrant, Yu Zhang, zhiyuan.lv, Jun Nakajima

>>> On 07.04.17 at 15:56, <george.dunlap@citrix.com> wrote:

Most important thing first: The logic looks correct to me now.

> --- a/xen/arch/x86/mm/p2m-ept.c
> +++ b/xen/arch/x86/mm/p2m-ept.c
> @@ -533,6 +533,7 @@ static int resolve_misconfig(struct p2m_domain *p2m, unsigned long gfn)
>              {
>                  for ( gfn -= i, i = 0; i < EPT_PAGETABLE_ENTRIES; ++i )
>                  {
> +                    p2m_type_t nt;
>                      e = atomic_read_ept_entry(&epte[i]);

Blank line between these two please.

> @@ -542,10 +543,15 @@ static int resolve_misconfig(struct p2m_domain *p2m, unsigned long gfn)
>                                                 _mfn(e.mfn), 0, &ipat,
>                                                 e.sa_p2mt == p2m_mmio_direct);
>                      e.ipat = ipat;
> -                    if ( e.recalc && p2m_is_changeable(e.sa_p2mt) )
> -                    {
> -                         e.sa_p2mt = p2m_is_logdirty_range(p2m, gfn + i, gfn + i)
> -                                     ? p2m_ram_logdirty : p2m_ram_rw;
> +                    nt = p2m_recalc_type(e.recalc, e.sa_p2mt, p2m, gfn + i);
> +                    if ( nt != e.sa_p2mt ) {

Brace on its own line please.

> +                         if ( e.sa_p2mt == p2m_ioreq_server )
> +                         {
> +                             ASSERT(p2m->ioreq.entry_count > 0);
> +                             p2m->ioreq.entry_count--;
> +                         }
> +
> +                         e.sa_p2mt = nt;
>                           ept_p2m_type_to_flags(p2m, &e, e.sa_p2mt, e.access);
>                      }

This changes the condition under which ept_p2m_type_to_flags() is
bypassed. I think this is correct, but that wasn't immediately clear.

> @@ -562,23 +568,24 @@ static int resolve_misconfig(struct p2m_domain *p2m, unsigned long gfn)
> 
>                  if ( recalc && p2m_is_changeable(e.sa_p2mt) )
>                  {
> -                     unsigned long mask = ~0UL << (level * EPT_TABLE_ORDER);
> -
> -                     switch ( p2m_is_logdirty_range(p2m, gfn & mask,
> -                                                    gfn | ~mask) )
> -                     {
> -                     case 0:
> -                          e.sa_p2mt = p2m_ram_rw;
> -                          e.recalc = 0;
> -                          break;
> -                     case 1:
> -                          e.sa_p2mt = p2m_ram_logdirty;
> -                          e.recalc = 0;
> -                          break;
> -                     default: /* Force split. */
> -                          emt = -1;
> -                          break;
> -                     }
> +                    unsigned long mask = ~0UL << (level * EPT_TABLE_ORDER);
> +
> +                    ASSERT(e.sa_p2mt != p2m_ioreq_server);
> +                    switch ( p2m_is_logdirty_range(p2m, gfn & mask,
> +                                                   gfn | ~mask) )
> +                    {
> +                    case 0:
> +                        e.sa_p2mt = p2m_ram_rw;
> +                        e.recalc = 0;
> +                        break;
> +                    case 1:
> +                        e.sa_p2mt = p2m_ram_logdirty;
> +                        e.recalc = 0;
> +                        break;
> +                    default: /* Force split. */
> +                        emt = -1;
> +                        break;
> +                    }
>                  }

So if I'm getting it right, the change here really is just the addition
of an ASSERT() and re-indentation? I think you will want to also adjust
the indentation in the previous hunk then. (I'm afraid it was me who
broke it back when introducing that code...)

> @@ -606,6 +616,8 @@ p2m_pt_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
> 
>      if ( page_order == PAGE_ORDER_4K )
>      {
> +        p2m_type_t p2mt_old;
> +
>          rc = p2m_next_level(p2m, &table, &gfn_remainder, gfn,
>                              L2_PAGETABLE_SHIFT - PAGE_SHIFT,
>                              L2_PAGETABLE_ENTRIES, PGT_l1_page_table, 1);
> @@ -629,6 +641,21 @@ p2m_pt_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
>          if ( entry_content.l1 != 0 )
>              p2m_add_iommu_flags(&entry_content, 0, iommu_pte_flags);
> 
> +        p2mt_old = p2m_flags_to_type(l1e_get_flags(*p2m_entry));
> +
> +        /*
> +         * p2m_ioreq_server is only used for 4K pages, so
> +         * the count shall only be performed for level 1 entries.
> +         */
> +        if ( p2mt == p2m_ioreq_server )
> +            p2m->ioreq.entry_count++;
> +
> +        if ( p2mt_old == p2m_ioreq_server )
> +        {
> +            ASSERT(p2m->ioreq.entry_count > 0);
> +            p2m->ioreq.entry_count--;
> +        }
> +
>          /* level 1 entry */
>          p2m->write_p2m_entry(p2m, gfn, p2m_entry, entry_content, 1);
>          /* NB: paging_write_p2m_entry() handles tlb flushes properly */
> @@ -726,15 +753,6 @@ p2m_pt_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
>      return rc;
>  }

I would have hoped for the two ASSERT()s to be added for the
2M and 1G cases, which we did talk about.
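
I.e., something along these lines in both the 1G and the 2M paths of
p2m_pt_set_entry() (just a sketch of the kind of check I have in mind,
not necessarily the final form):

        /* p2m_ioreq_server is only used for 4K pages. */
        ASSERT(p2mt != p2m_ioreq_server);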

> --- a/xen/include/asm-x86/p2m.h
> +++ b/xen/include/asm-x86/p2m.h
> @@ -120,7 +120,10 @@ typedef unsigned int p2m_query_t;
> 
>  /* Types that can be subject to bulk transitions. */
>  #define P2M_CHANGEABLE_TYPES (p2m_to_mask(p2m_ram_rw) \
> -                              | p2m_to_mask(p2m_ram_logdirty) )
> +                              | p2m_to_mask(p2m_ram_logdirty) \
> +                              | p2m_to_mask(p2m_ioreq_server) )
> +
> +#define P2M_IOREQ_TYPES (p2m_to_mask(p2m_ioreq_server))
> 
>  #define P2M_POD_TYPES (p2m_to_mask(p2m_populate_on_demand))
> 
> @@ -157,6 +160,7 @@ typedef unsigned int p2m_query_t;
>  #define p2m_is_readonly(_t) (p2m_to_mask(_t) & P2M_RO_TYPES)
>  #define p2m_is_discard_write(_t) (p2m_to_mask(_t) & P2M_DISCARD_WRITE_TYPES)
>  #define p2m_is_changeable(_t) (p2m_to_mask(_t) & P2M_CHANGEABLE_TYPES)
> +#define p2m_is_ioreq(_t) (p2m_to_mask(_t) & P2M_IOREQ_TYPES)

I don't think this and P2M_IOREQ_TYPES are needed anymore.

> @@ -744,6 +749,25 @@ static inline p2m_type_t p2m_flags_to_type(unsigned long flags)
>      return (flags >> 12) & 0x7f;
>  }
> 
> +static inline p2m_type_t p2m_recalc_type_range(bool_t recalc, p2m_type_t t,

Just "bool" please (also below).

> +                                               struct p2m_domain *p2m,
> +                                               unsigned long gfn_start,
> +                                               unsigned long gfn_end)
> +{
> +    if ( !recalc || !p2m_is_changeable(t) )
> +        return t;
> +    if ( t == p2m_ioreq_server && p2m->ioreq.server != NULL )
> +        return t;
> +    return p2m_is_logdirty_range(p2m, gfn_start, gfn_end) ? p2m_ram_logdirty
> +        : p2m_ram_rw;

Commonly we'd align the : with either the ? or the start of the
conditional expression.
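
E.g. (shown here with the colon aligned to the start of the conditional
expression; aligning it under the ? would be fine too):

    return p2m_is_logdirty_range(p2m, gfn_start, gfn_end) ? p2m_ram_logdirty
           : p2m_ram_rw;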

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread

Thread overview: 29+ messages
2017-04-06 15:53 [PATCH v12 0/6] x86/ioreq server: Introduce HVMMEM_ioreq_server mem type Yu Zhang
2017-04-06 15:53 ` [PATCH v12 1/6] x86/ioreq server: Release the p2m lock after mmio is handled Yu Zhang
2017-04-06 15:53 ` [PATCH v12 2/6] x86/ioreq server: Add DMOP to map guest ram with p2m_ioreq_server to an ioreq server Yu Zhang
2017-04-07  7:33   ` Tian, Kevin
2017-04-06 15:53 ` [PATCH v12 3/6] x86/ioreq server: Add device model wrappers for new DMOP Yu Zhang
2017-04-06 15:53 ` [PATCH v12 4/6] x86/ioreq server: Handle read-modify-write cases for p2m_ioreq_server pages Yu Zhang
2017-04-06 15:53 ` [PATCH v12 5/6] x86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries Yu Zhang
2017-04-06 16:45   ` George Dunlap
2017-04-07  7:34   ` Tian, Kevin
2017-04-07  9:40   ` Jan Beulich
2017-04-07  9:53     ` Yu Zhang
2017-04-07 10:22       ` George Dunlap
2017-04-07 10:22         ` Yu Zhang
2017-04-07 10:41           ` Jan Beulich
2017-04-07 10:26       ` Jan Beulich
2017-04-07 10:55         ` Yu Zhang
2017-04-07 11:31           ` Jan Beulich
2017-04-07 13:56             ` George Dunlap
2017-04-07 14:05               ` Yu Zhang
2017-04-07 14:22                 ` George Dunlap
2017-04-07 14:22               ` Jan Beulich
2017-04-07 10:14     ` Yu Zhang
2017-04-07 10:28       ` Jan Beulich
2017-04-07 10:28       ` George Dunlap
2017-04-07 10:50         ` Yu Zhang
2017-04-07 11:28           ` Jan Beulich
2017-04-07 12:17             ` Yu Zhang
2017-04-07 12:36               ` Jan Beulich
2017-04-06 15:53 ` [PATCH v12 6/6] x86/ioreq server: Synchronously reset outstanding p2m_ioreq_server entries when an ioreq server unmaps Yu Zhang
