* [PATCH v7 0/5] x86/ioreq server: Introduce HVMMEM_ioreq_server mem type.
@ 2017-03-08 15:33 Yu Zhang
  2017-03-08 15:33 ` [PATCH v7 1/5] x86/ioreq server: Release the p2m lock after mmio is handled Yu Zhang
                   ` (4 more replies)
  0 siblings, 5 replies; 41+ messages in thread
From: Yu Zhang @ 2017-03-08 15:33 UTC (permalink / raw)
  To: xen-devel; +Cc: zhiyuan.lv

XenGT leverages ioreq server to track and forward accesses to GPU
I/O resources, e.g. the PPGTT (per-process graphic translation tables).
Currently, ioreq server uses rangesets to track the BDF/PIO/MMIO ranges
to be emulated. To select an ioreq server, the rangeset is searched to
see whether the I/O range is recorded there. However, the number of ram
pages to be tracked may exceed the upper limit of a rangeset.

Previously, one solution was proposed to refactor the rangeset and
extend its upper limit. However, after 12 rounds of discussion, we
have decided to drop this approach due to security concerns. Instead,
this patch series introduces a new mem type, HVMMEM_ioreq_server, and
adds HVM operations to let an ioreq server claim ownership of ram
pages with this type. Accesses to a page of this type will be handled
by the specified ioreq server directly.
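
As a rough illustration of the intended usage (not part of this
series; xdm_issue() below is a hypothetical helper that hands a
xen_dm_op to Xen), a device model would mark a tracked guest page
with the new type via XEN_DMOP_set_mem_type:

    /* Sketch only - assumes the Xen public headers used by this series. */
    #include <stdint.h>
    #include <xen/xen.h>
    #include <xen/hvm/dm_op.h>
    #include <xen/hvm/hvm_op.h>

    static int track_ppgtt_page(domid_t domid, uint64_t gfn)
    {
        struct xen_dm_op op = { .op = XEN_DMOP_set_mem_type };
        struct xen_dm_op_set_mem_type *data = &op.u.set_mem_type;

        data->mem_type  = HVMMEM_ioreq_server;
        data->first_pfn = gfn;
        data->nr        = 1;

        return xdm_issue(domid, &op);   /* hypothetical transport helper */
    }

Such a call only succeeds after the ioreq server has claimed the type
via the new XEN_DMOP_map_mem_type_to_ioreq_server (patch 2); once
marked, write faults on the page are forwarded to that server.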

Yu Zhang (5):
  x86/ioreq server: Release the p2m lock after mmio is handled.
  x86/ioreq server: Add DMOP to map guest ram with p2m_ioreq_server to
    an ioreq server.
  x86/ioreq server: Handle read-modify-write cases for p2m_ioreq_server
    pages.
  x86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server
    entries.
  x86/ioreq server: Synchronously reset outstanding p2m_ioreq_server
    entries when an ioreq server unmaps.

 xen/arch/x86/hvm/dm.c            |  79 +++++++++++++++++++++++--
 xen/arch/x86/hvm/emulate.c       |  99 ++++++++++++++++++++++++++++++-
 xen/arch/x86/hvm/hvm.c           |   8 +--
 xen/arch/x86/hvm/ioreq.c         |  46 +++++++++++++++
 xen/arch/x86/mm/hap/hap.c        |   9 +++
 xen/arch/x86/mm/hap/nested_hap.c |   2 +-
 xen/arch/x86/mm/p2m-ept.c        |  16 ++++-
 xen/arch/x86/mm/p2m-pt.c         |  33 ++++++++---
 xen/arch/x86/mm/p2m.c            | 123 +++++++++++++++++++++++++++++++++++++++
 xen/arch/x86/mm/shadow/multi.c   |   3 +-
 xen/include/asm-x86/hvm/ioreq.h  |   2 +
 xen/include/asm-x86/p2m.h        |  40 +++++++++++--
 xen/include/public/hvm/dm_op.h   |  29 ++++++++-
 xen/include/public/hvm/hvm_op.h  |   8 ++-
 14 files changed, 463 insertions(+), 34 deletions(-)

-- 
1.9.1



* [PATCH v7 1/5] x86/ioreq server: Release the p2m lock after mmio is handled.
  2017-03-08 15:33 [PATCH v7 0/5] x86/ioreq server: Introduce HVMMEM_ioreq_server mem type Yu Zhang
@ 2017-03-08 15:33 ` Yu Zhang
  2017-03-08 15:33 ` [PATCH v7 2/5] x86/ioreq server: Add DMOP to map guest ram with p2m_ioreq_server to an ioreq server Yu Zhang
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 41+ messages in thread
From: Yu Zhang @ 2017-03-08 15:33 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Paul Durrant, zhiyuan.lv, Jan Beulich

Routine hvmemul_do_io() may need to peek at the p2m type of a gfn to
select the ioreq server. For example, operations on gfns with
p2m_ioreq_server type will be delivered to a corresponding ioreq
server, and this requires that the p2m type not be switched back
to p2m_ram_rw during the emulation process. To avoid this race
condition, we delay the release of the p2m lock in
hvm_hap_nested_page_fault() until the mmio has been handled.

Note: previously in hvm_hap_nested_page_fault(), put_gfn() was moved
before the handling of mmio, due to a deadlock risk between the p2m
lock and the event lock (in commit 77b8dfe). A per-event-channel lock
was later introduced in commit de6acb7 to send events, so this
deadlock is no longer a concern.

Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
Cc: Paul Durrant <paul.durrant@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/hvm/hvm.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index ccfae4f..a9db7f7 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -1870,18 +1870,14 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long gla,
          (npfec.write_access &&
           (p2m_is_discard_write(p2mt) || (p2mt == p2m_ioreq_server))) )
     {
-        __put_gfn(p2m, gfn);
-        if ( ap2m_active )
-            __put_gfn(hostp2m, gfn);
-
         rc = 0;
         if ( unlikely(is_pvh_domain(currd)) )
-            goto out;
+            goto out_put_gfn;
 
         if ( !handle_mmio_with_translation(gla, gpa >> PAGE_SHIFT, npfec) )
             hvm_inject_hw_exception(TRAP_gp_fault, 0);
         rc = 1;
-        goto out;
+        goto out_put_gfn;
     }
 
     /* Check if the page has been paged out */
-- 
1.9.1



* [PATCH v7 2/5] x86/ioreq server: Add DMOP to map guest ram with p2m_ioreq_server to an ioreq server.
  2017-03-08 15:33 [PATCH v7 0/5] x86/ioreq server: Introduce HVMMEM_ioreq_server mem type Yu Zhang
  2017-03-08 15:33 ` [PATCH v7 1/5] x86/ioreq server: Release the p2m lock after mmio is handled Yu Zhang
@ 2017-03-08 15:33 ` Yu Zhang
  2017-03-10 15:29   ` Jan Beulich
  2017-03-08 15:33 ` [PATCH v7 3/5] x86/ioreq server: Handle read-modify-write cases for p2m_ioreq_server pages Yu Zhang
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 41+ messages in thread
From: Yu Zhang @ 2017-03-08 15:33 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Jun Nakajima, George Dunlap, Andrew Cooper,
	Tim Deegan, Paul Durrant, zhiyuan.lv, Jan Beulich

A new DMOP - XEN_DMOP_map_mem_type_to_ioreq_server - is added to
let one ioreq server claim/disclaim its responsibility for the
handling of guest pages with p2m type p2m_ioreq_server. Users
of this DMOP can specify which kinds of operations are supposed
to be emulated in a parameter named flags. Currently, this DMOP
only supports the emulation of write operations, but it can be
extended to support the emulation of read ones as well, should an
ioreq server have such a requirement in the future.

For now, we only support one ioreq server for this p2m type, so
once an ioreq server has claimed its ownership, subsequent calls
to XEN_DMOP_map_mem_type_to_ioreq_server will fail. Users can
also disclaim ownership of guest ram pages with p2m_ioreq_server
by triggering this new DMOP with the ioreq server id set to the
current owner's and the flags parameter set to 0.

Note that both XEN_DMOP_map_mem_type_to_ioreq_server and
p2m_ioreq_server are only supported for HVM guests with HAP enabled.

Also note that p2m type changes to p2m_ioreq_server are only allowed
after an ioreq server has claimed its ownership of the type.
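
A rough device-model-side sketch of claiming and disclaiming the
type (illustration only, not part of the patch; xdm_issue() is a
hypothetical helper that hands a xen_dm_op to Xen):

    /* Sketch only - assumes stdbool.h plus the Xen public headers. */
    static int map_ioreq_mem_type(domid_t domid, ioservid_t id, bool claim)
    {
        struct xen_dm_op op = { .op = XEN_DMOP_map_mem_type_to_ioreq_server };
        struct xen_dm_op_map_mem_type_to_ioreq_server *data =
            &op.u.map_mem_type_to_ioreq_server;

        data->id    = id;
        data->type  = HVMMEM_ioreq_server;
        /* Only write emulation is supported for now; flags == 0 disclaims. */
        data->flags = claim ? XEN_DMOP_IOREQ_MEM_ACCESS_WRITE : 0;

        return xdm_issue(domid, &op);   /* hypothetical transport helper */
    }

A second server trying to claim while the type is already owned gets
-EBUSY back from p2m_set_ioreq_server().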

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
Acked-by: Tim Deegan <tim@xen.org>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Paul Durrant <paul.durrant@citrix.com>
Cc: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Jun Nakajima <jun.nakajima@intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Tim Deegan <tim@xen.org>

changes in v7: 
  - Use new ioreq server interface - XEN_DMOP_map_mem_type_to_ioreq_server.
  - According to comments from George: removed domain_pause/unpause() in
    hvm_map_mem_type_to_ioreq_server(), because it's too expensive,
    and we can avoid:
    a> the deadlock between the p2m lock and the ioreq server lock, by
       using these locks in the same order - solved in patch 4;
    b> the race condition between vm exit and ioreq server unbinding, by
       simply retrying the instruction.
  - According to comments from Jan and George: continue to clarify logic in
    hvmemul_do_io().
  - According to comments from Jan: clarify comment in p2m_set_ioreq_server().

changes in v6: 
  - Clarify logic in hvmemul_do_io().
  - Use recursive lock for ioreq server lock.
  - Remove debug print when mapping ioreq server.
  - Clarify code in ept_p2m_type_to_flags() for consistency.
  - Remove definition of P2M_IOREQ_HANDLE_WRITE_ACCESS.
  - Add comments for HVMMEM_ioreq_server to note only changes
    to/from HVMMEM_ram_rw are permitted.
  - Add domain_pause/unpause() in hvm_map_mem_type_to_ioreq_server()
    to avoid the race condition when a vm exit happens on a write-
    protected page, just to find the ioreq server has been unmapped
    already.
  - Introduce a separate patch to delay the release of the p2m
    lock, to avoid the race condition.
  - Introduce a separate patch to handle the read-modify-write
    operations on a write-protected page.

changes in v5: 
  - Simplify logic in hvmemul_do_io().
  - Use natural width types instead of fixed width types when possible.
  - Do not grant executable permission for p2m_ioreq_server entries.
  - Clarify comments and commit message.
  - Introduce a separate patch to recalculate the p2m types after
    the ioreq server unmaps the p2m_ioreq_server.

changes in v4:
  - According to Paul's advice, add comments around the definition
    of HVMMEM_ioreq_server in hvm_op.h.
  - According to Wei Liu's comments, change the format of the commit
    message.

changes in v3:
  - Only support write emulation in this patch;
  - Remove the code to handle race condition in hvmemul_do_io(),
  - No need to reset the p2m type after an ioreq server has disclaimed
    its ownership of p2m_ioreq_server;
  - Only allow p2m type change to p2m_ioreq_server after an ioreq
    server has claimed its ownership of p2m_ioreq_server;
  - Only allow p2m type change to p2m_ioreq_server from pages with type
    p2m_ram_rw, and vice versa;
  - HVMOP_map_mem_type_to_ioreq_server interface change - use uint16,
    instead of enum to specify the memory type;
  - Function prototype change to p2m_get_ioreq_server();
  - Coding style changes;
  - Commit message changes;
  - Add Tim's Acked-by.

changes in v2:
  - Only support HAP enabled HVMs;
  - Replace p2m_mem_type_changed() with p2m_change_entry_type_global()
    to reset the p2m type, when an ioreq server tries to claim/disclaim
    its ownership of p2m_ioreq_server;
  - Comments changes.
---
 xen/arch/x86/hvm/dm.c            | 40 ++++++++++++++++++++--
 xen/arch/x86/hvm/emulate.c       | 64 ++++++++++++++++++++++++++++++++--
 xen/arch/x86/hvm/ioreq.c         | 38 +++++++++++++++++++++
 xen/arch/x86/mm/hap/nested_hap.c |  2 +-
 xen/arch/x86/mm/p2m-ept.c        |  8 ++++-
 xen/arch/x86/mm/p2m-pt.c         | 20 +++++++----
 xen/arch/x86/mm/p2m.c            | 74 ++++++++++++++++++++++++++++++++++++++++
 xen/arch/x86/mm/shadow/multi.c   |  3 +-
 xen/include/asm-x86/hvm/ioreq.h  |  2 ++
 xen/include/asm-x86/p2m.h        | 26 ++++++++++++--
 xen/include/public/hvm/dm_op.h   | 26 ++++++++++++++
 xen/include/public/hvm/hvm_op.h  |  8 ++++-
 12 files changed, 292 insertions(+), 19 deletions(-)

diff --git a/xen/arch/x86/hvm/dm.c b/xen/arch/x86/hvm/dm.c
index 2122c45..f97478b 100644
--- a/xen/arch/x86/hvm/dm.c
+++ b/xen/arch/x86/hvm/dm.c
@@ -173,9 +173,14 @@ static int modified_memory(struct domain *d,
 
 static bool allow_p2m_type_change(p2m_type_t old, p2m_type_t new)
 {
+    if ( new == p2m_ioreq_server )
+        return old == p2m_ram_rw;
+
+    if ( old == p2m_ioreq_server )
+        return new == p2m_ram_rw;
+
     return p2m_is_ram(old) ||
-           (p2m_is_hole(old) && new == p2m_mmio_dm) ||
-           (old == p2m_ioreq_server && new == p2m_ram_rw);
+           (p2m_is_hole(old) && new == p2m_mmio_dm);
 }
 
 static int set_mem_type(struct domain *d,
@@ -202,6 +207,19 @@ static int set_mem_type(struct domain *d,
          unlikely(data->mem_type == HVMMEM_unused) )
         return -EINVAL;
 
+    if ( data->mem_type  == HVMMEM_ioreq_server )
+    {
+        unsigned int flags;
+
+        /* HVMMEM_ioreq_server is only supported for HAP enabled hvm. */
+        if ( !hap_enabled(d) )
+            return -EINVAL;
+
+        /* Do not change to HVMMEM_ioreq_server if no ioreq server mapped. */
+        if ( !p2m_get_ioreq_server(d, &flags) )
+            return -EINVAL;
+    }
+
     while ( iter < data->nr )
     {
         unsigned long pfn = data->first_pfn + iter;
@@ -365,6 +383,24 @@ static int dm_op(domid_t domid,
         break;
     }
 
+    case XEN_DMOP_map_mem_type_to_ioreq_server:
+    {
+        const struct xen_dm_op_map_mem_type_to_ioreq_server *data =
+            &op.u.map_mem_type_to_ioreq_server;
+
+        rc = -EINVAL;
+        if ( data->pad )
+            break;
+
+        /* Only support for HAP enabled hvm. */
+        if ( !hap_enabled(d) )
+            break;
+
+        rc = hvm_map_mem_type_to_ioreq_server(d, data->id,
+                                              data->type, data->flags);
+        break;
+    }
+
     case XEN_DMOP_set_ioreq_server_state:
     {
         const struct xen_dm_op_set_ioreq_server_state *data =
diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 1c66010..fb56f7b 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -99,6 +99,7 @@ static int hvmemul_do_io(
     uint8_t dir, bool_t df, bool_t data_is_addr, uintptr_t data)
 {
     struct vcpu *curr = current;
+    struct domain *currd = curr->domain;
     struct hvm_vcpu_io *vio = &curr->arch.hvm_vcpu.hvm_io;
     ioreq_t p = {
         .type = is_mmio ? IOREQ_TYPE_COPY : IOREQ_TYPE_PIO,
@@ -140,7 +141,7 @@ static int hvmemul_do_io(
              (p.dir != dir) ||
              (p.df != df) ||
              (p.data_is_ptr != data_is_addr) )
-            domain_crash(curr->domain);
+            domain_crash(currd);
 
         if ( data_is_addr )
             return X86EMUL_UNHANDLEABLE;
@@ -177,8 +178,65 @@ static int hvmemul_do_io(
         break;
     case X86EMUL_UNHANDLEABLE:
     {
-        struct hvm_ioreq_server *s =
-            hvm_select_ioreq_server(curr->domain, &p);
+        /*
+         * Xen isn't emulating the instruction internally, so see if
+         * there's an ioreq server that can handle it. Rules:
+         *
+         * - PIO and "normal" mmio run through hvm_select_ioreq_server()
+         * to choose the ioreq server by range. If no server is found,
+         * the access is ignored.
+         *
+         * - p2m_ioreq_server accesses are handled by the current
+         * ioreq_server for the domain, but there are some corner
+         * cases:
+         *
+         *   - If the domain ioreq_server is NULL, assume this is a
+         *   race between the unbinding of ioreq server and guest fault
+         *   so re-try the instruction.
+         *
+         *   - If the IOREQ_MEM_ACCESS_WRITE flag is not set, treat it
+         *   like a normal PIO or MMIO that doesn't have an ioreq
+         *   server (i.e., by ignoring it).
+         */
+        struct hvm_ioreq_server *s = NULL;
+        p2m_type_t p2mt = p2m_invalid;
+
+        if ( is_mmio )
+        {
+            unsigned long gmfn = paddr_to_pfn(addr);
+
+            (void) get_gfn_query_unlocked(currd, gmfn, &p2mt);
+
+            if ( p2mt == p2m_ioreq_server )
+            {
+                unsigned int flags;
+
+                s = p2m_get_ioreq_server(currd, &flags);
+
+                /*
+                 * If p2mt is ioreq_server but ioreq_server is NULL,
+                 * we probably lost a race with unbinding of ioreq
+                 * server, just retry the access.
+                 */
+                if ( s == NULL )
+                {
+                    rc = X86EMUL_RETRY;
+                    vio->io_req.state = STATE_IOREQ_NONE;
+                    break;
+                }
+
+                /*
+                 * If the IOREQ_MEM_ACCESS_WRITE flag is not set,
+                 * we should set s to NULL, and just ignore such
+                 * access.
+                 */
+                if ( !(flags & XEN_DMOP_IOREQ_MEM_ACCESS_WRITE) )
+                    s = NULL;
+            }
+        }
+
+        if ( !s && p2mt != p2m_ioreq_server )
+            s = hvm_select_ioreq_server(currd, &p);
 
         /* If there is no suitable backing DM, just ignore accesses */
         if ( !s )
diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c
index ebb3eca..fcb9f38 100644
--- a/xen/arch/x86/hvm/ioreq.c
+++ b/xen/arch/x86/hvm/ioreq.c
@@ -753,6 +753,8 @@ int hvm_destroy_ioreq_server(struct domain *d, ioservid_t id)
 
         domain_pause(d);
 
+        p2m_destroy_ioreq_server(d, s);
+
         hvm_ioreq_server_disable(s, 0);
 
         list_del(&s->list_entry);
@@ -914,6 +916,42 @@ int hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id,
     return rc;
 }
 
+int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id,
+                                     uint32_t type, uint32_t flags)
+{
+    struct hvm_ioreq_server *s;
+    int rc;
+
+    /* For now, only HVMMEM_ioreq_server is supported. */
+    if ( type != HVMMEM_ioreq_server )
+        return -EINVAL;
+
+    /* For now, only write emulation is supported. */
+    if ( flags & ~(XEN_DMOP_IOREQ_MEM_ACCESS_WRITE) )
+        return -EINVAL;
+
+    spin_lock_recursive(&d->arch.hvm_domain.ioreq_server.lock);
+
+    rc = -ENOENT;
+    list_for_each_entry ( s,
+                          &d->arch.hvm_domain.ioreq_server.list,
+                          list_entry )
+    {
+        if ( s == d->arch.hvm_domain.default_ioreq_server )
+            continue;
+
+        if ( s->id == id )
+        {
+            rc = p2m_set_ioreq_server(d, flags, s);
+            break;
+        }
+    }
+
+    spin_unlock_recursive(&d->arch.hvm_domain.ioreq_server.lock);
+
+    return rc;
+}
+
 int hvm_set_ioreq_server_state(struct domain *d, ioservid_t id,
                                bool_t enabled)
 {
diff --git a/xen/arch/x86/mm/hap/nested_hap.c b/xen/arch/x86/mm/hap/nested_hap.c
index 162afed..408ea7f 100644
--- a/xen/arch/x86/mm/hap/nested_hap.c
+++ b/xen/arch/x86/mm/hap/nested_hap.c
@@ -172,7 +172,7 @@ nestedhap_walk_L0_p2m(struct p2m_domain *p2m, paddr_t L1_gpa, paddr_t *L0_gpa,
     if ( *p2mt == p2m_mmio_direct )
         goto direct_mmio_out;
     rc = NESTEDHVM_PAGEFAULT_MMIO;
-    if ( *p2mt == p2m_mmio_dm )
+    if ( *p2mt == p2m_mmio_dm || *p2mt == p2m_ioreq_server )
         goto out;
 
     rc = NESTEDHVM_PAGEFAULT_L0_ERROR;
diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
index 568944f..cc1eb21 100644
--- a/xen/arch/x86/mm/p2m-ept.c
+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -131,6 +131,13 @@ static void ept_p2m_type_to_flags(struct p2m_domain *p2m, ept_entry_t *entry,
             entry->r = entry->w = entry->x = 1;
             entry->a = entry->d = !!cpu_has_vmx_ept_ad;
             break;
+        case p2m_ioreq_server:
+            entry->r = 1;
+            entry->w = !(p2m->ioreq.flags & XEN_DMOP_IOREQ_MEM_ACCESS_WRITE);
+            entry->x = 0;
+            entry->a = !!cpu_has_vmx_ept_ad;
+            entry->d = entry->w && entry->a;
+            break;
         case p2m_mmio_direct:
             entry->r = entry->x = 1;
             entry->w = !rangeset_contains_singleton(mmio_ro_ranges,
@@ -170,7 +177,6 @@ static void ept_p2m_type_to_flags(struct p2m_domain *p2m, ept_entry_t *entry,
             entry->a = entry->d = !!cpu_has_vmx_ept_ad;
             break;
         case p2m_grant_map_ro:
-        case p2m_ioreq_server:
             entry->r = 1;
             entry->w = entry->x = 0;
             entry->a = !!cpu_has_vmx_ept_ad;
diff --git a/xen/arch/x86/mm/p2m-pt.c b/xen/arch/x86/mm/p2m-pt.c
index bbfa54e..97dc25d 100644
--- a/xen/arch/x86/mm/p2m-pt.c
+++ b/xen/arch/x86/mm/p2m-pt.c
@@ -70,7 +70,9 @@ static const unsigned long pgt[] = {
     PGT_l3_page_table
 };
 
-static unsigned long p2m_type_to_flags(p2m_type_t t, mfn_t mfn,
+static unsigned long p2m_type_to_flags(const struct p2m_domain *p2m,
+                                       p2m_type_t t,
+                                       mfn_t mfn,
                                        unsigned int level)
 {
     unsigned long flags;
@@ -92,8 +94,13 @@ static unsigned long p2m_type_to_flags(p2m_type_t t, mfn_t mfn,
     default:
         return flags | _PAGE_NX_BIT;
     case p2m_grant_map_ro:
-    case p2m_ioreq_server:
         return flags | P2M_BASE_FLAGS | _PAGE_NX_BIT;
+    case p2m_ioreq_server:
+        flags |= P2M_BASE_FLAGS | _PAGE_RW | _PAGE_NX_BIT;
+        if ( p2m->ioreq.flags & XEN_DMOP_IOREQ_MEM_ACCESS_WRITE )
+            return flags & ~_PAGE_RW;
+        else
+            return flags;
     case p2m_ram_ro:
     case p2m_ram_logdirty:
     case p2m_ram_shared:
@@ -440,7 +447,8 @@ static int do_recalc(struct p2m_domain *p2m, unsigned long gfn)
             p2m_type_t p2mt = p2m_is_logdirty_range(p2m, gfn & mask, gfn | ~mask)
                               ? p2m_ram_logdirty : p2m_ram_rw;
             unsigned long mfn = l1e_get_pfn(e);
-            unsigned long flags = p2m_type_to_flags(p2mt, _mfn(mfn), level);
+            unsigned long flags = p2m_type_to_flags(p2m, p2mt,
+                                                    _mfn(mfn), level);
 
             if ( level )
             {
@@ -578,7 +586,7 @@ p2m_pt_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
         ASSERT(!mfn_valid(mfn) || p2mt != p2m_mmio_direct);
         l3e_content = mfn_valid(mfn) || p2m_allows_invalid_mfn(p2mt)
             ? l3e_from_pfn(mfn_x(mfn),
-                           p2m_type_to_flags(p2mt, mfn, 2) | _PAGE_PSE)
+                           p2m_type_to_flags(p2m, p2mt, mfn, 2) | _PAGE_PSE)
             : l3e_empty();
         entry_content.l1 = l3e_content.l3;
 
@@ -615,7 +623,7 @@ p2m_pt_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
 
         if ( mfn_valid(mfn) || p2m_allows_invalid_mfn(p2mt) )
             entry_content = p2m_l1e_from_pfn(mfn_x(mfn),
-                                             p2m_type_to_flags(p2mt, mfn, 0));
+                                         p2m_type_to_flags(p2m, p2mt, mfn, 0));
         else
             entry_content = l1e_empty();
 
@@ -652,7 +660,7 @@ p2m_pt_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
         ASSERT(!mfn_valid(mfn) || p2mt != p2m_mmio_direct);
         if ( mfn_valid(mfn) || p2m_allows_invalid_mfn(p2mt) )
             l2e_content = l2e_from_pfn(mfn_x(mfn),
-                                       p2m_type_to_flags(p2mt, mfn, 1) |
+                                       p2m_type_to_flags(p2m, p2mt, mfn, 1) |
                                        _PAGE_PSE);
         else
             l2e_content = l2e_empty();
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 2eee9cd..0edfc61 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -82,6 +82,8 @@ static int p2m_initialise(struct domain *d, struct p2m_domain *p2m)
     else
         p2m_pt_init(p2m);
 
+    spin_lock_init(&p2m->ioreq.lock);
+
     return ret;
 }
 
@@ -286,6 +288,78 @@ void p2m_memory_type_changed(struct domain *d)
     }
 }
 
+int p2m_set_ioreq_server(struct domain *d,
+                         unsigned int flags,
+                         struct hvm_ioreq_server *s)
+{
+    struct p2m_domain *p2m = p2m_get_hostp2m(d);
+    int rc;
+
+    /*
+     * Use lock to prevent concurrent setting attempts
+     * from multiple ioreq servers.
+     */
+    spin_lock(&p2m->ioreq.lock);
+
+    /* Unmap ioreq server from p2m type by passing flags with 0. */
+    if ( flags == 0 )
+    {
+        rc = -EINVAL;
+        if ( p2m->ioreq.server != s )
+            goto out;
+
+        p2m->ioreq.server = NULL;
+        p2m->ioreq.flags = 0;
+    }
+    else
+    {
+        rc = -EBUSY;
+        if ( p2m->ioreq.server != NULL )
+            goto out;
+
+        p2m->ioreq.server = s;
+        p2m->ioreq.flags = flags;
+    }
+
+    rc = 0;
+
+ out:
+    spin_unlock(&p2m->ioreq.lock);
+
+    return rc;
+}
+
+struct hvm_ioreq_server *p2m_get_ioreq_server(struct domain *d,
+                                              unsigned int *flags)
+{
+    struct p2m_domain *p2m = p2m_get_hostp2m(d);
+    struct hvm_ioreq_server *s;
+
+    spin_lock(&p2m->ioreq.lock);
+
+    s = p2m->ioreq.server;
+    *flags = p2m->ioreq.flags;
+
+    spin_unlock(&p2m->ioreq.lock);
+    return s;
+}
+
+void p2m_destroy_ioreq_server(const struct domain *d,
+                              const struct hvm_ioreq_server *s)
+{
+    struct p2m_domain *p2m = p2m_get_hostp2m(d);
+
+    spin_lock(&p2m->ioreq.lock);
+
+    if ( p2m->ioreq.server == s )
+    {
+        p2m->ioreq.server = NULL;
+        p2m->ioreq.flags = 0;
+    }
+
+    spin_unlock(&p2m->ioreq.lock);
+}
+
 void p2m_enable_hardware_log_dirty(struct domain *d)
 {
     struct p2m_domain *p2m = p2m_get_hostp2m(d);
diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c
index 128809d..7414060 100644
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -3211,8 +3211,7 @@ static int sh_page_fault(struct vcpu *v,
     }
 
     /* Need to hand off device-model MMIO to the device model */
-    if ( p2mt == p2m_mmio_dm
-         || (p2mt == p2m_ioreq_server && ft == ft_demand_write) )
+    if ( p2mt == p2m_mmio_dm )
     {
         gpa = guest_walk_to_gpa(&gw);
         goto mmio;
diff --git a/xen/include/asm-x86/hvm/ioreq.h b/xen/include/asm-x86/hvm/ioreq.h
index fbf2c74..b43667a 100644
--- a/xen/include/asm-x86/hvm/ioreq.h
+++ b/xen/include/asm-x86/hvm/ioreq.h
@@ -37,6 +37,8 @@ int hvm_map_io_range_to_ioreq_server(struct domain *d, ioservid_t id,
 int hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id,
                                          uint32_t type, uint64_t start,
                                          uint64_t end);
+int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id,
+                                     uint32_t type, uint32_t flags);
 int hvm_set_ioreq_server_state(struct domain *d, ioservid_t id,
                                bool_t enabled);
 
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index 470d29d..3786680 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -89,7 +89,8 @@ typedef unsigned int p2m_query_t;
                        | p2m_to_mask(p2m_ram_paging_out)      \
                        | p2m_to_mask(p2m_ram_paged)           \
                        | p2m_to_mask(p2m_ram_paging_in)       \
-                       | p2m_to_mask(p2m_ram_shared))
+                       | p2m_to_mask(p2m_ram_shared)          \
+                       | p2m_to_mask(p2m_ioreq_server))
 
 /* Types that represent a physmap hole that is ok to replace with a shared
  * entry */
@@ -111,8 +112,7 @@ typedef unsigned int p2m_query_t;
 #define P2M_RO_TYPES (p2m_to_mask(p2m_ram_logdirty)     \
                       | p2m_to_mask(p2m_ram_ro)         \
                       | p2m_to_mask(p2m_grant_map_ro)   \
-                      | p2m_to_mask(p2m_ram_shared)     \
-                      | p2m_to_mask(p2m_ioreq_server))
+                      | p2m_to_mask(p2m_ram_shared))
 
 /* Write-discard types, which should discard the write operations */
 #define P2M_DISCARD_WRITE_TYPES (p2m_to_mask(p2m_ram_ro)     \
@@ -336,6 +336,20 @@ struct p2m_domain {
         struct ept_data ept;
         /* NPT-equivalent structure could be added here. */
     };
+
+     struct {
+         spinlock_t lock;
+         /*
+          * ioreq server who's responsible for the emulation of
+          * gfns with a specific p2m type (for now, p2m_ioreq_server).
+          */
+         struct hvm_ioreq_server *server;
+         /*
+          * flags specifies whether read, write or both operations
+          * are to be emulated by an ioreq server.
+          */
+         unsigned int flags;
+     } ioreq;
 };
 
 /* get host p2m table */
@@ -827,6 +841,12 @@ static inline unsigned int p2m_get_iommu_flags(p2m_type_t p2mt, mfn_t mfn)
     return flags;
 }
 
+int p2m_set_ioreq_server(struct domain *d, unsigned int flags,
+                         struct hvm_ioreq_server *s);
+struct hvm_ioreq_server *p2m_get_ioreq_server(struct domain *d,
+                                              unsigned int *flags);
+void p2m_destroy_ioreq_server(const struct domain *d, const struct hvm_ioreq_server *s);
+
 #endif /* _XEN_ASM_X86_P2M_H */
 
 /*
diff --git a/xen/include/public/hvm/dm_op.h b/xen/include/public/hvm/dm_op.h
index f54cece..c643b67 100644
--- a/xen/include/public/hvm/dm_op.h
+++ b/xen/include/public/hvm/dm_op.h
@@ -318,6 +318,30 @@ struct xen_dm_op_inject_msi {
     uint64_aligned_t addr;
 };
 
+/*
+ * XEN_DMOP_map_mem_type_to_ioreq_server : map or unmap the IOREQ Server <id>
+ *                                      to specific memory type <type>
+ *                                      for specific accesses <flags>
+ *
+ * For now, flags only accepts the value of XEN_DMOP_IOREQ_MEM_ACCESS_WRITE,
+ * which means only write operations are to be forwarded to an ioreq server.
+ * Support for the emulation of read operations can be added when an ioreq
+ * server has such a requirement in the future.
+ */
+#define XEN_DMOP_map_mem_type_to_ioreq_server 15
+
+struct xen_dm_op_map_mem_type_to_ioreq_server {
+    ioservid_t id;      /* IN - ioreq server id */
+    uint16_t type;      /* IN - memory type */
+    uint16_t pad;
+    uint32_t flags;     /* IN - types of accesses to be forwarded to the
+                           ioreq server. flags with 0 means to unmap the
+                           ioreq server */
+
+#define XEN_DMOP_IOREQ_MEM_ACCESS_READ (1u << 0)
+#define XEN_DMOP_IOREQ_MEM_ACCESS_WRITE (1u << 1)
+};
+
 struct xen_dm_op {
     uint32_t op;
     uint32_t pad;
@@ -336,6 +360,8 @@ struct xen_dm_op {
         struct xen_dm_op_set_mem_type set_mem_type;
         struct xen_dm_op_inject_event inject_event;
         struct xen_dm_op_inject_msi inject_msi;
+        struct xen_dm_op_map_mem_type_to_ioreq_server
+                map_mem_type_to_ioreq_server;
     } u;
 };
 
diff --git a/xen/include/public/hvm/hvm_op.h b/xen/include/public/hvm/hvm_op.h
index bc00ef0..0bdafdf 100644
--- a/xen/include/public/hvm/hvm_op.h
+++ b/xen/include/public/hvm/hvm_op.h
@@ -93,7 +93,13 @@ typedef enum {
     HVMMEM_unused,             /* Placeholder; setting memory to this type
                                   will fail for code after 4.7.0 */
 #endif
-    HVMMEM_ioreq_server
+    HVMMEM_ioreq_server        /* Memory type claimed by an ioreq server; type
+                                  changes to this value are only allowed after
+                                  an ioreq server has claimed its ownership.
+                                  Only pages with HVMMEM_ram_rw are allowed to
+                                  change to this type; conversely, pages with
+                                  this type are only allowed to be changed back
+                                  to HVMMEM_ram_rw. */
 } hvmmem_type_t;
 
 /* Hint from PV drivers for pagetable destruction. */
-- 
1.9.1



* [PATCH v7 3/5] x86/ioreq server: Handle read-modify-write cases for p2m_ioreq_server pages.
  2017-03-08 15:33 [PATCH v7 0/5] x86/ioreq server: Introduce HVMMEM_ioreq_server mem type Yu Zhang
  2017-03-08 15:33 ` [PATCH v7 1/5] x86/ioreq server: Release the p2m lock after mmio is handled Yu Zhang
  2017-03-08 15:33 ` [PATCH v7 2/5] x86/ioreq server: Add DMOP to map guest ram with p2m_ioreq_server to an ioreq server Yu Zhang
@ 2017-03-08 15:33 ` Yu Zhang
  2017-03-10 15:33   ` Jan Beulich
  2017-03-08 15:33 ` [PATCH v7 4/5] x86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries Yu Zhang
  2017-03-08 15:33 ` [PATCH v7 5/5] x86/ioreq server: Synchronously reset outstanding p2m_ioreq_server entries when an ioreq server unmaps Yu Zhang
  4 siblings, 1 reply; 41+ messages in thread
From: Yu Zhang @ 2017-03-08 15:33 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Paul Durrant, zhiyuan.lv, Jan Beulich

In ept_handle_violation(), write violations are also treated as
read violations. So when a VM accesses a write-protected address
with a read-modify-write instruction, the read emulation process
is triggered first.

For p2m_ioreq_server pages, the ioreq server currently only forwards
write operations to the device model. Therefore, when such a page is
accessed by a read-modify-write instruction, the read part should be
emulated here in the hypervisor. This patch provides such a handler
to copy the data to the buffer.

Note: MMIOs with p2m_mmio_dm type do not need such special treatment
because both reads and writes go to the device model.
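
For concreteness, a typical trigger is a single locked
read-modify-write in the guest driver touching a write-protected
page - purely illustrative code, not part of this patch:

    #include <stdint.h>

    /* Hypothetical guest-side example: one atomic RMW on a tracked entry. */
    static void set_gtt_present_bit(volatile uint64_t *gtt_entry)
    {
        /*
         * Compiles to a locked "or" on x86. The write fault is also
         * treated as a read, so with this patch Xen emulates the read
         * half itself (ioreq_server_read) and forwards only the write
         * to the ioreq server.
         */
        __atomic_fetch_or(gtt_entry, (uint64_t)1, __ATOMIC_SEQ_CST);
    }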

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
---
Cc: Paul Durrant <paul.durrant@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>

changes in v2: 
  - According to comments from Jan: rename mem_ops to ioreq_server_ops.
  - According to comments from Jan: use hvm_copy_from_guest_phys() in
    ioreq_server_read(), instead of open-coding it.
---
 xen/arch/x86/hvm/emulate.c | 35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index fb56f7b..9744dcb 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -94,6 +94,26 @@ static const struct hvm_io_handler null_handler = {
     .ops = &null_ops
 };
 
+static int ioreq_server_read(const struct hvm_io_handler *io_handler,
+                    uint64_t addr,
+                    uint32_t size,
+                    uint64_t *data)
+{
+    if ( hvm_copy_from_guest_phys(data, addr, size) != HVMCOPY_okay )
+        return X86EMUL_UNHANDLEABLE;
+
+    return X86EMUL_OKAY;
+}
+
+static const struct hvm_io_ops ioreq_server_ops = {
+    .read = ioreq_server_read,
+    .write = null_write
+};
+
+static const struct hvm_io_handler ioreq_server_handler = {
+    .ops = &ioreq_server_ops
+};
+
 static int hvmemul_do_io(
     bool_t is_mmio, paddr_t addr, unsigned long *reps, unsigned int size,
     uint8_t dir, bool_t df, bool_t data_is_addr, uintptr_t data)
@@ -197,6 +217,10 @@ static int hvmemul_do_io(
          *   - If the IOREQ_MEM_ACCESS_WRITE flag is not set, treat it
          *   like a normal PIO or MMIO that doesn't have an ioreq
          *   server (i.e., by ignoring it).
+         *
+         *   - If the access is a read, this could be part of a
+         *   read-modify-write instruction, emulate the read so that we
+         *   have it.
          */
         struct hvm_ioreq_server *s = NULL;
         p2m_type_t p2mt = p2m_invalid;
@@ -226,6 +250,17 @@ static int hvmemul_do_io(
                 }
 
                 /*
+                 * This is part of a read-modify-write instruction.
+                 * Emulate the read part so we have the value cached.
+                 */
+                if ( dir == IOREQ_READ )
+                {
+                    rc = hvm_process_io_intercept(&ioreq_server_handler, &p);
+                    vio->io_req.state = STATE_IOREQ_NONE;
+                    break;
+                }
+
+                /*
                  * If the IOREQ_MEM_ACCESS_WRITE flag is not set,
                  * we should set s to NULL, and just ignore such
                  * access.
-- 
1.9.1



* [PATCH v7 4/5] x86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries.
  2017-03-08 15:33 [PATCH v7 0/5] x86/ioreq server: Introduce HVMMEM_ioreq_server mem type Yu Zhang
                   ` (2 preceding siblings ...)
  2017-03-08 15:33 ` [PATCH v7 3/5] x86/ioreq server: Handle read-modify-write cases for p2m_ioreq_server pages Yu Zhang
@ 2017-03-08 15:33 ` Yu Zhang
  2017-03-10 16:03   ` Jan Beulich
  2017-03-08 15:33 ` [PATCH v7 5/5] x86/ioreq server: Synchronously reset outstanding p2m_ioreq_server entries when an ioreq server unmaps Yu Zhang
  4 siblings, 1 reply; 41+ messages in thread
From: Yu Zhang @ 2017-03-08 15:33 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Jun Nakajima, George Dunlap, Andrew Cooper,
	Paul Durrant, zhiyuan.lv, Jan Beulich

After an ioreq server has unmapped, the remaining p2m_ioreq_server
entries need to be reset back to p2m_ram_rw. This patch does this
asynchronously with the current p2m_change_entry_type_global()
interface: outstanding entries are merely marked for recalculation,
and each one is converted back the next time it is accessed.

This patch also disallows live migration when there are still
outstanding p2m_ioreq_server entries. The core reason is that our
current implementation of p2m_change_entry_type_global() cannot
tell the state of p2m_ioreq_server entries (it cannot decide whether
an entry is to be emulated or to be resynced).
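
The accounting can be pictured with a small user-space model (toy
code only, not hypervisor code): entry_count follows the number of
gfns still holding p2m_ioreq_server, and global log-dirty mode is
refused while it is non-zero.

    #include <assert.h>

    static long entry_count;            /* mirrors p2m->ioreq.entry_count */

    static void type_to_ioreq_server(void) { entry_count++; }
    static void recalc_to_ram_rw(void) { entry_count--; assert(entry_count >= 0); }
    static int log_dirty_allowed(void) { return entry_count == 0; }

    int main(void)
    {
        type_to_ioreq_server();
        assert(!log_dirty_allowed());  /* hap_enable_log_dirty() -> -EBUSY */
        recalc_to_ram_rw();            /* lazy reset on next access/recalc */
        assert(log_dirty_allowed());   /* clean p2m, migration can proceed */
        return 0;
    }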

Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
---
Cc: Paul Durrant <paul.durrant@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Jun Nakajima <jun.nakajima@intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>

changes in v3: 
  - Move the synchronously resetting logic into patch 5.
  - According to comments from Jan: introduce p2m_check_changeable()
    to clarify the p2m type change code.
  - According to comments from George: use locks in the same order
    to avoid deadlock, call p2m_change_entry_type_global() after unmap
    of the ioreq server is finished.

changes in v2: 
  - Move the calculation of the ioreq server page entry_count into
    p2m_change_type_one() so that we do not need a separate lock.
    Note: entry_count is also updated in resolve_misconfig()/
    do_recalc(); fortunately, callers of both routines already hold
    the p2m lock.
  - Simplify logic in hvmop_set_mem_type().
  - Introduce routine p2m_finish_type_change() to walk the p2m
    table and do the p2m reset.
---
 xen/arch/x86/hvm/ioreq.c  |  8 ++++++++
 xen/arch/x86/mm/hap/hap.c |  9 +++++++++
 xen/arch/x86/mm/p2m-ept.c |  8 +++++++-
 xen/arch/x86/mm/p2m-pt.c  | 13 +++++++++++--
 xen/arch/x86/mm/p2m.c     | 20 ++++++++++++++++++++
 xen/include/asm-x86/p2m.h |  9 ++++++++-
 6 files changed, 63 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c
index fcb9f38..c129eb4 100644
--- a/xen/arch/x86/hvm/ioreq.c
+++ b/xen/arch/x86/hvm/ioreq.c
@@ -949,6 +949,14 @@ int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id,
 
     spin_unlock_recursive(&d->arch.hvm_domain.ioreq_server.lock);
 
+    if ( rc == 0 && flags == 0 )
+    {
+        struct p2m_domain *p2m = p2m_get_hostp2m(d);
+
+        if ( read_atomic(&p2m->ioreq.entry_count) )
+            p2m_change_entry_type_global(d, p2m_ioreq_server, p2m_ram_rw);
+    }
+
     return rc;
 }
 
diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
index a57b385..f27a56f 100644
--- a/xen/arch/x86/mm/hap/hap.c
+++ b/xen/arch/x86/mm/hap/hap.c
@@ -187,6 +187,15 @@ out:
  */
 static int hap_enable_log_dirty(struct domain *d, bool_t log_global)
 {
+    struct p2m_domain *p2m = p2m_get_hostp2m(d);
+
+    /*
+     * Refuse to turn on global log-dirty mode if
+     * there are outstanding p2m_ioreq_server pages.
+     */
+    if ( log_global && read_atomic(&p2m->ioreq.entry_count) )
+        return -EBUSY;
+
     /* turn on PG_log_dirty bit in paging mode */
     paging_lock(d);
     d->arch.paging.mode |= PG_log_dirty;
diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
index cc1eb21..1df3d09 100644
--- a/xen/arch/x86/mm/p2m-ept.c
+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -544,6 +544,12 @@ static int resolve_misconfig(struct p2m_domain *p2m, unsigned long gfn)
                     e.ipat = ipat;
                     if ( e.recalc && p2m_is_changeable(e.sa_p2mt) )
                     {
+                         if ( e.sa_p2mt == p2m_ioreq_server )
+                         {
+                             p2m->ioreq.entry_count--;
+                             ASSERT(p2m->ioreq.entry_count >= 0);
+                         }
+
                          e.sa_p2mt = p2m_is_logdirty_range(p2m, gfn + i, gfn + i)
                                      ? p2m_ram_logdirty : p2m_ram_rw;
                          ept_p2m_type_to_flags(p2m, &e, e.sa_p2mt, e.access);
@@ -965,7 +971,7 @@ static mfn_t ept_get_entry(struct p2m_domain *p2m,
     if ( is_epte_valid(ept_entry) )
     {
         if ( (recalc || ept_entry->recalc) &&
-             p2m_is_changeable(ept_entry->sa_p2mt) )
+             p2m_check_changeable(ept_entry->sa_p2mt) )
             *t = p2m_is_logdirty_range(p2m, gfn, gfn) ? p2m_ram_logdirty
                                                       : p2m_ram_rw;
         else
diff --git a/xen/arch/x86/mm/p2m-pt.c b/xen/arch/x86/mm/p2m-pt.c
index 97dc25d..d9a7b76 100644
--- a/xen/arch/x86/mm/p2m-pt.c
+++ b/xen/arch/x86/mm/p2m-pt.c
@@ -437,11 +437,13 @@ static int do_recalc(struct p2m_domain *p2m, unsigned long gfn)
          needs_recalc(l1, *pent) )
     {
         l1_pgentry_t e = *pent;
+        p2m_type_t p2mt_old;
 
         if ( !valid_recalc(l1, e) )
             P2M_DEBUG("bogus recalc leaf at d%d:%lx:%u\n",
                       p2m->domain->domain_id, gfn, level);
-        if ( p2m_is_changeable(p2m_flags_to_type(l1e_get_flags(e))) )
+        p2mt_old = p2m_flags_to_type(l1e_get_flags(e));
+        if ( p2m_is_changeable(p2mt_old) )
         {
             unsigned long mask = ~0UL << (level * PAGETABLE_ORDER);
             p2m_type_t p2mt = p2m_is_logdirty_range(p2m, gfn & mask, gfn | ~mask)
@@ -461,6 +463,13 @@ static int do_recalc(struct p2m_domain *p2m, unsigned long gfn)
                      mfn &= ~(_PAGE_PSE_PAT >> PAGE_SHIFT);
                 flags |= _PAGE_PSE;
             }
+
+            if ( p2mt_old == p2m_ioreq_server )
+            {
+                p2m->ioreq.entry_count--;
+                ASSERT(p2m->ioreq.entry_count >= 0);
+            }
+
             e = l1e_from_pfn(mfn, flags);
             p2m_add_iommu_flags(&e, level,
                                 (p2mt == p2m_ram_rw)
@@ -730,7 +739,7 @@ p2m_pt_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
 static inline p2m_type_t recalc_type(bool_t recalc, p2m_type_t t,
                                      struct p2m_domain *p2m, unsigned long gfn)
 {
-    if ( !recalc || !p2m_is_changeable(t) )
+    if ( !recalc || !p2m_check_changeable(t) )
         return t;
     return p2m_is_logdirty_range(p2m, gfn, gfn) ? p2m_ram_logdirty
                                                 : p2m_ram_rw;
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 0edfc61..94d7141 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -954,6 +954,26 @@ int p2m_change_type_one(struct domain *d, unsigned long gfn,
                          p2m->default_access)
          : -EBUSY;
 
+    if ( !rc )
+    {
+        switch ( nt )
+        {
+        case p2m_ram_rw:
+            if ( ot == p2m_ioreq_server )
+            {
+                p2m->ioreq.entry_count--;
+                ASSERT(p2m->ioreq.entry_count >= 0);
+            }
+            break;
+        case p2m_ioreq_server:
+            if ( ot == p2m_ram_rw )
+                p2m->ioreq.entry_count++;
+            break;
+        default:
+            break;
+        }
+    }
+
     gfn_unlock(p2m, gfn, 0);
 
     return rc;
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index 3786680..395f125 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -120,7 +120,10 @@ typedef unsigned int p2m_query_t;
 
 /* Types that can be subject to bulk transitions. */
 #define P2M_CHANGEABLE_TYPES (p2m_to_mask(p2m_ram_rw) \
-                              | p2m_to_mask(p2m_ram_logdirty) )
+                              | p2m_to_mask(p2m_ram_logdirty) \
+                              | p2m_to_mask(p2m_ioreq_server) )
+
+#define P2M_IOREQ_TYPES (p2m_to_mask(p2m_ioreq_server))
 
 #define P2M_POD_TYPES (p2m_to_mask(p2m_populate_on_demand))
 
@@ -157,6 +160,7 @@ typedef unsigned int p2m_query_t;
 #define p2m_is_readonly(_t) (p2m_to_mask(_t) & P2M_RO_TYPES)
 #define p2m_is_discard_write(_t) (p2m_to_mask(_t) & P2M_DISCARD_WRITE_TYPES)
 #define p2m_is_changeable(_t) (p2m_to_mask(_t) & P2M_CHANGEABLE_TYPES)
+#define p2m_is_ioreq(_t) (p2m_to_mask(_t) & P2M_IOREQ_TYPES)
 #define p2m_is_pod(_t) (p2m_to_mask(_t) & P2M_POD_TYPES)
 #define p2m_is_grant(_t) (p2m_to_mask(_t) & P2M_GRANT_TYPES)
 /* Grant types are *not* considered valid, because they can be
@@ -178,6 +182,8 @@ typedef unsigned int p2m_query_t;
 
 #define p2m_allows_invalid_mfn(t) (p2m_to_mask(t) & P2M_INVALID_MFN_TYPES)
 
+#define p2m_check_changeable(t) (p2m_is_changeable(t) && !p2m_is_ioreq(t))
+
 typedef enum {
     p2m_host,
     p2m_nested,
@@ -349,6 +355,7 @@ struct p2m_domain {
           * are to be emulated by an ioreq server.
           */
          unsigned int flags;
+         long entry_count;
      } ioreq;
 };
 
-- 
1.9.1



* [PATCH v7 5/5] x86/ioreq server: Synchronously reset outstanding p2m_ioreq_server entries when an ioreq server unmaps.
  2017-03-08 15:33 [PATCH v7 0/5] x86/ioreq server: Introduce HVMMEM_ioreq_server mem type Yu Zhang
                   ` (3 preceding siblings ...)
  2017-03-08 15:33 ` [PATCH v7 4/5] x86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries Yu Zhang
@ 2017-03-08 15:33 ` Yu Zhang
  2017-03-10 16:17   ` Jan Beulich
  2017-03-10 16:59   ` Andrew Cooper
  4 siblings, 2 replies; 41+ messages in thread
From: Yu Zhang @ 2017-03-08 15:33 UTC (permalink / raw)
  To: xen-devel
  Cc: George Dunlap, Andrew Cooper, Paul Durrant, zhiyuan.lv, Jan Beulich

After an ioreq server has unmapped, the remaining p2m_ioreq_server
entries need to be reset back to p2m_ram_rw. This patch does this
synchronously by iterating over the p2m table.

The synchronous reset is necessary because we need to guarantee
the p2m table is clean before another ioreq server is mapped. And
since sweeping the p2m table can be time consuming, it is done
with hypercall continuation.
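
The continuation encoding used in the dm.c hunk below can be shown
with a standalone snippet (illustration only): the low 8 bits of
op.op keep the DMOP number, while the upper bits carry the gfn to
resume from; the two never collide because the stored gfn is always
a multiple of DMOP_op_mask + 1.

    #include <assert.h>
    #include <stdint.h>

    #define DMOP_op_mask 0xff

    int main(void)
    {
        uint64_t op = 15;             /* XEN_DMOP_map_mem_type_to_ioreq_server */
        uint64_t resume_gfn = 0x4300; /* always a multiple of DMOP_op_mask + 1 */

        op |= resume_gfn;             /* what dm_op() does before -ERESTART */

        assert((op & DMOP_op_mask) == 15);                    /* op recovered */
        assert((op & ~(uint64_t)DMOP_op_mask) == resume_gfn); /* resume point */
        return 0;
    }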

Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
---
Cc: Paul Durrant <paul.durrant@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <george.dunlap@eu.citrix.com>

changes in v1: 
  - This patch is split out from patch 4 of the last version.
  - According to comments from Jan: update gfn_start when using
    hypercall continuation to reset the p2m type.
  - According to comments from Jan: use min() to compare gfn_end
    and the max mapped pfn in p2m_finish_type_change().
---
 xen/arch/x86/hvm/dm.c          | 43 +++++++++++++++++++++++++++++++++++++-----
 xen/arch/x86/mm/p2m.c          | 29 ++++++++++++++++++++++++++++
 xen/include/asm-x86/p2m.h      |  5 +++++
 xen/include/public/hvm/dm_op.h |  3 +--
 4 files changed, 73 insertions(+), 7 deletions(-)

diff --git a/xen/arch/x86/hvm/dm.c b/xen/arch/x86/hvm/dm.c
index f97478b..a92d5d7 100644
--- a/xen/arch/x86/hvm/dm.c
+++ b/xen/arch/x86/hvm/dm.c
@@ -288,6 +288,7 @@ static int inject_event(struct domain *d,
     return 0;
 }
 
+#define DMOP_op_mask 0xff
 static int dm_op(domid_t domid,
                  unsigned int nr_bufs,
                  xen_dm_op_buf_t bufs[])
@@ -315,10 +316,8 @@ static int dm_op(domid_t domid,
     }
 
     rc = -EINVAL;
-    if ( op.pad )
-        goto out;
 
-    switch ( op.op )
+    switch ( op.op & DMOP_op_mask )
     {
     case XEN_DMOP_create_ioreq_server:
     {
@@ -387,6 +386,10 @@ static int dm_op(domid_t domid,
     {
         const struct xen_dm_op_map_mem_type_to_ioreq_server *data =
             &op.u.map_mem_type_to_ioreq_server;
+        unsigned long gfn_start = op.op & ~DMOP_op_mask;
+        unsigned long gfn_end;
+
+        const_op = false;
 
         rc = -EINVAL;
         if ( data->pad )
@@ -396,8 +399,38 @@ static int dm_op(domid_t domid,
         if ( !hap_enabled(d) )
             break;
 
-        rc = hvm_map_mem_type_to_ioreq_server(d, data->id,
-                                              data->type, data->flags);
+        if ( gfn_start == 0 )
+            rc = hvm_map_mem_type_to_ioreq_server(d, data->id,
+                                                  data->type, data->flags);
+        /*
+         * Iterate p2m table when an ioreq server unmaps from p2m_ioreq_server,
+         * and reset the remaining p2m_ioreq_server entries back to p2m_ram_rw.
+         */
+        if ( (gfn_start > 0) || (data->flags == 0 && rc == 0) )
+        {
+            struct p2m_domain *p2m = p2m_get_hostp2m(d);
+
+            while ( read_atomic(&p2m->ioreq.entry_count) &&
+                    gfn_start <= p2m->max_mapped_pfn )
+            {
+                gfn_end = gfn_start + DMOP_op_mask;
+
+                p2m_finish_type_change(d, gfn_start, gfn_end,
+                                       p2m_ioreq_server, p2m_ram_rw);
+
+                gfn_start = gfn_end + 1;
+
+                /* Check for continuation if it's not the last iteration. */
+                if ( gfn_start <= p2m->max_mapped_pfn &&
+                     hypercall_preempt_check() )
+                {
+                    rc = -ERESTART;
+                    op.op |= gfn_start;
+                    break;
+                }
+            }
+        }
+
         break;
     }
 
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 94d7141..9a81f00 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -1038,6 +1038,35 @@ void p2m_change_type_range(struct domain *d,
     p2m_unlock(p2m);
 }
 
+/* Synchronously modify the p2m type of a range of gfns from ot to nt. */
+void p2m_finish_type_change(struct domain *d,
+                            unsigned long start, unsigned long end,
+                            p2m_type_t ot, p2m_type_t nt)
+{
+    struct p2m_domain *p2m = p2m_get_hostp2m(d);
+    p2m_type_t t;
+    unsigned long gfn = start;
+
+    ASSERT(start <= end);
+    ASSERT(ot != nt);
+    ASSERT(p2m_is_changeable(ot) && p2m_is_changeable(nt));
+
+    p2m_lock(p2m);
+
+    end = min(end, p2m->max_mapped_pfn);
+    while ( gfn <= end )
+    {
+        get_gfn_query_unlocked(d, gfn, &t);
+
+        if ( t == ot )
+            p2m_change_type_one(d, gfn, t, nt);
+
+        gfn++;
+    }
+
+    p2m_unlock(p2m);
+}
+
 /*
  * Returns:
  *    0              for success
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index 395f125..3eadd89 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -611,6 +611,11 @@ void p2m_change_type_range(struct domain *d,
 int p2m_change_type_one(struct domain *d, unsigned long gfn,
                         p2m_type_t ot, p2m_type_t nt);
 
+/* Synchronously change types across a range of p2m entries (start ... end) */
+void p2m_finish_type_change(struct domain *d,
+                            unsigned long start, unsigned long end,
+                            p2m_type_t ot, p2m_type_t nt);
+
 /* Report a change affecting memory types. */
 void p2m_memory_type_changed(struct domain *d);
 
diff --git a/xen/include/public/hvm/dm_op.h b/xen/include/public/hvm/dm_op.h
index c643b67..23b364b 100644
--- a/xen/include/public/hvm/dm_op.h
+++ b/xen/include/public/hvm/dm_op.h
@@ -343,8 +343,7 @@ struct xen_dm_op_map_mem_type_to_ioreq_server {
 };
 
 struct xen_dm_op {
-    uint32_t op;
-    uint32_t pad;
+    uint64_t op;
     union {
         struct xen_dm_op_create_ioreq_server create_ioreq_server;
         struct xen_dm_op_get_ioreq_server_info get_ioreq_server_info;
-- 
1.9.1



* Re: [PATCH v7 2/5] x86/ioreq server: Add DMOP to map guest ram with p2m_ioreq_server to an ioreq server.
  2017-03-08 15:33 ` [PATCH v7 2/5] x86/ioreq server: Add DMOP to map guest ram with p2m_ioreq_server to an ioreq server Yu Zhang
@ 2017-03-10 15:29   ` Jan Beulich
  2017-03-11  8:42     ` Yu Zhang
  0 siblings, 1 reply; 41+ messages in thread
From: Jan Beulich @ 2017-03-10 15:29 UTC (permalink / raw)
  To: Yu Zhang
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, Tim Deegan, xen-devel,
	Paul Durrant, zhiyuan.lv, Jun Nakajima

>>> On 08.03.17 at 16:33, <yu.c.zhang@linux.intel.com> wrote:
> changes in v7: 
>   - Use new ioreq server interface - XEN_DMOP_map_mem_type_to_ioreq_server.
>   - According to comments from George: removed domain_pause/unpause() in
>     hvm_map_mem_type_to_ioreq_server(), because it's too expensive,
>     and we can avoid the: 
>     a> deadlock between p2m lock and ioreq server lock by using these locks
>        in the same order - solved in patch 4;

That is, until patch 4 there is deadlock potential? I think you want to
re-order the patches if so. Or was it that the type can't really be used
until the last patch of the series? (I'm sorry, it's been quite a while
since the previous version.)

> @@ -365,6 +383,24 @@ static int dm_op(domid_t domid,
>          break;
>      }
>  
> +    case XEN_DMOP_map_mem_type_to_ioreq_server:
> +    {
> +        const struct xen_dm_op_map_mem_type_to_ioreq_server *data =
> +            &op.u.map_mem_type_to_ioreq_server;
> +
> +        rc = -EINVAL;
> +        if ( data->pad )
> +            break;
> +
> +        /* Only support for HAP enabled hvm. */
> +        if ( !hap_enabled(d) )
> +            break;

Perhaps better to give an error other than -EINVAL in this case?
If so, then the same error should likely also be used in your
set_mem_type() addition.

> --- a/xen/arch/x86/hvm/emulate.c
> +++ b/xen/arch/x86/hvm/emulate.c
> @@ -99,6 +99,7 @@ static int hvmemul_do_io(
>      uint8_t dir, bool_t df, bool_t data_is_addr, uintptr_t data)
>  {
>      struct vcpu *curr = current;
> +    struct domain *currd = curr->domain;
>      struct hvm_vcpu_io *vio = &curr->arch.hvm_vcpu.hvm_io;
>      ioreq_t p = {
>          .type = is_mmio ? IOREQ_TYPE_COPY : IOREQ_TYPE_PIO,
> @@ -140,7 +141,7 @@ static int hvmemul_do_io(
>               (p.dir != dir) ||
>               (p.df != df) ||
>               (p.data_is_ptr != data_is_addr) )
> -            domain_crash(curr->domain);
> +            domain_crash(currd);

If you mean to do this transformation here, then please do so
consistently for the entire function.

> @@ -177,8 +178,65 @@ static int hvmemul_do_io(
>          break;
>      case X86EMUL_UNHANDLEABLE:
>      {
> -        struct hvm_ioreq_server *s =
> -            hvm_select_ioreq_server(curr->domain, &p);
> +        /*
> +         * Xen isn't emulating the instruction internally, so see if
> +         * there's an ioreq server that can handle it. Rules:
> +         *
> +         * - PIO and "normal" mmio run through hvm_select_ioreq_server()
> +         * to choose the ioreq server by range. If no server is found,
> +         * the access is ignored.
> +         *
> +         * - p2m_ioreq_server accesses are handled by the current
> +         * ioreq_server for the domain, but there are some corner
> +         * cases:

Who or what is "the current ioreq_server for the domain"?

> +         *   - If the domain ioreq_server is NULL, assume this is a
> +         *   race between the unbinding of ioreq server and guest fault
> +         *   so re-try the instruction.
> +         *
> +         *   - If the IOREQ_MEM_ACCESS_WRITE flag is not set, treat it
> +         *   like a normal PIO or MMIO that doesn't have an ioreq
> +         *   server (i.e., by ignoring it).
> +         */
> +        struct hvm_ioreq_server *s = NULL;
> +        p2m_type_t p2mt = p2m_invalid;
> +
> +        if ( is_mmio )
> +        {
> +            unsigned long gmfn = paddr_to_pfn(addr);
> +
> +            (void) get_gfn_query_unlocked(currd, gmfn, &p2mt);

Stray cast.

> +            if ( p2mt == p2m_ioreq_server )
> +            {
> +                unsigned int flags;
> +
> +                s = p2m_get_ioreq_server(currd, &flags);
> +
> +                /*
> +                 * If p2mt is ioreq_server but ioreq_server is NULL,
> +                 * we probably lost a race with unbinding of ioreq
> +                 * server, just retry the access.
> +                 */
> +                if ( s == NULL )
> +                {
> +                    rc = X86EMUL_RETRY;
> +                    vio->io_req.state = STATE_IOREQ_NONE;
> +                    break;
> +                }
> +
> +                /*
> +                 * If the IOREQ_MEM_ACCESS_WRITE flag is not set,
> +                 * we should set s to NULL, and just ignore such
> +                 * access.
> +                 */
> +                if ( !(flags & XEN_DMOP_IOREQ_MEM_ACCESS_WRITE) )
> +                    s = NULL;

What is this about? You only allow WRITE registrations, so this looks
to be dead code. Yet if it is meant to guard against future enabling
of READ, then this clearly should not be done for reads.

> --- a/xen/arch/x86/mm/p2m-ept.c
> +++ b/xen/arch/x86/mm/p2m-ept.c
> @@ -131,6 +131,13 @@ static void ept_p2m_type_to_flags(struct p2m_domain *p2m, ept_entry_t *entry,
>              entry->r = entry->w = entry->x = 1;
>              entry->a = entry->d = !!cpu_has_vmx_ept_ad;
>              break;
> +        case p2m_ioreq_server:
> +            entry->r = 1;
> +            entry->w = !(p2m->ioreq.flags & XEN_DMOP_IOREQ_MEM_ACCESS_WRITE);

Along the lines of the previous comment - if you mean to have the
code cope with READ, please also set ->r accordingly, or add a
comment why this won't have the intended effect (yielding a not
present EPTE).

> @@ -92,8 +94,13 @@ static unsigned long p2m_type_to_flags(p2m_type_t t, mfn_t mfn,
>      default:
>          return flags | _PAGE_NX_BIT;
>      case p2m_grant_map_ro:
> -    case p2m_ioreq_server:
>          return flags | P2M_BASE_FLAGS | _PAGE_NX_BIT;
> +    case p2m_ioreq_server:
> +        flags |= P2M_BASE_FLAGS | _PAGE_RW | _PAGE_NX_BIT;
> +        if ( p2m->ioreq.flags & XEN_DMOP_IOREQ_MEM_ACCESS_WRITE )
> +            return flags & ~_PAGE_RW;
> +        else
> +            return flags;

Stray else. But even better would imo be

        if ( p2m->ioreq.flags & XEN_DMOP_IOREQ_MEM_ACCESS_WRITE )
            flags &= ~_PAGE_RW;
        return flags;

> +struct hvm_ioreq_server *p2m_get_ioreq_server(struct domain *d,
> +                                              unsigned int *flags)
> +{
> +    struct p2m_domain *p2m = p2m_get_hostp2m(d);
> +    struct hvm_ioreq_server *s;
> +
> +    spin_lock(&p2m->ioreq.lock);
> +
> +    s = p2m->ioreq.server;
> +    *flags = p2m->ioreq.flags;
> +
> +    spin_unlock(&p2m->ioreq.lock);
> +    return s;
> +}

I'm afraid this question was asked before, but since there's no
comment here or anywhere, I can't recall if there was a reason why
s potentially being stale by the time the caller looks at it is not a
problem.

> --- a/xen/include/public/hvm/dm_op.h
> +++ b/xen/include/public/hvm/dm_op.h
> @@ -318,6 +318,30 @@ struct xen_dm_op_inject_msi {
>      uint64_aligned_t addr;
>  };
>  
> +/*
> + * XEN_DMOP_map_mem_type_to_ioreq_server : map or unmap the IOREQ Server <id>
> + *                                      to specific memroy type <type>
> + *                                      for specific accesses <flags>
> + *
> + * For now, flags only accept the value of XEN_DMOP_IOREQ_MEM_ACCESS_WRITE,
> + * which means only write operations are to be forwarded to an ioreq server.
> + * Support for the emulation of read operations can be added when an ioreq
> + * server has such requirement in future.
> + */
> +#define XEN_DMOP_map_mem_type_to_ioreq_server 15
> +
> +struct xen_dm_op_map_mem_type_to_ioreq_server {
> +    ioservid_t id;      /* IN - ioreq server id */
> +    uint16_t type;      /* IN - memory type */
> +    uint16_t pad;

Perhaps there was padding needed when this was a hvmop, but
now the padding does exactly the wrong thing.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v7 3/5] x86/ioreq server: Handle read-modify-write cases for p2m_ioreq_server pages.
  2017-03-08 15:33 ` [PATCH v7 3/5] x86/ioreq server: Handle read-modify-write cases for p2m_ioreq_server pages Yu Zhang
@ 2017-03-10 15:33   ` Jan Beulich
  2017-03-11  8:42     ` Yu Zhang
  0 siblings, 1 reply; 41+ messages in thread
From: Jan Beulich @ 2017-03-10 15:33 UTC (permalink / raw)
  To: Yu Zhang; +Cc: Andrew Cooper, Paul Durrant, zhiyuan.lv, xen-devel

>>> On 08.03.17 at 16:33, <yu.c.zhang@linux.intel.com> wrote:
> @@ -197,6 +217,10 @@ static int hvmemul_do_io(
>           *   - If the IOREQ_MEM_ACCESS_WRITE flag is not set, treat it
>           *   like a normal PIO or MMIO that doesn't have an ioreq
>           *   server (i.e., by ignoring it).
> +         *
> +         *   - If the accesss is a read, this could be part of a
> +         *   read-modify-write instruction, emulate the read so that we
> +         *   have it.

"it" being what here? Grammatically the insn, but we don't care
about "having" the insn.

> @@ -226,6 +250,17 @@ static int hvmemul_do_io(
>                  }
>  
>                  /*
> +                 * This is part of a read-modify-write instruction.

"is" or "may be"?

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v7 4/5] ix86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries.
  2017-03-08 15:33 ` [PATCH v7 4/5] ix86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries Yu Zhang
@ 2017-03-10 16:03   ` Jan Beulich
  2017-03-11  8:42     ` Yu Zhang
  0 siblings, 1 reply; 41+ messages in thread
From: Jan Beulich @ 2017-03-10 16:03 UTC (permalink / raw)
  To: Yu Zhang
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, xen-devel,
	Paul Durrant, zhiyuan.lv, Jun Nakajima

>>> On 08.03.17 at 16:33, <yu.c.zhang@linux.intel.com> wrote:
> --- a/xen/arch/x86/mm/p2m.c
> +++ b/xen/arch/x86/mm/p2m.c
> @@ -954,6 +954,26 @@ int p2m_change_type_one(struct domain *d, unsigned long gfn,
>                           p2m->default_access)
>           : -EBUSY;
>  
> +    if ( !rc )
> +    {
> +        switch ( nt )
> +        {
> +        case p2m_ram_rw:
> +            if ( ot == p2m_ioreq_server )
> +            {
> +                p2m->ioreq.entry_count--;
> +                ASSERT(p2m->ioreq.entry_count >= 0);
> +            }
> +            break;
> +        case p2m_ioreq_server:
> +            if ( ot == p2m_ram_rw )
> +                p2m->ioreq.entry_count++;

I don't think the conditional is needed here. If anything it should be
an ASSERT() imo, as other transitions to p2m_ioreq_server should
not be allowed.
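
For illustration only, a minimal standalone model of that accounting
rule in plain C (not the actual Xen hunk; assert() stands in for
ASSERT(), and the type and field names are simplified):

    #include <assert.h>
    #include <stdio.h>

    typedef enum { p2m_ram_rw, p2m_ioreq_server, p2m_other } p2m_type_t;

    static long entry_count;

    static void account_type_change(p2m_type_t ot, p2m_type_t nt)
    {
        switch ( nt )
        {
        case p2m_ram_rw:
            if ( ot == p2m_ioreq_server )
            {
                entry_count--;
                assert(entry_count >= 0);
            }
            break;
        case p2m_ioreq_server:
            /* No other type may transition to p2m_ioreq_server. */
            assert(ot == p2m_ram_rw);
            entry_count++;
            break;
        default:
            break;
        }
    }

    int main(void)
    {
        account_type_change(p2m_ram_rw, p2m_ioreq_server);  /* claim a page */
        account_type_change(p2m_ioreq_server, p2m_ram_rw);  /* reset it */
        printf("entry_count = %ld\n", entry_count);          /* prints 0 */
        return 0;
    }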

But there's a wider understanding issue I'm having here: What is
an "entry" here? Commonly I would assume this to refer to an
individual (4k) page, but it looks like you really mean table entry,
i.e. possibly representing a 2M or 1G page.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v7 5/5] x86/ioreq server: Synchronously reset outstanding p2m_ioreq_server entries when an ioreq server unmaps.
  2017-03-08 15:33 ` [PATCH v7 5/5] x86/ioreq server: Synchronously reset outstanding p2m_ioreq_server entries when an ioreq server unmaps Yu Zhang
@ 2017-03-10 16:17   ` Jan Beulich
  2017-03-11  8:42     ` Yu Zhang
  2017-03-10 16:59   ` Andrew Cooper
  1 sibling, 1 reply; 41+ messages in thread
From: Jan Beulich @ 2017-03-10 16:17 UTC (permalink / raw)
  To: Yu Zhang
  Cc: George Dunlap, Andrew Cooper, Paul Durrant, zhiyuan.lv, xen-devel

>>> On 08.03.17 at 16:33, <yu.c.zhang@linux.intel.com> wrote:
> --- a/xen/arch/x86/hvm/dm.c
> +++ b/xen/arch/x86/hvm/dm.c
> @@ -288,6 +288,7 @@ static int inject_event(struct domain *d,
>      return 0;
>  }
>  
> +#define DMOP_op_mask 0xff
>  static int dm_op(domid_t domid,

Please follow the existing continuation model used here, instead of
re-introducing the hvmop one.

> --- a/xen/include/asm-x86/p2m.h
> +++ b/xen/include/asm-x86/p2m.h
> @@ -611,6 +611,11 @@ void p2m_change_type_range(struct domain *d,
>  int p2m_change_type_one(struct domain *d, unsigned long gfn,
>                          p2m_type_t ot, p2m_type_t nt);
>  
> +/* Synchronously change types across a range of p2m entries (start ... end) */
> +void p2m_finish_type_change(struct domain *d,
> +                            unsigned long start, unsigned long end,
> +                            p2m_type_t ot, p2m_type_t nt);

The comment should not give the impression that this is an open range,
i.e. it is [start, end]. But perhaps even better would be to not use
"end" here, as that name is frequently (e.g. in p2m_change_type_range())
used to indicate an exclusive range end.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v7 5/5] x86/ioreq server: Synchronously reset outstanding p2m_ioreq_server entries when an ioreq server unmaps.
  2017-03-08 15:33 ` [PATCH v7 5/5] x86/ioreq server: Synchronously reset outstanding p2m_ioreq_server entries when an ioreq server unmaps Yu Zhang
  2017-03-10 16:17   ` Jan Beulich
@ 2017-03-10 16:59   ` Andrew Cooper
  2017-03-11  8:42     ` Yu Zhang
  1 sibling, 1 reply; 41+ messages in thread
From: Andrew Cooper @ 2017-03-10 16:59 UTC (permalink / raw)
  To: Yu Zhang, xen-devel; +Cc: George Dunlap, Paul Durrant, zhiyuan.lv, Jan Beulich

On 08/03/17 15:33, Yu Zhang wrote:
> After an ioreq server has unmapped, the remaining p2m_ioreq_server
> entries need to be reset back to p2m_ram_rw. This patch does this
> synchronously by iterating the p2m table.
>
> The synchronous resetting is necessary because we need to guarantee
> the p2m table is clean before another ioreq server is mapped. And
> since the sweeping of p2m table could be time consuming, it is done
> with hypercall continuation.
>
> Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
> ---
> Cc: Paul Durrant <paul.durrant@citrix.com>
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
>
> changes in v1: 
>   - This patch is splitted from patch 4 of last version.
>   - According to comments from Jan: update the gfn_start for
>     when use hypercall continuation to reset the p2m type.
>   - According to comments from Jan: use min() to compare gfn_end
>     and max mapped pfn in p2m_finish_type_change()
> ---
>  xen/arch/x86/hvm/dm.c          | 43 +++++++++++++++++++++++++++++++++++++-----
>  xen/arch/x86/mm/p2m.c          | 29 ++++++++++++++++++++++++++++
>  xen/include/asm-x86/p2m.h      |  5 +++++
>  xen/include/public/hvm/dm_op.h |  3 +--
>  4 files changed, 73 insertions(+), 7 deletions(-)
>
> diff --git a/xen/arch/x86/hvm/dm.c b/xen/arch/x86/hvm/dm.c
> index f97478b..a92d5d7 100644
> --- a/xen/arch/x86/hvm/dm.c
> +++ b/xen/arch/x86/hvm/dm.c
> @@ -288,6 +288,7 @@ static int inject_event(struct domain *d,
>      return 0;
>  }
>  
> +#define DMOP_op_mask 0xff
>  static int dm_op(domid_t domid,
>                   unsigned int nr_bufs,
>                   xen_dm_op_buf_t bufs[])
> @@ -315,10 +316,8 @@ static int dm_op(domid_t domid,
>      }
>  
>      rc = -EINVAL;
> -    if ( op.pad )
> -        goto out;
>  
> -    switch ( op.op )
> +    switch ( op.op & DMOP_op_mask )

Nack to changes like this.  HVMop continuations only existed in this
form because we had to retrofit them to longterm-stable ABIs in the past,
and there were literally no free bits anywhere else.

Currently, the DMOP ABI isn't set in stone, so you have until code
freeze in 4.9 to make changes.  (i.e. soon now.)  We should also
consider which other DMops might potentially need to become continuable,
and take preventative action before the freeze.


If we need to make similar changes once the ABI truly is frozen, there
are plenty of free bits in the end of the union which can be used
without breaking the ABI.

~Andrew.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v7 2/5] x86/ioreq server: Add DMOP to map guest ram with p2m_ioreq_server to an ioreq server.
  2017-03-10 15:29   ` Jan Beulich
@ 2017-03-11  8:42     ` Yu Zhang
  2017-03-13 11:20       ` Jan Beulich
  0 siblings, 1 reply; 41+ messages in thread
From: Yu Zhang @ 2017-03-11  8:42 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, Tim Deegan, xen-devel,
	Paul Durrant, zhiyuan.lv, Jun Nakajima



On 3/10/2017 11:29 PM, Jan Beulich wrote:
>>>> On 08.03.17 at 16:33, <yu.c.zhang@linux.intel.com> wrote:
>> changes in v7:
>>    - Use new ioreq server interface - XEN_DMOP_map_mem_type_to_ioreq_server.
>>    - According to comments from George: removed domain_pause/unpause() in
>>      hvm_map_mem_type_to_ioreq_server(), because it's too expensive,
>>      and we can avoid the:
>>      a> deadlock between p2m lock and ioreq server lock by using these locks
>>         in the same order - solved in patch 4;
> That is, until patch 4 there is deadlock potential? I think you want to
> re-order the patches if so. Or was it that the type can't really be used
> until the last patch of the series? (I'm sorry, it's been quite a while
> since the previous version.)

Oh. There's no deadlock potential in this version of the patch set. But
in v6 there was, and I used domain_pause/unpause() to avoid it. Later
on, I realized that if the different locks are always taken in the same
order, the deadlock potential can be avoided and we do not need
domain_pause/unpause() in this version.
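
As a generic illustration of that lock-ordering rule (deliberately not
Xen code, with made-up names; think of lock_a/lock_b as the p2m lock
and the ioreq server lock in whichever order the series settles on),
two paths that both need the same pair of locks cannot deadlock as long
as every path acquires them in the same order:

    #include <pthread.h>
    #include <stdio.h>

    static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
    static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;

    static void *map_path(void *arg)
    {
        (void)arg;
        pthread_mutex_lock(&lock_a);   /* always lock_a first ... */
        pthread_mutex_lock(&lock_b);   /* ... then lock_b         */
        puts("map path: holding both locks");
        pthread_mutex_unlock(&lock_b);
        pthread_mutex_unlock(&lock_a);
        return NULL;
    }

    static void *fault_path(void *arg)
    {
        (void)arg;
        pthread_mutex_lock(&lock_a);   /* same order here, so no deadlock */
        pthread_mutex_lock(&lock_b);
        puts("fault path: holding both locks");
        pthread_mutex_unlock(&lock_b);
        pthread_mutex_unlock(&lock_a);
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, map_path, NULL);
        pthread_create(&t2, NULL, fault_path, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        return 0;
    }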

>> @@ -365,6 +383,24 @@ static int dm_op(domid_t domid,
>>           break;
>>       }
>>   
>> +    case XEN_DMOP_map_mem_type_to_ioreq_server:
>> +    {
>> +        const struct xen_dm_op_map_mem_type_to_ioreq_server *data =
>> +            &op.u.map_mem_type_to_ioreq_server;
>> +
>> +        rc = -EINVAL;
>> +        if ( data->pad )
>> +            break;
>> +
>> +        /* Only support for HAP enabled hvm. */
>> +        if ( !hap_enabled(d) )
>> +            break;
> Perhaps better to give an error other than -EINVAL in this case?
> If so, then the same error should likely also be used in your
> set_mem_type() addition.
How about -ENOTSUP?

>> --- a/xen/arch/x86/hvm/emulate.c
>> +++ b/xen/arch/x86/hvm/emulate.c
>> @@ -99,6 +99,7 @@ static int hvmemul_do_io(
>>       uint8_t dir, bool_t df, bool_t data_is_addr, uintptr_t data)
>>   {
>>       struct vcpu *curr = current;
>> +    struct domain *currd = curr->domain;
>>       struct hvm_vcpu_io *vio = &curr->arch.hvm_vcpu.hvm_io;
>>       ioreq_t p = {
>>           .type = is_mmio ? IOREQ_TYPE_COPY : IOREQ_TYPE_PIO,
>> @@ -140,7 +141,7 @@ static int hvmemul_do_io(
>>                (p.dir != dir) ||
>>                (p.df != df) ||
>>                (p.data_is_ptr != data_is_addr) )
>> -            domain_crash(curr->domain);
>> +            domain_crash(currd);
> If you mean to do this transformation here, then please do so
> consistently for the entire function.

OK. Thanks.

>> @@ -177,8 +178,65 @@ static int hvmemul_do_io(
>>           break;
>>       case X86EMUL_UNHANDLEABLE:
>>       {
>> -        struct hvm_ioreq_server *s =
>> -            hvm_select_ioreq_server(curr->domain, &p);
>> +        /*
>> +         * Xen isn't emulating the instruction internally, so see if
>> +         * there's an ioreq server that can handle it. Rules:
>> +         *
>> +         * - PIO and "normal" mmio run through hvm_select_ioreq_server()
>> +         * to choose the ioreq server by range. If no server is found,
>> +         * the access is ignored.
>> +         *
>> +         * - p2m_ioreq_server accesses are handled by the current
>> +         * ioreq_server for the domain, but there are some corner
>> +         * cases:
> Who or what is "the current ioreq_server for the domain"?
It means "the current ioreq_server which maps the p2m_ioreq_server type
for this domain"...
I'd like to use a succinct phrase, but this one does not seem accurate
enough. Any preference?

>> +         *   - If the domain ioreq_server is NULL, assume this is a
>> +         *   race between the unbinding of ioreq server and guest fault
>> +         *   so re-try the instruction.
>> +         *
>> +         *   - If the IOREQ_MEM_ACCESS_WRITE flag is not set, treat it
>> +         *   like a normal PIO or MMIO that doesn't have an ioreq
>> +         *   server (i.e., by ignoring it).
>> +         */
>> +        struct hvm_ioreq_server *s = NULL;
>> +        p2m_type_t p2mt = p2m_invalid;
>> +
>> +        if ( is_mmio )
>> +        {
>> +            unsigned long gmfn = paddr_to_pfn(addr);
>> +
>> +            (void) get_gfn_query_unlocked(currd, gmfn, &p2mt);
> Stray cast.

OK. Will remove it.

>
>> +            if ( p2mt == p2m_ioreq_server )
>> +            {
>> +                unsigned int flags;
>> +
>> +                s = p2m_get_ioreq_server(currd, &flags);
>> +
>> +                /*
>> +                 * If p2mt is ioreq_server but ioreq_server is NULL,
>> +                 * we probably lost a race with unbinding of ioreq
>> +                 * server, just retry the access.
>> +                 */
>> +                if ( s == NULL )
>> +                {
>> +                    rc = X86EMUL_RETRY;
>> +                    vio->io_req.state = STATE_IOREQ_NONE;
>> +                    break;
>> +                }
>> +
>> +                /*
>> +                 * If the IOREQ_MEM_ACCESS_WRITE flag is not set,
>> +                 * we should set s to NULL, and just ignore such
>> +                 * access.
>> +                 */
>> +                if ( !(flags & XEN_DMOP_IOREQ_MEM_ACCESS_WRITE) )
>> +                    s = NULL;
> What is this about? You only allow WRITE registrations, so this looks
> to be dead code. Yet if it is meant to guard against future enabling
> of READ, then this clearly should not be done for reads.

It's to guard against future emulation of READ. We can remove it for now.

>> --- a/xen/arch/x86/mm/p2m-ept.c
>> +++ b/xen/arch/x86/mm/p2m-ept.c
>> @@ -131,6 +131,13 @@ static void ept_p2m_type_to_flags(struct p2m_domain *p2m, ept_entry_t *entry,
>>               entry->r = entry->w = entry->x = 1;
>>               entry->a = entry->d = !!cpu_has_vmx_ept_ad;
>>               break;
>> +        case p2m_ioreq_server:
>> +            entry->r = 1;
>> +            entry->w = !(p2m->ioreq.flags & XEN_DMOP_IOREQ_MEM_ACCESS_WRITE);
> Along the lines of the previous comment - if you mean to have the
> code cope with READ, please also set ->r accordingly, or add a
> comment why this won't have the intended effect (yielding a not
> present EPTE).

How about we keep this code and do not support READ? I'll remove the
dead code above in hvmemul_do_io().

>> @@ -92,8 +94,13 @@ static unsigned long p2m_type_to_flags(p2m_type_t t, mfn_t mfn,
>>       default:
>>           return flags | _PAGE_NX_BIT;
>>       case p2m_grant_map_ro:
>> -    case p2m_ioreq_server:
>>           return flags | P2M_BASE_FLAGS | _PAGE_NX_BIT;
>> +    case p2m_ioreq_server:
>> +        flags |= P2M_BASE_FLAGS | _PAGE_RW | _PAGE_NX_BIT;
>> +        if ( p2m->ioreq.flags & XEN_DMOP_IOREQ_MEM_ACCESS_WRITE )
>> +            return flags & ~_PAGE_RW;
>> +        else
>> +            return flags;
> Stray else. But even better would imo be
>
>          if ( p2m->ioreq.flags & XEN_DMOP_IOREQ_MEM_ACCESS_WRITE )
>              flags &= ~_PAGE_RW;
>          return flags;
Oh. Thanks. :)

>> +struct hvm_ioreq_server *p2m_get_ioreq_server(struct domain *d,
>> +                                              unsigned int *flags)
>> +{
>> +    struct p2m_domain *p2m = p2m_get_hostp2m(d);
>> +    struct hvm_ioreq_server *s;
>> +
>> +    spin_lock(&p2m->ioreq.lock);
>> +
>> +    s = p2m->ioreq.server;
>> +    *flags = p2m->ioreq.flags;
>> +
>> +    spin_unlock(&p2m->ioreq.lock);
>> +    return s;
>> +}
> I'm afraid this question was asked before, but since there's no
> comment here or anywhere, I can't recall if there was a reason why
> s potentially being stale by the time the caller looks at it is not a
> problem.

Well, it is possible that s is stale. I did not take it as a problem
because the device model will later discard such an I/O request. And I
believe the current hvm_select_ioreq_server() also has the same issue -
the returned s should be considered stale if the MMIO/PIO address is
removed from the ioreq server's rangeset.

Another thought is, if you think it is inappropriate for the device
model to do the check, we can use spin_lock_recursive on
ioreq_server.lock to protect the whole ioreq server selection, and
release the lock after the I/O request has been sent out.

>> --- a/xen/include/public/hvm/dm_op.h
>> +++ b/xen/include/public/hvm/dm_op.h
>> @@ -318,6 +318,30 @@ struct xen_dm_op_inject_msi {
>>       uint64_aligned_t addr;
>>   };
>>   
>> +/*
>> + * XEN_DMOP_map_mem_type_to_ioreq_server : map or unmap the IOREQ Server <id>
>> + *                                      to specific memroy type <type>
>> + *                                      for specific accesses <flags>
>> + *
>> + * For now, flags only accept the value of XEN_DMOP_IOREQ_MEM_ACCESS_WRITE,
>> + * which means only write operations are to be forwarded to an ioreq server.
>> + * Support for the emulation of read operations can be added when an ioreq
>> + * server has such requirement in future.
>> + */
>> +#define XEN_DMOP_map_mem_type_to_ioreq_server 15
>> +
>> +struct xen_dm_op_map_mem_type_to_ioreq_server {
>> +    ioservid_t id;      /* IN - ioreq server id */
>> +    uint16_t type;      /* IN - memory type */
>> +    uint16_t pad;
> Perhaps there was padding needed when this was a hvmop, but
> now the padding does exactly the wrong thing.

Right, padding is useless in this new structure. I will remove it if
the other fields do not change (I proposed a change for the flags field
in reply to Andrew's comments on patch 5). Thank you, Jan.

Yu
>
> Jan
>


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v7 4/5] ix86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries.
  2017-03-10 16:03   ` Jan Beulich
@ 2017-03-11  8:42     ` Yu Zhang
  2017-03-13 11:24       ` Jan Beulich
  0 siblings, 1 reply; 41+ messages in thread
From: Yu Zhang @ 2017-03-11  8:42 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, xen-devel,
	Paul Durrant, zhiyuan.lv, Jun Nakajima



On 3/11/2017 12:03 AM, Jan Beulich wrote:
>>>> On 08.03.17 at 16:33, <yu.c.zhang@linux.intel.com> wrote:
>> --- a/xen/arch/x86/mm/p2m.c
>> +++ b/xen/arch/x86/mm/p2m.c
>> @@ -954,6 +954,26 @@ int p2m_change_type_one(struct domain *d, unsigned long gfn,
>>                            p2m->default_access)
>>            : -EBUSY;
>>   
>> +    if ( !rc )
>> +    {
>> +        switch ( nt )
>> +        {
>> +        case p2m_ram_rw:
>> +            if ( ot == p2m_ioreq_server )
>> +            {
>> +                p2m->ioreq.entry_count--;
>> +                ASSERT(p2m->ioreq.entry_count >= 0);
>> +            }
>> +            break;
>> +        case p2m_ioreq_server:
>> +            if ( ot == p2m_ram_rw )
>> +                p2m->ioreq.entry_count++;
> I don't think the conditional is needed here. If anything it should be
> an ASSERT() imo, as other transitions to p2m_ioreq_server should
> not be allowed.

Thanks, Jan. I'll use ASSERT.

> But there's a wider understanding issue I'm having here: What is
> an "entry" here? Commonly I would assume this to refer to an
> individual (4k) page, but it looks like you really mean table entry,
> i.e. possibly representing a 2M or 1G page.

Well, it should be an entry pointing to a 4K page (only).
For p2m_ioreq_server, we shall not see huge pages, because such entries
are changed from p2m_ram_rw pages in set_mem_type() ->
p2m_change_type_one(), which calls p2m_set_entry() with PAGE_ORDER_4K
specified.


Thanks
Yu
> Jan
>
>


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v7 3/5] x86/ioreq server: Handle read-modify-write cases for p2m_ioreq_server pages.
  2017-03-10 15:33   ` Jan Beulich
@ 2017-03-11  8:42     ` Yu Zhang
  2017-03-13 11:22       ` Jan Beulich
  0 siblings, 1 reply; 41+ messages in thread
From: Yu Zhang @ 2017-03-11  8:42 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Paul Durrant, zhiyuan.lv, xen-devel



On 3/10/2017 11:33 PM, Jan Beulich wrote:
>>>> On 08.03.17 at 16:33, <yu.c.zhang@linux.intel.com> wrote:
>> @@ -197,6 +217,10 @@ static int hvmemul_do_io(
>>            *   - If the IOREQ_MEM_ACCESS_WRITE flag is not set, treat it
>>            *   like a normal PIO or MMIO that doesn't have an ioreq
>>            *   server (i.e., by ignoring it).
>> +         *
>> +         *   - If the accesss is a read, this could be part of a
>> +         *   read-modify-write instruction, emulate the read so that we
>> +         *   have it.
> "it" being what here? Grammatically the insn, but we don't care
> about "having" the insn.

Here "have it" means " to have the value on this address copied in".
Sorry for the inaccurate comment. How about just "emulate the read first"?

>> @@ -226,6 +250,17 @@ static int hvmemul_do_io(
>>                   }
>>   
>>                   /*
>> +                 * This is part of a read-modify-write instruction.
> "is" or "may be"?

I believe it should be "is".
It's the only scenario I can imagine in which a read operation (only
when this operation is also a write) traps. Otherwise there shall be no
VM exit.

Thanks
Yu
> Jan
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> https://lists.xen.org/xen-devel


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v7 5/5] x86/ioreq server: Synchronously reset outstanding p2m_ioreq_server entries when an ioreq server unmaps.
  2017-03-10 16:17   ` Jan Beulich
@ 2017-03-11  8:42     ` Yu Zhang
  2017-03-13 11:24       ` Jan Beulich
  0 siblings, 1 reply; 41+ messages in thread
From: Yu Zhang @ 2017-03-11  8:42 UTC (permalink / raw)
  To: Jan Beulich
  Cc: George Dunlap, Andrew Cooper, Paul Durrant, zhiyuan.lv, xen-devel



On 3/11/2017 12:17 AM, Jan Beulich wrote:
>>>> On 08.03.17 at 16:33, <yu.c.zhang@linux.intel.com> wrote:
>> --- a/xen/arch/x86/hvm/dm.c
>> +++ b/xen/arch/x86/hvm/dm.c
>> @@ -288,6 +288,7 @@ static int inject_event(struct domain *d,
>>       return 0;
>>   }
>>   
>> +#define DMOP_op_mask 0xff
>>   static int dm_op(domid_t domid,
> Please follow the existing continuation model used here, instead of
> re-introducing the hvmop one.

Thank you, Jan. Andrew has the same concern as you. So please see my
reply to his comments - I explained the difficulties I met while doing
this. It would be great if you and Andrew could give some advice. :-)

>> --- a/xen/include/asm-x86/p2m.h
>> +++ b/xen/include/asm-x86/p2m.h
>> @@ -611,6 +611,11 @@ void p2m_change_type_range(struct domain *d,
>>   int p2m_change_type_one(struct domain *d, unsigned long gfn,
>>                           p2m_type_t ot, p2m_type_t nt);
>>   
>> +/* Synchronously change types across a range of p2m entries (start ... end) */
>> +void p2m_finish_type_change(struct domain *d,
>> +                            unsigned long start, unsigned long end,
>> +                            p2m_type_t ot, p2m_type_t nt);
> The comment should not give the impression that this is an open
> range. I.e. [start, end], but perhaps even better would be to not
> use "end" here, as that frequently (e.g. p2m_change_type_range())
> this is used to indicate an exclusive range end.

So how about [first_gfn, last_gfn]?

Thanks
Yu

> Jan
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> https://lists.xen.org/xen-devel


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v7 5/5] x86/ioreq server: Synchronously reset outstanding p2m_ioreq_server entries when an ioreq server unmaps.
  2017-03-10 16:59   ` Andrew Cooper
@ 2017-03-11  8:42     ` Yu Zhang
  2017-03-13 11:32       ` Jan Beulich
  0 siblings, 1 reply; 41+ messages in thread
From: Yu Zhang @ 2017-03-11  8:42 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel
  Cc: George Dunlap, Paul Durrant, zhiyuan.lv, Jan Beulich



On 3/11/2017 12:59 AM, Andrew Cooper wrote:
> On 08/03/17 15:33, Yu Zhang wrote:
>> After an ioreq server has unmapped, the remaining p2m_ioreq_server
>> entries need to be reset back to p2m_ram_rw. This patch does this
>> synchronously by iterating the p2m table.
>>
>> The synchronous resetting is necessary because we need to guarantee
>> the p2m table is clean before another ioreq server is mapped. And
>> since the sweeping of p2m table could be time consuming, it is done
>> with hypercall continuation.
>>
>> Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
>> ---
>> Cc: Paul Durrant <paul.durrant@citrix.com>
>> Cc: Jan Beulich <jbeulich@suse.com>
>> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
>> Cc: George Dunlap <george.dunlap@eu.citrix.com>
>>
>> changes in v1:
>>    - This patch is splitted from patch 4 of last version.
>>    - According to comments from Jan: update the gfn_start for
>>      when use hypercall continuation to reset the p2m type.
>>    - According to comments from Jan: use min() to compare gfn_end
>>      and max mapped pfn in p2m_finish_type_change()
>> ---
>>   xen/arch/x86/hvm/dm.c          | 43 +++++++++++++++++++++++++++++++++++++-----
>>   xen/arch/x86/mm/p2m.c          | 29 ++++++++++++++++++++++++++++
>>   xen/include/asm-x86/p2m.h      |  5 +++++
>>   xen/include/public/hvm/dm_op.h |  3 +--
>>   4 files changed, 73 insertions(+), 7 deletions(-)
>>
>> diff --git a/xen/arch/x86/hvm/dm.c b/xen/arch/x86/hvm/dm.c
>> index f97478b..a92d5d7 100644
>> --- a/xen/arch/x86/hvm/dm.c
>> +++ b/xen/arch/x86/hvm/dm.c
>> @@ -288,6 +288,7 @@ static int inject_event(struct domain *d,
>>       return 0;
>>   }
>>   
>> +#define DMOP_op_mask 0xff
>>   static int dm_op(domid_t domid,
>>                    unsigned int nr_bufs,
>>                    xen_dm_op_buf_t bufs[])
>> @@ -315,10 +316,8 @@ static int dm_op(domid_t domid,
>>       }
>>   
>>       rc = -EINVAL;
>> -    if ( op.pad )
>> -        goto out;
>>   
>> -    switch ( op.op )
>> +    switch ( op.op & DMOP_op_mask )
> Nack to changes like this.  HVMop continuations only existed in this
> form because we had to retrofit it to longterm-stable ABIs in the past,
> and there were literally no free bits anywhere else.

Thank you, Andrew. Frankly, I'm not surprised to see disagreement from 
you and Jan
on this interface. :)

Using the HVMop-like continuation was a hard choice for me. I saw you
removed these from the DMops, and I also agree that the new interface
is much cleaner.

I noticed there are 2 other DMops that are continuable, set_mem_type
and modified_mem. Both of their structure definitions have fields like
first_gfn and nr, which can be updated and reused across the hypercall
continuation.
But for map_mem_type_to_ioreq_server we do not need a gfn exposed in
this interface (and I do not think exposing a gfn is correct); it is
only needed when an ioreq server detaches from p2m_ioreq_server and the
p2m sweeping is not yet finished. So I changed the definition of
xen_dm_op, removed the pad field and extended the size of op to 64 bits
(so that we can have free bits). But I do not think this is an optimal
solution either.

> Currently, the DMOP ABI isn't set in stone, so you have until code
> freeze in 4.9 to make changes.  (i.e. soon now.)  We should also
> consider which other DMops might potentially need to become continuable,
> and take preventative action before the freeze.

I can only see set_mem_type and modified_mem needing to become
continuable, and they do this quite elegantly now.

>
> If we need to make similar changes once the ABI truely is frozen, there
> are plenty of free bits in the end of the union which can be used
> without breaking the ABI.

Another proposal is to change the definition of
xen_dm_op_map_mem_type_to_ioreq_server, extending the flags field from
uint32_t to uint64_t and using the upper bits to store the gfn.
Do you think this proposal is acceptable?
Besides using some free bits to store the gfn, do we have any other
approaches? Any suggestions would be welcome! :)

Thanks
Yu

> ~Andrew.
>


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v7 2/5] x86/ioreq server: Add DMOP to map guest ram with p2m_ioreq_server to an ioreq server.
  2017-03-11  8:42     ` Yu Zhang
@ 2017-03-13 11:20       ` Jan Beulich
  2017-03-14  7:28         ` Yu Zhang
  0 siblings, 1 reply; 41+ messages in thread
From: Jan Beulich @ 2017-03-13 11:20 UTC (permalink / raw)
  To: Paul Durrant, Yu Zhang
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, Tim Deegan, xen-devel,
	zhiyuan.lv, Jun Nakajima

>>> On 11.03.17 at 09:42, <yu.c.zhang@linux.intel.com> wrote:
> On 3/10/2017 11:29 PM, Jan Beulich wrote:
>>>>> On 08.03.17 at 16:33, <yu.c.zhang@linux.intel.com> wrote:
>>> changes in v7:
>>>    - Use new ioreq server interface - XEN_DMOP_map_mem_type_to_ioreq_server.
>>>    - According to comments from George: removed domain_pause/unpause() in
>>>      hvm_map_mem_type_to_ioreq_server(), because it's too expensive,
>>>      and we can avoid the:
>>>      a> deadlock between p2m lock and ioreq server lock by using these locks
>>>         in the same order - solved in patch 4;
>> That is, until patch 4 there is deadlock potential? I think you want to
>> re-order the patches if so. Or was it that the type can't really be used
>> until the last patch of the series? (I'm sorry, it's been quite a while
>> since the previous version.)
> 
> Oh. There's no deadlock potential in this version patch set. But in v6, there was, and I used
> domain_pause/unpause() to avoid this. Later on, I realized that if I use different locks in the
> same order, the deadlock potential can be avoid and we do not need domain_pause/unpause
> in this version.

Well, okay, but in the future please don't add misleading change
info then.

>>> @@ -365,6 +383,24 @@ static int dm_op(domid_t domid,
>>>           break;
>>>       }
>>>   
>>> +    case XEN_DMOP_map_mem_type_to_ioreq_server:
>>> +    {
>>> +        const struct xen_dm_op_map_mem_type_to_ioreq_server *data =
>>> +            &op.u.map_mem_type_to_ioreq_server;
>>> +
>>> +        rc = -EINVAL;
>>> +        if ( data->pad )
>>> +            break;
>>> +
>>> +        /* Only support for HAP enabled hvm. */
>>> +        if ( !hap_enabled(d) )
>>> +            break;
>> Perhaps better to give an error other than -EINVAL in this case?
>> If so, then the same error should likely also be used in your
>> set_mem_type() addition.
> How about -ENOTSUP?

If you mean -EOPNOTSUPP, then yes.

>>> @@ -177,8 +178,65 @@ static int hvmemul_do_io(
>>>           break;
>>>       case X86EMUL_UNHANDLEABLE:
>>>       {
>>> -        struct hvm_ioreq_server *s =
>>> -            hvm_select_ioreq_server(curr->domain, &p);
>>> +        /*
>>> +         * Xen isn't emulating the instruction internally, so see if
>>> +         * there's an ioreq server that can handle it. Rules:
>>> +         *
>>> +         * - PIO and "normal" mmio run through hvm_select_ioreq_server()
>>> +         * to choose the ioreq server by range. If no server is found,
>>> +         * the access is ignored.
>>> +         *
>>> +         * - p2m_ioreq_server accesses are handled by the current
>>> +         * ioreq_server for the domain, but there are some corner
>>> +         * cases:
>> Who or what is "the current ioreq_server for the domain"?
> It means "the current ioreq_server which maps the p2m_ioreq_server type 
> for this domain"...
> I'd like to use a succinct phrase, but now seems not accurate enough. 
> Any preference?

Just add "designated" after "current", or replacing "current"?

>>> +            if ( p2mt == p2m_ioreq_server )
>>> +            {
>>> +                unsigned int flags;
>>> +
>>> +                s = p2m_get_ioreq_server(currd, &flags);
>>> +
>>> +                /*
>>> +                 * If p2mt is ioreq_server but ioreq_server is NULL,
>>> +                 * we probably lost a race with unbinding of ioreq
>>> +                 * server, just retry the access.
>>> +                 */
>>> +                if ( s == NULL )
>>> +                {
>>> +                    rc = X86EMUL_RETRY;
>>> +                    vio->io_req.state = STATE_IOREQ_NONE;
>>> +                    break;
>>> +                }
>>> +
>>> +                /*
>>> +                 * If the IOREQ_MEM_ACCESS_WRITE flag is not set,
>>> +                 * we should set s to NULL, and just ignore such
>>> +                 * access.
>>> +                 */
>>> +                if ( !(flags & XEN_DMOP_IOREQ_MEM_ACCESS_WRITE) )
>>> +                    s = NULL;
>> What is this about? You only allow WRITE registrations, so this looks
>> to be dead code. Yet if it is meant to guard against future enabling
>> of READ, then this clearly should not be done for reads.
> 
> It's to guard against future emulation of READ. We can remove it for now.

I guess first of all you need to settle on what you want the code to
look like generally wrt reads: Do you want it to support the option
as much as possible, reducing code changes to a minimum when
someone wants to actually add support, or do you want to reject
such attempts? Whichever variant you choose, you should carry it
out consistently rather than mixing both.

>>> --- a/xen/arch/x86/mm/p2m-ept.c
>>> +++ b/xen/arch/x86/mm/p2m-ept.c
>>> @@ -131,6 +131,13 @@ static void ept_p2m_type_to_flags(struct p2m_domain *p2m, ept_entry_t *entry,
>>>               entry->r = entry->w = entry->x = 1;
>>>               entry->a = entry->d = !!cpu_has_vmx_ept_ad;
>>>               break;
>>> +        case p2m_ioreq_server:
>>> +            entry->r = 1;
>>> +            entry->w = !(p2m->ioreq.flags & XEN_DMOP_IOREQ_MEM_ACCESS_WRITE);
>> Along the lines of the previous comment - if you mean to have the
>> code cope with READ, please also set ->r accordingly, or add a
>> comment why this won't have the intended effect (yielding a not
>> present EPTE).
> 
> How about we keep this code and do not support READ? I'll remove above 
> dead code in hvmemul_do_io().

Sure, as said above: All I'd like to push for is that the result is
consistent across the code base.

>>> +struct hvm_ioreq_server *p2m_get_ioreq_server(struct domain *d,
>>> +                                              unsigned int *flags)
>>> +{
>>> +    struct p2m_domain *p2m = p2m_get_hostp2m(d);
>>> +    struct hvm_ioreq_server *s;
>>> +
>>> +    spin_lock(&p2m->ioreq.lock);
>>> +
>>> +    s = p2m->ioreq.server;
>>> +    *flags = p2m->ioreq.flags;
>>> +
>>> +    spin_unlock(&p2m->ioreq.lock);
>>> +    return s;
>>> +}
>> I'm afraid this question was asked before, but since there's no
>> comment here or anywhere, I can't recall if there was a reason why
>> s potentially being stale by the time the caller looks at it is not a
>> problem.
> 
> Well, it is possibe that s is stale. I did not take it as a problem 
> because the device model
> will later discard such io request. And I believe current 
> hvm_select_ioreq_server() also
> has the same issue - the returned s should be considered to be stale, if 
> the MMIO/PIO
> address is removed from the ioreq server's rangeset.
> 
> Another thought is, if you think it is inappropriate for device model to 
> do the check,
> we can use spin_lock_recursive on ioreq_server.lock to protect all the 
> ioreq server select
> and release the lock after the ioreq server is sent out.

Well, let's first ask Paul as to what his perspective here is, both
specifically for this change and more generally regarding what
you say above.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v7 3/5] x86/ioreq server: Handle read-modify-write cases for p2m_ioreq_server pages.
  2017-03-11  8:42     ` Yu Zhang
@ 2017-03-13 11:22       ` Jan Beulich
  2017-03-14  7:28         ` Yu Zhang
  0 siblings, 1 reply; 41+ messages in thread
From: Jan Beulich @ 2017-03-13 11:22 UTC (permalink / raw)
  To: Yu Zhang; +Cc: Andrew Cooper, Paul Durrant, zhiyuan.lv, xen-devel

>>> On 11.03.17 at 09:42, <yu.c.zhang@linux.intel.com> wrote:
> On 3/10/2017 11:33 PM, Jan Beulich wrote:
>>>>> On 08.03.17 at 16:33, <yu.c.zhang@linux.intel.com> wrote:
>>> @@ -197,6 +217,10 @@ static int hvmemul_do_io(
>>>            *   - If the IOREQ_MEM_ACCESS_WRITE flag is not set, treat it
>>>            *   like a normal PIO or MMIO that doesn't have an ioreq
>>>            *   server (i.e., by ignoring it).
>>> +         *
>>> +         *   - If the accesss is a read, this could be part of a
>>> +         *   read-modify-write instruction, emulate the read so that we
>>> +         *   have it.
>> "it" being what here? Grammatically the insn, but we don't care
>> about "having" the insn.
> 
> Here "have it" means " to have the value on this address copied in".
> Sorry for the inaccurate comment. How about just "emulate the read first"?

Sounds okay.

>>> @@ -226,6 +250,17 @@ static int hvmemul_do_io(
>>>                   }
>>>   
>>>                   /*
>>> +                 * This is part of a read-modify-write instruction.
>> "is" or "may be"?
> 
> I believe should be "is".
> It's the only scenario I can imagine when an read operation(only when 
> this operation is
> also a write one) traps. Otherwise there shall be no VM exit.

Even with a racing update to the type?

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v7 4/5] ix86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries.
  2017-03-11  8:42     ` Yu Zhang
@ 2017-03-13 11:24       ` Jan Beulich
  2017-03-14  7:42         ` Yu Zhang
  0 siblings, 1 reply; 41+ messages in thread
From: Jan Beulich @ 2017-03-13 11:24 UTC (permalink / raw)
  To: Yu Zhang
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, xen-devel,
	Paul Durrant, zhiyuan.lv, Jun Nakajima

>>> On 11.03.17 at 09:42, <yu.c.zhang@linux.intel.com> wrote:
> On 3/11/2017 12:03 AM, Jan Beulich wrote:
>> But there's a wider understanding issue I'm having here: What is
>> an "entry" here? Commonly I would assume this to refer to an
>> individual (4k) page, but it looks like you really mean table entry,
>> i.e. possibly representing a 2M or 1G page.
> 
> Well, it should be an entry pointing to a 4K page(only).
> For p2m_ioreq_server, we shall not meet huge page. Because they are 
> changed from p2m_ram_rw pages
> in set_mem_type() -> p2m_change_type_one(), which calls p2m_set_entry() 
> with PAGE_ORDER_4K specified.

And recombination of large pages won't ever end up hitting these?

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v7 5/5] x86/ioreq server: Synchronously reset outstanding p2m_ioreq_server entries when an ioreq server unmaps.
  2017-03-11  8:42     ` Yu Zhang
@ 2017-03-13 11:24       ` Jan Beulich
  0 siblings, 0 replies; 41+ messages in thread
From: Jan Beulich @ 2017-03-13 11:24 UTC (permalink / raw)
  To: Yu Zhang
  Cc: George Dunlap, Andrew Cooper, Paul Durrant, zhiyuan.lv, xen-devel

>>> On 11.03.17 at 09:42, <yu.c.zhang@linux.intel.com> wrote:
> On 3/11/2017 12:17 AM, Jan Beulich wrote:
>>>>> On 08.03.17 at 16:33, <yu.c.zhang@linux.intel.com> wrote:
>>> --- a/xen/include/asm-x86/p2m.h
>>> +++ b/xen/include/asm-x86/p2m.h
>>> @@ -611,6 +611,11 @@ void p2m_change_type_range(struct domain *d,
>>>   int p2m_change_type_one(struct domain *d, unsigned long gfn,
>>>                           p2m_type_t ot, p2m_type_t nt);
>>>   
>>> +/* Synchronously change types across a range of p2m entries (start ... end) */
>>> +void p2m_finish_type_change(struct domain *d,
>>> +                            unsigned long start, unsigned long end,
>>> +                            p2m_type_t ot, p2m_type_t nt);
>> The comment should not give the impression that this is an open
>> range. I.e. [start, end], but perhaps even better would be to not
>> use "end" here, as that frequently (e.g. p2m_change_type_range())
>> this is used to indicate an exclusive range end.
> 
> So how about [first_gfn, last_gfn]?

That's fine.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v7 5/5] x86/ioreq server: Synchronously reset outstanding p2m_ioreq_server entries when an ioreq server unmaps.
  2017-03-11  8:42     ` Yu Zhang
@ 2017-03-13 11:32       ` Jan Beulich
  2017-03-14  7:42         ` Yu Zhang
  0 siblings, 1 reply; 41+ messages in thread
From: Jan Beulich @ 2017-03-13 11:32 UTC (permalink / raw)
  To: Yu Zhang
  Cc: George Dunlap, Andrew Cooper, Paul Durrant, zhiyuan.lv, xen-devel

>>> On 11.03.17 at 09:42, <yu.c.zhang@linux.intel.com> wrote:
> On 3/11/2017 12:59 AM, Andrew Cooper wrote:
>> On 08/03/17 15:33, Yu Zhang wrote:
>>> --- a/xen/arch/x86/hvm/dm.c
>>> +++ b/xen/arch/x86/hvm/dm.c
>>> @@ -288,6 +288,7 @@ static int inject_event(struct domain *d,
>>>       return 0;
>>>   }
>>>   
>>> +#define DMOP_op_mask 0xff
>>>   static int dm_op(domid_t domid,
>>>                    unsigned int nr_bufs,
>>>                    xen_dm_op_buf_t bufs[])
>>> @@ -315,10 +316,8 @@ static int dm_op(domid_t domid,
>>>       }
>>>   
>>>       rc = -EINVAL;
>>> -    if ( op.pad )
>>> -        goto out;
>>>   
>>> -    switch ( op.op )
>>> +    switch ( op.op & DMOP_op_mask )
>> Nack to changes like this.  HVMop continuations only existed in this
>> form because we had to retrofit it to longterm-stable ABIs in the past,
>> and there were literally no free bits anywhere else.
> 
> Thank you, Andrew. Frankly, I'm not surprised to see disagreement from 
> you and Jan
> on this interface. :)
> 
> Using the HVMop like continuation is a hard choice for me. I saw you 
> removed these
> from the DMops, and I also agree that the new interface is much cleaner.
> 
> I noticed there are other 2 DMops to continuable, the set_mem_type and 
> the modified_mem.
> Both definitions of their structure have fields like first_gfn and nr 
> which can be updated to
> be used in the hypercall continuation.
> But for map_mem_type_to_ioreq_server, however we do not need a gfn 
> number exposed in
> this interface(and I do not think exposing a gfn is correct), it is only 
> used when an ioreq server
> detaches from p2m_ioreq_server and the p2m sweeping is not finished. So 
> I changed definition
> of the xen_dm_op, removed the pad field and extend the size of op to 64 
> bit(so that we can have
> free bits). But I do not think this is an optimal solution either.
> 
>> Currently, the DMOP ABI isn't set in stone, so you have until code
>> freeze in 4.9 to make changes.  (i.e. soon now.)  We should also
>> consider which other DMops might potentially need to become continuable,
>> and take preventative action before the freeze.
> 
> I can only see set_mem_type and modified_mem need to become continuable, 
> and they does
> this quite elegantly now.
> 
>>
>> If we need to make similar changes once the ABI truely is frozen, there
>> are plenty of free bits in the end of the union which can be used
>> without breaking the ABI.
> 
> Another propose is to change the definition of 
> xen_dm_op_map_mem_type_to_ioreq_server,
> and extend the flags field from uint32_t to uint64_t and use the upper 
> bits to store the gfn.

Well, you introduce a brand new sub-op in patch 2. Why would
you even try to re-use part of some other field in such a
situation, when you can right away add an opaque 64-bit field
you require the caller to set to zero upon first call? The caller
not playing by this would - afaict - only harm the guest it controls.
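
A hedged sketch of what such a layout could look like (illustrative
only, not a settled ABI; the typedef merely keeps the fragment
self-contained, and the real public header would presumably use
uint64_aligned_t rather than plain uint64_t):

    #include <stdint.h>

    typedef uint16_t ioservid_t;   /* stand-in for the public typedef */

    struct xen_dm_op_map_mem_type_to_ioreq_server {
        ioservid_t id;      /* IN - ioreq server id */
        uint16_t type;      /* IN - memory type */
        uint32_t flags;     /* IN - XEN_DMOP_IOREQ_MEM_ACCESS_* */
        uint64_t opaque;    /* IN/OUT - must be set to 0 by the caller on
                               the first invocation; Xen reuses it to
                               record where a continued p2m sweep should
                               resume */
    };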

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v7 2/5] x86/ioreq server: Add DMOP to map guest ram with p2m_ioreq_server to an ioreq server.
  2017-03-13 11:20       ` Jan Beulich
@ 2017-03-14  7:28         ` Yu Zhang
  2017-03-14  9:40           ` Paul Durrant
  2017-03-14 10:26           ` Jan Beulich
  0 siblings, 2 replies; 41+ messages in thread
From: Yu Zhang @ 2017-03-14  7:28 UTC (permalink / raw)
  To: Jan Beulich, Paul Durrant
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, Tim Deegan, xen-devel,
	zhiyuan.lv, Jun Nakajima



On 3/13/2017 7:20 PM, Jan Beulich wrote:
>>>> On 11.03.17 at 09:42, <yu.c.zhang@linux.intel.com> wrote:
>> On 3/10/2017 11:29 PM, Jan Beulich wrote:
>>>>>> On 08.03.17 at 16:33, <yu.c.zhang@linux.intel.com> wrote:
>>>> changes in v7:
>>>>     - Use new ioreq server interface - XEN_DMOP_map_mem_type_to_ioreq_server.
>>>>     - According to comments from George: removed domain_pause/unpause() in
>>>>       hvm_map_mem_type_to_ioreq_server(), because it's too expensive,
>>>>       and we can avoid the:
>>>>       a> deadlock between p2m lock and ioreq server lock by using these locks
>>>>          in the same order - solved in patch 4;
>>> That is, until patch 4 there is deadlock potential? I think you want to
>>> re-order the patches if so. Or was it that the type can't really be used
>>> until the last patch of the series? (I'm sorry, it's been quite a while
>>> since the previous version.)
>> Oh. There's no deadlock potential in this version patch set. But in v6, there was, and I used
>> domain_pause/unpause() to avoid this. Later on, I realized that if I use different locks in the
>> same order, the deadlock potential can be avoid and we do not need domain_pause/unpause
>> in this version.
> Well, okay, but in the future please don't add misleading change
> info then.

Got it. Thanks. :-)
>>>> @@ -365,6 +383,24 @@ static int dm_op(domid_t domid,
>>>>            break;
>>>>        }
>>>>    
>>>> +    case XEN_DMOP_map_mem_type_to_ioreq_server:
>>>> +    {
>>>> +        const struct xen_dm_op_map_mem_type_to_ioreq_server *data =
>>>> +            &op.u.map_mem_type_to_ioreq_server;
>>>> +
>>>> +        rc = -EINVAL;
>>>> +        if ( data->pad )
>>>> +            break;
>>>> +
>>>> +        /* Only support for HAP enabled hvm. */
>>>> +        if ( !hap_enabled(d) )
>>>> +            break;
>>> Perhaps better to give an error other than -EINVAL in this case?
>>> If so, then the same error should likely also be used in your
>>> set_mem_type() addition.
>> How about -ENOTSUP?
> If you mean -EOPNOTSUPP, then yes.

Yes, I mean -EOPNOTSUPP
>>>> @@ -177,8 +178,65 @@ static int hvmemul_do_io(
>>>>            break;
>>>>        case X86EMUL_UNHANDLEABLE:
>>>>        {
>>>> -        struct hvm_ioreq_server *s =
>>>> -            hvm_select_ioreq_server(curr->domain, &p);
>>>> +        /*
>>>> +         * Xen isn't emulating the instruction internally, so see if
>>>> +         * there's an ioreq server that can handle it. Rules:
>>>> +         *
>>>> +         * - PIO and "normal" mmio run through hvm_select_ioreq_server()
>>>> +         * to choose the ioreq server by range. If no server is found,
>>>> +         * the access is ignored.
>>>> +         *
>>>> +         * - p2m_ioreq_server accesses are handled by the current
>>>> +         * ioreq_server for the domain, but there are some corner
>>>> +         * cases:
>>> Who or what is "the current ioreq_server for the domain"?
>> It means "the current ioreq_server which maps the p2m_ioreq_server type
>> for this domain"...
>> I'd like to use a succinct phrase, but now seems not accurate enough.
>> Any preference?
> Just add "designated" after "current", or replacing "current"?

Replacing "current" with "designated" sounds good to me. :-)
>>>> +            if ( p2mt == p2m_ioreq_server )
>>>> +            {
>>>> +                unsigned int flags;
>>>> +
>>>> +                s = p2m_get_ioreq_server(currd, &flags);
>>>> +
>>>> +                /*
>>>> +                 * If p2mt is ioreq_server but ioreq_server is NULL,
>>>> +                 * we probably lost a race with unbinding of ioreq
>>>> +                 * server, just retry the access.
>>>> +                 */
>>>> +                if ( s == NULL )
>>>> +                {
>>>> +                    rc = X86EMUL_RETRY;
>>>> +                    vio->io_req.state = STATE_IOREQ_NONE;
>>>> +                    break;
>>>> +                }
>>>> +
>>>> +                /*
>>>> +                 * If the IOREQ_MEM_ACCESS_WRITE flag is not set,
>>>> +                 * we should set s to NULL, and just ignore such
>>>> +                 * access.
>>>> +                 */
>>>> +                if ( !(flags & XEN_DMOP_IOREQ_MEM_ACCESS_WRITE) )
>>>> +                    s = NULL;
>>> What is this about? You only allow WRITE registrations, so this looks
>>> to be dead code. Yet if it is meant to guard against future enabling
>>> of READ, then this clearly should not be done for reads.
>> It's to guard against future emulation of READ. We can remove it for now.
> I guess first of all you need to settle on what you want the code to
> look like generally wrt reads: Do you want it to support the option
> as much as possible, reducing code changes to a minimum when
> someone wants to actually add support, or do you want to reject
> such attempts? Whichever variant you choose, you should carry it
> out consistently rather than mixing both.
>
>>>> --- a/xen/arch/x86/mm/p2m-ept.c
>>>> +++ b/xen/arch/x86/mm/p2m-ept.c
>>>> @@ -131,6 +131,13 @@ static void ept_p2m_type_to_flags(struct p2m_domain *p2m, ept_entry_t *entry,
>>>>                entry->r = entry->w = entry->x = 1;
>>>>                entry->a = entry->d = !!cpu_has_vmx_ept_ad;
>>>>                break;
>>>> +        case p2m_ioreq_server:
>>>> +            entry->r = 1;
>>>> +            entry->w = !(p2m->ioreq.flags & XEN_DMOP_IOREQ_MEM_ACCESS_WRITE);
>>> Along the lines of the previous comment - if you mean to have the
>>> code cope with READ, please also set ->r accordingly, or add a
>>> comment why this won't have the intended effect (yielding a not
>>> present EPTE).
>> How about we keep this code and do not support READ? I'll remove above
>> dead code in hvmemul_do_io().
> Sure, as said above: All I'd like to push for is that the result is
> consistent across the code base.

Thank you, Jan. I should have kept to a consistent principle in this code.
My preference is to remove the possible read emulation logic. But could
we still keep the definition of XEN_DMOP_IOREQ_MEM_ACCESS_READ, so that
people can still see that this interface can be extended in the future?

>>>> +struct hvm_ioreq_server *p2m_get_ioreq_server(struct domain *d,
>>>> +                                              unsigned int *flags)
>>>> +{
>>>> +    struct p2m_domain *p2m = p2m_get_hostp2m(d);
>>>> +    struct hvm_ioreq_server *s;
>>>> +
>>>> +    spin_lock(&p2m->ioreq.lock);
>>>> +
>>>> +    s = p2m->ioreq.server;
>>>> +    *flags = p2m->ioreq.flags;
>>>> +
>>>> +    spin_unlock(&p2m->ioreq.lock);
>>>> +    return s;
>>>> +}
>>> I'm afraid this question was asked before, but since there's no
>>> comment here or anywhere, I can't recall if there was a reason why
>>> s potentially being stale by the time the caller looks at it is not a
>>> problem.
>> Well, it is possibe that s is stale. I did not take it as a problem
>> because the device model
>> will later discard such io request. And I believe current
>> hvm_select_ioreq_server() also
>> has the same issue - the returned s should be considered to be stale, if
>> the MMIO/PIO
>> address is removed from the ioreq server's rangeset.
>>
>> Another thought is, if you think it is inappropriate for device model to
>> do the check,
>> we can use spin_lock_recursive on ioreq_server.lock to protect all the
>> ioreq server select
>> and release the lock after the ioreq server is sent out.
> Well, let's first ask Paul as to what his perspective here is, both
> specifically for this change and more generally regarding what
> you say above.

Paul, any suggestions on this and the question above? :)

Thanks
Yu
> Jan
>



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v7 3/5] x86/ioreq server: Handle read-modify-write cases for p2m_ioreq_server pages.
  2017-03-13 11:22       ` Jan Beulich
@ 2017-03-14  7:28         ` Yu Zhang
  0 siblings, 0 replies; 41+ messages in thread
From: Yu Zhang @ 2017-03-14  7:28 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Paul Durrant, zhiyuan.lv, xen-devel



On 3/13/2017 7:22 PM, Jan Beulich wrote:
>>>> On 11.03.17 at 09:42, <yu.c.zhang@linux.intel.com> wrote:
>> On 3/10/2017 11:33 PM, Jan Beulich wrote:
>>>>>> On 08.03.17 at 16:33, <yu.c.zhang@linux.intel.com> wrote:
>>>> @@ -197,6 +217,10 @@ static int hvmemul_do_io(
>>>>             *   - If the IOREQ_MEM_ACCESS_WRITE flag is not set, treat it
>>>>             *   like a normal PIO or MMIO that doesn't have an ioreq
>>>>             *   server (i.e., by ignoring it).
>>>> +         *
>>>> +         *   - If the accesss is a read, this could be part of a
>>>> +         *   read-modify-write instruction, emulate the read so that we
>>>> +         *   have it.
>>> "it" being what here? Grammatically the insn, but we don't care
>>> about "having" the insn.
>> Here "have it" means " to have the value on this address copied in".
>> Sorry for the inaccurate comment. How about just "emulate the read first"?
> Sounds okay.
>
>>>> @@ -226,6 +250,17 @@ static int hvmemul_do_io(
>>>>                    }
>>>>    
>>>>                    /*
>>>> +                 * This is part of a read-modify-write instruction.
>>> "is" or "may be"?
>> I believe should be "is".
>> It's the only scenario I can imagine when an read operation(only when
>> this operation is
>> also a write one) traps. Otherwise there shall be no VM exit.
> Even with a racing update to the type?

Yes.
With patch 1/5, there shall be no racing update to the p2m type during
this process.
Besides, I believe that even without patch 1/5, if we allowed the racing
p2m change, it could only happen on a p2m_ioreq_server page changed to
p2m_ram_rw (in which case it shall only be a write or a read-modify-write
op that causes such a vm exit). The race will not happen for a p2m_ram_rw
page changed to p2m_ioreq_server, because there shall be no vm exit at all.
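
To make the above concrete, here is a minimal sketch of the decision being
discussed (the helper names are hypothetical; the real logic lives in
hvmemul_do_io()):

/* Access faulted on a p2m_ioreq_server page; only WRITE is forwarded. */
static int handle_p2m_ioreq_server_access(bool is_write, paddr_t gpa,
                                          void *buf, unsigned int size)
{
    if ( is_write )
        /* A plain write, or the write half of an RMW: forward it. */
        return send_to_ioreq_server(gpa, buf, size);   /* hypothetical */

    /*
     * A read can only trap here as the read half of a
     * read-modify-write instruction, so satisfy it from guest RAM
     * locally instead of bouncing it to the emulator.
     */
    return read_from_guest_ram(gpa, buf, size);        /* hypothetical */
}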

Thanks
Yu

> Jan
>
>



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v7 4/5] ix86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries.
  2017-03-13 11:24       ` Jan Beulich
@ 2017-03-14  7:42         ` Yu Zhang
  2017-03-14 10:49           ` Jan Beulich
  0 siblings, 1 reply; 41+ messages in thread
From: Yu Zhang @ 2017-03-14  7:42 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, xen-devel,
	Paul Durrant, zhiyuan.lv, Jun Nakajima



On 3/13/2017 7:24 PM, Jan Beulich wrote:
>>>> On 11.03.17 at 09:42, <yu.c.zhang@linux.intel.com> wrote:
>> On 3/11/2017 12:03 AM, Jan Beulich wrote:
>>> But there's a wider understanding issue I'm having here: What is
>>> an "entry" here? Commonly I would assume this to refer to an
>>> individual (4k) page, but it looks like you really mean table entry,
>>> i.e. possibly representing a 2M or 1G page.
>> Well, it should be an entry pointing to a 4K page(only).
>> For p2m_ioreq_server, we shall not meet huge page. Because they are
>> changed from p2m_ram_rw pages
>> in set_mem_type() -> p2m_change_type_one(), which calls p2m_set_entry()
>> with PAGE_ORDER_4K specified.
> And recombination of large pages won't ever end up hitting these?

Well, by recombination I guess you refer to the PoD pages? I do not think
p2m_ioreq_server pages will be recombined now, which means we do not need
to worry about recounting the p2m_ioreq_server entries when a split happens.

As to the type change from p2m_ram_rw to p2m_ioreq_server, even if it is
done on a large page, p2m_change_type_one() will split the page and mark
only one EPT entry (which maps a 4K page) as p2m_ioreq_server (the other
511 entries remain p2m_ram_rw). So I still believe counting
p2m_ioreq_server entries here is correct.

Besides, looking at it from the XenGT requirement side, it is the guest
graphics page tables we are trying to write-protect, and those are 4K in
size.
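
A simplified sketch of the accounting this relies on (the counter location
and helper name are illustrative, not the exact patch code): because
p2m_ioreq_server is only ever installed at 4K granularity, each entry maps
exactly one page and a plain per-p2m counter suffices.

/* Called wherever a p2m entry's type changes (illustrative hook). */
static void adjust_ioreq_server_count(struct p2m_domain *p2m,
                                      p2m_type_t ot, p2m_type_t nt)
{
    if ( nt == p2m_ioreq_server && ot != p2m_ioreq_server )
        p2m->ioreq.entry_count++;        /* one more 4K entry claimed */
    else if ( ot == p2m_ioreq_server && nt != p2m_ioreq_server )
        p2m->ioreq.entry_count--;        /* one outstanding entry gone */
}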

Thanks
Yu

> Jan
>
>



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v7 5/5] x86/ioreq server: Synchronously reset outstanding p2m_ioreq_server entries when an ioreq server unmaps.
  2017-03-13 11:32       ` Jan Beulich
@ 2017-03-14  7:42         ` Yu Zhang
  2017-03-14 10:51           ` Jan Beulich
  0 siblings, 1 reply; 41+ messages in thread
From: Yu Zhang @ 2017-03-14  7:42 UTC (permalink / raw)
  To: Jan Beulich
  Cc: George Dunlap, Andrew Cooper, Paul Durrant, zhiyuan.lv, xen-devel



On 3/13/2017 7:32 PM, Jan Beulich wrote:
>>>> On 11.03.17 at 09:42, <yu.c.zhang@linux.intel.com> wrote:
>> On 3/11/2017 12:59 AM, Andrew Cooper wrote:
>>> On 08/03/17 15:33, Yu Zhang wrote:
>>>> --- a/xen/arch/x86/hvm/dm.c
>>>> +++ b/xen/arch/x86/hvm/dm.c
>>>> @@ -288,6 +288,7 @@ static int inject_event(struct domain *d,
>>>>        return 0;
>>>>    }
>>>>    
>>>> +#define DMOP_op_mask 0xff
>>>>    static int dm_op(domid_t domid,
>>>>                     unsigned int nr_bufs,
>>>>                     xen_dm_op_buf_t bufs[])
>>>> @@ -315,10 +316,8 @@ static int dm_op(domid_t domid,
>>>>        }
>>>>    
>>>>        rc = -EINVAL;
>>>> -    if ( op.pad )
>>>> -        goto out;
>>>>    
>>>> -    switch ( op.op )
>>>> +    switch ( op.op & DMOP_op_mask )
>>> Nack to changes like this.  HVMop continuations only existed in this
>>> form because we had to retrofit it to longterm-stable ABIs in the past,
>>> and there were literally no free bits anywhere else.
>> Thank you, Andrew. Frankly, I'm not surprised to see disagreement from
>> you and Jan
>> on this interface. :)
>>
>> Using the HVMop like continuation is a hard choice for me. I saw you
>> removed these
>> from the DMops, and I also agree that the new interface is much cleaner.
>>
>> I noticed there are other 2 DMops to continuable, the set_mem_type and
>> the modified_mem.
>> Both definitions of their structure have fields like first_gfn and nr
>> which can be updated to
>> be used in the hypercall continuation.
>> But for map_mem_type_to_ioreq_server, however we do not need a gfn
>> number exposed in
>> this interface(and I do not think exposing a gfn is correct), it is only
>> used when an ioreq server
>> detaches from p2m_ioreq_server and the p2m sweeping is not finished. So
>> I changed definition
>> of the xen_dm_op, removed the pad field and extend the size of op to 64
>> bit(so that we can have
>> free bits). But I do not think this is an optimal solution either.
>>
>>> Currently, the DMOP ABI isn't set in stone, so you have until code
>>> freeze in 4.9 to make changes.  (i.e. soon now.)  We should also
>>> consider which other DMops might potentially need to become continuable,
>>> and take preventative action before the freeze.
>> I can only see set_mem_type and modified_mem need to become continuable,
>> and they does
>> this quite elegantly now.
>>
>>> If we need to make similar changes once the ABI truely is frozen, there
>>> are plenty of free bits in the end of the union which can be used
>>> without breaking the ABI.
>> Another propose is to change the definition of
>> xen_dm_op_map_mem_type_to_ioreq_server,
>> and extend the flags field from uint32_t to uint64_t and use the upper
>> bits to store the gfn.
> Well, you introduce a brand new sub-op in patch 2. Why would
> you even try to re-use part of some other field in such a
> situation, when you can right away add an opaque 64-bit field
> you require the caller to set to zero upon first call? The caller
> not playing by this would - afaict - only harm the guest it controls.

Thanks, Jan.
So you mean change the definition of
xen_dm_op_map_mem_type_to_ioreq_server to something like this?

struct xen_dm_op_map_mem_type_to_ioreq_server {
     ioservid_t id;      /* IN - ioreq server id */
     uint16_t type;      /* IN - memory type */
     uint32_t flags;     /* IN - types of accesses to be forwarded to the
                            ioreq server. flags with 0 means to unmap the
                            ioreq server */

     uint64_t opaque;    /* only used for hypercall continuation, should
                            be set to zero by the caller */
};

If so, is there any approach in the hypervisor to differentiate the first
call from a continued hypercall? Do we need some form of check on this
opaque field? And yes, not playing by the rules will only harm the guest
the device model controls.
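
For what it's worth, a rough sketch of how the hypervisor side might consume
such an opaque field as a gfn cursor (the helper names and the sweep loop
are assumptions for illustration, not the actual patch):

static int dm_map_mem_type_to_ioreq_server(struct domain *d,
    struct xen_dm_op_map_mem_type_to_ioreq_server *data)
{
    unsigned long gfn = data->opaque;    /* zero on the first invocation */
    int rc = 0;

    /* ... validate id/type/flags and (un)bind the ioreq server ... */

    if ( data->flags == 0 )              /* unmap: sweep outstanding entries */
    {
        while ( gfn <= domain_max_gfn(d) )        /* hypothetical bound */
        {
            reset_one_ioreq_server_entry(d, gfn); /* hypothetical helper */

            if ( !(++gfn & 0xfff) && hypercall_preempt_check() )
            {
                data->opaque = gfn;      /* resume point for the next call */
                rc = -ERESTART;          /* dm_op layer re-issues the op */
                break;
            }
        }
    }

    return rc;
}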

Thanks
Yu

> Jan
>
>



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v7 2/5] x86/ioreq server: Add DMOP to map guest ram with p2m_ioreq_server to an ioreq server.
  2017-03-14  7:28         ` Yu Zhang
@ 2017-03-14  9:40           ` Paul Durrant
  2017-03-14  9:52             ` Yu Zhang
  2017-03-14 10:26           ` Jan Beulich
  1 sibling, 1 reply; 41+ messages in thread
From: Paul Durrant @ 2017-03-14  9:40 UTC (permalink / raw)
  To: 'Yu Zhang', Jan Beulich
  Cc: Kevin Tian, Andrew Cooper, Tim (Xen.org),
	George Dunlap, xen-devel, zhiyuan.lv, Jun Nakajima

> -----Original Message-----
[snip]
> >>>> +struct hvm_ioreq_server *p2m_get_ioreq_server(struct domain *d,
> >>>> +                                              unsigned int *flags)
> >>>> +{
> >>>> +    struct p2m_domain *p2m = p2m_get_hostp2m(d);
> >>>> +    struct hvm_ioreq_server *s;
> >>>> +
> >>>> +    spin_lock(&p2m->ioreq.lock);
> >>>> +
> >>>> +    s = p2m->ioreq.server;
> >>>> +    *flags = p2m->ioreq.flags;
> >>>> +
> >>>> +    spin_unlock(&p2m->ioreq.lock);
> >>>> +    return s;
> >>>> +}
> >>> I'm afraid this question was asked before, but since there's no
> >>> comment here or anywhere, I can't recall if there was a reason why
> >>> s potentially being stale by the time the caller looks at it is not a
> >>> problem.
> >> Well, it is possibe that s is stale. I did not take it as a problem
> >> because the device model
> >> will later discard such io request. And I believe current
> >> hvm_select_ioreq_server() also
> >> has the same issue - the returned s should be considered to be stale, if
> >> the MMIO/PIO
> >> address is removed from the ioreq server's rangeset.

An enabled emulator has to be prepared to receive ioreqs for ranges it has unmapped since there is no domain_pause() to prevent a race.

> >>
> >> Another thought is, if you think it is inappropriate for device model to
> >> do the check,
> >> we can use spin_lock_recursive on ioreq_server.lock to protect all the
> >> ioreq server select
> >> and release the lock after the ioreq server is sent out.
> > Well, let's first ask Paul as to what his perspective here is, both
> > specifically for this change and more generally regarding what
> > you say above.
> 
> Paul, any suggestions on this and the above one? :)

Well, as I said above, the device model has to check whether it is willing to handle an ioreq it is passed and terminate it appropriately under all circumstances. There is no option for it to reject the I/O.
This may be ok for MMIO regions coming and going, but is there anything more to consider here if we change a page type from ioreq_server back to RAM? Clearly we need to make sure that there is no scope for I/O to that page being incorrectly handled during the transition.
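
As a rough illustration of that obligation on the emulator side (this is not
QEMU or XenGT code; the lookup and emulation helpers are hypothetical, only
ioreq_t, IOREQ_READ and STATE_IORESP_READY come from the public ioreq
interface):

/* Terminate every ioreq, even for ranges/pages no longer claimed. */
static void dm_handle_ioreq(ioreq_t *req)
{
    if ( !dm_still_claims(req->addr, req->size) )  /* hypothetical lookup */
    {
        if ( req->dir == IOREQ_READ )
            req->data = ~0UL;          /* unclaimed reads return all ones */
        /* unclaimed writes are simply dropped */
        req->state = STATE_IORESP_READY;
        return;
    }

    dm_emulate(req);                   /* hypothetical real emulation path */
}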

  Paul

> 
> Thanks
> Yu
> > Jan
> >



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v7 2/5] x86/ioreq server: Add DMOP to map guest ram with p2m_ioreq_server to an ioreq server.
  2017-03-14  9:40           ` Paul Durrant
@ 2017-03-14  9:52             ` Yu Zhang
  2017-03-14 10:40               ` Paul Durrant
  0 siblings, 1 reply; 41+ messages in thread
From: Yu Zhang @ 2017-03-14  9:52 UTC (permalink / raw)
  To: Paul Durrant, Jan Beulich
  Cc: Kevin Tian, Andrew Cooper, Tim (Xen.org),
	George Dunlap, xen-devel, zhiyuan.lv, Jun Nakajima



On 3/14/2017 5:40 PM, Paul Durrant wrote:
>> -----Original Message-----
> [snip]
>>>>>> +struct hvm_ioreq_server *p2m_get_ioreq_server(struct domain *d,
>>>>>> +                                              unsigned int *flags)
>>>>>> +{
>>>>>> +    struct p2m_domain *p2m = p2m_get_hostp2m(d);
>>>>>> +    struct hvm_ioreq_server *s;
>>>>>> +
>>>>>> +    spin_lock(&p2m->ioreq.lock);
>>>>>> +
>>>>>> +    s = p2m->ioreq.server;
>>>>>> +    *flags = p2m->ioreq.flags;
>>>>>> +
>>>>>> +    spin_unlock(&p2m->ioreq.lock);
>>>>>> +    return s;
>>>>>> +}
>>>>> I'm afraid this question was asked before, but since there's no
>>>>> comment here or anywhere, I can't recall if there was a reason why
>>>>> s potentially being stale by the time the caller looks at it is not a
>>>>> problem.
>>>> Well, it is possibe that s is stale. I did not take it as a problem
>>>> because the device model
>>>> will later discard such io request. And I believe current
>>>> hvm_select_ioreq_server() also
>>>> has the same issue - the returned s should be considered to be stale, if
>>>> the MMIO/PIO
>>>> address is removed from the ioreq server's rangeset.
> An enabled emulator has to be prepared to receive ioreqs for ranges it has unmapped since there is no domain_pause() to prevent a race.

Thank you for the reply, Paul.
So you mean that holding the ioreq server lock around this process will
not prevent this either? Why?

>>>> Another thought is, if you think it is inappropriate for device model to
>>>> do the check,
>>>> we can use spin_lock_recursive on ioreq_server.lock to protect all the
>>>> ioreq server select
>>>> and release the lock after the ioreq server is sent out.
>>> Well, let's first ask Paul as to what his perspective here is, both
>>> specifically for this change and more generally regarding what
>>> you say above.
>> Paul, any suggestions on this and the above one? :)
> Well, as I said above, the device model has to check whether it is willing to handle and ioreq it is passed and terminate it appropriately under all circumstances. There is no option for it to reject the I/O.
> This may be ok for MMIO regions coming and going, but is there anything more to consider here it we change a page time from ioreq_server back to RAM? Clearly we need to make sure that there is no scope for I/O to that page being incorrectly handled during transition.

I don't think we need to worry about the p2m type change; patch 1 prevents
this. The s returned by p2m_get_ioreq_server() may be stale (and is then
discarded by the device model in our current code), but the p2m type will
not be stale. I agree the device model has the responsibility to do such a
check, but I also wonder whether the hypervisor could provide some kind of
guarantee.

Thanks
Yu

>    Paul
>
>> Thanks
>> Yu
>>> Jan
>>>
>



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v7 2/5] x86/ioreq server: Add DMOP to map guest ram with p2m_ioreq_server to an ioreq server.
  2017-03-14  7:28         ` Yu Zhang
  2017-03-14  9:40           ` Paul Durrant
@ 2017-03-14 10:26           ` Jan Beulich
  1 sibling, 0 replies; 41+ messages in thread
From: Jan Beulich @ 2017-03-14 10:26 UTC (permalink / raw)
  To: Yu Zhang
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, Tim Deegan, xen-devel,
	Paul Durrant, zhiyuan.lv, Jun Nakajima

>>> On 14.03.17 at 08:28, <yu.c.zhang@linux.intel.com> wrote:
> On 3/13/2017 7:20 PM, Jan Beulich wrote:
>>>>> On 11.03.17 at 09:42, <yu.c.zhang@linux.intel.com> wrote:
>>> On 3/10/2017 11:29 PM, Jan Beulich wrote:
>>>>>>> On 08.03.17 at 16:33, <yu.c.zhang@linux.intel.com> wrote:
>>>>> --- a/xen/arch/x86/mm/p2m-ept.c
>>>>> +++ b/xen/arch/x86/mm/p2m-ept.c
>>>>> @@ -131,6 +131,13 @@ static void ept_p2m_type_to_flags(struct p2m_domain *p2m, ept_entry_t *entry,
>>>>>                entry->r = entry->w = entry->x = 1;
>>>>>                entry->a = entry->d = !!cpu_has_vmx_ept_ad;
>>>>>                break;
>>>>> +        case p2m_ioreq_server:
>>>>> +            entry->r = 1;
>>>>> +            entry->w = !(p2m->ioreq.flags & XEN_DMOP_IOREQ_MEM_ACCESS_WRITE);
>>>> Along the lines of the previous comment - if you mean to have the
>>>> code cope with READ, please also set ->r accordingly, or add a
>>>> comment why this won't have the intended effect (yielding a not
>>>> present EPTE).
>>> How about we keep this code and do not support READ? I'll remove above
>>> dead code in hvmemul_do_io().
>> Sure, as said above: All I'd like to push for is that the result is
>> consistent across the code base.
> 
> Thank you, Jan. I should have keep a consistent principle in this code.
> My preference is to remove the possible read emulation logic. But could 
> we still keep the
> definition of XEN_DMOP_IOREQ_MEM_ACCESS_READ, so that people can still 
> know this
> interface can be extended in the future?

Of course.

Jan



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v7 2/5] x86/ioreq server: Add DMOP to map guest ram with p2m_ioreq_server to an ioreq server.
  2017-03-14  9:52             ` Yu Zhang
@ 2017-03-14 10:40               ` Paul Durrant
  2017-03-14 12:03                 ` Yu Zhang
  0 siblings, 1 reply; 41+ messages in thread
From: Paul Durrant @ 2017-03-14 10:40 UTC (permalink / raw)
  To: 'Yu Zhang', Jan Beulich
  Cc: Kevin Tian, Andrew Cooper, Tim (Xen.org),
	George Dunlap, xen-devel, zhiyuan.lv, Jun Nakajima

> -----Original Message-----
> From: Yu Zhang [mailto:yu.c.zhang@linux.intel.com]
> Sent: 14 March 2017 09:53
> To: Paul Durrant <Paul.Durrant@citrix.com>; Jan Beulich
> <JBeulich@suse.com>
> Cc: Andrew Cooper <Andrew.Cooper3@citrix.com>; George Dunlap
> <George.Dunlap@citrix.com>; Jun Nakajima <jun.nakajima@intel.com>;
> Kevin Tian <kevin.tian@intel.com>; zhiyuan.lv@intel.com; xen-
> devel@lists.xen.org; Tim (Xen.org) <tim@xen.org>
> Subject: Re: [PATCH v7 2/5] x86/ioreq server: Add DMOP to map guest ram
> with p2m_ioreq_server to an ioreq server.
> 
> 
> 
> On 3/14/2017 5:40 PM, Paul Durrant wrote:
> >> -----Original Message-----
> > [snip]
> >>>>>> +struct hvm_ioreq_server *p2m_get_ioreq_server(struct domain
> *d,
> >>>>>> +                                              unsigned int *flags)
> >>>>>> +{
> >>>>>> +    struct p2m_domain *p2m = p2m_get_hostp2m(d);
> >>>>>> +    struct hvm_ioreq_server *s;
> >>>>>> +
> >>>>>> +    spin_lock(&p2m->ioreq.lock);
> >>>>>> +
> >>>>>> +    s = p2m->ioreq.server;
> >>>>>> +    *flags = p2m->ioreq.flags;
> >>>>>> +
> >>>>>> +    spin_unlock(&p2m->ioreq.lock);
> >>>>>> +    return s;
> >>>>>> +}
> >>>>> I'm afraid this question was asked before, but since there's no
> >>>>> comment here or anywhere, I can't recall if there was a reason why
> >>>>> s potentially being stale by the time the caller looks at it is not a
> >>>>> problem.
> >>>> Well, it is possibe that s is stale. I did not take it as a problem
> >>>> because the device model
> >>>> will later discard such io request. And I believe current
> >>>> hvm_select_ioreq_server() also
> >>>> has the same issue - the returned s should be considered to be stale, if
> >>>> the MMIO/PIO
> >>>> address is removed from the ioreq server's rangeset.
> > An enabled emulator has to be prepared to receive ioreqs for ranges it has
> unmapped since there is no domain_pause() to prevent a race.
> 
> Thank you for the reply, Paul.
> So you mean using the ioreq server lock around this process will not
> prevent this either? Why?
> 

Well, if emulation has already sampled the value on another vcpu, then that emulation request may race with the range being disabled. Remember that (for good reason) the lock is not held by hvm_select_ioreq_server() and is only held by hvm_unmap_io_range_from_ioreq_server() to protect against other invocations of similar manipulation functions.

> >>>> Another thought is, if you think it is inappropriate for device model to
> >>>> do the check,
> >>>> we can use spin_lock_recursive on ioreq_server.lock to protect all the
> >>>> ioreq server select
> >>>> and release the lock after the ioreq server is sent out.
> >>> Well, let's first ask Paul as to what his perspective here is, both
> >>> specifically for this change and more generally regarding what
> >>> you say above.
> >> Paul, any suggestions on this and the above one? :)
> > Well, as I said above, the device model has to check whether it is willing to
> handle and ioreq it is passed and terminate it appropriately under all
> circumstances. There is no option for it to reject the I/O.
> > This may be ok for MMIO regions coming and going, but is there anything
> more to consider here it we change a page time from ioreq_server back to
> RAM? Clearly we need to make sure that there is no scope for I/O to that
> page being incorrectly handled during transition.
> 
> I don't think we need to worry about the p2m type change, patch 1
> prevents this. The s returned by
> p2m_get_ioreq_server() may be stale(and is then discarded by device
> model in our current code), but
> p2m type will not be stale. I agree device model has responsibility to
> do such check, but I also wonder
> if it is possible for the hypervisor to provide some kind insurance.
> 

Not in a cheap way. More locking code could be added, but it's likely to be convoluted and have a detrimental effect on performance.

  Paul

> Thanks
> Yu
> 
> >    Paul
> >
> >> Thanks
> >> Yu
> >>> Jan
> >>>
> >



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v7 4/5] ix86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries.
  2017-03-14  7:42         ` Yu Zhang
@ 2017-03-14 10:49           ` Jan Beulich
  2017-03-14 12:18             ` Yu Zhang
  0 siblings, 1 reply; 41+ messages in thread
From: Jan Beulich @ 2017-03-14 10:49 UTC (permalink / raw)
  To: Yu Zhang
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, xen-devel,
	Paul Durrant, zhiyuan.lv, Jun Nakajima

>>> On 14.03.17 at 08:42, <yu.c.zhang@linux.intel.com> wrote:
> On 3/13/2017 7:24 PM, Jan Beulich wrote:
>>>>> On 11.03.17 at 09:42, <yu.c.zhang@linux.intel.com> wrote:
>>> On 3/11/2017 12:03 AM, Jan Beulich wrote:
>>>> But there's a wider understanding issue I'm having here: What is
>>>> an "entry" here? Commonly I would assume this to refer to an
>>>> individual (4k) page, but it looks like you really mean table entry,
>>>> i.e. possibly representing a 2M or 1G page.
>>> Well, it should be an entry pointing to a 4K page(only).
>>> For p2m_ioreq_server, we shall not meet huge page. Because they are
>>> changed from p2m_ram_rw pages
>>> in set_mem_type() -> p2m_change_type_one(), which calls p2m_set_entry()
>>> with PAGE_ORDER_4K specified.
>> And recombination of large pages won't ever end up hitting these?
> 
> Well, by recombination I guess you refer to the POD pages? I do not 
> think p2m_ioreq_server
> pages will be combined now, which means we do not need to worry about 
> recounting the
> p2m_ioreq_server entries while a split happens.

No, I didn't think about PoD here. But indeed I was misremembering:
We don't currently make any attempt to transparently recreate large
pages.

Jan



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v7 5/5] x86/ioreq server: Synchronously reset outstanding p2m_ioreq_server entries when an ioreq server unmaps.
  2017-03-14  7:42         ` Yu Zhang
@ 2017-03-14 10:51           ` Jan Beulich
  2017-03-14 12:22             ` Yu Zhang
  0 siblings, 1 reply; 41+ messages in thread
From: Jan Beulich @ 2017-03-14 10:51 UTC (permalink / raw)
  To: Yu Zhang
  Cc: George Dunlap, Andrew Cooper, Paul Durrant, zhiyuan.lv, xen-devel

>>> On 14.03.17 at 08:42, <yu.c.zhang@linux.intel.com> wrote:
> So you mean change the definition of to 
> xen_dm_op_map_mem_type_to_ioreq_server
> to something like this?
> 
> struct xen_dm_op_map_mem_type_to_ioreq_server {
>      ioservid_t id;      /* IN - ioreq server id */
>      uint16_t type;      /* IN - memory type */
>      uint32_t flags;     /* IN - types of accesses to be forwarded to the
>                             ioreq server. flags with 0 means to unmap the
>                             ioreq server */
> 
>      uint64_t opaque;    /* only used for hypercall continuation, should
>                             be set to zero by the caller */
> };

Yes, with the field marked IN OUT and "should" replaced by "has to".
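
Folding both of those requests in, the field would then read (only the
annotations change, the layout stays as proposed):

struct xen_dm_op_map_mem_type_to_ioreq_server {
    ioservid_t id;      /* IN - ioreq server id */
    uint16_t type;      /* IN - memory type */
    uint32_t flags;     /* IN - types of accesses to be forwarded to the
                           ioreq server. flags with 0 means to unmap the
                           ioreq server */

    uint64_t opaque;    /* IN/OUT - only used for hypercall continuation,
                           has to be set to zero by the caller */
};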

> If so, is there any approach in hypervisor to differentiate the first 
> call from the continued
> hypercall? Do we need some form of check on this opaque?

I don't see a need (other than - of course - the obvious upper
bound imposed here), due to ...

> And yes, not 
> playing by this
> will only harm the guest the device model controls.

... this.

Jan



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v7 2/5] x86/ioreq server: Add DMOP to map guest ram with p2m_ioreq_server to an ioreq server.
  2017-03-14 10:40               ` Paul Durrant
@ 2017-03-14 12:03                 ` Yu Zhang
  2017-03-14 13:10                   ` Jan Beulich
  0 siblings, 1 reply; 41+ messages in thread
From: Yu Zhang @ 2017-03-14 12:03 UTC (permalink / raw)
  To: Paul Durrant, Jan Beulich
  Cc: Kevin Tian, Andrew Cooper, Tim (Xen.org),
	George Dunlap, xen-devel, zhiyuan.lv, Jun Nakajima



On 3/14/2017 6:40 PM, Paul Durrant wrote:
>> -----Original Message-----
>> From: Yu Zhang [mailto:yu.c.zhang@linux.intel.com]
>> Sent: 14 March 2017 09:53
>> To: Paul Durrant <Paul.Durrant@citrix.com>; Jan Beulich
>> <JBeulich@suse.com>
>> Cc: Andrew Cooper <Andrew.Cooper3@citrix.com>; George Dunlap
>> <George.Dunlap@citrix.com>; Jun Nakajima <jun.nakajima@intel.com>;
>> Kevin Tian <kevin.tian@intel.com>; zhiyuan.lv@intel.com; xen-
>> devel@lists.xen.org; Tim (Xen.org) <tim@xen.org>
>> Subject: Re: [PATCH v7 2/5] x86/ioreq server: Add DMOP to map guest ram
>> with p2m_ioreq_server to an ioreq server.
>>
>>
>>
>> On 3/14/2017 5:40 PM, Paul Durrant wrote:
>>>> -----Original Message-----
>>> [snip]
>>>>>>>> +struct hvm_ioreq_server *p2m_get_ioreq_server(struct domain
>> *d,
>>>>>>>> +                                              unsigned int *flags)
>>>>>>>> +{
>>>>>>>> +    struct p2m_domain *p2m = p2m_get_hostp2m(d);
>>>>>>>> +    struct hvm_ioreq_server *s;
>>>>>>>> +
>>>>>>>> +    spin_lock(&p2m->ioreq.lock);
>>>>>>>> +
>>>>>>>> +    s = p2m->ioreq.server;
>>>>>>>> +    *flags = p2m->ioreq.flags;
>>>>>>>> +
>>>>>>>> +    spin_unlock(&p2m->ioreq.lock);
>>>>>>>> +    return s;
>>>>>>>> +}
>>>>>>> I'm afraid this question was asked before, but since there's no
>>>>>>> comment here or anywhere, I can't recall if there was a reason why
>>>>>>> s potentially being stale by the time the caller looks at it is not a
>>>>>>> problem.
>>>>>> Well, it is possibe that s is stale. I did not take it as a problem
>>>>>> because the device model
>>>>>> will later discard such io request. And I believe current
>>>>>> hvm_select_ioreq_server() also
>>>>>> has the same issue - the returned s should be considered to be stale, if
>>>>>> the MMIO/PIO
>>>>>> address is removed from the ioreq server's rangeset.
>>> An enabled emulator has to be prepared to receive ioreqs for ranges it has
>> unmapped since there is no domain_pause() to prevent a race.
>>
>> Thank you for the reply, Paul.
>> So you mean using the ioreq server lock around this process will not
>> prevent this either? Why?
>>
> Well, if emulation as already sampled the value on another vcpu then that emulation request may race with the range being disabled. Remember that (for good reason) the lock is not held by hvm_select_ioreq_server() and is only held by hvm_unmap_io_range_from_ioreq_server() to protect against other invocations of similar manipulation functions.

Oh. So that's why you mentioned the domain_pause() to prevent the race, 
right?
>>>>>> Another thought is, if you think it is inappropriate for device model to
>>>>>> do the check,
>>>>>> we can use spin_lock_recursive on ioreq_server.lock to protect all the
>>>>>> ioreq server select
>>>>>> and release the lock after the ioreq server is sent out.
>>>>> Well, let's first ask Paul as to what his perspective here is, both
>>>>> specifically for this change and more generally regarding what
>>>>> you say above.
>>>> Paul, any suggestions on this and the above one? :)
>>> Well, as I said above, the device model has to check whether it is willing to
>> handle and ioreq it is passed and terminate it appropriately under all
>> circumstances. There is no option for it to reject the I/O.
>>> This may be ok for MMIO regions coming and going, but is there anything
>> more to consider here it we change a page time from ioreq_server back to
>> RAM? Clearly we need to make sure that there is no scope for I/O to that
>> page being incorrectly handled during transition.
>>
>> I don't think we need to worry about the p2m type change, patch 1
>> prevents this. The s returned by
>> p2m_get_ioreq_server() may be stale(and is then discarded by device
>> model in our current code), but
>> p2m type will not be stale. I agree device model has responsibility to
>> do such check, but I also wonder
>> if it is possible for the hypervisor to provide some kind insurance.
>>
> Not in a cheap way. More locking code be added but it's likely to be convoluted and have a detrimental effect on performance.

Yep. Sounds reasonable. :)

So, Jan, could we draw a conclusion here that, although the selected
ioreq server may be stale, it should be the device model that does the
check?

Thanks
Yu

>    Paul
>
>> Thanks
>> Yu
>>
>>>     Paul
>>>
>>>> Thanks
>>>> Yu
>>>>> Jan
>>>>>
>



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v7 4/5] ix86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries.
  2017-03-14 10:49           ` Jan Beulich
@ 2017-03-14 12:18             ` Yu Zhang
  2017-03-14 13:11               ` Jan Beulich
  0 siblings, 1 reply; 41+ messages in thread
From: Yu Zhang @ 2017-03-14 12:18 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, xen-devel,
	Paul Durrant, zhiyuan.lv, Jun Nakajima



On 3/14/2017 6:49 PM, Jan Beulich wrote:
>>>> On 14.03.17 at 08:42, <yu.c.zhang@linux.intel.com> wrote:
>> On 3/13/2017 7:24 PM, Jan Beulich wrote:
>>>>>> On 11.03.17 at 09:42, <yu.c.zhang@linux.intel.com> wrote:
>>>> On 3/11/2017 12:03 AM, Jan Beulich wrote:
>>>>> But there's a wider understanding issue I'm having here: What is
>>>>> an "entry" here? Commonly I would assume this to refer to an
>>>>> individual (4k) page, but it looks like you really mean table entry,
>>>>> i.e. possibly representing a 2M or 1G page.
>>>> Well, it should be an entry pointing to a 4K page(only).
>>>> For p2m_ioreq_server, we shall not meet huge page. Because they are
>>>> changed from p2m_ram_rw pages
>>>> in set_mem_type() -> p2m_change_type_one(), which calls p2m_set_entry()
>>>> with PAGE_ORDER_4K specified.
>>> And recombination of large pages won't ever end up hitting these?
>> Well, by recombination I guess you refer to the POD pages? I do not
>> think p2m_ioreq_server
>> pages will be combined now, which means we do not need to worry about
>> recounting the
>> p2m_ioreq_server entries while a split happens.
> No, I didn't think about PoD here. But indeed I was misremembering:
> We don't currently make any attempt to transparently recreate large
> pages.

Thanks, Jan. So do you now think the current counting logic for
p2m_ioreq_server is correct? :-)

Yu
> Jan
>
>



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v7 5/5] x86/ioreq server: Synchronously reset outstanding p2m_ioreq_server entries when an ioreq server unmaps.
  2017-03-14 10:51           ` Jan Beulich
@ 2017-03-14 12:22             ` Yu Zhang
  2017-03-14 13:12               ` Jan Beulich
  0 siblings, 1 reply; 41+ messages in thread
From: Yu Zhang @ 2017-03-14 12:22 UTC (permalink / raw)
  To: Jan Beulich
  Cc: George Dunlap, Andrew Cooper, Paul Durrant, zhiyuan.lv, xen-devel



On 3/14/2017 6:51 PM, Jan Beulich wrote:
>>>> On 14.03.17 at 08:42, <yu.c.zhang@linux.intel.com> wrote:
>> So you mean change the definition of to
>> xen_dm_op_map_mem_type_to_ioreq_server
>> to something like this?
>>
>> struct xen_dm_op_map_mem_type_to_ioreq_server {
>>       ioservid_t id;      /* IN - ioreq server id */
>>       uint16_t type;      /* IN - memory type */
>>       uint32_t flags;     /* IN - types of accesses to be forwarded to the
>>                              ioreq server. flags with 0 means to unmap the
>>                              ioreq server */
>>
>>       uint64_t opaque;    /* only used for hypercall continuation, should
>>                              be set to zero by the caller */
>> };
> Yes, with the field marked IN OUT and "should" replaced by "has to".

Got it. BTW, I can define this opaque field in patch 2/5 - the
introduction of this dm_op - or add it in this patch 5/5, where we use it
in the hypercall continuation. Which one do you prefer?

>> If so, is there any approach in hypervisor to differentiate the first
>> call from the continued
>> hypercall? Do we need some form of check on this opaque?
> I don't see a need (other than - of course - the obvious upper
> bound imposed here), due to ...
>
>> And yes, not
>> playing by this
>> will only harm the guest the device model controls.
> ... this.

OK. I'm fine with this too. :-)

Thanks
Yu

> Jan
>
>



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v7 2/5] x86/ioreq server: Add DMOP to map guest ram with p2m_ioreq_server to an ioreq server.
  2017-03-14 12:03                 ` Yu Zhang
@ 2017-03-14 13:10                   ` Jan Beulich
  2017-03-14 13:28                     ` Yu Zhang
  0 siblings, 1 reply; 41+ messages in thread
From: Jan Beulich @ 2017-03-14 13:10 UTC (permalink / raw)
  To: Yu Zhang
  Cc: Kevin Tian, Andrew Cooper, Tim (Xen.org),
	George Dunlap, xen-devel, Paul Durrant, zhiyuan.lv, Jun Nakajima

>>> On 14.03.17 at 13:03, <yu.c.zhang@linux.intel.com> wrote:
> So, Jan. Could we draw a conclusion here, that although the selected 
> ioreq server may be stale, but
> it should be the device model to do  the check?

Yes. In order for this to not come up yet another time, please add
a comment though.
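
One possible wording for that comment, above p2m_get_ioreq_server() (a
suggestion only, not the committed text):

/*
 * Return the ioreq server (and access flags) currently bound to
 * p2m_ioreq_server pages. The pointer may already be stale by the
 * time the caller uses it - the server can be unbound in parallel -
 * so the device model must recheck and terminate any ioreq for
 * pages/ranges it no longer claims.
 */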

Jan



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v7 4/5] ix86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries.
  2017-03-14 12:18             ` Yu Zhang
@ 2017-03-14 13:11               ` Jan Beulich
  2017-03-14 13:29                 ` Yu Zhang
  0 siblings, 1 reply; 41+ messages in thread
From: Jan Beulich @ 2017-03-14 13:11 UTC (permalink / raw)
  To: Yu Zhang
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, xen-devel,
	Paul Durrant, zhiyuan.lv, Jun Nakajima

>>> On 14.03.17 at 13:18, <yu.c.zhang@linux.intel.com> wrote:

> 
> On 3/14/2017 6:49 PM, Jan Beulich wrote:
>>>>> On 14.03.17 at 08:42, <yu.c.zhang@linux.intel.com> wrote:
>>> On 3/13/2017 7:24 PM, Jan Beulich wrote:
>>>>>>> On 11.03.17 at 09:42, <yu.c.zhang@linux.intel.com> wrote:
>>>>> On 3/11/2017 12:03 AM, Jan Beulich wrote:
>>>>>> But there's a wider understanding issue I'm having here: What is
>>>>>> an "entry" here? Commonly I would assume this to refer to an
>>>>>> individual (4k) page, but it looks like you really mean table entry,
>>>>>> i.e. possibly representing a 2M or 1G page.
>>>>> Well, it should be an entry pointing to a 4K page(only).
>>>>> For p2m_ioreq_server, we shall not meet huge page. Because they are
>>>>> changed from p2m_ram_rw pages
>>>>> in set_mem_type() -> p2m_change_type_one(), which calls p2m_set_entry()
>>>>> with PAGE_ORDER_4K specified.
>>>> And recombination of large pages won't ever end up hitting these?
>>> Well, by recombination I guess you refer to the POD pages? I do not
>>> think p2m_ioreq_server
>>> pages will be combined now, which means we do not need to worry about
>>> recounting the
>>> p2m_ioreq_server entries while a split happens.
>> No, I didn't think about PoD here. But indeed I was misremembering:
>> We don't currently make any attempt to transparently recreate large
>> pages.
> 
> Thanks Jan. So do you now think the current counting logic for 
> p2m_ioreq_server is correct? :-)

Yes. But please make sure you mention the 4k page dependency
at least in the commit message.

Jan



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v7 5/5] x86/ioreq server: Synchronously reset outstanding p2m_ioreq_server entries when an ioreq server unmaps.
  2017-03-14 12:22             ` Yu Zhang
@ 2017-03-14 13:12               ` Jan Beulich
  2017-03-14 13:29                 ` Yu Zhang
  0 siblings, 1 reply; 41+ messages in thread
From: Jan Beulich @ 2017-03-14 13:12 UTC (permalink / raw)
  To: Yu Zhang
  Cc: George Dunlap, Andrew Cooper, Paul Durrant, zhiyuan.lv, xen-devel

>>> On 14.03.17 at 13:22, <yu.c.zhang@linux.intel.com> wrote:
> On 3/14/2017 6:51 PM, Jan Beulich wrote:
>>>>> On 14.03.17 at 08:42, <yu.c.zhang@linux.intel.com> wrote:
>>> So you mean change the definition of to
>>> xen_dm_op_map_mem_type_to_ioreq_server
>>> to something like this?
>>>
>>> struct xen_dm_op_map_mem_type_to_ioreq_server {
>>>       ioservid_t id;      /* IN - ioreq server id */
>>>       uint16_t type;      /* IN - memory type */
>>>       uint32_t flags;     /* IN - types of accesses to be forwarded to the
>>>                              ioreq server. flags with 0 means to unmap the
>>>                              ioreq server */
>>>
>>>       uint64_t opaque;    /* only used for hypercall continuation, should
>>>                              be set to zero by the caller */
>>> };
>> Yes, with the field marked IN OUT and "should" replaced by "has to".
> 
> Got it. BTW, I can define this opaque field in patch 2/5 - the 
> introduction of this dm_op,
> or add it in this patch 5/5 when we use it in the hypercall 
> continuation. Which one do
> you incline?

I'd prefer if you added it right away, i.e. in patch 2.

Jan



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v7 2/5] x86/ioreq server: Add DMOP to map guest ram with p2m_ioreq_server to an ioreq server.
  2017-03-14 13:10                   ` Jan Beulich
@ 2017-03-14 13:28                     ` Yu Zhang
  0 siblings, 0 replies; 41+ messages in thread
From: Yu Zhang @ 2017-03-14 13:28 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, Andrew Cooper, Tim (Xen.org),
	George Dunlap, xen-devel, Paul Durrant, zhiyuan.lv, Jun Nakajima



On 3/14/2017 9:10 PM, Jan Beulich wrote:
>>>> On 14.03.17 at 13:03, <yu.c.zhang@linux.intel.com> wrote:
>> So, Jan. Could we draw a conclusion here, that although the selected
>> ioreq server may be stale, but
>> it should be the device model to do  the check?
> Yes. In order for this to not come up yet another time, please add
> a comment though.

Got it. I should have done this earlier. :-)

Yu
> Jan
>
>



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v7 4/5] ix86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries.
  2017-03-14 13:11               ` Jan Beulich
@ 2017-03-14 13:29                 ` Yu Zhang
  0 siblings, 0 replies; 41+ messages in thread
From: Yu Zhang @ 2017-03-14 13:29 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, George Dunlap, Andrew Cooper, xen-devel,
	Paul Durrant, zhiyuan.lv, Jun Nakajima



On 3/14/2017 9:11 PM, Jan Beulich wrote:
>>>> On 14.03.17 at 13:18, <yu.c.zhang@linux.intel.com> wrote:
>> On 3/14/2017 6:49 PM, Jan Beulich wrote:
>>>>>> On 14.03.17 at 08:42, <yu.c.zhang@linux.intel.com> wrote:
>>>> On 3/13/2017 7:24 PM, Jan Beulich wrote:
>>>>>>>> On 11.03.17 at 09:42, <yu.c.zhang@linux.intel.com> wrote:
>>>>>> On 3/11/2017 12:03 AM, Jan Beulich wrote:
>>>>>>> But there's a wider understanding issue I'm having here: What is
>>>>>>> an "entry" here? Commonly I would assume this to refer to an
>>>>>>> individual (4k) page, but it looks like you really mean table entry,
>>>>>>> i.e. possibly representing a 2M or 1G page.
>>>>>> Well, it should be an entry pointing to a 4K page(only).
>>>>>> For p2m_ioreq_server, we shall not meet huge page. Because they are
>>>>>> changed from p2m_ram_rw pages
>>>>>> in set_mem_type() -> p2m_change_type_one(), which calls p2m_set_entry()
>>>>>> with PAGE_ORDER_4K specified.
>>>>> And recombination of large pages won't ever end up hitting these?
>>>> Well, by recombination I guess you refer to the POD pages? I do not
>>>> think p2m_ioreq_server
>>>> pages will be combined now, which means we do not need to worry about
>>>> recounting the
>>>> p2m_ioreq_server entries while a split happens.
>>> No, I didn't think about PoD here. But indeed I was misremembering:
>>> We don't currently make any attempt to transparently recreate large
>>> pages.
>> Thanks Jan. So do you now think the current counting logic for
>> p2m_ioreq_server is correct? :-)
> Yes. But please make sure you mention the 4k page dependency
> at least in the commit message.

OK. Will do.

Thanks
Yu
> Jan
>
>



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v7 5/5] x86/ioreq server: Synchronously reset outstanding p2m_ioreq_server entries when an ioreq server unmaps.
  2017-03-14 13:12               ` Jan Beulich
@ 2017-03-14 13:29                 ` Yu Zhang
  0 siblings, 0 replies; 41+ messages in thread
From: Yu Zhang @ 2017-03-14 13:29 UTC (permalink / raw)
  To: Jan Beulich
  Cc: George Dunlap, Andrew Cooper, Paul Durrant, zhiyuan.lv, xen-devel



On 3/14/2017 9:12 PM, Jan Beulich wrote:
>>>> On 14.03.17 at 13:22, <yu.c.zhang@linux.intel.com> wrote:
>> On 3/14/2017 6:51 PM, Jan Beulich wrote:
>>>>>> On 14.03.17 at 08:42, <yu.c.zhang@linux.intel.com> wrote:
>>>> So you mean change the definition of to
>>>> xen_dm_op_map_mem_type_to_ioreq_server
>>>> to something like this?
>>>>
>>>> struct xen_dm_op_map_mem_type_to_ioreq_server {
>>>>        ioservid_t id;      /* IN - ioreq server id */
>>>>        uint16_t type;      /* IN - memory type */
>>>>        uint32_t flags;     /* IN - types of accesses to be forwarded to the
>>>>                               ioreq server. flags with 0 means to unmap the
>>>>                               ioreq server */
>>>>
>>>>        uint64_t opaque;    /* only used for hypercall continuation, should
>>>>                               be set to zero by the caller */
>>>> };
>>> Yes, with the field marked IN OUT and "should" replaced by "has to".
>> Got it. BTW, I can define this opaque field in patch 2/5 - the
>> introduction of this dm_op,
>> or add it in this patch 5/5 when we use it in the hypercall
>> continuation. Which one do
>> you incline?
> I'd prefer if you added it right away, i.e. in patch 2.

Got it. Thank you, Jan. :)

Yu

> Jan
>
>



^ permalink raw reply	[flat|nested] 41+ messages in thread


end of thread, other threads:[~2017-03-14 13:29 UTC | newest]

Thread overview: 41+ messages
2017-03-08 15:33 [PATCH v7 0/5] x86/ioreq server: Introduce HVMMEM_ioreq_server mem type Yu Zhang
2017-03-08 15:33 ` [PATCH v7 1/5] x86/ioreq server: Release the p2m lock after mmio is handled Yu Zhang
2017-03-08 15:33 ` [PATCH v7 2/5] x86/ioreq server: Add DMOP to map guest ram with p2m_ioreq_server to an ioreq server Yu Zhang
2017-03-10 15:29   ` Jan Beulich
2017-03-11  8:42     ` Yu Zhang
2017-03-13 11:20       ` Jan Beulich
2017-03-14  7:28         ` Yu Zhang
2017-03-14  9:40           ` Paul Durrant
2017-03-14  9:52             ` Yu Zhang
2017-03-14 10:40               ` Paul Durrant
2017-03-14 12:03                 ` Yu Zhang
2017-03-14 13:10                   ` Jan Beulich
2017-03-14 13:28                     ` Yu Zhang
2017-03-14 10:26           ` Jan Beulich
2017-03-08 15:33 ` [PATCH v7 3/5] x86/ioreq server: Handle read-modify-write cases for p2m_ioreq_server pages Yu Zhang
2017-03-10 15:33   ` Jan Beulich
2017-03-11  8:42     ` Yu Zhang
2017-03-13 11:22       ` Jan Beulich
2017-03-14  7:28         ` Yu Zhang
2017-03-08 15:33 ` [PATCH v7 4/5] ix86/ioreq server: Asynchronously reset outstanding p2m_ioreq_server entries Yu Zhang
2017-03-10 16:03   ` Jan Beulich
2017-03-11  8:42     ` Yu Zhang
2017-03-13 11:24       ` Jan Beulich
2017-03-14  7:42         ` Yu Zhang
2017-03-14 10:49           ` Jan Beulich
2017-03-14 12:18             ` Yu Zhang
2017-03-14 13:11               ` Jan Beulich
2017-03-14 13:29                 ` Yu Zhang
2017-03-08 15:33 ` [PATCH v7 5/5] x86/ioreq server: Synchronously reset outstanding p2m_ioreq_server entries when an ioreq server unmaps Yu Zhang
2017-03-10 16:17   ` Jan Beulich
2017-03-11  8:42     ` Yu Zhang
2017-03-13 11:24       ` Jan Beulich
2017-03-10 16:59   ` Andrew Cooper
2017-03-11  8:42     ` Yu Zhang
2017-03-13 11:32       ` Jan Beulich
2017-03-14  7:42         ` Yu Zhang
2017-03-14 10:51           ` Jan Beulich
2017-03-14 12:22             ` Yu Zhang
2017-03-14 13:12               ` Jan Beulich
2017-03-14 13:29                 ` Yu Zhang
