Re: [PATCH v1 0/2] Extend ioreq-server to support page write protection

All of lore.kernel.org
 help / color / mirror / Atom feed

* Re: [PATCH v1 0/2] Extend ioreq-server to support page write protection
  2014-07-28 17:55 [PATCH v1 0/2] Extend ioreq-server to support page write protection Wei Ye
@ 2014-07-28  8:24 ` Jan Beulich
  2014-08-04  5:05   ` Ye, Wei
  2014-07-28 17:55 ` [PATCH v1 1/2] x86: add p2m_ram_wp Wei Ye
  2014-07-28 17:55 ` [PATCH v1 2/2] ioreq-server: Support scatter page forwarding Wei Ye
  2 siblings, 1 reply; 46+ messages in thread
From: Jan Beulich @ 2014-07-28  8:24 UTC (permalink / raw)
  To: Wei Ye
  Cc: keir, ian.campbell, stefano.stabellini, ian.jackson, xen-devel,
	Paul.Durrant

>>> On 28.07.14 at 19:55, <wei.ye@intel.com> wrote:
> ioreq-server is proposed to forward PIO and MMIO request to multiple device
> models according to the io range. XenGT (Intel Graphics Virtualization 
> technology,
> please refer to 
> https://01.org/xen/blogs/srclarkx/2013/graphics-virtualization- 
> xengt) driver reside in Dom0 as a virtual graphics device model also need to 
> trap
> and emulate the guest's write operation to some specific memory pages, like 
> the
> memory pages used by guest graphics driver as PPGTT(per-process graphics 
> translation table). We add an new p2m type "p2m_ram_wp" to trap the page 
> write
> operation. Page of this new p2m type is read only and for write, the request 
> will
> go to device model via ioreq-server.

So how is this write-protection supposed to work on the IOMMU side
when sharing page tables?

Jan

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v1 1/2] x86: add p2m_ram_wp
  2014-07-28 17:55 ` [PATCH v1 1/2] x86: add p2m_ram_wp Wei Ye
@ 2014-07-28  8:31   ` Jan Beulich
  2014-08-04  5:10     ` Ye, Wei
  0 siblings, 1 reply; 46+ messages in thread
From: Jan Beulich @ 2014-07-28  8:31 UTC (permalink / raw)
  To: Wei Ye
  Cc: keir, ian.campbell, stefano.stabellini, Tim Deegan, ian.jackson,
	xen-devel, Paul.Durrant

>>> On 28.07.14 at 19:55, <wei.ye@intel.com> wrote:
>  xen/arch/x86/hvm/hvm.c    |    8 +++++++-
>  xen/arch/x86/mm/p2m-ept.c |    1 +
>  xen/include/asm-x86/p2m.h |    8 +++++++-
>  3 files changed, 15 insertions(+), 2 deletions(-)

This can't be complete: At the very least p2m-pt.c also needs a change
similar to the one to p2m-ept.c.

> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -2738,7 +2738,8 @@ int hvm_hap_nested_page_fault(paddr_t gpa,
>       * If this GFN is emulated MMIO or marked as read-only, pass the fault
>       * to the mmio handler.
>       */
> -    if ( (p2mt == p2m_mmio_dm) || 
> +    if ( (p2mt == p2m_mmio_dm) ||
> +         (p2mt == p2m_ram_wp) || 
>           (access_w && (p2mt == p2m_ram_ro)) )

Comparing your change with the surrounding existing code, you
would pass even reads happening to fault (perhaps for another
reason) to the DM, other than done for p2m_ram_ro. I don't think
that's correct.

> @@ -3829,6 +3830,11 @@ static enum hvm_copy_result __hvm_copy(
>              put_page(page);
>              return HVMCOPY_unhandleable;
>          }
> +        if ( p2m_is_wp_ram(p2mt) )
> +        {
> +            put_page(page);
> +            return HVMCOPY_bad_gfn_to_mfn;
> +        }

And again this change can't be complete: __hvm_clear() would also
need a similar change.

> @@ -167,6 +169,9 @@ typedef unsigned int p2m_query_t;
>   * and must not be touched. */
>  #define P2M_BROKEN_TYPES (p2m_to_mask(p2m_ram_broken))
>  
> +/* Write protection types */
> +#define P2M_WP_TYPES (p2m_to_mask(p2m_ram_wp))
> +
>  /* Useful predicates */
>  #define p2m_is_ram(_t) (p2m_to_mask(_t) & P2M_RAM_TYPES)
>  #define p2m_is_hole(_t) (p2m_to_mask(_t) & P2M_HOLE_TYPES)
> @@ -191,6 +196,7 @@ typedef unsigned int p2m_query_t;
>  #define p2m_is_any_ram(_t)  (p2m_to_mask(_t) &                   \
>                               (P2M_RAM_TYPES | P2M_GRANT_TYPES |  \
>                                p2m_to_mask(p2m_map_foreign)))
> +#define p2m_is_wp_ram(_t)   (p2m_to_mask(_t) & P2M_WP_TYPES)

To me such single-type classes don't seem very useful. Agreed
there are a number of pre-existing ones, but for classes where
future extensions are rather hard to imagine I wouldn't needlessly
add classes right away. But Tim (whom you failed to Cc anyway)
will have the final say here.

Jan

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v1 2/2] ioreq-server: Support scatter page forwarding
  2014-07-28 17:55 ` [PATCH v1 2/2] ioreq-server: Support scatter page forwarding Wei Ye
@ 2014-07-28  8:57   ` Jan Beulich
  2014-08-04  5:41     ` Ye, Wei
  0 siblings, 1 reply; 46+ messages in thread
From: Jan Beulich @ 2014-07-28  8:57 UTC (permalink / raw)
  To: Wei Ye
  Cc: keir, ian.campbell, stefano.stabellini, ian.jackson, xen-devel,
	Paul.Durrant

>>> On 28.07.14 at 19:55, <wei.ye@intel.com> wrote:
> +static struct pglist_hash_table *hvm_ioreq_server_lookup_pglist_hash_table(
> +                          struct pglist_hash_table *pglist_ht, unsigned long gpfn)
> +{
> +    int index = pglist_hash(gpfn);

Here and elsewhere: "unsigned int" for the hash index.

> +static void hvm_ioreq_server_free_hash_chain(struct pglist_hash_table *chain)
> +{
> +    struct pglist_hash_table *p = chain;
> +    struct pglist_hash_table *n;
> +
> +    while (p)

Coding style.

> +static int hvm_ioreq_server_pglist_hash_add(struct pglist_hash_table *pglist_ht,
> +                                            unsigned long gpfn)
> +{
> +    int index = pglist_hash(gpfn);
> +    struct pglist_hash_table *ne;
> +
> +    if ( hvm_ioreq_server_lookup_pglist_hash_table(pglist_ht, gpfn) != NULL )
> +        return -EINVAL;
> +
> +    if ( pglist_ht[index].gpfn == PGLIST_INVALID_GPFN )
> +        pglist_ht[index].gpfn = gpfn;
> +    else
> +    {
> +        ne = xmalloc_bytes(sizeof(pglist_ht[0]));

xmalloc() please.

> +static int hvm_ioreq_server_pglist_hash_rem(struct pglist_hash_table *pglist_ht,
> +                                            unsigned long gpfn)
> +{
> +    int index = pglist_hash(gpfn);
> +    struct pglist_hash_table *next, *prev;
> +
> +    if ( pglist_ht[index].gpfn == gpfn )
> +        pglist_ht[index].gpfn = PGLIST_INVALID_GPFN;
> +    else
> +    {
> +        prev = &pglist_ht[index];
> +        while ( 1 )
> +        {
> +            next = prev->next;
> +            if ( !next )
> +            {
> +                printk("hvm_ioreq_server_pglist_hash_rem hash_table %p remove"
> +                       " %lx not found\n", pglist_ht, gpfn);

The value of this log message is questionable anyway, but there's no
way for this to be acceptable without a suitably low guest level being
set.

> @@ -948,7 +1040,14 @@ static int hvm_ioreq_server_init(struct 
> hvm_ioreq_server *s, struct domain *d,
>      spin_lock_init(&s->lock);
>      INIT_LIST_HEAD(&s->ioreq_vcpu_list);
>      spin_lock_init(&s->bufioreq_lock);
> +    spin_lock_init(&s->pglist_hash_lock);
>  
> +    rc = -ENOMEM;
> +    s->pglist_hash_base = xmalloc_bytes(PGLIST_HASH_ENTRY_SIZE * PGLIST_HASH_SIZE);

xmalloc_array() please.

> +    if ( s->pglist_hash_base == NULL )
> +        goto fail1;
> +    memset(s->pglist_hash_base, 0, PGLIST_HASH_ENTRY_SIZE * PGLIST_HASH_SIZE);

And together with this, xzalloc_array().

> +static int hvm_map_pages_to_ioreq_server(struct domain *d, ioservid_t id,
> +                                        uint16_t set, uint16_t nr_pages,
> +                                        unsigned long *gpfn)
> +{
> +    struct hvm_ioreq_server *s;
> +    int rc;
> +    unsigned int i;
> +    p2m_type_t ot, nt;
> +
> +    if ( set )
> +    {
> +        ot = p2m_ram_rw;
> +        nt = p2m_ram_wp;
> +    }
> +    else
> +    {
> +        ot = p2m_ram_wp;
> +        nt = p2m_ram_rw;
> +    }
> +
> +    spin_lock(&d->arch.hvm_domain.ioreq_server.lock);
> +
> +    rc = -ENOENT;
> +    list_for_each_entry ( s,
> +                          &d->arch.hvm_domain.ioreq_server.list,
> +                          list_entry )
> +    {
> +        if ( s == d->arch.hvm_domain.default_ioreq_server )
> +            continue;
> +
> +        if ( s->id == id )
> +        {

Consistency please: Either both if()-s use "continue" (and perhaps get
combined into one), or (less desirable imo) both get combined to use
a {} body.

> +            spin_lock(&s->pglist_hash_lock);
> +
> +            for ( i = 0; i < nr_pages; i++ )
> +            {
> +                p2m_change_type_one(d, gpfn[i], ot, nt);

Error handling?

> +                if ( set )
> +                    rc = hvm_ioreq_server_pglist_hash_add(s->pglist_hash_base, gpfn[i]);
> +                else
> +                    rc = hvm_ioreq_server_pglist_hash_rem(s->pglist_hash_base, gpfn[i]);
> +
> +                 if ( rc != 0 )
> +                     break;
> +            }
> +
> +            spin_unlock(&s->pglist_hash_lock);
> +            flush_tlb_mask(d->domain_dirty_cpumask);

Why?

> @@ -2294,6 +2453,8 @@ static struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
>      uint32_t cf8;
>      uint8_t type;
>      uint64_t addr;
> +    unsigned long gpfn;
> +    struct pglist_hash_table *he;

These ought to be moved down into a more narrow scope.

> @@ -2361,6 +2522,15 @@ static struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
>              if ( rangeset_contains_range(r, addr, end) )
>                  return s;
>  
> +            gpfn = addr >> PAGE_SHIFT;
> +            spin_lock(&s->pglist_hash_lock);
> +            he = hvm_ioreq_server_lookup_pglist_hash_table(s->pglist_hash_base, gpfn);
> +            spin_unlock(&s->pglist_hash_lock);
> +
> +            if ( he ) {

Coding style again.

> +                return s;
> +            }
> +

And now the most fundamental question: Why is the existing (range
set based) mechanism not enough? Is that because we limit the
number of entries permitted on the range set? If so, the obvious
question is - why is there no limit being enforced for your new type
of pages?

> --- a/xen/include/asm-x86/hvm/domain.h
> +++ b/xen/include/asm-x86/hvm/domain.h
> @@ -48,6 +48,16 @@ struct hvm_ioreq_vcpu {
>      evtchn_port_t    ioreq_evtchn;
>  };
>  
> +struct pglist_hash_table {
> +    struct pglist_hash_table *next;
> +    unsigned long gpfn;
> +};
> +#define PGLIST_HASH_SIZE_SHIFT    8
> +#define PGLIST_HASH_SIZE    (1 << PGLIST_HASH_SIZE_SHIFT)
> +#define pglist_hash(x)      ((x) % PGLIST_HASH_SIZE)
> +#define PGLIST_INVALID_GPFN       0
> +#define PGLIST_HASH_ENTRY_SIZE    sizeof(struct pglist_hash_table)

None of this has any point in being in a header: There's only a
single source file using these, so they should be private to that
one.

> --- a/xen/include/public/hvm/hvm_op.h
> +++ b/xen/include/public/hvm/hvm_op.h
> @@ -367,6 +367,18 @@ struct xen_hvm_set_ioreq_server_state {
>  typedef struct xen_hvm_set_ioreq_server_state xen_hvm_set_ioreq_server_state_t;
>  DEFINE_XEN_GUEST_HANDLE(xen_hvm_set_ioreq_server_state_t);
>  
> +#define HVMOP_map_pages_to_ioreq_server 23
> +#define MAX_MAP_BATCH_PAGES             128

Too generic a name.

> +struct xen_hvm_map_pages_to_ioreq_server {
> +    domid_t domid;   /* IN - domain to be serviced */
> +    ioservid_t id;   /* IN - server id */
> +    uint16_t set;    /* IN - 1: map pages, 0: unmap pages */
> +    uint16_t nr_pages;  /* IN - page number */

"page count" or "number of pages".

> +    unsigned long page_list[MAX_MAP_BATCH_PAGES]; /* IN - gpfn list */

No "unsigned long" in public interfaces. And no variable width types
in interfaces not being subject to compat mode translation.

Jan

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH v1 0/2] Extend ioreq-server to support page write protection
@ 2014-07-28 17:55 Wei Ye
  2014-07-28  8:24 ` Jan Beulich
                   ` (2 more replies)
  0 siblings, 3 replies; 46+ messages in thread
From: Wei Ye @ 2014-07-28 17:55 UTC (permalink / raw)
  To: xen-devel
  Cc: keir, ian.campbell, stefano.stabellini, ian.jackson,
	Paul.Durrant, jbeulich, Wei Ye

ioreq-server is proposed to forward PIO and MMIO request to multiple device
models according to the io range. XenGT (Intel Graphics Virtualization technology,
please refer to https://01.org/xen/blogs/srclarkx/2013/graphics-virtualization-
xengt) driver reside in Dom0 as a virtual graphics device model also need to trap
and emulate the guest's write operation to some specific memory pages, like the
memory pages used by guest graphics driver as PPGTT(per-process graphics 
translation table). We add an new p2m type "p2m_ram_wp" to trap the page write
operation. Page of this new p2m type is read only and for write, the request will
go to device model via ioreq-server.

Wei Ye (2):
  x86: add p2m_ram_wp
  ioreq-server: Support scatter page forwarding

 tools/libxc/xc_domain.c          |   32 ++++++
 tools/libxc/xenctrl.h            |   18 ++++
 xen/arch/x86/hvm/hvm.c           |  214 +++++++++++++++++++++++++++++++++++++-
 xen/arch/x86/mm/p2m-ept.c        |    1 +
 xen/include/asm-x86/hvm/domain.h |   14 +++
 xen/include/asm-x86/p2m.h        |    8 +-
 xen/include/public/hvm/hvm_op.h  |   12 +++
 7 files changed, 296 insertions(+), 3 deletions(-)

-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH v1 1/2] x86: add p2m_ram_wp
  2014-07-28 17:55 [PATCH v1 0/2] Extend ioreq-server to support page write protection Wei Ye
  2014-07-28  8:24 ` Jan Beulich
@ 2014-07-28 17:55 ` Wei Ye
  2014-07-28  8:31   ` Jan Beulich
  2014-07-28 17:55 ` [PATCH v1 2/2] ioreq-server: Support scatter page forwarding Wei Ye
  2 siblings, 1 reply; 46+ messages in thread
From: Wei Ye @ 2014-07-28 17:55 UTC (permalink / raw)
  To: xen-devel
  Cc: keir, ian.campbell, stefano.stabellini, ian.jackson,
	Paul.Durrant, jbeulich, Wei Ye

Add a new p2m type p2m_ram_wp. Page of p2m_ram_wp is read only, and
write will go to the device model for emulation.

Signed-off-by: Wei Ye <wei.ye@intel.com>
---
 xen/arch/x86/hvm/hvm.c    |    8 +++++++-
 xen/arch/x86/mm/p2m-ept.c |    1 +
 xen/include/asm-x86/p2m.h |    8 +++++++-
 3 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 17ff011..0f20b62 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -2738,7 +2738,8 @@ int hvm_hap_nested_page_fault(paddr_t gpa,
      * If this GFN is emulated MMIO or marked as read-only, pass the fault
      * to the mmio handler.
      */
-    if ( (p2mt == p2m_mmio_dm) || 
+    if ( (p2mt == p2m_mmio_dm) ||
+         (p2mt == p2m_ram_wp) || 
          (access_w && (p2mt == p2m_ram_ro)) )
     {
         put_gfn(p2m->domain, gfn);
@@ -3829,6 +3830,11 @@ static enum hvm_copy_result __hvm_copy(
             put_page(page);
             return HVMCOPY_unhandleable;
         }
+        if ( p2m_is_wp_ram(p2mt) )
+        {
+            put_page(page);
+            return HVMCOPY_bad_gfn_to_mfn;
+        }
 
         p = (char *)__map_domain_page(page) + (addr & ~PAGE_MASK);
 
diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
index 15c6e83..1b4a83e 100644
--- a/xen/arch/x86/mm/p2m-ept.c
+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -125,6 +125,7 @@ static void ept_p2m_type_to_flags(ept_entry_t *entry, p2m_type_t type, p2m_acces
                                                     entry->mfn);
             break;
         case p2m_ram_logdirty:
+        case p2m_ram_wp:
         case p2m_ram_ro:
         case p2m_ram_shared:
             entry->r = entry->x = 1;
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index 0ddbadb..bbec847 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -71,6 +71,7 @@ typedef enum {
     p2m_ram_shared = 12,          /* Shared or sharable memory */
     p2m_ram_broken = 13,          /* Broken page, access cause domain crash */
     p2m_map_foreign  = 14,        /* ram pages from foreign domain */
+    p2m_ram_wp = 15,              /* Read-only; write go to the device model */
 } p2m_type_t;
 
 /*
@@ -116,7 +117,8 @@ typedef unsigned int p2m_query_t;
                        | p2m_to_mask(p2m_ram_paging_out)      \
                        | p2m_to_mask(p2m_ram_paged)           \
                        | p2m_to_mask(p2m_ram_paging_in)       \
-                       | p2m_to_mask(p2m_ram_shared))
+                       | p2m_to_mask(p2m_ram_shared)          \
+                       | p2m_to_mask(p2m_ram_wp))
 
 /* Types that represent a physmap hole that is ok to replace with a shared
  * entry */
@@ -167,6 +169,9 @@ typedef unsigned int p2m_query_t;
  * and must not be touched. */
 #define P2M_BROKEN_TYPES (p2m_to_mask(p2m_ram_broken))
 
+/* Write protection types */
+#define P2M_WP_TYPES (p2m_to_mask(p2m_ram_wp))
+
 /* Useful predicates */
 #define p2m_is_ram(_t) (p2m_to_mask(_t) & P2M_RAM_TYPES)
 #define p2m_is_hole(_t) (p2m_to_mask(_t) & P2M_HOLE_TYPES)
@@ -191,6 +196,7 @@ typedef unsigned int p2m_query_t;
 #define p2m_is_any_ram(_t)  (p2m_to_mask(_t) &                   \
                              (P2M_RAM_TYPES | P2M_GRANT_TYPES |  \
                               p2m_to_mask(p2m_map_foreign)))
+#define p2m_is_wp_ram(_t)   (p2m_to_mask(_t) & P2M_WP_TYPES)
 
 /* Per-p2m-table state */
 struct p2m_domain {
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v1 2/2] ioreq-server: Support scatter page forwarding
  2014-07-28 17:55 [PATCH v1 0/2] Extend ioreq-server to support page write protection Wei Ye
  2014-07-28  8:24 ` Jan Beulich
  2014-07-28 17:55 ` [PATCH v1 1/2] x86: add p2m_ram_wp Wei Ye
@ 2014-07-28 17:55 ` Wei Ye
  2014-07-28  8:57   ` Jan Beulich
  2 siblings, 1 reply; 46+ messages in thread
From: Wei Ye @ 2014-07-28 17:55 UTC (permalink / raw)
  To: xen-devel
  Cc: keir, ian.campbell, stefano.stabellini, ian.jackson,
	Paul.Durrant, jbeulich, Wei Ye

Extend the interface to support add/remove scatter page list to be
forwarded by a dedicated ioreq-server instance. Check and select
a right ioreq-server instance for forwarding the wirte operation
for a write protected page.

Signed-off-by: Wei Ye <wei.ye@intel.com>
---
 tools/libxc/xc_domain.c          |   32 ++++++
 tools/libxc/xenctrl.h            |   18 ++++
 xen/arch/x86/hvm/hvm.c           |  206 +++++++++++++++++++++++++++++++++++++-
 xen/include/asm-x86/hvm/domain.h |   14 +++
 xen/include/public/hvm/hvm_op.h  |   12 +++
 5 files changed, 281 insertions(+), 1 deletion(-)

diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index 37ed141..9afc843 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -1485,6 +1485,38 @@ int xc_hvm_unmap_pcidev_from_ioreq_server(xc_interface *xch, domid_t domid,
     return rc;
 }
 
+int xc_hvm_map_pages_to_ioreq_server(xc_interface *xch, domid_t domid,
+                                     ioservid_t id, uint16_t set,
+                                     uint16_t nr_pages, unsigned long *gpfn)
+{
+    DECLARE_HYPERCALL;
+    DECLARE_HYPERCALL_BUFFER(xen_hvm_map_pages_to_ioreq_server_t, arg);
+    int pg, rc = -1;
+
+    if ( arg == NULL )
+        return -1;
+    
+    if (nr_pages > MAX_MAP_BATCH_PAGES)
+        goto out;
+
+    hypercall.op     = __HYPERVISOR_hvm_op;
+    hypercall.arg[0] = HVMOP_map_pages_to_ioreq_server;
+    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
+
+    arg->domid = domid;
+    arg->id = id;
+    arg->set = set;
+    arg->nr_pages = nr_pages;
+    for ( pg = 0; pg < nr_pages; pg++ )
+        arg->page_list[pg] = gpfn[pg];
+
+    rc = do_xen_hypercall(xch, &hypercall);
+
+out:
+    xc_hypercall_buffer_free(xch, arg);
+    return rc;
+}
+
 int xc_hvm_destroy_ioreq_server(xc_interface *xch,
                                 domid_t domid,
                                 ioservid_t id)
diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h
index b55d857..e5e37b0 100644
--- a/tools/libxc/xenctrl.h
+++ b/tools/libxc/xenctrl.h
@@ -1943,6 +1943,24 @@ int xc_hvm_unmap_pcidev_from_ioreq_server(xc_interface *xch,
                                           uint8_t function);
 
 /**
+ * This function registers/deregisters a set of pages to be write protected.
+ *
+ * @parm xch a handle to an open hypervisor interface.
+ * @parm domid the domain id to be serviced
+ * @parm id the IOREQ Server id.
+ * @parm set whether registers or deregisters: 1 for register, 0 for deregister
+ * @parm nr_pages the number of pages
+ * @parm pgfn the guest page frame number list
+ * @return 0 on success, -1 on failure.
+ */
+int xc_hvm_map_pages_to_ioreq_server(xc_interface *xch,
+                                          domid_t domid,
+                                          ioservid_t id,
+                                          uint16_t set,
+                                          uint16_t nr_pages,
+                                          unsigned long *gpfn);
+
+/**
  * This function destroys an IOREQ Server.
  *
  * @parm xch a handle to an open hypervisor interface.
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 0f20b62..44e0640 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -744,6 +744,98 @@ static void hvm_ioreq_server_remove_all_vcpus(struct hvm_ioreq_server *s)
     spin_unlock(&s->lock);
 }
 
+static struct pglist_hash_table *hvm_ioreq_server_lookup_pglist_hash_table(
+                          struct pglist_hash_table *pglist_ht, unsigned long gpfn)
+{
+    int index = pglist_hash(gpfn);
+    struct pglist_hash_table *entry;
+
+    for ( entry = &pglist_ht[index]; entry != NULL; entry = entry->next )
+        if ( entry->gpfn != PGLIST_INVALID_GPFN && entry->gpfn == gpfn )
+            break;
+
+    return entry;
+}
+
+static void hvm_ioreq_server_free_hash_chain(struct pglist_hash_table *chain)
+{
+    struct pglist_hash_table *p = chain;
+    struct pglist_hash_table *n;
+
+    while (p)
+    {
+        n = p->next;
+        xfree(p);
+        p = n;
+    }
+}
+
+static void hvm_ioreq_server_free_pglist_hash(struct pglist_hash_table *pglist_ht)
+{
+    unsigned int i;
+
+    for ( i = 0; i < PGLIST_HASH_SIZE; i++ )
+        if ( pglist_ht[i].next != NULL )
+            hvm_ioreq_server_free_hash_chain(pglist_ht[i].next);
+}
+
+static int hvm_ioreq_server_pglist_hash_add(struct pglist_hash_table *pglist_ht,
+                                            unsigned long gpfn)
+{
+    int index = pglist_hash(gpfn);
+    struct pglist_hash_table *ne;
+
+    if ( hvm_ioreq_server_lookup_pglist_hash_table(pglist_ht, gpfn) != NULL )
+        return -EINVAL;
+
+    if ( pglist_ht[index].gpfn == PGLIST_INVALID_GPFN )
+        pglist_ht[index].gpfn = gpfn;
+    else
+    {
+        ne = xmalloc_bytes(sizeof(pglist_ht[0]));
+        if ( !ne )
+            return -ENOMEM;
+        ne->next = pglist_ht[index].next;
+        ne->gpfn = gpfn;
+        pglist_ht[index].next = ne;
+    }
+
+    return 0;
+}
+
+static int hvm_ioreq_server_pglist_hash_rem(struct pglist_hash_table *pglist_ht,
+                                            unsigned long gpfn)
+{
+    int index = pglist_hash(gpfn);
+    struct pglist_hash_table *next, *prev;
+
+    if ( pglist_ht[index].gpfn == gpfn )
+        pglist_ht[index].gpfn = PGLIST_INVALID_GPFN;
+    else
+    {
+        prev = &pglist_ht[index];
+        while ( 1 )
+        {
+            next = prev->next;
+            if ( !next )
+            {
+                printk("hvm_ioreq_server_pglist_hash_rem hash_table %p remove"
+                       " %lx not found\n", pglist_ht, gpfn);
+                return -EINVAL;
+            }
+            if ( next->gpfn == gpfn )
+            {
+                prev->next = next->next;
+                xfree(next);
+                break;
+            }
+            prev = next;
+         }
+    }
+
+    return 0;
+}
+
 static int hvm_ioreq_server_map_pages(struct hvm_ioreq_server *s,
                                       bool_t is_default, bool_t handle_bufioreq)
 {
@@ -948,7 +1040,14 @@ static int hvm_ioreq_server_init(struct hvm_ioreq_server *s, struct domain *d,
     spin_lock_init(&s->lock);
     INIT_LIST_HEAD(&s->ioreq_vcpu_list);
     spin_lock_init(&s->bufioreq_lock);
+    spin_lock_init(&s->pglist_hash_lock);
 
+    rc = -ENOMEM;
+    s->pglist_hash_base = xmalloc_bytes(PGLIST_HASH_ENTRY_SIZE * PGLIST_HASH_SIZE);
+    if ( s->pglist_hash_base == NULL )
+        goto fail1;
+    memset(s->pglist_hash_base, 0, PGLIST_HASH_ENTRY_SIZE * PGLIST_HASH_SIZE);
+ 
     rc = hvm_ioreq_server_alloc_rangesets(s, is_default);
     if ( rc )
         goto fail1;
@@ -1087,6 +1186,9 @@ static int hvm_destroy_ioreq_server(struct domain *d, ioservid_t id)
         
         hvm_ioreq_server_deinit(s, 0);
 
+        hvm_ioreq_server_free_pglist_hash(s->pglist_hash_base);
+        xfree(s->pglist_hash_base);
+
         domain_unpause(d);
 
         xfree(s);
@@ -1279,6 +1381,63 @@ static int hvm_set_ioreq_server_state(struct domain *d, ioservid_t id,
     return rc;
 }
 
+static int hvm_map_pages_to_ioreq_server(struct domain *d, ioservid_t id,
+                                        uint16_t set, uint16_t nr_pages,
+                                        unsigned long *gpfn)
+{
+    struct hvm_ioreq_server *s;
+    int rc;
+    unsigned int i;
+    p2m_type_t ot, nt;
+
+    if ( set )
+    {
+        ot = p2m_ram_rw;
+        nt = p2m_ram_wp;
+    }
+    else
+    {
+        ot = p2m_ram_wp;
+        nt = p2m_ram_rw;
+    }
+
+    spin_lock(&d->arch.hvm_domain.ioreq_server.lock);
+
+    rc = -ENOENT;
+    list_for_each_entry ( s,
+                          &d->arch.hvm_domain.ioreq_server.list,
+                          list_entry )
+    {
+        if ( s == d->arch.hvm_domain.default_ioreq_server )
+            continue;
+
+        if ( s->id == id )
+        {
+            spin_lock(&s->pglist_hash_lock);
+
+            for ( i = 0; i < nr_pages; i++ )
+            {
+                p2m_change_type_one(d, gpfn[i], ot, nt);
+                if ( set )
+                    rc = hvm_ioreq_server_pglist_hash_add(s->pglist_hash_base, gpfn[i]);
+                else
+                    rc = hvm_ioreq_server_pglist_hash_rem(s->pglist_hash_base, gpfn[i]);
+
+                 if ( rc != 0 )
+                     break;
+            }
+
+            spin_unlock(&s->pglist_hash_lock);
+            flush_tlb_mask(d->domain_dirty_cpumask);
+            break;
+        }
+    }
+
+    spin_unlock(&d->arch.hvm_domain.ioreq_server.lock);
+
+    return rc;
+}
+
 static int hvm_all_ioreq_servers_add_vcpu(struct domain *d, struct vcpu *v)
 {
     struct hvm_ioreq_server *s;
@@ -2294,6 +2453,8 @@ static struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
     uint32_t cf8;
     uint8_t type;
     uint64_t addr;
+    unsigned long gpfn;
+    struct pglist_hash_table *he;
 
     if ( list_empty(&d->arch.hvm_domain.ioreq_server.list) )
         return NULL;
@@ -2361,6 +2522,15 @@ static struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
             if ( rangeset_contains_range(r, addr, end) )
                 return s;
 
+            gpfn = addr >> PAGE_SHIFT;
+            spin_lock(&s->pglist_hash_lock);
+            he = hvm_ioreq_server_lookup_pglist_hash_table(s->pglist_hash_base, gpfn);
+            spin_unlock(&s->pglist_hash_lock);
+
+            if ( he ) {
+                return s;
+            }
+
             break;
         case IOREQ_TYPE_PCI_CONFIG:
             if ( rangeset_contains_singleton(r, addr >> 32) )
@@ -5314,6 +5484,35 @@ static int hvmop_set_ioreq_server_state(
     return rc;
 }
 
+static int hvmop_map_pages_to_ioreq_server(
+    XEN_GUEST_HANDLE_PARAM(xen_hvm_map_pages_to_ioreq_server_t) uop)
+{
+    xen_hvm_map_pages_to_ioreq_server_t op;
+    struct domain *d;
+    int rc;
+
+    if ( copy_from_guest(&op, uop, 1) )
+        return -EFAULT;
+
+    rc = rcu_lock_remote_domain_by_id(op.domid, &d);
+    if ( rc != 0 )
+        return rc;
+
+    rc = -EINVAL;
+    if ( !is_hvm_domain(d) )
+        goto out;
+
+    rc = xsm_hvm_ioreq_server(XSM_DM_PRIV, d, HVMOP_map_pages_to_ioreq_server);
+    if ( rc != 0 )
+        goto out;
+
+    rc = hvm_map_pages_to_ioreq_server(d, op.id, op.set, op.nr_pages, op.page_list);
+
+out:
+    rcu_unlock_domain(d);
+    return rc;
+}
+
 static int hvmop_destroy_ioreq_server(
     XEN_GUEST_HANDLE_PARAM(xen_hvm_destroy_ioreq_server_t) uop)
 {
@@ -5378,7 +5577,12 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
         rc = hvmop_set_ioreq_server_state(
             guest_handle_cast(arg, xen_hvm_set_ioreq_server_state_t));
         break;
-    
+
+    case HVMOP_map_pages_to_ioreq_server:
+        rc = hvmop_map_pages_to_ioreq_server(
+            guest_handle_cast(arg, xen_hvm_map_pages_to_ioreq_server_t));
+        break;
+ 
     case HVMOP_destroy_ioreq_server:
         rc = hvmop_destroy_ioreq_server(
             guest_handle_cast(arg, xen_hvm_destroy_ioreq_server_t));
diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
index 291a2e0..47172ca 100644
--- a/xen/include/asm-x86/hvm/domain.h
+++ b/xen/include/asm-x86/hvm/domain.h
@@ -48,6 +48,16 @@ struct hvm_ioreq_vcpu {
     evtchn_port_t    ioreq_evtchn;
 };
 
+struct pglist_hash_table {
+    struct pglist_hash_table *next;
+    unsigned long gpfn;
+};
+#define PGLIST_HASH_SIZE_SHIFT    8
+#define PGLIST_HASH_SIZE    (1 << PGLIST_HASH_SIZE_SHIFT)
+#define pglist_hash(x)      ((x) % PGLIST_HASH_SIZE)
+#define PGLIST_INVALID_GPFN       0
+#define PGLIST_HASH_ENTRY_SIZE    sizeof(struct pglist_hash_table)
+
 #define NR_IO_RANGE_TYPES (HVMOP_IO_RANGE_PCI + 1)
 #define MAX_NR_IO_RANGES  256
 
@@ -70,6 +80,10 @@ struct hvm_ioreq_server {
     evtchn_port_t          bufioreq_evtchn;
     struct rangeset        *range[NR_IO_RANGE_TYPES];
     bool_t                 enabled;
+
+    /* scatter page list need write protection */
+    struct pglist_hash_table   *pglist_hash_base;
+    spinlock_t             pglist_hash_lock;
 };
 
 struct hvm_domain {
diff --git a/xen/include/public/hvm/hvm_op.h b/xen/include/public/hvm/hvm_op.h
index eeb0a60..53a966b 100644
--- a/xen/include/public/hvm/hvm_op.h
+++ b/xen/include/public/hvm/hvm_op.h
@@ -367,6 +367,18 @@ struct xen_hvm_set_ioreq_server_state {
 typedef struct xen_hvm_set_ioreq_server_state xen_hvm_set_ioreq_server_state_t;
 DEFINE_XEN_GUEST_HANDLE(xen_hvm_set_ioreq_server_state_t);
 
+#define HVMOP_map_pages_to_ioreq_server 23
+#define MAX_MAP_BATCH_PAGES             128
+struct xen_hvm_map_pages_to_ioreq_server {
+    domid_t domid;   /* IN - domain to be serviced */
+    ioservid_t id;   /* IN - server id */
+    uint16_t set;    /* IN - 1: map pages, 0: unmap pages */
+    uint16_t nr_pages;  /* IN - page number */
+    unsigned long page_list[MAX_MAP_BATCH_PAGES]; /* IN - gpfn list */
+};
+typedef struct xen_hvm_map_pages_to_ioreq_server xen_hvm_map_pages_to_ioreq_server_t;
+DEFINE_XEN_GUEST_HANDLE(xen_hvm_map_pages_to_ioreq_server_t);
+
 #endif /* defined(__XEN__) || defined(__XEN_TOOLS__) */
 
 #endif /* __XEN_PUBLIC_HVM_HVM_OP_H__ */
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [PATCH v1 0/2] Extend ioreq-server to support page write protection
  2014-07-28  8:24 ` Jan Beulich
@ 2014-08-04  5:05   ` Ye, Wei
  2014-08-04  7:35     ` Jan Beulich
  0 siblings, 1 reply; 46+ messages in thread
From: Ye, Wei @ 2014-08-04  5:05 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tian, Kevin, keir, ian.campbell, stefano.stabellini, tim,
	ian.jackson, xen-devel, Paul.Durrant, Lv, Zhiyuan



> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Monday, July 28, 2014 4:25 PM
> To: Ye, Wei
> Cc: ian.campbell@citrix.com; Paul.Durrant@citrix.com;
> ian.jackson@eu.citrix.com; stefano.stabellini@eu.citrix.com; xen-
> devel@lists.xen.org; keir@xen.org
> Subject: Re: [PATCH v1 0/2] Extend ioreq-server to support page write
> protection
> 
> >>> On 28.07.14 at 19:55, <wei.ye@intel.com> wrote:
> > ioreq-server is proposed to forward PIO and MMIO request to multiple
> > device models according to the io range. XenGT (Intel Graphics
> > Virtualization technology, please refer to
> > https://01.org/xen/blogs/srclarkx/2013/graphics-virtualization-
> > xengt) driver reside in Dom0 as a virtual graphics device model also
> > need to trap and emulate the guest's write operation to some specific
> > memory pages, like the memory pages used by guest graphics driver as
> > PPGTT(per-process graphics translation table). We add an new p2m type
> > "p2m_ram_wp" to trap the page write operation. Page of this new p2m
> > type is read only and for write, the request will go to device model
> > via ioreq-server.
> 
> So how is this write-protection supposed to work on the IOMMU side when
> sharing page tables?

Thanks for pointing out this question. Write-protection is not supposed to work when sharing page tables between EPT and vt-d.
An explicit command line "iommu=no-sharept" should be setted for avoiding undesirable iommu fault.

> 
> Jan

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v1 1/2] x86: add p2m_ram_wp
  2014-07-28  8:31   ` Jan Beulich
@ 2014-08-04  5:10     ` Ye, Wei
  2014-08-04  7:37       ` Jan Beulich
  0 siblings, 1 reply; 46+ messages in thread
From: Ye, Wei @ 2014-08-04  5:10 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tian, Kevin, keir, ian.campbell, stefano.stabellini, Tim Deegan,
	ian.jackson, xen-devel, Paul.Durrant, Lv, Zhiyuan



> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Monday, July 28, 2014 4:32 PM
> To: Ye, Wei
> Cc: ian.campbell@citrix.com; Paul.Durrant@citrix.com;
> ian.jackson@eu.citrix.com; stefano.stabellini@eu.citrix.com; xen-
> devel@lists.xen.org; keir@xen.org; Tim Deegan
> Subject: Re: [PATCH v1 1/2] x86: add p2m_ram_wp
> 
> >>> On 28.07.14 at 19:55, <wei.ye@intel.com> wrote:
> >  xen/arch/x86/hvm/hvm.c    |    8 +++++++-
> >  xen/arch/x86/mm/p2m-ept.c |    1 +
> >  xen/include/asm-x86/p2m.h |    8 +++++++-
> >  3 files changed, 15 insertions(+), 2 deletions(-)
> 
> This can't be complete: At the very least p2m-pt.c also needs a change similar
> to the one to p2m-ept.c.

Yes, got it.

> 
> > --- a/xen/arch/x86/hvm/hvm.c
> > +++ b/xen/arch/x86/hvm/hvm.c
> > @@ -2738,7 +2738,8 @@ int hvm_hap_nested_page_fault(paddr_t gpa,
> >       * If this GFN is emulated MMIO or marked as read-only, pass the fault
> >       * to the mmio handler.
> >       */
> > -    if ( (p2mt == p2m_mmio_dm) ||
> > +    if ( (p2mt == p2m_mmio_dm) ||
> > +         (p2mt == p2m_ram_wp) ||
> >           (access_w && (p2mt == p2m_ram_ro)) )
> 
> Comparing your change with the surrounding existing code, you
> would pass even reads happening to fault (perhaps for another
> reason) to the DM, other than done for p2m_ram_ro. I don't think
> that's correct.
> 

Yes, should also access_w like the p2m_ram_ro.

> > @@ -3829,6 +3830,11 @@ static enum hvm_copy_result __hvm_copy(
> >              put_page(page);
> >              return HVMCOPY_unhandleable;
> >          }
> > +        if ( p2m_is_wp_ram(p2mt) )
> > +        {
> > +            put_page(page);
> > +            return HVMCOPY_bad_gfn_to_mfn;
> > +        }
> 
> And again this change can't be complete: __hvm_clear() would also
> need a similar change.
> 

Got it and thanks.

> > @@ -167,6 +169,9 @@ typedef unsigned int p2m_query_t;
> >   * and must not be touched. */
> >  #define P2M_BROKEN_TYPES (p2m_to_mask(p2m_ram_broken))
> >
> > +/* Write protection types */
> > +#define P2M_WP_TYPES (p2m_to_mask(p2m_ram_wp))
> > +
> >  /* Useful predicates */
> >  #define p2m_is_ram(_t) (p2m_to_mask(_t) & P2M_RAM_TYPES)
> >  #define p2m_is_hole(_t) (p2m_to_mask(_t) & P2M_HOLE_TYPES)
> > @@ -191,6 +196,7 @@ typedef unsigned int p2m_query_t;
> >  #define p2m_is_any_ram(_t)  (p2m_to_mask(_t) &                   \
> >                               (P2M_RAM_TYPES | P2M_GRANT_TYPES |  \
> >                                p2m_to_mask(p2m_map_foreign)))
> > +#define p2m_is_wp_ram(_t)   (p2m_to_mask(_t) & P2M_WP_TYPES)
> 
> To me such single-type classes don't seem very useful. Agreed
> there are a number of pre-existing ones, but for classes where
> future extensions are rather hard to imagine I wouldn't needlessly
> add classes right away. But Tim (whom you failed to Cc anyway)
> will have the final say here.
> 

How about using the existing p2m_mmio_dm, in that way, both read & write access will go to device model for emulation.

> Jan

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v1 2/2] ioreq-server: Support scatter page forwarding
  2014-07-28  8:57   ` Jan Beulich
@ 2014-08-04  5:41     ` Ye, Wei
  2014-08-04  7:47       ` Jan Beulich
  0 siblings, 1 reply; 46+ messages in thread
From: Ye, Wei @ 2014-08-04  5:41 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tian, Kevin, keir, ian.campbell, stefano.stabellini, Tim Deegan,
	ian.jackson, xen-devel, Paul.Durrant, Lv, Zhiyuan



> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Monday, July 28, 2014 4:58 PM
> To: Ye, Wei
> Cc: ian.campbell@citrix.com; Paul.Durrant@citrix.com;
> ian.jackson@eu.citrix.com; stefano.stabellini@eu.citrix.com; xen-
> devel@lists.xen.org; keir@xen.org
> Subject: Re: [PATCH v1 2/2] ioreq-server: Support scatter page forwarding
> 
> >>> On 28.07.14 at 19:55, <wei.ye@intel.com> wrote:
> > +static struct pglist_hash_table
> *hvm_ioreq_server_lookup_pglist_hash_table(
> > +                          struct pglist_hash_table *pglist_ht,
> > +unsigned long gpfn) {
> > +    int index = pglist_hash(gpfn);
> 
> Here and elsewhere: "unsigned int" for the hash index.
> 
> > +static void hvm_ioreq_server_free_hash_chain(struct pglist_hash_table
> > +*chain) {
> > +    struct pglist_hash_table *p = chain;
> > +    struct pglist_hash_table *n;
> > +
> > +    while (p)
> 
> Coding style.

Will improve, thanks.

> 
> > +static int hvm_ioreq_server_pglist_hash_add(struct pglist_hash_table
> *pglist_ht,
> > +                                            unsigned long gpfn) {
> > +    int index = pglist_hash(gpfn);
> > +    struct pglist_hash_table *ne;
> > +
> > +    if ( hvm_ioreq_server_lookup_pglist_hash_table(pglist_ht, gpfn) !=
> NULL )
> > +        return -EINVAL;
> > +
> > +    if ( pglist_ht[index].gpfn == PGLIST_INVALID_GPFN )
> > +        pglist_ht[index].gpfn = gpfn;
> > +    else
> > +    {
> > +        ne = xmalloc_bytes(sizeof(pglist_ht[0]));
> 
> xmalloc() please.

Ok.

> 
> > +static int hvm_ioreq_server_pglist_hash_rem(struct pglist_hash_table
> *pglist_ht,
> > +                                            unsigned long gpfn) {
> > +    int index = pglist_hash(gpfn);
> > +    struct pglist_hash_table *next, *prev;
> > +
> > +    if ( pglist_ht[index].gpfn == gpfn )
> > +        pglist_ht[index].gpfn = PGLIST_INVALID_GPFN;
> > +    else
> > +    {
> > +        prev = &pglist_ht[index];
> > +        while ( 1 )
> > +        {
> > +            next = prev->next;
> > +            if ( !next )
> > +            {
> > +                printk("hvm_ioreq_server_pglist_hash_rem hash_table %p
> remove"
> > +                       " %lx not found\n", pglist_ht, gpfn);
> 
> The value of this log message is questionable anyway, but there's no way for
> this to be acceptable without a suitably low guest level being set.

You mean should also print out the VM id in this log?

> 
> > @@ -948,7 +1040,14 @@ static int hvm_ioreq_server_init(struct
> > hvm_ioreq_server *s, struct domain *d,
> >      spin_lock_init(&s->lock);
> >      INIT_LIST_HEAD(&s->ioreq_vcpu_list);
> >      spin_lock_init(&s->bufioreq_lock);
> > +    spin_lock_init(&s->pglist_hash_lock);
> >
> > +    rc = -ENOMEM;
> > +    s->pglist_hash_base = xmalloc_bytes(PGLIST_HASH_ENTRY_SIZE *
> > + PGLIST_HASH_SIZE);
> 
> xmalloc_array() please
> 
> > +    if ( s->pglist_hash_base == NULL )
> > +        goto fail1;
> > +    memset(s->pglist_hash_base, 0, PGLIST_HASH_ENTRY_SIZE *
> > + PGLIST_HASH_SIZE);
> 
> And together with this, xzalloc_array().
> 

Ok.

> > +static int hvm_map_pages_to_ioreq_server(struct domain *d, ioservid_t
> id,
> > +                                        uint16_t set, uint16_t nr_pages,
> > +                                        unsigned long *gpfn) {
> > +    struct hvm_ioreq_server *s;
> > +    int rc;
> > +    unsigned int i;
> > +    p2m_type_t ot, nt;
> > +
> > +    if ( set )
> > +    {
> > +        ot = p2m_ram_rw;
> > +        nt = p2m_ram_wp;
> > +    }
> > +    else
> > +    {
> > +        ot = p2m_ram_wp;
> > +        nt = p2m_ram_rw;
> > +    }
> > +
> > +    spin_lock(&d->arch.hvm_domain.ioreq_server.lock);
> > +
> > +    rc = -ENOENT;
> > +    list_for_each_entry ( s,
> > +                          &d->arch.hvm_domain.ioreq_server.list,
> > +                          list_entry )
> > +    {
> > +        if ( s == d->arch.hvm_domain.default_ioreq_server )
> > +            continue;
> > +
> > +        if ( s->id == id )
> > +        {
> 
> Consistency please: Either both if()-s use "continue" (and perhaps get
> combined into one), or (less desirable imo) both get combined to use a {}
> body.

Ok.

> 
> > +            spin_lock(&s->pglist_hash_lock);
> > +
> > +            for ( i = 0; i < nr_pages; i++ )
> > +            {
> > +                p2m_change_type_one(d, gpfn[i], ot, nt);
> 
> Error handling?

Forget error handling, thanks for your reminding.

> 
> > +                if ( set )
> > +                    rc = hvm_ioreq_server_pglist_hash_add(s->pglist_hash_base,
> gpfn[i]);
> > +                else
> > +                    rc =
> > + hvm_ioreq_server_pglist_hash_rem(s->pglist_hash_base, gpfn[i]);
> > +
> > +                 if ( rc != 0 )
> > +                     break;
> > +            }
> > +
> > +            spin_unlock(&s->pglist_hash_lock);
> > +            flush_tlb_mask(d->domain_dirty_cpumask);
> 
> Why?

Will remove it.

> 
> > @@ -2294,6 +2453,8 @@ static struct hvm_ioreq_server
> *hvm_select_ioreq_server(struct domain *d,
> >      uint32_t cf8;
> >      uint8_t type;
> >      uint64_t addr;
> > +    unsigned long gpfn;
> > +    struct pglist_hash_table *he;
> 
> These ought to be moved down into a more narrow scope.
> 
> > @@ -2361,6 +2522,15 @@ static struct hvm_ioreq_server
> *hvm_select_ioreq_server(struct domain *d,
> >              if ( rangeset_contains_range(r, addr, end) )
> >                  return s;
> >
> > +            gpfn = addr >> PAGE_SHIFT;
> > +            spin_lock(&s->pglist_hash_lock);
> > +            he = hvm_ioreq_server_lookup_pglist_hash_table(s-
> >pglist_hash_base, gpfn);
> > +            spin_unlock(&s->pglist_hash_lock);
> > +
> > +            if ( he ) {
> 
> Coding style again.
> 
> > +                return s;
> > +            }
> > +
> 
Ok.


> And now the most fundamental question: Why is the existing (range set
> based) mechanism not enough? Is that because we limit the number of
> entries permitted on the range set? If so, the obvious question is - why is
> there no limit being enforced for your new type of pages?
> 

In our usage, we should write protect all the pages allocated by GFX driver using as Per-process graphics translation table, 
This kind of pages is much more than MAX_NR_IO_RANGES(256). So we need a scatter page list for containing all the pages.

> > --- a/xen/include/asm-x86/hvm/domain.h
> > +++ b/xen/include/asm-x86/hvm/domain.h
> > @@ -48,6 +48,16 @@ struct hvm_ioreq_vcpu {
> >      evtchn_port_t    ioreq_evtchn;
> >  };
> >
> > +struct pglist_hash_table {
> > +    struct pglist_hash_table *next;
> > +    unsigned long gpfn;
> > +};
> > +#define PGLIST_HASH_SIZE_SHIFT    8
> > +#define PGLIST_HASH_SIZE    (1 << PGLIST_HASH_SIZE_SHIFT)
> > +#define pglist_hash(x)      ((x) % PGLIST_HASH_SIZE)
> > +#define PGLIST_INVALID_GPFN       0
> > +#define PGLIST_HASH_ENTRY_SIZE    sizeof(struct pglist_hash_table)
> 
> None of this has any point in being in a header: There's only a single source
> file using these, so they should be private to that one.

Will move to hvm.c

> 
> > --- a/xen/include/public/hvm/hvm_op.h
> > +++ b/xen/include/public/hvm/hvm_op.h
> > @@ -367,6 +367,18 @@ struct xen_hvm_set_ioreq_server_state
> {  typedef
> > struct xen_hvm_set_ioreq_server_state
> > xen_hvm_set_ioreq_server_state_t;
> > DEFINE_XEN_GUEST_HANDLE(xen_hvm_set_ioreq_server_state_t);
> >
> > +#define HVMOP_map_pages_to_ioreq_server 23
> > +#define MAX_MAP_BATCH_PAGES             128
> 
> Too generic a name.

How about MAX_IOREQSVR_MAP_BATCH_PAGES ?

> 
> > +struct xen_hvm_map_pages_to_ioreq_server {
> > +    domid_t domid;   /* IN - domain to be serviced */
> > +    ioservid_t id;   /* IN - server id */
> > +    uint16_t set;    /* IN - 1: map pages, 0: unmap pages */
> > +    uint16_t nr_pages;  /* IN - page number */
> 
> "page count" or "number of pages".


Should be page count.

> 
> > +    unsigned long page_list[MAX_MAP_BATCH_PAGES]; /* IN - gpfn list
> > + */
> 
> No "unsigned long" in public interfaces. And no variable width types in
> interfaces not being subject to compat mode translation.
> 
I'll change to use uint64_t.


> Jan

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v1 0/2] Extend ioreq-server to support page write protection
  2014-08-04  5:05   ` Ye, Wei
@ 2014-08-04  7:35     ` Jan Beulich
  2014-08-04 21:34       ` Tian, Kevin
  0 siblings, 1 reply; 46+ messages in thread
From: Jan Beulich @ 2014-08-04  7:35 UTC (permalink / raw)
  To: Wei Ye
  Cc: Kevin Tian, keir, ian.campbell, stefano.stabellini, tim,
	ian.jackson, xen-devel, Paul.Durrant, Zhiyuan Lv

>>> On 04.08.14 at 07:05, <wei.ye@intel.com> wrote:

> 
>> -----Original Message-----
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: Monday, July 28, 2014 4:25 PM
>> To: Ye, Wei
>> Cc: ian.campbell@citrix.com; Paul.Durrant@citrix.com;
>> ian.jackson@eu.citrix.com; stefano.stabellini@eu.citrix.com; xen-
>> devel@lists.xen.org; keir@xen.org 
>> Subject: Re: [PATCH v1 0/2] Extend ioreq-server to support page write
>> protection
>> 
>> >>> On 28.07.14 at 19:55, <wei.ye@intel.com> wrote:
>> > ioreq-server is proposed to forward PIO and MMIO request to multiple
>> > device models according to the io range. XenGT (Intel Graphics
>> > Virtualization technology, please refer to
>> > https://01.org/xen/blogs/srclarkx/2013/graphics-virtualization- 
>> > xengt) driver reside in Dom0 as a virtual graphics device model also
>> > need to trap and emulate the guest's write operation to some specific
>> > memory pages, like the memory pages used by guest graphics driver as
>> > PPGTT(per-process graphics translation table). We add an new p2m type
>> > "p2m_ram_wp" to trap the page write operation. Page of this new p2m
>> > type is read only and for write, the request will go to device model
>> > via ioreq-server.
>> 
>> So how is this write-protection supposed to work on the IOMMU side when
>> sharing page tables?
> 
> Thanks for pointing out this question. Write-protection is not supposed to 
> work when sharing page tables between EPT and vt-d.
> An explicit command line "iommu=no-sharept" should be setted for avoiding 
> undesirable iommu fault.

Requiring the unconditional use of a specific command line option is
certainly fine for experimental code, but not beyond that. Behavior
should be correct by default in production code.

But what's worse here: The option _allows_ device side writes from
the guest. Why would device side writes be okay, but CPU side ones
not?

Jan

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v1 1/2] x86: add p2m_ram_wp
  2014-08-04  5:10     ` Ye, Wei
@ 2014-08-04  7:37       ` Jan Beulich
  2014-08-05  7:09         ` Ye, Wei
  0 siblings, 1 reply; 46+ messages in thread
From: Jan Beulich @ 2014-08-04  7:37 UTC (permalink / raw)
  To: Wei Ye
  Cc: Kevin Tian, keir, ian.campbell, stefano.stabellini, Tim Deegan,
	ian.jackson, xen-devel, Paul.Durrant, Zhiyuan Lv

>>> On 04.08.14 at 07:10, <wei.ye@intel.com> wrote:
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> >>> On 28.07.14 at 19:55, <wei.ye@intel.com> wrote:
>> > @@ -167,6 +169,9 @@ typedef unsigned int p2m_query_t;
>> >   * and must not be touched. */
>> >  #define P2M_BROKEN_TYPES (p2m_to_mask(p2m_ram_broken))
>> >
>> > +/* Write protection types */
>> > +#define P2M_WP_TYPES (p2m_to_mask(p2m_ram_wp))
>> > +
>> >  /* Useful predicates */
>> >  #define p2m_is_ram(_t) (p2m_to_mask(_t) & P2M_RAM_TYPES)
>> >  #define p2m_is_hole(_t) (p2m_to_mask(_t) & P2M_HOLE_TYPES)
>> > @@ -191,6 +196,7 @@ typedef unsigned int p2m_query_t;
>> >  #define p2m_is_any_ram(_t)  (p2m_to_mask(_t) &                   \
>> >                               (P2M_RAM_TYPES | P2M_GRANT_TYPES |  \
>> >                                p2m_to_mask(p2m_map_foreign)))
>> > +#define p2m_is_wp_ram(_t)   (p2m_to_mask(_t) & P2M_WP_TYPES)
>> 
>> To me such single-type classes don't seem very useful. Agreed
>> there are a number of pre-existing ones, but for classes where
>> future extensions are rather hard to imagine I wouldn't needlessly
>> add classes right away. But Tim (whom you failed to Cc anyway)
>> will have the final say here.
> 
> How about using the existing p2m_mmio_dm, in that way, both read & write 
> access will go to device model for emulation.

Perhaps you didn't read what I wrote the way I intended it: I'm
not opposed to the new P2M type. What I dislike is the single type
class that you introduce. To check for that in the code, you could
as well use direct comparisons with p2m_ram_wp.

Jan

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v1 2/2] ioreq-server: Support scatter page forwarding
  2014-08-04  5:41     ` Ye, Wei
@ 2014-08-04  7:47       ` Jan Beulich
  2014-08-04 21:39         ` Tian, Kevin
  0 siblings, 1 reply; 46+ messages in thread
From: Jan Beulich @ 2014-08-04  7:47 UTC (permalink / raw)
  To: Wei Ye
  Cc: Kevin Tian, keir, ian.campbell, stefano.stabellini, Tim Deegan,
	ian.jackson, xen-devel, Paul.Durrant, Zhiyuan Lv

>>> On 04.08.14 at 07:41, <wei.ye@intel.com> wrote:
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> >>> On 28.07.14 at 19:55, <wei.ye@intel.com> wrote:
>> > +static int hvm_ioreq_server_pglist_hash_rem(struct pglist_hash_table
>> *pglist_ht,
>> > +                                            unsigned long gpfn) {
>> > +    int index = pglist_hash(gpfn);
>> > +    struct pglist_hash_table *next, *prev;
>> > +
>> > +    if ( pglist_ht[index].gpfn == gpfn )
>> > +        pglist_ht[index].gpfn = PGLIST_INVALID_GPFN;
>> > +    else
>> > +    {
>> > +        prev = &pglist_ht[index];
>> > +        while ( 1 )
>> > +        {
>> > +            next = prev->next;
>> > +            if ( !next )
>> > +            {
>> > +                printk("hvm_ioreq_server_pglist_hash_rem hash_table %p
>> remove"
>> > +                       " %lx not found\n", pglist_ht, gpfn);
>> 
>> The value of this log message is questionable anyway, but there's no way for
>> this to be acceptable without a suitably low guest level being set.
> 
> You mean should also print out the VM id in this log?

That's a secondary thing: Of course you need to ask yourself what
use you could make of the message if you see it in some log, including
when there was more than one VM running. But my point was that this
lacks a suitable XENLOG_G_* prefix, thus representing a security issue
(guest being able to flood the hypervisor log).

>> And now the most fundamental question: Why is the existing (range set
>> based) mechanism not enough? Is that because we limit the number of
>> entries permitted on the range set? If so, the obvious question is - why is
>> there no limit being enforced for your new type of pages?
>> 
> 
> In our usage, we should write protect all the pages allocated by GFX driver 
> using as Per-process graphics translation table, 
> This kind of pages is much more than MAX_NR_IO_RANGES(256). So we need a 
> scatter page list for containing all the pages.

So am I reading this right that you could do with the existing
mechanism if the limit was higher? Together with you not answering
the question about the lack of an enforced limit in your new
mechanism, this clearly tells me that you should use the existing one,
raising the limit and - if necessary - dealing with the (security) fallout.

>> > --- a/xen/include/public/hvm/hvm_op.h
>> > +++ b/xen/include/public/hvm/hvm_op.h
>> > @@ -367,6 +367,18 @@ struct xen_hvm_set_ioreq_server_state
>> {  typedef
>> > struct xen_hvm_set_ioreq_server_state
>> > xen_hvm_set_ioreq_server_state_t;
>> > DEFINE_XEN_GUEST_HANDLE(xen_hvm_set_ioreq_server_state_t);
>> >
>> > +#define HVMOP_map_pages_to_ioreq_server 23
>> > +#define MAX_MAP_BATCH_PAGES             128
>> 
>> Too generic a name.
> 
> How about MAX_IOREQSVR_MAP_BATCH_PAGES ?

Butter, but still insufficient. XEN_IOREQSRV_MAX_MAP_BATCH_PAGES
would go in the direction of being acceptable in a public header. But
with the comments earlier on I suppose this will go away anyway.

Jan

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v1 0/2] Extend ioreq-server to support page write protection
  2014-08-04  7:35     ` Jan Beulich
@ 2014-08-04 21:34       ` Tian, Kevin
  2014-08-05  6:35         ` Jan Beulich
  2014-08-05  7:35         ` Zhang, Yang Z
  0 siblings, 2 replies; 46+ messages in thread
From: Tian, Kevin @ 2014-08-04 21:34 UTC (permalink / raw)
  To: Jan Beulich, Ye, Wei
  Cc: keir, ian.campbell, stefano.stabellini, tim, ian.jackson,
	xen-devel, Paul.Durrant, Lv, Zhiyuan

> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Monday, August 04, 2014 12:35 AM
> 
> >>> On 04.08.14 at 07:05, <wei.ye@intel.com> wrote:
> 
> >
> >> -----Original Message-----
> >> From: Jan Beulich [mailto:JBeulich@suse.com]
> >> Sent: Monday, July 28, 2014 4:25 PM
> >> To: Ye, Wei
> >> Cc: ian.campbell@citrix.com; Paul.Durrant@citrix.com;
> >> ian.jackson@eu.citrix.com; stefano.stabellini@eu.citrix.com; xen-
> >> devel@lists.xen.org; keir@xen.org
> >> Subject: Re: [PATCH v1 0/2] Extend ioreq-server to support page write
> >> protection
> >>
> >> >>> On 28.07.14 at 19:55, <wei.ye@intel.com> wrote:
> >> > ioreq-server is proposed to forward PIO and MMIO request to multiple
> >> > device models according to the io range. XenGT (Intel Graphics
> >> > Virtualization technology, please refer to
> >> > https://01.org/xen/blogs/srclarkx/2013/graphics-virtualization-
> >> > xengt) driver reside in Dom0 as a virtual graphics device model also
> >> > need to trap and emulate the guest's write operation to some specific
> >> > memory pages, like the memory pages used by guest graphics driver as
> >> > PPGTT(per-process graphics translation table). We add an new p2m type
> >> > "p2m_ram_wp" to trap the page write operation. Page of this new p2m
> >> > type is read only and for write, the request will go to device model
> >> > via ioreq-server.
> >>
> >> So how is this write-protection supposed to work on the IOMMU side when
> >> sharing page tables?
> >
> > Thanks for pointing out this question. Write-protection is not supposed to
> > work when sharing page tables between EPT and vt-d.
> > An explicit command line "iommu=no-sharept" should be setted for avoiding
> > undesirable iommu fault.
> 
> Requiring the unconditional use of a specific command line option is
> certainly fine for experimental code, but not beyond that. Behavior
> should be correct by default in production code.
> 
> But what's worse here: The option _allows_ device side writes from
> the guest. Why would device side writes be okay, but CPU side ones
> not?
> 

right, whether ept is shared or not doesn't address the concern here. 
In both cases we need maintain the same p2m view between CPU and 
device side, otherwise it's broken...

One option is to treat wp similar to logdirty, i.e. exclusive to VT-d
device assignment, until in the future VT-d supports page fault. We
can provide a boot option to override the default policy if user thinks
OK.

2nd option, like Wei mentioned in another mail, is to treat such write
protected PPGTT page tables as MMIO. new hypercall is required to 
change the p2m type between p2m_io and p2m_ram, based on 
allocation/free of guest page table. This way may impact performance
on read though.

Comments?

Thanks
Kevin

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v1 2/2] ioreq-server: Support scatter page forwarding
  2014-08-04  7:47       ` Jan Beulich
@ 2014-08-04 21:39         ` Tian, Kevin
  2014-08-05  6:38           ` Jan Beulich
  0 siblings, 1 reply; 46+ messages in thread
From: Tian, Kevin @ 2014-08-04 21:39 UTC (permalink / raw)
  To: Jan Beulich, Ye, Wei
  Cc: keir, ian.campbell, stefano.stabellini, Tim Deegan, ian.jackson,
	xen-devel, Paul.Durrant, Lv, Zhiyuan

> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Monday, August 04, 2014 12:47 AM
> 
> >>> On 04.08.14 at 07:41, <wei.ye@intel.com> wrote:
> >> From: Jan Beulich [mailto:JBeulich@suse.com]
> >> >>> On 28.07.14 at 19:55, <wei.ye@intel.com> wrote:
> >> > +static int hvm_ioreq_server_pglist_hash_rem(struct pglist_hash_table
> >> *pglist_ht,
> >> > +                                            unsigned long
> gpfn) {
> >> > +    int index = pglist_hash(gpfn);
> >> > +    struct pglist_hash_table *next, *prev;
> >> > +
> >> > +    if ( pglist_ht[index].gpfn == gpfn )
> >> > +        pglist_ht[index].gpfn = PGLIST_INVALID_GPFN;
> >> > +    else
> >> > +    {
> >> > +        prev = &pglist_ht[index];
> >> > +        while ( 1 )
> >> > +        {
> >> > +            next = prev->next;
> >> > +            if ( !next )
> >> > +            {
> >> > +                printk("hvm_ioreq_server_pglist_hash_rem
> hash_table %p
> >> remove"
> >> > +                       " %lx not found\n", pglist_ht, gpfn);
> >>
> >> The value of this log message is questionable anyway, but there's no way
> for
> >> this to be acceptable without a suitably low guest level being set.
> >
> > You mean should also print out the VM id in this log?
> 
> That's a secondary thing: Of course you need to ask yourself what
> use you could make of the message if you see it in some log, including
> when there was more than one VM running. But my point was that this
> lacks a suitable XENLOG_G_* prefix, thus representing a security issue
> (guest being able to flood the hypervisor log).
> 
> >> And now the most fundamental question: Why is the existing (range set
> >> based) mechanism not enough? Is that because we limit the number of
> >> entries permitted on the range set? If so, the obvious question is - why is
> >> there no limit being enforced for your new type of pages?
> >>
> >
> > In our usage, we should write protect all the pages allocated by GFX driver
> > using as Per-process graphics translation table,
> > This kind of pages is much more than MAX_NR_IO_RANGES(256). So we
> need a
> > scatter page list for containing all the pages.
> 
> So am I reading this right that you could do with the existing
> mechanism if the limit was higher? Together with you not answering
> the question about the lack of an enforced limit in your new
> mechanism, this clearly tells me that you should use the existing one,
> raising the limit and - if necessary - dealing with the (security) fallout.
> 

essentially those pages are not contiguous and dynamic, and thus a fixed 
range based structure even with a higher limitation doesn't fit here. Just
think about a structure similar to what we have done to maintain shadow
pfn is required here. A hash list is more flexible to serve such purpose.

Thanks
Kevin

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v1 0/2] Extend ioreq-server to support page write protection
  2014-08-04 21:34       ` Tian, Kevin
@ 2014-08-05  6:35         ` Jan Beulich
  2014-08-05  6:46           ` Ye, Wei
  2014-08-05  7:35         ` Zhang, Yang Z
  1 sibling, 1 reply; 46+ messages in thread
From: Jan Beulich @ 2014-08-05  6:35 UTC (permalink / raw)
  To: Kevin Tian
  Cc: keir, ian.campbell, stefano.stabellini, tim, ian.jackson,
	xen-devel, Paul.Durrant, Zhiyuan Lv, Wei Ye

>>> On 04.08.14 at 23:34, <kevin.tian@intel.com> wrote:
> 2nd option, like Wei mentioned in another mail, is to treat such write
> protected PPGTT page tables as MMIO. new hypercall is required to 
> change the p2m type between p2m_io and p2m_ram, based on 
> allocation/free of guest page table. This way may impact performance
> on read though.

Depends on whether you mean mmio_dm or mmio_direct here.

Jan

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v1 2/2] ioreq-server: Support scatter page forwarding
  2014-08-04 21:39         ` Tian, Kevin
@ 2014-08-05  6:38           ` Jan Beulich
  0 siblings, 0 replies; 46+ messages in thread
From: Jan Beulich @ 2014-08-05  6:38 UTC (permalink / raw)
  To: Kevin Tian
  Cc: keir, ian.campbell, stefano.stabellini, Tim Deegan, ian.jackson,
	xen-devel, Paul.Durrant, Zhiyuan Lv, Wei Ye

>>> On 04.08.14 at 23:39, <kevin.tian@intel.com> wrote:
>>  From: Jan Beulich [mailto:JBeulich@suse.com]
>> So am I reading this right that you could do with the existing
>> mechanism if the limit was higher? Together with you not answering
>> the question about the lack of an enforced limit in your new
>> mechanism, this clearly tells me that you should use the existing one,
>> raising the limit and - if necessary - dealing with the (security) fallout.
>> 
> 
> essentially those pages are not contiguous and dynamic, and thus a fixed 
> range based structure even with a higher limitation doesn't fit here. Just
> think about a structure similar to what we have done to maintain shadow
> pfn is required here. A hash list is more flexible to serve such purpose.

I don't follow - range sets aren't a "fixed range based structure" to
me. All we impose is a limit on the number of ranges you may have
in place to avoid unbounded amounts of memory being tied up by a
domain, or unbounded amounts of time are spent in accessing that
tree. Obviously the same would need to be done for any new
mechanism, so a new mechanism is only first choice if it provides
other benefits over the original one.

Jan

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v1 0/2] Extend ioreq-server to support page write protection
  2014-08-05  6:35         ` Jan Beulich
@ 2014-08-05  6:46           ` Ye, Wei
  2014-08-05  7:51             ` Jan Beulich
  0 siblings, 1 reply; 46+ messages in thread
From: Ye, Wei @ 2014-08-05  6:46 UTC (permalink / raw)
  To: Jan Beulich, Tian, Kevin
  Cc: keir, ian.campbell, stefano.stabellini, tim, ian.jackson,
	xen-devel, Paul.Durrant, Lv, Zhiyuan

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Tuesday, August 5, 2014 2:35 PM
> To: Tian, Kevin
> Cc: ian.campbell@citrix.com; Paul.Durrant@citrix.com;
> ian.jackson@eu.citrix.com; stefano.stabellini@eu.citrix.com; Ye, Wei; Lv,
> Zhiyuan; xen-devel@lists.xen.org; keir@xen.org; tim@xen.org
> Subject: RE: [PATCH v1 0/2] Extend ioreq-server to support page write
> protection
> 
> >>> On 04.08.14 at 23:34, <kevin.tian@intel.com> wrote:
> > 2nd option, like Wei mentioned in another mail, is to treat such write
> > protected PPGTT page tables as MMIO. new hypercall is required to
> > change the p2m type between p2m_io and p2m_ram, based on
> > allocation/free of guest page table. This way may impact performance
> > on read though.
> 
> Depends on whether you mean mmio_dm or mmio_direct here.
> 
> Jan

It's p2m_mmio_dm.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v1 1/2] x86: add p2m_ram_wp
  2014-08-04  7:37       ` Jan Beulich
@ 2014-08-05  7:09         ` Ye, Wei
  0 siblings, 0 replies; 46+ messages in thread
From: Ye, Wei @ 2014-08-05  7:09 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tian, Kevin, keir, ian.campbell, stefano.stabellini, Tim Deegan,
	ian.jackson, xen-devel, Paul.Durrant, Lv, Zhiyuan



> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Monday, August 4, 2014 3:38 PM
> To: Ye, Wei
> Cc: ian.campbell@citrix.com; Paul.Durrant@citrix.com;
> ian.jackson@eu.citrix.com; stefano.stabellini@eu.citrix.com; Tian, Kevin; Lv,
> Zhiyuan; xen-devel@lists.xen.org; keir@xen.org; Tim Deegan
> Subject: RE: [PATCH v1 1/2] x86: add p2m_ram_wp
> 
> >>> On 04.08.14 at 07:10, <wei.ye@intel.com> wrote:
> >> From: Jan Beulich [mailto:JBeulich@suse.com]
> >> >>> On 28.07.14 at 19:55, <wei.ye@intel.com> wrote:
> >> > @@ -167,6 +169,9 @@ typedef unsigned int p2m_query_t;
> >> >   * and must not be touched. */
> >> >  #define P2M_BROKEN_TYPES (p2m_to_mask(p2m_ram_broken))
> >> >
> >> > +/* Write protection types */
> >> > +#define P2M_WP_TYPES (p2m_to_mask(p2m_ram_wp))
> >> > +
> >> >  /* Useful predicates */
> >> >  #define p2m_is_ram(_t) (p2m_to_mask(_t) & P2M_RAM_TYPES)
> #define
> >> > p2m_is_hole(_t) (p2m_to_mask(_t) & P2M_HOLE_TYPES) @@ -191,6
> +196,7
> >> > @@ typedef unsigned int p2m_query_t;
> >> >  #define p2m_is_any_ram(_t)  (p2m_to_mask(_t) &                   \
> >> >                               (P2M_RAM_TYPES | P2M_GRANT_TYPES |  \
> >> >                                p2m_to_mask(p2m_map_foreign)))
> >> > +#define p2m_is_wp_ram(_t)   (p2m_to_mask(_t) & P2M_WP_TYPES)
> >>
> >> To me such single-type classes don't seem very useful. Agreed there
> >> are a number of pre-existing ones, but for classes where future
> >> extensions are rather hard to imagine I wouldn't needlessly add
> >> classes right away. But Tim (whom you failed to Cc anyway) will have
> >> the final say here.
> >
> > How about using the existing p2m_mmio_dm, in that way, both read &
> > write access will go to device model for emulation.
> 
> Perhaps you didn't read what I wrote the way I intended it: I'm not opposed
> to the new P2M type. What I dislike is the single type class that you introduce.
> To check for that in the code, you could as well use direct comparisons with
> p2m_ram_wp.

Sorry for my misunderstanding. Agree that this single class may be unnecessary. So I'll
remove this class type and compare it directly with p2m_ram_wp. But another option
maybe we can remove this new introduced p2m type, since it doesn't work well when page
table sharing with IOMMU case. And just using the existing p2m type "p2m_mmio_dm".

> 
> Jan

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v1 0/2] Extend ioreq-server to support page write protection
  2014-08-04 21:34       ` Tian, Kevin
  2014-08-05  6:35         ` Jan Beulich
@ 2014-08-05  7:35         ` Zhang, Yang Z
  2014-08-05  7:51           ` Jan Beulich
  1 sibling, 1 reply; 46+ messages in thread
From: Zhang, Yang Z @ 2014-08-05  7:35 UTC (permalink / raw)
  To: Tian, Kevin, Jan Beulich, Ye, Wei
  Cc: keir, ian.campbell, stefano.stabellini, tim, ian.jackson,
	xen-devel, Paul.Durrant, Lv, Zhiyuan

Tian, Kevin wrote on 2014-08-05:
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: Monday, August 04, 2014 12:35 AM
>> 
>>>>> On 04.08.14 at 07:05, <wei.ye@intel.com> wrote:
>> 
>>> 
>>> Jan Beulich wrote on 2014-07-28:
>>>> devel@lists.xen.org; keir@xen.org
>>>> Subject: Re: [PATCH v1 0/2] Extend ioreq-server to support page
>>>> write protection
>>>> 
>>>>>>> On 28.07.14 at 19:55, <wei.ye@intel.com> wrote:
>>>>> ioreq-server is proposed to forward PIO and MMIO request to
>>>>> multiple device models according to the io range. XenGT (Intel
>>>>> Graphics Virtualization technology, please refer to
>>>>> https://01.org/xen/blogs/srclarkx/2013/graphics-virtualization-
>>>>> xengt) driver reside in Dom0 as a virtual graphics device model
>>>>> also need to trap and emulate the guest's write operation to
>>>>> some specific memory pages, like the memory pages used by guest
>>>>> graphics driver as PPGTT(per-process graphics translation table).
>>>>> We add an new p2m type "p2m_ram_wp" to trap the page write
>>>>> operation. Page of this new p2m type is read only and for
>>>>> write, the request will go to device model via ioreq-server.
>>>> 
>>>> So how is this write-protection supposed to work on the IOMMU
>>>> side when sharing page tables?
>>> 
>>> Thanks for pointing out this question. Write-protection is not
>>> supposed to work when sharing page tables between EPT and vt-d.
>>> An explicit command line "iommu=no-sharept" should be setted for
>>> avoiding undesirable iommu fault.
>> 
>> Requiring the unconditional use of a specific command line option is
>> certainly fine for experimental code, but not beyond that. Behavior
>> should be correct by default in production code.
>> 
>> But what's worse here: The option _allows_ device side writes from
>> the guest. Why would device side writes be okay, but CPU side ones not?
>> 
> 
> right, whether ept is shared or not doesn't address the concern here.
> In both cases we need maintain the same p2m view between CPU and
> device side, otherwise it's broken...
> 
> One option is to treat wp similar to logdirty, i.e. exclusive to VT-d
> device assignment, until in the future VT-d supports page fault. We
> can provide a boot option to override the default policy if user
> thinks OK.
> 
> 2nd option, like Wei mentioned in another mail, is to treat such write
> protected PPGTT page tables as MMIO. new hypercall is required to
> change the p2m type between p2m_io and p2m_ram, based on
> allocation/free of guest page table. This way may impact performance
> on read though.
> 
> Comments?

Another solution is using the EPT misconfiguration mechanism like what Xen does for MTRR emulation currently.

> 
> Thanks
> Kevin
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel


Best regards,
Yang

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v1 0/2] Extend ioreq-server to support page write protection
  2014-08-05  6:46           ` Ye, Wei
@ 2014-08-05  7:51             ` Jan Beulich
  0 siblings, 0 replies; 46+ messages in thread
From: Jan Beulich @ 2014-08-05  7:51 UTC (permalink / raw)
  To: Kevin Tian, Wei Ye
  Cc: keir, ian.campbell, stefano.stabellini, tim, ian.jackson,
	xen-devel, Paul.Durrant, Zhiyuan Lv

>>> On 05.08.14 at 08:46, <wei.ye@intel.com> wrote:
>>  -----Original Message-----
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: Tuesday, August 5, 2014 2:35 PM
>> To: Tian, Kevin
>> Cc: ian.campbell@citrix.com; Paul.Durrant@citrix.com;
>> ian.jackson@eu.citrix.com; stefano.stabellini@eu.citrix.com; Ye, Wei; Lv,
>> Zhiyuan; xen-devel@lists.xen.org; keir@xen.org; tim@xen.org 
>> Subject: RE: [PATCH v1 0/2] Extend ioreq-server to support page write
>> protection
>> 
>> >>> On 04.08.14 at 23:34, <kevin.tian@intel.com> wrote:
>> > 2nd option, like Wei mentioned in another mail, is to treat such write
>> > protected PPGTT page tables as MMIO. new hypercall is required to
>> > change the p2m type between p2m_io and p2m_ram, based on
>> > allocation/free of guest page table. This way may impact performance
>> > on read though.
>> 
>> Depends on whether you mean mmio_dm or mmio_direct here.
> 
> It's p2m_mmio_dm.

In which case - how would this help? You'd still not be able to deal
with device side accesses. (And as a side note - a hypercall to
convert between p2m_ram_rw and p2m_mmio_dm already exists:
HVMOP_set_mem_type.)

Jan

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v1 0/2] Extend ioreq-server to support page write protection
  2014-08-05  7:35         ` Zhang, Yang Z
@ 2014-08-05  7:51           ` Jan Beulich
  2014-08-05  8:20             ` Ye, Wei
  2014-08-05 15:41             ` Tian, Kevin
  0 siblings, 2 replies; 46+ messages in thread
From: Jan Beulich @ 2014-08-05  7:51 UTC (permalink / raw)
  To: Kevin Tian, Wei Ye, Yang Z Zhang
  Cc: keir, ian.campbell, stefano.stabellini, tim, ian.jackson,
	xen-devel, Paul.Durrant, Zhiyuan Lv

>>> On 05.08.14 at 09:35, <yang.z.zhang@intel.com> wrote:
> Tian, Kevin wrote on 2014-08-05:
>>> From: Jan Beulich [mailto:JBeulich@suse.com]
>>> Sent: Monday, August 04, 2014 12:35 AM
>>> 
>>>>>> On 04.08.14 at 07:05, <wei.ye@intel.com> wrote:
>>> 
>>>> 
>>>> Jan Beulich wrote on 2014-07-28:
>>>>> devel@lists.xen.org; keir@xen.org 
>>>>> Subject: Re: [PATCH v1 0/2] Extend ioreq-server to support page
>>>>> write protection
>>>>> 
>>>>>>>> On 28.07.14 at 19:55, <wei.ye@intel.com> wrote:
>>>>>> ioreq-server is proposed to forward PIO and MMIO request to
>>>>>> multiple device models according to the io range. XenGT (Intel
>>>>>> Graphics Virtualization technology, please refer to
>>>>>> https://01.org/xen/blogs/srclarkx/2013/graphics-virtualization- 
>>>>>> xengt) driver reside in Dom0 as a virtual graphics device model
>>>>>> also need to trap and emulate the guest's write operation to
>>>>>> some specific memory pages, like the memory pages used by guest
>>>>>> graphics driver as PPGTT(per-process graphics translation table).
>>>>>> We add an new p2m type "p2m_ram_wp" to trap the page write
>>>>>> operation. Page of this new p2m type is read only and for
>>>>>> write, the request will go to device model via ioreq-server.
>>>>> 
>>>>> So how is this write-protection supposed to work on the IOMMU
>>>>> side when sharing page tables?
>>>> 
>>>> Thanks for pointing out this question. Write-protection is not
>>>> supposed to work when sharing page tables between EPT and vt-d.
>>>> An explicit command line "iommu=no-sharept" should be setted for
>>>> avoiding undesirable iommu fault.
>>> 
>>> Requiring the unconditional use of a specific command line option is
>>> certainly fine for experimental code, but not beyond that. Behavior
>>> should be correct by default in production code.
>>> 
>>> But what's worse here: The option _allows_ device side writes from
>>> the guest. Why would device side writes be okay, but CPU side ones not?
>>> 
>> 
>> right, whether ept is shared or not doesn't address the concern here.
>> In both cases we need maintain the same p2m view between CPU and
>> device side, otherwise it's broken...
>> 
>> One option is to treat wp similar to logdirty, i.e. exclusive to VT-d
>> device assignment, until in the future VT-d supports page fault. We
>> can provide a boot option to override the default policy if user
>> thinks OK.
>> 
>> 2nd option, like Wei mentioned in another mail, is to treat such write
>> protected PPGTT page tables as MMIO. new hypercall is required to
>> change the p2m type between p2m_io and p2m_ram, based on
>> allocation/free of guest page table. This way may impact performance
>> on read though.
>> 
>> Comments?
> 
> Another solution is using the EPT misconfiguration mechanism like what Xen 
> does for MTRR emulation currently.

That would cause faults on reads as well, making it necessary to
emulate them.

Jan

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v1 0/2] Extend ioreq-server to support page write protection
  2014-08-05  7:51           ` Jan Beulich
@ 2014-08-05  8:20             ` Ye, Wei
  2014-08-05 15:41             ` Tian, Kevin
  1 sibling, 0 replies; 46+ messages in thread
From: Ye, Wei @ 2014-08-05  8:20 UTC (permalink / raw)
  To: Jan Beulich, Tian, Kevin, Zhang, Yang Z
  Cc: keir, ian.campbell, stefano.stabellini, tim, ian.jackson,
	xen-devel, Paul.Durrant, Lv, Zhiyuan



> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Tuesday, August 5, 2014 3:52 PM
> To: Tian, Kevin; Ye, Wei; Zhang, Yang Z
> Cc: ian.campbell@citrix.com; Paul.Durrant@citrix.com;
> ian.jackson@eu.citrix.com; stefano.stabellini@eu.citrix.com; Lv, Zhiyuan;
> xen-devel@lists.xen.org; keir@xen.org; tim@xen.org
> Subject: RE: [Xen-devel] [PATCH v1 0/2] Extend ioreq-server to support page
> write protection
> 
> >>> On 05.08.14 at 09:35, <yang.z.zhang@intel.com> wrote:
> > Tian, Kevin wrote on 2014-08-05:
> >>> From: Jan Beulich [mailto:JBeulich@suse.com]
> >>> Sent: Monday, August 04, 2014 12:35 AM
> >>>
> >>>>>> On 04.08.14 at 07:05, <wei.ye@intel.com> wrote:
> >>>
> >>>>
> >>>> Jan Beulich wrote on 2014-07-28:
> >>>>> devel@lists.xen.org; keir@xen.org
> >>>>> Subject: Re: [PATCH v1 0/2] Extend ioreq-server to support page
> >>>>> write protection
> >>>>>
> >>>>>>>> On 28.07.14 at 19:55, <wei.ye@intel.com> wrote:
> >>>>>> ioreq-server is proposed to forward PIO and MMIO request to
> >>>>>> multiple device models according to the io range. XenGT (Intel
> >>>>>> Graphics Virtualization technology, please refer to
> >>>>>> https://01.org/xen/blogs/srclarkx/2013/graphics-virtualization-
> >>>>>> xengt) driver reside in Dom0 as a virtual graphics device model
> >>>>>> also need to trap and emulate the guest's write operation to some
> >>>>>> specific memory pages, like the memory pages used by guest
> >>>>>> graphics driver as PPGTT(per-process graphics translation table).
> >>>>>> We add an new p2m type "p2m_ram_wp" to trap the page write
> >>>>>> operation. Page of this new p2m type is read only and for write,
> >>>>>> the request will go to device model via ioreq-server.
> >>>>>
> >>>>> So how is this write-protection supposed to work on the IOMMU side
> >>>>> when sharing page tables?
> >>>>
> >>>> Thanks for pointing out this question. Write-protection is not
> >>>> supposed to work when sharing page tables between EPT and vt-d.
> >>>> An explicit command line "iommu=no-sharept" should be setted for
> >>>> avoiding undesirable iommu fault.
> >>>
> >>> Requiring the unconditional use of a specific command line option is
> >>> certainly fine for experimental code, but not beyond that. Behavior
> >>> should be correct by default in production code.
> >>>
> >>> But what's worse here: The option _allows_ device side writes from
> >>> the guest. Why would device side writes be okay, but CPU side ones not?
> >>>
> >>
> >> right, whether ept is shared or not doesn't address the concern here.
> >> In both cases we need maintain the same p2m view between CPU and
> >> device side, otherwise it's broken...
> >>
> >> One option is to treat wp similar to logdirty, i.e. exclusive to VT-d
> >> device assignment, until in the future VT-d supports page fault. We
> >> can provide a boot option to override the default policy if user
> >> thinks OK.
> >>
> >> 2nd option, like Wei mentioned in another mail, is to treat such
> >> write protected PPGTT page tables as MMIO. new hypercall is required
> >> to change the p2m type between p2m_io and p2m_ram, based on
> >> allocation/free of guest page table. This way may impact performance
> >> on read though.
> >>
> >> Comments?
> >
> > Another solution is using the EPT misconfiguration mechanism like what
> > Xen does for MTRR emulation currently.
> 
> That would cause faults on reads as well, making it necessary to emulate
> them.

In our usage case, the read access is much less than write (zero read in Linux guest and 1/4 for read access compared to write in windows guest). 

> 
> Jan

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v1 0/2] Extend ioreq-server to support page write protection
  2014-08-05  7:51           ` Jan Beulich
  2014-08-05  8:20             ` Ye, Wei
@ 2014-08-05 15:41             ` Tian, Kevin
  2014-08-06  2:11               ` Zhang, Yang Z
  1 sibling, 1 reply; 46+ messages in thread
From: Tian, Kevin @ 2014-08-05 15:41 UTC (permalink / raw)
  To: Jan Beulich, Ye, Wei, Zhang, Yang Z
  Cc: keir, ian.campbell, stefano.stabellini, tim, ian.jackson,
	xen-devel, Paul.Durrant, Lv, Zhiyuan

> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Tuesday, August 05, 2014 12:52 AM
> 
> >>> On 05.08.14 at 09:35, <yang.z.zhang@intel.com> wrote:
> > Tian, Kevin wrote on 2014-08-05:
> >>> From: Jan Beulich [mailto:JBeulich@suse.com]
> >>> Sent: Monday, August 04, 2014 12:35 AM
> >>>
> >>>>>> On 04.08.14 at 07:05, <wei.ye@intel.com> wrote:
> >>>
> >>>>
> >>>> Jan Beulich wrote on 2014-07-28:
> >>>>> devel@lists.xen.org; keir@xen.org
> >>>>> Subject: Re: [PATCH v1 0/2] Extend ioreq-server to support page
> >>>>> write protection
> >>>>>
> >>>>>>>> On 28.07.14 at 19:55, <wei.ye@intel.com> wrote:
> >>>>>> ioreq-server is proposed to forward PIO and MMIO request to
> >>>>>> multiple device models according to the io range. XenGT (Intel
> >>>>>> Graphics Virtualization technology, please refer to
> >>>>>> https://01.org/xen/blogs/srclarkx/2013/graphics-virtualization-
> >>>>>> xengt) driver reside in Dom0 as a virtual graphics device model
> >>>>>> also need to trap and emulate the guest's write operation to
> >>>>>> some specific memory pages, like the memory pages used by guest
> >>>>>> graphics driver as PPGTT(per-process graphics translation table).
> >>>>>> We add an new p2m type "p2m_ram_wp" to trap the page write
> >>>>>> operation. Page of this new p2m type is read only and for
> >>>>>> write, the request will go to device model via ioreq-server.
> >>>>>
> >>>>> So how is this write-protection supposed to work on the IOMMU
> >>>>> side when sharing page tables?
> >>>>
> >>>> Thanks for pointing out this question. Write-protection is not
> >>>> supposed to work when sharing page tables between EPT and vt-d.
> >>>> An explicit command line "iommu=no-sharept" should be setted for
> >>>> avoiding undesirable iommu fault.
> >>>
> >>> Requiring the unconditional use of a specific command line option is
> >>> certainly fine for experimental code, but not beyond that. Behavior
> >>> should be correct by default in production code.
> >>>
> >>> But what's worse here: The option _allows_ device side writes from
> >>> the guest. Why would device side writes be okay, but CPU side ones not?
> >>>
> >>
> >> right, whether ept is shared or not doesn't address the concern here.
> >> In both cases we need maintain the same p2m view between CPU and
> >> device side, otherwise it's broken...
> >>
> >> One option is to treat wp similar to logdirty, i.e. exclusive to VT-d
> >> device assignment, until in the future VT-d supports page fault. We
> >> can provide a boot option to override the default policy if user
> >> thinks OK.
> >>
> >> 2nd option, like Wei mentioned in another mail, is to treat such write
> >> protected PPGTT page tables as MMIO. new hypercall is required to
> >> change the p2m type between p2m_io and p2m_ram, based on
> >> allocation/free of guest page table. This way may impact performance
> >> on read though.
> >>
> >> Comments?
> >
> > Another solution is using the EPT misconfiguration mechanism like what Xen
> > does for MTRR emulation currently.
> 
> That would cause faults on reads as well, making it necessary to
> emulate them.
> 

Then it's same effect of p2m_mmio_dm.

Thanks
Kevin

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v1 0/2] Extend ioreq-server to support page write protection
  2014-08-05 15:41             ` Tian, Kevin
@ 2014-08-06  2:11               ` Zhang, Yang Z
  2014-08-06  2:33                 ` Tian, Kevin
  0 siblings, 1 reply; 46+ messages in thread
From: Zhang, Yang Z @ 2014-08-06  2:11 UTC (permalink / raw)
  To: Tian, Kevin, Jan Beulich, Ye, Wei
  Cc: keir, ian.campbell, stefano.stabellini, tim, ian.jackson,
	xen-devel, Paul.Durrant, Lv, Zhiyuan

Tian, Kevin wrote on 2014-08-05:
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: Tuesday, August 05, 2014 12:52 AM
>> 
>>>>> On 05.08.14 at 09:35, <yang.z.zhang@intel.com> wrote: Tian, Kevin
>>>>> wrote on 2014-08-05: From: Jan Beulich [mailto:JBeulich@suse.com]
>>>>> Sent: Monday, August 04, 2014 12:35 AM
>>>>> 
>>>>>>>> On 04.08.14 at 07:05, <wei.ye@intel.com> wrote:
>>>>> 
>>>>>> 
>>>>>> Jan Beulich wrote on 2014-07-28:
>>>>>>> devel@lists.xen.org; keir@xen.org
>>>>>>> Subject: Re: [PATCH v1 0/2] Extend ioreq-server to support
>>>>>>> page write protection
>>>>>>> 
>>>>>>>>>> On 28.07.14 at 19:55, <wei.ye@intel.com> wrote:
>>>>>>>> ioreq-server is proposed to forward PIO and MMIO request to
>>>>>>>> multiple device models according to the io range. XenGT
>>>>>>>> (Intel Graphics Virtualization technology, please refer to
>>>>>>>> https://01.org/xen/blogs/srclarkx/2013/graphics-virtualizatio
>>>>>>>> n-
>>>>>>>> xengt) driver reside in Dom0 as a virtual graphics device
>>>>>>>> model also need to trap and emulate the guest's write
>>>>>>>> operation to some specific memory pages, like the memory
>>>>>>>> pages used by guest graphics driver as PPGTT(per-process graphics translation table).
>>>>>>>> We add an new p2m type "p2m_ram_wp" to trap the page write
>>>>>>>> operation. Page of this new p2m type is read only and for
>>>>>>>> write, the request will go to device model via ioreq-server.
>>>>>>> 
>>>>>>> So how is this write-protection supposed to work on the IOMMU
>>>>>>> side when sharing page tables?
>>>>>> 
>>>>>> Thanks for pointing out this question. Write-protection is not
>>>>>> supposed to work when sharing page tables between EPT and vt-d.
>>>>>> An explicit command line "iommu=no-sharept" should be setted
>>>>>> for avoiding undesirable iommu fault.
>>>>> 
>>>>> Requiring the unconditional use of a specific command line
>>>>> option is certainly fine for experimental code, but not beyond that.
>>>>> Behavior should be correct by default in production code.
>>>>> 
>>>>> But what's worse here: The option _allows_ device side writes
>>>>> from the guest. Why would device side writes be okay, but CPU side ones not?
>>>>> 
>>>> 
>>>> right, whether ept is shared or not doesn't address the concern here.
>>>> In both cases we need maintain the same p2m view between CPU and
>>>> device side, otherwise it's broken...
>>>> 
>>>> One option is to treat wp similar to logdirty, i.e. exclusive to
>>>> VT-d device assignment, until in the future VT-d supports page
>>>> fault. We can provide a boot option to override the default
>>>> policy if user thinks OK.
>>>> 
>>>> 2nd option, like Wei mentioned in another mail, is to treat such
>>>> write protected PPGTT page tables as MMIO. new hypercall is
>>>> required to change the p2m type between p2m_io and p2m_ram, based
>>>> on allocation/free of guest page table. This way may impact
>>>> performance on read though.
>>>> 
>>>> Comments?
>>> 
>>> Another solution is using the EPT misconfiguration mechanism like
>>> what Xen does for MTRR emulation currently.
>> 
>> That would cause faults on reads as well, making it necessary to
>> emulate them.
>> 
> 
> Then it's same effect of p2m_mmio_dm.

p2m_mmio_dm will break VT-d but EPT misconfiguration doesn't.

> 
> Thanks
> Kevin


Best regards,
Yang

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v1 0/2] Extend ioreq-server to support page write protection
  2014-08-06  2:11               ` Zhang, Yang Z
@ 2014-08-06  2:33                 ` Tian, Kevin
  2014-08-06  2:40                   ` Zhang, Yang Z
  0 siblings, 1 reply; 46+ messages in thread
From: Tian, Kevin @ 2014-08-06  2:33 UTC (permalink / raw)
  To: Zhang, Yang Z, Jan Beulich, Ye, Wei
  Cc: keir, ian.campbell, stefano.stabellini, tim, ian.jackson,
	xen-devel, Paul.Durrant, Lv, Zhiyuan

> From: Zhang, Yang Z
> Sent: Tuesday, August 05, 2014 7:11 PM
> 
> Tian, Kevin wrote on 2014-08-05:
> >> From: Jan Beulich [mailto:JBeulich@suse.com]
> >> Sent: Tuesday, August 05, 2014 12:52 AM
> >>
> >>>>> On 05.08.14 at 09:35, <yang.z.zhang@intel.com> wrote: Tian, Kevin
> >>>>> wrote on 2014-08-05: From: Jan Beulich [mailto:JBeulich@suse.com]
> >>>>> Sent: Monday, August 04, 2014 12:35 AM
> >>>>>
> >>>>>>>> On 04.08.14 at 07:05, <wei.ye@intel.com> wrote:
> >>>>>
> >>>>>>
> >>>>>> Jan Beulich wrote on 2014-07-28:
> >>>>>>> devel@lists.xen.org; keir@xen.org
> >>>>>>> Subject: Re: [PATCH v1 0/2] Extend ioreq-server to support
> >>>>>>> page write protection
> >>>>>>>
> >>>>>>>>>> On 28.07.14 at 19:55, <wei.ye@intel.com> wrote:
> >>>>>>>> ioreq-server is proposed to forward PIO and MMIO request to
> >>>>>>>> multiple device models according to the io range. XenGT
> >>>>>>>> (Intel Graphics Virtualization technology, please refer to
> >>>>>>>> https://01.org/xen/blogs/srclarkx/2013/graphics-virtualizatio
> >>>>>>>> n-
> >>>>>>>> xengt) driver reside in Dom0 as a virtual graphics device
> >>>>>>>> model also need to trap and emulate the guest's write
> >>>>>>>> operation to some specific memory pages, like the memory
> >>>>>>>> pages used by guest graphics driver as PPGTT(per-process graphics
> translation table).
> >>>>>>>> We add an new p2m type "p2m_ram_wp" to trap the page write
> >>>>>>>> operation. Page of this new p2m type is read only and for
> >>>>>>>> write, the request will go to device model via ioreq-server.
> >>>>>>>
> >>>>>>> So how is this write-protection supposed to work on the IOMMU
> >>>>>>> side when sharing page tables?
> >>>>>>
> >>>>>> Thanks for pointing out this question. Write-protection is not
> >>>>>> supposed to work when sharing page tables between EPT and vt-d.
> >>>>>> An explicit command line "iommu=no-sharept" should be setted
> >>>>>> for avoiding undesirable iommu fault.
> >>>>>
> >>>>> Requiring the unconditional use of a specific command line
> >>>>> option is certainly fine for experimental code, but not beyond that.
> >>>>> Behavior should be correct by default in production code.
> >>>>>
> >>>>> But what's worse here: The option _allows_ device side writes
> >>>>> from the guest. Why would device side writes be okay, but CPU side
> ones not?
> >>>>>
> >>>>
> >>>> right, whether ept is shared or not doesn't address the concern here.
> >>>> In both cases we need maintain the same p2m view between CPU and
> >>>> device side, otherwise it's broken...
> >>>>
> >>>> One option is to treat wp similar to logdirty, i.e. exclusive to
> >>>> VT-d device assignment, until in the future VT-d supports page
> >>>> fault. We can provide a boot option to override the default
> >>>> policy if user thinks OK.
> >>>>
> >>>> 2nd option, like Wei mentioned in another mail, is to treat such
> >>>> write protected PPGTT page tables as MMIO. new hypercall is
> >>>> required to change the p2m type between p2m_io and p2m_ram, based
> >>>> on allocation/free of guest page table. This way may impact
> >>>> performance on read though.
> >>>>
> >>>> Comments?
> >>>
> >>> Another solution is using the EPT misconfiguration mechanism like
> >>> what Xen does for MTRR emulation currently.
> >>
> >> That would cause faults on reads as well, making it necessary to
> >> emulate them.
> >>
> >
> > Then it's same effect of p2m_mmio_dm.
> 
> p2m_mmio_dm will break VT-d but EPT misconfiguration doesn't.
> 

I don't understand. Treating it as emulated io just means no valid p2m
entry in EPT/VT-d. Note XenGT itself doesn't require VT-d, while you
can still assign other devices this way. 

Thanks
Kevin

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v1 0/2] Extend ioreq-server to support page write protection
  2014-08-06  2:33                 ` Tian, Kevin
@ 2014-08-06  2:40                   ` Zhang, Yang Z
  2014-08-06  2:49                     ` Tian, Kevin
  2014-08-06  2:50                     ` Tian, Kevin
  0 siblings, 2 replies; 46+ messages in thread
From: Zhang, Yang Z @ 2014-08-06  2:40 UTC (permalink / raw)
  To: Tian, Kevin, Jan Beulich, Ye, Wei
  Cc: keir, ian.campbell, stefano.stabellini, tim, ian.jackson,
	xen-devel, Paul.Durrant, Lv, Zhiyuan

Tian, Kevin wrote on 2014-08-06:
>> From: Zhang, Yang Z
>> Sent: Tuesday, August 05, 2014 7:11 PM
>> 
>> Tian, Kevin wrote on 2014-08-05:
>>>> From: Jan Beulich [mailto:JBeulich@suse.com]
>>>> Sent: Tuesday, August 05, 2014 12:52 AM
>>>> 
>>>>>>> On 05.08.14 at 09:35, <yang.z.zhang@intel.com> wrote: Tian,
>>>>>>> Kevin wrote on 2014-08-05: From: Jan Beulich
>>>>>>> [mailto:JBeulich@suse.com]
>>>>>>> Sent: Monday, August 04, 2014 12:35 AM
>>>>>>> 
>>>>>>>>>> On 04.08.14 at 07:05, <wei.ye@intel.com> wrote:
>>>>>>> 
>>>>>>>> 
>>>>>>>> Jan Beulich wrote on 2014-07-28:
>>>>>>>>> devel@lists.xen.org; keir@xen.org
>>>>>>>>> Subject: Re: [PATCH v1 0/2] Extend ioreq-server to support
>>>>>>>>> page write protection
>>>>>>>>> 
>>>>>>>>>>>> On 28.07.14 at 19:55, <wei.ye@intel.com> wrote:
>>>>>>>>>> ioreq-server is proposed to forward PIO and MMIO request to
>>>>>>>>>> multiple device models according to the io range. XenGT (Intel
>>>>>>>>>> Graphics Virtualization technology, please refer to
>>>>>>>>>> https://01.org/xen/blogs/srclarkx/2013/graphics-virtualizat io
>>>>>>>>>> n- xengt) driver reside in Dom0 as a virtual graphics device
>>>>>>>>>> model also need to trap and emulate the guest's write operation
>>>>>>>>>> to some specific memory pages, like the memory pages used by
>>>>>>>>>> guest graphics driver as PPGTT(per-process graphics translation
>>>>>>>>>> table). We add an new p2m type "p2m_ram_wp" to trap the page
>>>>>>>>>> write operation. Page of this new p2m type is read only and for
>>>>>>>>>> write, the request will go to device model via ioreq-server.
>>>>>>>>> 
>>>>>>>>> So how is this write-protection supposed to work on the
>>>>>>>>> IOMMU side when sharing page tables?
>>>>>>>> 
>>>>>>>> Thanks for pointing out this question. Write-protection is
>>>>>>>> not supposed to work when sharing page tables between EPT and vt-d.
>>>>>>>> An explicit command line "iommu=no-sharept" should be setted
>>>>>>>> for avoiding undesirable iommu fault.
>>>>>>> 
>>>>>>> Requiring the unconditional use of a specific command line
>>>>>>> option is certainly fine for experimental code, but not beyond that.
>>>>>>> Behavior should be correct by default in production code.
>>>>>>> 
>>>>>>> But what's worse here: The option _allows_ device side writes from
>>>>>>> the guest. Why would device side writes be okay, but CPU side ones
>>>>>>> not?
>>>>>>> 
>>>>>> 
>>>>>> right, whether ept is shared or not doesn't address the concern here.
>>>>>> In both cases we need maintain the same p2m view between CPU
>>>>>> and device side, otherwise it's broken...
>>>>>> 
>>>>>> One option is to treat wp similar to logdirty, i.e. exclusive
>>>>>> to VT-d device assignment, until in the future VT-d supports
>>>>>> page fault. We can provide a boot option to override the
>>>>>> default policy if user thinks OK.
>>>>>> 
>>>>>> 2nd option, like Wei mentioned in another mail, is to treat
>>>>>> such write protected PPGTT page tables as MMIO. new hypercall
>>>>>> is required to change the p2m type between p2m_io and p2m_ram,
>>>>>> based on allocation/free of guest page table. This way may
>>>>>> impact performance on read though.
>>>>>> 
>>>>>> Comments?
>>>>> 
>>>>> Another solution is using the EPT misconfiguration mechanism
>>>>> like what Xen does for MTRR emulation currently.
>>>> 
>>>> That would cause faults on reads as well, making it necessary to
>>>> emulate them.
>>>> 
>>> 
>>> Then it's same effect of p2m_mmio_dm.
>> 
>> p2m_mmio_dm will break VT-d but EPT misconfiguration doesn't.
>> 
> 
> I don't understand. Treating it as emulated io just means no valid p2m
> entry in EPT/VT-d. Note XenGT itself doesn't require VT-d, while you
> can still assign other devices this way.

>From guest's view, it doesn't know the page is not valid in EPT/VT-d table. So if guest CPU touch this page, then hypervisor will intercept the access from EPT violation and handle it. But if guest uses that page as DMA buffer (a malicious guest may do it), then VT-d fault happens. Since DMA is not restartable, guest will never receive the right data.

> 
> Thanks
> Kevin


Best regards,
Yang

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v1 0/2] Extend ioreq-server to support page write protection
  2014-08-06  2:40                   ` Zhang, Yang Z
@ 2014-08-06  2:49                     ` Tian, Kevin
  2014-08-06  2:50                     ` Tian, Kevin
  1 sibling, 0 replies; 46+ messages in thread
From: Tian, Kevin @ 2014-08-06  2:49 UTC (permalink / raw)
  To: Zhang, Yang Z, Jan Beulich, Ye, Wei
  Cc: keir, ian.campbell, stefano.stabellini, tim, ian.jackson,
	xen-devel, Paul.Durrant, Lv, Zhiyuan

> From: Zhang, Yang Z
> Sent: Tuesday, August 05, 2014 7:40 PM
> 
> Tian, Kevin wrote on 2014-08-06:
> >> From: Zhang, Yang Z
> >> Sent: Tuesday, August 05, 2014 7:11 PM
> >>
> >> Tian, Kevin wrote on 2014-08-05:
> >>>> From: Jan Beulich [mailto:JBeulich@suse.com]
> >>>> Sent: Tuesday, August 05, 2014 12:52 AM
> >>>>
> >>>>>>> On 05.08.14 at 09:35, <yang.z.zhang@intel.com> wrote: Tian,
> >>>>>>> Kevin wrote on 2014-08-05: From: Jan Beulich
> >>>>>>> [mailto:JBeulich@suse.com]
> >>>>>>> Sent: Monday, August 04, 2014 12:35 AM
> >>>>>>>
> >>>>>>>>>> On 04.08.14 at 07:05, <wei.ye@intel.com> wrote:
> >>>>>>>
> >>>>>>>>
> >>>>>>>> Jan Beulich wrote on 2014-07-28:
> >>>>>>>>> devel@lists.xen.org; keir@xen.org
> >>>>>>>>> Subject: Re: [PATCH v1 0/2] Extend ioreq-server to support
> >>>>>>>>> page write protection
> >>>>>>>>>
> >>>>>>>>>>>> On 28.07.14 at 19:55, <wei.ye@intel.com> wrote:
> >>>>>>>>>> ioreq-server is proposed to forward PIO and MMIO request to
> >>>>>>>>>> multiple device models according to the io range. XenGT (Intel
> >>>>>>>>>> Graphics Virtualization technology, please refer to
> >>>>>>>>>> https://01.org/xen/blogs/srclarkx/2013/graphics-virtualizat io
> >>>>>>>>>> n- xengt) driver reside in Dom0 as a virtual graphics device
> >>>>>>>>>> model also need to trap and emulate the guest's write operation
> >>>>>>>>>> to some specific memory pages, like the memory pages used by
> >>>>>>>>>> guest graphics driver as PPGTT(per-process graphics translation
> >>>>>>>>>> table). We add an new p2m type "p2m_ram_wp" to trap the page
> >>>>>>>>>> write operation. Page of this new p2m type is read only and for
> >>>>>>>>>> write, the request will go to device model via ioreq-server.
> >>>>>>>>>
> >>>>>>>>> So how is this write-protection supposed to work on the
> >>>>>>>>> IOMMU side when sharing page tables?
> >>>>>>>>
> >>>>>>>> Thanks for pointing out this question. Write-protection is
> >>>>>>>> not supposed to work when sharing page tables between EPT and
> vt-d.
> >>>>>>>> An explicit command line "iommu=no-sharept" should be setted
> >>>>>>>> for avoiding undesirable iommu fault.
> >>>>>>>
> >>>>>>> Requiring the unconditional use of a specific command line
> >>>>>>> option is certainly fine for experimental code, but not beyond that.
> >>>>>>> Behavior should be correct by default in production code.
> >>>>>>>
> >>>>>>> But what's worse here: The option _allows_ device side writes from
> >>>>>>> the guest. Why would device side writes be okay, but CPU side ones
> >>>>>>> not?
> >>>>>>>
> >>>>>>
> >>>>>> right, whether ept is shared or not doesn't address the concern here.
> >>>>>> In both cases we need maintain the same p2m view between CPU
> >>>>>> and device side, otherwise it's broken...
> >>>>>>
> >>>>>> One option is to treat wp similar to logdirty, i.e. exclusive
> >>>>>> to VT-d device assignment, until in the future VT-d supports
> >>>>>> page fault. We can provide a boot option to override the
> >>>>>> default policy if user thinks OK.
> >>>>>>
> >>>>>> 2nd option, like Wei mentioned in another mail, is to treat
> >>>>>> such write protected PPGTT page tables as MMIO. new hypercall
> >>>>>> is required to change the p2m type between p2m_io and p2m_ram,
> >>>>>> based on allocation/free of guest page table. This way may
> >>>>>> impact performance on read though.
> >>>>>>
> >>>>>> Comments?
> >>>>>
> >>>>> Another solution is using the EPT misconfiguration mechanism
> >>>>> like what Xen does for MTRR emulation currently.
> >>>>
> >>>> That would cause faults on reads as well, making it necessary to
> >>>> emulate them.
> >>>>
> >>>
> >>> Then it's same effect of p2m_mmio_dm.
> >>
> >> p2m_mmio_dm will break VT-d but EPT misconfiguration doesn't.
> >>
> >
> > I don't understand. Treating it as emulated io just means no valid p2m
> > entry in EPT/VT-d. Note XenGT itself doesn't require VT-d, while you
> > can still assign other devices this way.
> 
> From guest's view, it doesn't know the page is not valid in EPT/VT-d table. So if
> guest CPU touch this page, then hypervisor will intercept the access from EPT
> violation and handle it. But if guest uses that page as DMA buffer (a malicious
> guest may do it), then VT-d fault happens. Since DMA is not restartable, guest
> will never receive the right data.
> 

as you said, it's malicious guest, that's why we need zap the VT-d entries to avoid
mallicous DMA to GPU page tables. In sane case the page is allocated by gfx driver 
so other device drivers with VT-d assigned devices shouldn't use it as DMA target.
There is no 'right data' to be ensured in such case. :-)

Thanks
Kevin

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v1 0/2] Extend ioreq-server to support page write protection
  2014-08-06  2:40                   ` Zhang, Yang Z
  2014-08-06  2:49                     ` Tian, Kevin
@ 2014-08-06  2:50                     ` Tian, Kevin
  2014-08-06  3:04                       ` Zhang, Yang Z
  1 sibling, 1 reply; 46+ messages in thread
From: Tian, Kevin @ 2014-08-06  2:50 UTC (permalink / raw)
  To: Zhang, Yang Z, Jan Beulich, Ye, Wei
  Cc: keir, ian.campbell, stefano.stabellini, tim, ian.jackson,
	xen-devel, Paul.Durrant, Lv, Zhiyuan

> From: Tian, Kevin
> Sent: Tuesday, August 05, 2014 7:49 PM
> 
> > From: Zhang, Yang Z
> > Sent: Tuesday, August 05, 2014 7:40 PM
> >
> > Tian, Kevin wrote on 2014-08-06:
> > >> From: Zhang, Yang Z
> > >> Sent: Tuesday, August 05, 2014 7:11 PM
> > >>
> > >> Tian, Kevin wrote on 2014-08-05:
> > >>>> From: Jan Beulich [mailto:JBeulich@suse.com]
> > >>>> Sent: Tuesday, August 05, 2014 12:52 AM
> > >>>>
> > >>>>>>> On 05.08.14 at 09:35, <yang.z.zhang@intel.com> wrote: Tian,
> > >>>>>>> Kevin wrote on 2014-08-05: From: Jan Beulich
> > >>>>>>> [mailto:JBeulich@suse.com]
> > >>>>>>> Sent: Monday, August 04, 2014 12:35 AM
> > >>>>>>>
> > >>>>>>>>>> On 04.08.14 at 07:05, <wei.ye@intel.com> wrote:
> > >>>>>>>
> > >>>>>>>>
> > >>>>>>>> Jan Beulich wrote on 2014-07-28:
> > >>>>>>>>> devel@lists.xen.org; keir@xen.org
> > >>>>>>>>> Subject: Re: [PATCH v1 0/2] Extend ioreq-server to support
> > >>>>>>>>> page write protection
> > >>>>>>>>>
> > >>>>>>>>>>>> On 28.07.14 at 19:55, <wei.ye@intel.com> wrote:
> > >>>>>>>>>> ioreq-server is proposed to forward PIO and MMIO request to
> > >>>>>>>>>> multiple device models according to the io range. XenGT (Intel
> > >>>>>>>>>> Graphics Virtualization technology, please refer to
> > >>>>>>>>>> https://01.org/xen/blogs/srclarkx/2013/graphics-virtualizat io
> > >>>>>>>>>> n- xengt) driver reside in Dom0 as a virtual graphics device
> > >>>>>>>>>> model also need to trap and emulate the guest's write
> operation
> > >>>>>>>>>> to some specific memory pages, like the memory pages used by
> > >>>>>>>>>> guest graphics driver as PPGTT(per-process graphics translation
> > >>>>>>>>>> table). We add an new p2m type "p2m_ram_wp" to trap the
> page
> > >>>>>>>>>> write operation. Page of this new p2m type is read only and for
> > >>>>>>>>>> write, the request will go to device model via ioreq-server.
> > >>>>>>>>>
> > >>>>>>>>> So how is this write-protection supposed to work on the
> > >>>>>>>>> IOMMU side when sharing page tables?
> > >>>>>>>>
> > >>>>>>>> Thanks for pointing out this question. Write-protection is
> > >>>>>>>> not supposed to work when sharing page tables between EPT and
> > vt-d.
> > >>>>>>>> An explicit command line "iommu=no-sharept" should be setted
> > >>>>>>>> for avoiding undesirable iommu fault.
> > >>>>>>>
> > >>>>>>> Requiring the unconditional use of a specific command line
> > >>>>>>> option is certainly fine for experimental code, but not beyond that.
> > >>>>>>> Behavior should be correct by default in production code.
> > >>>>>>>
> > >>>>>>> But what's worse here: The option _allows_ device side writes from
> > >>>>>>> the guest. Why would device side writes be okay, but CPU side ones
> > >>>>>>> not?
> > >>>>>>>
> > >>>>>>
> > >>>>>> right, whether ept is shared or not doesn't address the concern here.
> > >>>>>> In both cases we need maintain the same p2m view between CPU
> > >>>>>> and device side, otherwise it's broken...
> > >>>>>>
> > >>>>>> One option is to treat wp similar to logdirty, i.e. exclusive
> > >>>>>> to VT-d device assignment, until in the future VT-d supports
> > >>>>>> page fault. We can provide a boot option to override the
> > >>>>>> default policy if user thinks OK.
> > >>>>>>
> > >>>>>> 2nd option, like Wei mentioned in another mail, is to treat
> > >>>>>> such write protected PPGTT page tables as MMIO. new hypercall
> > >>>>>> is required to change the p2m type between p2m_io and p2m_ram,
> > >>>>>> based on allocation/free of guest page table. This way may
> > >>>>>> impact performance on read though.
> > >>>>>>
> > >>>>>> Comments?
> > >>>>>
> > >>>>> Another solution is using the EPT misconfiguration mechanism
> > >>>>> like what Xen does for MTRR emulation currently.
> > >>>>
> > >>>> That would cause faults on reads as well, making it necessary to
> > >>>> emulate them.
> > >>>>
> > >>>
> > >>> Then it's same effect of p2m_mmio_dm.
> > >>
> > >> p2m_mmio_dm will break VT-d but EPT misconfiguration doesn't.
> > >>
> > >
> > > I don't understand. Treating it as emulated io just means no valid p2m
> > > entry in EPT/VT-d. Note XenGT itself doesn't require VT-d, while you
> > > can still assign other devices this way.
> >
> > From guest's view, it doesn't know the page is not valid in EPT/VT-d table. So
> if
> > guest CPU touch this page, then hypervisor will intercept the access from
> EPT
> > violation and handle it. But if guest uses that page as DMA buffer (a
> malicious
> > guest may do it), then VT-d fault happens. Since DMA is not restartable,
> guest
> > will never receive the right data.
> >
> 
> as you said, it's malicious guest, that's why we need zap the VT-d entries to
> avoid
> mallicous DMA to GPU page tables. In sane case the page is allocated by gfx
> driver
> so other device drivers with VT-d assigned devices shouldn't use it as DMA
> target.
> There is no 'right data' to be ensured in such case. :-)
> 

Think about a malicious guest may DMA to any holes...

Thanks
Kevin

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v1 0/2] Extend ioreq-server to support page write protection
  2014-08-06  2:50                     ` Tian, Kevin
@ 2014-08-06  3:04                       ` Zhang, Yang Z
  2014-08-06 15:00                         ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 46+ messages in thread
From: Zhang, Yang Z @ 2014-08-06  3:04 UTC (permalink / raw)
  To: Tian, Kevin, Jan Beulich, Ye, Wei
  Cc: keir, ian.campbell, stefano.stabellini, tim, ian.jackson,
	xen-devel, Paul.Durrant, Lv, Zhiyuan

Tian, Kevin wrote on 2014-08-06:
>> From: Tian, Kevin
>> Sent: Tuesday, August 05, 2014 7:49 PM
>> 
>>> From: Zhang, Yang Z
>>> Sent: Tuesday, August 05, 2014 7:40 PM
>>> 
>>> Tian, Kevin wrote on 2014-08-06:
>>>>> From: Zhang, Yang Z
>>>>> Sent: Tuesday, August 05, 2014 7:11 PM
>>>>> 
>>>>> Tian, Kevin wrote on 2014-08-05:
>>>>>>> From: Jan Beulich [mailto:JBeulich@suse.com]
>>>>>>> Sent: Tuesday, August 05, 2014 12:52 AM
>>>>>>> 
>>>>>>>>>> On 05.08.14 at 09:35, <yang.z.zhang@intel.com> wrote:
>>>>>>>>>> Tian, Kevin wrote on 2014-08-05: From: Jan Beulich
>>>>>>>>>> [mailto:JBeulich@suse.com]
>>>>>>>>>> Sent: Monday, August 04, 2014 12:35 AM
>>>>>>>>>> 
>>>>>>>>>>>>> On 04.08.14 at 07:05, <wei.ye@intel.com> wrote:
>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> Jan Beulich wrote on 2014-07-28:
>>>>>>>>>>>> devel@lists.xen.org; keir@xen.org
>>>>>>>>>>>> Subject: Re: [PATCH v1 0/2] Extend ioreq-server to
>>>>>>>>>>>> support page write protection
>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On 28.07.14 at 19:55, <wei.ye@intel.com> wrote:
>>>>>>>>>>>>> ioreq-server is proposed to forward PIO and MMIO request to
>>>>>>>>>>>>> multiple device models according to the io range. XenGT
>>>>>>>>>>>>> (Intel Graphics Virtualization technology, please refer to
>>>>>>>>>>>>> https://01.org/xen/blogs/srclarkx/2013/graphics-virtual iz
>>>>>>>>>>>>> at io n- xengt) driver reside in Dom0 as a virtual graphics
>>>>>>>>>>>>> device model also need to trap and emulate the guest's write
>>>>>>>>>>>>> operation to some specific memory pages, like the memory
>>>>>>>>>>>>> pages used by guest graphics driver as PPGTT(per-process
>>>>>>>>>>>>> graphics translation table). We add an new p2m type
>>>>>>>>>>>>> "p2m_ram_wp" to trap the page write operation. Page of this
>>>>>>>>>>>>> new p2m type is read only and for write, the request will go
>>>>>>>>>>>>> to device model via
> ioreq-server.
>>>>>>>>>>>> 
>>>>>>>>>>>> So how is this write-protection supposed to work on the
>>>>>>>>>>>> IOMMU side when sharing page tables?
>>>>>>>>>>> 
>>>>>>>>>>> Thanks for pointing out this question. Write-protection is not
>>>>>>>>>>> supposed to work when sharing page tables between EPT and
>>>>>>>>>>> vt-d. An explicit command line "iommu=no-sharept" should be
>>>>>>>>>>> setted for avoiding undesirable iommu fault.
>>>>>>>>>> 
>>>>>>>>>> Requiring the unconditional use of a specific command line
>>>>>>>>>> option is certainly fine for experimental code, but not beyond that.
>>>>>>>>>> Behavior should be correct by default in production code.
>>>>>>>>>> 
>>>>>>>>>> But what's worse here: The option _allows_ device side
>>>>>>>>>> writes from the guest. Why would device side writes be
>>>>>>>>>> okay, but CPU side ones not?
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> right, whether ept is shared or not doesn't address the concern
>>>>>>>>> here. In both cases we need maintain the same p2m view between
>>>>>>>>> CPU and device side, otherwise it's broken...
>>>>>>>>> 
>>>>>>>>> One option is to treat wp similar to logdirty, i.e.
>>>>>>>>> exclusive to VT-d device assignment, until in the future
>>>>>>>>> VT-d supports page fault. We can provide a boot option to
>>>>>>>>> override the default policy if user thinks OK.
>>>>>>>>> 
>>>>>>>>> 2nd option, like Wei mentioned in another mail, is to treat
>>>>>>>>> such write protected PPGTT page tables as MMIO. new
>>>>>>>>> hypercall is required to change the p2m type between p2m_io
>>>>>>>>> and p2m_ram, based on allocation/free of guest page table.
>>>>>>>>> This way may impact performance on read though.
>>>>>>>>> 
>>>>>>>>> Comments?
>>>>>>>> 
>>>>>>>> Another solution is using the EPT misconfiguration mechanism
>>>>>>>> like what Xen does for MTRR emulation currently.
>>>>>>> 
>>>>>>> That would cause faults on reads as well, making it necessary
>>>>>>> to emulate them.
>>>>>>> 
>>>>>> 
>>>>>> Then it's same effect of p2m_mmio_dm.
>>>>> 
>>>>> p2m_mmio_dm will break VT-d but EPT misconfiguration doesn't.
>>>>> 
>>>> 
>>>> I don't understand. Treating it as emulated io just means no
>>>> valid p2m entry in EPT/VT-d. Note XenGT itself doesn't require
>>>> VT-d, while you can still assign other devices this way.
>>> 
>>> From guest's view, it doesn't know the page is not valid in EPT/VT-d
>>> table. So if guest CPU touch this page, then hypervisor will intercept
>>> the access from EPT violation and handle it. But if guest uses that
>>> page as DMA buffer (a malicious guest may do it), then VT-d fault
>>> happens. Since DMA is not restartable, guest will never receive the
>>> right data.
>>> 
>> 
>> as you said, it's malicious guest, that's why we need zap the VT-d

Malicious guest is just an example. We can assume a normal guest never does it.

>> entries to avoid mallicous DMA to GPU page tables. In sane case the
>> page is allocated by gfx driver so other device drivers with VT-d
>> assigned devices shouldn't use it as DMA target.

What about a guest really does it?

>> There is no 'right data' to be ensured in such case. :-)
>> 
> 
> Think about a malicious guest may DMA to any holes...

Other hole is not a RAM, so a VT-d fault is acceptable.

> 
> Thanks
> Kevin


Best regards,
Yang

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v1 0/2] Extend ioreq-server to support page write protection
  2014-08-06  3:04                       ` Zhang, Yang Z
@ 2014-08-06 15:00                         ` Konrad Rzeszutek Wilk
  2014-08-06 16:08                           ` Tian, Kevin
  2014-08-06 17:38                           ` Tian, Kevin
  0 siblings, 2 replies; 46+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-08-06 15:00 UTC (permalink / raw)
  To: Zhang, Yang Z, donald.d.dugger
  Cc: Tian, Kevin, keir, ian.campbell, stefano.stabellini, ian.jackson,
	tim, xen-devel, Paul.Durrant, Lv, Zhiyuan, Jan Beulich, Ye, Wei

On Wed, Aug 06, 2014 at 03:04:31AM +0000, Zhang, Yang Z wrote:
> Tian, Kevin wrote on 2014-08-06:
> >> From: Tian, Kevin
> >> Sent: Tuesday, August 05, 2014 7:49 PM
> >> 
> >>> From: Zhang, Yang Z
> >>> Sent: Tuesday, August 05, 2014 7:40 PM
> >>> 
> >>> Tian, Kevin wrote on 2014-08-06:
> >>>>> From: Zhang, Yang Z
> >>>>> Sent: Tuesday, August 05, 2014 7:11 PM
> >>>>> 
> >>>>> Tian, Kevin wrote on 2014-08-05:
> >>>>>>> From: Jan Beulich [mailto:JBeulich@suse.com]
> >>>>>>> Sent: Tuesday, August 05, 2014 12:52 AM
> >>>>>>> 
> >>>>>>>>>> On 05.08.14 at 09:35, <yang.z.zhang@intel.com> wrote:
> >>>>>>>>>> Tian, Kevin wrote on 2014-08-05: From: Jan Beulich
> >>>>>>>>>> [mailto:JBeulich@suse.com]
> >>>>>>>>>> Sent: Monday, August 04, 2014 12:35 AM
> >>>>>>>>>> 
> >>>>>>>>>>>>> On 04.08.14 at 07:05, <wei.ye@intel.com> wrote:
> >>>>>>>>>> 
> >>>>>>>>>>> 
> >>>>>>>>>>> Jan Beulich wrote on 2014-07-28:
> >>>>>>>>>>>> devel@lists.xen.org; keir@xen.org
> >>>>>>>>>>>> Subject: Re: [PATCH v1 0/2] Extend ioreq-server to
> >>>>>>>>>>>> support page write protection
> >>>>>>>>>>>> 
> >>>>>>>>>>>>>>> On 28.07.14 at 19:55, <wei.ye@intel.com> wrote:
> >>>>>>>>>>>>> ioreq-server is proposed to forward PIO and MMIO request to
> >>>>>>>>>>>>> multiple device models according to the io range. XenGT
> >>>>>>>>>>>>> (Intel Graphics Virtualization technology, please refer to
> >>>>>>>>>>>>> https://01.org/xen/blogs/srclarkx/2013/graphics-virtual iz
> >>>>>>>>>>>>> at io n- xengt) driver reside in Dom0 as a virtual graphics
> >>>>>>>>>>>>> device model also need to trap and emulate the guest's write
> >>>>>>>>>>>>> operation to some specific memory pages, like the memory
> >>>>>>>>>>>>> pages used by guest graphics driver as PPGTT(per-process
> >>>>>>>>>>>>> graphics translation table). We add an new p2m type
> >>>>>>>>>>>>> "p2m_ram_wp" to trap the page write operation. Page of this
> >>>>>>>>>>>>> new p2m type is read only and for write, the request will go
> >>>>>>>>>>>>> to device model via
> > ioreq-server.
> >>>>>>>>>>>> 
> >>>>>>>>>>>> So how is this write-protection supposed to work on the
> >>>>>>>>>>>> IOMMU side when sharing page tables?
> >>>>>>>>>>> 
> >>>>>>>>>>> Thanks for pointing out this question. Write-protection is not
> >>>>>>>>>>> supposed to work when sharing page tables between EPT and
> >>>>>>>>>>> vt-d. An explicit command line "iommu=no-sharept" should be
> >>>>>>>>>>> setted for avoiding undesirable iommu fault.
> >>>>>>>>>> 
> >>>>>>>>>> Requiring the unconditional use of a specific command line
> >>>>>>>>>> option is certainly fine for experimental code, but not beyond that.
> >>>>>>>>>> Behavior should be correct by default in production code.
> >>>>>>>>>> 
> >>>>>>>>>> But what's worse here: The option _allows_ device side
> >>>>>>>>>> writes from the guest. Why would device side writes be
> >>>>>>>>>> okay, but CPU side ones not?
> >>>>>>>>>> 
> >>>>>>>>> 
> >>>>>>>>> right, whether ept is shared or not doesn't address the concern
> >>>>>>>>> here. In both cases we need maintain the same p2m view between
> >>>>>>>>> CPU and device side, otherwise it's broken...
> >>>>>>>>> 
> >>>>>>>>> One option is to treat wp similar to logdirty, i.e.
> >>>>>>>>> exclusive to VT-d device assignment, until in the future
> >>>>>>>>> VT-d supports page fault. We can provide a boot option to
> >>>>>>>>> override the default policy if user thinks OK.
> >>>>>>>>> 
> >>>>>>>>> 2nd option, like Wei mentioned in another mail, is to treat
> >>>>>>>>> such write protected PPGTT page tables as MMIO. new
> >>>>>>>>> hypercall is required to change the p2m type between p2m_io
> >>>>>>>>> and p2m_ram, based on allocation/free of guest page table.
> >>>>>>>>> This way may impact performance on read though.
> >>>>>>>>> 
> >>>>>>>>> Comments?
> >>>>>>>> 
> >>>>>>>> Another solution is using the EPT misconfiguration mechanism
> >>>>>>>> like what Xen does for MTRR emulation currently.
> >>>>>>> 
> >>>>>>> That would cause faults on reads as well, making it necessary
> >>>>>>> to emulate them.
> >>>>>>> 
> >>>>>> 
> >>>>>> Then it's same effect of p2m_mmio_dm.
> >>>>> 
> >>>>> p2m_mmio_dm will break VT-d but EPT misconfiguration doesn't.
> >>>>> 
> >>>> 
> >>>> I don't understand. Treating it as emulated io just means no
> >>>> valid p2m entry in EPT/VT-d. Note XenGT itself doesn't require
> >>>> VT-d, while you can still assign other devices this way.
> >>> 
> >>> From guest's view, it doesn't know the page is not valid in EPT/VT-d
> >>> table. So if guest CPU touch this page, then hypervisor will intercept
> >>> the access from EPT violation and handle it. But if guest uses that
> >>> page as DMA buffer (a malicious guest may do it), then VT-d fault
> >>> happens. Since DMA is not restartable, guest will never receive the
> >>> right data.
> >>> 
> >> 
> >> as you said, it's malicious guest, that's why we need zap the VT-d
> 
> Malicious guest is just an example. We can assume a normal guest never does it.

What!? No you can't. That is thundering towards an XSA! The normal
guest can become compromised.

> 
> >> entries to avoid mallicous DMA to GPU page tables. In sane case the
> >> page is allocated by gfx driver so other device drivers with VT-d
> >> assigned devices shouldn't use it as DMA target.
> 
> What about a guest really does it?

Oh, here you say it can do it. A bit confusing.
> 
> >> There is no 'right data' to be ensured in such case. :-)
> >> 
> > 
> > Think about a malicious guest may DMA to any holes...
> 
> Other hole is not a RAM, so a VT-d fault is acceptable.

I feel that we talking about the same issue that had been raised before
about EPT and VT-d sharing a page and having issues with DMAing
in RAM regions (like into the framebuffer).

And my understanding is that Intel is going to fix it, except that
it is on the 'low-priority'. It sounds to me that it should be
raised to a higher priority and be taken care of.

> 
> > 
> > Thanks
> > Kevin
> 
> 
> Best regards,
> Yang
> 
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v1 0/2] Extend ioreq-server to support page write protection
  2014-08-06 15:00                         ` Konrad Rzeszutek Wilk
@ 2014-08-06 16:08                           ` Tian, Kevin
  2014-08-07  6:45                             ` Jan Beulich
  2014-08-06 17:38                           ` Tian, Kevin
  1 sibling, 1 reply; 46+ messages in thread
From: Tian, Kevin @ 2014-08-06 16:08 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, Zhang, Yang Z, Dugger, Donald D
  Cc: keir, ian.campbell, stefano.stabellini, ian.jackson, tim,
	xen-devel, Paul.Durrant, Lv, Zhiyuan, Jan Beulich, Ye, Wei

> From: Konrad Rzeszutek Wilk [mailto:konrad.wilk@oracle.com]
> Sent: Wednesday, August 06, 2014 8:01 AM
> 
> On Wed, Aug 06, 2014 at 03:04:31AM +0000, Zhang, Yang Z wrote:
> > Tian, Kevin wrote on 2014-08-06:
> > >> From: Tian, Kevin
> > >> Sent: Tuesday, August 05, 2014 7:49 PM
> > >>
> > >>> From: Zhang, Yang Z
> > >>> Sent: Tuesday, August 05, 2014 7:40 PM
> > >>>
> > >>> Tian, Kevin wrote on 2014-08-06:
> > >>>>> From: Zhang, Yang Z
> > >>>>> Sent: Tuesday, August 05, 2014 7:11 PM
> > >>>>>
> > >>>>> Tian, Kevin wrote on 2014-08-05:
> > >>>>>>> From: Jan Beulich [mailto:JBeulich@suse.com]
> > >>>>>>> Sent: Tuesday, August 05, 2014 12:52 AM
> > >>>>>>>
> > >>>>>>>>>> On 05.08.14 at 09:35, <yang.z.zhang@intel.com> wrote:
> > >>>>>>>>>> Tian, Kevin wrote on 2014-08-05: From: Jan Beulich
> > >>>>>>>>>> [mailto:JBeulich@suse.com]
> > >>>>>>>>>> Sent: Monday, August 04, 2014 12:35 AM
> > >>>>>>>>>>
> > >>>>>>>>>>>>> On 04.08.14 at 07:05, <wei.ye@intel.com> wrote:
> > >>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>> Jan Beulich wrote on 2014-07-28:
> > >>>>>>>>>>>> devel@lists.xen.org; keir@xen.org
> > >>>>>>>>>>>> Subject: Re: [PATCH v1 0/2] Extend ioreq-server to
> > >>>>>>>>>>>> support page write protection
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>>>> On 28.07.14 at 19:55, <wei.ye@intel.com> wrote:
> > >>>>>>>>>>>>> ioreq-server is proposed to forward PIO and MMIO request
> to
> > >>>>>>>>>>>>> multiple device models according to the io range. XenGT
> > >>>>>>>>>>>>> (Intel Graphics Virtualization technology, please refer to
> > >>>>>>>>>>>>> https://01.org/xen/blogs/srclarkx/2013/graphics-virtual iz
> > >>>>>>>>>>>>> at io n- xengt) driver reside in Dom0 as a virtual graphics
> > >>>>>>>>>>>>> device model also need to trap and emulate the guest's
> write
> > >>>>>>>>>>>>> operation to some specific memory pages, like the memory
> > >>>>>>>>>>>>> pages used by guest graphics driver as PPGTT(per-process
> > >>>>>>>>>>>>> graphics translation table). We add an new p2m type
> > >>>>>>>>>>>>> "p2m_ram_wp" to trap the page write operation. Page of
> this
> > >>>>>>>>>>>>> new p2m type is read only and for write, the request will go
> > >>>>>>>>>>>>> to device model via
> > > ioreq-server.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> So how is this write-protection supposed to work on the
> > >>>>>>>>>>>> IOMMU side when sharing page tables?
> > >>>>>>>>>>>
> > >>>>>>>>>>> Thanks for pointing out this question. Write-protection is not
> > >>>>>>>>>>> supposed to work when sharing page tables between EPT and
> > >>>>>>>>>>> vt-d. An explicit command line "iommu=no-sharept" should be
> > >>>>>>>>>>> setted for avoiding undesirable iommu fault.
> > >>>>>>>>>>
> > >>>>>>>>>> Requiring the unconditional use of a specific command line
> > >>>>>>>>>> option is certainly fine for experimental code, but not beyond
> that.
> > >>>>>>>>>> Behavior should be correct by default in production code.
> > >>>>>>>>>>
> > >>>>>>>>>> But what's worse here: The option _allows_ device side
> > >>>>>>>>>> writes from the guest. Why would device side writes be
> > >>>>>>>>>> okay, but CPU side ones not?
> > >>>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> right, whether ept is shared or not doesn't address the concern
> > >>>>>>>>> here. In both cases we need maintain the same p2m view
> between
> > >>>>>>>>> CPU and device side, otherwise it's broken...
> > >>>>>>>>>
> > >>>>>>>>> One option is to treat wp similar to logdirty, i.e.
> > >>>>>>>>> exclusive to VT-d device assignment, until in the future
> > >>>>>>>>> VT-d supports page fault. We can provide a boot option to
> > >>>>>>>>> override the default policy if user thinks OK.
> > >>>>>>>>>
> > >>>>>>>>> 2nd option, like Wei mentioned in another mail, is to treat
> > >>>>>>>>> such write protected PPGTT page tables as MMIO. new
> > >>>>>>>>> hypercall is required to change the p2m type between p2m_io
> > >>>>>>>>> and p2m_ram, based on allocation/free of guest page table.
> > >>>>>>>>> This way may impact performance on read though.
> > >>>>>>>>>
> > >>>>>>>>> Comments?
> > >>>>>>>>
> > >>>>>>>> Another solution is using the EPT misconfiguration mechanism
> > >>>>>>>> like what Xen does for MTRR emulation currently.
> > >>>>>>>
> > >>>>>>> That would cause faults on reads as well, making it necessary
> > >>>>>>> to emulate them.
> > >>>>>>>
> > >>>>>>
> > >>>>>> Then it's same effect of p2m_mmio_dm.
> > >>>>>
> > >>>>> p2m_mmio_dm will break VT-d but EPT misconfiguration doesn't.
> > >>>>>
> > >>>>
> > >>>> I don't understand. Treating it as emulated io just means no
> > >>>> valid p2m entry in EPT/VT-d. Note XenGT itself doesn't require
> > >>>> VT-d, while you can still assign other devices this way.
> > >>>
> > >>> From guest's view, it doesn't know the page is not valid in EPT/VT-d
> > >>> table. So if guest CPU touch this page, then hypervisor will intercept
> > >>> the access from EPT violation and handle it. But if guest uses that
> > >>> page as DMA buffer (a malicious guest may do it), then VT-d fault
> > >>> happens. Since DMA is not restartable, guest will never receive the
> > >>> right data.
> > >>>
> > >>
> > >> as you said, it's malicious guest, that's why we need zap the VT-d
> >
> > Malicious guest is just an example. We can assume a normal guest never
> does it.
> 
> What!? No you can't. That is thundering towards an XSA! The normal
> guest can become compromised.

It would be a severe security hole to write-protect only the CPU path while 
leaving DMA writes valid. When we say a page is write-protected, it must be 
consistently enforced in both CPU/DMA paths. Now there is no other way to 
achieve it lacking of VT-d page fault, except treating it fully as a emulated 
MMIO range (means removing mapping in both EPT/VT-d page table), or 
excluding VT-d like this patch originally does.

> 
> >
> > >> entries to avoid mallicous DMA to GPU page tables. In sane case the
> > >> page is allocated by gfx driver so other device drivers with VT-d
> > >> assigned devices shouldn't use it as DMA target.
> >
> > What about a guest really does it?
> 
> Oh, here you say it can do it. A bit confusing.
> >
> > >> There is no 'right data' to be ensured in such case. :-)
> > >>
> > >
> > > Think about a malicious guest may DMA to any holes...
> >
> > Other hole is not a RAM, so a VT-d fault is acceptable.
> 
> I feel that we talking about the same issue that had been raised before
> about EPT and VT-d sharing a page and having issues with DMAing
> in RAM regions (like into the framebuffer).
> 
> And my understanding is that Intel is going to fix it, except that
> it is on the 'low-priority'. It sounds to me that it should be
> raised to a higher priority and be taken care of.
> 

I'm not in original discussion loop, but my feeling is w/o VT-d fault
support anything related to write-protection (e.g. logdirty) will remains
a gap here (i.e. the feature has to be exclusive to VT-d usage)

Thanks
Kevin

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v1 0/2] Extend ioreq-server to support page write protection
  2014-08-06 15:00                         ` Konrad Rzeszutek Wilk
  2014-08-06 16:08                           ` Tian, Kevin
@ 2014-08-06 17:38                           ` Tian, Kevin
  1 sibling, 0 replies; 46+ messages in thread
From: Tian, Kevin @ 2014-08-06 17:38 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, Zhang, Yang Z, Dugger, Donald D
  Cc: keir, ian.campbell, stefano.stabellini, ian.jackson, tim,
	xen-devel, Paul.Durrant, Lv, Zhiyuan, Jan Beulich, Ye, Wei

> From: Tian, Kevin
> Sent: Wednesday, August 06, 2014 9:09 AM
> 
> > From: Konrad Rzeszutek Wilk [mailto:konrad.wilk@oracle.com]
> > Sent: Wednesday, August 06, 2014 8:01 AM
> >
> > On Wed, Aug 06, 2014 at 03:04:31AM +0000, Zhang, Yang Z wrote:
> > > Tian, Kevin wrote on 2014-08-06:
> > > >> From: Tian, Kevin
> > > >> Sent: Tuesday, August 05, 2014 7:49 PM
> > > >>
> > > >>> From: Zhang, Yang Z
> > > >>> Sent: Tuesday, August 05, 2014 7:40 PM
> > > >>>
> > > >>> Tian, Kevin wrote on 2014-08-06:
> > > >>>>> From: Zhang, Yang Z
> > > >>>>> Sent: Tuesday, August 05, 2014 7:11 PM
> > > >>>>>
> > > >>>>> Tian, Kevin wrote on 2014-08-05:
> > > >>>>>>> From: Jan Beulich [mailto:JBeulich@suse.com]
> > > >>>>>>> Sent: Tuesday, August 05, 2014 12:52 AM
> > > >>>>>>>
> > > >>>>>>>>>> On 05.08.14 at 09:35, <yang.z.zhang@intel.com> wrote:
> > > >>>>>>>>>> Tian, Kevin wrote on 2014-08-05: From: Jan Beulich
> > > >>>>>>>>>> [mailto:JBeulich@suse.com]
> > > >>>>>>>>>> Sent: Monday, August 04, 2014 12:35 AM
> > > >>>>>>>>>>
> > > >>>>>>>>>>>>> On 04.08.14 at 07:05, <wei.ye@intel.com> wrote:
> > > >>>>>>>>>>
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> Jan Beulich wrote on 2014-07-28:
> > > >>>>>>>>>>>> devel@lists.xen.org; keir@xen.org
> > > >>>>>>>>>>>> Subject: Re: [PATCH v1 0/2] Extend ioreq-server to
> > > >>>>>>>>>>>> support page write protection
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> On 28.07.14 at 19:55, <wei.ye@intel.com> wrote:
> > > >>>>>>>>>>>>> ioreq-server is proposed to forward PIO and MMIO
> request
> > to
> > > >>>>>>>>>>>>> multiple device models according to the io range. XenGT
> > > >>>>>>>>>>>>> (Intel Graphics Virtualization technology, please refer to
> > > >>>>>>>>>>>>> https://01.org/xen/blogs/srclarkx/2013/graphics-virtual iz
> > > >>>>>>>>>>>>> at io n- xengt) driver reside in Dom0 as a virtual graphics
> > > >>>>>>>>>>>>> device model also need to trap and emulate the guest's
> > write
> > > >>>>>>>>>>>>> operation to some specific memory pages, like the
> memory
> > > >>>>>>>>>>>>> pages used by guest graphics driver as PPGTT(per-process
> > > >>>>>>>>>>>>> graphics translation table). We add an new p2m type
> > > >>>>>>>>>>>>> "p2m_ram_wp" to trap the page write operation. Page of
> > this
> > > >>>>>>>>>>>>> new p2m type is read only and for write, the request will
> go
> > > >>>>>>>>>>>>> to device model via
> > > > ioreq-server.
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> So how is this write-protection supposed to work on the
> > > >>>>>>>>>>>> IOMMU side when sharing page tables?
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> Thanks for pointing out this question. Write-protection is not
> > > >>>>>>>>>>> supposed to work when sharing page tables between EPT
> and
> > > >>>>>>>>>>> vt-d. An explicit command line "iommu=no-sharept" should
> be
> > > >>>>>>>>>>> setted for avoiding undesirable iommu fault.
> > > >>>>>>>>>>
> > > >>>>>>>>>> Requiring the unconditional use of a specific command line
> > > >>>>>>>>>> option is certainly fine for experimental code, but not beyond
> > that.
> > > >>>>>>>>>> Behavior should be correct by default in production code.
> > > >>>>>>>>>>
> > > >>>>>>>>>> But what's worse here: The option _allows_ device side
> > > >>>>>>>>>> writes from the guest. Why would device side writes be
> > > >>>>>>>>>> okay, but CPU side ones not?
> > > >>>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>> right, whether ept is shared or not doesn't address the concern
> > > >>>>>>>>> here. In both cases we need maintain the same p2m view
> > between
> > > >>>>>>>>> CPU and device side, otherwise it's broken...
> > > >>>>>>>>>
> > > >>>>>>>>> One option is to treat wp similar to logdirty, i.e.
> > > >>>>>>>>> exclusive to VT-d device assignment, until in the future
> > > >>>>>>>>> VT-d supports page fault. We can provide a boot option to
> > > >>>>>>>>> override the default policy if user thinks OK.
> > > >>>>>>>>>
> > > >>>>>>>>> 2nd option, like Wei mentioned in another mail, is to treat
> > > >>>>>>>>> such write protected PPGTT page tables as MMIO. new
> > > >>>>>>>>> hypercall is required to change the p2m type between p2m_io
> > > >>>>>>>>> and p2m_ram, based on allocation/free of guest page table.
> > > >>>>>>>>> This way may impact performance on read though.
> > > >>>>>>>>>
> > > >>>>>>>>> Comments?
> > > >>>>>>>>
> > > >>>>>>>> Another solution is using the EPT misconfiguration mechanism
> > > >>>>>>>> like what Xen does for MTRR emulation currently.
> > > >>>>>>>
> > > >>>>>>> That would cause faults on reads as well, making it necessary
> > > >>>>>>> to emulate them.
> > > >>>>>>>
> > > >>>>>>
> > > >>>>>> Then it's same effect of p2m_mmio_dm.
> > > >>>>>
> > > >>>>> p2m_mmio_dm will break VT-d but EPT misconfiguration doesn't.
> > > >>>>>
> > > >>>>
> > > >>>> I don't understand. Treating it as emulated io just means no
> > > >>>> valid p2m entry in EPT/VT-d. Note XenGT itself doesn't require
> > > >>>> VT-d, while you can still assign other devices this way.
> > > >>>
> > > >>> From guest's view, it doesn't know the page is not valid in EPT/VT-d
> > > >>> table. So if guest CPU touch this page, then hypervisor will intercept
> > > >>> the access from EPT violation and handle it. But if guest uses that
> > > >>> page as DMA buffer (a malicious guest may do it), then VT-d fault
> > > >>> happens. Since DMA is not restartable, guest will never receive the
> > > >>> right data.
> > > >>>
> > > >>
> > > >> as you said, it's malicious guest, that's why we need zap the VT-d
> > >
> > > Malicious guest is just an example. We can assume a normal guest never
> > does it.
> >
> > What!? No you can't. That is thundering towards an XSA! The normal
> > guest can become compromised.
> 
> It would be a severe security hole to write-protect only the CPU path while
> leaving DMA writes valid. When we say a page is write-protected, it must be
> consistently enforced in both CPU/DMA paths. Now there is no other way to
> achieve it lacking of VT-d page fault, except treating it fully as a emulated
> MMIO range (means removing mapping in both EPT/VT-d page table), or
> excluding VT-d like this patch originally does.
> 

Two corrections:

- it's still a functional problem instead of security problem. with EPT/VT-d a
malicious VM anyway can only access its own memory.

- A general write-protection mechanism is not possible until VT-d page fault
is there. The options just talk about XenGT specific requirement, which only
cares about CPU modification to the GPU page table.

Thanks
Kevin

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v1 0/2] Extend ioreq-server to support page write protection
  2014-08-06 16:08                           ` Tian, Kevin
@ 2014-08-07  6:45                             ` Jan Beulich
  2014-08-07 16:28                               ` Tian, Kevin
  0 siblings, 1 reply; 46+ messages in thread
From: Jan Beulich @ 2014-08-07  6:45 UTC (permalink / raw)
  To: Kevin Tian
  Cc: keir, ian.campbell, stefano.stabellini, tim, ian.jackson,
	Donald D Dugger, xen-devel, Paul.Durrant, Zhiyuan Lv,
	Yang Z Zhang, Wei Ye

>>> On 06.08.14 at 18:08, <kevin.tian@intel.com> wrote:
>>  From: Konrad Rzeszutek Wilk [mailto:konrad.wilk@oracle.com]
>> On Wed, Aug 06, 2014 at 03:04:31AM +0000, Zhang, Yang Z wrote:
>> > Malicious guest is just an example. We can assume a normal guest never
>> does it.
>> 
>> What!? No you can't. That is thundering towards an XSA! The normal
>> guest can become compromised.
> 
> It would be a severe security hole to write-protect only the CPU path while 
> leaving DMA writes valid. When we say a page is write-protected, it must be 
> consistently enforced in both CPU/DMA paths. Now there is no other way to 
> achieve it lacking of VT-d page fault, except treating it fully as a emulated 
> MMIO range (means removing mapping in both EPT/VT-d page table), or 
> excluding VT-d like this patch originally does.

A fully emulated range means more than just removing the mapping
though: No passed through device must access the memory range,
since you can't emulate such accesses. While this is fine for existing
uses (LAPIC, HPET, ...), I don't think it is for things like a frame
buffer. If otoh talk is exclusively about GPU page tables, then this
ought to be fine, but would - to me - not fit with other aspects
discussed in the context of these patches (like the supposedly large
number of pages needing to be tracked for a guest).

Jan

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v1 0/2] Extend ioreq-server to support page write protection
  2014-08-07  6:45                             ` Jan Beulich
@ 2014-08-07 16:28                               ` Tian, Kevin
  2014-08-08  6:32                                 ` Jan Beulich
  0 siblings, 1 reply; 46+ messages in thread
From: Tian, Kevin @ 2014-08-07 16:28 UTC (permalink / raw)
  To: Jan Beulich
  Cc: keir, ian.campbell, stefano.stabellini, tim, ian.jackson, Dugger,
	Donald D, xen-devel, Paul.Durrant, Lv, Zhiyuan, Zhang, Yang Z,
	Ye, Wei

> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Wednesday, August 06, 2014 11:45 PM
> 
> >>> On 06.08.14 at 18:08, <kevin.tian@intel.com> wrote:
> >>  From: Konrad Rzeszutek Wilk [mailto:konrad.wilk@oracle.com]
> >> On Wed, Aug 06, 2014 at 03:04:31AM +0000, Zhang, Yang Z wrote:
> >> > Malicious guest is just an example. We can assume a normal guest never
> >> does it.
> >>
> >> What!? No you can't. That is thundering towards an XSA! The normal
> >> guest can become compromised.
> >
> > It would be a severe security hole to write-protect only the CPU path while
> > leaving DMA writes valid. When we say a page is write-protected, it must be
> > consistently enforced in both CPU/DMA paths. Now there is no other way to
> > achieve it lacking of VT-d page fault, except treating it fully as a emulated
> > MMIO range (means removing mapping in both EPT/VT-d page table), or
> > excluding VT-d like this patch originally does.
> 
> A fully emulated range means more than just removing the mapping
> though: No passed through device must access the memory range,
> since you can't emulate such accesses. While this is fine for existing
> uses (LAPIC, HPET, ...), I don't think it is for things like a frame
> buffer. If otoh talk is exclusively about GPU page tables, then this

I don't understand here. why would a pass-through device wants to access
frame buffer (I think you meant 'virtual' frame buffer)? Note XenGT itself
doesn't use VT-d, and we didn't see such usage of having another device
DMA to GPU resource. Then it's just same as any existing emulated MMIO
ranges e.g. on a virtual NIC, where no pass-through device would access 
them.

> ought to be fine, but would - to me - not fit with other aspects
> discussed in the context of these patches (like the supposedly large
> number of pages needing to be tracked for a guest).
> 

So what's your suggestion here?

Thanks
Kevin

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v1 0/2] Extend ioreq-server to support page write protection
  2014-08-07 16:28                               ` Tian, Kevin
@ 2014-08-08  6:32                                 ` Jan Beulich
  2014-08-08 16:02                                   ` Tian, Kevin
  2014-08-08 16:04                                   ` Tian, Kevin
  0 siblings, 2 replies; 46+ messages in thread
From: Jan Beulich @ 2014-08-08  6:32 UTC (permalink / raw)
  To: Kevin Tian
  Cc: keir, ian.campbell, stefano.stabellini, tim, ian.jackson,
	Donald D Dugger, xen-devel, Paul.Durrant, Zhiyuan Lv,
	Yang Z Zhang, Wei Ye

>>> On 07.08.14 at 18:28, <kevin.tian@intel.com> wrote:
>>  From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: Wednesday, August 06, 2014 11:45 PM
>> >>> On 06.08.14 at 18:08, <kevin.tian@intel.com> wrote:
>> > It would be a severe security hole to write-protect only the CPU path while
>> > leaving DMA writes valid. When we say a page is write-protected, it must be
>> > consistently enforced in both CPU/DMA paths. Now there is no other way to
>> > achieve it lacking of VT-d page fault, except treating it fully as a emulated
>> > MMIO range (means removing mapping in both EPT/VT-d page table), or
>> > excluding VT-d like this patch originally does.
>> 
>> A fully emulated range means more than just removing the mapping
>> though: No passed through device must access the memory range,
>> since you can't emulate such accesses. While this is fine for existing
>> uses (LAPIC, HPET, ...), I don't think it is for things like a frame
>> buffer. If otoh talk is exclusively about GPU page tables, then this
> 
> I don't understand here. why would a pass-through device wants to access
> frame buffer (I think you meant 'virtual' frame buffer)? Note XenGT itself
> doesn't use VT-d, and we didn't see such usage of having another device
> DMA to GPU resource. Then it's just same as any existing emulated MMIO
> ranges e.g. on a virtual NIC, where no pass-through device would access 
> them.

This is the same as the discussion about VT-d code not supporting
large pages right now: An OS _might_ do clever things (merely
dumping the frame buffer verbatim to disk or network would be one
of the more trivial examples). Current OSes (or more precisely
their drivers) may not do this, but I can only repeat my fundamental
position here: We should not make assumptions about HVM guest
behavior except in cases where we want to optimize certain things.

>> ought to be fine, but would - to me - not fit with other aspects
>> discussed in the context of these patches (like the supposedly large
>> number of pages needing to be tracked for a guest).
> 
> So what's your suggestion here?

I can't answer that without you clarifying whether talk earlier on
was about the frame buffer or page tables.

Jan

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v1 0/2] Extend ioreq-server to support page write protection
  2014-08-08  6:32                                 ` Jan Beulich
@ 2014-08-08 16:02                                   ` Tian, Kevin
  2014-08-08 16:04                                   ` Tian, Kevin
  1 sibling, 0 replies; 46+ messages in thread
From: Tian, Kevin @ 2014-08-08 16:02 UTC (permalink / raw)
  To: Jan Beulich
  Cc: keir, ian.campbell, stefano.stabellini, tim, ian.jackson, Dugger,
	Donald D, xen-devel, Paul.Durrant, Lv, Zhiyuan, Zhang, Yang Z,
	Ye, Wei

> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Thursday, August 07, 2014 11:33 PM
> 
> >>> On 07.08.14 at 18:28, <kevin.tian@intel.com> wrote:
> >>  From: Jan Beulich [mailto:JBeulich@suse.com]
> >> Sent: Wednesday, August 06, 2014 11:45 PM
> >> >>> On 06.08.14 at 18:08, <kevin.tian@intel.com> wrote:
> >> > It would be a severe security hole to write-protect only the CPU path
> while
> >> > leaving DMA writes valid. When we say a page is write-protected, it must
> be
> >> > consistently enforced in both CPU/DMA paths. Now there is no other way
> to
> >> > achieve it lacking of VT-d page fault, except treating it fully as a
> emulated
> >> > MMIO range (means removing mapping in both EPT/VT-d page table), or
> >> > excluding VT-d like this patch originally does.
> >>
> >> A fully emulated range means more than just removing the mapping
> >> though: No passed through device must access the memory range,
> >> since you can't emulate such accesses. While this is fine for existing
> >> uses (LAPIC, HPET, ...), I don't think it is for things like a frame
> >> buffer. If otoh talk is exclusively about GPU page tables, then this
> >
> > I don't understand here. why would a pass-through device wants to access
> > frame buffer (I think you meant 'virtual' frame buffer)? Note XenGT itself
> > doesn't use VT-d, and we didn't see such usage of having another device
> > DMA to GPU resource. Then it's just same as any existing emulated MMIO
> > ranges e.g. on a virtual NIC, where no pass-through device would access
> > them.
> 
> This is the same as the discussion about VT-d code not supporting
> large pages right now: An OS _might_ do clever things (merely
> dumping the frame buffer verbatim to disk or network would be one
> of the more trivial examples). Current OSes (or more precisely
> their drivers) may not do this, but I can only repeat my fundamental
> position here: We should not make assumptions about HVM guest
> behavior except in cases where we want to optimize certain things.

with a complete HW support yes we should make assumption for guest
behavior (e.g. with VT-x we can support any type of OS), but when HW
support in VT-d is not fully ready, then making some assumption makes
sense especially when no real usage is observed in this case. 

> 
> >> ought to be fine, but would - to me - not fit with other aspects
> >> discussed in the context of these patches (like the supposedly large
> >> number of pages needing to be tracked for a guest).
> >
> > So what's your suggestion here?
> 
> I can't answer that without you clarifying whether talk earlier on
> was about the frame buffer or page tables.
> 

In our case it's only GPU page table. XenGT assigns frame buffer to the
VM so there's no concern in that path.

Thanks
Kevin

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v1 0/2] Extend ioreq-server to support page write protection
  2014-08-08  6:32                                 ` Jan Beulich
  2014-08-08 16:02                                   ` Tian, Kevin
@ 2014-08-08 16:04                                   ` Tian, Kevin
  2014-08-12 23:15                                     ` Ye, Wei
  1 sibling, 1 reply; 46+ messages in thread
From: Tian, Kevin @ 2014-08-08 16:04 UTC (permalink / raw)
  To: Jan Beulich
  Cc: keir, ian.campbell, stefano.stabellini, tim, ian.jackson, Dugger,
	Donald D, xen-devel, Paul.Durrant, Lv, Zhiyuan, Zhang, Yang Z,
	Ye, Wei

> From: Tian, Kevin
> Sent: Friday, August 08, 2014 9:02 AM
> 
> > From: Jan Beulich [mailto:JBeulich@suse.com]
> > Sent: Thursday, August 07, 2014 11:33 PM
> >
> > >>> On 07.08.14 at 18:28, <kevin.tian@intel.com> wrote:
> > >>  From: Jan Beulich [mailto:JBeulich@suse.com]
> > >> Sent: Wednesday, August 06, 2014 11:45 PM
> > >> >>> On 06.08.14 at 18:08, <kevin.tian@intel.com> wrote:
> > >> > It would be a severe security hole to write-protect only the CPU path
> > while
> > >> > leaving DMA writes valid. When we say a page is write-protected, it
> must
> > be
> > >> > consistently enforced in both CPU/DMA paths. Now there is no other
> way
> > to
> > >> > achieve it lacking of VT-d page fault, except treating it fully as a
> > emulated
> > >> > MMIO range (means removing mapping in both EPT/VT-d page table),
> or
> > >> > excluding VT-d like this patch originally does.
> > >>
> > >> A fully emulated range means more than just removing the mapping
> > >> though: No passed through device must access the memory range,
> > >> since you can't emulate such accesses. While this is fine for existing
> > >> uses (LAPIC, HPET, ...), I don't think it is for things like a frame
> > >> buffer. If otoh talk is exclusively about GPU page tables, then this
> > >
> > > I don't understand here. why would a pass-through device wants to access
> > > frame buffer (I think you meant 'virtual' frame buffer)? Note XenGT itself
> > > doesn't use VT-d, and we didn't see such usage of having another device
> > > DMA to GPU resource. Then it's just same as any existing emulated MMIO
> > > ranges e.g. on a virtual NIC, where no pass-through device would access
> > > them.
> >
> > This is the same as the discussion about VT-d code not supporting
> > large pages right now: An OS _might_ do clever things (merely
> > dumping the frame buffer verbatim to disk or network would be one
> > of the more trivial examples). Current OSes (or more precisely
> > their drivers) may not do this, but I can only repeat my fundamental
> > position here: We should not make assumptions about HVM guest
> > behavior except in cases where we want to optimize certain things.
> 
> with a complete HW support yes we should make assumption for guest

should -> shouldn't

> behavior (e.g. with VT-x we can support any type of OS), but when HW
> support in VT-d is not fully ready, then making some assumption makes
> sense especially when no real usage is observed in this case.
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v1 0/2] Extend ioreq-server to support page write protection
  2014-08-08 16:04                                   ` Tian, Kevin
@ 2014-08-12 23:15                                     ` Ye, Wei
  2014-08-13  8:38                                       ` Tim Deegan
  0 siblings, 1 reply; 46+ messages in thread
From: Ye, Wei @ 2014-08-12 23:15 UTC (permalink / raw)
  To: xen-devel, Jan Beulich
  Cc: Tian, Kevin, keir, ian.campbell, stefano.stabellini, tim,
	ian.jackson, Dugger, Donald D, Paul.Durrant, Lv, Zhiyuan, Zhang,
	Yang Z

Some summary of the discussion:

Without vt-d page fault support, a general write-protection is not possible. It's impossible to protect the write of both CPU and DMA path.
But note that XenGT itself doesn't use VT-d, and there is no another usage of having another device DMA to GPU resource. Or even other
device use the protect page for its DMA buffer, like a malicious guest, there is no security hole since it can only access it's our memory range.
So, just keep protecting the CPU write path may be feasible for this moment.

Another way is treat the page as MMIO ranges(means removing mapping in both EPD/VT-D page table). In this way, both read/write access
should be forwarded and emulated. For XenGT usage case, we observed that for the Linux guest, there is no read access to the ppgtt page
tables. And for the windows guest, the read access is just about 20% of write. The read overhead is relatively small, so treat the pages as MMIO
range may be possible.

Any comments?

Regards
Wei

> -----Original Message-----
> From: Tian, Kevin
> Sent: Saturday, August 9, 2014 12:04 AM
> To: Jan Beulich
> Cc: ian.campbell@citrix.com; Paul.Durrant@citrix.com;
> ian.jackson@eu.citrix.com; stefano.stabellini@eu.citrix.com; Dugger, Donald
> D; Ye, Wei; Zhang, Yang Z; Lv, Zhiyuan; xen-devel@lists.xen.org; Konrad
> Rzeszutek Wilk; keir@xen.org; tim@xen.org
> Subject: RE: [Xen-devel] [PATCH v1 0/2] Extend ioreq-server to support page
> write protection
> 
> > From: Tian, Kevin
> > Sent: Friday, August 08, 2014 9:02 AM
> >
> > > From: Jan Beulich [mailto:JBeulich@suse.com]
> > > Sent: Thursday, August 07, 2014 11:33 PM
> > >
> > > >>> On 07.08.14 at 18:28, <kevin.tian@intel.com> wrote:
> > > >>  From: Jan Beulich [mailto:JBeulich@suse.com]
> > > >> Sent: Wednesday, August 06, 2014 11:45 PM
> > > >> >>> On 06.08.14 at 18:08, <kevin.tian@intel.com> wrote:
> > > >> > It would be a severe security hole to write-protect only the
> > > >> > CPU path
> > > while
> > > >> > leaving DMA writes valid. When we say a page is
> > > >> > write-protected, it
> > must
> > > be
> > > >> > consistently enforced in both CPU/DMA paths. Now there is no
> > > >> > other
> > way
> > > to
> > > >> > achieve it lacking of VT-d page fault, except treating it fully
> > > >> > as a
> > > emulated
> > > >> > MMIO range (means removing mapping in both EPT/VT-d page
> > > >> > table),
> > or
> > > >> > excluding VT-d like this patch originally does.
> > > >>
> > > >> A fully emulated range means more than just removing the mapping
> > > >> though: No passed through device must access the memory range,
> > > >> since you can't emulate such accesses. While this is fine for
> > > >> existing uses (LAPIC, HPET, ...), I don't think it is for things
> > > >> like a frame buffer. If otoh talk is exclusively about GPU page
> > > >> tables, then this
> > > >
> > > > I don't understand here. why would a pass-through device wants to
> > > > access frame buffer (I think you meant 'virtual' frame buffer)?
> > > > Note XenGT itself doesn't use VT-d, and we didn't see such usage
> > > > of having another device DMA to GPU resource. Then it's just same
> > > > as any existing emulated MMIO ranges e.g. on a virtual NIC, where
> > > > no pass-through device would access them.
> > >
> > > This is the same as the discussion about VT-d code not supporting
> > > large pages right now: An OS _might_ do clever things (merely
> > > dumping the frame buffer verbatim to disk or network would be one of
> > > the more trivial examples). Current OSes (or more precisely their
> > > drivers) may not do this, but I can only repeat my fundamental
> > > position here: We should not make assumptions about HVM guest
> > > behavior except in cases where we want to optimize certain things.
> >
> > with a complete HW support yes we should make assumption for guest
> 
> should -> shouldn't
> 
> > behavior (e.g. with VT-x we can support any type of OS), but when HW
> > support in VT-d is not fully ready, then making some assumption makes
> > sense especially when no real usage is observed in this case.
> >

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v1 0/2] Extend ioreq-server to support page write protection
  2014-08-12 23:15                                     ` Ye, Wei
@ 2014-08-13  8:38                                       ` Tim Deegan
  2014-08-13 16:14                                         ` Tian, Kevin
  0 siblings, 1 reply; 46+ messages in thread
From: Tim Deegan @ 2014-08-13  8:38 UTC (permalink / raw)
  To: Ye, Wei
  Cc: Tian, Kevin, keir, ian.campbell, stefano.stabellini, ian.jackson,
	Dugger, Donald D, xen-devel, Paul.Durrant, Lv, Zhiyuan,
	Jan Beulich, Zhang, Yang Z

At 23:15 +0000 on 12 Aug (1407881757), Ye, Wei wrote:
> Some summary of the discussion:
> 
> Without vt-d page fault support, a general write-protection is not
> possible. It's impossible to protect the write of both CPU and DMA
> path.  But note that XenGT itself doesn't use VT-d, and there is no
> another usage of having another device DMA to GPU resource.

None that you're aware of.  But there might be some out-of-tree users?

> Or even
> other device use the protect page for its DMA buffer, like a
> malicious guest, there is no security hole since it can only access
> it's our memory range.

Not true at all!  If the guest has a read-only mapping it must be
read-only for a reason.  For example, it might have a read-only grant
mapping of another VM's memory.  Or an out-of-VM security tool might
be enforcing W^X permissions.

>  So, just keep protecting the CPU write path may be feasible for this moment.

I don't think so. 

> Another way is treat the page as MMIO ranges(means removing mapping
> in both EPD/VT-D page table). In this way, both read/write access
> should be forwarded and emulated. For XenGT usage case, we observed
> that for the Linux guest, there is no read access to the ppgtt page
> tables. And for the windows guest, the read access is just about 20%
> of write. The read overhead is relatively small, so treat the pages
> as MMIO range may be possible.

That might work for XenGT but it doesn't address the general problem
at all.  AFAICT the only _safe_ way to make a page read-only to a guest
that can issue DMA is to remove it from the IOMMU tables - and that
means having separate EPT and vtD tables, until hardware support
for restartable IOMMU faults comes along.

Cheers,

Tim.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v1 0/2] Extend ioreq-server to support page write protection
  2014-08-13  8:38                                       ` Tim Deegan
@ 2014-08-13 16:14                                         ` Tian, Kevin
  2014-08-14  8:08                                           ` Tim Deegan
  0 siblings, 1 reply; 46+ messages in thread
From: Tian, Kevin @ 2014-08-13 16:14 UTC (permalink / raw)
  To: Tim Deegan, Ye, Wei
  Cc: keir, ian.campbell, stefano.stabellini, ian.jackson, Dugger,
	Donald D, xen-devel, Paul.Durrant, Lv, Zhiyuan, Jan Beulich,
	Zhang, Yang Z

> From: Tim Deegan [mailto:tim@xen.org]
> Sent: Wednesday, August 13, 2014 1:39 AM
> 
> At 23:15 +0000 on 12 Aug (1407881757), Ye, Wei wrote:
> > Some summary of the discussion:
> >
> > Without vt-d page fault support, a general write-protection is not
> > possible. It's impossible to protect the write of both CPU and DMA
> > path.  But note that XenGT itself doesn't use VT-d, and there is no
> > another usage of having another device DMA to GPU resource.
> 
> None that you're aware of.  But there might be some out-of-tree users?

If there is, then it has to be in the unsupported feature list. I think it's fair 
enough since we all know w/o complete VT-d page fault support there's 
no solution for such usage. Pretty much like old days big real mode tricks
before hardware support is ready.

> 
> > Or even
> > other device use the protect page for its DMA buffer, like a
> > malicious guest, there is no security hole since it can only access
> > it's our memory range.
> 
> Not true at all!  If the guest has a read-only mapping it must be
> read-only for a reason.  For example, it might have a read-only grant
> mapping of another VM's memory.  Or an out-of-VM security tool might
> be enforcing W^X permissions.

sorry original statement is inaccurate. there is no cross-VM security hole
since EPT/VT-d are fully under VMM's control, but yes out-of-VM security
tool might be broken... 

> 
> >  So, just keep protecting the CPU write path may be feasible for this
> moment.
> 
> I don't think so.

Could we introduce a cpu-write-protection attribute only for functional
usages which don't care about DMA path like in XenGT? Then out-of-VM
security tool relying on complete write-protection in both CPU/DMA
paths still need to wait full VT-d page fault. I know it's not clean, but
want to hear your comments anything we can do here in current stage. :-)

> 
> > Another way is treat the page as MMIO ranges(means removing mapping
> > in both EPD/VT-D page table). In this way, both read/write access
> > should be forwarded and emulated. For XenGT usage case, we observed
> > that for the Linux guest, there is no read access to the ppgtt page
> > tables. And for the windows guest, the read access is just about 20%
> > of write. The read overhead is relatively small, so treat the pages
> > as MMIO range may be possible.
> 
> That might work for XenGT but it doesn't address the general problem
> at all.  AFAICT the only _safe_ way to make a page read-only to a guest
> that can issue DMA is to remove it from the IOMMU tables - and that
> means having separate EPT and vtD tables, until hardware support
> for restartable IOMMU faults comes along.

Aren't we talking about a similar option? ours is more conservative, not only
removing from IOMMU table, but also removing from EPT so both r/w are
trapped which serves the read-only requirements with performance penalty.
along this way we didn't aim to provide a general write-protection mechanism,
but just to extend ioreq server to support scattered mmio ranges. If adding
a new p2m type is deemed dirty, we may go to this option since it can meet
XenGT's requirement, and doesn't force other usages to go same route.

Actually separate or shared EPT/VT-d tables doesn't mean anything here. 
Removing the page from IOMMU tables still breaks guest's attempt. You 
protected it from write, but didn't emulate the DMA so it's still broken.

Thanks
Kevin

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v1 0/2] Extend ioreq-server to support page write protection
  2014-08-13 16:14                                         ` Tian, Kevin
@ 2014-08-14  8:08                                           ` Tim Deegan
  2014-08-14 17:49                                             ` Tian, Kevin
  0 siblings, 1 reply; 46+ messages in thread
From: Tim Deegan @ 2014-08-14  8:08 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: keir, ian.campbell, stefano.stabellini, ian.jackson, Dugger,
	Donald D, xen-devel, Paul.Durrant, Lv, Zhiyuan, Jan Beulich,
	Zhang, Yang Z, Ye, Wei

At 16:14 +0000 on 13 Aug (1407942882), Tian, Kevin wrote:
> > From: Tim Deegan [mailto:tim@xen.org]
> > Sent: Wednesday, August 13, 2014 1:39 AM
> > 
> > At 23:15 +0000 on 12 Aug (1407881757), Ye, Wei wrote:
> > > Some summary of the discussion:
> > >
> > > Without vt-d page fault support, a general write-protection is not
> > > possible. It's impossible to protect the write of both CPU and DMA
> > > path.  But note that XenGT itself doesn't use VT-d, and there is no
> > > another usage of having another device DMA to GPU resource.
> > 
> > None that you're aware of.  But there might be some out-of-tree users?
> 
> If there is, then it has to be in the unsupported feature list. I think it's fair 
> enough since we all know w/o complete VT-d page fault support there's 
> no solution for such usage. Pretty much like old days big real mode tricks
> before hardware support is ready.

I think there's confusion here about what we're taling about.  Yes,
until we have h/w support for restarting faulting DMA the IOMMU
feature set will be incomplete.  The important thing is that it should
fail safe -- i.e. if a guest has a read-only mapping of a page it is
better to disallow DMA altogether than to allow read-write DMA.

> > > Or even
> > > other device use the protect page for its DMA buffer, like a
> > > malicious guest, there is no security hole since it can only access
> > > it's our memory range.
> > 
> > Not true at all!  If the guest has a read-only mapping it must be
> > read-only for a reason.  For example, it might have a read-only grant
> > mapping of another VM's memory.  Or an out-of-VM security tool might
> > be enforcing W^X permissions.
> 
> sorry original statement is inaccurate. there is no cross-VM security hole
> since EPT/VT-d are fully under VMM's control, but yes out-of-VM security
> tool might be broken... 

As I just said: "it might have a read-only grant mapping of another
VM's memory".  We can't let it DMA into that mapping.

> > >  So, just keep protecting the CPU write path may be feasible for this
> > moment.
> > 
> > I don't think so.
> 
> Could we introduce a cpu-write-protection attribute only for functional
> usages which don't care about DMA path like in XenGT?

Possibly.  Sorry for going over things that have probably already been
covered but AIUI you are saying that XenGT needs to allow full DMA access
to this memory area but only read-only CPU access.  If that's what's
needed then I think a new type with these semantics is the best way of
doing it (and requiring no shared IOMMU/HAP tables).  But:
 - it's pointless if the guest can make the device issue DMA for it,
   (with a blitting engine or similar) since it can just use that DMA
   to make the writes it wants; and
 - for the same reason, you will need to give the GPU its own set
   of IOMMU tables, separate from the ones used for other devices that
   the guest controls.  Otherwise the guest can use its NIC to DMA
   into the read-only area.

> > > Another way is treat the page as MMIO ranges(means removing mapping
> > > in both EPD/VT-D page table). In this way, both read/write access
> > > should be forwarded and emulated. For XenGT usage case, we observed
> > > that for the Linux guest, there is no read access to the ppgtt page
> > > tables. And for the windows guest, the read access is just about 20%
> > > of write. The read overhead is relatively small, so treat the pages
> > > as MMIO range may be possible.
> > 
> > That might work for XenGT but it doesn't address the general problem
> > at all.  AFAICT the only _safe_ way to make a page read-only to a guest
> > that can issue DMA is to remove it from the IOMMU tables - and that
> > means having separate EPT and vtD tables, until hardware support
> > for restartable IOMMU faults comes along.
> 
> Aren't we talking about a similar option? ours is more conservative, not only
> removing from IOMMU table, but also removing from EPT so both r/w are
> trapped which serves the read-only requirements with performance penalty.

Yes they are similar.  I would say mine is more conservative in a
different way - not using the emulator unneccesarily. :)

Cheers,

Tim.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v1 0/2] Extend ioreq-server to support page write protection
  2014-08-14  8:08                                           ` Tim Deegan
@ 2014-08-14 17:49                                             ` Tian, Kevin
  2014-08-14 20:25                                               ` Tim Deegan
  0 siblings, 1 reply; 46+ messages in thread
From: Tian, Kevin @ 2014-08-14 17:49 UTC (permalink / raw)
  To: Tim Deegan
  Cc: keir, ian.campbell, stefano.stabellini, ian.jackson, Dugger,
	Donald D, xen-devel, Paul.Durrant, Lv, Zhiyuan, Jan Beulich,
	Zhang, Yang Z, Ye, Wei

> From: Tim Deegan [mailto:tim@xen.org]
> Sent: Thursday, August 14, 2014 1:08 AM
> 
> At 16:14 +0000 on 13 Aug (1407942882), Tian, Kevin wrote:
> > > From: Tim Deegan [mailto:tim@xen.org]
> > > Sent: Wednesday, August 13, 2014 1:39 AM
> > >
> > > At 23:15 +0000 on 12 Aug (1407881757), Ye, Wei wrote:
> > > > Some summary of the discussion:
> > > >
> > > > Without vt-d page fault support, a general write-protection is not
> > > > possible. It's impossible to protect the write of both CPU and DMA
> > > > path.  But note that XenGT itself doesn't use VT-d, and there is no
> > > > another usage of having another device DMA to GPU resource.
> > >
> > > None that you're aware of.  But there might be some out-of-tree users?
> >
> > If there is, then it has to be in the unsupported feature list. I think it's fair
> > enough since we all know w/o complete VT-d page fault support there's
> > no solution for such usage. Pretty much like old days big real mode tricks
> > before hardware support is ready.
> 
> I think there's confusion here about what we're taling about.  Yes,
> until we have h/w support for restarting faulting DMA the IOMMU
> feature set will be incomplete.  The important thing is that it should
> fail safe -- i.e. if a guest has a read-only mapping of a page it is
> better to disallow DMA altogether than to allow read-write DMA.

Yes, there did be some confusions here. :-)

I think we are on the same page that, complete 'read-only' or
complete 'write-protection' is impossible w/o restarting faulting
DMA, because there is no way to enforce the access permission 
in the IOMMU side.

Then whatever 'read-only' or 'write-protection" discussion now
is a partial solution where only CPU path is protected, with the
assumption that no valid usage of using DMA to access the concerned
memory area. And then I agree disallowing DMA becomes a fail-safe 
means to capture violating usages.

> 
> > > > Or even
> > > > other device use the protect page for its DMA buffer, like a
> > > > malicious guest, there is no security hole since it can only access
> > > > it's our memory range.
> > >
> > > Not true at all!  If the guest has a read-only mapping it must be
> > > read-only for a reason.  For example, it might have a read-only grant
> > > mapping of another VM's memory.  Or an out-of-VM security tool might
> > > be enforcing W^X permissions.
> >
> > sorry original statement is inaccurate. there is no cross-VM security hole
> > since EPT/VT-d are fully under VMM's control, but yes out-of-VM security
> > tool might be broken...
> 
> As I just said: "it might have a read-only grant mapping of another
> VM's memory".  We can't let it DMA into that mapping.
> 
> > > >  So, just keep protecting the CPU write path may be feasible for this
> > > moment.
> > >
> > > I don't think so.
> >
> > Could we introduce a cpu-write-protection attribute only for functional
> > usages which don't care about DMA path like in XenGT?
> 
> Possibly.  Sorry for going over things that have probably already been
> covered but AIUI you are saying that XenGT needs to allow full DMA access
> to this memory area but only read-only CPU access.  If that's what's
> needed then I think a new type with these semantics is the best way of
> doing it (and requiring no shared IOMMU/HAP tables).  But:

It's about GPU page table write-protection, so XenGT needs to trap
writes in both CPU/DMA paths:

* CPU writes are captured by read-only permission in EPT
* GPU writes are captured by scanning GPU commands before they're
submitted for execution, as part of the 'mediation' concept in XenGT
mediated pass-through architecture.

Note XenGT doesn't use IOMMU because w/o SR-IOV a single device
entry can't be used to server multiple VMs.

So XenGT can achieve true 'write-protection' w/o IOMMU restarting
fault support. However it's very GPU specific. As a general implementation, 
I think a new type with below semantics should be enough:

- only CPU access to the protected area, so it's removed from IOMMU
page table
- require IOMMU/HAP seperate

How would you call such a new type then? just p2m_write_protection
or p2m_cpu_write_protection? (still need to differentiate write-protection
from read-only, since the latter implies no writes)

>  - it's pointless if the guest can make the device issue DMA for it,
>    (with a blitting engine or similar) since it can just use that DMA
>    to make the writes it wants; and

Based on above explanation it's not a problem

>  - for the same reason, you will need to give the GPU its own set
>    of IOMMU tables, separate from the ones used for other devices that
>    the guest controls.  Otherwise the guest can use its NIC to DMA
>    into the read-only area.

XenGT doesn't use IOMMU. So it's enough as long as we zap the range
from all IOMMU page tables

in the future when we have SR-IOV GPU support, there's no need to
shadow GPU page table then, and thus no such a requirement of
write-protection.

> 
> > > > Another way is treat the page as MMIO ranges(means removing
> mapping
> > > > in both EPD/VT-D page table). In this way, both read/write access
> > > > should be forwarded and emulated. For XenGT usage case, we observed
> > > > that for the Linux guest, there is no read access to the ppgtt page
> > > > tables. And for the windows guest, the read access is just about 20%
> > > > of write. The read overhead is relatively small, so treat the pages
> > > > as MMIO range may be possible.
> > >
> > > That might work for XenGT but it doesn't address the general problem
> > > at all.  AFAICT the only _safe_ way to make a page read-only to a guest
> > > that can issue DMA is to remove it from the IOMMU tables - and that
> > > means having separate EPT and vtD tables, until hardware support
> > > for restartable IOMMU faults comes along.
> >
> > Aren't we talking about a similar option? ours is more conservative, not only
> > removing from IOMMU table, but also removing from EPT so both r/w are
> > trapped which serves the read-only requirements with performance penalty.
> 
> Yes they are similar.  I would say mine is more conservative in a
> different way - not using the emulator unneccesarily. :)
> 

yeah from different view. Our original thought is if new p2m type is tricky, 
then treating it as mmio could be an easier thing to extend. Now since you
also agree adding a new type is reasonable, then we can continue this
direction (actually original patch is along this road, but didn't zap IOMMU
pages. :-))

Thanks
Kevin

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v1 0/2] Extend ioreq-server to support page write protection
  2014-08-14 17:49                                             ` Tian, Kevin
@ 2014-08-14 20:25                                               ` Tim Deegan
  2014-08-14 22:53                                                 ` Tian, Kevin
  0 siblings, 1 reply; 46+ messages in thread
From: Tim Deegan @ 2014-08-14 20:25 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: keir, ian.campbell, stefano.stabellini, ian.jackson, Dugger,
	Donald D, xen-devel, Paul.Durrant, Lv, Zhiyuan, Jan Beulich,
	Zhang, Yang Z, Ye, Wei

At 17:49 +0000 on 14 Aug (1408034941), Tian, Kevin wrote:
> > From: Tim Deegan [mailto:tim@xen.org]
> > > Could we introduce a cpu-write-protection attribute only for functional
> > > usages which don't care about DMA path like in XenGT?
> > 
> > Possibly.  Sorry for going over things that have probably already been
> > covered but AIUI you are saying that XenGT needs to allow full DMA access
> > to this memory area but only read-only CPU access.  If that's what's
> > needed then I think a new type with these semantics is the best way of
> > doing it (and requiring no shared IOMMU/HAP tables).  But:
> 
> It's about GPU page table write-protection, so XenGT needs to trap
> writes in both CPU/DMA paths:
> 
> * CPU writes are captured by read-only permission in EPT
> * GPU writes are captured by scanning GPU commands before they're
> submitted for execution, as part of the 'mediation' concept in XenGT
> mediated pass-through architecture.
> 
> Note XenGT doesn't use IOMMU because w/o SR-IOV a single device
> entry can't be used to server multiple VMs.

Oh, I see.  So you don't actually want the GPUPTs mapped for DMA at
all.  That's much easier.

> So XenGT can achieve true 'write-protection' w/o IOMMU restarting
> fault support. However it's very GPU specific. As a general implementation, 
> I think a new type with below semantics should be enough:
> 
> - only CPU access to the protected area, so it's removed from IOMMU
> page table
> - require IOMMU/HAP seperate

I think we might want to make those tables separate in general.  IIRC we
were discussing that in another thread already.  But that aside, I think
a plain read-only mapping will DTRT for you -- it allows read-only DMA
from the GPUPTs, but that's no worse that the read-only CPU mapping.

After that, the only odd behaviour is that you're emulating writes to
read-only memory.  And that's where the new type is helpful: to say
that we should be emulating writes rather than dropping them.

> How would you call such a new type then? just p2m_write_protection
> or p2m_cpu_write_protection? (still need to differentiate write-protection
> from read-only, since the latter implies no writes)

p2m_mmio_write_dm?

Now that (I hope) I understand better what you need, I can see how
just using plain mmio_dm and emulating reads would be a plausible
solution too.  So I think I'd be happy with either that or a new type.

Cheers,

Tim.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v1 0/2] Extend ioreq-server to support page write protection
  2014-08-14 20:25                                               ` Tim Deegan
@ 2014-08-14 22:53                                                 ` Tian, Kevin
  2014-08-14 23:12                                                   ` Jan Beulich
  0 siblings, 1 reply; 46+ messages in thread
From: Tian, Kevin @ 2014-08-14 22:53 UTC (permalink / raw)
  To: Tim Deegan
  Cc: keir, ian.campbell, stefano.stabellini, ian.jackson, Dugger,
	Donald D, xen-devel, Paul.Durrant, Lv, Zhiyuan, Jan Beulich,
	Zhang, Yang Z, Ye, Wei

> From: Tim Deegan [mailto:tim@xen.org]
> Sent: Thursday, August 14, 2014 1:25 PM
> 
> At 17:49 +0000 on 14 Aug (1408034941), Tian, Kevin wrote:
> > > From: Tim Deegan [mailto:tim@xen.org]
> > > > Could we introduce a cpu-write-protection attribute only for functional
> > > > usages which don't care about DMA path like in XenGT?
> > >
> > > Possibly.  Sorry for going over things that have probably already been
> > > covered but AIUI you are saying that XenGT needs to allow full DMA access
> > > to this memory area but only read-only CPU access.  If that's what's
> > > needed then I think a new type with these semantics is the best way of
> > > doing it (and requiring no shared IOMMU/HAP tables).  But:
> >
> > It's about GPU page table write-protection, so XenGT needs to trap
> > writes in both CPU/DMA paths:
> >
> > * CPU writes are captured by read-only permission in EPT
> > * GPU writes are captured by scanning GPU commands before they're
> > submitted for execution, as part of the 'mediation' concept in XenGT
> > mediated pass-through architecture.
> >
> > Note XenGT doesn't use IOMMU because w/o SR-IOV a single device
> > entry can't be used to server multiple VMs.
> 
> Oh, I see.  So you don't actually want the GPUPTs mapped for DMA at
> all.  That's much easier.
> 
> > So XenGT can achieve true 'write-protection' w/o IOMMU restarting
> > fault support. However it's very GPU specific. As a general implementation,
> > I think a new type with below semantics should be enough:
> >
> > - only CPU access to the protected area, so it's removed from IOMMU
> > page table
> > - require IOMMU/HAP seperate
> 
> I think we might want to make those tables separate in general.  IIRC we
> were discussing that in another thread already.  But that aside, I think
> a plain read-only mapping will DTRT for you -- it allows read-only DMA
> from the GPUPTs, but that's no worse that the read-only CPU mapping.

for 'plain read-only mapping' you mean existing p2m_ram_ro? 

btw just curious how we handle p2m_ram_ro today. Does it play any trick
to fail-safe in IOMMU side, e.g. zap the mapping or not? Otherwise when
it allows read-only DMA there's no way to prevent DMA writes either. :-)


> 
> After that, the only odd behaviour is that you're emulating writes to
> read-only memory.  And that's where the new type is helpful: to say
> that we should be emulating writes rather than dropping them.

so p2m_mmio_write_dm would go together with existing p2m_ram_ro
in most condition checks, with only difference on final policy of emulation 
vs. drop.

> 
> > How would you call such a new type then? just p2m_write_protection
> > or p2m_cpu_write_protection? (still need to differentiate write-protection
> > from read-only, since the latter implies no writes)
> 
> p2m_mmio_write_dm?
> 
> Now that (I hope) I understand better what you need, I can see how
> just using plain mmio_dm and emulating reads would be a plausible
> solution too.  So I think I'd be happy with either that or a new type.
> 

yes either way works. for now let's try the new p2m type since it
is more intuitive regarding to 'write-protection'. :-)

Thanks
Kevin

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v1 0/2] Extend ioreq-server to support page write protection
  2014-08-14 22:53                                                 ` Tian, Kevin
@ 2014-08-14 23:12                                                   ` Jan Beulich
  2014-08-14 23:33                                                     ` Tian, Kevin
  0 siblings, 1 reply; 46+ messages in thread
From: Jan Beulich @ 2014-08-14 23:12 UTC (permalink / raw)
  To: Kevin Tian
  Cc: keir, ian.campbell, stefano.stabellini, Tim Deegan, ian.jackson,
	Donald D Dugger, xen-devel, Paul.Durrant, Zhiyuan Lv,
	Yang Z Zhang, Wei Ye

>>> On 15.08.14 at 00:53, <kevin.tian@intel.com> wrote:
> btw just curious how we handle p2m_ram_ro today. Does it play any trick
> to fail-safe in IOMMU side, e.g. zap the mapping or not? Otherwise when
> it allows read-only DMA there's no way to prevent DMA writes either. :-)

Why would that be? A page marked r/o in the IOMMU page tables
ought to very well be protected against DMA writes. It's just that
these will cause faults, which - if happening too frequently - will
result in the device getting turned off, quite likely (implicitly) killing
the guest.

Jan

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v1 0/2] Extend ioreq-server to support page write protection
  2014-08-14 23:12                                                   ` Jan Beulich
@ 2014-08-14 23:33                                                     ` Tian, Kevin
  0 siblings, 0 replies; 46+ messages in thread
From: Tian, Kevin @ 2014-08-14 23:33 UTC (permalink / raw)
  To: Jan Beulich
  Cc: keir, ian.campbell, stefano.stabellini, Tim Deegan, ian.jackson,
	Dugger, Donald D, xen-devel, Paul.Durrant, Lv, Zhiyuan, Zhang,
	Yang Z, Ye, Wei

> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Thursday, August 14, 2014 4:12 PM
> 
> >>> On 15.08.14 at 00:53, <kevin.tian@intel.com> wrote:
> > btw just curious how we handle p2m_ram_ro today. Does it play any trick
> > to fail-safe in IOMMU side, e.g. zap the mapping or not? Otherwise when
> > it allows read-only DMA there's no way to prevent DMA writes either. :-)
> 
> Why would that be? A page marked r/o in the IOMMU page tables
> ought to very well be protected against DMA writes. It's just that
> these will cause faults, which - if happening too frequently - will
> result in the device getting turned off, quite likely (implicitly) killing
> the guest.
> 

yes, as protection it's enough, and it's only a problem for emulation. sorry
for messed msg after a long discussion thread. :/

Thanks
Kevin

^ permalink raw reply	[flat|nested] 46+ messages in thread

end of thread, other threads:[~2014-08-14 23:33 UTC | newest]

Thread overview: 46+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-07-28 17:55 [PATCH v1 0/2] Extend ioreq-server to support page write protection Wei Ye
2014-07-28  8:24 ` Jan Beulich
2014-08-04  5:05   ` Ye, Wei
2014-08-04  7:35     ` Jan Beulich
2014-08-04 21:34       ` Tian, Kevin
2014-08-05  6:35         ` Jan Beulich
2014-08-05  6:46           ` Ye, Wei
2014-08-05  7:51             ` Jan Beulich
2014-08-05  7:35         ` Zhang, Yang Z
2014-08-05  7:51           ` Jan Beulich
2014-08-05  8:20             ` Ye, Wei
2014-08-05 15:41             ` Tian, Kevin
2014-08-06  2:11               ` Zhang, Yang Z
2014-08-06  2:33                 ` Tian, Kevin
2014-08-06  2:40                   ` Zhang, Yang Z
2014-08-06  2:49                     ` Tian, Kevin
2014-08-06  2:50                     ` Tian, Kevin
2014-08-06  3:04                       ` Zhang, Yang Z
2014-08-06 15:00                         ` Konrad Rzeszutek Wilk
2014-08-06 16:08                           ` Tian, Kevin
2014-08-07  6:45                             ` Jan Beulich
2014-08-07 16:28                               ` Tian, Kevin
2014-08-08  6:32                                 ` Jan Beulich
2014-08-08 16:02                                   ` Tian, Kevin
2014-08-08 16:04                                   ` Tian, Kevin
2014-08-12 23:15                                     ` Ye, Wei
2014-08-13  8:38                                       ` Tim Deegan
2014-08-13 16:14                                         ` Tian, Kevin
2014-08-14  8:08                                           ` Tim Deegan
2014-08-14 17:49                                             ` Tian, Kevin
2014-08-14 20:25                                               ` Tim Deegan
2014-08-14 22:53                                                 ` Tian, Kevin
2014-08-14 23:12                                                   ` Jan Beulich
2014-08-14 23:33                                                     ` Tian, Kevin
2014-08-06 17:38                           ` Tian, Kevin
2014-07-28 17:55 ` [PATCH v1 1/2] x86: add p2m_ram_wp Wei Ye
2014-07-28  8:31   ` Jan Beulich
2014-08-04  5:10     ` Ye, Wei
2014-08-04  7:37       ` Jan Beulich
2014-08-05  7:09         ` Ye, Wei
2014-07-28 17:55 ` [PATCH v1 2/2] ioreq-server: Support scatter page forwarding Wei Ye
2014-07-28  8:57   ` Jan Beulich
2014-08-04  5:41     ` Ye, Wei
2014-08-04  7:47       ` Jan Beulich
2014-08-04 21:39         ` Tian, Kevin
2014-08-05  6:38           ` Jan Beulich

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.