* [PATCH 0/6] Various XSA followups
@ 2017-06-21 15:12 Andrew Cooper
  2017-06-21 15:12 ` [PATCH 1/6] x86/hvm: Fixes to hvmemul_insn_fetch() Andrew Cooper
                   ` (5 more replies)
  0 siblings, 6 replies; 28+ messages in thread
From: Andrew Cooper @ 2017-06-21 15:12 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper

XSA-219 was discovered while trying to implement the bugfix in patch 6.

Andrew Cooper (6):
  x86/hvm: Fixes to hvmemul_insn_fetch()
  x86/shadow: Fixes to hvm_emulate_insn_fetch()
  x86/shadow: Use ERR_PTR infrastructure for sh_emulate_map_dest()
  [RFC] x86/hvm: Rename enum hvm_copy_result to hvm_translation_result
  x86/hvm: Break out __hvm_copy()'s translation logic
  x86/hvm: Implement hvmemul_write() using real mappings

 xen/arch/x86/hvm/dom0_build.c     |   2 +-
 xen/arch/x86/hvm/emulate.c        | 224 ++++++++++++++++++++++++++++++++------
 xen/arch/x86/hvm/hvm.c            | 181 +++++++++++++++++-------------
 xen/arch/x86/hvm/intercept.c      |  20 ++--
 xen/arch/x86/hvm/svm/nestedsvm.c  |   5 +-
 xen/arch/x86/hvm/svm/svm.c        |   2 +-
 xen/arch/x86/hvm/viridian.c       |   2 +-
 xen/arch/x86/hvm/vmsi.c           |   2 +-
 xen/arch/x86/hvm/vmx/realmode.c   |   2 +-
 xen/arch/x86/hvm/vmx/vvmx.c       |  14 +--
 xen/arch/x86/mm/shadow/common.c   |  22 ++--
 xen/arch/x86/mm/shadow/multi.c    |   8 +-
 xen/arch/x86/mm/shadow/private.h  |   7 +-
 xen/common/libelf/libelf-loader.c |   4 +-
 xen/include/asm-x86/hvm/emulate.h |   7 ++
 xen/include/asm-x86/hvm/support.h |  52 +++++----
 16 files changed, 379 insertions(+), 175 deletions(-)

-- 
2.1.4



* [PATCH 1/6] x86/hvm: Fixes to hvmemul_insn_fetch()
  2017-06-21 15:12 [PATCH 0/6] Various XSA followups Andrew Cooper
@ 2017-06-21 15:12 ` Andrew Cooper
  2017-06-21 16:04   ` Paul Durrant
  2017-06-22  8:05   ` Jan Beulich
  2017-06-21 15:12 ` [PATCH 2/6] x86/shadow: Fixes to hvm_emulate_insn_fetch() Andrew Cooper
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 28+ messages in thread
From: Andrew Cooper @ 2017-06-21 15:12 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Paul Durrant, Jan Beulich

Force insn_off to a single byte, as offset can wrap around or truncate with
respect to hvmemul_ctxt->insn_buf_eip under a number of normal circumstances.

Furthermore, don't use an ASSERT() for bounds checking the write into
hvmemul_ctxt->insn_buf[].

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Paul Durrant <paul.durrant@citrix.com>

For anyone wondering, this is the way to exploit XSA-186.
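
A minimal standalone sketch (not part of the patch; values made up) of the
wrap being guarded against: computing "offset - insn_buf_eip" in a type
wider than one byte lets a fetch just below the cached EIP turn into a huge
positive index, which the old ASSERT()-only bounds check would not stop in
release builds.

  #include <stdint.h>
  #include <stdio.h>

  int main(void)
  {
      uint32_t insn_buf_eip = 0x1000;
      uint32_t offset = 0x0fff;      /* one byte below the cached EIP */

      unsigned int wide = offset - insn_buf_eip;  /* 0xffffffff */
      uint8_t narrow = offset - insn_buf_eip;     /* 0xff; the patch also
                                                     bounds-checks this */

      printf("unsigned int: %#x\n", wide);
      printf("uint8_t:      %#x\n", narrow);
      return 0;
  }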
---
 xen/arch/x86/hvm/emulate.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 11e4aba..495e312 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -939,7 +939,8 @@ int hvmemul_insn_fetch(
 {
     struct hvm_emulate_ctxt *hvmemul_ctxt =
         container_of(ctxt, struct hvm_emulate_ctxt, ctxt);
-    unsigned int insn_off = offset - hvmemul_ctxt->insn_buf_eip;
+    /* Careful, as offset can wrap or truncate WRT insn_buf_eip. */
+    uint8_t insn_off = offset - hvmemul_ctxt->insn_buf_eip;
 
     /*
      * Fall back if requested bytes are not in the prefetch cache.
@@ -953,7 +954,17 @@ int hvmemul_insn_fetch(
 
         if ( rc == X86EMUL_OKAY && bytes )
         {
-            ASSERT(insn_off + bytes <= sizeof(hvmemul_ctxt->insn_buf));
+            /*
+             * Will we overflow insn_buf[]?  This shouldn't be able to happen,
+             * which means something went wrong with instruction decoding...
+             */
+            if ( insn_off > sizeof(hvmemul_ctxt->insn_buf) ||
+                 (insn_off + bytes) > sizeof(hvmemul_ctxt->insn_buf) )
+            {
+                ASSERT_UNREACHABLE();
+                return X86EMUL_UNHANDLEABLE;
+            }
+
             memcpy(&hvmemul_ctxt->insn_buf[insn_off], p_data, bytes);
             hvmemul_ctxt->insn_buf_bytes = insn_off + bytes;
         }
-- 
2.1.4



* [PATCH 2/6] x86/shadow: Fixes to hvm_emulate_insn_fetch()
  2017-06-21 15:12 [PATCH 0/6] Various XSA followups Andrew Cooper
  2017-06-21 15:12 ` [PATCH 1/6] x86/hvm: Fixes to hvmemul_insn_fetch() Andrew Cooper
@ 2017-06-21 15:12 ` Andrew Cooper
  2017-06-22  8:09   ` Jan Beulich
  2017-06-22 11:52   ` Tim Deegan
  2017-06-21 15:12 ` [PATCH 3/6] x86/shadow: Use ERR_PTR infrastructure for sh_emulate_map_dest() Andrew Cooper
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 28+ messages in thread
From: Andrew Cooper @ 2017-06-21 15:12 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Tim Deegan, Jan Beulich

Zero-length reads are jump-target segmentation checks; never serve them from
the cache.

Force insn_off to a single byte, as offset can wrap around or truncate with
respect to sh_ctxt->insn_buf_eip under a number of normal circumstances.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Tim Deegan <tim@xen.org>
CC: Jan Beulich <JBeulich@suse.com>
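
Purely as an illustration (values invented, not from the patch): with
bytes == 0 the old cache check never triggers the fallback, so the
segmentation checks that hvm_read() performs would be skipped.

  #include <stdio.h>

  int main(void)
  {
      unsigned int insn_buf_bytes = 8;       /* bytes currently cached */
      unsigned int insn_off = 3, bytes = 0;  /* zero-length fetch */

      /* Old condition: 0, i.e. served from the cache. */
      printf("old fallback: %d\n", (insn_off + bytes) > insn_buf_bytes);

      /* New condition: 1, i.e. always falls back to hvm_read(). */
      printf("new fallback: %d\n",
             !bytes || (insn_off + bytes) > insn_buf_bytes);
      return 0;
  }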
---
 xen/arch/x86/mm/shadow/common.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/mm/shadow/common.c b/xen/arch/x86/mm/shadow/common.c
index 2e64a77..deea03a 100644
--- a/xen/arch/x86/mm/shadow/common.c
+++ b/xen/arch/x86/mm/shadow/common.c
@@ -235,12 +235,16 @@ hvm_emulate_insn_fetch(enum x86_segment seg,
 {
     struct sh_emulate_ctxt *sh_ctxt =
         container_of(ctxt, struct sh_emulate_ctxt, ctxt);
-    unsigned int insn_off = offset - sh_ctxt->insn_buf_eip;
+    /* Careful, as offset can wrap or truncate WRT insn_buf_eip. */
+    uint8_t insn_off = offset - sh_ctxt->insn_buf_eip;
 
     ASSERT(seg == x86_seg_cs);
 
-    /* Fall back if requested bytes are not in the prefetch cache. */
-    if ( unlikely((insn_off + bytes) > sh_ctxt->insn_buf_bytes) )
+    /*
+     * Fall back if requested bytes are not in the prefetch cache, but always
+     * perform the zero-length read for segmentation purposes.
+     */
+    if ( !bytes || unlikely((insn_off + bytes) > sh_ctxt->insn_buf_bytes) )
         return hvm_read(seg, offset, p_data, bytes,
                         hvm_access_insn_fetch, sh_ctxt);
 
-- 
2.1.4



* [PATCH 3/6] x86/shadow: Use ERR_PTR infrastructure for sh_emulate_map_dest()
  2017-06-21 15:12 [PATCH 0/6] Various XSA followups Andrew Cooper
  2017-06-21 15:12 ` [PATCH 1/6] x86/hvm: Fixes to hvmemul_insn_fetch() Andrew Cooper
  2017-06-21 15:12 ` [PATCH 2/6] x86/shadow: Fixes to hvm_emulate_insn_fetch() Andrew Cooper
@ 2017-06-21 15:12 ` Andrew Cooper
  2017-06-22  8:14   ` Jan Beulich
  2017-06-22 12:09   ` Tim Deegan
  2017-06-21 15:12 ` [PATCH 4/6] [RFC] x86/hvm: Rename enum hvm_copy_result to hvm_translation_result Andrew Cooper
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 28+ messages in thread
From: Andrew Cooper @ 2017-06-21 15:12 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Tim Deegan, Jan Beulich

sh_emulate_map_dest() predates the introduction of the generic ERR_PTR()
infrastructure; take the opportunity to avoid open-coding it.

The chosen error constants need to be negative to work with IS_ERR(), but no
other changes are required.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Tim Deegan <tim@xen.org>

v2:
 * Use ~(long)X86EMUL rather than -X86EMUL so MAPPING_SILENT_FAIL is
   considered an error to IS_ERR()
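
A self-contained illustration of that choice (the ERR_PTR()/IS_ERR()
definitions below merely mirror the usual "top 4095 addresses are errors"
convention and are not copied from the Xen headers): X86EMUL_OKAY is 0, so
-X86EMUL_OKAY would encode MAPPING_SILENT_FAIL as NULL and IS_ERR() would
miss it, whereas ~(long)X86EMUL_OKAY is -1 and stays inside the error range.

  #include <stdio.h>

  #define ERR_PTR(x)  ((void *)(long)(x))
  #define PTR_ERR(p)  ((long)(p))
  #define IS_ERR(p)   ((unsigned long)(p) >= (unsigned long)-4095)

  #define X86EMUL_OKAY 0

  int main(void)
  {
      void *neg  = ERR_PTR(-(long)X86EMUL_OKAY);  /* NULL: looks like success */
      void *flip = ERR_PTR(~(long)X86EMUL_OKAY);  /* (void *)-1: an error */

      printf("-X86EMUL_OKAY:       IS_ERR() = %d\n", IS_ERR(neg));
      printf("~(long)X86EMUL_OKAY: IS_ERR() = %d, recovered code = %ld\n",
             IS_ERR(flip), ~PTR_ERR(flip));
      return 0;
  }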
---
 xen/arch/x86/mm/shadow/multi.c   | 8 ++++----
 xen/arch/x86/mm/shadow/private.h | 7 +++----
 2 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c
index f65ffc6..74c3641 100644
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -4752,8 +4752,8 @@ sh_x86_emulate_write(struct vcpu *v, unsigned long vaddr, void *src,
         return X86EMUL_UNHANDLEABLE;
 
     addr = sh_emulate_map_dest(v, vaddr, bytes, sh_ctxt);
-    if ( sh_emulate_map_dest_failed(addr) )
-        return (long)addr;
+    if ( IS_ERR(addr) )
+        return ~PTR_ERR(addr);
 
     paging_lock(v->domain);
     memcpy(addr, src, bytes);
@@ -4794,8 +4794,8 @@ sh_x86_emulate_cmpxchg(struct vcpu *v, unsigned long vaddr,
         return X86EMUL_UNHANDLEABLE;
 
     addr = sh_emulate_map_dest(v, vaddr, bytes, sh_ctxt);
-    if ( sh_emulate_map_dest_failed(addr) )
-        return (long)addr;
+    if ( IS_ERR(addr) )
+        return ~PTR_ERR(addr);
 
     paging_lock(v->domain);
     switch ( bytes )
diff --git a/xen/arch/x86/mm/shadow/private.h b/xen/arch/x86/mm/shadow/private.h
index 472676c..a59ff7a 100644
--- a/xen/arch/x86/mm/shadow/private.h
+++ b/xen/arch/x86/mm/shadow/private.h
@@ -395,10 +395,9 @@ void shadow_unhook_mappings(struct domain *d, mfn_t smfn, int user_only);
 
 /* Returns a mapped pointer to write to, or one of the following error
  * indicators. */
-#define MAPPING_UNHANDLEABLE ((void *)(unsigned long)X86EMUL_UNHANDLEABLE)
-#define MAPPING_EXCEPTION    ((void *)(unsigned long)X86EMUL_EXCEPTION)
-#define MAPPING_SILENT_FAIL  ((void *)(unsigned long)X86EMUL_OKAY)
-#define sh_emulate_map_dest_failed(rc) ((unsigned long)(rc) <= 3)
+#define MAPPING_UNHANDLEABLE ERR_PTR(~(long)X86EMUL_UNHANDLEABLE)
+#define MAPPING_EXCEPTION    ERR_PTR(~(long)X86EMUL_EXCEPTION)
+#define MAPPING_SILENT_FAIL  ERR_PTR(~(long)X86EMUL_OKAY)
 void *sh_emulate_map_dest(struct vcpu *v, unsigned long vaddr,
                           unsigned int bytes, struct sh_emulate_ctxt *sh_ctxt);
 void sh_emulate_unmap_dest(struct vcpu *v, void *addr, unsigned int bytes,
-- 
2.1.4



* [PATCH 4/6] [RFC] x86/hvm: Rename enum hvm_copy_result to hvm_translation_result
  2017-06-21 15:12 [PATCH 0/6] Various XSA followups Andrew Cooper
                   ` (2 preceding siblings ...)
  2017-06-21 15:12 ` [PATCH 3/6] x86/shadow: Use ERR_PTR infrastructure for sh_emulate_map_dest() Andrew Cooper
@ 2017-06-21 15:12 ` Andrew Cooper
  2017-06-22  8:19   ` Jan Beulich
  2017-06-21 15:12 ` [PATCH 5/6] x86/hvm: Break out __hvm_copy()'s translation logic Andrew Cooper
  2017-06-21 15:12 ` [PATCH 6/6] x86/hvm: Implement hvmemul_write() using real mappings Andrew Cooper
  5 siblings, 1 reply; 28+ messages in thread
From: Andrew Cooper @ 2017-06-21 15:12 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Jan Beulich

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>

Name definitely open to improvement.  Perhaps better considered in the context
of the following patch.
---
 xen/arch/x86/hvm/dom0_build.c     |  2 +-
 xen/arch/x86/hvm/emulate.c        | 40 ++++++++++++++--------------
 xen/arch/x86/hvm/hvm.c            | 56 +++++++++++++++++++--------------------
 xen/arch/x86/hvm/intercept.c      | 20 +++++++-------
 xen/arch/x86/hvm/svm/nestedsvm.c  |  5 ++--
 xen/arch/x86/hvm/svm/svm.c        |  2 +-
 xen/arch/x86/hvm/viridian.c       |  2 +-
 xen/arch/x86/hvm/vmsi.c           |  2 +-
 xen/arch/x86/hvm/vmx/realmode.c   |  2 +-
 xen/arch/x86/hvm/vmx/vvmx.c       | 14 +++++-----
 xen/arch/x86/mm/shadow/common.c   | 12 ++++-----
 xen/common/libelf/libelf-loader.c |  4 +--
 xen/include/asm-x86/hvm/support.h | 40 ++++++++++++++--------------
 13 files changed, 101 insertions(+), 100 deletions(-)

diff --git a/xen/arch/x86/hvm/dom0_build.c b/xen/arch/x86/hvm/dom0_build.c
index 020c355..e8f746c 100644
--- a/xen/arch/x86/hvm/dom0_build.c
+++ b/xen/arch/x86/hvm/dom0_build.c
@@ -238,7 +238,7 @@ static int __init pvh_setup_vmx_realmode_helpers(struct domain *d)
     if ( !pvh_steal_ram(d, HVM_VM86_TSS_SIZE, 128, GB(4), &gaddr) )
     {
         if ( hvm_copy_to_guest_phys(gaddr, NULL, HVM_VM86_TSS_SIZE, v) !=
-             HVMCOPY_okay )
+             HVMTRANS_okay )
             printk("Unable to zero VM86 TSS area\n");
         d->arch.hvm_domain.params[HVM_PARAM_VM86_TSS_SIZED] =
             VM86_TSS_UPDATED | ((uint64_t)HVM_VM86_TSS_SIZE << 32) | gaddr;
diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 495e312..384ad0b 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -100,7 +100,7 @@ static int ioreq_server_read(const struct hvm_io_handler *io_handler,
                     uint32_t size,
                     uint64_t *data)
 {
-    if ( hvm_copy_from_guest_phys(data, addr, size) != HVMCOPY_okay )
+    if ( hvm_copy_from_guest_phys(data, addr, size) != HVMTRANS_okay )
         return X86EMUL_UNHANDLEABLE;
 
     return X86EMUL_OKAY;
@@ -892,18 +892,18 @@ static int __hvmemul_read(
 
     switch ( rc )
     {
-    case HVMCOPY_okay:
+    case HVMTRANS_okay:
         break;
-    case HVMCOPY_bad_gva_to_gfn:
+    case HVMTRANS_bad_linear_to_gfn:
         x86_emul_pagefault(pfinfo.ec, pfinfo.linear, &hvmemul_ctxt->ctxt);
         return X86EMUL_EXCEPTION;
-    case HVMCOPY_bad_gfn_to_mfn:
+    case HVMTRANS_bad_gfn_to_mfn:
         if ( access_type == hvm_access_insn_fetch )
             return X86EMUL_UNHANDLEABLE;
 
         return hvmemul_linear_mmio_read(addr, bytes, p_data, pfec, hvmemul_ctxt, 0);
-    case HVMCOPY_gfn_paged_out:
-    case HVMCOPY_gfn_shared:
+    case HVMTRANS_gfn_paged_out:
+    case HVMTRANS_gfn_shared:
         return X86EMUL_RETRY;
     default:
         return X86EMUL_UNHANDLEABLE;
@@ -1011,15 +1011,15 @@ static int hvmemul_write(
 
     switch ( rc )
     {
-    case HVMCOPY_okay:
+    case HVMTRANS_okay:
         break;
-    case HVMCOPY_bad_gva_to_gfn:
+    case HVMTRANS_bad_linear_to_gfn:
         x86_emul_pagefault(pfinfo.ec, pfinfo.linear, &hvmemul_ctxt->ctxt);
         return X86EMUL_EXCEPTION;
-    case HVMCOPY_bad_gfn_to_mfn:
+    case HVMTRANS_bad_gfn_to_mfn:
         return hvmemul_linear_mmio_write(addr, bytes, p_data, pfec, hvmemul_ctxt, 0);
-    case HVMCOPY_gfn_paged_out:
-    case HVMCOPY_gfn_shared:
+    case HVMTRANS_gfn_paged_out:
+    case HVMTRANS_gfn_shared:
         return X86EMUL_RETRY;
     default:
         return X86EMUL_UNHANDLEABLE;
@@ -1383,7 +1383,7 @@ static int hvmemul_rep_movs(
             return rc;
         }
 
-        rc = HVMCOPY_okay;
+        rc = HVMTRANS_okay;
     }
     else
         /*
@@ -1393,16 +1393,16 @@ static int hvmemul_rep_movs(
          */
         rc = hvm_copy_from_guest_phys(buf, sgpa, bytes);
 
-    if ( rc == HVMCOPY_okay )
+    if ( rc == HVMTRANS_okay )
         rc = hvm_copy_to_guest_phys(dgpa, buf, bytes, current);
 
     xfree(buf);
 
-    if ( rc == HVMCOPY_gfn_paged_out )
+    if ( rc == HVMTRANS_gfn_paged_out )
         return X86EMUL_RETRY;
-    if ( rc == HVMCOPY_gfn_shared )
+    if ( rc == HVMTRANS_gfn_shared )
         return X86EMUL_RETRY;
-    if ( rc != HVMCOPY_okay )
+    if ( rc != HVMTRANS_okay )
     {
         gdprintk(XENLOG_WARNING, "Failed memory-to-memory REP MOVS: sgpa=%"
                  PRIpaddr" dgpa=%"PRIpaddr" reps=%lu bytes_per_rep=%u\n",
@@ -1512,10 +1512,10 @@ static int hvmemul_rep_stos(
 
         switch ( rc )
         {
-        case HVMCOPY_gfn_paged_out:
-        case HVMCOPY_gfn_shared:
+        case HVMTRANS_gfn_paged_out:
+        case HVMTRANS_gfn_shared:
             return X86EMUL_RETRY;
-        case HVMCOPY_okay:
+        case HVMTRANS_okay:
             return X86EMUL_OKAY;
         }
 
@@ -2171,7 +2171,7 @@ void hvm_emulate_init_per_insn(
                                         &addr) &&
              hvm_fetch_from_guest_linear(hvmemul_ctxt->insn_buf, addr,
                                          sizeof(hvmemul_ctxt->insn_buf),
-                                         pfec, NULL) == HVMCOPY_okay) ?
+                                         pfec, NULL) == HVMTRANS_okay) ?
             sizeof(hvmemul_ctxt->insn_buf) : 0;
     }
     else
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 3ed6ec4..c822d3b 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -2918,9 +2918,9 @@ void hvm_task_switch(
 
     rc = hvm_copy_from_guest_linear(
         &tss, prev_tr.base, sizeof(tss), PFEC_page_present, &pfinfo);
-    if ( rc == HVMCOPY_bad_gva_to_gfn )
+    if ( rc == HVMTRANS_bad_linear_to_gfn )
         hvm_inject_page_fault(pfinfo.ec, pfinfo.linear);
-    if ( rc != HVMCOPY_okay )
+    if ( rc != HVMTRANS_okay )
         goto out;
 
     eflags = regs->eflags;
@@ -2958,20 +2958,20 @@ void hvm_task_switch(
                                   offsetof(typeof(tss), trace) -
                                   offsetof(typeof(tss), eip),
                                   PFEC_page_present, &pfinfo);
-    if ( rc == HVMCOPY_bad_gva_to_gfn )
+    if ( rc == HVMTRANS_bad_linear_to_gfn )
         hvm_inject_page_fault(pfinfo.ec, pfinfo.linear);
-    if ( rc != HVMCOPY_okay )
+    if ( rc != HVMTRANS_okay )
         goto out;
 
     rc = hvm_copy_from_guest_linear(
         &tss, tr.base, sizeof(tss), PFEC_page_present, &pfinfo);
-    if ( rc == HVMCOPY_bad_gva_to_gfn )
+    if ( rc == HVMTRANS_bad_linear_to_gfn )
         hvm_inject_page_fault(pfinfo.ec, pfinfo.linear);
     /*
-     * Note: The HVMCOPY_gfn_shared case could be optimised, if the callee
+     * Note: The HVMTRANS_gfn_shared case could be optimised, if the callee
      * functions knew we want RO access.
      */
-    if ( rc != HVMCOPY_okay )
+    if ( rc != HVMTRANS_okay )
         goto out;
 
     new_cpl = tss.eflags & X86_EFLAGS_VM ? 3 : tss.cs & 3;
@@ -3013,12 +3013,12 @@ void hvm_task_switch(
         rc = hvm_copy_to_guest_linear(tr.base + offsetof(typeof(tss), back_link),
                                       &tss.back_link, sizeof(tss.back_link), 0,
                                       &pfinfo);
-        if ( rc == HVMCOPY_bad_gva_to_gfn )
+        if ( rc == HVMTRANS_bad_linear_to_gfn )
         {
             hvm_inject_page_fault(pfinfo.ec, pfinfo.linear);
             exn_raised = 1;
         }
-        else if ( rc != HVMCOPY_okay )
+        else if ( rc != HVMTRANS_okay )
             goto out;
     }
 
@@ -3054,12 +3054,12 @@ void hvm_task_switch(
         {
             rc = hvm_copy_to_guest_linear(linear_addr, &errcode, opsz, 0,
                                           &pfinfo);
-            if ( rc == HVMCOPY_bad_gva_to_gfn )
+            if ( rc == HVMTRANS_bad_linear_to_gfn )
             {
                 hvm_inject_page_fault(pfinfo.ec, pfinfo.linear);
                 exn_raised = 1;
             }
-            else if ( rc != HVMCOPY_okay )
+            else if ( rc != HVMTRANS_okay )
                 goto out;
         }
     }
@@ -3076,7 +3076,7 @@ void hvm_task_switch(
 #define HVMCOPY_to_guest   (1u<<0)
 #define HVMCOPY_phys       (0u<<2)
 #define HVMCOPY_linear     (1u<<2)
-static enum hvm_copy_result __hvm_copy(
+static enum hvm_translation_result __hvm_copy(
     void *buf, paddr_t addr, int size, struct vcpu *v, unsigned int flags,
     uint32_t pfec, pagefault_info_t *pfinfo)
 {
@@ -3101,7 +3101,7 @@ static enum hvm_copy_result __hvm_copy(
      * Hence we bail immediately if called from atomic context.
      */
     if ( in_atomic() )
-        return HVMCOPY_unhandleable;
+        return HVMTRANS_unhandleable;
 #endif
 
     while ( todo > 0 )
@@ -3116,15 +3116,15 @@ static enum hvm_copy_result __hvm_copy(
             if ( gfn == gfn_x(INVALID_GFN) )
             {
                 if ( pfec & PFEC_page_paged )
-                    return HVMCOPY_gfn_paged_out;
+                    return HVMTRANS_gfn_paged_out;
                 if ( pfec & PFEC_page_shared )
-                    return HVMCOPY_gfn_shared;
+                    return HVMTRANS_gfn_shared;
                 if ( pfinfo )
                 {
                     pfinfo->linear = addr;
                     pfinfo->ec = pfec & ~PFEC_implicit;
                 }
-                return HVMCOPY_bad_gva_to_gfn;
+                return HVMTRANS_bad_linear_to_gfn;
             }
             gpa |= (paddr_t)gfn << PAGE_SHIFT;
         }
@@ -3142,28 +3142,28 @@ static enum hvm_copy_result __hvm_copy(
         if ( v == current && is_hvm_vcpu(v)
              && !nestedhvm_vcpu_in_guestmode(v)
              && hvm_mmio_internal(gpa) )
-            return HVMCOPY_bad_gfn_to_mfn;
+            return HVMTRANS_bad_gfn_to_mfn;
 
         page = get_page_from_gfn(v->domain, gfn, &p2mt, P2M_UNSHARE);
 
         if ( !page )
-            return HVMCOPY_bad_gfn_to_mfn;
+            return HVMTRANS_bad_gfn_to_mfn;
 
         if ( p2m_is_paging(p2mt) )
         {
             put_page(page);
             p2m_mem_paging_populate(v->domain, gfn);
-            return HVMCOPY_gfn_paged_out;
+            return HVMTRANS_gfn_paged_out;
         }
         if ( p2m_is_shared(p2mt) )
         {
             put_page(page);
-            return HVMCOPY_gfn_shared;
+            return HVMTRANS_gfn_shared;
         }
         if ( p2m_is_grant(p2mt) )
         {
             put_page(page);
-            return HVMCOPY_unhandleable;
+            return HVMTRANS_unhandleable;
         }
 
         p = (char *)__map_domain_page(page) + (addr & ~PAGE_MASK);
@@ -3201,24 +3201,24 @@ static enum hvm_copy_result __hvm_copy(
         put_page(page);
     }
 
-    return HVMCOPY_okay;
+    return HVMTRANS_okay;
 }
 
-enum hvm_copy_result hvm_copy_to_guest_phys(
+enum hvm_translation_result hvm_copy_to_guest_phys(
     paddr_t paddr, void *buf, int size, struct vcpu *v)
 {
     return __hvm_copy(buf, paddr, size, v,
                       HVMCOPY_to_guest | HVMCOPY_phys, 0, NULL);
 }
 
-enum hvm_copy_result hvm_copy_from_guest_phys(
+enum hvm_translation_result hvm_copy_from_guest_phys(
     void *buf, paddr_t paddr, int size)
 {
     return __hvm_copy(buf, paddr, size, current,
                       HVMCOPY_from_guest | HVMCOPY_phys, 0, NULL);
 }
 
-enum hvm_copy_result hvm_copy_to_guest_linear(
+enum hvm_translation_result hvm_copy_to_guest_linear(
     unsigned long addr, void *buf, int size, uint32_t pfec,
     pagefault_info_t *pfinfo)
 {
@@ -3227,7 +3227,7 @@ enum hvm_copy_result hvm_copy_to_guest_linear(
                       PFEC_page_present | PFEC_write_access | pfec, pfinfo);
 }
 
-enum hvm_copy_result hvm_copy_from_guest_linear(
+enum hvm_translation_result hvm_copy_from_guest_linear(
     void *buf, unsigned long addr, int size, uint32_t pfec,
     pagefault_info_t *pfinfo)
 {
@@ -3236,7 +3236,7 @@ enum hvm_copy_result hvm_copy_from_guest_linear(
                       PFEC_page_present | pfec, pfinfo);
 }
 
-enum hvm_copy_result hvm_fetch_from_guest_linear(
+enum hvm_translation_result hvm_fetch_from_guest_linear(
     void *buf, unsigned long addr, int size, uint32_t pfec,
     pagefault_info_t *pfinfo)
 {
@@ -3673,7 +3673,7 @@ void hvm_ud_intercept(struct cpu_user_regs *regs)
                                         sizeof(sig), hvm_access_insn_fetch,
                                         cs, &addr) &&
              (hvm_fetch_from_guest_linear(sig, addr, sizeof(sig),
-                                          walk, NULL) == HVMCOPY_okay) &&
+                                          walk, NULL) == HVMTRANS_okay) &&
              (memcmp(sig, "\xf\xbxen", sizeof(sig)) == 0) )
         {
             regs->rip += sizeof(sig);
diff --git a/xen/arch/x86/hvm/intercept.c b/xen/arch/x86/hvm/intercept.c
index e51efd5..ef82419 100644
--- a/xen/arch/x86/hvm/intercept.c
+++ b/xen/arch/x86/hvm/intercept.c
@@ -136,14 +136,14 @@ int hvm_process_io_intercept(const struct hvm_io_handler *handler,
                 switch ( hvm_copy_to_guest_phys(p->data + step * i,
                                                 &data, p->size, current) )
                 {
-                case HVMCOPY_okay:
+                case HVMTRANS_okay:
                     break;
-                case HVMCOPY_bad_gfn_to_mfn:
+                case HVMTRANS_bad_gfn_to_mfn:
                     /* Drop the write as real hardware would. */
                     continue;
-                case HVMCOPY_bad_gva_to_gfn:
-                case HVMCOPY_gfn_paged_out:
-                case HVMCOPY_gfn_shared:
+                case HVMTRANS_bad_linear_to_gfn:
+                case HVMTRANS_gfn_paged_out:
+                case HVMTRANS_gfn_shared:
                     ASSERT_UNREACHABLE();
                     /* fall through */
                 default:
@@ -164,14 +164,14 @@ int hvm_process_io_intercept(const struct hvm_io_handler *handler,
                 switch ( hvm_copy_from_guest_phys(&data, p->data + step * i,
                                                   p->size) )
                 {
-                case HVMCOPY_okay:
+                case HVMTRANS_okay:
                     break;
-                case HVMCOPY_bad_gfn_to_mfn:
+                case HVMTRANS_bad_gfn_to_mfn:
                     data = ~0;
                     break;
-                case HVMCOPY_bad_gva_to_gfn:
-                case HVMCOPY_gfn_paged_out:
-                case HVMCOPY_gfn_shared:
+                case HVMTRANS_bad_linear_to_gfn:
+                case HVMTRANS_gfn_paged_out:
+                case HVMTRANS_gfn_shared:
                     ASSERT_UNREACHABLE();
                     /* fall through */
                 default:
diff --git a/xen/arch/x86/hvm/svm/nestedsvm.c b/xen/arch/x86/hvm/svm/nestedsvm.c
index 8fd9c23..66a1777 100644
--- a/xen/arch/x86/hvm/svm/nestedsvm.c
+++ b/xen/arch/x86/hvm/svm/nestedsvm.c
@@ -357,7 +357,7 @@ static int nsvm_vmrun_permissionmap(struct vcpu *v, bool_t viopm)
     struct vmcb_struct *host_vmcb = arch_svm->vmcb;
     unsigned long *ns_msrpm_ptr;
     unsigned int i;
-    enum hvm_copy_result ret;
+    enum hvm_translation_result ret;
     unsigned long *ns_viomap;
     bool_t ioport_80 = 1, ioport_ed = 1;
 
@@ -365,7 +365,8 @@ static int nsvm_vmrun_permissionmap(struct vcpu *v, bool_t viopm)
 
     ret = hvm_copy_from_guest_phys(svm->ns_cached_msrpm,
                                    ns_vmcb->_msrpm_base_pa, MSRPM_SIZE);
-    if (ret != HVMCOPY_okay) {
+    if ( ret != HVMTRANS_okay )
+    {
         gdprintk(XENLOG_ERR, "hvm_copy_from_guest_phys msrpm %u\n", ret);
         return 1;
     }
diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index 402e815..93cfdd7 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -1268,7 +1268,7 @@ static void svm_emul_swint_injection(struct x86_event *event)
                                     PFEC_implicit, &pfinfo);
     if ( rc )
     {
-        if ( rc == HVMCOPY_bad_gva_to_gfn )
+        if ( rc == HVMTRANS_bad_linear_to_gfn )
         {
             fault = TRAP_page_fault;
             ec = pfinfo.ec;
diff --git a/xen/arch/x86/hvm/viridian.c b/xen/arch/x86/hvm/viridian.c
index aa9b87c..0c74fa4 100644
--- a/xen/arch/x86/hvm/viridian.c
+++ b/xen/arch/x86/hvm/viridian.c
@@ -914,7 +914,7 @@ int viridian_hypercall(struct cpu_user_regs *regs)
 
         /* Get input parameters. */
         if ( hvm_copy_from_guest_phys(&input_params, input_params_gpa,
-                                      sizeof(input_params)) != HVMCOPY_okay )
+                                      sizeof(input_params)) != HVMTRANS_okay )
             break;
 
         /*
diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
index a36692c..7415cdd 100644
--- a/xen/arch/x86/hvm/vmsi.c
+++ b/xen/arch/x86/hvm/vmsi.c
@@ -610,7 +610,7 @@ void msix_write_completion(struct vcpu *v)
         if ( desc &&
              hvm_copy_from_guest_phys(&data,
                                       v->arch.hvm_vcpu.hvm_io.msix_snoop_gpa,
-                                      sizeof(data)) == HVMCOPY_okay &&
+                                      sizeof(data)) == HVMTRANS_okay &&
              !(data & PCI_MSIX_VECTOR_BITMASK) )
             ctrl_address = snoop_addr;
     }
diff --git a/xen/arch/x86/hvm/vmx/realmode.c b/xen/arch/x86/hvm/vmx/realmode.c
index 1996b1f..bd749f4 100644
--- a/xen/arch/x86/hvm/vmx/realmode.c
+++ b/xen/arch/x86/hvm/vmx/realmode.c
@@ -40,7 +40,7 @@ static void realmode_deliver_exception(
     last_byte = (vector * 4) + 3;
     if ( idtr->limit < last_byte ||
          hvm_copy_from_guest_phys(&cs_eip, idtr->base + vector * 4, 4) !=
-         HVMCOPY_okay )
+         HVMTRANS_okay )
     {
         /* Software interrupt? */
         if ( insn_len != 0 )
diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index 3560fae..3feb46c 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -479,9 +479,9 @@ static int decode_vmx_inst(struct cpu_user_regs *regs,
             int rc = hvm_copy_from_guest_linear(poperandS, base, size,
                                                 0, &pfinfo);
 
-            if ( rc == HVMCOPY_bad_gva_to_gfn )
+            if ( rc == HVMTRANS_bad_linear_to_gfn )
                 hvm_inject_page_fault(pfinfo.ec, pfinfo.linear);
-            if ( rc != HVMCOPY_okay )
+            if ( rc != HVMTRANS_okay )
                 return X86EMUL_EXCEPTION;
         }
         decode->mem = base;
@@ -1453,7 +1453,7 @@ int nvmx_handle_vmxon(struct cpu_user_regs *regs)
     }
 
     rc = hvm_copy_from_guest_phys(&nvmcs_revid, gpa, sizeof(nvmcs_revid));
-    if ( rc != HVMCOPY_okay ||
+    if ( rc != HVMTRANS_okay ||
          (nvmcs_revid & ~VMX_BASIC_REVISION_MASK) ||
          ((nvmcs_revid ^ vmx_basic_msr) & VMX_BASIC_REVISION_MASK) )
     {
@@ -1731,9 +1731,9 @@ int nvmx_handle_vmptrst(struct cpu_user_regs *regs)
     gpa = nvcpu->nv_vvmcxaddr;
 
     rc = hvm_copy_to_guest_linear(decode.mem, &gpa, decode.len, 0, &pfinfo);
-    if ( rc == HVMCOPY_bad_gva_to_gfn )
+    if ( rc == HVMTRANS_bad_linear_to_gfn )
         hvm_inject_page_fault(pfinfo.ec, pfinfo.linear);
-    if ( rc != HVMCOPY_okay )
+    if ( rc != HVMTRANS_okay )
         return X86EMUL_EXCEPTION;
 
     vmsucceed(regs);
@@ -1820,9 +1820,9 @@ int nvmx_handle_vmread(struct cpu_user_regs *regs)
     switch ( decode.type ) {
     case VMX_INST_MEMREG_TYPE_MEMORY:
         rc = hvm_copy_to_guest_linear(decode.mem, &value, decode.len, 0, &pfinfo);
-        if ( rc == HVMCOPY_bad_gva_to_gfn )
+        if ( rc == HVMTRANS_bad_linear_to_gfn )
             hvm_inject_page_fault(pfinfo.ec, pfinfo.linear);
-        if ( rc != HVMCOPY_okay )
+        if ( rc != HVMTRANS_okay )
             return X86EMUL_EXCEPTION;
         break;
     case VMX_INST_MEMREG_TYPE_REG:
diff --git a/xen/arch/x86/mm/shadow/common.c b/xen/arch/x86/mm/shadow/common.c
index deea03a..fd70147 100644
--- a/xen/arch/x86/mm/shadow/common.c
+++ b/xen/arch/x86/mm/shadow/common.c
@@ -196,16 +196,16 @@ hvm_read(enum x86_segment seg,
 
     switch ( rc )
     {
-    case HVMCOPY_okay:
+    case HVMTRANS_okay:
         return X86EMUL_OKAY;
-    case HVMCOPY_bad_gva_to_gfn:
+    case HVMTRANS_bad_linear_to_gfn:
         x86_emul_pagefault(pfinfo.ec, pfinfo.linear, &sh_ctxt->ctxt);
         return X86EMUL_EXCEPTION;
-    case HVMCOPY_bad_gfn_to_mfn:
-    case HVMCOPY_unhandleable:
+    case HVMTRANS_bad_gfn_to_mfn:
+    case HVMTRANS_unhandleable:
         return X86EMUL_UNHANDLEABLE;
-    case HVMCOPY_gfn_paged_out:
-    case HVMCOPY_gfn_shared:
+    case HVMTRANS_gfn_paged_out:
+    case HVMTRANS_gfn_shared:
         return X86EMUL_RETRY;
     }
 
diff --git a/xen/common/libelf/libelf-loader.c b/xen/common/libelf/libelf-loader.c
index c8b7ec9..0f46872 100644
--- a/xen/common/libelf/libelf-loader.c
+++ b/xen/common/libelf/libelf-loader.c
@@ -154,10 +154,10 @@ static elf_errorstatus elf_memcpy(struct vcpu *v, void *dst, void *src,
 #ifdef CONFIG_X86
     if ( is_hvm_vcpu(v) )
     {
-        enum hvm_copy_result rc;
+        enum hvm_translation_result rc;
 
         rc = hvm_copy_to_guest_phys((paddr_t)dst, src, size, v);
-        return rc != HVMCOPY_okay ? -1 : 0;
+        return rc != HVMTRANS_okay ? -1 : 0;
     }
 #endif
 
diff --git a/xen/include/asm-x86/hvm/support.h b/xen/include/asm-x86/hvm/support.h
index 8a1252b..a1799a3 100644
--- a/xen/include/asm-x86/hvm/support.h
+++ b/xen/include/asm-x86/hvm/support.h
@@ -53,23 +53,23 @@ extern unsigned int opt_hvm_debug_level;
 
 extern unsigned long hvm_io_bitmap[];
 
-enum hvm_copy_result {
-    HVMCOPY_okay = 0,
-    HVMCOPY_bad_gva_to_gfn,
-    HVMCOPY_bad_gfn_to_mfn,
-    HVMCOPY_unhandleable,
-    HVMCOPY_gfn_paged_out,
-    HVMCOPY_gfn_shared,
+enum hvm_translation_result {
+    HVMTRANS_okay,
+    HVMTRANS_bad_linear_to_gfn,
+    HVMTRANS_bad_gfn_to_mfn,
+    HVMTRANS_unhandleable,
+    HVMTRANS_gfn_paged_out,
+    HVMTRANS_gfn_shared,
 };
 
 /*
  * Copy to/from a guest physical address.
- * Returns HVMCOPY_okay, else HVMCOPY_bad_gfn_to_mfn if the given physical
+ * Returns HVMTRANS_okay, else HVMTRANS_bad_gfn_to_mfn if the given physical
  * address range does not map entirely onto ordinary machine memory.
  */
-enum hvm_copy_result hvm_copy_to_guest_phys(
+enum hvm_translation_result hvm_copy_to_guest_phys(
     paddr_t paddr, void *buf, int size, struct vcpu *v);
-enum hvm_copy_result hvm_copy_from_guest_phys(
+enum hvm_translation_result hvm_copy_from_guest_phys(
     void *buf, paddr_t paddr, int size);
 
 /*
@@ -79,13 +79,13 @@ enum hvm_copy_result hvm_copy_from_guest_phys(
  * to set them.
  * 
  * Returns:
- *  HVMCOPY_okay: Copy was entirely successful.
- *  HVMCOPY_bad_gfn_to_mfn: Some guest physical address did not map to
- *                          ordinary machine memory.
- *  HVMCOPY_bad_gva_to_gfn: Some guest virtual address did not have a valid
- *                          mapping to a guest physical address.  The
- *                          pagefault_info_t structure will be filled in if
- *                          provided.
+ *  HVMTRANS_okay: Copy was entirely successful.
+ *  HVMTRANS_bad_gfn_to_mfn: Some guest physical address did not map to
+ *                           ordinary machine memory.
+ *  HVMTRANS_bad_linear_to_gfn: Some guest linear address did not have a
+ *                              valid mapping to a guest physical address.
+ *                              The pagefault_info_t structure will be filled
+ *                              in if provided.
  */
 typedef struct pagefault_info
 {
@@ -93,13 +93,13 @@ typedef struct pagefault_info
     int ec;
 } pagefault_info_t;
 
-enum hvm_copy_result hvm_copy_to_guest_linear(
+enum hvm_translation_result hvm_copy_to_guest_linear(
     unsigned long addr, void *buf, int size, uint32_t pfec,
     pagefault_info_t *pfinfo);
-enum hvm_copy_result hvm_copy_from_guest_linear(
+enum hvm_translation_result hvm_copy_from_guest_linear(
     void *buf, unsigned long addr, int size, uint32_t pfec,
     pagefault_info_t *pfinfo);
-enum hvm_copy_result hvm_fetch_from_guest_linear(
+enum hvm_translation_result hvm_fetch_from_guest_linear(
     void *buf, unsigned long addr, int size, uint32_t pfec,
     pagefault_info_t *pfinfo);
 
-- 
2.1.4



* [PATCH 5/6] x86/hvm: Break out __hvm_copy()'s translation logic
  2017-06-21 15:12 [PATCH 0/6] Various XSA followups Andrew Cooper
                   ` (3 preceding siblings ...)
  2017-06-21 15:12 ` [PATCH 4/6] [RFC] x86/hvm: Rename enum hvm_copy_result to hvm_translation_result Andrew Cooper
@ 2017-06-21 15:12 ` Andrew Cooper
  2017-06-22  8:30   ` Jan Beulich
  2017-06-21 15:12 ` [PATCH 6/6] x86/hvm: Implement hvmemul_write() using real mappings Andrew Cooper
  5 siblings, 1 reply; 28+ messages in thread
From: Andrew Cooper @ 2017-06-21 15:12 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Jan Beulich

It will be reused by later changes.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
---
 xen/arch/x86/hvm/hvm.c            | 141 ++++++++++++++++++++++----------------
 xen/include/asm-x86/hvm/support.h |  12 ++++
 2 files changed, 95 insertions(+), 58 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index c822d3b..0a5697a 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -3072,6 +3072,80 @@ void hvm_task_switch(
     hvm_unmap_entry(nptss_desc);
 }
 
+enum hvm_translation_result hvm_translate_get_page(
+    struct vcpu *v, unsigned long addr, bool linear, uint32_t pfec,
+    pagefault_info_t *pfinfo, struct page_info **_page,
+    gfn_t *__gfn, p2m_type_t *_p2mt)
+{
+    struct page_info *page;
+    p2m_type_t p2mt;
+    gfn_t gfn;
+
+    if ( linear )
+    {
+        gfn = _gfn(paging_gva_to_gfn(v, addr, &pfec));
+
+        if ( gfn_eq(gfn, INVALID_GFN) )
+        {
+            if ( pfec & PFEC_page_paged )
+                return HVMTRANS_gfn_paged_out;
+
+            if ( pfec & PFEC_page_shared )
+                return HVMTRANS_gfn_shared;
+
+            if ( pfinfo )
+            {
+                pfinfo->linear = addr;
+                pfinfo->ec = pfec & ~PFEC_implicit;
+            }
+
+            return HVMTRANS_bad_linear_to_gfn;
+        }
+    }
+    else
+        gfn = _gfn(addr >> PAGE_SHIFT);
+
+    /*
+     * No need to do the P2M lookup for internally handled MMIO, benefiting
+     * - 32-bit WinXP (& older Windows) on AMD CPUs for LAPIC accesses,
+     * - newer Windows (like Server 2012) for HPET accesses.
+     */
+    if ( v == current && is_hvm_vcpu(v)
+         && !nestedhvm_vcpu_in_guestmode(v)
+         && hvm_mmio_internal(gfn_x(gfn) << PAGE_SHIFT) )
+        return HVMTRANS_bad_gfn_to_mfn;
+
+    page = get_page_from_gfn(v->domain, gfn_x(gfn), &p2mt, P2M_UNSHARE);
+
+    if ( !page )
+        return HVMTRANS_bad_gfn_to_mfn;
+
+    if ( p2m_is_paging(p2mt) )
+    {
+        put_page(page);
+        p2m_mem_paging_populate(v->domain, gfn_x(gfn));
+        return HVMTRANS_gfn_paged_out;
+    }
+    if ( p2m_is_shared(p2mt) )
+    {
+        put_page(page);
+        return HVMTRANS_gfn_shared;
+    }
+    if ( p2m_is_grant(p2mt) )
+    {
+        put_page(page);
+        return HVMTRANS_unhandleable;
+    }
+
+    *_page = page;
+    if ( __gfn )
+        *__gfn = gfn;
+    if ( _p2mt )
+        *_p2mt = p2mt;
+
+    return HVMTRANS_okay;
+}
+
 #define HVMCOPY_from_guest (0u<<0)
 #define HVMCOPY_to_guest   (1u<<0)
 #define HVMCOPY_phys       (0u<<2)
@@ -3080,7 +3154,7 @@ static enum hvm_translation_result __hvm_copy(
     void *buf, paddr_t addr, int size, struct vcpu *v, unsigned int flags,
     uint32_t pfec, pagefault_info_t *pfinfo)
 {
-    unsigned long gfn;
+    gfn_t gfn;
     struct page_info *page;
     p2m_type_t p2mt;
     char *p;
@@ -3106,65 +3180,15 @@ static enum hvm_translation_result __hvm_copy(
 
     while ( todo > 0 )
     {
+        enum hvm_translation_result res;
         paddr_t gpa = addr & ~PAGE_MASK;
 
         count = min_t(int, PAGE_SIZE - gpa, todo);
 
-        if ( flags & HVMCOPY_linear )
-        {
-            gfn = paging_gva_to_gfn(v, addr, &pfec);
-            if ( gfn == gfn_x(INVALID_GFN) )
-            {
-                if ( pfec & PFEC_page_paged )
-                    return HVMTRANS_gfn_paged_out;
-                if ( pfec & PFEC_page_shared )
-                    return HVMTRANS_gfn_shared;
-                if ( pfinfo )
-                {
-                    pfinfo->linear = addr;
-                    pfinfo->ec = pfec & ~PFEC_implicit;
-                }
-                return HVMTRANS_bad_linear_to_gfn;
-            }
-            gpa |= (paddr_t)gfn << PAGE_SHIFT;
-        }
-        else
-        {
-            gfn = addr >> PAGE_SHIFT;
-            gpa = addr;
-        }
-
-        /*
-         * No need to do the P2M lookup for internally handled MMIO, benefiting
-         * - 32-bit WinXP (& older Windows) on AMD CPUs for LAPIC accesses,
-         * - newer Windows (like Server 2012) for HPET accesses.
-         */
-        if ( v == current && is_hvm_vcpu(v)
-             && !nestedhvm_vcpu_in_guestmode(v)
-             && hvm_mmio_internal(gpa) )
-            return HVMTRANS_bad_gfn_to_mfn;
-
-        page = get_page_from_gfn(v->domain, gfn, &p2mt, P2M_UNSHARE);
-
-        if ( !page )
-            return HVMTRANS_bad_gfn_to_mfn;
-
-        if ( p2m_is_paging(p2mt) )
-        {
-            put_page(page);
-            p2m_mem_paging_populate(v->domain, gfn);
-            return HVMTRANS_gfn_paged_out;
-        }
-        if ( p2m_is_shared(p2mt) )
-        {
-            put_page(page);
-            return HVMTRANS_gfn_shared;
-        }
-        if ( p2m_is_grant(p2mt) )
-        {
-            put_page(page);
-            return HVMTRANS_unhandleable;
-        }
+        res = hvm_translate_get_page(v, addr, flags & HVMCOPY_linear,
+                                     pfec, pfinfo, &page, &gfn, &p2mt);
+        if ( res != HVMTRANS_okay )
+            return res;
 
         p = (char *)__map_domain_page(page) + (addr & ~PAGE_MASK);
 
@@ -3173,10 +3197,11 @@ static enum hvm_translation_result __hvm_copy(
             if ( p2m_is_discard_write(p2mt) )
             {
                 static unsigned long lastpage;
-                if ( xchg(&lastpage, gfn) != gfn )
+
+                if ( xchg(&lastpage, gfn_x(gfn)) != gfn_x(gfn) )
                     dprintk(XENLOG_G_DEBUG,
                             "%pv attempted write to read-only gfn %#lx (mfn=%#lx)\n",
-                            v, gfn, page_to_mfn(page));
+                            v, gfn_x(gfn), page_to_mfn(page));
             }
             else
             {
diff --git a/xen/include/asm-x86/hvm/support.h b/xen/include/asm-x86/hvm/support.h
index a1799a3..42cccf0 100644
--- a/xen/include/asm-x86/hvm/support.h
+++ b/xen/include/asm-x86/hvm/support.h
@@ -24,6 +24,7 @@
 #include <xen/sched.h>
 #include <xen/hvm/save.h>
 #include <asm/processor.h>
+#include <asm/p2m.h>
 
 #ifndef NDEBUG
 #define DBG_LEVEL_0                 (1 << 0)
@@ -103,6 +104,17 @@ enum hvm_translation_result hvm_fetch_from_guest_linear(
     void *buf, unsigned long addr, int size, uint32_t pfec,
     pagefault_info_t *pfinfo);
 
+/*
+ * Get a reference on the page under an HVM physical or linear address.  If
+ * linear, a pagewalk is performed using pfec (fault details optionally in
+ * pfinfo).
+ * On success, returns HVMTRANS_okay with a reference taken on **_page.
+ */
+enum hvm_translation_result hvm_translate_get_page(
+    struct vcpu *v, unsigned long addr, bool linear, uint32_t pfec,
+    pagefault_info_t *pfinfo, struct page_info **_page,
+    gfn_t *__gfn, p2m_type_t *_p2mt);
+
 #define HVM_HCALL_completed  0 /* hypercall completed - no further action */
 #define HVM_HCALL_preempted  1 /* hypercall preempted - re-execute VMCALL */
 int hvm_hypercall(struct cpu_user_regs *regs);
-- 
2.1.4



* [PATCH 6/6] x86/hvm: Implement hvmemul_write() using real mappings
  2017-06-21 15:12 [PATCH 0/6] Various XSA followups Andrew Cooper
                   ` (4 preceding siblings ...)
  2017-06-21 15:12 ` [PATCH 5/6] x86/hvm: Break out __hvm_copy()'s translation logic Andrew Cooper
@ 2017-06-21 15:12 ` Andrew Cooper
  2017-06-21 16:19   ` Paul Durrant
  2017-06-22  9:06   ` Jan Beulich
  5 siblings, 2 replies; 28+ messages in thread
From: Andrew Cooper @ 2017-06-21 15:12 UTC (permalink / raw)
  To: Xen-devel
  Cc: Andrew Cooper, Mihai Donțu, Paul Durrant, Razvan Cojocaru,
	Jan Beulich

An access which crosses a page boundary is performed atomically by x86
hardware, albeit with a severe performance penalty.  An important corner case
is when a straddled access hits two pages which differ in whether a
translation exists, or in net access rights.

The use of hvm_copy*() in hvmemul_write() is problematic, because it performs
a translation then completes the partial write, before moving onto the next
translation.

If an individual emulated write straddles two pages, the first of which is
writable, and the second of which is not, the first half of the write will
complete before #PF is raised from the second half.

This results in guest state corruption as a side effect of emulation, which
has been observed to cause windows to crash while under introspection.

Introduce the hvmemul_{,un}map_linear_addr() helpers, which translate the
entire contents of a linear access, and vmap() the underlying frames to
provide a contiguous virtual mapping for the emulator to use.  This is the
same mechanism as used by the shadow emulation code.

This will catch any translation issues and abort the emulation before any
modifications occur.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Paul Durrant <paul.durrant@citrix.com>
CC: Razvan Cojocaru <rcojocaru@bitdefender.com>
CC: Mihai Donțu <mdontu@bitdefender.com>

While the maximum size of the linear mapping is capped at 1 page, the logic in
the helpers is written to work properly as hvmemul_ctxt->mfn[] gets longer,
specifically with XSAVE instruction emulation in mind.

This has only had light testing so far.
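
For intuition (not from the patch; PAGE_SHIFT and the helper below are
purely illustrative), this is how the helpers size the mapping: the first
and final frame straddled by the access are computed up front, so a fault
on the second page is reported before any byte of the first page is
modified.  The "- !!bytes" term keeps a zero-byte access from appearing to
touch a second frame.

  #include <stdio.h>

  #define PAGE_SHIFT 12

  static unsigned int frames_touched(unsigned long linear, unsigned int bytes)
  {
      unsigned long first = linear >> PAGE_SHIFT;
      unsigned long final = (linear + bytes - !!bytes) >> PAGE_SHIFT;

      return (final - first) + 1;
  }

  int main(void)
  {
      printf("%u\n", frames_touched(0x1000, 4));  /* 1: within one page  */
      printf("%u\n", frames_touched(0x1ffe, 4));  /* 2: straddles a page */
      return 0;
  }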
---
 xen/arch/x86/hvm/emulate.c        | 179 ++++++++++++++++++++++++++++++++++----
 xen/include/asm-x86/hvm/emulate.h |   7 ++
 2 files changed, 169 insertions(+), 17 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 384ad0b..804bea5 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -498,6 +498,159 @@ static int hvmemul_do_mmio_addr(paddr_t mmio_gpa,
 }
 
 /*
+ * Map the frame(s) covering an individual linear access, for writeable
+ * access.  May return NULL for MMIO, or ERR_PTR(~X86EMUL_*) for other errors
+ * including ERR_PTR(~X86EMUL_OKAY) for write-discard mappings.
+ *
+ * In debug builds, map() checks that each slot in hvmemul_ctxt->mfn[] is
+ * clean before use, and poisons unused slots with INVALID_MFN.
+ */
+static void *hvmemul_map_linear_addr(
+    unsigned long linear, unsigned int bytes, uint32_t pfec,
+    struct hvm_emulate_ctxt *hvmemul_ctxt)
+{
+    struct vcpu *curr = current;
+    void *err, *mapping;
+
+    /* First and final gfns which need mapping. */
+    unsigned long frame = linear >> PAGE_SHIFT, first = frame;
+    unsigned long final = (linear + bytes - !!bytes) >> PAGE_SHIFT;
+
+    /*
+     * mfn points to the next free slot.  All used slots have a page reference
+     * held on them.
+     */
+    mfn_t *mfn = &hvmemul_ctxt->mfn[0];
+
+    /*
+     * The caller has no legitimate reason for trying a zero-byte write, but
+     * final is calculated to fail safe in release builds.
+     *
+     * The maximum write size depends on the number of adjacent mfns[] which
+     * can be vmap()'d, accounting for possible misalignment within the region.
+     * The higher level emulation callers are responsible for ensuring that
+     * mfns[] is large enough for the requested write size.
+     */
+    if ( bytes == 0 ||
+         final - first > ARRAY_SIZE(hvmemul_ctxt->mfn) - 1 )
+    {
+        ASSERT_UNREACHABLE();
+        goto unhandleable;
+    }
+
+    do {
+        enum hvm_translation_result res;
+        struct page_info *page;
+        pagefault_info_t pfinfo;
+        p2m_type_t p2mt;
+
+        res = hvm_translate_get_page(curr, frame << PAGE_SHIFT, true, pfec,
+                                     &pfinfo, &page, NULL, &p2mt);
+
+        switch ( res )
+        {
+        case HVMTRANS_okay:
+            break;
+
+        case HVMTRANS_bad_linear_to_gfn:
+            x86_emul_pagefault(pfinfo.ec, pfinfo.linear, &hvmemul_ctxt->ctxt);
+            err = ERR_PTR(~(long)X86EMUL_EXCEPTION);
+            goto out;
+
+        case HVMTRANS_bad_gfn_to_mfn:
+            err = NULL;
+            goto out;
+
+        case HVMTRANS_gfn_paged_out:
+        case HVMTRANS_gfn_shared:
+            err = ERR_PTR(~(long)X86EMUL_RETRY);
+            goto out;
+
+        default:
+            goto unhandleable;
+        }
+
+        /* Error checking.  Confirm that the current slot is clean. */
+        ASSERT(mfn_x(*mfn) == 0);
+
+        *mfn++ = _mfn(page_to_mfn(page));
+        frame++;
+
+        if ( p2m_is_discard_write(p2mt) )
+        {
+            err = ERR_PTR(~(long)X86EMUL_OKAY);
+            goto out;
+        }
+
+    } while ( frame < final );
+
+    /* Entire access within a single frame? */
+    if ( first == final )
+        mapping = map_domain_page(hvmemul_ctxt->mfn[0]) + (linear & ~PAGE_MASK);
+    /* Multiple frames? Need to vmap(). */
+    else if ( (mapping = vmap(hvmemul_ctxt->mfn,
+                              mfn - hvmemul_ctxt->mfn)) == NULL )
+        goto unhandleable;
+
+#ifndef NDEBUG /* Poison unused mfn[]s with INVALID_MFN. */
+    while ( mfn < hvmemul_ctxt->mfn + ARRAY_SIZE(hvmemul_ctxt->mfn) )
+    {
+        ASSERT(mfn_x(*mfn) == 0);
+        *mfn++ = INVALID_MFN;
+    }
+#endif
+
+    return mapping;
+
+ unhandleable:
+    err = ERR_PTR(~(long)X86EMUL_UNHANDLEABLE);
+
+ out:
+    /* Drop all held references. */
+    while ( mfn > hvmemul_ctxt->mfn )
+        put_page(mfn_to_page(mfn_x(*--mfn)));
+
+    return err;
+}
+
+static void hvmemul_unmap_linear_addr(
+    void *mapping, unsigned long linear, unsigned int bytes,
+    struct hvm_emulate_ctxt *hvmemul_ctxt)
+{
+    struct domain *currd = current->domain;
+    unsigned long frame = linear >> PAGE_SHIFT;
+    unsigned long final = (linear + bytes - !!bytes) >> PAGE_SHIFT;
+    mfn_t *mfn = &hvmemul_ctxt->mfn[0];
+
+    ASSERT(bytes > 0);
+
+    if ( frame == final )
+        unmap_domain_page(mapping);
+    else
+        vunmap(mapping);
+
+    do
+    {
+        ASSERT(mfn_valid(*mfn));
+        paging_mark_dirty(currd, *mfn);
+        put_page(mfn_to_page(mfn_x(*mfn)));
+
+        frame++;
+        *mfn++ = _mfn(0); /* Clean slot for map()'s error checking. */
+
+    } while ( frame < final );
+
+
+#ifndef NDEBUG /* Check (and clean) all unused mfns. */
+    while ( mfn < hvmemul_ctxt->mfn + ARRAY_SIZE(hvmemul_ctxt->mfn) )
+    {
+        ASSERT(mfn_eq(*mfn, INVALID_MFN));
+        *mfn++ = _mfn(0);
+    }
+#endif
+}
+
+/*
  * Convert addr from linear to physical form, valid over the range
  * [addr, addr + *reps * bytes_per_rep]. *reps is adjusted according to
  * the valid computed range. It is always >0 when X86EMUL_OKAY is returned.
@@ -987,11 +1140,11 @@ static int hvmemul_write(
     struct hvm_emulate_ctxt *hvmemul_ctxt =
         container_of(ctxt, struct hvm_emulate_ctxt, ctxt);
     struct vcpu *curr = current;
-    pagefault_info_t pfinfo;
     unsigned long addr, reps = 1;
     uint32_t pfec = PFEC_page_present | PFEC_write_access;
     struct hvm_vcpu_io *vio = &curr->arch.hvm_vcpu.hvm_io;
     int rc;
+    void *mapping;
 
     if ( is_x86_system_segment(seg) )
         pfec |= PFEC_implicit;
@@ -1007,23 +1160,15 @@ static int hvmemul_write(
          (vio->mmio_gla == (addr & PAGE_MASK)) )
         return hvmemul_linear_mmio_write(addr, bytes, p_data, pfec, hvmemul_ctxt, 1);
 
-    rc = hvm_copy_to_guest_linear(addr, p_data, bytes, pfec, &pfinfo);
-
-    switch ( rc )
-    {
-    case HVMTRANS_okay:
-        break;
-    case HVMTRANS_bad_linear_to_gfn:
-        x86_emul_pagefault(pfinfo.ec, pfinfo.linear, &hvmemul_ctxt->ctxt);
-        return X86EMUL_EXCEPTION;
-    case HVMTRANS_bad_gfn_to_mfn:
+    mapping = hvmemul_map_linear_addr(addr, bytes, pfec, hvmemul_ctxt);
+    if ( IS_ERR(mapping) )
+        return ~PTR_ERR(mapping);
+    else if ( !mapping )
         return hvmemul_linear_mmio_write(addr, bytes, p_data, pfec, hvmemul_ctxt, 0);
-    case HVMTRANS_gfn_paged_out:
-    case HVMTRANS_gfn_shared:
-        return X86EMUL_RETRY;
-    default:
-        return X86EMUL_UNHANDLEABLE;
-    }
+
+    memcpy(mapping, p_data, bytes);
+
+    hvmemul_unmap_linear_addr(mapping, addr, bytes, hvmemul_ctxt);
 
     return X86EMUL_OKAY;
 }
diff --git a/xen/include/asm-x86/hvm/emulate.h b/xen/include/asm-x86/hvm/emulate.h
index 8864775..65efd4e 100644
--- a/xen/include/asm-x86/hvm/emulate.h
+++ b/xen/include/asm-x86/hvm/emulate.h
@@ -37,6 +37,13 @@ struct hvm_emulate_ctxt {
     unsigned long seg_reg_accessed;
     unsigned long seg_reg_dirty;
 
+    /*
+     * MFNs behind temporary mappings in the write callback.  The length is
+     * arbitrary, and can be increased if writes longer than PAGE_SIZE are
+     * needed.
+     */
+    mfn_t mfn[2];
+
     uint32_t intr_shadow;
 
     bool_t set_context;
-- 
2.1.4



* Re: [PATCH 1/6] x86/hvm: Fixes to hvmemul_insn_fetch()
  2017-06-21 15:12 ` [PATCH 1/6] x86/hvm: Fixes to hvmemul_insn_fetch() Andrew Cooper
@ 2017-06-21 16:04   ` Paul Durrant
  2017-06-21 16:15     ` Andrew Cooper
  2017-06-22  8:05   ` Jan Beulich
  1 sibling, 1 reply; 28+ messages in thread
From: Paul Durrant @ 2017-06-21 16:04 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Jan Beulich

> -----Original Message-----
> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> Sent: 21 June 2017 16:12
> To: Xen-devel <xen-devel@lists.xen.org>
> Cc: Andrew Cooper <Andrew.Cooper3@citrix.com>; Jan Beulich
> <JBeulich@suse.com>; Paul Durrant <Paul.Durrant@citrix.com>
> Subject: [PATCH 1/6] x86/hvm: Fixes to hvmemul_insn_fetch()
> 
> Force insn_off to a single byte, as offset can wrap around or truncate with
> respect to hvmemul_ctxt->insn_buf_eip under a number of normal circumstances.
> 
> Furthermore, don't use an ASSERT() for bounds checking the write into
> hvmemul_ctxt->insn_buf[].
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> CC: Jan Beulich <JBeulich@suse.com>
> CC: Paul Durrant <paul.durrant@citrix.com>
> 
> For anyone wondering, this is the way to exploit XSA-186.
> ---
>  xen/arch/x86/hvm/emulate.c | 15 +++++++++++++--
>  1 file changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
> index 11e4aba..495e312 100644
> --- a/xen/arch/x86/hvm/emulate.c
> +++ b/xen/arch/x86/hvm/emulate.c
> @@ -939,7 +939,8 @@ int hvmemul_insn_fetch(
>  {
>      struct hvm_emulate_ctxt *hvmemul_ctxt =
>          container_of(ctxt, struct hvm_emulate_ctxt, ctxt);
> -    unsigned int insn_off = offset - hvmemul_ctxt->insn_buf_eip;
> +    /* Careful, as offset can wrap or truncate WRT insn_buf_eip. */
> +    uint8_t insn_off = offset - hvmemul_ctxt->insn_buf_eip;

Why the change to a uint8_t?

  Paul

> 
>      /*
>       * Fall back if requested bytes are not in the prefetch cache.
> @@ -953,7 +954,17 @@ int hvmemul_insn_fetch(
> 
>          if ( rc == X86EMUL_OKAY && bytes )
>          {
> -            ASSERT(insn_off + bytes <= sizeof(hvmemul_ctxt->insn_buf));
> +            /*
> +             * Will we overflow insn_buf[]?  This shouldn't be able to happen,
> +             * which means something went wrong with instruction decoding...
> +             */
> +            if ( insn_off > sizeof(hvmemul_ctxt->insn_buf) ||
> +                 (insn_off + bytes) > sizeof(hvmemul_ctxt->insn_buf) )
> +            {
> +                ASSERT_UNREACHABLE();
> +                return X86EMUL_UNHANDLEABLE;
> +            }
> +
>              memcpy(&hvmemul_ctxt->insn_buf[insn_off], p_data, bytes);
>              hvmemul_ctxt->insn_buf_bytes = insn_off + bytes;
>          }
> --
> 2.1.4



* Re: [PATCH 1/6] x86/hvm: Fixes to hvmemul_insn_fetch()
  2017-06-21 16:04   ` Paul Durrant
@ 2017-06-21 16:15     ` Andrew Cooper
  2017-06-21 16:49       ` Paul Durrant
  0 siblings, 1 reply; 28+ messages in thread
From: Andrew Cooper @ 2017-06-21 16:15 UTC (permalink / raw)
  To: Paul Durrant, Xen-devel; +Cc: Jan Beulich

On 21/06/17 17:04, Paul Durrant wrote:
>> -----Original Message-----
>> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
>> Sent: 21 June 2017 16:12
>> To: Xen-devel <xen-devel@lists.xen.org>
>> Cc: Andrew Cooper <Andrew.Cooper3@citrix.com>; Jan Beulich
>> <JBeulich@suse.com>; Paul Durrant <Paul.Durrant@citrix.com>
>> Subject: [PATCH 1/6] x86/hvm: Fixes to hvmemul_insn_fetch()
>>
>> Force insn_off to a single byte, as offset can wrap around or truncate with
>> respect to sh_ctxt->insn_buf_eip under a number of normal circumstances.
>>
>> Furthermore, don't use an ASSERT() for bounds checking the write into
>> hvmemul_ctxt->insn_buf[].
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> ---
>> CC: Jan Beulich <JBeulich@suse.com>
>> CC: Paul Durrant <paul.durrant@citrix.com>
>>
>> For anyone wondering, this is way to explot XSA-186
>> ---
>>  xen/arch/x86/hvm/emulate.c | 15 +++++++++++++--
>>  1 file changed, 13 insertions(+), 2 deletions(-)
>>
>> diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
>> index 11e4aba..495e312 100644
>> --- a/xen/arch/x86/hvm/emulate.c
>> +++ b/xen/arch/x86/hvm/emulate.c
>> @@ -939,7 +939,8 @@ int hvmemul_insn_fetch(
>>  {
>>      struct hvm_emulate_ctxt *hvmemul_ctxt =
>>          container_of(ctxt, struct hvm_emulate_ctxt, ctxt);
>> -    unsigned int insn_off = offset - hvmemul_ctxt->insn_buf_eip;
>> +    /* Careful, as offset can wrap or truncate WRT insn_buf_eip. */
>> +    uint8_t insn_off = offset - hvmemul_ctxt->insn_buf_eip;
> Why the change to a uint8_t?

XSA-186 caused problems because offset was truncated at 16 bits, but all
calculations here are performed at 64 bits wide, then truncated down to
32bits wide.  As a result, insn_off could become massively positive.

insn_off needs to be less wide than the minimum truncation width of
incoming parameters for it to work correctly.

Code hitting the emulator can legitimately cause offset to truncate at
32bits WRT EIP, and the only reason we aren't still vulnerable is
because insn_off is unsigned int.  If it were unsigned long, we'd have
another privilege escalation vulnerability.
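
To illustrate with made-up numbers (just a sketch of the arithmetic, not
taken from any particular guest):

    /* 16-bit guest whose IP wraps at 64K mid-fetch. */
    unsigned long insn_buf_eip = 0xffff;  /* start of the cached fetch */
    unsigned long offset       = 0x0001;  /* fetch address after the wrap */

    unsigned int wide   = offset - insn_buf_eip; /* 0xffff0002, "massively positive" */
    uint8_t      narrow = offset - insn_buf_eip; /* 0x02, the byte we actually want */

With insn_off limited to a single byte, the comparisons against
insn_buf_bytes and sizeof(insn_buf) can no longer be confused by a huge
intermediate value; anything outside the cached range just takes the
hvm_read() fallback.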

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 6/6] x86/hvm: Implement hvmemul_write() using real mappings
  2017-06-21 15:12 ` [PATCH 6/6] x86/hvm: Implement hvmemul_write() using real mappings Andrew Cooper
@ 2017-06-21 16:19   ` Paul Durrant
  2017-06-22  9:06   ` Jan Beulich
  1 sibling, 0 replies; 28+ messages in thread
From: Paul Durrant @ 2017-06-21 16:19 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Mihai Donțu, Razvan Cojocaru, Jan Beulich

> -----Original Message-----
> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> Sent: 21 June 2017 16:13
> To: Xen-devel <xen-devel@lists.xen.org>
> Cc: Andrew Cooper <Andrew.Cooper3@citrix.com>; Jan Beulich
> <JBeulich@suse.com>; Paul Durrant <Paul.Durrant@citrix.com>; Razvan
> Cojocaru <rcojocaru@bitdefender.com>; Mihai Donțu
> <mdontu@bitdefender.com>
> Subject: [PATCH 6/6] x86/hvm: Implement hvmemul_write() using real
> mappings
> 
> An access which crosses a page boundary is performed atomically by x86
> hardware, albeit with a severe performance penalty.  An important corner
> case
> is when a straddled access hits two pages which differ in whether a
> translation exists, or in net access rights.
> 
> The use of hvm_copy*() in hvmemul_write() is problematic, because it
> performs
> a translation then completes the partial write, before moving onto the next
> translation.
> 
> If an individual emulated write straddles two pages, the first of which is
> writable, and the second of which is not, the first half of the write will
> complete before #PF is raised from the second half.
> 
> This results in guest state corruption as a side effect of emulation, which
> has been observed to cause windows to crash while under introspection.
> 
> Introduce the hvmemul_{,un}map_linear_addr() helpers, which translate an
> entire contents of a linear access, and vmap() the underlying frames to
> provide a contiguous virtual mapping for the emulator to use.  This is the
> same mechanism as used by the shadow emulation code.
> 
> This will catch any translation issues and abort the emulation before any
> modifications occur.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> CC: Jan Beulich <JBeulich@suse.com>
> CC: Paul Durrant <paul.durrant@citrix.com>
> CC: Razvan Cojocaru <rcojocaru@bitdefender.com>
> CC: Mihai Donțu <mdontu@bitdefender.com>
> 
> While the maximum size of linear mapping is capped at 1 page, the logic in
> the
> helpers is written to work properly as hvmemul_ctxt->mfn[] gets longer,
> specifically with XSAVE instruction emulation in mind.
> 
> This has only had light testing so far.
> ---
>  xen/arch/x86/hvm/emulate.c        | 179
> ++++++++++++++++++++++++++++++++++----
>  xen/include/asm-x86/hvm/emulate.h |   7 ++
>  2 files changed, 169 insertions(+), 17 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
> index 384ad0b..804bea5 100644
> --- a/xen/arch/x86/hvm/emulate.c
> +++ b/xen/arch/x86/hvm/emulate.c
> @@ -498,6 +498,159 @@ static int hvmemul_do_mmio_addr(paddr_t
> mmio_gpa,
>  }
> 
>  /*
> + * Map the frame(s) covering an individual linear access, for writeable
> + * access.  May return NULL for MMIO, or ERR_PTR(~X86EMUL_*) for other
> errors
> + * including ERR_PTR(~X86EMUL_OKAY) for write-discard mappings.
> + *
> + * In debug builds, map() checks that each slot in hvmemul_ctxt->mfn[] is
> + * clean before use, and poisions unused slots with INVALID_MFN.
> + */
> +static void *hvmemul_map_linear_addr(
> +    unsigned long linear, unsigned int bytes, uint32_t pfec,
> +    struct hvm_emulate_ctxt *hvmemul_ctxt)
> +{
> +    struct vcpu *curr = current;
> +    void *err, *mapping;
> +
> +    /* First and final gfns which need mapping. */
> +    unsigned long frame = linear >> PAGE_SHIFT, first = frame;
> +    unsigned long final = (linear + bytes - !!bytes) >> PAGE_SHIFT;

Do we need to worry about linear + bytes overflowing here?

Also, is this ever legitimately called with bytes == 0?

> +
> +    /*
> +     * mfn points to the next free slot.  All used slots have a page reference
> +     * held on them.
> +     */
> +    mfn_t *mfn = &hvmemul_ctxt->mfn[0];
> +
> +    /*
> +     * The caller has no legitimate reason for trying a zero-byte write, but
> +     * final is calculate to fail safe in release builds.
> +     *
> +     * The maximum write size depends on the number of adjacent mfns[]
> which
> +     * can be vmap()'d, accouting for possible misalignment within the region.
> +     * The higher level emulation callers are responsible for ensuring that
> +     * mfns[] is large enough for the requested write size.
> +     */
> +    if ( bytes == 0 ||
> +         final - first > ARRAY_SIZE(hvmemul_ctxt->mfn) - 1 )
> +    {

I guess not, so why the weird looking calculation for final? Its value will not be used when bytes == 0.

> +        ASSERT_UNREACHABLE();
> +        goto unhandleable;
> +    }
> +
> +    do {
> +        enum hvm_translation_result res;
> +        struct page_info *page;
> +        pagefault_info_t pfinfo;
> +        p2m_type_t p2mt;
> +
> +        res = hvm_translate_get_page(curr, frame << PAGE_SHIFT, true, pfec,
> +                                     &pfinfo, &page, NULL, &p2mt);
> +
> +        switch ( res )
> +        {
> +        case HVMTRANS_okay:
> +            break;
> +
> +        case HVMTRANS_bad_linear_to_gfn:
> +            x86_emul_pagefault(pfinfo.ec, pfinfo.linear, &hvmemul_ctxt->ctxt);
> +            err = ERR_PTR(~(long)X86EMUL_EXCEPTION);
> +            goto out;
> +
> +        case HVMTRANS_bad_gfn_to_mfn:
> +            err = NULL;
> +            goto out;
> +
> +        case HVMTRANS_gfn_paged_out:
> +        case HVMTRANS_gfn_shared:
> +            err = ERR_PTR(~(long)X86EMUL_RETRY);
> +            goto out;
> +
> +        default:
> +            goto unhandleable;
> +        }
> +
> +        /* Error checking.  Confirm that the current slot is clean. */
> +        ASSERT(mfn_x(*mfn) == 0);
> +
> +        *mfn++ = _mfn(page_to_mfn(page));
> +        frame++;
> +
> +        if ( p2m_is_discard_write(p2mt) )
> +        {
> +            err = ERR_PTR(~(long)X86EMUL_OKAY);
> +            goto out;
> +        }
> +
> +    } while ( frame < final );
> +
> +    /* Entire access within a single frame? */
> +    if ( first == final )
> +        mapping = map_domain_page(hvmemul_ctxt->mfn[0]) + (linear &
> ~PAGE_MASK);
> +    /* Multiple frames? Need to vmap(). */
> +    else if ( (mapping = vmap(hvmemul_ctxt->mfn,
> +                              mfn - hvmemul_ctxt->mfn)) == NULL )
> +        goto unhandleable;
> +
> +#ifndef NDEBUG /* Poision unused mfn[]s with INVALID_MFN. */
> +    while ( mfn < hvmemul_ctxt->mfn + ARRAY_SIZE(hvmemul_ctxt->mfn) )
> +    {
> +        ASSERT(mfn_x(*mfn) == 0);
> +        *mfn++ = INVALID_MFN;
> +    }
> +#endif
> +
> +    return mapping;
> +
> + unhandleable:
> +    err = ERR_PTR(~(long)X86EMUL_UNHANDLEABLE);
> +
> + out:
> +    /* Drop all held references. */
> +    while ( mfn > hvmemul_ctxt->mfn )
> +        put_page(mfn_to_page(mfn_x(*mfn--)));
> +
> +    return err;
> +}
> +
> +static void hvmemul_unmap_linear_addr(
> +    void *mapping, unsigned long linear, unsigned int bytes,
> +    struct hvm_emulate_ctxt *hvmemul_ctxt)
> +{
> +    struct domain *currd = current->domain;
> +    unsigned long frame = linear >> PAGE_SHIFT;
> +    unsigned long final = (linear + bytes - !!bytes) >> PAGE_SHIFT;
> +    mfn_t *mfn = &hvmemul_ctxt->mfn[0];
> +
> +    ASSERT(bytes > 0);

Why not return if bytes == 0? I know it's not a legitimate call but in a non-debug build it would result in unmap_domain_page() being called below.

  Paul

> +
> +    if ( frame == final )
> +        unmap_domain_page(mapping);
> +    else
> +        vunmap(mapping);
> +
> +    do
> +    {
> +        ASSERT(mfn_valid(*mfn));
> +        paging_mark_dirty(currd, *mfn);
> +        put_page(mfn_to_page(mfn_x(*mfn)));
> +
> +        frame++;
> +        *mfn++ = _mfn(0); /* Clean slot for map()'s error checking. */
> +
> +    } while ( frame < final );
> +
> +
> +#ifndef NDEBUG /* Check (and clean) all unused mfns. */
> +    while ( mfn < hvmemul_ctxt->mfn + ARRAY_SIZE(hvmemul_ctxt->mfn) )
> +    {
> +        ASSERT(mfn_eq(*mfn, INVALID_MFN));
> +        *mfn++ = _mfn(0);
> +    }
> +#endif
> +}
> +
> +/*
>   * Convert addr from linear to physical form, valid over the range
>   * [addr, addr + *reps * bytes_per_rep]. *reps is adjusted according to
>   * the valid computed range. It is always >0 when X86EMUL_OKAY is
> returned.
> @@ -987,11 +1140,11 @@ static int hvmemul_write(
>      struct hvm_emulate_ctxt *hvmemul_ctxt =
>          container_of(ctxt, struct hvm_emulate_ctxt, ctxt);
>      struct vcpu *curr = current;
> -    pagefault_info_t pfinfo;
>      unsigned long addr, reps = 1;
>      uint32_t pfec = PFEC_page_present | PFEC_write_access;
>      struct hvm_vcpu_io *vio = &curr->arch.hvm_vcpu.hvm_io;
>      int rc;
> +    void *mapping;
> 
>      if ( is_x86_system_segment(seg) )
>          pfec |= PFEC_implicit;
> @@ -1007,23 +1160,15 @@ static int hvmemul_write(
>           (vio->mmio_gla == (addr & PAGE_MASK)) )
>          return hvmemul_linear_mmio_write(addr, bytes, p_data, pfec,
> hvmemul_ctxt, 1);
> 
> -    rc = hvm_copy_to_guest_linear(addr, p_data, bytes, pfec, &pfinfo);
> -
> -    switch ( rc )
> -    {
> -    case HVMTRANS_okay:
> -        break;
> -    case HVMTRANS_bad_linear_to_gfn:
> -        x86_emul_pagefault(pfinfo.ec, pfinfo.linear, &hvmemul_ctxt->ctxt);
> -        return X86EMUL_EXCEPTION;
> -    case HVMTRANS_bad_gfn_to_mfn:
> +    mapping = hvmemul_map_linear_addr(addr, bytes, pfec,
> hvmemul_ctxt);
> +    if ( IS_ERR(mapping) )
> +        return ~PTR_ERR(mapping);
> +    else if ( !mapping )
>          return hvmemul_linear_mmio_write(addr, bytes, p_data, pfec,
> hvmemul_ctxt, 0);
> -    case HVMTRANS_gfn_paged_out:
> -    case HVMTRANS_gfn_shared:
> -        return X86EMUL_RETRY;
> -    default:
> -        return X86EMUL_UNHANDLEABLE;
> -    }
> +
> +    memcpy(mapping, p_data, bytes);
> +
> +    hvmemul_unmap_linear_addr(mapping, addr, bytes, hvmemul_ctxt);
> 
>      return X86EMUL_OKAY;
>  }
> diff --git a/xen/include/asm-x86/hvm/emulate.h b/xen/include/asm-
> x86/hvm/emulate.h
> index 8864775..65efd4e 100644
> --- a/xen/include/asm-x86/hvm/emulate.h
> +++ b/xen/include/asm-x86/hvm/emulate.h
> @@ -37,6 +37,13 @@ struct hvm_emulate_ctxt {
>      unsigned long seg_reg_accessed;
>      unsigned long seg_reg_dirty;
> 
> +    /*
> +     * MFNs behind temporary mappings in the write callback.  The length is
> +     * arbitrary, and can be increased if writes longer than PAGE_SIZE are
> +     * needed.
> +     */
> +    mfn_t mfn[2];
> +
>      uint32_t intr_shadow;
> 
>      bool_t set_context;
> --
> 2.1.4

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/6] x86/hvm: Fixes to hvmemul_insn_fetch()
  2017-06-21 16:15     ` Andrew Cooper
@ 2017-06-21 16:49       ` Paul Durrant
  0 siblings, 0 replies; 28+ messages in thread
From: Paul Durrant @ 2017-06-21 16:49 UTC (permalink / raw)
  To: Andrew Cooper, Xen-devel; +Cc: Jan Beulich

> -----Original Message-----
> From: Andrew Cooper
> Sent: 21 June 2017 17:15
> To: Paul Durrant <Paul.Durrant@citrix.com>; Xen-devel <xen-
> devel@lists.xen.org>
> Cc: Jan Beulich <JBeulich@suse.com>
> Subject: Re: [PATCH 1/6] x86/hvm: Fixes to hvmemul_insn_fetch()
> 
> On 21/06/17 17:04, Paul Durrant wrote:
> >> -----Original Message-----
> >> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> >> Sent: 21 June 2017 16:12
> >> To: Xen-devel <xen-devel@lists.xen.org>
> >> Cc: Andrew Cooper <Andrew.Cooper3@citrix.com>; Jan Beulich
> >> <JBeulich@suse.com>; Paul Durrant <Paul.Durrant@citrix.com>
> >> Subject: [PATCH 1/6] x86/hvm: Fixes to hvmemul_insn_fetch()
> >>
> >> Force insn_off to a single byte, as offset can wrap around or truncate with
> >> respect to sh_ctxt->insn_buf_eip under a number of normal
> circumstances.
> >>
> >> Furthermore, don't use an ASSERT() for bounds checking the write into
> >> hvmemul_ctxt->insn_buf[].
> >>
> >> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> >> ---
> >> CC: Jan Beulich <JBeulich@suse.com>
> >> CC: Paul Durrant <paul.durrant@citrix.com>
> >>
> >> For anyone wondering, this is way to explot XSA-186
> >> ---
> >>  xen/arch/x86/hvm/emulate.c | 15 +++++++++++++--
> >>  1 file changed, 13 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
> >> index 11e4aba..495e312 100644
> >> --- a/xen/arch/x86/hvm/emulate.c
> >> +++ b/xen/arch/x86/hvm/emulate.c
> >> @@ -939,7 +939,8 @@ int hvmemul_insn_fetch(
> >>  {
> >>      struct hvm_emulate_ctxt *hvmemul_ctxt =
> >>          container_of(ctxt, struct hvm_emulate_ctxt, ctxt);
> >> -    unsigned int insn_off = offset - hvmemul_ctxt->insn_buf_eip;
> >> +    /* Careful, as offset can wrap or truncate WRT insn_buf_eip. */
> >> +    uint8_t insn_off = offset - hvmemul_ctxt->insn_buf_eip;
> > Why the change to a uint8_t?
> 
> XSA-186 caused problems because offset was truncated at 16 bits, but all
> calculations here are performed at 64 bits wide, then truncated down to
> 32bits wide.  As a result, insn_off could become massively positive.
> 
> insn_off needs to be less wide than the minimum truncation width of
> incoming parameters for it to work correctly.
> 
> Code hitting the emulator can legitimately cause offset to truncate at
> 32bits WRT EIP, and the only reason we aren't still vulnerable is
> because insn_off is unsigned int.  If it were unsigned long, we'd have
> another privilege escalation vulnerability.

Ok.

Reviewed-by: Paul Durrant <paul.durrant@citrix.com>

> 
> ~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/6] x86/hvm: Fixes to hvmemul_insn_fetch()
  2017-06-21 15:12 ` [PATCH 1/6] x86/hvm: Fixes to hvmemul_insn_fetch() Andrew Cooper
  2017-06-21 16:04   ` Paul Durrant
@ 2017-06-22  8:05   ` Jan Beulich
  1 sibling, 0 replies; 28+ messages in thread
From: Jan Beulich @ 2017-06-22  8:05 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Paul Durrant, Xen-devel

>>> On 21.06.17 at 17:12, <andrew.cooper3@citrix.com> wrote:
> Force insn_off to a single byte, as offset can wrap around or truncate with
> respect to sh_ctxt->insn_buf_eip under a number of normal circumstances.
> 
> Furthermore, don't use an ASSERT() for bounds checking the write into
> hvmemul_ctxt->insn_buf[].
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/6] x86/shadow: Fixes to hvm_emulate_insn_fetch()
  2017-06-21 15:12 ` [PATCH 2/6] x86/shadow: Fixes to hvm_emulate_insn_fetch() Andrew Cooper
@ 2017-06-22  8:09   ` Jan Beulich
  2017-06-22 11:52   ` Tim Deegan
  1 sibling, 0 replies; 28+ messages in thread
From: Jan Beulich @ 2017-06-22  8:09 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Tim Deegan, Xen-devel

>>> On 21.06.17 at 17:12, <andrew.cooper3@citrix.com> wrote:
> --- a/xen/arch/x86/mm/shadow/common.c
> +++ b/xen/arch/x86/mm/shadow/common.c
> @@ -235,12 +235,16 @@ hvm_emulate_insn_fetch(enum x86_segment seg,
>  {
>      struct sh_emulate_ctxt *sh_ctxt =
>          container_of(ctxt, struct sh_emulate_ctxt, ctxt);
> -    unsigned int insn_off = offset - sh_ctxt->insn_buf_eip;
> +    /* Careful, as offset can wrap or truncate WRT insn_buf_eip. */
> +    uint8_t insn_off = offset - sh_ctxt->insn_buf_eip;
>  
>      ASSERT(seg == x86_seg_cs);
>  
> -    /* Fall back if requested bytes are not in the prefetch cache. */
> -    if ( unlikely((insn_off + bytes) > sh_ctxt->insn_buf_bytes) )
> +    /*
> +     * Fall back if requested bytes are not in the prefetch cache, but always
> +     * perform the zero-length read for segmentation purposes.
> +     */
> +    if ( !bytes || unlikely((insn_off + bytes) > sh_ctxt->insn_buf_bytes) )
>          return hvm_read(seg, offset, p_data, bytes,
>                          hvm_access_insn_fetch, sh_ctxt);

I only note this now - patch one has an (apparently unnecessary)
insn_off > sizeof() check ahead of the insn_off + bytes > sizeof()
one. If you see a need for it to be there, why don't you put it here
too? In any event I'd like to ask for both pieces of code to match
up in the end (i.e. I'm fine with the adjustment being done on either
side), and then
Reviewed-by: Jan Beulich <jbeulich@suse.com>

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 3/6] x86/shadow: Use ERR_PTR infrastructure for sh_emulate_map_dest()
  2017-06-21 15:12 ` [PATCH 3/6] x86/shadow: Use ERR_PTR infrastructure for sh_emulate_map_dest() Andrew Cooper
@ 2017-06-22  8:14   ` Jan Beulich
  2017-06-22  8:21     ` Andrew Cooper
  2017-06-22 12:09   ` Tim Deegan
  1 sibling, 1 reply; 28+ messages in thread
From: Jan Beulich @ 2017-06-22  8:14 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Tim Deegan, Xen-devel

>>> On 21.06.17 at 17:12, <andrew.cooper3@citrix.com> wrote:
> sh_emulate_map_dest() predates the introduction of the generic ERR_PTR()
> infrasturcture, but take the opportunity to avoid opencoding it.
> 
> The chosen error constants require need to be negative to work with IS_ERR(),
> but no other changes.

Drop one of "require" or "need"?

> --- a/xen/arch/x86/mm/shadow/private.h
> +++ b/xen/arch/x86/mm/shadow/private.h
> @@ -395,10 +395,9 @@ void shadow_unhook_mappings(struct domain *d, mfn_t smfn, int user_only);
>  
>  /* Returns a mapped pointer to write to, or one of the following error
>   * indicators. */
> -#define MAPPING_UNHANDLEABLE ((void *)(unsigned long)X86EMUL_UNHANDLEABLE)
> -#define MAPPING_EXCEPTION    ((void *)(unsigned long)X86EMUL_EXCEPTION)
> -#define MAPPING_SILENT_FAIL  ((void *)(unsigned long)X86EMUL_OKAY)
> -#define sh_emulate_map_dest_failed(rc) ((unsigned long)(rc) <= 3)
> +#define MAPPING_UNHANDLEABLE ERR_PTR(~(long)X86EMUL_UNHANDLEABLE)
> +#define MAPPING_EXCEPTION    ERR_PTR(~(long)X86EMUL_EXCEPTION)
> +#define MAPPING_SILENT_FAIL  ERR_PTR(~(long)X86EMUL_OKAY)

While this doesn't change with your patch, this last definition has a
strict dependency on X86EMUL_OKAY being zero, yet there's no
respective BUILD_BUG_ON() anywhere. Would you mind adding
one at once? In any event
Reviewed-by: Jan Beulich <jbeulich@suse.com>

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 4/6] [RFC] x86/hvm: Rename enum hvm_copy_result to hvm_translation_result
  2017-06-21 15:12 ` [PATCH 4/6] [RFC] x86/hvm: Rename enum hvm_copy_result to hvm_translation_result Andrew Cooper
@ 2017-06-22  8:19   ` Jan Beulich
  0 siblings, 0 replies; 28+ messages in thread
From: Jan Beulich @ 2017-06-22  8:19 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel

>>> On 21.06.17 at 17:12, <andrew.cooper3@citrix.com> wrote:
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> CC: Jan Beulich <JBeulich@suse.com>
> 
> Name definitely open to improvement.

I'm fine with it; hvm_translation_result is a little long, but shouldn't
be used often.

Acked-by: Jan Beulich <jbeulich@suse.com>

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 3/6] x86/shadow: Use ERR_PTR infrastructure for sh_emulate_map_dest()
  2017-06-22  8:14   ` Jan Beulich
@ 2017-06-22  8:21     ` Andrew Cooper
  2017-06-22  9:07       ` Jan Beulich
  0 siblings, 1 reply; 28+ messages in thread
From: Andrew Cooper @ 2017-06-22  8:21 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Tim Deegan, Xen-devel

On 22/06/2017 09:14, Jan Beulich wrote:
>>>> On 21.06.17 at 17:12, <andrew.cooper3@citrix.com> wrote:
>> sh_emulate_map_dest() predates the introduction of the generic ERR_PTR()
>> infrasturcture, but take the opportunity to avoid opencoding it.
>>
>> The chosen error constants require need to be negative to work with IS_ERR(),
>> but no other changes.
> Drop one of "require" or "need"?
>
>> --- a/xen/arch/x86/mm/shadow/private.h
>> +++ b/xen/arch/x86/mm/shadow/private.h
>> @@ -395,10 +395,9 @@ void shadow_unhook_mappings(struct domain *d, mfn_t smfn, int user_only);
>>  
>>  /* Returns a mapped pointer to write to, or one of the following error
>>   * indicators. */
>> -#define MAPPING_UNHANDLEABLE ((void *)(unsigned long)X86EMUL_UNHANDLEABLE)
>> -#define MAPPING_EXCEPTION    ((void *)(unsigned long)X86EMUL_EXCEPTION)
>> -#define MAPPING_SILENT_FAIL  ((void *)(unsigned long)X86EMUL_OKAY)
>> -#define sh_emulate_map_dest_failed(rc) ((unsigned long)(rc) <= 3)
>> +#define MAPPING_UNHANDLEABLE ERR_PTR(~(long)X86EMUL_UNHANDLEABLE)
>> +#define MAPPING_EXCEPTION    ERR_PTR(~(long)X86EMUL_EXCEPTION)
>> +#define MAPPING_SILENT_FAIL  ERR_PTR(~(long)X86EMUL_OKAY)
> While this doesn't change with your patch, this last definition has a
> strict dependency on X86EMUL_OKAY being zero, yet there's no
> respective BUILD_BUG_ON() anywhere. Would you mind adding
> one at once? In any event
> Reviewed-by: Jan Beulich <jbeulich@suse.com>

There is no longer any dependence on X86EMUL_OKAY being 0 (so far as I
can see).  All this logic would work fine if OKAY had the value 10 for
example.

The only requirement (now) is that the X86EMUL constants are all within
MAX_ERRNO so they count as errors as far as the ERR_PTR logic is
concerned, and I suppose those should gain BUILD_BUG_ON()s

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 5/6] x86/hvm: Break out __hvm_copy()'s translation logic
  2017-06-21 15:12 ` [PATCH 5/6] x86/hvm: Break out __hvm_copy()'s translation logic Andrew Cooper
@ 2017-06-22  8:30   ` Jan Beulich
  0 siblings, 0 replies; 28+ messages in thread
From: Jan Beulich @ 2017-06-22  8:30 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel

>>> On 21.06.17 at 17:12, <andrew.cooper3@citrix.com> wrote:
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -3072,6 +3072,80 @@ void hvm_task_switch(
>      hvm_unmap_entry(nptss_desc);
>  }
>  
> +enum hvm_translation_result hvm_translate_get_page(
> +    struct vcpu *v, unsigned long addr, bool linear, uint32_t pfec,
> +    pagefault_info_t *pfinfo, struct page_info **_page,
> +    gfn_t *__gfn, p2m_type_t *_p2mt)

Please avoid leading underscores on parameter names (and even
more so double ones). page_p, gfn_p, and p2mt_p perhaps?

> +{
> +    struct page_info *page;
> +    p2m_type_t p2mt;
> +    gfn_t gfn;
> +
> +    if ( linear )
> +    {
> +        gfn = _gfn(paging_gva_to_gfn(v, addr, &pfec));
> +
> +        if ( gfn_eq(gfn, INVALID_GFN) )
> +        {
> +            if ( pfec & PFEC_page_paged )
> +                return HVMTRANS_gfn_paged_out;
> +
> +            if ( pfec & PFEC_page_shared )
> +                return HVMTRANS_gfn_shared;
> +
> +            if ( pfinfo )
> +            {
> +                pfinfo->linear = addr;
> +                pfinfo->ec = pfec & ~PFEC_implicit;
> +            }
> +
> +            return HVMTRANS_bad_linear_to_gfn;
> +        }
> +    }
> +    else
> +        gfn = _gfn(addr >> PAGE_SHIFT);

As we risk the caller not recognizing that *pfinfo won't be written
to on this "else" path, wouldn't it be a good idea to ASSERT(!pfinfo)
here?

> +    /*
> +     * No need to do the P2M lookup for internally handled MMIO, benefiting
> +     * - 32-bit WinXP (& older Windows) on AMD CPUs for LAPIC accesses,
> +     * - newer Windows (like Server 2012) for HPET accesses.
> +     */
> +    if ( v == current && is_hvm_vcpu(v)

Isn't the is_hvm_vcpu() a stray leftover from the PVHv1 removal,
and hence doesn't need retaining here?

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 6/6] x86/hvm: Implement hvmemul_write() using real mappings
  2017-06-21 15:12 ` [PATCH 6/6] x86/hvm: Implement hvmemul_write() using real mappings Andrew Cooper
  2017-06-21 16:19   ` Paul Durrant
@ 2017-06-22  9:06   ` Jan Beulich
  2017-07-03 15:07     ` Andrew Cooper
       [not found]     ` <594BA4A30200007800165AA5@prv-mh.provo.novell.com>
  1 sibling, 2 replies; 28+ messages in thread
From: Jan Beulich @ 2017-06-22  9:06 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Mihai Donțu, Paul Durrant, Razvan Cojocaru, Xen-devel

>>> On 21.06.17 at 17:12, <andrew.cooper3@citrix.com> wrote:
> --- a/xen/arch/x86/hvm/emulate.c
> +++ b/xen/arch/x86/hvm/emulate.c
> @@ -498,6 +498,159 @@ static int hvmemul_do_mmio_addr(paddr_t mmio_gpa,
>  }
>  
>  /*
> + * Map the frame(s) covering an individual linear access, for writeable
> + * access.  May return NULL for MMIO, or ERR_PTR(~X86EMUL_*) for other errors
> + * including ERR_PTR(~X86EMUL_OKAY) for write-discard mappings.
> + *
> + * In debug builds, map() checks that each slot in hvmemul_ctxt->mfn[] is
> + * clean before use, and poisions unused slots with INVALID_MFN.
> + */
> +static void *hvmemul_map_linear_addr(
> +    unsigned long linear, unsigned int bytes, uint32_t pfec,
> +    struct hvm_emulate_ctxt *hvmemul_ctxt)
> +{
> +    struct vcpu *curr = current;
> +    void *err, *mapping;
> +
> +    /* First and final gfns which need mapping. */
> +    unsigned long frame = linear >> PAGE_SHIFT, first = frame;
> +    unsigned long final = (linear + bytes - !!bytes) >> PAGE_SHIFT;

I second Paul's desire for the zero bytes case to be rejected up
the call stack.

> +    /*
> +     * mfn points to the next free slot.  All used slots have a page reference
> +     * held on them.
> +     */
> +    mfn_t *mfn = &hvmemul_ctxt->mfn[0];
> +
> +    /*
> +     * The caller has no legitimate reason for trying a zero-byte write, but
> +     * final is calculate to fail safe in release builds.
> +     *
> +     * The maximum write size depends on the number of adjacent mfns[] which
> +     * can be vmap()'d, accouting for possible misalignment within the region.
> +     * The higher level emulation callers are responsible for ensuring that
> +     * mfns[] is large enough for the requested write size.
> +     */
> +    if ( bytes == 0 ||
> +         final - first > ARRAY_SIZE(hvmemul_ctxt->mfn) - 1 )

Any reason not to use ">= ARRAY_SIZE(hvmemul_ctxt->mfn)"

> +    {
> +        ASSERT_UNREACHABLE();
> +        goto unhandleable;
> +    }
> +
> +    do {
> +        enum hvm_translation_result res;
> +        struct page_info *page;
> +        pagefault_info_t pfinfo;
> +        p2m_type_t p2mt;
> +
> +        res = hvm_translate_get_page(curr, frame << PAGE_SHIFT, true, pfec,
> +                                     &pfinfo, &page, NULL, &p2mt);
> +
> +        switch ( res )
> +        {
> +        case HVMTRANS_okay:
> +            break;
> +
> +        case HVMTRANS_bad_linear_to_gfn:
> +            x86_emul_pagefault(pfinfo.ec, pfinfo.linear, &hvmemul_ctxt->ctxt);
> +            err = ERR_PTR(~(long)X86EMUL_EXCEPTION);
> +            goto out;
> +
> +        case HVMTRANS_bad_gfn_to_mfn:
> +            err = NULL;
> +            goto out;
> +
> +        case HVMTRANS_gfn_paged_out:
> +        case HVMTRANS_gfn_shared:
> +            err = ERR_PTR(~(long)X86EMUL_RETRY);
> +            goto out;
> +
> +        default:
> +            goto unhandleable;
> +        }
> +
> +        /* Error checking.  Confirm that the current slot is clean. */
> +        ASSERT(mfn_x(*mfn) == 0);

Wouldn't this better be done first thing in the loop? And wouldn't
the value better be INVALID_MFN?

> +        *mfn++ = _mfn(page_to_mfn(page));
> +        frame++;
> +
> +        if ( p2m_is_discard_write(p2mt) )
> +        {
> +            err = ERR_PTR(~(long)X86EMUL_OKAY);
> +            goto out;

If one page is discard-write and the other isn't, this will end up
being wrong.

> +        }
> +
> +    } while ( frame < final );
> +
> +    /* Entire access within a single frame? */
> +    if ( first == final )
> +        mapping = map_domain_page(hvmemul_ctxt->mfn[0]) + (linear & ~PAGE_MASK);
> +    /* Multiple frames? Need to vmap(). */
> +    else if ( (mapping = vmap(hvmemul_ctxt->mfn,
> +                              mfn - hvmemul_ctxt->mfn)) == NULL )

final - first + 1 would likely yield better code.

> +        goto unhandleable;
> +
> +#ifndef NDEBUG /* Poision unused mfn[]s with INVALID_MFN. */
> +    while ( mfn < hvmemul_ctxt->mfn + ARRAY_SIZE(hvmemul_ctxt->mfn) )
> +    {
> +        ASSERT(mfn_x(*mfn) == 0);
> +        *mfn++ = INVALID_MFN;
> +    }
> +#endif
> +
> +    return mapping;
> +
> + unhandleable:
> +    err = ERR_PTR(~(long)X86EMUL_UNHANDLEABLE);
> +
> + out:
> +    /* Drop all held references. */
> +    while ( mfn > hvmemul_ctxt->mfn )
> +        put_page(mfn_to_page(mfn_x(*mfn--)));

ITYM

    while ( mfn-- > hvmemul_ctxt->mfn )
        put_page(mfn_to_page(mfn_x(*mfn)));

or

    while ( mfn > hvmemul_ctxt->mfn )
        put_page(mfn_to_page(mfn_x(*--mfn)));

> +static void hvmemul_unmap_linear_addr(
> +    void *mapping, unsigned long linear, unsigned int bytes,

Both vunmap() and unmap_domain_page() take pointers to const, so
please use const on the pointer here too.

> +    struct hvm_emulate_ctxt *hvmemul_ctxt)

There are upsides and downsides to requiring the caller to pass in the
same values as to map(): You can do more correctness checking
here, but you also risk the caller using the wrong values (perhaps
because of a meanwhile updated local variable). While I don't
outright object to this approach, personally I'd prefer minimal
inputs here, and the code deriving everything from hvmemul_ctxt.

> @@ -1007,23 +1160,15 @@ static int hvmemul_write(
>           (vio->mmio_gla == (addr & PAGE_MASK)) )
>          return hvmemul_linear_mmio_write(addr, bytes, p_data, pfec, hvmemul_ctxt, 1);
>  
> -    rc = hvm_copy_to_guest_linear(addr, p_data, bytes, pfec, &pfinfo);
> -
> -    switch ( rc )
> -    {
> -    case HVMTRANS_okay:
> -        break;
> -    case HVMTRANS_bad_linear_to_gfn:
> -        x86_emul_pagefault(pfinfo.ec, pfinfo.linear, &hvmemul_ctxt->ctxt);
> -        return X86EMUL_EXCEPTION;
> -    case HVMTRANS_bad_gfn_to_mfn:
> +    mapping = hvmemul_map_linear_addr(addr, bytes, pfec, hvmemul_ctxt);
> +    if ( IS_ERR(mapping) )
> +        return ~PTR_ERR(mapping);
> +    else if ( !mapping )

Pointless "else".

>          return hvmemul_linear_mmio_write(addr, bytes, p_data, pfec, hvmemul_ctxt, 0);

Considering the 2nd linear -> guest-phys translation done here, did
you consider having hvmemul_map_linear_addr() obtain and provide
the GFNs?

> --- a/xen/include/asm-x86/hvm/emulate.h
> +++ b/xen/include/asm-x86/hvm/emulate.h
> @@ -37,6 +37,13 @@ struct hvm_emulate_ctxt {
>      unsigned long seg_reg_accessed;
>      unsigned long seg_reg_dirty;
>  
> +    /*
> +     * MFNs behind temporary mappings in the write callback.  The length is
> +     * arbitrary, and can be increased if writes longer than PAGE_SIZE are
> +     * needed.
> +     */
> +    mfn_t mfn[2];

Mind being precise in the comment, saying "PAGE_SIZE+1"?

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 3/6] x86/shadow: Use ERR_PTR infrastructure for sh_emulate_map_dest()
  2017-06-22  8:21     ` Andrew Cooper
@ 2017-06-22  9:07       ` Jan Beulich
  2017-07-03 14:22         ` Andrew Cooper
  0 siblings, 1 reply; 28+ messages in thread
From: Jan Beulich @ 2017-06-22  9:07 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Tim Deegan, Xen-devel

>>> On 22.06.17 at 10:21, <andrew.cooper3@citrix.com> wrote:
> On 22/06/2017 09:14, Jan Beulich wrote:
>>>>> On 21.06.17 at 17:12, <andrew.cooper3@citrix.com> wrote:
>>> sh_emulate_map_dest() predates the introduction of the generic ERR_PTR()
>>> infrasturcture, but take the opportunity to avoid opencoding it.
>>>
>>> The chosen error constants require need to be negative to work with 
> IS_ERR(),
>>> but no other changes.
>> Drop one of "require" or "need"?
>>
>>> --- a/xen/arch/x86/mm/shadow/private.h
>>> +++ b/xen/arch/x86/mm/shadow/private.h
>>> @@ -395,10 +395,9 @@ void shadow_unhook_mappings(struct domain *d, mfn_t 
> smfn, int user_only);
>>>  
>>>  /* Returns a mapped pointer to write to, or one of the following error
>>>   * indicators. */
>>> -#define MAPPING_UNHANDLEABLE ((void *)(unsigned long)X86EMUL_UNHANDLEABLE)
>>> -#define MAPPING_EXCEPTION    ((void *)(unsigned long)X86EMUL_EXCEPTION)
>>> -#define MAPPING_SILENT_FAIL  ((void *)(unsigned long)X86EMUL_OKAY)
>>> -#define sh_emulate_map_dest_failed(rc) ((unsigned long)(rc) <= 3)
>>> +#define MAPPING_UNHANDLEABLE ERR_PTR(~(long)X86EMUL_UNHANDLEABLE)
>>> +#define MAPPING_EXCEPTION    ERR_PTR(~(long)X86EMUL_EXCEPTION)
>>> +#define MAPPING_SILENT_FAIL  ERR_PTR(~(long)X86EMUL_OKAY)
>> While this doesn't change with your patch, this last definition has a
>> strict dependency on X86EMUL_OKAY being zero, yet there's no
>> respective BUILD_BUG_ON() anywhere. Would you mind adding
>> one at once? In any event
>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> 
> There is no longer any dependence on X86EMUL_OKAY being 0 (so far as I
> can see).  All this logic would work fine if OKAY had the value 10 for
> example.

The mapping failure is "silent" only if X86EMUL_OKAY is zero, afaict.

> The only requirement (now) is that the X86EMUL constants are all within
> MAX_ERRNO so they count as errors as far as the ERR_PTR logic is
> concerned, and I suppose those should gain BUILD_BUG_ON()s

That would be helpful too.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/6] x86/shadow: Fixes to hvm_emulate_insn_fetch()
  2017-06-21 15:12 ` [PATCH 2/6] x86/shadow: Fixes to hvm_emulate_insn_fetch() Andrew Cooper
  2017-06-22  8:09   ` Jan Beulich
@ 2017-06-22 11:52   ` Tim Deegan
  1 sibling, 0 replies; 28+ messages in thread
From: Tim Deegan @ 2017-06-22 11:52 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Jan Beulich, Xen-devel

Hi,

At 16:12 +0100 on 21 Jun (1498061548), Andrew Cooper wrote:
> Zero-length reads are jump-target segmentation checks; never serve them from
> the cache.

Why not?  If the target is in the cached range, then it has passed the
segmentation check.  (Or if that's not true then the normal fetch path
needs to be fixed too).

> Force insn_off to a single byte, as offset can wrap around or truncate with
> respect to sh_ctxt->insn_buf_eip under a number of normal circumstances.

Wouldn't it be better to detect that and fall through?  Otherwise we
might return cached bytes by mistake.

Tim.

> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> CC: Tim Deegan <tim@xen.org>
> CC: Jan Beulich <JBeulich@suse.com>
> ---
>  xen/arch/x86/mm/shadow/common.c | 10 +++++++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/xen/arch/x86/mm/shadow/common.c b/xen/arch/x86/mm/shadow/common.c
> index 2e64a77..deea03a 100644
> --- a/xen/arch/x86/mm/shadow/common.c
> +++ b/xen/arch/x86/mm/shadow/common.c
> @@ -235,12 +235,16 @@ hvm_emulate_insn_fetch(enum x86_segment seg,
>  {
>      struct sh_emulate_ctxt *sh_ctxt =
>          container_of(ctxt, struct sh_emulate_ctxt, ctxt);
> -    unsigned int insn_off = offset - sh_ctxt->insn_buf_eip;
> +    /* Careful, as offset can wrap or truncate WRT insn_buf_eip. */
> +    uint8_t insn_off = offset - sh_ctxt->insn_buf_eip;
>  
>      ASSERT(seg == x86_seg_cs);
>  
> -    /* Fall back if requested bytes are not in the prefetch cache. */
> -    if ( unlikely((insn_off + bytes) > sh_ctxt->insn_buf_bytes) )
> +    /*
> +     * Fall back if requested bytes are not in the prefetch cache, but always
> +     * perform the zero-length read for segmentation purposes.
> +     */
> +    if ( !bytes || unlikely((insn_off + bytes) > sh_ctxt->insn_buf_bytes) )
>          return hvm_read(seg, offset, p_data, bytes,
>                          hvm_access_insn_fetch, sh_ctxt);
>  
> -- 
> 2.1.4
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 3/6] x86/shadow: Use ERR_PTR infrastructure for sh_emulate_map_dest()
  2017-06-21 15:12 ` [PATCH 3/6] x86/shadow: Use ERR_PTR infrastructure for sh_emulate_map_dest() Andrew Cooper
  2017-06-22  8:14   ` Jan Beulich
@ 2017-06-22 12:09   ` Tim Deegan
  2017-06-22 12:46     ` Andrew Cooper
  1 sibling, 1 reply; 28+ messages in thread
From: Tim Deegan @ 2017-06-22 12:09 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Jan Beulich, Xen-devel

Hi,

At 16:12 +0100 on 21 Jun (1498061549), Andrew Cooper wrote:
> sh_emulate_map_dest() predates the introduction of the generic ERR_PTR()
> infrasturcture, but take the opportunity to avoid opencoding it.

s/sturct/struct/.

> @@ -4752,8 +4752,8 @@ sh_x86_emulate_write(struct vcpu *v, unsigned long vaddr, void *src,
>          return X86EMUL_UNHANDLEABLE;
>  
>      addr = sh_emulate_map_dest(v, vaddr, bytes, sh_ctxt);
> -    if ( sh_emulate_map_dest_failed(addr) )
> -        return (long)addr;
> +    if ( IS_ERR(addr) )
> +        return ~PTR_ERR(addr);

Using "return ~PTR_ERR(addr)" when the usual idiom is
"return -PTR_ERR(foo)" is a bit subtle.  Still, the code seems to be
correct, so if people prefer it,

Acked-by: Tim Deegan <tim@xen.org>

Cheers,

Tim.


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 3/6] x86/shadow: Use ERR_PTR infrastructure for sh_emulate_map_dest()
  2017-06-22 12:09   ` Tim Deegan
@ 2017-06-22 12:46     ` Andrew Cooper
  0 siblings, 0 replies; 28+ messages in thread
From: Andrew Cooper @ 2017-06-22 12:46 UTC (permalink / raw)
  To: Tim Deegan; +Cc: Jan Beulich, Xen-devel

On 22/06/17 13:09, Tim Deegan wrote:
> Hi,
>
> At 16:12 +0100 on 21 Jun (1498061549), Andrew Cooper wrote:
>> sh_emulate_map_dest() predates the introduction of the generic ERR_PTR()
>> infrasturcture, but take the opportunity to avoid opencoding it.
> s/sturct/struct/.

D'oh - I'm sure you spotted this before.  I will try to actually fix it
this time.

>
>> @@ -4752,8 +4752,8 @@ sh_x86_emulate_write(struct vcpu *v, unsigned long vaddr, void *src,
>>          return X86EMUL_UNHANDLEABLE;
>>  
>>      addr = sh_emulate_map_dest(v, vaddr, bytes, sh_ctxt);
>> -    if ( sh_emulate_map_dest_failed(addr) )
>> -        return (long)addr;
>> +    if ( IS_ERR(addr) )
>> +        return ~PTR_ERR(addr);
> Using "return ~PTR_ERR(addr)" when the usual idiom is
> "return -PTR_ERR(foo)" is a bit subtle.  Still, the code seems to be
> correct, so if people prefer it,
>
> Acked-by: Tim Deegan <tim@xen.org>

It is necessary to use ~ rather than - because of MAPPING_SILENT_FAIL
being X86EMUL_OKAY under the hood.

Short of sliding the X86EMUL_* constants along by 1 (which itself would
cause other chaos, as there are plenty of return value checks against 0
rather than X86EMUL_OKAY), I can't see any other way of making this
function.
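
As a sketch of the round trip (relying only on ERR_PTR()/PTR_ERR() being
plain casts and MAX_ERRNO covering these small constants, per xen/err.h):

    void *addr = MAPPING_SILENT_FAIL;   /* == ERR_PTR(~(long)X86EMUL_OKAY) */

    /* ~rc lies in [-MAX_ERRNO, -1] for any small non-negative rc, so... */
    ASSERT(IS_ERR(addr));

    /* ...and the caller recovers the original constant, whatever it is. */
    ASSERT(~PTR_ERR(addr) == X86EMUL_OKAY);

Had '-' been used instead, ERR_PTR(-X86EMUL_OKAY) would be ERR_PTR(0), i.e.
NULL, which IS_ERR() doesn't treat as an error, so the silent-fail case
would be indistinguishable from a successful mapping.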

All in all, write-discard p2m types are a pain to work with :(

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 3/6] x86/shadow: Use ERR_PTR infrastructure for sh_emulate_map_dest()
  2017-06-22  9:07       ` Jan Beulich
@ 2017-07-03 14:22         ` Andrew Cooper
  2017-07-03 14:30           ` Jan Beulich
  0 siblings, 1 reply; 28+ messages in thread
From: Andrew Cooper @ 2017-07-03 14:22 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Tim Deegan, Xen-devel

On 22/06/17 10:07, Jan Beulich wrote:
>>>> On 22.06.17 at 10:21, <andrew.cooper3@citrix.com> wrote:
>> On 22/06/2017 09:14, Jan Beulich wrote:
>>>>>> On 21.06.17 at 17:12, <andrew.cooper3@citrix.com> wrote:
>>>> sh_emulate_map_dest() predates the introduction of the generic ERR_PTR()
>>>> infrasturcture, but take the opportunity to avoid opencoding it.
>>>>
>>>> The chosen error constants require need to be negative to work with 
>> IS_ERR(),
>>>> but no other changes.
>>> Drop one of "require" or "need"?
>>>
>>>> --- a/xen/arch/x86/mm/shadow/private.h
>>>> +++ b/xen/arch/x86/mm/shadow/private.h
>>>> @@ -395,10 +395,9 @@ void shadow_unhook_mappings(struct domain *d, mfn_t 
>> smfn, int user_only);
>>>>  
>>>>  /* Returns a mapped pointer to write to, or one of the following error
>>>>   * indicators. */
>>>> -#define MAPPING_UNHANDLEABLE ((void *)(unsigned long)X86EMUL_UNHANDLEABLE)
>>>> -#define MAPPING_EXCEPTION    ((void *)(unsigned long)X86EMUL_EXCEPTION)
>>>> -#define MAPPING_SILENT_FAIL  ((void *)(unsigned long)X86EMUL_OKAY)
>>>> -#define sh_emulate_map_dest_failed(rc) ((unsigned long)(rc) <= 3)
>>>> +#define MAPPING_UNHANDLEABLE ERR_PTR(~(long)X86EMUL_UNHANDLEABLE)
>>>> +#define MAPPING_EXCEPTION    ERR_PTR(~(long)X86EMUL_EXCEPTION)
>>>> +#define MAPPING_SILENT_FAIL  ERR_PTR(~(long)X86EMUL_OKAY)
>>> While this doesn't change with your patch, this last definition has a
>>> strict dependency on X86EMUL_OKAY being zero, yet there's no
>>> respective BUILD_BUG_ON() anywhere. Would you mind adding
>>> one at once? In any event
>>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>> There is no longer any dependence on X86EMUL_OKAY being 0 (so far as I
>> can see).  All this logic would work fine if OKAY had the value 10 for
>> example.
> The mapping failure is "silent" only if X86EMUL_OKAY is zero, afaict.

It is silent because MAPPING_SILENT_FAIL is considered an error by the
caller (sh_x86_emulate_{write,cmpxchg}()), which unpacks the result
(turning it into X86EMUL_OKAY) and passes it back into the emulation logic.

This doesn't depend on the absolute value of X86EMUL_OKAY.

>
>> The only requirement (now) is that the X86EMUL constants are all within
>> MAX_ERRNO so they count as errors as far as the ERR_PTR logic is
>> concerned, and I suppose those should gain BUILD_BUG_ON()s
> That would be helpful too.

It's sadly not possible, as ERR_PTR() is a static inline and can't be
evaluated in the context of BUILD_BUG_ON().

The best which is possible (without altering the errptr infrastructure)
is BUILD_BUG_ON(IS_ERR_VALUE(~(long)X86EMUL_*)), but this won't catch a
change which redefines the MAPPING_* constants.
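
Something along these lines, perhaps (untested sketch; note the extra '!',
since BUILD_BUG_ON() fires when its condition is true):

    /* The ERR_PTR() encoding only works while ~X86EMUL_* stays inside the
     * range which IS_ERR_VALUE() treats as an error. */
    BUILD_BUG_ON(!IS_ERR_VALUE(~(long)X86EMUL_OKAY));
    BUILD_BUG_ON(!IS_ERR_VALUE(~(long)X86EMUL_EXCEPTION));
    BUILD_BUG_ON(!IS_ERR_VALUE(~(long)X86EMUL_UNHANDLEABLE));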

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 3/6] x86/shadow: Use ERR_PTR infrastructure for sh_emulate_map_dest()
  2017-07-03 14:22         ` Andrew Cooper
@ 2017-07-03 14:30           ` Jan Beulich
  0 siblings, 0 replies; 28+ messages in thread
From: Jan Beulich @ 2017-07-03 14:30 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Tim Deegan, Xen-devel

>>> On 03.07.17 at 16:22, <andrew.cooper3@citrix.com> wrote:
> On 22/06/17 10:07, Jan Beulich wrote:
>>>>> On 22.06.17 at 10:21, <andrew.cooper3@citrix.com> wrote:
>>> On 22/06/2017 09:14, Jan Beulich wrote:
>>>>>>> On 21.06.17 at 17:12, <andrew.cooper3@citrix.com> wrote:
>>>>> sh_emulate_map_dest() predates the introduction of the generic ERR_PTR()
>>>>> infrasturcture, but take the opportunity to avoid opencoding it.
>>>>>
>>>>> The chosen error constants require need to be negative to work with 
>>> IS_ERR(),
>>>>> but no other changes.
>>>> Drop one of "require" or "need"?
>>>>
>>>>> --- a/xen/arch/x86/mm/shadow/private.h
>>>>> +++ b/xen/arch/x86/mm/shadow/private.h
>>>>> @@ -395,10 +395,9 @@ void shadow_unhook_mappings(struct domain *d, mfn_t 
>>> smfn, int user_only);
>>>>>  
>>>>>  /* Returns a mapped pointer to write to, or one of the following error
>>>>>   * indicators. */
>>>>> -#define MAPPING_UNHANDLEABLE ((void *)(unsigned long)X86EMUL_UNHANDLEABLE)
>>>>> -#define MAPPING_EXCEPTION    ((void *)(unsigned long)X86EMUL_EXCEPTION)
>>>>> -#define MAPPING_SILENT_FAIL  ((void *)(unsigned long)X86EMUL_OKAY)
>>>>> -#define sh_emulate_map_dest_failed(rc) ((unsigned long)(rc) <= 3)
>>>>> +#define MAPPING_UNHANDLEABLE ERR_PTR(~(long)X86EMUL_UNHANDLEABLE)
>>>>> +#define MAPPING_EXCEPTION    ERR_PTR(~(long)X86EMUL_EXCEPTION)
>>>>> +#define MAPPING_SILENT_FAIL  ERR_PTR(~(long)X86EMUL_OKAY)
>>>> While this doesn't change with your patch, this last definition has a
>>>> strict dependency on X86EMUL_OKAY being zero, yet there's no
>>>> respective BUILD_BUG_ON() anywhere. Would you mind adding
>>>> one at once? In any event
>>>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>>> There is no longer any dependence on X86EMUL_OKAY being 0 (so far as I
>>> can see).  All this logic would work fine if OKAY had the value 10 for
>>> example.
>> The mapping failure is "silent" only if X86EMUL_OKAY is zero, afaict.
> 
> It is silent because MAPPING_SILENT_FAIL is considered an error by the
> caller (sh_x86_emulate_{write,cmpxchg}()), which unpacks the result
> (turning it into X86EMUL_OKAY) and passes it back into the emulation logic.
> 
> This doesn't depend on the absolute value of X86EMUL_OKAY.

Oh, indeed. R-b then is unconditional.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 6/6] x86/hvm: Implement hvmemul_write() using real mappings
  2017-06-22  9:06   ` Jan Beulich
@ 2017-07-03 15:07     ` Andrew Cooper
  2017-07-03 16:06       ` Jan Beulich
       [not found]     ` <594BA4A30200007800165AA5@prv-mh.provo.novell.com>
  1 sibling, 1 reply; 28+ messages in thread
From: Andrew Cooper @ 2017-07-03 15:07 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Mihai Donțu, Paul Durrant, Razvan Cojocaru, Xen-devel

On 22/06/17 10:06, Jan Beulich wrote:
>
>> +    /*
>> +     * mfn points to the next free slot.  All used slots have a page reference
>> +     * held on them.
>> +     */
>> +    mfn_t *mfn = &hvmemul_ctxt->mfn[0];
>> +
>> +    /*
>> +     * The caller has no legitimate reason for trying a zero-byte write, but
>> +     * final is calculate to fail safe in release builds.
>> +     *
>> +     * The maximum write size depends on the number of adjacent mfns[] which
>> +     * can be vmap()'d, accouting for possible misalignment within the region.
>> +     * The higher level emulation callers are responsible for ensuring that
>> +     * mfns[] is large enough for the requested write size.
>> +     */
>> +    if ( bytes == 0 ||
>> +         final - first > ARRAY_SIZE(hvmemul_ctxt->mfn) - 1 )
> Any reason not to use ">= ARRAY_SIZE(hvmemul_ctxt->mfn)"

It more obviously identifies the accounting for misalignment.

>
>> +    {
>> +        ASSERT_UNREACHABLE();
>> +        goto unhandleable;
>> +    }
>> +
>> +    do {
>> +        enum hvm_translation_result res;
>> +        struct page_info *page;
>> +        pagefault_info_t pfinfo;
>> +        p2m_type_t p2mt;
>> +
>> +        res = hvm_translate_get_page(curr, frame << PAGE_SHIFT, true, pfec,
>> +                                     &pfinfo, &page, NULL, &p2mt);
>> +
>> +        switch ( res )
>> +        {
>> +        case HVMTRANS_okay:
>> +            break;
>> +
>> +        case HVMTRANS_bad_linear_to_gfn:
>> +            x86_emul_pagefault(pfinfo.ec, pfinfo.linear, &hvmemul_ctxt->ctxt);
>> +            err = ERR_PTR(~(long)X86EMUL_EXCEPTION);
>> +            goto out;
>> +
>> +        case HVMTRANS_bad_gfn_to_mfn:
>> +            err = NULL;
>> +            goto out;
>> +
>> +        case HVMTRANS_gfn_paged_out:
>> +        case HVMTRANS_gfn_shared:
>> +            err = ERR_PTR(~(long)X86EMUL_RETRY);
>> +            goto out;
>> +
>> +        default:
>> +            goto unhandleable;
>> +        }
>> +
>> +        /* Error checking.  Confirm that the current slot is clean. */
>> +        ASSERT(mfn_x(*mfn) == 0);
> Wouldn't this better be done first thing in the loop?

IMO it's clearer to keep it next to the assignment, but yes, could in
principle move to the top of the loop.

> And wouldn't the value better be INVALID_MFN?

The backing array is zeroed by hvm_emulate_init_once(), so relying on 0
here is more convenient.

Furthermore, INVALID_MFN is used lower down to poison unused slots, so
initialising the whole array to INVALID_MFN reduces the effectiveness of
the checks.

>
>> +        *mfn++ = _mfn(page_to_mfn(page));
>> +        frame++;
>> +
>> +        if ( p2m_is_discard_write(p2mt) )
>> +        {
>> +            err = ERR_PTR(~(long)X86EMUL_OKAY);
>> +            goto out;
> If one page is discard-write and the other isn't, this will end up
> being wrong.

Straddled accesses are always a grey area, and discard-write is an extra
special case which only exists inside Xen.  Discard-write means that the
guest knows that it shouldn't write there at all.

Doing nothing (by logically extending the discard-write restriction over
the entire region) is the least bad option here, IMO.

>> +        goto unhandleable;
>> +
>> +#ifndef NDEBUG /* Poision unused mfn[]s with INVALID_MFN. */
>> +    while ( mfn < hvmemul_ctxt->mfn + ARRAY_SIZE(hvmemul_ctxt->mfn) )
>> +    {
>> +        ASSERT(mfn_x(*mfn) == 0);
>> +        *mfn++ = INVALID_MFN;
>> +    }
>> +#endif
>> +
>> +    return mapping;
>> +
>> + unhandleable:
>> +    err = ERR_PTR(~(long)X86EMUL_UNHANDLEABLE);
>> +
>> + out:
>> +    /* Drop all held references. */
>> +    while ( mfn > hvmemul_ctxt->mfn )
>> +        put_page(mfn_to_page(mfn_x(*mfn--)));
> ITYM
>
>     while ( mfn-- > hvmemul_ctxt->mfn )
>         put_page(mfn_to_page(mfn_x(*mfn)));
>
> or
>
>     while ( mfn > hvmemul_ctxt->mfn )
>         put_page(mfn_to_page(mfn_x(*--mfn)));

Yes, I do.

>
>> +static void hvmemul_unmap_linear_addr(
>> +    void *mapping, unsigned long linear, unsigned int bytes,
> Both vunmap() and unmap_domain_page() take pointers to const, so
> please use const on the pointer here too.

The meaning of const void *p in C is "this function does not modify the
content pointed to by p".

Both vunmap() and unmap_domain_page() mutate the content being pointed
to, so should not take const pointers.

>
>> +    struct hvm_emulate_ctxt *hvmemul_ctxt)
> There are upsides and downsides to requiring the caller to pass in the
> same values as to map(): You can do more correctness checking
> here, but you also risk the caller using the wrong values (perhaps
> because of a meanwhile updated local variable). While I don't
> outright object to this approach, personally I'd prefer minimal
> inputs here, and the code deriving everything from hvmemul_ctxt.

I'm not sure exactly how we might wish to extend this logic.  Are we
ever going to want more than one active mapping at once (perhaps rep
movs emulation across two ram regions)?

The other reason is that in the release builds, even if we stored the
pointer in hvmemul_ctxt, we still can't determine which unmapping
function to use without linear and size.

>
>> @@ -1007,23 +1160,15 @@ static int hvmemul_write(
>>           (vio->mmio_gla == (addr & PAGE_MASK)) )
>>          return hvmemul_linear_mmio_write(addr, bytes, p_data, pfec, hvmemul_ctxt, 1);
>>  
>> -    rc = hvm_copy_to_guest_linear(addr, p_data, bytes, pfec, &pfinfo);
>> -
>> -    switch ( rc )
>> -    {
>> -    case HVMTRANS_okay:
>> -        break;
>> -    case HVMTRANS_bad_linear_to_gfn:
>> -        x86_emul_pagefault(pfinfo.ec, pfinfo.linear, &hvmemul_ctxt->ctxt);
>> -        return X86EMUL_EXCEPTION;
>> -    case HVMTRANS_bad_gfn_to_mfn:
>> +    mapping = hvmemul_map_linear_addr(addr, bytes, pfec, hvmemul_ctxt);
>> +    if ( IS_ERR(mapping) )
>> +        return ~PTR_ERR(mapping);
>> +    else if ( !mapping )
> Pointless "else".
>
>>          return hvmemul_linear_mmio_write(addr, bytes, p_data, pfec, hvmemul_ctxt, 0);
> Considering the 2nd linear -> guest-phys translation done here, did
> you consider having hvmemul_map_linear_addr() obtain and provide
> the GFNs?

Paul has some plans which should remove this second entry into the MMIO
handling code.

>
>> --- a/xen/include/asm-x86/hvm/emulate.h
>> +++ b/xen/include/asm-x86/hvm/emulate.h
>> @@ -37,6 +37,13 @@ struct hvm_emulate_ctxt {
>>      unsigned long seg_reg_accessed;
>>      unsigned long seg_reg_dirty;
>>  
>> +    /*
>> +     * MFNs behind temporary mappings in the write callback.  The length is
>> +     * arbitrary, and can be increased if writes longer than PAGE_SIZE are
>> +     * needed.
>> +     */
>> +    mfn_t mfn[2];
> Mind being precise in the comment, saying "PAGE_SIZE+1"?

While that is strictly true, it is not the behaviour which the map()
function takes.  I don't think it is worth the overhead of fixing that
boundary condition for one extra byte, at which point the documentation
should match the implementation.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 6/6] x86/hvm: Implement hvmemul_write() using real mappings
  2017-07-03 15:07     ` Andrew Cooper
@ 2017-07-03 16:06       ` Jan Beulich
  0 siblings, 0 replies; 28+ messages in thread
From: Jan Beulich @ 2017-07-03 16:06 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Mihai Donțu, Paul Durrant, Razvan Cojocaru, Xen-devel

>>> On 03.07.17 at 17:07, <andrew.cooper3@citrix.com> wrote:
> On 22/06/17 10:06, Jan Beulich wrote:
>>> +    {
>>> +        ASSERT_UNREACHABLE();
>>> +        goto unhandleable;
>>> +    }
>>> +
>>> +    do {
>>> +        enum hvm_translation_result res;
>>> +        struct page_info *page;
>>> +        pagefault_info_t pfinfo;
>>> +        p2m_type_t p2mt;
>>> +
>>> +        res = hvm_translate_get_page(curr, frame << PAGE_SHIFT, true, pfec,
>>> +                                     &pfinfo, &page, NULL, &p2mt);
>>> +
>>> +        switch ( res )
>>> +        {
>>> +        case HVMTRANS_okay:
>>> +            break;
>>> +
>>> +        case HVMTRANS_bad_linear_to_gfn:
>>> +            x86_emul_pagefault(pfinfo.ec, pfinfo.linear, &hvmemul_ctxt->ctxt);
>>> +            err = ERR_PTR(~(long)X86EMUL_EXCEPTION);
>>> +            goto out;
>>> +
>>> +        case HVMTRANS_bad_gfn_to_mfn:
>>> +            err = NULL;
>>> +            goto out;
>>> +
>>> +        case HVMTRANS_gfn_paged_out:
>>> +        case HVMTRANS_gfn_shared:
>>> +            err = ERR_PTR(~(long)X86EMUL_RETRY);
>>> +            goto out;
>>> +
>>> +        default:
>>> +            goto unhandleable;
>>> +        }
>>> +
>>> +        /* Error checking.  Confirm that the current slot is clean. */
>>> +        ASSERT(mfn_x(*mfn) == 0);
>> Wouldn't this better be done first thing in the loop?
> 
> IMO it's clearer to keep it next to the assignment, but yes, could in
> principle move to the top of the loop.
> 
>> And wouldn't the value better be INVALID_MFN?
> 
> The backing array is zeroed by hvm_emulate_init_once(), so relying on 0
> here is more convenient.
> 
> Furthermore, INVALID_MFN is used lower down to poison unused slots, so
> initialising the whole array to INVALID_MFN reduces the effectiveness of
> the checks.

Well, further down I had implicitly asked whether some of the
checks wouldn't better go away (as they can't be carried out
with some of the redundant unmap inputs dropped).

>>> +        *mfn++ = _mfn(page_to_mfn(page));
>>> +        frame++;
>>> +
>>> +        if ( p2m_is_discard_write(p2mt) )
>>> +        {
>>> +            err = ERR_PTR(~(long)X86EMUL_OKAY);
>>> +            goto out;
>> If one page is discard-write and the other isn't, this will end up
>> being wrong.
> 
> Straddled accesses are always a grey area, and discard-write is an extra
> special case which only exists inside Xen.  Discard-write means that the
> guest knows that it shouldn't write there at all.

Is it the case that the guest knows? Iirc this type had been
introduced for introspection tool use.

> Doing nothing (by logically extending the discard-write restriction over
> the entire region) is the least bad option here, IMO.

Okay, that's a reasonable argument. I'd suggest saying so
explicitly in a comment, though.

>>> +static void hvmemul_unmap_linear_addr(
>>> +    void *mapping, unsigned long linear, unsigned int bytes,
>> Both vunmap() and unmap_domain_page() take pointers to const, so
>> please use const on the pointer here too.
> 
> The meaning of const void *p in C is "this function does not modify the
> content pointed to by p".
> 
> Both vunmap() and unmap_domain_page() mutate the content being pointed
> to, so should not take const pointers.

We've had this discussion before, and I continue to take a
different position: When you free a memory block, you implicitly
declare its contents undefined. An undefined modification to
undefined contents still yields undefined. So no actual change.
Furthermore, it is not a given that a freeing routine actually
touches the memory block handed to it at all - that's an internal
implementation detail of the allocator.

Plus - not having the parameters const means you can't
allocate and initialize something in one go, and then store or
pass around a pointer to it which is const-qualified to make
clear the contents of that memory block aren't supposed to be
further modified (until de-allocation).
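
For instance (a contrived sketch - struct foo, init_foo() and use_foo()
are made up purely for illustration):

    struct foo *raw = xmalloc(struct foo);

    init_foo(raw);                 /* set the object up once... */

    const struct foo *foo = raw;   /* ...then hand out only const views. */
    use_foo(foo);

    /*
     * With a non-const freeing routine, the caller has to keep the
     * mutable alias "raw" around (or cast away the const) purely in
     * order to release the object again.
     */
    xfree(raw);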

>>> +    struct hvm_emulate_ctxt *hvmemul_ctxt)
>> There are upsides and downsides to requiring the caller to pass in the
>> same values as to map(): You can do more correctness checking
>> here, but you also risk the caller using the wrong values (perhaps
>> because of a meanwhile updated local variable). While I don't
>> outright object to this approach, personally I'd prefer minimal
>> inputs here, and the code deriving everything from hvmemul_ctxt.
> 
> I'm not sure exactly how we might wish to extend this logic.  Are we
> ever going to want more than one active mapping at once (perhaps rep
> movs emulation across two ram regions)?

I could imagine even more than two regions - from an abstract
perspective it may be possible / helpful to have a mapping of
each piece of memory a single instruction accesses, along the
lines of the minimal number of architecturally guaranteed TLB
entries on ia64 in order to execute any possible x86 insn.

> The other reason is that in the release builds, even if we stored the
> pointer in hvmemul_ctxt, we still can't determine which unmapping
> function to use without linear and size.

I don't understand - all you pass on is "mapping". And whether to
unmap a single or two pages could be told from hvmemul_ctxt->mfn[1]
(not) being INVALID_MFN.
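
Something along these lines, i.e. no inputs beyond the context (just a
sketch, and it assumes unmap() re-poisons the slots so that they are
meaningful again on the next call):

    if ( mfn_eq(hvmemul_ctxt->mfn[1], INVALID_MFN) )
        unmap_domain_page(mapping);
    else
        vunmap(mapping);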

>>> --- a/xen/include/asm-x86/hvm/emulate.h
>>> +++ b/xen/include/asm-x86/hvm/emulate.h
>>> @@ -37,6 +37,13 @@ struct hvm_emulate_ctxt {
>>>      unsigned long seg_reg_accessed;
>>>      unsigned long seg_reg_dirty;
>>>  
>>> +    /*
>>> +     * MFNs behind temporary mappings in the write callback.  The length is
>>> +     * arbitrary, and can be increased if writes longer than PAGE_SIZE are
>>> +     * needed.
>>> +     */
>>> +    mfn_t mfn[2];
>> Mind being precise in the comment, saying "PAGE_SIZE+1"?
> 
> While that is strictly true, it is not the behaviour which the map()
> function takes.  I don't think it is worth the overhead of fixing that
> boundary condition for one extra byte, at which point the documentation
> should match the implementation.

I don't understand your reply - with today's map()
implementation, any PAGE_SIZE+1 range can be mapped, with
the extremes being one mapping just one byte and the other
mapping an entire page. But perhaps I'm simply not understanding
what map() behavior you're thinking of.
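
To spell out the arithmetic (a sketch, not quoting the patch):

    nr_frames = ((linear + bytes - 1) >> PAGE_SHIFT) -
                (linear >> PAGE_SHIFT) + 1;

    /*
     * With bytes <= PAGE_SIZE + 1 this can never exceed 2; for 4k pages:
     *   linear = 0xfff, bytes = 0x1001 -> 1 byte in page 0, 0x1000 bytes in page 1
     *   linear = 0x000, bytes = 0x1001 -> 0x1000 bytes in page 0, 1 byte in page 1
     */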

Jan

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 6/6] x86/hvm: Implement hvmemul_write() using real mappings
       [not found]         ` <595A879202000078001680B7@prv-mh.provo.novell.com>
@ 2017-07-03 17:24           ` Andrew Cooper
  2017-07-04  7:10             ` Jan Beulich
  0 siblings, 1 reply; 28+ messages in thread
From: Andrew Cooper @ 2017-07-03 17:24 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Mihai Donțu, Paul Durrant, Razvan Cojocaru, Xen-devel

On 03/07/17 17:06, Jan Beulich wrote:
>>>> On 03.07.17 at 17:07, <andrew.cooper3@citrix.com> wrote:
>> On 22/06/17 10:06, Jan Beulich wrote:
>>>> +    {
>>>> +        ASSERT_UNREACHABLE();
>>>> +        goto unhandleable;
>>>> +    }
>>>> +
>>>> +    do {
>>>> +        enum hvm_translation_result res;
>>>> +        struct page_info *page;
>>>> +        pagefault_info_t pfinfo;
>>>> +        p2m_type_t p2mt;
>>>> +
>>>> +        res = hvm_translate_get_page(curr, frame << PAGE_SHIFT, true, pfec,
>>>> +                                     &pfinfo, &page, NULL, &p2mt);
>>>> +
>>>> +        switch ( res )
>>>> +        {
>>>> +        case HVMTRANS_okay:
>>>> +            break;
>>>> +
>>>> +        case HVMTRANS_bad_linear_to_gfn:
>>>> +            x86_emul_pagefault(pfinfo.ec, pfinfo.linear, &hvmemul_ctxt->ctxt);
>>>> +            err = ERR_PTR(~(long)X86EMUL_EXCEPTION);
>>>> +            goto out;
>>>> +
>>>> +        case HVMTRANS_bad_gfn_to_mfn:
>>>> +            err = NULL;
>>>> +            goto out;
>>>> +
>>>> +        case HVMTRANS_gfn_paged_out:
>>>> +        case HVMTRANS_gfn_shared:
>>>> +            err = ERR_PTR(~(long)X86EMUL_RETRY);
>>>> +            goto out;
>>>> +
>>>> +        default:
>>>> +            goto unhandleable;
>>>> +        }
>>>> +
>>>> +        /* Error checking.  Confirm that the current slot is clean. */
>>>> +        ASSERT(mfn_x(*mfn) == 0);
>>> Wouldn't this better be done first thing in the loop?
>> IMO it's clearer to keep it next to the assignment, but yes, could in
>> principle move to the top of the loop.
>>
>>> And wouldn't the value better be INVALID_MFN?
>> The backing array is zeroed by hvm_emulate_init_once(), so relying on 0
>> here is more convenient.
>>
>> Furthermore, INVALID_MFN is used lower down to poison unused slots, so
>> initialising the whole array to INVALID_MFN reduces the effectiveness of
>> the checks.
> Well, further down I had implicitly asked whether some of the
> checks wouldn't better go away (as they can't be carried out
> with some of the redundant unmap inputs dropped).
>
>>>> +        *mfn++ = _mfn(page_to_mfn(page));
>>>> +        frame++;
>>>> +
>>>> +        if ( p2m_is_discard_write(p2mt) )
>>>> +        {
>>>> +            err = ERR_PTR(~(long)X86EMUL_OKAY);
>>>> +            goto out;
>>> If one page is discard-write and the other isn't, this will end up
>>> being wrong.
>> Straddled accesses are always a grey area, and discard-write is an extra
>> special case which only exists inside Xen.  Discard-write means that the
>> guest knows that it shouldn't write there at all.
> Is it the case that the guest knows? Iirc this type had been
> introduced for introspection tool use.

This is for Intel GVT-g (c/s 1c02cce0ed).

Introspection would go very wrong if selective writes had no effect.

>
>> Doing nothing (by logically extending the discard-write restriction over
>> the entire region) is the least bad option here, IMO.
> Okay, that's a reasonable argument. I'd suggest saying so
> explicitly in a comment, though.

I will add a comment.

>
>>>> +static void hvmemul_unmap_linear_addr(
>>>> +    void *mapping, unsigned long linear, unsigned int bytes,
>>> Both vunmap() and unmap_domain_page() take pointers to const, so
>>> please use const on the pointer here too.
>> The meaning of const void *p in C is "this function does not modify the
>> content pointed to by p".
>>
>> Both vunmap() and unmap_domain_page() mutate the content being pointed
>> to, so should not take const pointers.
> We've had this discussion before, and I continue to take a
> different position: When you free a memory block, you implicitly
> declare its contents undefined.

In C's view, the object explicitly becomes undefined, as its lifetime
has come to an end, but this is a well defined operation.

> An undefined modification to undefined contents still yields undefined.

The precondition to this statement is false.

Before the unmap() call, the object and its contents are well defined.

By using a function with a const pointer, the contract of the function
says the object doesn't change, and therefore, its content doesn't change.

Therefore, the object and its contents are still well defined after the
call.


This final statement is obviously false, because we blew away the
pagetables under the mapping.  The contradiction exists because of the
use of const pointer in the first place.

>  So no actual change.
> Furthermore, it is not a given that a freeing routine actually
> touches the memory block handed to it at all - that's an internal
> implementation detail of the allocator.
>
> Plus - not having the parameters const means you can't
> allocate and initialize something in one go, and then store or
> pass around a pointer to it which is const-qualified to make
> clear the contents of that memory block aren't supposed to be
> further modified (until de-allocation).

I don't see this as a bad thing.  You can't do it with malloc()/free().

>
>>>> +    struct hvm_emulate_ctxt *hvmemul_ctxt)
>>> There are upsides and downsides to requiring the caller to pass in the
>>> same values as to map(): You can do more correctness checking
>>> here, but you also risk the caller using the wrong values (perhaps
>>> because of a meanwhile updated local variable). While I don't
>>> outright object to this approach, personally I'd prefer minimal
>>> inputs here, and the code deriving everything from hvmemul_ctxt.
>> I'm not sure exactly how we might wish to extend this logic.  Are we
>> ever going to want more than one active mapping at once (perhaps rep
>> movs emulation across two ram regions)?
> I could imagine even more than two regions - from an abstract
> perspective it may be possible / helpful to have a mapping of
> each piece of memory a single instruction accesses, along the
> lines of the minimal number of architecturally guaranteed TLB
> entries on ia64 in order to execute any possible x86 insn.

A further problem I've just realised is the use of multiple ->write()
calls for a single instruction in x86_emulate().  The preceding writes
obviously can't get backed out in the case of the second write taking a
fault, which will cause the emulator to exhibit non-architectural
behaviour similar to what this patch is trying to fix.
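
Take a far call as the obvious example (illustrative only - this is not
the real x86_emulate() code, and the stack adjustment is elided):

    /* Push CS, then push the return EIP, as two separate writes. */
    rc = ops->write(x86_seg_ss, sp - 4, &sel, 4, ctxt);
    if ( rc != X86EMUL_OKAY )
        return rc;

    rc = ops->write(x86_seg_ss, sp - 8, &eip, 4, ctxt);
    if ( rc != X86EMUL_OKAY )
        return rc;      /* the first push has already hit memory */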

>
>> The other reason is that in the release builds, even if we stored the
>> pointer in hvmemul_ctxt, we still can't determine which unmapping
>> function to use without linear and size.
> I don't understand - all you pass on is "mapping". And whether to
> unmap a single or two pages could be told from hvmemul_ctxt->mfn[1]
> (not) being INVALID_MFN.

That isn't safe outside of the debug builds.

You could in principle use mfn[1] != 0 in release builds, if it weren't
for the fact that hvmemul_ctxt could be used for multiple ->write()
calls.  In the case of a straddled write followed by a non-straddled
write, the second unmap() would evaluate incorrectly.

>
>>>> --- a/xen/include/asm-x86/hvm/emulate.h
>>>> +++ b/xen/include/asm-x86/hvm/emulate.h
>>>> @@ -37,6 +37,13 @@ struct hvm_emulate_ctxt {
>>>>      unsigned long seg_reg_accessed;
>>>>      unsigned long seg_reg_dirty;
>>>>  
>>>> +    /*
>>>> +     * MFNs behind temporary mappings in the write callback.  The length is
>>>> +     * arbitrary, and can be increased if writes longer than PAGE_SIZE are
>>>> +     * needed.
>>>> +     */
>>>> +    mfn_t mfn[2];
>>> Mind being precise in the comment, saying "PAGE_SIZE+1"?
>> While that is strictly true, it is not the behaviour which the map()
>> function takes.  I don't think it is worth the overhead of fixing that
>> boundary condition for one extra byte, at which point the documentation
>> should match the implementation.
> I don't understand your reply - with today's map()
> implementation, any PAGE_SIZE+1 range can be mapped, with
> the extremes being one mapping just one byte and the other
> mapping an entire page. But perhaps I'm simply not understanding
> what map() behavior you're thinking of.

Oh so it can.  I'd forgotten that I'd changed the implementation.

~Andrew

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 6/6] x86/hvm: Implement hvmemul_write() using real mappings
  2017-07-03 17:24           ` Andrew Cooper
@ 2017-07-04  7:10             ` Jan Beulich
  0 siblings, 0 replies; 28+ messages in thread
From: Jan Beulich @ 2017-07-04  7:10 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Mihai Donțu, Paul Durrant, Razvan Cojocaru, Xen-devel

>>> On 03.07.17 at 19:24, <andrew.cooper3@citrix.com> wrote:
> On 03/07/17 17:06, Jan Beulich wrote:
>>>>> On 03.07.17 at 17:07, <andrew.cooper3@citrix.com> wrote:
>>> On 22/06/17 10:06, Jan Beulich wrote:
>>>>> +        *mfn++ = _mfn(page_to_mfn(page));
>>>>> +        frame++;
>>>>> +
>>>>> +        if ( p2m_is_discard_write(p2mt) )
>>>>> +        {
>>>>> +            err = ERR_PTR(~(long)X86EMUL_OKAY);
>>>>> +            goto out;
>>>> If one page is discard-write and the other isn't, this will end up
>>>> being wrong.
>>> Straddled accesses are always a grey area, and discard-write is an extra
>>> special case which only exists inside Xen.  Discard-write means that the
>>> guest knows that it shouldn't write there at all.
>> Is it the case that the guest knows? Iirc this type had been
>> introduced for introspection tool use.
> 
> This is for Intel GVT-g (c/s 1c02cce0ed).

Oh, I've mixed this up with the hvmemul_write_discard() & Co,
which ...

> Introspection would go very wrong if selective writes had no effect.

... are introspection specific, so I'm afraid you're not entirely right
with this statement.

>> Plus - not having the parameters const means you can't
>> allocate and initialize something in one go, and then store or
>> pass around a pointer to it which is const-qualified to make
>> clear the contents of that memory block aren't supposed to be
>> further modified (until de-allocation).
> 
> I don't see this as a bad thing.  You can't do it with malloc()/free().

And wrongly so, imo.

>>>>> +    struct hvm_emulate_ctxt *hvmemul_ctxt)
>>>> There are upsides and downsides to requiring the caller to pass in the
>>>> same values as to map(): You can do more correctness checking
>>>> here, but you also risk the caller using the wrong values (perhaps
>>>> because of a meanwhile updated local variable). While I don't
>>>> outright object to this approach, personally I'd prefer minimal
>>>> inputs here, and the code deriving everything from hvmemul_ctxt.
>>> I'm not sure exactly how we might wish to extend this logic.  Are we
>>> ever going to want more than one active mapping at once (perhaps rep
>>> movs emulation across two ram regions)?
>> I could imagine even more than two regions - from an abstract
>> perspective it may be possible / helpful to have a mapping of
>> each piece of memory a single instruction accesses, along the
>> lines of the minimal number of architecturally guaranteed TLB
>> entries on ia64 in order to execute any possible x86 insn.
> 
> A further problem I've just realised is the use of multiple ->write()
> calls for a single instruction in x86_emulate().  The preceding writes
> obviously can't get backed out in the case of the second write taking a
> fault, which will cause the emulator to exhibit non-architectural
> behaviour similar to what this patch is trying to fix.

I'm not convinced hardware guarantees all cases where we use
multiple writes to be carried out in one step. I'm relatively sure
multiple pushes (for far calls) are being carried out one by one.
Just that exception checks likely are being done once up front
(albeit I'm less certain about #PF, whereas for all segment
related things this would seem natural). Same for SGDT/SIDT,
which I think are the only non-push multi-write insns.

And of course the same issue exists for multiple segment register
writes.

>>> The other reason is that in the release builds, even if we stored the
>>> pointer in hvmemul_ctxt, we still can't determine which unmapping
>>> function to use without linear and size.
>> I don't understand - all you pass on is "mapping". And whether to
>> unmap a single or two pages could be told from hvmemul_ctxt->mfn[1]
>> (not) being INVALID_MFN.
> 
> That isn't safe outside of the debug builds.
> 
> You could in principle use mfn[1] != 0 in release builds, if it weren't
> for the fact that hvmemul_ctxt could be used for multiple ->write()
> calls.  In the case of a straddled write followed by a non-straddled
> write, the second unmap() would evaluate incorrectly.

I'd of course expect unmap() to set ->mfn[] to INVALID_MFN.
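
I.e. something along these lines at the end of unmap() (a sketch only,
assuming the map() side keeps a reference on each page, which the
put_page() here drops again):

    unsigned int i;

    for ( i = 0; i < ARRAY_SIZE(hvmemul_ctxt->mfn); ++i )
    {
        if ( mfn_eq(hvmemul_ctxt->mfn[i], INVALID_MFN) )
            break;

        put_page(mfn_to_page(mfn_x(hvmemul_ctxt->mfn[i])));
        hvmemul_ctxt->mfn[i] = INVALID_MFN;
    }

which keeps ->mfn[1] usable for deciding between the two unmap paths.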

Jan

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2017-07-04  7:10 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-06-21 15:12 [PATCH 0/6] Various XSA followups Andrew Cooper
2017-06-21 15:12 ` [PATCH 1/6] x86/hvm: Fixes to hvmemul_insn_fetch() Andrew Cooper
2017-06-21 16:04   ` Paul Durrant
2017-06-21 16:15     ` Andrew Cooper
2017-06-21 16:49       ` Paul Durrant
2017-06-22  8:05   ` Jan Beulich
2017-06-21 15:12 ` [PATCH 2/6] x86/shadow: Fixes to hvm_emulate_insn_fetch() Andrew Cooper
2017-06-22  8:09   ` Jan Beulich
2017-06-22 11:52   ` Tim Deegan
2017-06-21 15:12 ` [PATCH 3/6] x86/shadow: Use ERR_PTR infrastructure for sh_emulate_map_dest() Andrew Cooper
2017-06-22  8:14   ` Jan Beulich
2017-06-22  8:21     ` Andrew Cooper
2017-06-22  9:07       ` Jan Beulich
2017-07-03 14:22         ` Andrew Cooper
2017-07-03 14:30           ` Jan Beulich
2017-06-22 12:09   ` Tim Deegan
2017-06-22 12:46     ` Andrew Cooper
2017-06-21 15:12 ` [PATCH 4/6] [RFC] x86/hvm: Rename enum hvm_copy_result to hvm_translation_result Andrew Cooper
2017-06-22  8:19   ` Jan Beulich
2017-06-21 15:12 ` [PATCH 5/6] x86/hvm: Break out __hvm_copy()'s translation logic Andrew Cooper
2017-06-22  8:30   ` Jan Beulich
2017-06-21 15:12 ` [PATCH 6/6] x86/hvm: Implement hvmemul_write() using real mappings Andrew Cooper
2017-06-21 16:19   ` Paul Durrant
2017-06-22  9:06   ` Jan Beulich
2017-07-03 15:07     ` Andrew Cooper
2017-07-03 16:06       ` Jan Beulich
     [not found]     ` <594BA4A30200007800165AA5@prv-mh.provo.novell.com>
     [not found]       ` <b9e9b637-0755-a1bd-99c7-44ad3f13b5a4@citrix.com>
     [not found]         ` <595A879202000078001680B7@prv-mh.provo.novell.com>
2017-07-03 17:24           ` Andrew Cooper
2017-07-04  7:10             ` Jan Beulich
     [not found] <1498057952-13556-1-git-send-email-andrew.cooper3@citrix.com>
