* [PATCH for-4.9 v2 00/19] XSA-191 followup
@ 2016-11-28 11:13 Andrew Cooper
  2016-11-28 11:13 ` [PATCH v2 01/19] x86/shadow: Fix #PFs from emulated writes crossing a page boundary Andrew Cooper
                   ` (18 more replies)
  0 siblings, 19 replies; 57+ messages in thread
From: Andrew Cooper @ 2016-11-28 11:13 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper

This series contains the changes required to fix some edge cases in XSA-191
which were ultimately chosen not to go out in the security fix.  The main
purpose of the series is to fix emulation sufficiently to allow patch 19 to
avoid open-coding all of the segmentation logic.

This version of the patches has had some testing in XenServer's test
infrastructure and nothing appears to have blown up.

Changes from v1:
  * Rework of the safety surrounding software interrupt injection
  * 5 new patches (1-2,6,12-13)
  * Fix singlestep handling for emulated pagetable/mmcfg writes

Andrew Cooper (19):
  x86/shadow: Fix #PFs from emulated writes crossing a page boundary
  x86/emul: Drop X86EMUL_CMPXCHG_FAILED
  x86/emul: Simplify emulation state setup
  x86/emul: Rename hvm_trap to x86_event and move it into the emulation infrastructure
  x86/emul: Rename HVM_DELIVER_NO_ERROR_CODE to X86_EVENT_NO_EC
  x86/pv: Implement pv_inject_{event,page_fault,hw_exception}()
  x86/emul: Remove opencoded exception generation
  x86/emul: Rework emulator event injection
  x86/vmx: Use hvm_{get,set}_segment_register() rather than vmx_{get,set}_segment_register()
  x86/hvm: Reposition the modification of raw segment data from the VMCB/VMCS
  x86/emul: Avoid raising faults behind the emulator's back
  x86/pv: Avoid raising faults behind the emulator's back
  x86/shadow: Avoid raising faults behind the emulator's back
  x86/hvm: Extend the hvm_copy_*() API with a pagefault_info pointer
  x86/hvm: Reimplement hvm_copy_*_nofault() in terms of no pagefault_info
  x86/hvm: Rename hvm_copy_*_guest_virt() to hvm_copy_*_guest_linear()
  x86/hvm: Avoid __hvm_copy() raising #PF behind the emulator's back
  x86/hvm: Prepare to allow use of system segments for memory references
  x86/hvm: Use system-segment relative memory accesses

 tools/tests/x86_emulator/test_x86_emulator.c |   1 +
 xen/arch/x86/hvm/emulate.c                   | 151 ++++-------
 xen/arch/x86/hvm/hvm.c                       | 367 +++++++++++++++++++--------
 xen/arch/x86/hvm/io.c                        |   4 +-
 xen/arch/x86/hvm/nestedhvm.c                 |   2 +-
 xen/arch/x86/hvm/svm/nestedsvm.c             |  13 +-
 xen/arch/x86/hvm/svm/svm.c                   | 100 +++-----
 xen/arch/x86/hvm/vmx/intr.c                  |   2 +-
 xen/arch/x86/hvm/vmx/realmode.c              |  16 +-
 xen/arch/x86/hvm/vmx/vmx.c                   | 109 ++++----
 xen/arch/x86/hvm/vmx/vvmx.c                  |  44 ++--
 xen/arch/x86/mm.c                            |  92 +++++--
 xen/arch/x86/mm/shadow/common.c              |  40 +--
 xen/arch/x86/mm/shadow/multi.c               |  87 ++++++-
 xen/arch/x86/traps.c                         | 172 ++++++-------
 xen/arch/x86/x86_emulate/x86_emulate.c       | 333 +++++++++++++-----------
 xen/arch/x86/x86_emulate/x86_emulate.h       | 189 +++++++++++---
 xen/include/asm-x86/desc.h                   |   6 +
 xen/include/asm-x86/domain.h                 |  26 ++
 xen/include/asm-x86/hvm/emulate.h            |   3 -
 xen/include/asm-x86/hvm/hvm.h                |  86 +++----
 xen/include/asm-x86/hvm/support.h            |  42 ++-
 xen/include/asm-x86/hvm/svm/nestedsvm.h      |   6 +-
 xen/include/asm-x86/hvm/vcpu.h               |   2 +-
 xen/include/asm-x86/hvm/vmx/vmx.h            |   2 -
 xen/include/asm-x86/hvm/vmx/vvmx.h           |   4 +-
 xen/include/asm-x86/mm.h                     |   1 -
 27 files changed, 1167 insertions(+), 733 deletions(-)

-- 
2.1.4



* [PATCH v2 01/19] x86/shadow: Fix #PFs from emulated writes crossing a page boundary
  2016-11-28 11:13 [PATCH for-4.9 v2 00/19] XSA-191 followup Andrew Cooper
@ 2016-11-28 11:13 ` Andrew Cooper
  2016-11-28 11:55   ` Tim Deegan
  2016-11-29 15:24   ` Jan Beulich
  2016-11-28 11:13 ` [PATCH v2 02/19] x86/emul: Drop X86EMUL_CMPXCHG_FAILED Andrew Cooper
                   ` (17 subsequent siblings)
  18 siblings, 2 replies; 57+ messages in thread
From: Andrew Cooper @ 2016-11-28 11:13 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Tim Deegan, Jan Beulich

When translating the second frame of a write crossing a page boundary, mask
the linear address down to the page boundary.

This ensures that the correct %cr2 is reported to the guest in the case that
the second frame suffers a page fault during translation.
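
As an illustrative sketch (not part of the patch; 4KiB pages and the usual
PAGE_MASK are assumed, and the addresses and variable names are purely
illustrative), the difference for a 4-byte write starting at linear address
0x10ffe is:

    unsigned long vaddr = 0x10ffe;   /* write crosses into page 0x11000 */
    unsigned int  bytes = 4;

    unsigned long old_va = vaddr + bytes - 1;                /* 0x11001 */
    unsigned long new_va = (vaddr + bytes - 1) & PAGE_MASK;  /* 0x11000 */

    /* 0x11000 is the first byte of the access which falls in the second page,
     * so a failed translation of new_va reports that address in %cr2. */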

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Tim Deegan <tim@xen.org>

v2:
 * New
---
 xen/arch/x86/mm/shadow/common.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/mm/shadow/common.c b/xen/arch/x86/mm/shadow/common.c
index ced2313..7e5b8b0 100644
--- a/xen/arch/x86/mm/shadow/common.c
+++ b/xen/arch/x86/mm/shadow/common.c
@@ -1808,7 +1808,8 @@ void *sh_emulate_map_dest(struct vcpu *v, unsigned long vaddr,
     else
     {
         /* This write crosses a page boundary. Translate the second page. */
-        sh_ctxt->mfn[1] = emulate_gva_to_mfn(v, vaddr + bytes - 1, sh_ctxt);
+        sh_ctxt->mfn[1] = emulate_gva_to_mfn(
+            v, (vaddr + bytes - 1) & PAGE_MASK, sh_ctxt);
         if ( !mfn_valid(sh_ctxt->mfn[1]) )
             return ((mfn_x(sh_ctxt->mfn[1]) == BAD_GVA_TO_GFN) ?
                     MAPPING_EXCEPTION :
-- 
2.1.4



* [PATCH v2 02/19] x86/emul: Drop X86EMUL_CMPXCHG_FAILED
  2016-11-28 11:13 [PATCH for-4.9 v2 00/19] XSA-191 followup Andrew Cooper
  2016-11-28 11:13 ` [PATCH v2 01/19] x86/shadow: Fix #PFs from emulated writes crossing a page boundary Andrew Cooper
@ 2016-11-28 11:13 ` Andrew Cooper
  2016-11-28 11:55   ` Tim Deegan
  2016-11-29 15:29   ` Jan Beulich
  2016-11-28 11:13 ` [PATCH v2 03/19] x86/emul: Simplify emulation state setup Andrew Cooper
                   ` (16 subsequent siblings)
  18 siblings, 2 replies; 57+ messages in thread
From: Andrew Cooper @ 2016-11-28 11:13 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Tim Deegan, Jan Beulich

X86EMUL_CMPXCHG_FAILED was introduced in c/s d430aae25 in 2005.  Even at the
time it aliased what is now X86EMUL_RETRY (as well as what is now
X86EMUL_EXCEPTION).  I am not sure why the distinction was considered useful
at the time.

It is only used twice; there is no need to call it out differently from other
uses of X86EMUL_RETRY.

No functional change.
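
For illustration (condensed from the hunks below), a failed compare in the
cmpxchg handlers now simply asks the emulator to retry:

    int rv = X86EMUL_OKAY;

    /* ... cmpxchg performed; 'prev' holds the value found in memory ... */

    if ( prev != old )
        rv = X86EMUL_RETRY;   /* was X86EMUL_CMPXCHG_FAILED, which had the
                               * same numeric value anyway */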

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Tim Deegan <tim@xen.org>

v2:
 * New
---
 xen/arch/x86/mm.c                      | 2 +-
 xen/arch/x86/mm/shadow/multi.c         | 2 +-
 xen/arch/x86/x86_emulate/x86_emulate.h | 2 --
 3 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 03dcd71..5b0e9f3 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -5254,7 +5254,7 @@ static int ptwr_emulated_update(
         {
             unmap_domain_page(pl1e);
             put_page_from_l1e(nl1e, d);
-            return X86EMUL_CMPXCHG_FAILED;
+            return X86EMUL_RETRY;
         }
     }
     else
diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c
index d70b1c6..9ee48a8 100644
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -4694,7 +4694,7 @@ sh_x86_emulate_cmpxchg(struct vcpu *v, unsigned long vaddr,
     }
 
     if ( prev != old )
-        rv = X86EMUL_CMPXCHG_FAILED;
+        rv = X86EMUL_RETRY;
 
     SHADOW_DEBUG(EMULATE, "va %#lx was %#lx expected %#lx"
                   " wanted %#lx now %#lx bytes %u\n",
diff --git a/xen/arch/x86/x86_emulate/x86_emulate.h b/xen/arch/x86/x86_emulate/x86_emulate.h
index 993c576..ec824ce 100644
--- a/xen/arch/x86/x86_emulate/x86_emulate.h
+++ b/xen/arch/x86/x86_emulate/x86_emulate.h
@@ -109,8 +109,6 @@ struct __attribute__((__packed__)) segment_register {
 #define X86EMUL_EXCEPTION      2
  /* Retry the emulation for some reason. No state modified. */
 #define X86EMUL_RETRY          3
- /* (cmpxchg accessor): CMPXCHG failed. Maps to X86EMUL_RETRY in caller. */
-#define X86EMUL_CMPXCHG_FAILED 3
 
 /* FPU sub-types which may be requested via ->get_fpu(). */
 enum x86_emulate_fpu_type {
-- 
2.1.4



* [PATCH v2 03/19] x86/emul: Simplify emulation state setup
  2016-11-28 11:13 [PATCH for-4.9 v2 00/19] XSA-191 followup Andrew Cooper
  2016-11-28 11:13 ` [PATCH v2 01/19] x86/shadow: Fix #PFs from emulated writes crossing a page boundary Andrew Cooper
  2016-11-28 11:13 ` [PATCH v2 02/19] x86/emul: Drop X86EMUL_CMPXCHG_FAILED Andrew Cooper
@ 2016-11-28 11:13 ` Andrew Cooper
  2016-11-28 11:58   ` Paul Durrant
  2016-11-28 12:54   ` Paul Durrant
  2016-11-28 11:13 ` [PATCH v2 04/19] x86/emul: Rename hvm_trap to x86_event and move it into the emulation infrastructure Andrew Cooper
                   ` (15 subsequent siblings)
  18 siblings, 2 replies; 57+ messages in thread
From: Andrew Cooper @ 2016-11-28 11:13 UTC (permalink / raw)
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Paul Durrant

The current code to set up emulation state is ad hoc and error-prone.

 * Consistently zero all emulation state structures.
 * Avoid explicitly initialising some state to 0, as the zeroing now covers it.
 * Explicitly identify all input and output state in x86_emulate_ctxt.  This
   involves rearranging some fields.
 * Have x86_decode() explicitly initialise all output state at its start.

While making the above changes, two minor tweaks (a sketch of the resulting
setup pattern follows this list):

 * Move the calculation of hvmemul_ctxt->ctxt.swint_emulate from
   _hvm_emulate_one() to hvm_emulate_init_once().  It doesn't need
   recalculating for each instruction.
 * Change force_writeback to a boolean, to match its use.
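
A minimal sketch of the resulting setup pattern, condensed from the hunks
below (all names are from the patch; the real functions do more than shown):

    void hvm_emulate_init_once(struct hvm_emulate_ctxt *hvmemul_ctxt,
                               struct cpu_user_regs *regs)
    {
        /* One consistent memset() replaces piecemeal zeroing of fields. */
        memset(hvmemul_ctxt, 0, sizeof(*hvmemul_ctxt));

        /* Input-only state is then filled in explicitly... */
        hvmemul_ctxt->ctxt.regs = regs;
        hvmemul_ctxt->ctxt.force_writeback = true;

        /* ...including swint_emulate, now chosen once here rather than
         * recalculated for every instruction in _hvm_emulate_one(). */
    }

    /* Output-only state is x86_decode()'s job, e.g. ctxt->retire.byte = 0; */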

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
CC: George Dunlap <george.dunlap@eu.citrix.com>
CC: Paul Durrant <paul.durrant@citrix.com>

v2:
 * Split x86_emulate_ctxt into three sections
---
 xen/arch/x86/hvm/emulate.c             | 28 +++++++++++++++-------------
 xen/arch/x86/mm.c                      | 14 ++++++++------
 xen/arch/x86/mm/shadow/common.c        |  4 ++--
 xen/arch/x86/x86_emulate/x86_emulate.c |  1 +
 xen/arch/x86/x86_emulate/x86_emulate.h | 32 ++++++++++++++++++++++----------
 5 files changed, 48 insertions(+), 31 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index f1f6e2f..3efeead 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -1770,13 +1770,6 @@ static int _hvm_emulate_one(struct hvm_emulate_ctxt *hvmemul_ctxt,
 
     vio->mmio_retry = 0;
 
-    if ( cpu_has_vmx )
-        hvmemul_ctxt->ctxt.swint_emulate = x86_swint_emulate_none;
-    else if ( cpu_has_svm_nrips )
-        hvmemul_ctxt->ctxt.swint_emulate = x86_swint_emulate_icebp;
-    else
-        hvmemul_ctxt->ctxt.swint_emulate = x86_swint_emulate_all;
-
     rc = x86_emulate(&hvmemul_ctxt->ctxt, ops);
 
     if ( rc == X86EMUL_OKAY && vio->mmio_retry )
@@ -1947,14 +1940,23 @@ void hvm_emulate_init_once(
     struct hvm_emulate_ctxt *hvmemul_ctxt,
     struct cpu_user_regs *regs)
 {
-    hvmemul_ctxt->intr_shadow = hvm_funcs.get_interrupt_shadow(current);
-    hvmemul_ctxt->ctxt.regs = regs;
-    hvmemul_ctxt->ctxt.force_writeback = 1;
-    hvmemul_ctxt->seg_reg_accessed = 0;
-    hvmemul_ctxt->seg_reg_dirty = 0;
-    hvmemul_ctxt->set_context = 0;
+    struct vcpu *curr = current;
+
+    memset(hvmemul_ctxt, 0, sizeof(*hvmemul_ctxt));
+
+    hvmemul_ctxt->intr_shadow = hvm_funcs.get_interrupt_shadow(curr);
     hvmemul_get_seg_reg(x86_seg_cs, hvmemul_ctxt);
     hvmemul_get_seg_reg(x86_seg_ss, hvmemul_ctxt);
+
+    hvmemul_ctxt->ctxt.regs = regs;
+    hvmemul_ctxt->ctxt.force_writeback = true;
+
+    if ( cpu_has_vmx )
+        hvmemul_ctxt->ctxt.swint_emulate = x86_swint_emulate_none;
+    else if ( cpu_has_svm_nrips )
+        hvmemul_ctxt->ctxt.swint_emulate = x86_swint_emulate_icebp;
+    else
+        hvmemul_ctxt->ctxt.swint_emulate = x86_swint_emulate_all;
 }
 
 void hvm_emulate_init_per_insn(
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 5b0e9f3..d365f59 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -5337,7 +5337,14 @@ int ptwr_do_page_fault(struct vcpu *v, unsigned long addr,
     struct domain *d = v->domain;
     struct page_info *page;
     l1_pgentry_t      pte;
-    struct ptwr_emulate_ctxt ptwr_ctxt;
+    struct ptwr_emulate_ctxt ptwr_ctxt = {
+        .ctxt = {
+            .regs = regs,
+            .addr_size = is_pv_32bit_domain(d) ? 32 : BITS_PER_LONG,
+            .sp_size   = is_pv_32bit_domain(d) ? 32 : BITS_PER_LONG,
+            .swint_emulate = x86_swint_emulate_none,
+        },
+    };
     int rc;
 
     /* Attempt to read the PTE that maps the VA being accessed. */
@@ -5363,11 +5370,6 @@ int ptwr_do_page_fault(struct vcpu *v, unsigned long addr,
         goto bail;
     }
 
-    ptwr_ctxt.ctxt.regs = regs;
-    ptwr_ctxt.ctxt.force_writeback = 0;
-    ptwr_ctxt.ctxt.addr_size = ptwr_ctxt.ctxt.sp_size =
-        is_pv_32bit_domain(d) ? 32 : BITS_PER_LONG;
-    ptwr_ctxt.ctxt.swint_emulate = x86_swint_emulate_none;
     ptwr_ctxt.cr2 = addr;
     ptwr_ctxt.pte = pte;
 
diff --git a/xen/arch/x86/mm/shadow/common.c b/xen/arch/x86/mm/shadow/common.c
index 7e5b8b0..a4a3c4b 100644
--- a/xen/arch/x86/mm/shadow/common.c
+++ b/xen/arch/x86/mm/shadow/common.c
@@ -385,8 +385,9 @@ const struct x86_emulate_ops *shadow_init_emulation(
     struct vcpu *v = current;
     unsigned long addr;
 
+    memset(sh_ctxt, 0, sizeof(*sh_ctxt));
+
     sh_ctxt->ctxt.regs = regs;
-    sh_ctxt->ctxt.force_writeback = 0;
     sh_ctxt->ctxt.swint_emulate = x86_swint_emulate_none;
 
     if ( is_pv_vcpu(v) )
@@ -396,7 +397,6 @@ const struct x86_emulate_ops *shadow_init_emulation(
     }
 
     /* Segment cache initialisation. Primed with CS. */
-    sh_ctxt->valid_seg_regs = 0;
     creg = hvm_get_seg_reg(x86_seg_cs, sh_ctxt);
 
     /* Work out the emulation mode. */
diff --git a/xen/arch/x86/x86_emulate/x86_emulate.c b/xen/arch/x86/x86_emulate/x86_emulate.c
index d82e85d..532bd32 100644
--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -1904,6 +1904,7 @@ x86_decode(
     state->regs = ctxt->regs;
     state->eip = ctxt->regs->eip;
 
+    /* Initialise output state in x86_emulate_ctxt */
     ctxt->retire.byte = 0;
 
     op_bytes = def_op_bytes = ad_bytes = def_ad_bytes = ctxt->addr_size/8;
diff --git a/xen/arch/x86/x86_emulate/x86_emulate.h b/xen/arch/x86/x86_emulate/x86_emulate.h
index ec824ce..ab566c0 100644
--- a/xen/arch/x86/x86_emulate/x86_emulate.h
+++ b/xen/arch/x86/x86_emulate/x86_emulate.h
@@ -410,6 +410,23 @@ struct cpu_user_regs;
 
 struct x86_emulate_ctxt
 {
+    /*
+     * Input-only state:
+     */
+
+    /* Software event injection support. */
+    enum x86_swint_emulation swint_emulate;
+
+    /* Set this if writes may have side effects. */
+    bool force_writeback;
+
+    /* Caller data that can be used by x86_emulate_ops' routines. */
+    void *data;
+
+    /*
+     * Input/output state:
+     */
+
     /* Register state before/after emulation. */
     struct cpu_user_regs *regs;
 
@@ -419,14 +436,12 @@ struct x86_emulate_ctxt
     /* Stack pointer width in bits (16, 32 or 64). */
     unsigned int sp_size;
 
-    /* Canonical opcode (see below). */
-    unsigned int opcode;
-
-    /* Software event injection support. */
-    enum x86_swint_emulation swint_emulate;
+    /*
+     * Output-only state:
+     */
 
-    /* Set this if writes may have side effects. */
-    uint8_t force_writeback;
+    /* Canonical opcode (see below) (valid only on X86EMUL_OKAY). */
+    unsigned int opcode;
 
     /* Retirement state, set by the emulator (valid only on X86EMUL_OKAY). */
     union {
@@ -437,9 +452,6 @@ struct x86_emulate_ctxt
         } flags;
         uint8_t byte;
     } retire;
-
-    /* Caller data that can be used by x86_emulate_ops' routines. */
-    void *data;
 };
 
 /*
-- 
2.1.4



* [PATCH v2 04/19] x86/emul: Rename hvm_trap to x86_event and move it into the emulation infrastructure
  2016-11-28 11:13 [PATCH for-4.9 v2 00/19] XSA-191 followup Andrew Cooper
                   ` (2 preceding siblings ...)
  2016-11-28 11:13 ` [PATCH v2 03/19] x86/emul: Simplify emulation state setup Andrew Cooper
@ 2016-11-28 11:13 ` Andrew Cooper
  2016-11-28 11:13 ` [PATCH v2 05/19] x86/emul: Rename HVM_DELIVER_NO_ERROR_CODE to X86_EVENT_NO_EC Andrew Cooper
                   ` (14 subsequent siblings)
  18 siblings, 0 replies; 57+ messages in thread
From: Andrew Cooper @ 2016-11-28 11:13 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper

The x86 emulator needs to gain an understanding of interrupts and exceptions
generated by its actions.  The naming choice is to match both the Intel and
AMD terms, and to avoid 'trap' specifically as it has an architectural meaning
different to its current usage.

While making this change, make other changes for consistency (a usage sketch
follows the list):

 * Rename *_trap() infrastructure to *_event()
 * Rename trapnr/trap parameters to vector
 * Convert hvm_inject_hw_exception() and hvm_inject_page_fault() to being
   static inlines, as they are only thin wrappers around hvm_inject_event()

No functional change.
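
As a usage sketch (the helpers are those from the hvm.h hunk below; the #GP
example and the pfec/cr2 values are purely illustrative):

    /* Build an event and inject it through the renamed common entry point. */
    struct x86_event event = {
        .vector = TRAP_gp_fault,
        .type = X86_EVENTTYPE_HW_EXCEPTION,
        .error_code = 0,
    };

    hvm_inject_event(&event);           /* was hvm_inject_trap(&trap) */

    /* The old helpers survive as thin static inline wrappers: */
    hvm_inject_hw_exception(TRAP_invalid_op, HVM_DELIVER_NO_ERROR_CODE);
    hvm_inject_page_fault(pfec, cr2);   /* pfec/cr2 supplied by the caller */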

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
 xen/arch/x86/hvm/emulate.c              |  6 +--
 xen/arch/x86/hvm/hvm.c                  | 33 ++++------------
 xen/arch/x86/hvm/io.c                   |  2 +-
 xen/arch/x86/hvm/svm/nestedsvm.c        |  7 ++--
 xen/arch/x86/hvm/svm/svm.c              | 62 ++++++++++++++---------------
 xen/arch/x86/hvm/vmx/vmx.c              | 66 +++++++++++++++----------------
 xen/arch/x86/hvm/vmx/vvmx.c             | 11 +++---
 xen/arch/x86/x86_emulate/x86_emulate.c  | 11 ++++++
 xen/arch/x86/x86_emulate/x86_emulate.h  | 22 +++++++++++
 xen/include/asm-x86/hvm/emulate.h       |  2 +-
 xen/include/asm-x86/hvm/hvm.h           | 69 ++++++++++++++++-----------------
 xen/include/asm-x86/hvm/svm/nestedsvm.h |  6 +--
 xen/include/asm-x86/hvm/vcpu.h          |  2 +-
 xen/include/asm-x86/hvm/vmx/vvmx.h      |  4 +-
 14 files changed, 159 insertions(+), 144 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 3efeead..bb26d40 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -1679,7 +1679,7 @@ static int hvmemul_invlpg(
          * violations, so squash them.
          */
         hvmemul_ctxt->exn_pending = 0;
-        hvmemul_ctxt->trap = (struct hvm_trap){};
+        hvmemul_ctxt->trap = (struct x86_event){};
         rc = X86EMUL_OKAY;
     }
 
@@ -1869,7 +1869,7 @@ int hvm_emulate_one_mmio(unsigned long mfn, unsigned long gla)
         break;
     case X86EMUL_EXCEPTION:
         if ( ctxt.exn_pending )
-            hvm_inject_trap(&ctxt.trap);
+            hvm_inject_event(&ctxt.trap);
         /* fallthrough */
     default:
         hvm_emulate_writeback(&ctxt);
@@ -1929,7 +1929,7 @@ void hvm_emulate_one_vm_event(enum emul_kind kind, unsigned int trapnr,
         break;
     case X86EMUL_EXCEPTION:
         if ( ctx.exn_pending )
-            hvm_inject_trap(&ctx.trap);
+            hvm_inject_event(&ctx.trap);
         break;
     }
 
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 25dc759..7b434aa 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -535,7 +535,7 @@ void hvm_do_resume(struct vcpu *v)
     /* Inject pending hw/sw trap */
     if ( v->arch.hvm_vcpu.inject_trap.vector != -1 )
     {
-        hvm_inject_trap(&v->arch.hvm_vcpu.inject_trap);
+        hvm_inject_event(&v->arch.hvm_vcpu.inject_trap);
         v->arch.hvm_vcpu.inject_trap.vector = -1;
     }
 }
@@ -1676,19 +1676,19 @@ void hvm_triple_fault(void)
     domain_shutdown(d, reason);
 }
 
-void hvm_inject_trap(const struct hvm_trap *trap)
+void hvm_inject_event(const struct x86_event *event)
 {
     struct vcpu *curr = current;
 
     if ( nestedhvm_enabled(curr->domain) &&
          !nestedhvm_vmswitch_in_progress(curr) &&
          nestedhvm_vcpu_in_guestmode(curr) &&
-         nhvm_vmcx_guest_intercepts_trap(
-             curr, trap->vector, trap->error_code) )
+         nhvm_vmcx_guest_intercepts_event(
+             curr, event->vector, event->error_code) )
     {
         enum nestedhvm_vmexits nsret;
 
-        nsret = nhvm_vcpu_vmexit_trap(curr, trap);
+        nsret = nhvm_vcpu_vmexit_event(curr, event);
 
         switch ( nsret )
         {
@@ -1704,26 +1704,7 @@ void hvm_inject_trap(const struct hvm_trap *trap)
         }
     }
 
-    hvm_funcs.inject_trap(trap);
-}
-
-void hvm_inject_hw_exception(unsigned int trapnr, int errcode)
-{
-    struct hvm_trap trap = {
-        .vector = trapnr,
-        .type = X86_EVENTTYPE_HW_EXCEPTION,
-        .error_code = errcode };
-    hvm_inject_trap(&trap);
-}
-
-void hvm_inject_page_fault(int errcode, unsigned long cr2)
-{
-    struct hvm_trap trap = {
-        .vector = TRAP_page_fault,
-        .type = X86_EVENTTYPE_HW_EXCEPTION,
-        .error_code = errcode,
-        .cr2 = cr2 };
-    hvm_inject_trap(&trap);
+    hvm_funcs.inject_event(event);
 }
 
 int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long gla,
@@ -4096,7 +4077,7 @@ void hvm_ud_intercept(struct cpu_user_regs *regs)
         break;
     case X86EMUL_EXCEPTION:
         if ( ctxt.exn_pending )
-            hvm_inject_trap(&ctxt.trap);
+            hvm_inject_event(&ctxt.trap);
         /* fall through */
     default:
         hvm_emulate_writeback(&ctxt);
diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
index 7305801..1279f68 100644
--- a/xen/arch/x86/hvm/io.c
+++ b/xen/arch/x86/hvm/io.c
@@ -103,7 +103,7 @@ int handle_mmio(void)
         return 0;
     case X86EMUL_EXCEPTION:
         if ( ctxt.exn_pending )
-            hvm_inject_trap(&ctxt.trap);
+            hvm_inject_event(&ctxt.trap);
         break;
     default:
         break;
diff --git a/xen/arch/x86/hvm/svm/nestedsvm.c b/xen/arch/x86/hvm/svm/nestedsvm.c
index f9b38ab..b6b8526 100644
--- a/xen/arch/x86/hvm/svm/nestedsvm.c
+++ b/xen/arch/x86/hvm/svm/nestedsvm.c
@@ -821,7 +821,7 @@ nsvm_vcpu_vmexit_inject(struct vcpu *v, struct cpu_user_regs *regs,
 }
 
 int
-nsvm_vcpu_vmexit_trap(struct vcpu *v, const struct hvm_trap *trap)
+nsvm_vcpu_vmexit_event(struct vcpu *v, const struct x86_event *trap)
 {
     ASSERT(vcpu_nestedhvm(v).nv_vvmcx != NULL);
 
@@ -994,10 +994,11 @@ nsvm_vmcb_guest_intercepts_exitcode(struct vcpu *v,
 }
 
 bool_t
-nsvm_vmcb_guest_intercepts_trap(struct vcpu *v, unsigned int trapnr, int errcode)
+nsvm_vmcb_guest_intercepts_event(
+    struct vcpu *v, unsigned int vector, int errcode)
 {
     return nsvm_vmcb_guest_intercepts_exitcode(v,
-        guest_cpu_user_regs(), VMEXIT_EXCEPTION_DE + trapnr);
+        guest_cpu_user_regs(), VMEXIT_EXCEPTION_DE + vector);
 }
 
 static int
diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index 37bd6c4..caab5ce 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -1203,15 +1203,15 @@ static void svm_vcpu_destroy(struct vcpu *v)
     passive_domain_destroy(v);
 }
 
-static void svm_inject_trap(const struct hvm_trap *trap)
+static void svm_inject_event(const struct x86_event *event)
 {
     struct vcpu *curr = current;
     struct vmcb_struct *vmcb = curr->arch.hvm_svm.vmcb;
-    eventinj_t event = vmcb->eventinj;
-    struct hvm_trap _trap = *trap;
+    eventinj_t eventinj = vmcb->eventinj;
+    struct x86_event _event = *event;
     const struct cpu_user_regs *regs = guest_cpu_user_regs();
 
-    switch ( _trap.vector )
+    switch ( _event.vector )
     {
     case TRAP_debug:
         if ( regs->eflags & X86_EFLAGS_TF )
@@ -1229,21 +1229,21 @@ static void svm_inject_trap(const struct hvm_trap *trap)
         }
     }
 
-    if ( unlikely(event.fields.v) &&
-         (event.fields.type == X86_EVENTTYPE_HW_EXCEPTION) )
+    if ( unlikely(eventinj.fields.v) &&
+         (eventinj.fields.type == X86_EVENTTYPE_HW_EXCEPTION) )
     {
-        _trap.vector = hvm_combine_hw_exceptions(
-            event.fields.vector, _trap.vector);
-        if ( _trap.vector == TRAP_double_fault )
-            _trap.error_code = 0;
+        _event.vector = hvm_combine_hw_exceptions(
+            eventinj.fields.vector, _event.vector);
+        if ( _event.vector == TRAP_double_fault )
+            _event.error_code = 0;
     }
 
-    event.bytes = 0;
-    event.fields.v = 1;
-    event.fields.vector = _trap.vector;
+    eventinj.bytes = 0;
+    eventinj.fields.v = 1;
+    eventinj.fields.vector = _event.vector;
 
     /* Refer to AMD Vol 2: System Programming, 15.20 Event Injection. */
-    switch ( _trap.type )
+    switch ( _event.type )
     {
     case X86_EVENTTYPE_SW_INTERRUPT: /* int $n */
         /*
@@ -1253,8 +1253,8 @@ static void svm_inject_trap(const struct hvm_trap *trap)
          * moved eip forward if appropriate.
          */
         if ( cpu_has_svm_nrips )
-            vmcb->nextrip = regs->eip + _trap.insn_len;
-        event.fields.type = X86_EVENTTYPE_SW_INTERRUPT;
+            vmcb->nextrip = regs->eip + _event.insn_len;
+        eventinj.fields.type = X86_EVENTTYPE_SW_INTERRUPT;
         break;
 
     case X86_EVENTTYPE_PRI_SW_EXCEPTION: /* icebp */
@@ -1265,7 +1265,7 @@ static void svm_inject_trap(const struct hvm_trap *trap)
          */
         if ( cpu_has_svm_nrips )
             vmcb->nextrip = regs->eip;
-        event.fields.type = X86_EVENTTYPE_HW_EXCEPTION;
+        eventinj.fields.type = X86_EVENTTYPE_HW_EXCEPTION;
         break;
 
     case X86_EVENTTYPE_SW_EXCEPTION: /* int3, into */
@@ -1279,28 +1279,28 @@ static void svm_inject_trap(const struct hvm_trap *trap)
          * the correct faulting eip should a fault occur.
          */
         if ( cpu_has_svm_nrips )
-            vmcb->nextrip = regs->eip + _trap.insn_len;
-        event.fields.type = X86_EVENTTYPE_HW_EXCEPTION;
+            vmcb->nextrip = regs->eip + _event.insn_len;
+        eventinj.fields.type = X86_EVENTTYPE_HW_EXCEPTION;
         break;
 
     default:
-        event.fields.type = X86_EVENTTYPE_HW_EXCEPTION;
-        event.fields.ev = (_trap.error_code != HVM_DELIVER_NO_ERROR_CODE);
-        event.fields.errorcode = _trap.error_code;
+        eventinj.fields.type = X86_EVENTTYPE_HW_EXCEPTION;
+        eventinj.fields.ev = (_event.error_code != HVM_DELIVER_NO_ERROR_CODE);
+        eventinj.fields.errorcode = _event.error_code;
         break;
     }
 
-    vmcb->eventinj = event;
+    vmcb->eventinj = eventinj;
 
-    if ( _trap.vector == TRAP_page_fault )
+    if ( _event.vector == TRAP_page_fault )
     {
-        curr->arch.hvm_vcpu.guest_cr[2] = _trap.cr2;
-        vmcb_set_cr2(vmcb, _trap.cr2);
-        HVMTRACE_LONG_2D(PF_INJECT, _trap.error_code, TRC_PAR_LONG(_trap.cr2));
+        curr->arch.hvm_vcpu.guest_cr[2] = _event.cr2;
+        vmcb_set_cr2(vmcb, _event.cr2);
+        HVMTRACE_LONG_2D(PF_INJECT, _event.error_code, TRC_PAR_LONG(_event.cr2));
     }
     else
     {
-        HVMTRACE_2D(INJ_EXC, _trap.vector, _trap.error_code);
+        HVMTRACE_2D(INJ_EXC, _event.vector, _event.error_code);
     }
 }
 
@@ -2238,7 +2238,7 @@ static struct hvm_function_table __initdata svm_function_table = {
     .set_guest_pat        = svm_set_guest_pat,
     .get_guest_pat        = svm_get_guest_pat,
     .set_tsc_offset       = svm_set_tsc_offset,
-    .inject_trap          = svm_inject_trap,
+    .inject_event         = svm_inject_event,
     .init_hypercall_page  = svm_init_hypercall_page,
     .event_pending        = svm_event_pending,
     .invlpg               = svm_invlpg,
@@ -2253,9 +2253,9 @@ static struct hvm_function_table __initdata svm_function_table = {
     .nhvm_vcpu_initialise = nsvm_vcpu_initialise,
     .nhvm_vcpu_destroy = nsvm_vcpu_destroy,
     .nhvm_vcpu_reset = nsvm_vcpu_reset,
-    .nhvm_vcpu_vmexit_trap = nsvm_vcpu_vmexit_trap,
+    .nhvm_vcpu_vmexit_event = nsvm_vcpu_vmexit_event,
     .nhvm_vcpu_p2m_base = nsvm_vcpu_hostcr3,
-    .nhvm_vmcx_guest_intercepts_trap = nsvm_vmcb_guest_intercepts_trap,
+    .nhvm_vmcx_guest_intercepts_event = nsvm_vmcb_guest_intercepts_event,
     .nhvm_vmcx_hap_enabled = nsvm_vmcb_hap_enabled,
     .nhvm_intr_blocked = nsvm_intr_blocked,
     .nhvm_hap_walk_L1_p2m = nsvm_hap_walk_L1_p2m,
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 7b2c50c..ed9b69b 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -1623,9 +1623,9 @@ void nvmx_enqueue_n2_exceptions(struct vcpu *v,
                  nvmx->intr.intr_info, nvmx->intr.error_code);
 }
 
-static int nvmx_vmexit_trap(struct vcpu *v, const struct hvm_trap *trap)
+static int nvmx_vmexit_event(struct vcpu *v, const struct x86_event *event)
 {
-    nvmx_enqueue_n2_exceptions(v, trap->vector, trap->error_code,
+    nvmx_enqueue_n2_exceptions(v, event->vector, event->error_code,
                                hvm_intsrc_none);
     return NESTEDHVM_VMEXIT_DONE;
 }
@@ -1707,13 +1707,13 @@ void vmx_inject_nmi(void)
  *  - #DB is X86_EVENTTYPE_HW_EXCEPTION, except when generated by
  *    opcode 0xf1 (which is X86_EVENTTYPE_PRI_SW_EXCEPTION)
  */
-static void vmx_inject_trap(const struct hvm_trap *trap)
+static void vmx_inject_event(const struct x86_event *event)
 {
     unsigned long intr_info;
     struct vcpu *curr = current;
-    struct hvm_trap _trap = *trap;
+    struct x86_event _event = *event;
 
-    switch ( _trap.vector | -(_trap.type == X86_EVENTTYPE_SW_INTERRUPT) )
+    switch ( _event.vector | -(_event.type == X86_EVENTTYPE_SW_INTERRUPT) )
     {
     case TRAP_debug:
         if ( guest_cpu_user_regs()->eflags & X86_EFLAGS_TF )
@@ -1722,7 +1722,7 @@ static void vmx_inject_trap(const struct hvm_trap *trap)
             write_debugreg(6, read_debugreg(6) | DR_STEP);
         }
         if ( !nestedhvm_vcpu_in_guestmode(curr) ||
-             !nvmx_intercepts_exception(curr, TRAP_debug, _trap.error_code) )
+             !nvmx_intercepts_exception(curr, TRAP_debug, _event.error_code) )
         {
             unsigned long val;
 
@@ -1744,8 +1744,8 @@ static void vmx_inject_trap(const struct hvm_trap *trap)
         break;
 
     case TRAP_page_fault:
-        ASSERT(_trap.type == X86_EVENTTYPE_HW_EXCEPTION);
-        curr->arch.hvm_vcpu.guest_cr[2] = _trap.cr2;
+        ASSERT(_event.type == X86_EVENTTYPE_HW_EXCEPTION);
+        curr->arch.hvm_vcpu.guest_cr[2] = _event.cr2;
         break;
     }
 
@@ -1758,34 +1758,34 @@ static void vmx_inject_trap(const struct hvm_trap *trap)
          (MASK_EXTR(intr_info, INTR_INFO_INTR_TYPE_MASK) ==
           X86_EVENTTYPE_HW_EXCEPTION) )
     {
-        _trap.vector = hvm_combine_hw_exceptions(
-            (uint8_t)intr_info, _trap.vector);
-        if ( _trap.vector == TRAP_double_fault )
-            _trap.error_code = 0;
+        _event.vector = hvm_combine_hw_exceptions(
+            (uint8_t)intr_info, _event.vector);
+        if ( _event.vector == TRAP_double_fault )
+            _event.error_code = 0;
     }
 
-    if ( _trap.type >= X86_EVENTTYPE_SW_INTERRUPT )
-        __vmwrite(VM_ENTRY_INSTRUCTION_LEN, _trap.insn_len);
+    if ( _event.type >= X86_EVENTTYPE_SW_INTERRUPT )
+        __vmwrite(VM_ENTRY_INSTRUCTION_LEN, _event.insn_len);
 
     if ( nestedhvm_vcpu_in_guestmode(curr) &&
-         nvmx_intercepts_exception(curr, _trap.vector, _trap.error_code) )
+         nvmx_intercepts_exception(curr, _event.vector, _event.error_code) )
     {
         nvmx_enqueue_n2_exceptions (curr, 
             INTR_INFO_VALID_MASK |
-            MASK_INSR(_trap.type, INTR_INFO_INTR_TYPE_MASK) |
-            MASK_INSR(_trap.vector, INTR_INFO_VECTOR_MASK),
-            _trap.error_code, hvm_intsrc_none);
+            MASK_INSR(_event.type, INTR_INFO_INTR_TYPE_MASK) |
+            MASK_INSR(_event.vector, INTR_INFO_VECTOR_MASK),
+            _event.error_code, hvm_intsrc_none);
         return;
     }
     else
-        __vmx_inject_exception(_trap.vector, _trap.type, _trap.error_code);
+        __vmx_inject_exception(_event.vector, _event.type, _event.error_code);
 
-    if ( (_trap.vector == TRAP_page_fault) &&
-         (_trap.type == X86_EVENTTYPE_HW_EXCEPTION) )
-        HVMTRACE_LONG_2D(PF_INJECT, _trap.error_code,
+    if ( (_event.vector == TRAP_page_fault) &&
+         (_event.type == X86_EVENTTYPE_HW_EXCEPTION) )
+        HVMTRACE_LONG_2D(PF_INJECT, _event.error_code,
                          TRC_PAR_LONG(curr->arch.hvm_vcpu.guest_cr[2]));
     else
-        HVMTRACE_2D(INJ_EXC, _trap.vector, _trap.error_code);
+        HVMTRACE_2D(INJ_EXC, _event.vector, _event.error_code);
 }
 
 static int vmx_event_pending(struct vcpu *v)
@@ -2162,7 +2162,7 @@ static struct hvm_function_table __initdata vmx_function_table = {
     .set_guest_pat        = vmx_set_guest_pat,
     .get_guest_pat        = vmx_get_guest_pat,
     .set_tsc_offset       = vmx_set_tsc_offset,
-    .inject_trap          = vmx_inject_trap,
+    .inject_event         = vmx_inject_event,
     .init_hypercall_page  = vmx_init_hypercall_page,
     .event_pending        = vmx_event_pending,
     .invlpg               = vmx_invlpg,
@@ -2182,8 +2182,8 @@ static struct hvm_function_table __initdata vmx_function_table = {
     .nhvm_vcpu_reset      = nvmx_vcpu_reset,
     .nhvm_vcpu_p2m_base   = nvmx_vcpu_eptp_base,
     .nhvm_vmcx_hap_enabled = nvmx_ept_enabled,
-    .nhvm_vmcx_guest_intercepts_trap = nvmx_intercepts_exception,
-    .nhvm_vcpu_vmexit_trap = nvmx_vmexit_trap,
+    .nhvm_vmcx_guest_intercepts_event = nvmx_intercepts_exception,
+    .nhvm_vcpu_vmexit_event = nvmx_vmexit_event,
     .nhvm_intr_blocked    = nvmx_intr_blocked,
     .nhvm_domain_relinquish_resources = nvmx_domain_relinquish_resources,
     .update_eoi_exit_bitmap = vmx_update_eoi_exit_bitmap,
@@ -3201,7 +3201,7 @@ static int vmx_handle_eoi_write(void)
  */
 static void vmx_propagate_intr(unsigned long intr)
 {
-    struct hvm_trap trap = {
+    struct x86_event event = {
         .vector = MASK_EXTR(intr, INTR_INFO_VECTOR_MASK),
         .type = MASK_EXTR(intr, INTR_INFO_INTR_TYPE_MASK),
     };
@@ -3210,20 +3210,20 @@ static void vmx_propagate_intr(unsigned long intr)
     if ( intr & INTR_INFO_DELIVER_CODE_MASK )
     {
         __vmread(VM_EXIT_INTR_ERROR_CODE, &tmp);
-        trap.error_code = tmp;
+        event.error_code = tmp;
     }
     else
-        trap.error_code = HVM_DELIVER_NO_ERROR_CODE;
+        event.error_code = HVM_DELIVER_NO_ERROR_CODE;
 
-    if ( trap.type >= X86_EVENTTYPE_SW_INTERRUPT )
+    if ( event.type >= X86_EVENTTYPE_SW_INTERRUPT )
     {
         __vmread(VM_EXIT_INSTRUCTION_LEN, &tmp);
-        trap.insn_len = tmp;
+        event.insn_len = tmp;
     }
     else
-        trap.insn_len = 0;
+        event.insn_len = 0;
 
-    hvm_inject_trap(&trap);
+    hvm_inject_event(&event);
 }
 
 static void vmx_idtv_reinject(unsigned long idtv_info)
diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index bed2e0a..b5837d4 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -491,18 +491,19 @@ static void vmreturn(struct cpu_user_regs *regs, enum vmx_ops_result ops_res)
     regs->eflags = eflags;
 }
 
-bool_t nvmx_intercepts_exception(struct vcpu *v, unsigned int trap,
-                                 int error_code)
+bool_t nvmx_intercepts_exception(
+    struct vcpu *v, unsigned int vector, int error_code)
 {
     u32 exception_bitmap, pfec_match=0, pfec_mask=0;
     int r;
 
-    ASSERT ( trap < 32 );
+    ASSERT(vector < 32);
 
     exception_bitmap = get_vvmcs(v, EXCEPTION_BITMAP);
-    r = exception_bitmap & (1 << trap) ? 1: 0;
+    r = exception_bitmap & (1 << vector) ? 1: 0;
 
-    if ( trap == TRAP_page_fault ) {
+    if ( vector == TRAP_page_fault )
+    {
         pfec_match = get_vvmcs(v, PAGE_FAULT_ERROR_CODE_MATCH);
         pfec_mask  = get_vvmcs(v, PAGE_FAULT_ERROR_CODE_MASK);
         if ( (error_code & pfec_mask) != pfec_match )
diff --git a/xen/arch/x86/x86_emulate/x86_emulate.c b/xen/arch/x86/x86_emulate/x86_emulate.c
index 532bd32..9c28ed4 100644
--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -5451,6 +5451,17 @@ static void __init __maybe_unused build_assertions(void)
     BUILD_BUG_ON(x86_seg_ds != 3);
     BUILD_BUG_ON(x86_seg_fs != 4);
     BUILD_BUG_ON(x86_seg_gs != 5);
+
+    /*
+     * Check X86_EVENTTYPE_* against VMCB EVENTINJ and VMCS INTR_INFO type
+     * fields.
+     */
+    BUILD_BUG_ON(X86_EVENTTYPE_EXT_INTR != 0);
+    BUILD_BUG_ON(X86_EVENTTYPE_NMI != 2);
+    BUILD_BUG_ON(X86_EVENTTYPE_HW_EXCEPTION != 3);
+    BUILD_BUG_ON(X86_EVENTTYPE_SW_INTERRUPT != 4);
+    BUILD_BUG_ON(X86_EVENTTYPE_PRI_SW_EXCEPTION != 5);
+    BUILD_BUG_ON(X86_EVENTTYPE_SW_EXCEPTION != 6);
 }
 
 #ifdef __XEN__
diff --git a/xen/arch/x86/x86_emulate/x86_emulate.h b/xen/arch/x86/x86_emulate/x86_emulate.h
index ab566c0..54c532c 100644
--- a/xen/arch/x86/x86_emulate/x86_emulate.h
+++ b/xen/arch/x86/x86_emulate/x86_emulate.h
@@ -67,6 +67,28 @@ enum x86_swint_emulation {
     x86_swint_emulate_all,  /* Help needed with all software events */
 };
 
+/*
+ * x86 event types. This enumeration is valid for:
+ *  Intel VMX: {VM_ENTRY,VM_EXIT,IDT_VECTORING}_INTR_INFO[10:8]
+ *  AMD SVM: eventinj[10:8] and exitintinfo[10:8] (types 0-4 only)
+ */
+enum x86_event_type {
+    X86_EVENTTYPE_EXT_INTR,         /* External interrupt */
+    X86_EVENTTYPE_NMI = 2,          /* NMI */
+    X86_EVENTTYPE_HW_EXCEPTION,     /* Hardware exception */
+    X86_EVENTTYPE_SW_INTERRUPT,     /* Software interrupt (CD nn) */
+    X86_EVENTTYPE_PRI_SW_EXCEPTION, /* ICEBP (F1) */
+    X86_EVENTTYPE_SW_EXCEPTION,     /* INT3 (CC), INTO (CE) */
+};
+
+struct x86_event {
+    int16_t       vector;
+    uint8_t       type;         /* X86_EVENTTYPE_* */
+    uint8_t       insn_len;     /* Instruction length */
+    uint32_t      error_code;   /* HVM_DELIVER_NO_ERROR_CODE if n/a */
+    unsigned long cr2;          /* Only for TRAP_page_fault h/w exception */
+};
+
 /* 
  * Attribute for segment selector. This is a copy of bit 40:47 & 52:55 of the
  * segment descriptor. It happens to match the format of an AMD SVM VMCB.
diff --git a/xen/include/asm-x86/hvm/emulate.h b/xen/include/asm-x86/hvm/emulate.h
index d4186a2..3b7ec33 100644
--- a/xen/include/asm-x86/hvm/emulate.h
+++ b/xen/include/asm-x86/hvm/emulate.h
@@ -30,7 +30,7 @@ struct hvm_emulate_ctxt {
     unsigned long seg_reg_dirty;
 
     bool_t exn_pending;
-    struct hvm_trap trap;
+    struct x86_event trap;
 
     uint32_t intr_shadow;
 
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index 7e7462e..51a64f7 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -77,14 +77,6 @@ enum hvm_intblk {
 #define HVM_HAP_SUPERPAGE_2MB   0x00000001
 #define HVM_HAP_SUPERPAGE_1GB   0x00000002
 
-struct hvm_trap {
-    int16_t       vector;
-    uint8_t       type;         /* X86_EVENTTYPE_* */
-    uint8_t       insn_len;     /* Instruction length */
-    uint32_t      error_code;   /* HVM_DELIVER_NO_ERROR_CODE if n/a */
-    unsigned long cr2;          /* Only for TRAP_page_fault h/w exception */
-};
-
 /*
  * The hardware virtual machine (HVM) interface abstracts away from the
  * x86/x86_64 CPU virtualization assist specifics. Currently this interface
@@ -152,7 +144,7 @@ struct hvm_function_table {
 
     void (*set_tsc_offset)(struct vcpu *v, u64 offset, u64 at_tsc);
 
-    void (*inject_trap)(const struct hvm_trap *trap);
+    void (*inject_event)(const struct x86_event *event);
 
     void (*init_hypercall_page)(struct domain *d, void *hypercall_page);
 
@@ -185,11 +177,10 @@ struct hvm_function_table {
     int (*nhvm_vcpu_initialise)(struct vcpu *v);
     void (*nhvm_vcpu_destroy)(struct vcpu *v);
     int (*nhvm_vcpu_reset)(struct vcpu *v);
-    int (*nhvm_vcpu_vmexit_trap)(struct vcpu *v, const struct hvm_trap *trap);
+    int (*nhvm_vcpu_vmexit_event)(struct vcpu *v, const struct x86_event *event);
     uint64_t (*nhvm_vcpu_p2m_base)(struct vcpu *v);
-    bool_t (*nhvm_vmcx_guest_intercepts_trap)(struct vcpu *v,
-                                              unsigned int trapnr,
-                                              int errcode);
+    bool_t (*nhvm_vmcx_guest_intercepts_event)(
+        struct vcpu *v, unsigned int vector, int errcode);
 
     bool_t (*nhvm_vmcx_hap_enabled)(struct vcpu *v);
 
@@ -419,9 +410,30 @@ void hvm_migrate_timers(struct vcpu *v);
 void hvm_do_resume(struct vcpu *v);
 void hvm_migrate_pirqs(struct vcpu *v);
 
-void hvm_inject_trap(const struct hvm_trap *trap);
-void hvm_inject_hw_exception(unsigned int trapnr, int errcode);
-void hvm_inject_page_fault(int errcode, unsigned long cr2);
+void hvm_inject_event(const struct x86_event *event);
+
+static inline void hvm_inject_hw_exception(unsigned int vector, int errcode)
+{
+    struct x86_event event = {
+        .vector = vector,
+        .type = X86_EVENTTYPE_HW_EXCEPTION,
+        .error_code = errcode,
+    };
+
+    hvm_inject_event(&event);
+}
+
+static inline void hvm_inject_page_fault(int errcode, unsigned long cr2)
+{
+    struct x86_event event = {
+        .vector = TRAP_page_fault,
+        .type = X86_EVENTTYPE_HW_EXCEPTION,
+        .error_code = errcode,
+        .cr2 = cr2,
+    };
+
+    hvm_inject_event(&event);
+}
 
 static inline int hvm_event_pending(struct vcpu *v)
 {
@@ -437,18 +449,6 @@ static inline int hvm_event_pending(struct vcpu *v)
                        (1U << TRAP_alignment_check) | \
                        (1U << TRAP_machine_check))
 
-/*
- * x86 event types. This enumeration is valid for:
- *  Intel VMX: {VM_ENTRY,VM_EXIT,IDT_VECTORING}_INTR_INFO[10:8]
- *  AMD SVM: eventinj[10:8] and exitintinfo[10:8] (types 0-4 only)
- */
-#define X86_EVENTTYPE_EXT_INTR         0 /* external interrupt */
-#define X86_EVENTTYPE_NMI              2 /* NMI */
-#define X86_EVENTTYPE_HW_EXCEPTION     3 /* hardware exception */
-#define X86_EVENTTYPE_SW_INTERRUPT     4 /* software interrupt (CD nn) */
-#define X86_EVENTTYPE_PRI_SW_EXCEPTION 5 /* ICEBP (F1) */
-#define X86_EVENTTYPE_SW_EXCEPTION     6 /* INT3 (CC), INTO (CE) */
-
 int hvm_event_needs_reinjection(uint8_t type, uint8_t vector);
 
 uint8_t hvm_combine_hw_exceptions(uint8_t vec1, uint8_t vec2);
@@ -542,10 +542,10 @@ int hvm_x2apic_msr_write(struct vcpu *v, unsigned int msr, uint64_t msr_content)
 /* inject vmexit into l1 guest. l1 guest will see a VMEXIT due to
  * 'trapnr' exception.
  */ 
-static inline int nhvm_vcpu_vmexit_trap(struct vcpu *v,
-                                        const struct hvm_trap *trap)
+static inline int nhvm_vcpu_vmexit_event(
+    struct vcpu *v, const struct x86_event *event)
 {
-    return hvm_funcs.nhvm_vcpu_vmexit_trap(v, trap);
+    return hvm_funcs.nhvm_vcpu_vmexit_event(v, event);
 }
 
 /* returns l1 guest's cr3 that points to the page table used to
@@ -557,11 +557,10 @@ static inline uint64_t nhvm_vcpu_p2m_base(struct vcpu *v)
 }
 
 /* returns true, when l1 guest intercepts the specified trap */
-static inline bool_t nhvm_vmcx_guest_intercepts_trap(struct vcpu *v,
-                                                     unsigned int trap,
-                                                     int errcode)
+static inline bool_t nhvm_vmcx_guest_intercepts_event(
+    struct vcpu *v, unsigned int vector, int errcode)
 {
-    return hvm_funcs.nhvm_vmcx_guest_intercepts_trap(v, trap, errcode);
+    return hvm_funcs.nhvm_vmcx_guest_intercepts_event(v, vector, errcode);
 }
 
 /* returns true when l1 guest wants to use hap to run l2 guest */
diff --git a/xen/include/asm-x86/hvm/svm/nestedsvm.h b/xen/include/asm-x86/hvm/svm/nestedsvm.h
index 0dbc5ec..4b36c25 100644
--- a/xen/include/asm-x86/hvm/svm/nestedsvm.h
+++ b/xen/include/asm-x86/hvm/svm/nestedsvm.h
@@ -110,10 +110,10 @@ void nsvm_vcpu_destroy(struct vcpu *v);
 int nsvm_vcpu_initialise(struct vcpu *v);
 int nsvm_vcpu_reset(struct vcpu *v);
 int nsvm_vcpu_vmrun(struct vcpu *v, struct cpu_user_regs *regs);
-int nsvm_vcpu_vmexit_trap(struct vcpu *v, const struct hvm_trap *trap);
+int nsvm_vcpu_vmexit_event(struct vcpu *v, const struct x86_event *event);
 uint64_t nsvm_vcpu_hostcr3(struct vcpu *v);
-bool_t nsvm_vmcb_guest_intercepts_trap(struct vcpu *v, unsigned int trapnr,
-                                       int errcode);
+bool_t nsvm_vmcb_guest_intercepts_event(
+    struct vcpu *v, unsigned int vector, int errcode);
 bool_t nsvm_vmcb_hap_enabled(struct vcpu *v);
 enum hvm_intblk nsvm_intr_blocked(struct vcpu *v);
 
diff --git a/xen/include/asm-x86/hvm/vcpu.h b/xen/include/asm-x86/hvm/vcpu.h
index 84d9406..d485536 100644
--- a/xen/include/asm-x86/hvm/vcpu.h
+++ b/xen/include/asm-x86/hvm/vcpu.h
@@ -206,7 +206,7 @@ struct hvm_vcpu {
     void *fpu_exception_callback_arg;
 
     /* Pending hw/sw interrupt (.vector = -1 means nothing pending). */
-    struct hvm_trap     inject_trap;
+    struct x86_event     inject_trap;
 
     struct viridian_vcpu viridian;
 };
diff --git a/xen/include/asm-x86/hvm/vmx/vvmx.h b/xen/include/asm-x86/hvm/vmx/vvmx.h
index aca8b4b..ead586e 100644
--- a/xen/include/asm-x86/hvm/vmx/vvmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vvmx.h
@@ -112,8 +112,8 @@ void nvmx_vcpu_destroy(struct vcpu *v);
 int nvmx_vcpu_reset(struct vcpu *v);
 uint64_t nvmx_vcpu_eptp_base(struct vcpu *v);
 enum hvm_intblk nvmx_intr_blocked(struct vcpu *v);
-bool_t nvmx_intercepts_exception(struct vcpu *v, unsigned int trap,
-                                 int error_code);
+bool_t nvmx_intercepts_exception(
+    struct vcpu *v, unsigned int vector, int error_code);
 void nvmx_domain_relinquish_resources(struct domain *d);
 
 bool_t nvmx_ept_enabled(struct vcpu *v);
-- 
2.1.4



* [PATCH v2 05/19] x86/emul: Rename HVM_DELIVER_NO_ERROR_CODE to X86_EVENT_NO_EC
  2016-11-28 11:13 [PATCH for-4.9 v2 00/19] XSA-191 followup Andrew Cooper
                   ` (3 preceding siblings ...)
  2016-11-28 11:13 ` [PATCH v2 04/19] x86/emul: Rename hvm_trap to x86_event and move it into the emulation infrastructure Andrew Cooper
@ 2016-11-28 11:13 ` Andrew Cooper
  2016-11-28 11:13 ` [PATCH v2 06/19] x86/pv: Implement pv_inject_{event, page_fault, hw_exception}() Andrew Cooper
                   ` (13 subsequent siblings)
  18 siblings, 0 replies; 57+ messages in thread
From: Andrew Cooper @ 2016-11-28 11:13 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper

and move it to live with the other x86_event infrastructure in x86_emulate.h.
Switch it and x86_event.error_code to being signed, matching the rest of the
code.
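
A sketch of the expected end result in x86_emulate.h (the actual hunk appears
later in the message; this assumes the constant keeps its previous value of -1
and that error_code simply becomes int32_t):

    #define X86_EVENT_NO_EC (-1)        /* No error code to deliver. */

    struct x86_event {
        int16_t       vector;
        uint8_t       type;             /* X86_EVENTTYPE_* */
        uint8_t       insn_len;         /* Instruction length */
        int32_t       error_code;       /* X86_EVENT_NO_EC if n/a */
        unsigned long cr2;              /* Only for TRAP_page_fault h/w exception */
    };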

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---

v2:
 * Rebase over corrections to the use of HVM_DELIVER_NO_ERROR_CODE
---
 xen/arch/x86/hvm/emulate.c             |  5 ++---
 xen/arch/x86/hvm/hvm.c                 |  6 +++---
 xen/arch/x86/hvm/nestedhvm.c           |  2 +-
 xen/arch/x86/hvm/svm/nestedsvm.c       |  6 +++---
 xen/arch/x86/hvm/svm/svm.c             | 20 ++++++++++----------
 xen/arch/x86/hvm/vmx/intr.c            |  2 +-
 xen/arch/x86/hvm/vmx/vmx.c             | 25 +++++++++++++------------
 xen/arch/x86/hvm/vmx/vvmx.c            |  2 +-
 xen/arch/x86/x86_emulate/x86_emulate.h |  3 ++-
 xen/include/asm-x86/hvm/support.h      |  2 --
 10 files changed, 36 insertions(+), 37 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index bb26d40..bc259ec 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -1609,7 +1609,7 @@ static int hvmemul_inject_sw_interrupt(
 
     hvmemul_ctxt->exn_pending = 1;
     hvmemul_ctxt->trap.vector = vector;
-    hvmemul_ctxt->trap.error_code = HVM_DELIVER_NO_ERROR_CODE;
+    hvmemul_ctxt->trap.error_code = X86_EVENT_NO_EC;
     hvmemul_ctxt->trap.insn_len = insn_len;
 
     return X86EMUL_OKAY;
@@ -1696,8 +1696,7 @@ static int hvmemul_vmfunc(
 
     rc = hvm_funcs.altp2m_vcpu_emulate_vmfunc(ctxt->regs);
     if ( rc != X86EMUL_OKAY )
-        hvmemul_inject_hw_exception(TRAP_invalid_op, HVM_DELIVER_NO_ERROR_CODE,
-                                    ctxt);
+        hvmemul_inject_hw_exception(TRAP_invalid_op, X86_EVENT_NO_EC, ctxt);
 
     return rc;
 }
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 7b434aa..b950842 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -502,7 +502,7 @@ void hvm_do_resume(struct vcpu *v)
                 kind = EMUL_KIND_SET_CONTEXT_INSN;
 
             hvm_emulate_one_vm_event(kind, TRAP_invalid_op,
-                                     HVM_DELIVER_NO_ERROR_CODE);
+                                     X86_EVENT_NO_EC);
 
             v->arch.vm_event->emulate_flags = 0;
         }
@@ -3054,7 +3054,7 @@ void hvm_task_switch(
     }
 
     if ( (tss.trace & 1) && !exn_raised )
-        hvm_inject_hw_exception(TRAP_debug, HVM_DELIVER_NO_ERROR_CODE);
+        hvm_inject_hw_exception(TRAP_debug, X86_EVENT_NO_EC);
 
  out:
     hvm_unmap_entry(optss_desc);
@@ -4073,7 +4073,7 @@ void hvm_ud_intercept(struct cpu_user_regs *regs)
     switch ( hvm_emulate_one(&ctxt) )
     {
     case X86EMUL_UNHANDLEABLE:
-        hvm_inject_hw_exception(TRAP_invalid_op, HVM_DELIVER_NO_ERROR_CODE);
+        hvm_inject_hw_exception(TRAP_invalid_op, X86_EVENT_NO_EC);
         break;
     case X86EMUL_EXCEPTION:
         if ( ctxt.exn_pending )
diff --git a/xen/arch/x86/hvm/nestedhvm.c b/xen/arch/x86/hvm/nestedhvm.c
index caad525..c4671d8 100644
--- a/xen/arch/x86/hvm/nestedhvm.c
+++ b/xen/arch/x86/hvm/nestedhvm.c
@@ -17,7 +17,7 @@
  */
 
 #include <asm/msr.h>
-#include <asm/hvm/support.h>	/* for HVM_DELIVER_NO_ERROR_CODE */
+#include <asm/hvm/support.h>
 #include <asm/hvm/hvm.h>
 #include <asm/p2m.h>    /* for struct p2m_domain */
 #include <asm/hvm/nestedhvm.h>
diff --git a/xen/arch/x86/hvm/svm/nestedsvm.c b/xen/arch/x86/hvm/svm/nestedsvm.c
index b6b8526..8c9b073 100644
--- a/xen/arch/x86/hvm/svm/nestedsvm.c
+++ b/xen/arch/x86/hvm/svm/nestedsvm.c
@@ -756,7 +756,7 @@ nsvm_vcpu_vmrun(struct vcpu *v, struct cpu_user_regs *regs)
     default:
         gdprintk(XENLOG_ERR,
             "nsvm_vcpu_vmentry failed, injecting #UD\n");
-        hvm_inject_hw_exception(TRAP_invalid_op, HVM_DELIVER_NO_ERROR_CODE);
+        hvm_inject_hw_exception(TRAP_invalid_op, X86_EVENT_NO_EC);
         /* Must happen after hvm_inject_hw_exception or it doesn't work right. */
         nv->nv_vmswitch_in_progress = 0;
         return 1;
@@ -1581,7 +1581,7 @@ void svm_vmexit_do_stgi(struct cpu_user_regs *regs, struct vcpu *v)
     unsigned int inst_len;
 
     if ( !nestedhvm_enabled(v->domain) ) {
-        hvm_inject_hw_exception(TRAP_invalid_op, HVM_DELIVER_NO_ERROR_CODE);
+        hvm_inject_hw_exception(TRAP_invalid_op, X86_EVENT_NO_EC);
         return;
     }
 
@@ -1601,7 +1601,7 @@ void svm_vmexit_do_clgi(struct cpu_user_regs *regs, struct vcpu *v)
     vintr_t intr;
 
     if ( !nestedhvm_enabled(v->domain) ) {
-        hvm_inject_hw_exception(TRAP_invalid_op, HVM_DELIVER_NO_ERROR_CODE);
+        hvm_inject_hw_exception(TRAP_invalid_op, X86_EVENT_NO_EC);
         return;
     }
 
diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index caab5ce..912d871 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -89,7 +89,7 @@ static DEFINE_SPINLOCK(osvw_lock);
 static void svm_crash_or_fault(struct vcpu *v)
 {
     if ( vmcb_get_cpl(v->arch.hvm_svm.vmcb) )
-        hvm_inject_hw_exception(TRAP_invalid_op, HVM_DELIVER_NO_ERROR_CODE);
+        hvm_inject_hw_exception(TRAP_invalid_op, X86_EVENT_NO_EC);
     else
         domain_crash(v->domain);
 }
@@ -116,7 +116,7 @@ void __update_guest_eip(struct cpu_user_regs *regs, unsigned int inst_len)
     curr->arch.hvm_svm.vmcb->interrupt_shadow = 0;
 
     if ( regs->eflags & X86_EFLAGS_TF )
-        hvm_inject_hw_exception(TRAP_debug, HVM_DELIVER_NO_ERROR_CODE);
+        hvm_inject_hw_exception(TRAP_debug, X86_EVENT_NO_EC);
 }
 
 static void svm_cpu_down(void)
@@ -1285,7 +1285,7 @@ static void svm_inject_event(const struct x86_event *event)
 
     default:
         eventinj.fields.type = X86_EVENTTYPE_HW_EXCEPTION;
-        eventinj.fields.ev = (_event.error_code != HVM_DELIVER_NO_ERROR_CODE);
+        eventinj.fields.ev = (_event.error_code != X86_EVENT_NO_EC);
         eventinj.fields.errorcode = _event.error_code;
         break;
     }
@@ -1553,7 +1553,7 @@ static void svm_fpu_dirty_intercept(void)
     {
        /* Check if l1 guest must make FPU ready for the l2 guest */
        if ( v->arch.hvm_vcpu.guest_cr[0] & X86_CR0_TS )
-           hvm_inject_hw_exception(TRAP_no_device, HVM_DELIVER_NO_ERROR_CODE);
+           hvm_inject_hw_exception(TRAP_no_device, X86_EVENT_NO_EC);
        else
            vmcb_set_cr0(n1vmcb, vmcb_get_cr0(n1vmcb) & ~X86_CR0_TS);
        return;
@@ -2022,7 +2022,7 @@ svm_vmexit_do_vmrun(struct cpu_user_regs *regs,
     if ( !nsvm_efer_svm_enabled(v) )
     {
         gdprintk(XENLOG_ERR, "VMRUN: nestedhvm disabled, injecting #UD\n");
-        hvm_inject_hw_exception(TRAP_invalid_op, HVM_DELIVER_NO_ERROR_CODE);
+        hvm_inject_hw_exception(TRAP_invalid_op, X86_EVENT_NO_EC);
         return;
     }
 
@@ -2077,7 +2077,7 @@ svm_vmexit_do_vmload(struct vmcb_struct *vmcb,
     if ( !nsvm_efer_svm_enabled(v) ) 
     {
         gdprintk(XENLOG_ERR, "VMLOAD: nestedhvm disabled, injecting #UD\n");
-        hvm_inject_hw_exception(TRAP_invalid_op, HVM_DELIVER_NO_ERROR_CODE);
+        hvm_inject_hw_exception(TRAP_invalid_op, X86_EVENT_NO_EC);
         return;
     }
 
@@ -2113,7 +2113,7 @@ svm_vmexit_do_vmsave(struct vmcb_struct *vmcb,
     if ( !nsvm_efer_svm_enabled(v) ) 
     {
         gdprintk(XENLOG_ERR, "VMSAVE: nestedhvm disabled, injecting #UD\n");
-        hvm_inject_hw_exception(TRAP_invalid_op, HVM_DELIVER_NO_ERROR_CODE);
+        hvm_inject_hw_exception(TRAP_invalid_op, X86_EVENT_NO_EC);
         return;
     }
 
@@ -2416,7 +2416,7 @@ void svm_vmexit_handler(struct cpu_user_regs *regs)
 
     case VMEXIT_EXCEPTION_DB:
         if ( !v->domain->debugger_attached )
-            hvm_inject_hw_exception(TRAP_debug, HVM_DELIVER_NO_ERROR_CODE);
+            hvm_inject_hw_exception(TRAP_debug, X86_EVENT_NO_EC);
         else
             domain_pause_for_debugger();
         break;
@@ -2604,7 +2604,7 @@ void svm_vmexit_handler(struct cpu_user_regs *regs)
 
     case VMEXIT_MONITOR:
     case VMEXIT_MWAIT:
-        hvm_inject_hw_exception(TRAP_invalid_op, HVM_DELIVER_NO_ERROR_CODE);
+        hvm_inject_hw_exception(TRAP_invalid_op, X86_EVENT_NO_EC);
         break;
 
     case VMEXIT_VMRUN:
@@ -2623,7 +2623,7 @@ void svm_vmexit_handler(struct cpu_user_regs *regs)
         svm_vmexit_do_clgi(regs, v);
         break;
     case VMEXIT_SKINIT:
-        hvm_inject_hw_exception(TRAP_invalid_op, HVM_DELIVER_NO_ERROR_CODE);
+        hvm_inject_hw_exception(TRAP_invalid_op, X86_EVENT_NO_EC);
         break;
 
     case VMEXIT_XSETBV:
diff --git a/xen/arch/x86/hvm/vmx/intr.c b/xen/arch/x86/hvm/vmx/intr.c
index 8fca08c..639a705 100644
--- a/xen/arch/x86/hvm/vmx/intr.c
+++ b/xen/arch/x86/hvm/vmx/intr.c
@@ -302,7 +302,7 @@ void vmx_intr_assist(void)
     }
     else if ( intack.source == hvm_intsrc_mce )
     {
-        hvm_inject_hw_exception(TRAP_machine_check, HVM_DELIVER_NO_ERROR_CODE);
+        hvm_inject_hw_exception(TRAP_machine_check, X86_EVENT_NO_EC);
     }
     else if ( cpu_has_vmx_virtual_intr_delivery &&
               intack.source != hvm_intsrc_pic &&
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index ed9b69b..31f08d2 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -1646,7 +1646,8 @@ static void __vmx_inject_exception(int trap, int type, int error_code)
     intr_fields = INTR_INFO_VALID_MASK |
                   MASK_INSR(type, INTR_INFO_INTR_TYPE_MASK) |
                   MASK_INSR(trap, INTR_INFO_VECTOR_MASK);
-    if ( error_code != HVM_DELIVER_NO_ERROR_CODE ) {
+    if ( error_code != X86_EVENT_NO_EC )
+    {
         __vmwrite(VM_ENTRY_EXCEPTION_ERROR_CODE, error_code);
         intr_fields |= INTR_INFO_DELIVER_CODE_MASK;
     }
@@ -1671,12 +1672,12 @@ void vmx_inject_extint(int trap, uint8_t source)
                INTR_INFO_VALID_MASK |
                MASK_INSR(X86_EVENTTYPE_EXT_INTR, INTR_INFO_INTR_TYPE_MASK) |
                MASK_INSR(trap, INTR_INFO_VECTOR_MASK),
-               HVM_DELIVER_NO_ERROR_CODE, source);
+               X86_EVENT_NO_EC, source);
             return;
         }
     }
     __vmx_inject_exception(trap, X86_EVENTTYPE_EXT_INTR,
-                           HVM_DELIVER_NO_ERROR_CODE);
+                           X86_EVENT_NO_EC);
 }
 
 void vmx_inject_nmi(void)
@@ -1691,12 +1692,12 @@ void vmx_inject_nmi(void)
                INTR_INFO_VALID_MASK |
                MASK_INSR(X86_EVENTTYPE_NMI, INTR_INFO_INTR_TYPE_MASK) |
                MASK_INSR(TRAP_nmi, INTR_INFO_VECTOR_MASK),
-               HVM_DELIVER_NO_ERROR_CODE, hvm_intsrc_nmi);
+               X86_EVENT_NO_EC, hvm_intsrc_nmi);
             return;
         }
     }
     __vmx_inject_exception(2, X86_EVENTTYPE_NMI,
-                           HVM_DELIVER_NO_ERROR_CODE);
+                           X86_EVENT_NO_EC);
 }
 
 /*
@@ -2111,7 +2112,7 @@ static bool_t vmx_vcpu_emulate_ve(struct vcpu *v)
     vmx_vmcs_exit(v);
 
     hvm_inject_hw_exception(TRAP_virtualisation,
-                            HVM_DELIVER_NO_ERROR_CODE);
+                            X86_EVENT_NO_EC);
 
  out:
     hvm_unmap_guest_frame(veinfo, 0);
@@ -2387,7 +2388,7 @@ void update_guest_eip(void)
     }
 
     if ( regs->eflags & X86_EFLAGS_TF )
-        hvm_inject_hw_exception(TRAP_debug, HVM_DELIVER_NO_ERROR_CODE);
+        hvm_inject_hw_exception(TRAP_debug, X86_EVENT_NO_EC);
 }
 
 static void vmx_fpu_dirty_intercept(void)
@@ -2915,7 +2916,7 @@ static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content)
 
         if ( (rc < 0) ||
              (msr_content && (vmx_add_host_load_msr(msr) < 0)) )
-            hvm_inject_hw_exception(TRAP_machine_check, HVM_DELIVER_NO_ERROR_CODE);
+            hvm_inject_hw_exception(TRAP_machine_check, X86_EVENT_NO_EC);
         else
             __vmwrite(GUEST_IA32_DEBUGCTL, msr_content);
 
@@ -3213,7 +3214,7 @@ static void vmx_propagate_intr(unsigned long intr)
         event.error_code = tmp;
     }
     else
-        event.error_code = HVM_DELIVER_NO_ERROR_CODE;
+        event.error_code = X86_EVENT_NO_EC;
 
     if ( event.type >= X86_EVENTTYPE_SW_INTERRUPT )
     {
@@ -3770,7 +3771,7 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
 
     case EXIT_REASON_VMFUNC:
         if ( vmx_vmfunc_intercept(regs) != X86EMUL_OKAY )
-            hvm_inject_hw_exception(TRAP_invalid_op, HVM_DELIVER_NO_ERROR_CODE);
+            hvm_inject_hw_exception(TRAP_invalid_op, X86_EVENT_NO_EC);
         else
             update_guest_eip();
         break;
@@ -3784,7 +3785,7 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
          * as far as vmexit.
          */
         WARN_ON(exit_reason == EXIT_REASON_GETSEC);
-        hvm_inject_hw_exception(TRAP_invalid_op, HVM_DELIVER_NO_ERROR_CODE);
+        hvm_inject_hw_exception(TRAP_invalid_op, X86_EVENT_NO_EC);
         break;
 
     case EXIT_REASON_TPR_BELOW_THRESHOLD:
@@ -3909,7 +3910,7 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
             vmx_get_segment_register(v, x86_seg_ss, &ss);
             if ( ss.attr.fields.dpl )
                 hvm_inject_hw_exception(TRAP_invalid_op,
-                                        HVM_DELIVER_NO_ERROR_CODE);
+                                        X86_EVENT_NO_EC);
             else
                 domain_crash(v->domain);
         }
diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index b5837d4..efaf54c 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -380,7 +380,7 @@ static int vmx_inst_check_privilege(struct cpu_user_regs *regs, int vmxop_check)
     
 invalid_op:
     gdprintk(XENLOG_ERR, "vmx_inst_check_privilege: invalid_op\n");
-    hvm_inject_hw_exception(TRAP_invalid_op, HVM_DELIVER_NO_ERROR_CODE);
+    hvm_inject_hw_exception(TRAP_invalid_op, X86_EVENT_NO_EC);
     return X86EMUL_EXCEPTION;
 
 gp_fault:
diff --git a/xen/arch/x86/x86_emulate/x86_emulate.h b/xen/arch/x86/x86_emulate/x86_emulate.h
index 54c532c..b0f0304 100644
--- a/xen/arch/x86/x86_emulate/x86_emulate.h
+++ b/xen/arch/x86/x86_emulate/x86_emulate.h
@@ -80,12 +80,13 @@ enum x86_event_type {
     X86_EVENTTYPE_PRI_SW_EXCEPTION, /* ICEBP (F1) */
     X86_EVENTTYPE_SW_EXCEPTION,     /* INT3 (CC), INTO (CE) */
 };
+#define X86_EVENT_NO_EC (-1)        /* No error code. */
 
 struct x86_event {
     int16_t       vector;
     uint8_t       type;         /* X86_EVENTTYPE_* */
     uint8_t       insn_len;     /* Instruction length */
-    uint32_t      error_code;   /* HVM_DELIVER_NO_ERROR_CODE if n/a */
+    int32_t       error_code;   /* X86_EVENT_NO_EC if n/a */
     unsigned long cr2;          /* Only for TRAP_page_fault h/w exception */
 };
 
diff --git a/xen/include/asm-x86/hvm/support.h b/xen/include/asm-x86/hvm/support.h
index 2984abc..9938450 100644
--- a/xen/include/asm-x86/hvm/support.h
+++ b/xen/include/asm-x86/hvm/support.h
@@ -25,8 +25,6 @@
 #include <xen/hvm/save.h>
 #include <asm/processor.h>
 
-#define HVM_DELIVER_NO_ERROR_CODE  (~0U)
-
 #ifndef NDEBUG
 #define DBG_LEVEL_0                 (1 << 0)
 #define DBG_LEVEL_1                 (1 << 1)
-- 
2.1.4



* [PATCH v2 06/19] x86/pv: Implement pv_inject_{event, page_fault, hw_exception}()
  2016-11-28 11:13 [PATCH for-4.9 v2 00/19] XSA-191 followup Andrew Cooper
                   ` (4 preceding siblings ...)
  2016-11-28 11:13 ` [PATCH v2 05/19] x86/emul: Rename HVM_DELIVER_NO_ERROR_CODE to X86_EVENT_NO_EC Andrew Cooper
@ 2016-11-28 11:13 ` Andrew Cooper
  2016-11-28 11:58   ` Tim Deegan
  2016-11-29 16:00   ` Jan Beulich
  2016-11-28 11:13 ` [PATCH v2 07/19] x86/emul: Remove opencoded exception generation Andrew Cooper
                   ` (12 subsequent siblings)
  18 siblings, 2 replies; 57+ messages in thread
From: Andrew Cooper @ 2016-11-28 11:13 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Tim Deegan, Jan Beulich

To help with event injection improvements for the PV uses of x86_emulate(),
implement an event injection API which matches its hvm counterpart.

This starts by taking do_guest_trap() and modifying its calling API to
pv_inject_event(), subsequently implementing the former in terms of the
latter.

The existing propagate_page_fault() is fairly similar to
pv_inject_page_fault(), although it has a return value.  Only a single caller
makes use of the return value, and non-NULL is only returned if the passed cr2
is non-canonical.  Opencode this single case in
handle_gdt_ldt_mapping_fault(), allowing propagate_page_fault() to become
void.

The #PF specific bits are moved into pv_inject_event(), and
pv_inject_page_fault() is implemented as a static inline wrapper.
reserved_bit_page_fault() is pure code motion.

No functional change.
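
As a before/after illustration of the new calling convention (taken from
the hunks below; the error code now comes first and the helper returns
void), a typical read-fault propagation site changes as follows:

    /* Old API: address first, error code second; may return a trap_bounce. */
    propagate_page_fault(addr + bytes - rc, 0); /* read fault */

    /* New API: error code first, address second; the %cr2 update,
     * PFEC_user_mode adjustment and tracing now happen inside
     * pv_inject_event(). */
    pv_inject_page_fault(0, addr + bytes - rc); /* Read fault. */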

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Tim Deegan <tim@xen.org>

v2:
 * New
---
 xen/arch/x86/mm.c               |   5 +-
 xen/arch/x86/mm/shadow/common.c |   4 +-
 xen/arch/x86/traps.c            | 172 ++++++++++++++++++++--------------------
 xen/include/asm-x86/domain.h    |  26 ++++++
 xen/include/asm-x86/mm.h        |   1 -
 5 files changed, 118 insertions(+), 90 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index d365f59..b7c7122 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -5136,7 +5136,7 @@ static int ptwr_emulated_read(
     if ( !__addr_ok(addr) ||
          (rc = __copy_from_user(p_data, (void *)addr, bytes)) )
     {
-        propagate_page_fault(addr + bytes - rc, 0); /* read fault */
+        pv_inject_page_fault(0, addr + bytes - rc); /* Read fault. */
         return X86EMUL_EXCEPTION;
     }
 
@@ -5177,7 +5177,8 @@ static int ptwr_emulated_update(
         addr &= ~(sizeof(paddr_t)-1);
         if ( (rc = copy_from_user(&full, (void *)addr, sizeof(paddr_t))) != 0 )
         {
-            propagate_page_fault(addr+sizeof(paddr_t)-rc, 0); /* read fault */
+            pv_inject_page_fault(0, /* Read fault. */
+                                 addr + sizeof(paddr_t) - rc);
             return X86EMUL_EXCEPTION;
         }
         /* Mask out bits provided by caller. */
diff --git a/xen/arch/x86/mm/shadow/common.c b/xen/arch/x86/mm/shadow/common.c
index a4a3c4b..f07803b 100644
--- a/xen/arch/x86/mm/shadow/common.c
+++ b/xen/arch/x86/mm/shadow/common.c
@@ -323,7 +323,7 @@ pv_emulate_read(enum x86_segment seg,
 
     if ( (rc = copy_from_user(p_data, (void *)offset, bytes)) != 0 )
     {
-        propagate_page_fault(offset + bytes - rc, 0); /* read fault */
+        pv_inject_page_fault(0, offset + bytes - rc); /* Read fault. */
         return X86EMUL_EXCEPTION;
     }
 
@@ -1723,7 +1723,7 @@ static mfn_t emulate_gva_to_mfn(struct vcpu *v, unsigned long vaddr,
         if ( is_hvm_vcpu(v) )
             hvm_inject_page_fault(pfec, vaddr);
         else
-            propagate_page_fault(vaddr, pfec);
+            pv_inject_page_fault(pfec, vaddr);
         return _mfn(BAD_GVA_TO_GFN);
     }
 
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index b464211..7301298 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -625,37 +625,90 @@ void fatal_trap(const struct cpu_user_regs *regs, bool_t show_remote)
           (regs->eflags & X86_EFLAGS_IF) ? "" : ", IN INTERRUPT CONTEXT");
 }
 
-static void do_guest_trap(unsigned int trapnr,
-                          const struct cpu_user_regs *regs)
+static void reserved_bit_page_fault(
+    unsigned long addr, struct cpu_user_regs *regs)
+{
+    printk("%pv: reserved bit in page table (ec=%04X)\n",
+           current, regs->error_code);
+    show_page_walk(addr);
+    show_execution_state(regs);
+}
+
+void pv_inject_event(const struct x86_event *event)
 {
     struct vcpu *v = current;
+    struct cpu_user_regs *regs = guest_cpu_user_regs();
     struct trap_bounce *tb;
     const struct trap_info *ti;
+    const uint8_t vector = event->vector;
     const bool use_error_code =
-        ((trapnr < 32) && (TRAP_HAVE_EC & (1u << trapnr)));
+        ((vector < 32) && (TRAP_HAVE_EC & (1u << vector)));
+    unsigned int error_code = event->error_code;
 
-    trace_pv_trap(trapnr, regs->eip, use_error_code, regs->error_code);
+    ASSERT(vector == event->vector); /* Check for no truncation. */
+    if ( use_error_code )
+        ASSERT(error_code != X86_EVENT_NO_EC);
+    else
+        ASSERT(error_code == X86_EVENT_NO_EC);
 
     tb = &v->arch.pv_vcpu.trap_bounce;
-    ti = &v->arch.pv_vcpu.trap_ctxt[trapnr];
+    ti = &v->arch.pv_vcpu.trap_ctxt[vector];
 
     tb->flags = TBF_EXCEPTION;
     tb->cs    = ti->cs;
     tb->eip   = ti->address;
 
+    if ( vector == TRAP_page_fault )
+    {
+        v->arch.pv_vcpu.ctrlreg[2] = event->cr2;
+        arch_set_cr2(v, event->cr2);
+
+        /* Re-set error_code.user flag appropriately for the guest. */
+        error_code &= ~PFEC_user_mode;
+        if ( !guest_kernel_mode(v, regs) )
+            error_code |= PFEC_user_mode;
+
+        trace_pv_page_fault(event->cr2, error_code);
+    }
+    else
+        trace_pv_trap(vector, regs->eip, use_error_code, error_code);
+
     if ( use_error_code )
     {
         tb->flags |= TBF_EXCEPTION_ERRCODE;
-        tb->error_code = regs->error_code;
+        tb->error_code = error_code;
     }
 
     if ( TI_GET_IF(ti) )
         tb->flags |= TBF_INTERRUPT;
 
     if ( unlikely(null_trap_bounce(v, tb)) )
-        gprintk(XENLOG_WARNING,
-                "Unhandled %s fault/trap [#%d, ec=%04x]\n",
-                trapstr(trapnr), trapnr, regs->error_code);
+    {
+        if ( vector == TRAP_page_fault )
+        {
+            printk("%pv: unhandled page fault (ec=%04X)\n", v, error_code);
+            show_page_walk(event->cr2);
+
+            if ( unlikely(error_code & PFEC_reserved_bit) )
+                reserved_bit_page_fault(event->cr2, regs);
+        }
+        else
+            gprintk(XENLOG_WARNING,
+                    "Unhandled %s fault/trap [#%d, ec=%04x]\n",
+                    trapstr(vector), vector, error_code);
+    }
+}
+
+static inline void do_guest_trap(unsigned int trapnr,
+                                 const struct cpu_user_regs *regs)
+{
+    const struct x86_event event = {
+        .vector = trapnr,
+        .error_code = (((trapnr < 32) && (TRAP_HAVE_EC & (1u << trapnr)))
+                       ? regs->error_code : X86_EVENT_NO_EC),
+    };
+
+    pv_inject_event(&event);
 }
 
 static void instruction_done(
@@ -1289,7 +1342,7 @@ static int emulate_invalid_rdtscp(struct cpu_user_regs *regs)
     eip = regs->eip;
     if ( (rc = copy_from_user(opcode, (char *)eip, sizeof(opcode))) != 0 )
     {
-        propagate_page_fault(eip + sizeof(opcode) - rc, 0);
+        pv_inject_page_fault(0, eip + sizeof(opcode) - rc);
         return EXCRET_fault_fixed;
     }
     if ( memcmp(opcode, "\xf\x1\xf9", sizeof(opcode)) )
@@ -1310,7 +1363,7 @@ static int emulate_forced_invalid_op(struct cpu_user_regs *regs)
     /* Check for forced emulation signature: ud2 ; .ascii "xen". */
     if ( (rc = copy_from_user(sig, (char *)eip, sizeof(sig))) != 0 )
     {
-        propagate_page_fault(eip + sizeof(sig) - rc, 0);
+        pv_inject_page_fault(0, eip + sizeof(sig) - rc);
         return EXCRET_fault_fixed;
     }
     if ( memcmp(sig, "\xf\xbxen", sizeof(sig)) )
@@ -1320,7 +1373,7 @@ static int emulate_forced_invalid_op(struct cpu_user_regs *regs)
     /* We only emulate CPUID. */
     if ( ( rc = copy_from_user(instr, (char *)eip, sizeof(instr))) != 0 )
     {
-        propagate_page_fault(eip + sizeof(instr) - rc, 0);
+        pv_inject_page_fault(0, eip + sizeof(instr) - rc);
         return EXCRET_fault_fixed;
     }
     if ( memcmp(instr, "\xf\xa2", sizeof(instr)) )
@@ -1478,62 +1531,6 @@ void do_int3(struct cpu_user_regs *regs)
     do_guest_trap(TRAP_int3, regs);
 }
 
-static void reserved_bit_page_fault(
-    unsigned long addr, struct cpu_user_regs *regs)
-{
-    printk("%pv: reserved bit in page table (ec=%04X)\n",
-           current, regs->error_code);
-    show_page_walk(addr);
-    show_execution_state(regs);
-}
-
-struct trap_bounce *propagate_page_fault(unsigned long addr, u16 error_code)
-{
-    struct trap_info *ti;
-    struct vcpu *v = current;
-    struct trap_bounce *tb = &v->arch.pv_vcpu.trap_bounce;
-
-    if ( unlikely(!is_canonical_address(addr)) )
-    {
-        ti = &v->arch.pv_vcpu.trap_ctxt[TRAP_gp_fault];
-        tb->flags      = TBF_EXCEPTION | TBF_EXCEPTION_ERRCODE;
-        tb->error_code = 0;
-        tb->cs         = ti->cs;
-        tb->eip        = ti->address;
-        if ( TI_GET_IF(ti) )
-            tb->flags |= TBF_INTERRUPT;
-        return tb;
-    }
-
-    v->arch.pv_vcpu.ctrlreg[2] = addr;
-    arch_set_cr2(v, addr);
-
-    /* Re-set error_code.user flag appropriately for the guest. */
-    error_code &= ~PFEC_user_mode;
-    if ( !guest_kernel_mode(v, guest_cpu_user_regs()) )
-        error_code |= PFEC_user_mode;
-
-    trace_pv_page_fault(addr, error_code);
-
-    ti = &v->arch.pv_vcpu.trap_ctxt[TRAP_page_fault];
-    tb->flags = TBF_EXCEPTION | TBF_EXCEPTION_ERRCODE;
-    tb->error_code = error_code;
-    tb->cs         = ti->cs;
-    tb->eip        = ti->address;
-    if ( TI_GET_IF(ti) )
-        tb->flags |= TBF_INTERRUPT;
-    if ( unlikely(null_trap_bounce(v, tb)) )
-    {
-        printk("%pv: unhandled page fault (ec=%04X)\n", v, error_code);
-        show_page_walk(addr);
-    }
-
-    if ( unlikely(error_code & PFEC_reserved_bit) )
-        reserved_bit_page_fault(addr, guest_cpu_user_regs());
-
-    return NULL;
-}
-
 static int handle_gdt_ldt_mapping_fault(
     unsigned long offset, struct cpu_user_regs *regs)
 {
@@ -1565,17 +1562,22 @@ static int handle_gdt_ldt_mapping_fault(
         }
         else
         {
-            struct trap_bounce *tb;
-
             /* In hypervisor mode? Leave it to the #PF handler to fix up. */
             if ( !guest_mode(regs) )
                 return 0;
-            /* In guest mode? Propagate fault to guest, with adjusted %cr2. */
-            tb = propagate_page_fault(curr->arch.pv_vcpu.ldt_base + offset,
-                                      regs->error_code);
-            if ( tb )
-                tb->error_code = (offset & ~(X86_XEC_EXT | X86_XEC_IDT)) |
-                                 X86_XEC_TI;
+
+            /* Access would have become non-canonical? Pass #GP[sel] back. */
+            if ( unlikely(!is_canonical_address(
+                              curr->arch.pv_vcpu.ldt_base + offset)) )
+            {
+                uint16_t ec = (offset & ~(X86_XEC_EXT | X86_XEC_IDT)) | X86_XEC_TI;
+
+                pv_inject_hw_exception(TRAP_gp_fault, ec);
+            }
+            else
+                /* else pass the #PF back, with adjusted %cr2. */
+                pv_inject_page_fault(regs->error_code,
+                                     curr->arch.pv_vcpu.ldt_base + offset);
         }
     }
     else
@@ -1858,7 +1860,7 @@ void do_page_fault(struct cpu_user_regs *regs)
             return;
     }
 
-    propagate_page_fault(addr, regs->error_code);
+    pv_inject_page_fault(regs->error_code, addr);
 }
 
 /*
@@ -2788,7 +2790,7 @@ int pv_emul_cpuid(unsigned int *eax, unsigned int *ebx, unsigned int *ecx,
         goto fail;                                                          \
     if ( (_rc = copy_from_user(&_x, (type *)_ptr, sizeof(_x))) != 0 )       \
     {                                                                       \
-        propagate_page_fault(_ptr + sizeof(_x) - _rc, 0);                   \
+        pv_inject_page_fault(0, _ptr + sizeof(_x) - _rc);                   \
         goto skip;                                                          \
     }                                                                       \
     (eip) += sizeof(_x); _x; })
@@ -2953,8 +2955,8 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
             if ( (rc = copy_to_user((void *)data_base + rd_ad(edi),
                                     &data, op_bytes)) != 0 )
             {
-                propagate_page_fault(data_base + rd_ad(edi) + op_bytes - rc,
-                                     PFEC_write_access);
+                pv_inject_page_fault(PFEC_write_access,
+                                     data_base + rd_ad(edi) + op_bytes - rc);
                 return EXCRET_fault_fixed;
             }
             wr_ad(edi, regs->edi + (int)((regs->eflags & X86_EFLAGS_DF)
@@ -2971,8 +2973,8 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
             if ( (rc = copy_from_user(&data, (void *)data_base + rd_ad(esi),
                                       op_bytes)) != 0 )
             {
-                propagate_page_fault(data_base + rd_ad(esi)
-                                     + op_bytes - rc, 0);
+                pv_inject_page_fault(0, data_base + rd_ad(esi)
+                                     + op_bytes - rc);
                 return EXCRET_fault_fixed;
             }
             guest_io_write(port, op_bytes, data, currd);
@@ -3529,8 +3531,8 @@ static void emulate_gate_op(struct cpu_user_regs *regs)
             rc = __put_user(item, stkp); \
             if ( rc ) \
             { \
-                propagate_page_fault((unsigned long)(stkp + 1) - rc, \
-                                     PFEC_write_access); \
+                pv_inject_page_fault(PFEC_write_access, \
+                                     (unsigned long)(stkp + 1) - rc); \
                 return; \
             } \
         } while ( 0 )
@@ -3597,7 +3599,7 @@ static void emulate_gate_op(struct cpu_user_regs *regs)
                     rc = __get_user(parm, ustkp);
                     if ( rc )
                     {
-                        propagate_page_fault((unsigned long)(ustkp + 1) - rc, 0);
+                        pv_inject_page_fault(0, (unsigned long)(ustkp + 1) - rc);
                         return;
                     }
                     push(parm);
diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index f6a40eb..39cc658 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -8,6 +8,7 @@
 #include <asm/hvm/domain.h>
 #include <asm/e820.h>
 #include <asm/mce.h>
+#include <asm/x86_emulate.h>
 #include <public/vcpu.h>
 #include <public/hvm/hvm_info_table.h>
 
@@ -632,6 +633,31 @@ static inline void free_vcpu_guest_context(struct vcpu_guest_context *vgc)
 struct vcpu_hvm_context;
 int arch_set_info_hvm_guest(struct vcpu *v, const struct vcpu_hvm_context *ctx);
 
+void pv_inject_event(const struct x86_event *event);
+
+static inline void pv_inject_hw_exception(unsigned int vector, int errcode)
+{
+    const struct x86_event event = {
+        .vector = vector,
+        .type = X86_EVENTTYPE_HW_EXCEPTION,
+        .error_code = errcode,
+    };
+
+    pv_inject_event(&event);
+}
+
+static inline void pv_inject_page_fault(int errcode, unsigned long cr2)
+{
+    const struct x86_event event = {
+        .vector = TRAP_page_fault,
+        .type = X86_EVENTTYPE_HW_EXCEPTION,
+        .error_code = errcode,
+        .cr2 = cr2,
+    };
+
+    pv_inject_event(&event);
+}
+
 #endif /* __ASM_DOMAIN_H__ */
 
 /*
diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
index 1b4d1c3..a15029c 100644
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -539,7 +539,6 @@ int new_guest_cr3(unsigned long pfn);
 void make_cr3(struct vcpu *v, unsigned long mfn);
 void update_cr3(struct vcpu *v);
 int vcpu_destroy_pagetables(struct vcpu *);
-struct trap_bounce *propagate_page_fault(unsigned long addr, u16 error_code);
 void *do_page_walk(struct vcpu *v, unsigned long addr);
 
 int __sync_local_execstate(void);
-- 
2.1.4



* [PATCH v2 07/19] x86/emul: Remove opencoded exception generation
  2016-11-28 11:13 [PATCH for-4.9 v2 00/19] XSA-191 followup Andrew Cooper
                   ` (5 preceding siblings ...)
  2016-11-28 11:13 ` [PATCH v2 06/19] x86/pv: Implement pv_inject_{event, page_fault, hw_exception}() Andrew Cooper
@ 2016-11-28 11:13 ` Andrew Cooper
  2016-11-28 11:13 ` [PATCH v2 08/19] x86/emul: Rework emulator event injection Andrew Cooper
                   ` (11 subsequent siblings)
  18 siblings, 0 replies; 57+ messages in thread
From: Andrew Cooper @ 2016-11-28 11:13 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper

Introduce generate_exception() for unconditional exception generation, and
replace existing uses.  Both generate_exception() and generate_exception_if()
are updated to make their error code parameters optional, which removes the
use of the -1 sentinel.

The ioport_access_check() check loses the presence check for %tr, as the x86
architecture has no concept of a non-usable task register.

No functional change.
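
For illustration only, a minimal self-contained sketch (simplified names,
not the actual Xen macros) of the GNU variadic-macro trick which makes the
error code optional: the trailing literal 0 in the expansion provides a
default argument, and an mkec()-style helper substitutes the "no error
code" sentinel for vectors which do not push one:

    #include <stdio.h>

    #define NO_EC (-1)                  /* stands in for X86_EVENT_NO_EC */

    /* Only some vectors push an error code (cf. EXC_HAS_EC); all others
     * get the sentinel regardless of what the caller supplied. */
    static inline int pick_ec(int vec, int ec, ...)
    {
        return (vec == 13 /* #GP */) ? ec : NO_EC;
    }

    /* "ec..." may be empty; ", ## ec" drops the comma in that case, and
     * the trailing 0 becomes the default handed to pick_ec(). */
    #define raise(vec, ec...) \
        printf("vector %d, ec %d\n", vec, pick_ec(vec, ## ec, 0))

    int main(void)
    {
        raise(6);      /* #UD: no error code supplied -> -1 */
        raise(13, 0);  /* #GP: explicit error code 0  ->  0 */
        return 0;
    }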

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <JBeulich@suse.com>
---
v2:
 * Brackets around &
---
 xen/arch/x86/x86_emulate/x86_emulate.c | 196 +++++++++++++++++----------------
 1 file changed, 100 insertions(+), 96 deletions(-)

diff --git a/xen/arch/x86/x86_emulate/x86_emulate.c b/xen/arch/x86/x86_emulate/x86_emulate.c
index 9c28ed4..6a653f9 100644
--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -457,14 +457,20 @@ typedef union {
 #define EXC_BR  5
 #define EXC_UD  6
 #define EXC_NM  7
+#define EXC_DF  8
 #define EXC_TS 10
 #define EXC_NP 11
 #define EXC_SS 12
 #define EXC_GP 13
 #define EXC_PF 14
 #define EXC_MF 16
+#define EXC_AC 17
 #define EXC_XM 19
 
+#define EXC_HAS_EC                                                      \
+    ((1u << EXC_DF) | (1u << EXC_TS) | (1u << EXC_NP) |                 \
+     (1u << EXC_SS) | (1u << EXC_GP) | (1u << EXC_PF) | (1u << EXC_AC))
+
 /* Segment selector error code bits. */
 #define ECODE_EXT (1 << 0)
 #define ECODE_IDT (1 << 1)
@@ -667,14 +673,22 @@ do {                                                    \
     if ( rc ) goto done;                                \
 } while (0)
 
-#define generate_exception_if(p, e, ec)                                   \
+static inline int mkec(uint8_t e, int32_t ec, ...)
+{
+    return (e < 32 && ((1u << e) & EXC_HAS_EC)) ? ec : X86_EVENT_NO_EC;
+}
+
+#define generate_exception_if(p, e, ec...)                                \
 ({  if ( (p) ) {                                                          \
         fail_if(ops->inject_hw_exception == NULL);                        \
-        rc = ops->inject_hw_exception(e, ec, ctxt) ? : X86EMUL_EXCEPTION; \
+        rc = ops->inject_hw_exception(e, mkec(e, ##ec, 0), ctxt)          \
+            ? : X86EMUL_EXCEPTION;                                        \
         goto done;                                                        \
     }                                                                     \
 })
 
+#define generate_exception(e, ec...) generate_exception_if(true, e, ##ec)
+
 /*
  * Given byte has even parity (even number of 1s)? SDM Vol. 1 Sec. 3.4.3.1,
  * "Status Flags": EFLAGS.PF reflects parity of least-sig. byte of result only.
@@ -785,7 +799,7 @@ static int _get_fpu(
                 return rc;
             generate_exception_if(!(cr4 & ((type == X86EMUL_FPU_xmm)
                                            ? CR4_OSFXSR : CR4_OSXSAVE)),
-                                  EXC_UD, -1);
+                                  EXC_UD);
         }
 
         rc = ops->read_cr(0, &cr0, ctxt);
@@ -798,13 +812,13 @@ static int _get_fpu(
         }
         if ( cr0 & CR0_EM )
         {
-            generate_exception_if(type == X86EMUL_FPU_fpu, EXC_NM, -1);
-            generate_exception_if(type == X86EMUL_FPU_mmx, EXC_UD, -1);
-            generate_exception_if(type == X86EMUL_FPU_xmm, EXC_UD, -1);
+            generate_exception_if(type == X86EMUL_FPU_fpu, EXC_NM);
+            generate_exception_if(type == X86EMUL_FPU_mmx, EXC_UD);
+            generate_exception_if(type == X86EMUL_FPU_xmm, EXC_UD);
         }
         generate_exception_if((cr0 & CR0_TS) &&
                               (type != X86EMUL_FPU_wait || (cr0 & CR0_MP)),
-                              EXC_NM, -1);
+                              EXC_NM);
     }
 
  done:
@@ -832,7 +846,7 @@ do {                                                            \
             (_fic)->exn_raised = EXC_UD;                        \
     }                                                           \
     generate_exception_if((_fic)->exn_raised >= 0,              \
-                          (_fic)->exn_raised, -1);              \
+                          (_fic)->exn_raised);                  \
 } while (0)
 
 #define emulate_fpu_insn(_op)                           \
@@ -1167,11 +1181,9 @@ static int ioport_access_check(
     if ( (rc = ops->read_segment(x86_seg_tr, &tr, ctxt)) != 0 )
         return rc;
 
-    /* Ensure that the TSS is valid and has an io-bitmap-offset field. */
-    if ( !tr.attr.fields.p ||
-         ((tr.attr.fields.type & 0xd) != 0x9) ||
-         (tr.limit < 0x67) )
-        goto raise_exception;
+    /* Ensure the TSS has an io-bitmap-offset field. */
+    generate_exception_if(tr.attr.fields.type != 0xb ||
+                          tr.limit < 0x67, EXC_GP, 0);
 
     if ( (rc = read_ulong(x86_seg_none, tr.base + 0x66,
                           &iobmp, 2, ctxt, ops)) )
@@ -1179,21 +1191,16 @@ static int ioport_access_check(
 
     /* Ensure TSS includes two bytes including byte containing first port. */
     iobmp += first_port / 8;
-    if ( tr.limit <= iobmp )
-        goto raise_exception;
+    generate_exception_if(tr.limit <= iobmp, EXC_GP, 0);
 
     if ( (rc = read_ulong(x86_seg_none, tr.base + iobmp,
                           &iobmp, 2, ctxt, ops)) )
         return rc;
-    if ( (iobmp & (((1<<bytes)-1) << (first_port&7))) != 0 )
-        goto raise_exception;
+    generate_exception_if(iobmp & (((1 << bytes) - 1) << (first_port & 7)),
+                          EXC_GP, 0);
 
  done:
     return rc;
-
- raise_exception:
-    fail_if(ops->inject_hw_exception == NULL);
-    return ops->inject_hw_exception(EXC_GP, 0, ctxt) ? : X86EMUL_EXCEPTION;
 }
 
 static bool_t
@@ -1262,7 +1269,7 @@ static bool_t vcpu_has(
 #define vcpu_has_rtm()   vcpu_has(0x00000007, EBX, 11, ctxt, ops)
 
 #define vcpu_must_have(leaf, reg, bit) \
-    generate_exception_if(!vcpu_has(leaf, reg, bit, ctxt, ops), EXC_UD, -1)
+    generate_exception_if(!vcpu_has(leaf, reg, bit, ctxt, ops), EXC_UD)
 #define vcpu_must_have_fpu()  vcpu_must_have(0x00000001, EDX, 0)
 #define vcpu_must_have_cmov() vcpu_must_have(0x00000001, EDX, 15)
 #define vcpu_must_have_mmx()  vcpu_must_have(0x00000001, EDX, 23)
@@ -1282,7 +1289,7 @@ static bool_t vcpu_has(
  * the actual operation.
  */
 #define host_and_vcpu_must_have(feat) ({ \
-    generate_exception_if(!cpu_has_##feat, EXC_UD, -1); \
+    generate_exception_if(!cpu_has_##feat, EXC_UD); \
     vcpu_must_have_##feat(); \
 })
 #else
@@ -1485,11 +1492,9 @@ protmode_load_seg(
     return X86EMUL_OKAY;
 
  raise_exn:
-    if ( ops->inject_hw_exception == NULL )
-        return X86EMUL_UNHANDLEABLE;
-    if ( (rc = ops->inject_hw_exception(fault_type, sel & 0xfffc, ctxt)) )
-        return rc;
-    return X86EMUL_EXCEPTION;
+    generate_exception(fault_type, sel & 0xfffc);
+ done:
+    return rc;
 }
 
 static int
@@ -1704,7 +1709,7 @@ static int inject_swint(enum x86_swint_type type,
     return rc;
 
  raise_exn:
-    return ops->inject_hw_exception(fault_type, error_code, ctxt);
+    generate_exception(fault_type, error_code);
 }
 
 int x86emul_unhandleable_rw(
@@ -1795,7 +1800,7 @@ x86_decode_onebyte(
 
     case 0x9a: /* call (far, absolute) */
     case 0xea: /* jmp (far, absolute) */
-        generate_exception_if(mode_64bit(), EXC_UD, -1);
+        generate_exception_if(mode_64bit(), EXC_UD);
 
         imm1 = insn_fetch_bytes(op_bytes);
         imm2 = insn_fetch_type(uint16_t);
@@ -2023,7 +2028,7 @@ x86_decode(
                 /* fall through */
             case 8:
                 /* VEX / XOP / EVEX */
-                generate_exception_if(rex_prefix || vex.pfx, EXC_UD, -1);
+                generate_exception_if(rex_prefix || vex.pfx, EXC_UD);
 
                 vex.raw[0] = modrm;
                 if ( b == 0xc5 )
@@ -2513,12 +2518,12 @@ x86_emulate(
             (ext != ext_0f ||
              (((b < 0x20) || (b > 0x23)) && /* MOV CRn/DRn */
               (b != 0xc7))),                /* CMPXCHG{8,16}B */
-            EXC_UD, -1);
+            EXC_UD);
         dst.type = OP_NONE;
         break;
 
     case DstReg:
-        generate_exception_if(lock_prefix, EXC_UD, -1);
+        generate_exception_if(lock_prefix, EXC_UD);
         dst.type = OP_REG;
         if ( d & ByteOp )
         {
@@ -2574,7 +2579,7 @@ x86_emulate(
         dst = ea;
         if ( dst.type == OP_REG )
         {
-            generate_exception_if(lock_prefix, EXC_UD, -1);
+            generate_exception_if(lock_prefix, EXC_UD);
             switch ( dst.bytes )
             {
             case 1: dst.val = *(uint8_t  *)dst.reg; break;
@@ -2591,7 +2596,7 @@ x86_emulate(
             dst.orig_val = dst.val;
         }
         else /* Lock prefix is allowed only on RMW instructions. */
-            generate_exception_if(lock_prefix, EXC_UD, -1);
+            generate_exception_if(lock_prefix, EXC_UD);
         break;
     }
 
@@ -2629,7 +2634,7 @@ x86_emulate(
         break;
 
     case 0x38 ... 0x3d: cmp: /* cmp */
-        generate_exception_if(lock_prefix, EXC_UD, -1);
+        generate_exception_if(lock_prefix, EXC_UD);
         emulate_2op_SrcV("cmp", src, dst, _regs.eflags);
         dst.type = OP_NONE;
         break;
@@ -2637,7 +2642,7 @@ x86_emulate(
     case 0x06: /* push %%es */
         src.val = x86_seg_es;
     push_seg:
-        generate_exception_if(mode_64bit() && !ext, EXC_UD, -1);
+        generate_exception_if(mode_64bit() && !ext, EXC_UD);
         fail_if(ops->read_segment == NULL);
         if ( (rc = ops->read_segment(src.val, &sreg, ctxt)) != 0 )
             goto done;
@@ -2647,7 +2652,7 @@ x86_emulate(
     case 0x07: /* pop %%es */
         src.val = x86_seg_es;
     pop_seg:
-        generate_exception_if(mode_64bit() && !ext, EXC_UD, -1);
+        generate_exception_if(mode_64bit() && !ext, EXC_UD);
         fail_if(ops->write_segment == NULL);
         /* 64-bit mode: POP defaults to a 64-bit operand. */
         if ( mode_64bit() && (op_bytes == 4) )
@@ -2684,7 +2689,7 @@ x86_emulate(
         uint8_t al = _regs.eax;
         unsigned long eflags = _regs.eflags;
 
-        generate_exception_if(mode_64bit(), EXC_UD, -1);
+        generate_exception_if(mode_64bit(), EXC_UD);
         _regs.eflags &= ~(EFLG_CF|EFLG_AF|EFLG_SF|EFLG_ZF|EFLG_PF);
         if ( ((al & 0x0f) > 9) || (eflags & EFLG_AF) )
         {
@@ -2706,7 +2711,7 @@ x86_emulate(
 
     case 0x37: /* aaa */
     case 0x3f: /* aas */
-        generate_exception_if(mode_64bit(), EXC_UD, -1);
+        generate_exception_if(mode_64bit(), EXC_UD);
         _regs.eflags &= ~EFLG_CF;
         if ( ((uint8_t)_regs.eax > 9) || (_regs.eflags & EFLG_AF) )
         {
@@ -2750,7 +2755,7 @@ x86_emulate(
         unsigned long regs[] = {
             _regs.eax, _regs.ecx, _regs.edx, _regs.ebx,
             _regs.esp, _regs.ebp, _regs.esi, _regs.edi };
-        generate_exception_if(mode_64bit(), EXC_UD, -1);
+        generate_exception_if(mode_64bit(), EXC_UD);
         for ( i = 0; i < 8; i++ )
             if ( (rc = ops->write(x86_seg_ss, sp_pre_dec(op_bytes),
                                   &regs[i], op_bytes, ctxt)) != 0 )
@@ -2765,7 +2770,7 @@ x86_emulate(
             (unsigned long *)&_regs.ebp, (unsigned long *)&dummy_esp,
             (unsigned long *)&_regs.ebx, (unsigned long *)&_regs.edx,
             (unsigned long *)&_regs.ecx, (unsigned long *)&_regs.eax };
-        generate_exception_if(mode_64bit(), EXC_UD, -1);
+        generate_exception_if(mode_64bit(), EXC_UD);
         for ( i = 0; i < 8; i++ )
         {
             if ( (rc = read_ulong(x86_seg_ss, sp_post_inc(op_bytes),
@@ -2783,14 +2788,14 @@ x86_emulate(
         unsigned long src_val2;
         int lb, ub, idx;
         generate_exception_if(mode_64bit() || (src.type != OP_MEM),
-                              EXC_UD, -1);
+                              EXC_UD);
         if ( (rc = read_ulong(src.mem.seg, src.mem.off + op_bytes,
                               &src_val2, op_bytes, ctxt, ops)) )
             goto done;
         ub  = (op_bytes == 2) ? (int16_t)src_val2 : (int32_t)src_val2;
         lb  = (op_bytes == 2) ? (int16_t)src.val  : (int32_t)src.val;
         idx = (op_bytes == 2) ? (int16_t)dst.val  : (int32_t)dst.val;
-        generate_exception_if((idx < lb) || (idx > ub), EXC_BR, -1);
+        generate_exception_if((idx < lb) || (idx > ub), EXC_BR);
         dst.type = OP_NONE;
         break;
     }
@@ -2828,7 +2833,7 @@ x86_emulate(
                 _regs.eflags &= ~EFLG_ZF;
                 dst.type = OP_NONE;
             }
-            generate_exception_if(!in_protmode(ctxt, ops), EXC_UD, -1);
+            generate_exception_if(!in_protmode(ctxt, ops), EXC_UD);
         }
         break;
 
@@ -2919,7 +2924,7 @@ x86_emulate(
         break;
 
     case 0x82: /* Grp1 (x86/32 only) */
-        generate_exception_if(mode_64bit(), EXC_UD, -1);
+        generate_exception_if(mode_64bit(), EXC_UD);
     case 0x80: case 0x81: case 0x83: /* Grp1 */
         switch ( modrm_reg & 7 )
         {
@@ -2970,7 +2975,7 @@ x86_emulate(
             dst.type = OP_NONE;
             break;
         }
-        generate_exception_if((modrm_reg & 7) != 0, EXC_UD, -1);
+        generate_exception_if((modrm_reg & 7) != 0, EXC_UD);
     case 0x88 ... 0x8b: /* mov */
     case 0xa0 ... 0xa1: /* mov mem.offs,{%al,%ax,%eax,%rax} */
     case 0xa2 ... 0xa3: /* mov {%al,%ax,%eax,%rax},mem.offs */
@@ -2979,7 +2984,7 @@ x86_emulate(
 
     case 0x8c: /* mov Sreg,r/m */
         seg = modrm_reg & 7; /* REX.R is ignored. */
-        generate_exception_if(!is_x86_user_segment(seg), EXC_UD, -1);
+        generate_exception_if(!is_x86_user_segment(seg), EXC_UD);
     store_selector:
         fail_if(ops->read_segment == NULL);
         if ( (rc = ops->read_segment(seg, &sreg, ctxt)) != 0 )
@@ -2992,7 +2997,7 @@ x86_emulate(
     case 0x8e: /* mov r/m,Sreg */
         seg = modrm_reg & 7; /* REX.R is ignored. */
         generate_exception_if(!is_x86_user_segment(seg) ||
-                              seg == x86_seg_cs, EXC_UD, -1);
+                              seg == x86_seg_cs, EXC_UD);
         if ( (rc = load_seg(seg, src.val, 0, NULL, ctxt, ops)) != 0 )
             goto done;
         if ( seg == x86_seg_ss )
@@ -3001,12 +3006,12 @@ x86_emulate(
         break;
 
     case 0x8d: /* lea */
-        generate_exception_if(ea.type != OP_MEM, EXC_UD, -1);
+        generate_exception_if(ea.type != OP_MEM, EXC_UD);
         dst.val = ea.mem.off;
         break;
 
     case 0x8f: /* pop (sole member of Grp1a) */
-        generate_exception_if((modrm_reg & 7) != 0, EXC_UD, -1);
+        generate_exception_if((modrm_reg & 7) != 0, EXC_UD);
         /* 64-bit mode: POP defaults to a 64-bit operand. */
         if ( mode_64bit() && (dst.bytes == 4) )
             dst.bytes = 8;
@@ -3283,8 +3288,8 @@ x86_emulate(
         unsigned long sel;
         dst.val = x86_seg_es;
     les: /* dst.val identifies the segment */
-        generate_exception_if(mode_64bit() && !ext, EXC_UD, -1);
-        generate_exception_if(src.type != OP_MEM, EXC_UD, -1);
+        generate_exception_if(mode_64bit() && !ext, EXC_UD);
+        generate_exception_if(src.type != OP_MEM, EXC_UD);
         if ( (rc = read_ulong(src.mem.seg, src.mem.off + src.bytes,
                               &sel, 2, ctxt, ops)) != 0 )
             goto done;
@@ -3374,7 +3379,7 @@ x86_emulate(
         goto done;
 
     case 0xce: /* into */
-        generate_exception_if(mode_64bit(), EXC_UD, -1);
+        generate_exception_if(mode_64bit(), EXC_UD);
         if ( !(_regs.eflags & EFLG_OF) )
             break;
         src.val = EXC_OF;
@@ -3416,7 +3421,7 @@ x86_emulate(
     case 0xd5: /* aad */ {
         unsigned int base = (uint8_t)src.val;
 
-        generate_exception_if(mode_64bit(), EXC_UD, -1);
+        generate_exception_if(mode_64bit(), EXC_UD);
         if ( b & 0x01 )
         {
             uint16_t ax = _regs.eax;
@@ -3427,7 +3432,7 @@ x86_emulate(
         {
             uint8_t al = _regs.eax;
 
-            generate_exception_if(!base, EXC_DE, -1);
+            generate_exception_if(!base, EXC_DE);
             *(uint16_t *)&_regs.eax = ((al / base) << 8) | (al % base);
         }
         _regs.eflags &= ~(EFLG_SF|EFLG_ZF|EFLG_PF);
@@ -3438,7 +3443,7 @@ x86_emulate(
     }
 
     case 0xd6: /* salc */
-        generate_exception_if(mode_64bit(), EXC_UD, -1);
+        generate_exception_if(mode_64bit(), EXC_UD);
         *(uint8_t *)&_regs.eax = (_regs.eflags & EFLG_CF) ? 0xff : 0x00;
         break;
 
@@ -4046,7 +4051,7 @@ x86_emulate(
             unsigned long u[2], v;
 
         case 0 ... 1: /* test */
-            generate_exception_if(lock_prefix, EXC_UD, -1);
+            generate_exception_if(lock_prefix, EXC_UD);
             goto test;
         case 2: /* not */
             dst.val = ~dst.val;
@@ -4144,7 +4149,7 @@ x86_emulate(
                 v    = (uint8_t)src.val;
                 generate_exception_if(
                     div_dbl(u, v) || ((uint8_t)u[0] != (uint16_t)u[0]),
-                    EXC_DE, -1);
+                    EXC_DE);
                 dst.val = (uint8_t)u[0];
                 ((uint8_t *)&_regs.eax)[1] = u[1];
                 break;
@@ -4154,7 +4159,7 @@ x86_emulate(
                 v    = (uint16_t)src.val;
                 generate_exception_if(
                     div_dbl(u, v) || ((uint16_t)u[0] != (uint32_t)u[0]),
-                    EXC_DE, -1);
+                    EXC_DE);
                 dst.val = (uint16_t)u[0];
                 *(uint16_t *)&_regs.edx = u[1];
                 break;
@@ -4165,7 +4170,7 @@ x86_emulate(
                 v    = (uint32_t)src.val;
                 generate_exception_if(
                     div_dbl(u, v) || ((uint32_t)u[0] != u[0]),
-                    EXC_DE, -1);
+                    EXC_DE);
                 dst.val   = (uint32_t)u[0];
                 _regs.edx = (uint32_t)u[1];
                 break;
@@ -4174,7 +4179,7 @@ x86_emulate(
                 u[0] = _regs.eax;
                 u[1] = _regs.edx;
                 v    = src.val;
-                generate_exception_if(div_dbl(u, v), EXC_DE, -1);
+                generate_exception_if(div_dbl(u, v), EXC_DE);
                 dst.val   = u[0];
                 _regs.edx = u[1];
                 break;
@@ -4190,7 +4195,7 @@ x86_emulate(
                 v    = (int8_t)src.val;
                 generate_exception_if(
                     idiv_dbl(u, v) || ((int8_t)u[0] != (int16_t)u[0]),
-                    EXC_DE, -1);
+                    EXC_DE);
                 dst.val = (int8_t)u[0];
                 ((int8_t *)&_regs.eax)[1] = u[1];
                 break;
@@ -4200,7 +4205,7 @@ x86_emulate(
                 v    = (int16_t)src.val;
                 generate_exception_if(
                     idiv_dbl(u, v) || ((int16_t)u[0] != (int32_t)u[0]),
-                    EXC_DE, -1);
+                    EXC_DE);
                 dst.val = (int16_t)u[0];
                 *(int16_t *)&_regs.edx = u[1];
                 break;
@@ -4211,7 +4216,7 @@ x86_emulate(
                 v    = (int32_t)src.val;
                 generate_exception_if(
                     idiv_dbl(u, v) || ((int32_t)u[0] != u[0]),
-                    EXC_DE, -1);
+                    EXC_DE);
                 dst.val   = (int32_t)u[0];
                 _regs.edx = (uint32_t)u[1];
                 break;
@@ -4220,7 +4225,7 @@ x86_emulate(
                 u[0] = _regs.eax;
                 u[1] = _regs.edx;
                 v    = src.val;
-                generate_exception_if(idiv_dbl(u, v), EXC_DE, -1);
+                generate_exception_if(idiv_dbl(u, v), EXC_DE);
                 dst.val   = u[0];
                 _regs.edx = u[1];
                 break;
@@ -4260,7 +4265,7 @@ x86_emulate(
         break;
 
     case 0xfe: /* Grp4 */
-        generate_exception_if((modrm_reg & 7) >= 2, EXC_UD, -1);
+        generate_exception_if((modrm_reg & 7) >= 2, EXC_UD);
     case 0xff: /* Grp5 */
         switch ( modrm_reg & 7 )
         {
@@ -4285,7 +4290,7 @@ x86_emulate(
             break;
         case 3: /* call (far, absolute indirect) */
         case 5: /* jmp (far, absolute indirect) */
-            generate_exception_if(src.type != OP_MEM, EXC_UD, -1);
+            generate_exception_if(src.type != OP_MEM, EXC_UD);
 
             if ( (rc = read_ulong(src.mem.seg, src.mem.off + op_bytes,
                                   &imm2, 2, ctxt, ops)) )
@@ -4297,13 +4302,13 @@ x86_emulate(
         case 6: /* push */
             goto push;
         case 7:
-            generate_exception_if(1, EXC_UD, -1);
+            generate_exception(EXC_UD);
         }
         break;
 
     case X86EMUL_OPC(0x0f, 0x00): /* Grp6 */
         seg = (modrm_reg & 1) ? x86_seg_tr : x86_seg_ldtr;
-        generate_exception_if(!in_protmode(ctxt, ops), EXC_UD, -1);
+        generate_exception_if(!in_protmode(ctxt, ops), EXC_UD);
         switch ( modrm_reg & 6 )
         {
         case 0: /* sldt / str */
@@ -4315,7 +4320,7 @@ x86_emulate(
                 goto done;
             break;
         default:
-            generate_exception_if(true, EXC_UD, -1);
+            generate_exception_if(true, EXC_UD);
             break;
         }
         break;
@@ -4330,10 +4335,10 @@ x86_emulate(
         {
             unsigned long cr4;
 
-            generate_exception_if(vex.pfx, EXC_UD, -1);
+            generate_exception_if(vex.pfx, EXC_UD);
             if ( !ops->read_cr || ops->read_cr(4, &cr4, ctxt) != X86EMUL_OKAY )
                 cr4 = 0;
-            generate_exception_if(!(cr4 & X86_CR4_OSXSAVE), EXC_UD, -1);
+            generate_exception_if(!(cr4 & X86_CR4_OSXSAVE), EXC_UD);
             generate_exception_if(!mode_ring0() ||
                                   handle_xsetbv(_regs._ecx,
                                                 _regs._eax | (_regs.rdx << 32)),
@@ -4344,28 +4349,28 @@ x86_emulate(
 
         case 0xd4: /* vmfunc */
             generate_exception_if(lock_prefix | rep_prefix() | (vex.pfx == vex_66),
-                                  EXC_UD, -1);
+                                  EXC_UD);
             fail_if(!ops->vmfunc);
             if ( (rc = ops->vmfunc(ctxt) != X86EMUL_OKAY) )
                 goto done;
             goto no_writeback;
 
         case 0xd5: /* xend */
-            generate_exception_if(vex.pfx, EXC_UD, -1);
-            generate_exception_if(!vcpu_has_rtm(), EXC_UD, -1);
+            generate_exception_if(vex.pfx, EXC_UD);
+            generate_exception_if(!vcpu_has_rtm(), EXC_UD);
             generate_exception_if(vcpu_has_rtm(), EXC_GP, 0);
             break;
 
         case 0xd6: /* xtest */
-            generate_exception_if(vex.pfx, EXC_UD, -1);
+            generate_exception_if(vex.pfx, EXC_UD);
             generate_exception_if(!vcpu_has_rtm() && !vcpu_has_hle(),
-                                  EXC_UD, -1);
+                                  EXC_UD);
             /* Neither HLE nor RTM can be active when we get here. */
             _regs.eflags |= EFLG_ZF;
             goto no_writeback;
 
         case 0xdf: /* invlpga */
-            generate_exception_if(!in_protmode(ctxt, ops), EXC_UD, -1);
+            generate_exception_if(!in_protmode(ctxt, ops), EXC_UD);
             generate_exception_if(!mode_ring0(), EXC_GP, 0);
             fail_if(ops->invlpg == NULL);
             if ( (rc = ops->invlpg(x86_seg_none, truncate_ea(_regs.eax),
@@ -4395,7 +4400,7 @@ x86_emulate(
                  ops->cpuid(&eax, &ebx, &dummy, &dummy, ctxt) == X86EMUL_OKAY )
                 limit = ((ebx >> 8) & 0xff) * 8;
             generate_exception_if(limit < sizeof(long) ||
-                                  (limit & (limit - 1)), EXC_UD, -1);
+                                  (limit & (limit - 1)), EXC_UD);
             base &= ~(limit - 1);
             if ( override_seg == -1 )
                 override_seg = x86_seg_ds;
@@ -4431,7 +4436,7 @@ x86_emulate(
         {
         case 0: /* sgdt */
         case 1: /* sidt */
-            generate_exception_if(ea.type != OP_MEM, EXC_UD, -1);
+            generate_exception_if(ea.type != OP_MEM, EXC_UD);
             generate_exception_if(umip_active(ctxt, ops), EXC_GP, 0);
             fail_if(ops->read_segment == NULL);
             if ( (rc = ops->read_segment(seg, &sreg, ctxt)) )
@@ -4452,7 +4457,7 @@ x86_emulate(
         case 2: /* lgdt */
         case 3: /* lidt */
             generate_exception_if(!mode_ring0(), EXC_GP, 0);
-            generate_exception_if(ea.type != OP_MEM, EXC_UD, -1);
+            generate_exception_if(ea.type != OP_MEM, EXC_UD);
             fail_if(ops->write_segment == NULL);
             memset(&sreg, 0, sizeof(sreg));
             if ( (rc = read_ulong(ea.mem.seg, ea.mem.off+0,
@@ -4495,7 +4500,7 @@ x86_emulate(
             break;
         case 7: /* invlpg */
             generate_exception_if(!mode_ring0(), EXC_GP, 0);
-            generate_exception_if(ea.type != OP_MEM, EXC_UD, -1);
+            generate_exception_if(ea.type != OP_MEM, EXC_UD);
             fail_if(ops->invlpg == NULL);
             if ( (rc = ops->invlpg(ea.mem.seg, ea.mem.off, ctxt)) )
                 goto done;
@@ -4509,13 +4514,13 @@ x86_emulate(
     case X86EMUL_OPC(0x0f, 0x05): /* syscall */ {
         uint64_t msr_content;
 
-        generate_exception_if(!in_protmode(ctxt, ops), EXC_UD, -1);
+        generate_exception_if(!in_protmode(ctxt, ops), EXC_UD);
 
         /* Inject #UD if syscall/sysret are disabled. */
         fail_if(ops->read_msr == NULL);
         if ( (rc = ops->read_msr(MSR_EFER, &msr_content, ctxt)) != 0 )
             goto done;
-        generate_exception_if((msr_content & EFER_SCE) == 0, EXC_UD, -1);
+        generate_exception_if((msr_content & EFER_SCE) == 0, EXC_UD);
 
         if ( (rc = ops->read_msr(MSR_STAR, &msr_content, ctxt)) != 0 )
             goto done;
@@ -4584,7 +4589,7 @@ x86_emulate(
     case X86EMUL_OPC(0x0f, 0x0b): /* ud2 */
     case X86EMUL_OPC(0x0f, 0xb9): /* ud1 */
     case X86EMUL_OPC(0x0f, 0xff): /* ud0 */
-        generate_exception_if(1, EXC_UD, -1);
+        generate_exception(EXC_UD);
 
     case X86EMUL_OPC(0x0f, 0x0d): /* GrpP (prefetch) */
     case X86EMUL_OPC(0x0f, 0x18): /* Grp16 (prefetch/nop) */
@@ -4703,7 +4708,7 @@ x86_emulate(
     case X86EMUL_OPC(0x0f, 0x21): /* mov dr,reg */
     case X86EMUL_OPC(0x0f, 0x22): /* mov reg,cr */
     case X86EMUL_OPC(0x0f, 0x23): /* mov reg,dr */
-        generate_exception_if(ea.type != OP_REG, EXC_UD, -1);
+        generate_exception_if(ea.type != OP_REG, EXC_UD);
         generate_exception_if(!mode_ring0(), EXC_GP, 0);
         modrm_reg |= lock_prefix << 3;
         if ( b & 2 )
@@ -4939,11 +4944,11 @@ x86_emulate(
         switch ( b )
         {
         case 0x7e:
-            generate_exception_if(vex.l, EXC_UD, -1);
+            generate_exception_if(vex.l, EXC_UD);
             ea.bytes = op_bytes;
             break;
         case 0xd6:
-            generate_exception_if(vex.l, EXC_UD, -1);
+            generate_exception_if(vex.l, EXC_UD);
             ea.bytes = 8;
             break;
         }
@@ -5033,7 +5038,7 @@ x86_emulate(
     case X86EMUL_OPC(0x0f, 0xad): /* shrd %%cl,r,r/m */ {
         uint8_t shift, width = dst.bytes << 3;
 
-        generate_exception_if(lock_prefix, EXC_UD, -1);
+        generate_exception_if(lock_prefix, EXC_UD);
         if ( b & 1 )
             shift = _regs.ecx;
         else
@@ -5148,7 +5153,7 @@ x86_emulate(
         case 5: goto bts;
         case 6: goto btr;
         case 7: goto btc;
-        default: generate_exception_if(1, EXC_UD, -1);
+        default: generate_exception(EXC_UD);
         }
         break;
 
@@ -5249,15 +5254,15 @@ x86_emulate(
     case X86EMUL_OPC(0x0f, 0xc3): /* movnti */
         /* Ignore the non-temporal hint for now. */
         vcpu_must_have_sse2();
-        generate_exception_if(dst.bytes <= 2, EXC_UD, -1);
+        generate_exception_if(dst.bytes <= 2, EXC_UD);
         dst.val = src.val;
         break;
 
     case X86EMUL_OPC(0x0f, 0xc7): /* Grp9 (cmpxchg8b/cmpxchg16b) */ {
         unsigned long old[2], exp[2], new[2];
 
-        generate_exception_if((modrm_reg & 7) != 1, EXC_UD, -1);
-        generate_exception_if(ea.type != OP_MEM, EXC_UD, -1);
+        generate_exception_if((modrm_reg & 7) != 1, EXC_UD);
+        generate_exception_if(ea.type != OP_MEM, EXC_UD);
         if ( op_bytes == 8 )
             host_and_vcpu_must_have(cx16);
         op_bytes *= 2;
@@ -5414,8 +5419,7 @@ x86_emulate(
     *ctxt->regs = _regs;
 
     /* Inject #DB if single-step tracing was enabled at instruction start. */
-    if ( tf && (rc == X86EMUL_OKAY) && ops->inject_hw_exception )
-        rc = ops->inject_hw_exception(EXC_DB, -1, ctxt) ? : X86EMUL_EXCEPTION;
+    generate_exception_if(tf && (rc == X86EMUL_OKAY), EXC_DB);
 
  done:
     _put_fpu();
-- 
2.1.4



* [PATCH v2 08/19] x86/emul: Rework emulator event injection
  2016-11-28 11:13 [PATCH for-4.9 v2 00/19] XSA-191 followup Andrew Cooper
                   ` (6 preceding siblings ...)
  2016-11-28 11:13 ` [PATCH v2 07/19] x86/emul: Remove opencoded exception generation Andrew Cooper
@ 2016-11-28 11:13 ` Andrew Cooper
  2016-11-28 12:04   ` Tim Deegan
  2016-11-28 11:13 ` [PATCH v2 09/19] x86/vmx: Use hvm_{get, set}_segment_register() rather than vmx_{get, set}_segment_register() Andrew Cooper
                   ` (10 subsequent siblings)
  18 siblings, 1 reply; 57+ messages in thread
From: Andrew Cooper @ 2016-11-28 11:13 UTC (permalink / raw)
  To: Xen-devel
  Cc: Jan Beulich, George Dunlap, Andrew Cooper, Tim Deegan,
	Paul Durrant, Jun Nakajima, Suravee Suthikulpanit

The emulator needs to gain an understanding of interrupts and exceptions
generated by its actions.

Move hvm_emulate_ctxt.{exn_pending,trap} into struct x86_emulate_ctxt so they
are visible to the emulator.  This removes the need for the
inject_{hw_exception,sw_interrupt}() hooks, which are dropped and replaced
with x86_emul_{hw_exception,software_event,reset_event}() instead.

For exceptions raised by x86_emulate() itself (rather than its callbacks), the
shadow pagetable and PV uses of x86_emulate() previously failed with
X86EMUL_UNHANDLEABLE due to the lack of inject_*() hooks.

This behaviour has changed, and such cases will now return X86EMUL_EXCEPTION
with event_pending set.  Until the callers of x86_emulate() have been updated
to inject events back into the guest, divert the event_pending case back into
the X86EMUL_UNHANDLEABLE path to maintain the same guest-visible behaviour.

No overall functional change.
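
As a hedged sketch of the eventual calling pattern (hypothetical caller;
field and function names as introduced elsewhere in this series), once
callers re-inject events themselves instead of relying on the temporary
X86EMUL_UNHANDLEABLE diversion:

    rc = x86_emulate(&ctxt, ops);

    if ( rc == X86EMUL_EXCEPTION && ctxt.event_pending )
    {
        /* Hand the pending event (vector, type, error code, %cr2) back
         * to the guest; a PV caller would use pv_inject_event() instead. */
        hvm_inject_event(&ctxt.event);
        rc = X86EMUL_OKAY;
    }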

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Paul Durrant <paul.durrant@citrix.com>
CC: Tim Deegan <tim@xen.org>
CC: George Dunlap <george.dunlap@eu.citrix.com>
CC: Jun Nakajima <jun.nakajima@intel.com>
CC: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>

v2:
 * Change x86_emul_hw_exception()'s error_code parameter to being signed
 * Clarify how software interrupt injection happens.
 * More ASSERT()'s and description of how event_pending works without the
   inject_sw_interrupt() hook
---
 tools/tests/x86_emulator/test_x86_emulator.c |  1 +
 xen/arch/x86/hvm/emulate.c                   | 94 +++++++--------------------
 xen/arch/x86/hvm/hvm.c                       |  4 +-
 xen/arch/x86/hvm/io.c                        |  4 +-
 xen/arch/x86/hvm/vmx/realmode.c              | 16 ++---
 xen/arch/x86/mm.c                            | 31 ++++++++-
 xen/arch/x86/mm/shadow/multi.c               | 28 +++++++-
 xen/arch/x86/x86_emulate/x86_emulate.c       | 12 ++--
 xen/arch/x86/x86_emulate/x86_emulate.h       | 95 +++++++++++++++++++++++-----
 xen/include/asm-x86/hvm/emulate.h            |  3 -
 10 files changed, 175 insertions(+), 113 deletions(-)

diff --git a/tools/tests/x86_emulator/test_x86_emulator.c b/tools/tests/x86_emulator/test_x86_emulator.c
index f255fef..b54fd11 100644
--- a/tools/tests/x86_emulator/test_x86_emulator.c
+++ b/tools/tests/x86_emulator/test_x86_emulator.c
@@ -1,3 +1,4 @@
+#include <assert.h>
 #include <errno.h>
 #include <limits.h>
 #include <stdbool.h>
diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index bc259ec..7745c5b 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -568,12 +568,9 @@ static int hvmemul_virtual_to_linear(
         return X86EMUL_UNHANDLEABLE;
 
     /* This is a singleton operation: fail it with an exception. */
-    hvmemul_ctxt->exn_pending = 1;
-    hvmemul_ctxt->trap.vector =
-        (seg == x86_seg_ss) ? TRAP_stack_error : TRAP_gp_fault;
-    hvmemul_ctxt->trap.type = X86_EVENTTYPE_HW_EXCEPTION;
-    hvmemul_ctxt->trap.error_code = 0;
-    hvmemul_ctxt->trap.insn_len = 0;
+    x86_emul_hw_exception((seg == x86_seg_ss)
+                          ? TRAP_stack_error
+                          : TRAP_gp_fault, 0, &hvmemul_ctxt->ctxt);
     return X86EMUL_EXCEPTION;
 }
 
@@ -1562,59 +1559,6 @@ int hvmemul_cpuid(
     return X86EMUL_OKAY;
 }
 
-static int hvmemul_inject_hw_exception(
-    uint8_t vector,
-    int32_t error_code,
-    struct x86_emulate_ctxt *ctxt)
-{
-    struct hvm_emulate_ctxt *hvmemul_ctxt =
-        container_of(ctxt, struct hvm_emulate_ctxt, ctxt);
-
-    hvmemul_ctxt->exn_pending = 1;
-    hvmemul_ctxt->trap.vector = vector;
-    hvmemul_ctxt->trap.type = X86_EVENTTYPE_HW_EXCEPTION;
-    hvmemul_ctxt->trap.error_code = error_code;
-    hvmemul_ctxt->trap.insn_len = 0;
-
-    return X86EMUL_OKAY;
-}
-
-static int hvmemul_inject_sw_interrupt(
-    enum x86_swint_type type,
-    uint8_t vector,
-    uint8_t insn_len,
-    struct x86_emulate_ctxt *ctxt)
-{
-    struct hvm_emulate_ctxt *hvmemul_ctxt =
-        container_of(ctxt, struct hvm_emulate_ctxt, ctxt);
-
-    switch ( type )
-    {
-    case x86_swint_icebp:
-        hvmemul_ctxt->trap.type = X86_EVENTTYPE_PRI_SW_EXCEPTION;
-        break;
-
-    case x86_swint_int3:
-    case x86_swint_into:
-        hvmemul_ctxt->trap.type = X86_EVENTTYPE_SW_EXCEPTION;
-        break;
-
-    case x86_swint_int:
-        hvmemul_ctxt->trap.type = X86_EVENTTYPE_SW_INTERRUPT;
-        break;
-
-    default:
-        return X86EMUL_UNHANDLEABLE;
-    }
-
-    hvmemul_ctxt->exn_pending = 1;
-    hvmemul_ctxt->trap.vector = vector;
-    hvmemul_ctxt->trap.error_code = X86_EVENT_NO_EC;
-    hvmemul_ctxt->trap.insn_len = insn_len;
-
-    return X86EMUL_OKAY;
-}
-
 static int hvmemul_get_fpu(
     void (*exception_callback)(void *, struct cpu_user_regs *),
     void *exception_callback_arg,
@@ -1678,8 +1622,7 @@ static int hvmemul_invlpg(
          * hvmemul_virtual_to_linear() raises exceptions for type/limit
          * violations, so squash them.
          */
-        hvmemul_ctxt->exn_pending = 0;
-        hvmemul_ctxt->trap = (struct x86_event){};
+        x86_emul_reset_event(ctxt);
         rc = X86EMUL_OKAY;
     }
 
@@ -1696,7 +1639,7 @@ static int hvmemul_vmfunc(
 
     rc = hvm_funcs.altp2m_vcpu_emulate_vmfunc(ctxt->regs);
     if ( rc != X86EMUL_OKAY )
-        hvmemul_inject_hw_exception(TRAP_invalid_op, X86_EVENT_NO_EC, ctxt);
+        x86_emul_hw_exception(TRAP_invalid_op, X86_EVENT_NO_EC, ctxt);
 
     return rc;
 }
@@ -1720,8 +1663,6 @@ static const struct x86_emulate_ops hvm_emulate_ops = {
     .write_msr     = hvmemul_write_msr,
     .wbinvd        = hvmemul_wbinvd,
     .cpuid         = hvmemul_cpuid,
-    .inject_hw_exception = hvmemul_inject_hw_exception,
-    .inject_sw_interrupt = hvmemul_inject_sw_interrupt,
     .get_fpu       = hvmemul_get_fpu,
     .put_fpu       = hvmemul_put_fpu,
     .invlpg        = hvmemul_invlpg,
@@ -1747,8 +1688,6 @@ static const struct x86_emulate_ops hvm_emulate_ops_no_write = {
     .write_msr     = hvmemul_write_msr_discard,
     .wbinvd        = hvmemul_wbinvd_discard,
     .cpuid         = hvmemul_cpuid,
-    .inject_hw_exception = hvmemul_inject_hw_exception,
-    .inject_sw_interrupt = hvmemul_inject_sw_interrupt,
     .get_fpu       = hvmemul_get_fpu,
     .put_fpu       = hvmemul_put_fpu,
     .invlpg        = hvmemul_invlpg,
@@ -1771,6 +1710,19 @@ static int _hvm_emulate_one(struct hvm_emulate_ctxt *hvmemul_ctxt,
 
     rc = x86_emulate(&hvmemul_ctxt->ctxt, ops);
 
+    /*
+     * TODO: Make this true:
+     *
+    ASSERT(hvmemul_ctxt->ctxt.event_pending == (rc == X86EMUL_EXCEPTION));
+     *
+     * Some codepaths still raise exceptions behind the back of the
+     * emulator. (i.e. return X86EMUL_EXCEPTION but without
+     * event_pending being set).  In the meantime, use a slightly
+     * relaxed check...
+     */
+    if ( hvmemul_ctxt->ctxt.event_pending )
+        ASSERT(rc == X86EMUL_EXCEPTION);
+
     if ( rc == X86EMUL_OKAY && vio->mmio_retry )
         rc = X86EMUL_RETRY;
     if ( rc != X86EMUL_RETRY )
@@ -1867,8 +1819,8 @@ int hvm_emulate_one_mmio(unsigned long mfn, unsigned long gla)
         hvm_dump_emulation_state(XENLOG_G_WARNING "MMCFG", &ctxt);
         break;
     case X86EMUL_EXCEPTION:
-        if ( ctxt.exn_pending )
-            hvm_inject_event(&ctxt.trap);
+        if ( ctxt.ctxt.event_pending )
+            hvm_inject_event(&ctxt.ctxt.event);
         /* fallthrough */
     default:
         hvm_emulate_writeback(&ctxt);
@@ -1927,8 +1879,8 @@ void hvm_emulate_one_vm_event(enum emul_kind kind, unsigned int trapnr,
         hvm_inject_hw_exception(trapnr, errcode);
         break;
     case X86EMUL_EXCEPTION:
-        if ( ctx.exn_pending )
-            hvm_inject_event(&ctx.trap);
+        if ( ctx.ctxt.event_pending )
+            hvm_inject_event(&ctx.ctxt.event);
         break;
     }
 
@@ -2003,8 +1955,6 @@ void hvm_emulate_init_per_insn(
         hvmemul_ctxt->insn_buf_bytes = insn_bytes;
         memcpy(hvmemul_ctxt->insn_buf, insn_buf, insn_bytes);
     }
-
-    hvmemul_ctxt->exn_pending = 0;
 }
 
 void hvm_emulate_writeback(
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index b950842..ef83100 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -4076,8 +4076,8 @@ void hvm_ud_intercept(struct cpu_user_regs *regs)
         hvm_inject_hw_exception(TRAP_invalid_op, X86_EVENT_NO_EC);
         break;
     case X86EMUL_EXCEPTION:
-        if ( ctxt.exn_pending )
-            hvm_inject_event(&ctxt.trap);
+        if ( ctxt.ctxt.event_pending )
+            hvm_inject_event(&ctxt.ctxt.event);
         /* fall through */
     default:
         hvm_emulate_writeback(&ctxt);
diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
index 1279f68..abb9d51 100644
--- a/xen/arch/x86/hvm/io.c
+++ b/xen/arch/x86/hvm/io.c
@@ -102,8 +102,8 @@ int handle_mmio(void)
         hvm_dump_emulation_state(XENLOG_G_WARNING "MMIO", &ctxt);
         return 0;
     case X86EMUL_EXCEPTION:
-        if ( ctxt.exn_pending )
-            hvm_inject_event(&ctxt.trap);
+        if ( ctxt.ctxt.event_pending )
+            hvm_inject_event(&ctxt.ctxt.event);
         break;
     default:
         break;
diff --git a/xen/arch/x86/hvm/vmx/realmode.c b/xen/arch/x86/hvm/vmx/realmode.c
index 9002638..dc3ab44 100644
--- a/xen/arch/x86/hvm/vmx/realmode.c
+++ b/xen/arch/x86/hvm/vmx/realmode.c
@@ -122,7 +122,7 @@ void vmx_realmode_emulate_one(struct hvm_emulate_ctxt *hvmemul_ctxt)
 
     if ( rc == X86EMUL_EXCEPTION )
     {
-        if ( !hvmemul_ctxt->exn_pending )
+        if ( !hvmemul_ctxt->ctxt.event_pending )
         {
             unsigned long intr_info;
 
@@ -133,27 +133,27 @@ void vmx_realmode_emulate_one(struct hvm_emulate_ctxt *hvmemul_ctxt)
                 gdprintk(XENLOG_ERR, "Exception pending but no info.\n");
                 goto fail;
             }
-            hvmemul_ctxt->trap.vector = (uint8_t)intr_info;
-            hvmemul_ctxt->trap.insn_len = 0;
+            hvmemul_ctxt->ctxt.event.vector = (uint8_t)intr_info;
+            hvmemul_ctxt->ctxt.event.insn_len = 0;
         }
 
         if ( unlikely(curr->domain->debugger_attached) &&
-             ((hvmemul_ctxt->trap.vector == TRAP_debug) ||
-              (hvmemul_ctxt->trap.vector == TRAP_int3)) )
+             ((hvmemul_ctxt->ctxt.event.vector == TRAP_debug) ||
+              (hvmemul_ctxt->ctxt.event.vector == TRAP_int3)) )
         {
             domain_pause_for_debugger();
         }
         else if ( curr->arch.hvm_vcpu.guest_cr[0] & X86_CR0_PE )
         {
             gdprintk(XENLOG_ERR, "Exception %02x in protected mode.\n",
-                     hvmemul_ctxt->trap.vector);
+                     hvmemul_ctxt->ctxt.event.vector);
             goto fail;
         }
         else
         {
             realmode_deliver_exception(
-                hvmemul_ctxt->trap.vector,
-                hvmemul_ctxt->trap.insn_len,
+                hvmemul_ctxt->ctxt.event.vector,
+                hvmemul_ctxt->ctxt.event.insn_len,
                 hvmemul_ctxt);
         }
     }
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index b7c7122..8a1e7b4 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -5379,7 +5379,20 @@ int ptwr_do_page_fault(struct vcpu *v, unsigned long addr,
     page_unlock(page);
     put_page(page);
 
-    if ( rc == X86EMUL_UNHANDLEABLE )
+    /*
+     * TODO: Make this true:
+     *
+    ASSERT(ptwr_ctxt.ctxt.event_pending == (rc == X86EMUL_EXCEPTION));
+     *
+     * Some codepaths still raise exceptions behind the back of the
+     * emulator. (i.e. return X86EMUL_EXCEPTION but without
+     * event_pending being set).  In the meantime, use a slightly
+     * relaxed check...
+     */
+    if ( ptwr_ctxt.ctxt.event_pending )
+        ASSERT(rc == X86EMUL_EXCEPTION);
+
+    if ( rc == X86EMUL_UNHANDLEABLE || ptwr_ctxt.ctxt.event_pending )
         goto bail;
 
     perfc_incr(ptwr_emulations);
@@ -5503,7 +5516,21 @@ int mmio_ro_do_page_fault(struct vcpu *v, unsigned long addr,
     else
         rc = x86_emulate(&ctxt, &mmio_ro_emulate_ops);
 
-    return rc != X86EMUL_UNHANDLEABLE ? EXCRET_fault_fixed : 0;
+    /*
+     * TODO: Make this true:
+     *
+    ASSERT(ctxt.event_pending == (rc == X86EMUL_EXCEPTION));
+     *
+     * Some codepaths still raise exceptions behind the back of the
+     * emulator. (i.e. return X86EMUL_EXCEPTION but without
+     * event_pending being set).  In the meantime, use a slightly
+     * relaxed check...
+     */
+    if ( ctxt.event_pending )
+        ASSERT(rc == X86EMUL_EXCEPTION);
+
+    return ((rc != X86EMUL_UNHANDLEABLE && !ctxt.event_pending)
+            ? EXCRET_fault_fixed : 0);
 }
 
 void *alloc_xen_pagetable(void)
diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c
index 9ee48a8..13fa1bf 100644
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -3374,11 +3374,23 @@ static int sh_page_fault(struct vcpu *v,
     r = x86_emulate(&emul_ctxt.ctxt, emul_ops);
 
     /*
+     * TODO: Make this true:
+     *
+    ASSERT(emul_ctxt.ctxt.event_pending == (r == X86EMUL_EXCEPTION));
+     *
+     * Some codepaths still raise exceptions behind the back of the
+     * emulator. (i.e. return X86EMUL_EXCEPTION but without event_pending
+     * being set).  In the meantime, use a slightly relaxed check...
+     */
+    if ( emul_ctxt.ctxt.event_pending )
+        ASSERT(r == X86EMUL_EXCEPTION);
+
+    /*
      * NB. We do not unshadow on X86EMUL_EXCEPTION. It's not clear that it
      * would be a good unshadow hint. If we *do* decide to unshadow-on-fault
      * then it must be 'failable': we cannot require the unshadow to succeed.
      */
-    if ( r == X86EMUL_UNHANDLEABLE )
+    if ( r == X86EMUL_UNHANDLEABLE || emul_ctxt.ctxt.event_pending )
     {
         perfc_incr(shadow_fault_emulate_failed);
 #if SHADOW_OPTIMIZATIONS & SHOPT_FAST_EMULATION
@@ -3433,6 +3445,20 @@ static int sh_page_fault(struct vcpu *v,
             shadow_continue_emulation(&emul_ctxt, regs);
             v->arch.paging.last_write_was_pt = 0;
             r = x86_emulate(&emul_ctxt.ctxt, emul_ops);
+
+            /*
+             * TODO: Make this true:
+             *
+            ASSERT(emul_ctxt.ctxt.event_pending == (r == X86EMUL_EXCEPTION));
+             *
+             * Some codepaths still raise exceptions behind the back of the
+             * emulator. (i.e. return X86EMUL_EXCEPTION but without
+             * event_pending being set).  In the meantime, use a slightly
+             * relaxed check...
+             */
+            if ( emul_ctxt.ctxt.event_pending )
+                ASSERT(r == X86EMUL_EXCEPTION);
+
             if ( r == X86EMUL_OKAY )
             {
                 emulation_count++;
diff --git a/xen/arch/x86/x86_emulate/x86_emulate.c b/xen/arch/x86/x86_emulate/x86_emulate.c
index 6a653f9..fa6fba1 100644
--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -680,9 +680,8 @@ static inline int mkec(uint8_t e, int32_t ec, ...)
 
 #define generate_exception_if(p, e, ec...)                                \
 ({  if ( (p) ) {                                                          \
-        fail_if(ops->inject_hw_exception == NULL);                        \
-        rc = ops->inject_hw_exception(e, mkec(e, ##ec, 0), ctxt)          \
-            ? : X86EMUL_EXCEPTION;                                        \
+        x86_emul_hw_exception(e, mkec(e, ##ec, 0), ctxt);                 \
+        rc = X86EMUL_EXCEPTION;                                           \
         goto done;                                                        \
     }                                                                     \
 })
@@ -1604,9 +1603,6 @@ static int inject_swint(enum x86_swint_type type,
 {
     int rc, error_code, fault_type = EXC_GP;
 
-    fail_if(ops->inject_sw_interrupt == NULL);
-    fail_if(ops->inject_hw_exception == NULL);
-
     /*
      * Without hardware support, injecting software interrupts/exceptions is
      * problematic.
@@ -1703,7 +1699,8 @@ static int inject_swint(enum x86_swint_type type,
         ctxt->regs->eip += insn_len;
     }
 
-    rc = ops->inject_sw_interrupt(type, vector, insn_len, ctxt);
+    x86_emul_software_event(type, vector, insn_len, ctxt);
+    rc = X86EMUL_OKAY;
 
  done:
     return rc;
@@ -1911,6 +1908,7 @@ x86_decode(
 
     /* Initialise output state in x86_emulate_ctxt */
     ctxt->retire.byte = 0;
+    x86_emul_reset_event(ctxt);
 
     op_bytes = def_op_bytes = ad_bytes = def_ad_bytes = ctxt->addr_size/8;
     if ( op_bytes == 8 )
diff --git a/xen/arch/x86/x86_emulate/x86_emulate.h b/xen/arch/x86/x86_emulate/x86_emulate.h
index b0f0304..8019ee1 100644
--- a/xen/arch/x86/x86_emulate/x86_emulate.h
+++ b/xen/arch/x86/x86_emulate/x86_emulate.h
@@ -62,9 +62,31 @@ enum x86_swint_type {
 
 /* How much help is required with software event injection? */
 enum x86_swint_emulation {
-    x86_swint_emulate_none, /* Hardware supports all software injection properly */
-    x86_swint_emulate_icebp,/* Help needed with `icebp` (0xf1) */
-    x86_swint_emulate_all,  /* Help needed with all software events */
+    /*
+     * Hardware supports all software injection properly.  All traps exit the
+     * emulator with an unmodified %eip, and a suitable x86_event pending.
+     *
+     * Callers not wishing to deal with trap semantics should choose this
+     * option, as it causes all software traps to have fault semantics
+     * (i.e. no movement of %eip).
+     */
+    x86_swint_emulate_none,
+
+    /*
+     * Help needed with `icebp` (0xf1).  The emulator will emulate injection
+     * of `icebp` only.  If all checks are successful, %eip will be moved
+     * forwards to the end of the instruction, and a suitable x86_event will
+     * be pending.
+     */
+    x86_swint_emulate_icebp,
+
+    /*
+     * Help needed with all software events.  The emulator will emulate
+     * injection of all software events.  If all checks are successful, %eip
+     * will be moved forwards to the end of the instruction, and a suitable
+     * x86_event will be pending.
+     */
+    x86_swint_emulate_all,
 };
 
 /*
@@ -386,19 +408,6 @@ struct x86_emulate_ops
         unsigned int *edx,
         struct x86_emulate_ctxt *ctxt);
 
-    /* inject_hw_exception */
-    int (*inject_hw_exception)(
-        uint8_t vector,
-        int32_t error_code,
-        struct x86_emulate_ctxt *ctxt);
-
-    /* inject_sw_interrupt */
-    int (*inject_sw_interrupt)(
-        enum x86_swint_type type,
-        uint8_t vector,
-        uint8_t insn_len,
-        struct x86_emulate_ctxt *ctxt);
-
     /*
      * get_fpu: Load emulated environment's FPU state onto processor.
      *  @exn_callback: On any FPU or SIMD exception, pass control to
@@ -475,6 +484,9 @@ struct x86_emulate_ctxt
         } flags;
         uint8_t byte;
     } retire;
+
+    bool event_pending;
+    struct x86_event event;
 };
 
 /*
@@ -596,4 +608,55 @@ void x86_emulate_free_state(struct x86_emulate_state *state);
 
 #endif
 
+#ifndef ASSERT
+#define ASSERT assert
+#endif
+
+static inline void x86_emul_hw_exception(
+    unsigned int vector, int error_code, struct x86_emulate_ctxt *ctxt)
+{
+    ASSERT(!ctxt->event_pending);
+
+    ctxt->event.vector = vector;
+    ctxt->event.type = X86_EVENTTYPE_HW_EXCEPTION;
+    ctxt->event.error_code = error_code;
+
+    ctxt->event_pending = true;
+}
+
+static inline void x86_emul_software_event(
+    enum x86_swint_type type, uint8_t vector, uint8_t insn_len,
+    struct x86_emulate_ctxt *ctxt)
+{
+    ASSERT(!ctxt->event_pending);
+
+    switch ( type )
+    {
+    case x86_swint_icebp:
+        ctxt->event.type = X86_EVENTTYPE_PRI_SW_EXCEPTION;
+        break;
+
+    case x86_swint_int3:
+    case x86_swint_into:
+        ctxt->event.type = X86_EVENTTYPE_SW_EXCEPTION;
+        break;
+
+    case x86_swint_int:
+        ctxt->event.type = X86_EVENTTYPE_SW_INTERRUPT;
+        break;
+    }
+
+    ctxt->event.vector = vector;
+    ctxt->event.error_code = X86_EVENT_NO_EC;
+    ctxt->event.insn_len = insn_len;
+
+    ctxt->event_pending = true;
+}
+
+static inline void x86_emul_reset_event(struct x86_emulate_ctxt *ctxt)
+{
+    ctxt->event_pending = false;
+    ctxt->event = (struct x86_event){};
+}
+
 #endif /* __X86_EMULATE_H__ */
diff --git a/xen/include/asm-x86/hvm/emulate.h b/xen/include/asm-x86/hvm/emulate.h
index 3b7ec33..d64d834 100644
--- a/xen/include/asm-x86/hvm/emulate.h
+++ b/xen/include/asm-x86/hvm/emulate.h
@@ -29,9 +29,6 @@ struct hvm_emulate_ctxt {
     unsigned long seg_reg_accessed;
     unsigned long seg_reg_dirty;
 
-    bool_t exn_pending;
-    struct x86_event trap;
-
     uint32_t intr_shadow;
 
     bool_t set_context;
-- 
2.1.4



* [PATCH v2 09/19] x86/vmx: Use hvm_{get, set}_segment_register() rather than vmx_{get, set}_segment_register()
  2016-11-28 11:13 [PATCH for-4.9 v2 00/19] XSA-191 followup Andrew Cooper
                   ` (7 preceding siblings ...)
  2016-11-28 11:13 ` [PATCH v2 08/19] x86/emul: Rework emulator event injection Andrew Cooper
@ 2016-11-28 11:13 ` Andrew Cooper
  2016-11-28 11:13 ` [PATCH v2 10/19] x86/hvm: Reposition the modification of raw segment data from the VMCB/VMCS Andrew Cooper
                   ` (9 subsequent siblings)
  18 siblings, 0 replies; 57+ messages in thread
From: Andrew Cooper @ 2016-11-28 11:13 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper

No functional change at this point, but this is a prerequisite for forthcoming
functional changes.

Make vmx_get_segment_register() private to vmx.c like all the other Vendor
get/set functions.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
---
 xen/arch/x86/hvm/vmx/vmx.c        | 14 +++++++-------
 xen/arch/x86/hvm/vmx/vvmx.c       |  6 +++---
 xen/include/asm-x86/hvm/vmx/vmx.h |  2 --
 3 files changed, 10 insertions(+), 12 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 31f08d2..377c789 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -940,8 +940,8 @@ static void vmx_ctxt_switch_to(struct vcpu *v)
         .fields = { .type = 0xb, .s = 0, .dpl = 0, .p = 1, .avl = 0,    \
                     .l = 0, .db = 0, .g = 0, .pad = 0 } }).bytes)
 
-void vmx_get_segment_register(struct vcpu *v, enum x86_segment seg,
-                              struct segment_register *reg)
+static void vmx_get_segment_register(struct vcpu *v, enum x86_segment seg,
+                                     struct segment_register *reg)
 {
     unsigned long attr = 0, sel = 0, limit;
 
@@ -1504,19 +1504,19 @@ static void vmx_update_guest_cr(struct vcpu *v, unsigned int cr)
              * Need to read them all either way, as realmode reads can update
              * the saved values we'll use when returning to prot mode. */
             for ( s = 0; s < ARRAY_SIZE(reg); s++ )
-                vmx_get_segment_register(v, s, &reg[s]);
+                hvm_get_segment_register(v, s, &reg[s]);
             v->arch.hvm_vmx.vmx_realmode = realmode;
             
             if ( realmode )
             {
                 for ( s = 0; s < ARRAY_SIZE(reg); s++ )
-                    vmx_set_segment_register(v, s, &reg[s]);
+                    hvm_set_segment_register(v, s, &reg[s]);
             }
             else 
             {
                 for ( s = 0; s < ARRAY_SIZE(reg); s++ )
                     if ( !(v->arch.hvm_vmx.vm86_segment_mask & (1<<s)) )
-                        vmx_set_segment_register(
+                        hvm_set_segment_register(
                             v, s, &v->arch.hvm_vmx.vm86_saved_seg[s]);
             }
 
@@ -3907,7 +3907,7 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
             gdprintk(XENLOG_WARNING, "Bad vmexit (reason %#lx)\n",
                      exit_reason);
 
-            vmx_get_segment_register(v, x86_seg_ss, &ss);
+            hvm_get_segment_register(v, x86_seg_ss, &ss);
             if ( ss.attr.fields.dpl )
                 hvm_inject_hw_exception(TRAP_invalid_op,
                                         X86_EVENT_NO_EC);
@@ -3939,7 +3939,7 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
 
         gprintk(XENLOG_WARNING, "Bad rIP %lx for mode %u\n", regs->rip, mode);
 
-        vmx_get_segment_register(v, x86_seg_ss, &ss);
+        hvm_get_segment_register(v, x86_seg_ss, &ss);
         if ( ss.attr.fields.dpl )
         {
             __vmread(VM_ENTRY_INTR_INFO, &intr_info);
diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index efaf54c..bcc4a97 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -360,7 +360,7 @@ static int vmx_inst_check_privilege(struct cpu_user_regs *regs, int vmxop_check)
     else if ( !vcpu_2_nvmx(v).vmxon_region_pa )
         goto invalid_op;
 
-    vmx_get_segment_register(v, x86_seg_cs, &cs);
+    hvm_get_segment_register(v, x86_seg_cs, &cs);
 
     if ( (regs->eflags & X86_EFLAGS_VM) ||
          (hvm_long_mode_enabled(v) && cs.attr.fields.l == 0) )
@@ -419,13 +419,13 @@ static int decode_vmx_inst(struct cpu_user_regs *regs,
 
         if ( hvm_long_mode_enabled(v) )
         {
-            vmx_get_segment_register(v, x86_seg_cs, &seg);
+            hvm_get_segment_register(v, x86_seg_cs, &seg);
             mode_64bit = seg.attr.fields.l;
         }
 
         if ( info.fields.segment > VMX_SREG_GS )
             goto gp_fault;
-        vmx_get_segment_register(v, sreg_to_index[info.fields.segment], &seg);
+        hvm_get_segment_register(v, sreg_to_index[info.fields.segment], &seg);
         seg_base = seg.base;
 
         base = info.fields.base_reg_invalid ? 0 :
diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-x86/hvm/vmx/vmx.h
index 4cdd9b1..0e5902d 100644
--- a/xen/include/asm-x86/hvm/vmx/vmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vmx.h
@@ -550,8 +550,6 @@ static inline int __vmxon(u64 addr)
     return rc;
 }
 
-void vmx_get_segment_register(struct vcpu *, enum x86_segment,
-                              struct segment_register *);
 void vmx_inject_extint(int trap, uint8_t source);
 void vmx_inject_nmi(void);
 
-- 
2.1.4



* [PATCH v2 10/19] x86/hvm: Reposition the modification of raw segment data from the VMCB/VMCS
  2016-11-28 11:13 [PATCH for-4.9 v2 00/19] XSA-191 followup Andrew Cooper
                   ` (8 preceding siblings ...)
  2016-11-28 11:13 ` [PATCH v2 09/19] x86/vmx: Use hvm_{get, set}_segment_register() rather than vmx_{get, set}_segment_register() Andrew Cooper
@ 2016-11-28 11:13 ` Andrew Cooper
  2016-11-28 14:18   ` Boris Ostrovsky
  2016-11-28 11:13 ` [PATCH v2 11/19] x86/emul: Avoid raising faults behind the emulators back Andrew Cooper
                   ` (8 subsequent siblings)
  18 siblings, 1 reply; 57+ messages in thread
From: Andrew Cooper @ 2016-11-28 11:13 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Boris Ostrovsky, Suravee Suthikulpanit

Intel VT-x and AMD SVM provide access to the full segment descriptor cache via
fields in the VMCB/VMCS.  However, the bits which are actually checked by
hardware and preserved across vmentry/exit are inconsistent, and the vendor
accessor functions perform inconsistent modification to the raw values.

Convert {svm,vmx}_{get,set}_segment_register() into raw accessors, and alter
hvm_{get,set}_segment_register() to cook the values consistently.  This allows
the common emulation code to better rely on finding architecturally-expected
values.

While moving the code performing the cooking, fix the %ss.db quirk.  A NULL
selector is indicated by .p being clear, not the value of the .type field.

This does cause some functional changes, because the modifications are now
applied uniformly.  A side effect is that this fixes latent bugs:
vmx_set_segment_register() didn't correctly fix up .G for segments, and the
GDTR/IDTR limits were fixed up inconsistently.
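
As a rough illustration (not code from this patch) of what the cooking buys
the common code: after this change, a consumer of hvm_get_segment_register()
can rely on architecturally-expected attributes regardless of vendor, e.g.

    struct segment_register tr, gdtr;

    hvm_get_segment_register(v, x86_seg_tr, &tr);
    hvm_get_segment_register(v, x86_seg_gdtr, &gdtr);

    ASSERT(tr.attr.fields.type & 0x2); /* A loaded TSS always reads as busy. */
    ASSERT(gdtr.attr.bytes == 0x80);   /* GDTR/IDTR read as present system segments. */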

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
CC: Boris Ostrovsky <boris.ostrovsky@oracle.com>
CC: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
v2:
 * Clarify the change of the %ss.db quirk
 * Rework %tr typecheck logic
 * Swap a break for return following ASSERT_UNREACHABLE()
---
 xen/arch/x86/hvm/hvm.c        | 154 ++++++++++++++++++++++++++++++++++++++++++
 xen/arch/x86/hvm/svm/svm.c    |  20 +-----
 xen/arch/x86/hvm/vmx/vmx.c    |   6 +-
 xen/include/asm-x86/desc.h    |   6 ++
 xen/include/asm-x86/hvm/hvm.h |  17 ++---
 5 files changed, 167 insertions(+), 36 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index ef83100..bdfd94e 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -6051,6 +6051,160 @@ void hvm_domain_soft_reset(struct domain *d)
 }
 
 /*
+ * Segment caches in VMCB/VMCS are inconsistent about which bits are checked,
+ * important, and preserved across vmentry/exit.  Cook the values to make them
+ * closer to what is architecturally expected from entries in the segment
+ * cache.
+ */
+void hvm_get_segment_register(struct vcpu *v, enum x86_segment seg,
+                              struct segment_register *reg)
+{
+    hvm_funcs.get_segment_register(v, seg, reg);
+
+    switch ( seg )
+    {
+    case x86_seg_ss:
+        /* SVM may retain %ss.DB when %ss is loaded with a NULL selector. */
+        if ( !reg->attr.fields.p )
+            reg->attr.fields.db = 0;
+        break;
+
+    case x86_seg_tr:
+        /*
+         * SVM doesn't track %tr.B. Architecturally, a loaded TSS segment will
+         * always be busy.
+         */
+        reg->attr.fields.type |= 0x2;
+
+        /*
+         * %cs and %tr are unconditionally present.  SVM ignores these present
+         * bits and will happily run without them set.
+         */
+    case x86_seg_cs:
+        reg->attr.fields.p = 1;
+        break;
+
+    case x86_seg_gdtr:
+    case x86_seg_idtr:
+        /*
+         * Treat GDTR/IDTR as being present system segments.  This avoids them
+         * needing special casing for segmentation checks.
+         */
+        reg->attr.bytes = 0x80;
+        break;
+
+    default: /* Avoid triggering -Werror=switch */
+        break;
+    }
+
+    if ( reg->attr.fields.p )
+    {
+        /*
+         * For segments which are present/usable, cook the system flag.  SVM
+         * ignores the S bit on all segments and will happily run with them in
+         * any state.
+         */
+        reg->attr.fields.s = is_x86_user_segment(seg);
+
+        /*
+         * SVM discards %cs.G on #VMEXIT.  Other user segments do have .G
+         * tracked, but Linux commit 80112c89ed87 "KVM: Synthesize G bit for
+         * all segments." indicates that this isn't necessarily the case when
+         * nested under ESXi.
+         *
+         * Unconditionally recalculate G.
+         */
+        reg->attr.fields.g = !!(reg->limit >> 20);
+
+        /*
+         * SVM doesn't track the Accessed flag.  It will always be set for
+         * usable user segments loaded into the descriptor cache.
+         */
+        if ( is_x86_user_segment(seg) )
+            reg->attr.fields.type |= 0x1;
+    }
+}
+
+void hvm_set_segment_register(struct vcpu *v, enum x86_segment seg,
+                              struct segment_register *reg)
+{
+    /* Set G to match the limit field.  VT-x cares, while SVM doesn't. */
+    if ( reg->attr.fields.p )
+        reg->attr.fields.g = !!(reg->limit >> 20);
+
+    switch ( seg )
+    {
+    case x86_seg_cs:
+        ASSERT(reg->attr.fields.p);                  /* Usable. */
+        ASSERT(reg->attr.fields.s);                  /* User segment. */
+        ASSERT((reg->base >> 32) == 0);              /* Upper bits clear. */
+        break;
+
+    case x86_seg_ss:
+        if ( reg->attr.fields.p )
+        {
+            ASSERT(reg->attr.fields.s);              /* User segment. */
+            ASSERT(!(reg->attr.fields.type & 0x8));  /* Data segment. */
+            ASSERT(reg->attr.fields.type & 0x2);     /* Writeable. */
+            ASSERT((reg->base >> 32) == 0);          /* Upper bits clear. */
+        }
+        break;
+
+    case x86_seg_ds:
+    case x86_seg_es:
+    case x86_seg_fs:
+    case x86_seg_gs:
+        if ( reg->attr.fields.p )
+        {
+            ASSERT(reg->attr.fields.s);              /* User segment. */
+
+            if ( reg->attr.fields.type & 0x8 )
+                ASSERT(reg->attr.fields.type & 0x2); /* Readable. */
+
+            if ( seg == x86_seg_fs || seg == x86_seg_gs )
+                ASSERT(is_canonical_address(reg->base));
+            else
+                ASSERT((reg->base >> 32) == 0);      /* Upper bits clear. */
+        }
+        break;
+
+    case x86_seg_tr:
+        ASSERT(reg->attr.fields.p);                  /* Usable. */
+        ASSERT(!reg->attr.fields.s);                 /* System segment. */
+        ASSERT(!(reg->sel & 0x4));                   /* !TI. */
+        if ( reg->attr.fields.type == SYS_DESC_tss_busy )
+            ASSERT(is_canonical_address(reg->base));
+        else if ( reg->attr.fields.type == SYS_DESC_tss16_busy )
+            ASSERT((reg->base >> 32) == 0);
+        else
+            ASSERT(!"%tr typecheck failure");
+        break;
+
+    case x86_seg_ldtr:
+        if ( reg->attr.fields.p )
+        {
+            ASSERT(!reg->attr.fields.s);             /* System segment. */
+            ASSERT(!(reg->sel & 0x4));               /* !TI. */
+            ASSERT(reg->attr.fields.type == SYS_DESC_ldt);
+            ASSERT(is_canonical_address(reg->base));
+        }
+        break;
+
+    case x86_seg_gdtr:
+    case x86_seg_idtr:
+        ASSERT(is_canonical_address(reg->base));
+        ASSERT((reg->limit >> 16) == 0);             /* Upper bits clear. */
+        break;
+
+    default:
+        ASSERT_UNREACHABLE();
+        return;
+    }
+
+    hvm_funcs.set_segment_register(v, seg, reg);
+}
+
+/*
  * Local variables:
  * mode: C
  * c-file-style: "BSD"
diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index 912d871..a944739 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -627,50 +627,34 @@ static void svm_get_segment_register(struct vcpu *v, enum x86_segment seg,
     {
     case x86_seg_cs:
         memcpy(reg, &vmcb->cs, sizeof(*reg));
-        reg->attr.fields.p = 1;
-        reg->attr.fields.g = reg->limit > 0xFFFFF;
         break;
     case x86_seg_ds:
         memcpy(reg, &vmcb->ds, sizeof(*reg));
-        if ( reg->attr.fields.type != 0 )
-            reg->attr.fields.type |= 0x1;
         break;
     case x86_seg_es:
         memcpy(reg, &vmcb->es, sizeof(*reg));
-        if ( reg->attr.fields.type != 0 )
-            reg->attr.fields.type |= 0x1;
         break;
     case x86_seg_fs:
         svm_sync_vmcb(v);
         memcpy(reg, &vmcb->fs, sizeof(*reg));
-        if ( reg->attr.fields.type != 0 )
-            reg->attr.fields.type |= 0x1;
         break;
     case x86_seg_gs:
         svm_sync_vmcb(v);
         memcpy(reg, &vmcb->gs, sizeof(*reg));
-        if ( reg->attr.fields.type != 0 )
-            reg->attr.fields.type |= 0x1;
         break;
     case x86_seg_ss:
         memcpy(reg, &vmcb->ss, sizeof(*reg));
         reg->attr.fields.dpl = vmcb->_cpl;
-        if ( reg->attr.fields.type == 0 )
-            reg->attr.fields.db = 0;
         break;
     case x86_seg_tr:
         svm_sync_vmcb(v);
         memcpy(reg, &vmcb->tr, sizeof(*reg));
-        reg->attr.fields.p = 1;
-        reg->attr.fields.type |= 0x2;
         break;
     case x86_seg_gdtr:
         memcpy(reg, &vmcb->gdtr, sizeof(*reg));
-        reg->attr.bytes = 0x80;
         break;
     case x86_seg_idtr:
         memcpy(reg, &vmcb->idtr, sizeof(*reg));
-        reg->attr.bytes = 0x80;
         break;
     case x86_seg_ldtr:
         svm_sync_vmcb(v);
@@ -740,11 +724,11 @@ static void svm_set_segment_register(struct vcpu *v, enum x86_segment seg,
         break;
     case x86_seg_gdtr:
         vmcb->gdtr.base = reg->base;
-        vmcb->gdtr.limit = (uint16_t)reg->limit;
+        vmcb->gdtr.limit = reg->limit;
         break;
     case x86_seg_idtr:
         vmcb->idtr.base = reg->base;
-        vmcb->idtr.limit = (uint16_t)reg->limit;
+        vmcb->idtr.limit = reg->limit;
         break;
     case x86_seg_ldtr:
         memcpy(&vmcb->ldtr, reg, sizeof(*reg));
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 377c789..004dad8 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -1126,9 +1126,6 @@ static void vmx_set_segment_register(struct vcpu *v, enum x86_segment seg,
      */
     attr = (!(attr & (1u << 7)) << 16) | ((attr & 0xf00) << 4) | (attr & 0xff);
 
-    /* VMX has strict consistency requirement for flag G. */
-    attr |= !!(limit >> 20) << 15;
-
     vmx_vmcs_enter(v);
 
     switch ( seg )
@@ -1173,8 +1170,7 @@ static void vmx_set_segment_register(struct vcpu *v, enum x86_segment seg,
         __vmwrite(GUEST_TR_SELECTOR, sel);
         __vmwrite(GUEST_TR_LIMIT, limit);
         __vmwrite(GUEST_TR_BASE, base);
-        /* VMX checks that the the busy flag (bit 1) is set. */
-        __vmwrite(GUEST_TR_AR_BYTES, attr | 2);
+        __vmwrite(GUEST_TR_AR_BYTES, attr);
         break;
     case x86_seg_gdtr:
         __vmwrite(GUEST_GDTR_LIMIT, limit);
diff --git a/xen/include/asm-x86/desc.h b/xen/include/asm-x86/desc.h
index 0e2d97f..da924bf 100644
--- a/xen/include/asm-x86/desc.h
+++ b/xen/include/asm-x86/desc.h
@@ -89,7 +89,13 @@
 #ifndef __ASSEMBLY__
 
 /* System Descriptor types for GDT and IDT entries. */
+#define SYS_DESC_tss16_avail  1
 #define SYS_DESC_ldt          2
+#define SYS_DESC_tss16_busy   3
+#define SYS_DESC_call_gate16  4
+#define SYS_DESC_task_gate    5
+#define SYS_DESC_irq_gate16   6
+#define SYS_DESC_trap_gate16  7
 #define SYS_DESC_tss_avail    9
 #define SYS_DESC_tss_busy     11
 #define SYS_DESC_call_gate    12
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index 51a64f7..b37b335 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -358,19 +358,10 @@ static inline void hvm_flush_guest_tlbs(void)
 void hvm_hypercall_page_initialise(struct domain *d,
                                    void *hypercall_page);
 
-static inline void
-hvm_get_segment_register(struct vcpu *v, enum x86_segment seg,
-                         struct segment_register *reg)
-{
-    hvm_funcs.get_segment_register(v, seg, reg);
-}
-
-static inline void
-hvm_set_segment_register(struct vcpu *v, enum x86_segment seg,
-                         struct segment_register *reg)
-{
-    hvm_funcs.set_segment_register(v, seg, reg);
-}
+void hvm_get_segment_register(struct vcpu *v, enum x86_segment seg,
+                              struct segment_register *reg);
+void hvm_set_segment_register(struct vcpu *v, enum x86_segment seg,
+                              struct segment_register *reg);
 
 static inline unsigned long hvm_get_shadow_gs_base(struct vcpu *v)
 {
-- 
2.1.4



* [PATCH v2 11/19] x86/emul: Avoid raising faults behind the emulators back
  2016-11-28 11:13 [PATCH for-4.9 v2 00/19] XSA-191 followup Andrew Cooper
                   ` (9 preceding siblings ...)
  2016-11-28 11:13 ` [PATCH v2 10/19] x86/hvm: Reposition the modification of raw segment data from the VMCB/VMCS Andrew Cooper
@ 2016-11-28 11:13 ` Andrew Cooper
  2016-11-28 12:47   ` Paul Durrant
  2016-11-29 16:02   ` Jan Beulich
  2016-11-28 11:13 ` [PATCH v2 12/19] x86/pv: " Andrew Cooper
                   ` (7 subsequent siblings)
  18 siblings, 2 replies; 57+ messages in thread
From: Andrew Cooper @ 2016-11-28 11:13 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Paul Durrant, Jan Beulich

Introduce a new x86_emul_pagefault() similar to x86_emul_hw_exception(), and
use this instead of hvm_inject_page_fault() from emulation codepaths.
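
For illustration (not part of the patch): where an emulation codepath used to
call hvm_inject_page_fault() directly, it now records the fault in the
emulation context and returns X86EMUL_EXCEPTION, roughly:

    if ( walk_failed )  /* hypothetical condition standing for a failed walk */
    {
        x86_emul_pagefault(pfec, addr, &hvmemul_ctxt->ctxt);
        return X86EMUL_EXCEPTION;
    }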

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Paul Durrant <paul.durrant@citrix.com>

v2:
 * Change x86_emul_pagefault()'s error_code parameter to be signed
 * Split out shadow changes
---
 xen/arch/x86/hvm/emulate.c             |  4 ++--
 xen/arch/x86/x86_emulate/x86_emulate.h | 13 +++++++++++++
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 7745c5b..35d1d1c 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -459,7 +459,7 @@ static int hvmemul_linear_to_phys(
     {
         if ( pfec & (PFEC_page_paged | PFEC_page_shared) )
             return X86EMUL_RETRY;
-        hvm_inject_page_fault(pfec, addr);
+        x86_emul_pagefault(pfec, addr, &hvmemul_ctxt->ctxt);
         return X86EMUL_EXCEPTION;
     }
 
@@ -483,7 +483,7 @@ static int hvmemul_linear_to_phys(
                 ASSERT(!reverse);
                 if ( npfn != gfn_x(INVALID_GFN) )
                     return X86EMUL_UNHANDLEABLE;
-                hvm_inject_page_fault(pfec, addr & PAGE_MASK);
+                x86_emul_pagefault(pfec, addr & PAGE_MASK, &hvmemul_ctxt->ctxt);
                 return X86EMUL_EXCEPTION;
             }
             *reps = done;
diff --git a/xen/arch/x86/x86_emulate/x86_emulate.h b/xen/arch/x86/x86_emulate/x86_emulate.h
index 8019ee1..4679711 100644
--- a/xen/arch/x86/x86_emulate/x86_emulate.h
+++ b/xen/arch/x86/x86_emulate/x86_emulate.h
@@ -624,6 +624,19 @@ static inline void x86_emul_hw_exception(
     ctxt->event_pending = true;
 }
 
+static inline void x86_emul_pagefault(
+    int error_code, unsigned long cr2, struct x86_emulate_ctxt *ctxt)
+{
+    ASSERT(!ctxt->event_pending);
+
+    ctxt->event.vector = 14; /* TRAP_page_fault */
+    ctxt->event.type = X86_EVENTTYPE_HW_EXCEPTION;
+    ctxt->event.error_code = error_code;
+    ctxt->event.cr2 = cr2;
+
+    ctxt->event_pending = true;
+}
+
 static inline void x86_emul_software_event(
     enum x86_swint_type type, uint8_t vector, uint8_t insn_len,
     struct x86_emulate_ctxt *ctxt)
-- 
2.1.4



* [PATCH v2 12/19] x86/pv: Avoid raising faults behind the emulators back
  2016-11-28 11:13 [PATCH for-4.9 v2 00/19] XSA-191 followup Andrew Cooper
                   ` (10 preceding siblings ...)
  2016-11-28 11:13 ` [PATCH v2 11/19] x86/emul: Avoid raising faults behind the emulators back Andrew Cooper
@ 2016-11-28 11:13 ` Andrew Cooper
  2016-11-28 11:13 ` [PATCH v2 13/19] x86/shadow: " Andrew Cooper
                   ` (6 subsequent siblings)
  18 siblings, 0 replies; 57+ messages in thread
From: Andrew Cooper @ 2016-11-28 11:13 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Tim Deegan, Jan Beulich

Use x86_emul_pagefault() rather than pv_inject_page_fault() to cause raised
pagefaults to be known to the emulator.  This requires altering the callers of
x86_emulate() to properly re-inject the event.

While fixing this, fix the singlestep behaviour.  Previously, an otherwise
successful emulation would fail if singlestepping was active, as the emulator
couldn't raise #DB.  This is unreasonable from the point of view of the guest.

We therefore tolerate either #PF or #DB being raised by the emulator, but
reject anything else as unexpected.
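
For reference, the re-injection pattern added to mm.c below boils down to the
following (shown here out of context, with ctxt being the x86_emulate_ctxt):

    if ( ctxt.event.type == X86_EVENTTYPE_HW_EXCEPTION &&
         (ctxt.event.vector == TRAP_debug ||
          ctxt.event.vector == TRAP_page_fault) )
        pv_inject_event(&ctxt.event);             /* Expected: #DB or #PF. */
    else
        pv_inject_hw_exception(TRAP_gp_fault, 0); /* Anything else is a bug. */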

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Tim Deegan <tim@xen.org>

v2:
 * New
---
 xen/arch/x86/mm.c | 96 ++++++++++++++++++++++++++++++++++++-------------------
 1 file changed, 64 insertions(+), 32 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 8a1e7b4..5b60b59 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -5136,7 +5136,7 @@ static int ptwr_emulated_read(
     if ( !__addr_ok(addr) ||
          (rc = __copy_from_user(p_data, (void *)addr, bytes)) )
     {
-        pv_inject_page_fault(0, addr + bytes - rc); /* Read fault. */
+        x86_emul_pagefault(0, addr + bytes - rc, ctxt);  /* Read fault. */
         return X86EMUL_EXCEPTION;
     }
 
@@ -5177,8 +5177,9 @@ static int ptwr_emulated_update(
         addr &= ~(sizeof(paddr_t)-1);
         if ( (rc = copy_from_user(&full, (void *)addr, sizeof(paddr_t))) != 0 )
         {
-            pv_inject_page_fault(0, /* Read fault. */
-                                 addr + sizeof(paddr_t) - rc);
+            x86_emul_pagefault(0, /* Read fault. */
+                               addr + sizeof(paddr_t) - rc,
+                               &ptwr_ctxt->ctxt);
             return X86EMUL_EXCEPTION;
         }
         /* Mask out bits provided by caller. */
@@ -5379,27 +5380,40 @@ int ptwr_do_page_fault(struct vcpu *v, unsigned long addr,
     page_unlock(page);
     put_page(page);
 
-    /*
-     * TODO: Make this true:
-     *
     ASSERT(ptwr_ctxt.ctxt.event_pending == (rc == X86EMUL_EXCEPTION));
-     *
-     * Some codepaths still raise exceptions behind the back of the
-     * emulator. (i.e. return X86EMUL_EXCEPTION but without
-     * event_pending being set).  In the meantime, use a slightly
-     * relaxed check...
-     */
-    if ( ptwr_ctxt.ctxt.event_pending )
-        ASSERT(rc == X86EMUL_EXCEPTION);
 
-    if ( rc == X86EMUL_UNHANDLEABLE || ptwr_ctxt.ctxt.event_pending )
-        goto bail;
+    switch ( rc )
+    {
+    case X86EMUL_EXCEPTION:
+        /*
+         * This emulation only covers writes to pagetables which are marked
+         * read-only by Xen.  We tolerate #PF (from hitting an adjacent page)
+         * and #DB (from singlestepping).  Anything else is an emulation bug,
+         * or a guest playing with the instruction stream under Xen's feet.
+         */
+        if ( ptwr_ctxt.ctxt.event.type == X86_EVENTTYPE_HW_EXCEPTION &&
+             (ptwr_ctxt.ctxt.event.vector == TRAP_debug ||
+              ptwr_ctxt.ctxt.event.vector == TRAP_page_fault) )
+            pv_inject_event(&ptwr_ctxt.ctxt.event);
+        else
+        {
+            gdprintk(XENLOG_WARNING,
+                     "Unexpected event (type %u, vector %#x) from emulation\n",
+                     ptwr_ctxt.ctxt.event.type, ptwr_ctxt.ctxt.event.vector);
+
+            pv_inject_hw_exception(TRAP_gp_fault, 0);
+        }
 
-    perfc_incr(ptwr_emulations);
-    return EXCRET_fault_fixed;
+        /* Fallthrough */
+    case X86EMUL_OKAY:
+    case X86EMUL_RETRY:
+        perfc_incr(ptwr_emulations);
+        return EXCRET_fault_fixed;
 
  bail:
-    return 0;
+    default:
+        return 0;
+    }
 }
 
 /*************************
@@ -5516,21 +5530,39 @@ int mmio_ro_do_page_fault(struct vcpu *v, unsigned long addr,
     else
         rc = x86_emulate(&ctxt, &mmio_ro_emulate_ops);
 
-    /*
-     * TODO: Make this true:
-     *
     ASSERT(ctxt.event_pending == (rc == X86EMUL_EXCEPTION));
-     *
-     * Some codepaths still raise exceptions behind the back of the
-     * emulator. (i.e. return X86EMUL_EXCEPTION but without
-     * event_pending being set).  In the meantime, use a slightly
-     * relaxed check...
-     */
-    if ( ctxt.event_pending )
-        ASSERT(rc == X86EMUL_EXCEPTION);
 
-    return ((rc != X86EMUL_UNHANDLEABLE && !ctxt.event_pending)
-            ? EXCRET_fault_fixed : 0);
+    switch ( rc )
+    {
+    case X86EMUL_EXCEPTION:
+        /*
+         * This emulation only covers writes to MMCFG space or read-only MFNs.
+         * We tolerate #PF (from hitting an adjacent page) and #DB (from
+         * singlestepping).  Anything else is an emulation bug, or a guest
+         * playing with the instruction stream under Xen's feet.
+         */
+        if ( ctxt.event.type == X86_EVENTTYPE_HW_EXCEPTION &&
+             (ctxt.event.vector == TRAP_debug ||
+              ctxt.event.vector == TRAP_page_fault) )
+            pv_inject_event(&ctxt.event);
+        else
+        {
+            gdprintk(XENLOG_WARNING,
+                     "Unexpected event (type %u, vector %#x) from emulation\n",
+                     ctxt.event.type, ctxt.event.vector);
+
+            pv_inject_hw_exception(TRAP_gp_fault, 0);
+        }
+
+        /* Fallthrough */
+    case X86EMUL_OKAY:
+    case X86EMUL_RETRY:
+        perfc_incr(ptwr_emulations);
+        return EXCRET_fault_fixed;
+
+    default:
+        return 0;
+    }
 }
 
 void *alloc_xen_pagetable(void)
-- 
2.1.4



* [PATCH v2 13/19] x86/shadow: Avoid raising faults behind the emulators back
  2016-11-28 11:13 [PATCH for-4.9 v2 00/19] XSA-191 followup Andrew Cooper
                   ` (11 preceding siblings ...)
  2016-11-28 11:13 ` [PATCH v2 12/19] x86/pv: " Andrew Cooper
@ 2016-11-28 11:13 ` Andrew Cooper
  2016-11-28 14:49   ` Tim Deegan
  2016-11-28 11:13 ` [PATCH v2 14/19] x86/hvm: Extend the hvm_copy_*() API with a pagefault_info pointer Andrew Cooper
                   ` (5 subsequent siblings)
  18 siblings, 1 reply; 57+ messages in thread
From: Andrew Cooper @ 2016-11-28 11:13 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Tim Deegan, Jan Beulich

Use x86_emul_{hw_exception,pagefault}() rather than
{pv,hvm}_inject_page_fault() and hvm_inject_hw_exception() to cause raised
faults to be known to the emulator.  This requires altering the callers of
x86_emulate() to properly re-inject the event.

While fixing this, fix the singlestep behaviour.  Previously, an otherwise
successful emulation would fail if singlestepping was active, as the emulator
couldn't raise #DB.  This is unreasonable from the point of view of the guest.

We therefore tolerate #PF, #GP/#SS and #DB being raised by the emulator, but
reject anything else as unexpected.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Tim Deegan <tim@xen.org>

v2:
 * New
---
 xen/arch/x86/mm/shadow/common.c | 13 ++++-----
 xen/arch/x86/mm/shadow/multi.c  | 61 ++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 66 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/mm/shadow/common.c b/xen/arch/x86/mm/shadow/common.c
index f07803b..e509cc1 100644
--- a/xen/arch/x86/mm/shadow/common.c
+++ b/xen/arch/x86/mm/shadow/common.c
@@ -162,8 +162,9 @@ static int hvm_translate_linear_addr(
 
     if ( !okay )
     {
-        hvm_inject_hw_exception(
-            (seg == x86_seg_ss) ? TRAP_stack_error : TRAP_gp_fault, 0);
+        x86_emul_hw_exception(
+            (seg == x86_seg_ss) ? TRAP_stack_error : TRAP_gp_fault,
+            0, &sh_ctxt->ctxt);
         return X86EMUL_EXCEPTION;
     }
 
@@ -323,7 +324,7 @@ pv_emulate_read(enum x86_segment seg,
 
     if ( (rc = copy_from_user(p_data, (void *)offset, bytes)) != 0 )
     {
-        pv_inject_page_fault(0, offset + bytes - rc); /* Read fault. */
+        x86_emul_pagefault(0, offset + bytes - rc, ctxt); /* Read fault. */
         return X86EMUL_EXCEPTION;
     }
 
@@ -1720,10 +1721,8 @@ static mfn_t emulate_gva_to_mfn(struct vcpu *v, unsigned long vaddr,
     gfn = paging_get_hostmode(v)->gva_to_gfn(v, NULL, vaddr, &pfec);
     if ( gfn == gfn_x(INVALID_GFN) )
     {
-        if ( is_hvm_vcpu(v) )
-            hvm_inject_page_fault(pfec, vaddr);
-        else
-            pv_inject_page_fault(pfec, vaddr);
+        x86_emul_pagefault(pfec, vaddr, &sh_ctxt->ctxt);
+
         return _mfn(BAD_GVA_TO_GFN);
     }
 
diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c
index 13fa1bf..50705a0 100644
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -3390,7 +3390,7 @@ static int sh_page_fault(struct vcpu *v,
      * would be a good unshadow hint. If we *do* decide to unshadow-on-fault
      * then it must be 'failable': we cannot require the unshadow to succeed.
      */
-    if ( r == X86EMUL_UNHANDLEABLE || emul_ctxt.ctxt.event_pending )
+    if ( r == X86EMUL_UNHANDLEABLE )
     {
         perfc_incr(shadow_fault_emulate_failed);
 #if SHADOW_OPTIMIZATIONS & SHOPT_FAST_EMULATION
@@ -3434,6 +3434,34 @@ static int sh_page_fault(struct vcpu *v,
         v->arch.paging.last_write_emul_ok = 0;
 #endif
 
+    if ( r == X86EMUL_EXCEPTION && emul_ctxt.ctxt.event_pending )
+    {
+        /*
+         * This emulation covers writes to shadow pagetables.  We tolerate #PF
+         * (from hitting adjacent pages), #GP/#SS (from segmentation errors),
+         * and #DB (from singlestepping).  Anything else is an emulation bug,
+         * or a guest playing with the instruction stream under Xen's feet.
+         */
+        if ( emul_ctxt.ctxt.event.type == X86_EVENTTYPE_HW_EXCEPTION &&
+             (emul_ctxt.ctxt.event.vector < 32) &&
+             ((1u << emul_ctxt.ctxt.event.vector) &
+              ((1u << TRAP_debug) | (1u << TRAP_stack_error) |
+               (1u << TRAP_gp_fault) | (1u << TRAP_page_fault))) )
+        {
+            if ( is_hvm_vcpu(v) )
+                hvm_inject_event(&emul_ctxt.ctxt.event);
+            else
+                pv_inject_event(&emul_ctxt.ctxt.event);
+        }
+        else
+        {
+            if ( is_hvm_vcpu(v) )
+                hvm_inject_hw_exception(TRAP_gp_fault, 0);
+            else
+                pv_inject_hw_exception(TRAP_gp_fault, 0);
+        }
+    }
+
 #if GUEST_PAGING_LEVELS == 3 /* PAE guest */
     if ( r == X86EMUL_OKAY ) {
         int i, emulation_count=0;
@@ -3475,6 +3503,37 @@ static int sh_page_fault(struct vcpu *v,
             {
                 perfc_incr(shadow_em_ex_fail);
                 TRACE_SHADOW_PATH_FLAG(TRCE_SFLAG_EMULATION_LAST_FAILED);
+
+                if ( r == X86EMUL_EXCEPTION && emul_ctxt.ctxt.event_pending )
+                {
+                    /*
+                     * This emulation covers writes to shadow pagetables.  We
+                     * tolerate #PF (from hitting adjacent pages), #GP/#SS
+                     * (from segmentation errors), and #DB (from
+                     * singlestepping).  Anything else is an emulation bug, or
+                     * a guest playing with the instruction stream under Xen's
+                     * feet.
+                     */
+                    if ( emul_ctxt.ctxt.event.type == X86_EVENTTYPE_HW_EXCEPTION &&
+                         (emul_ctxt.ctxt.event.vector < 32) &&
+                         ((1u << emul_ctxt.ctxt.event.vector) &
+                          ((1u << TRAP_debug) | (1u << TRAP_stack_error) |
+                           (1u << TRAP_gp_fault) | (1u << TRAP_page_fault))) )
+                    {
+                        if ( is_hvm_vcpu(v) )
+                            hvm_inject_event(&emul_ctxt.ctxt.event);
+                        else
+                            pv_inject_event(&emul_ctxt.ctxt.event);
+                    }
+                    else
+                    {
+                        if ( is_hvm_vcpu(v) )
+                            hvm_inject_hw_exception(TRAP_gp_fault, 0);
+                        else
+                            pv_inject_hw_exception(TRAP_gp_fault, 0);
+                    }
+                }
+
                 break; /* Don't emulate again if we failed! */
             }
         }
-- 
2.1.4



* [PATCH v2 14/19] x86/hvm: Extend the hvm_copy_*() API with a pagefault_info pointer
  2016-11-28 11:13 [PATCH for-4.9 v2 00/19] XSA-191 followup Andrew Cooper
                   ` (12 preceding siblings ...)
  2016-11-28 11:13 ` [PATCH v2 13/19] x86/shadow: " Andrew Cooper
@ 2016-11-28 11:13 ` Andrew Cooper
  2016-11-28 11:13 ` [PATCH v2 15/19] x86/hvm: Reimplement hvm_copy_*_nofault() in terms of no pagefault_info Andrew Cooper
                   ` (4 subsequent siblings)
  18 siblings, 0 replies; 57+ messages in thread
From: Andrew Cooper @ 2016-11-28 11:13 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper

which is filled with pagefault information should one occur.

No functional change.
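
As a sketch of the intended usage (buf, vaddr and bytes are placeholders, and
the fault is still injected from inside __hvm_copy() until a later patch):

    pagefault_info_t pfinfo;

    if ( hvm_copy_from_guest_virt(buf, vaddr, bytes, PFEC_page_present,
                                  &pfinfo) == HVMCOPY_bad_gva_to_gfn )
    {
        /* pfinfo.linear and pfinfo.ec describe the #PF that was hit. */
    }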

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tim Deegan <tim@xen.org>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
---
 xen/arch/x86/hvm/emulate.c        |  8 ++++---
 xen/arch/x86/hvm/hvm.c            | 49 +++++++++++++++++++++++++--------------
 xen/arch/x86/hvm/vmx/vvmx.c       |  9 ++++---
 xen/arch/x86/mm/shadow/common.c   |  5 ++--
 xen/include/asm-x86/hvm/support.h | 23 +++++++++++++-----
 5 files changed, 63 insertions(+), 31 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 35d1d1c..6de94d4 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -770,6 +770,7 @@ static int __hvmemul_read(
     struct hvm_emulate_ctxt *hvmemul_ctxt)
 {
     struct vcpu *curr = current;
+    pagefault_info_t pfinfo;
     unsigned long addr, reps = 1;
     uint32_t pfec = PFEC_page_present;
     struct hvm_vcpu_io *vio = &curr->arch.hvm_vcpu.hvm_io;
@@ -790,8 +791,8 @@ static int __hvmemul_read(
         pfec |= PFEC_user_mode;
 
     rc = ((access_type == hvm_access_insn_fetch) ?
-          hvm_fetch_from_guest_virt(p_data, addr, bytes, pfec) :
-          hvm_copy_from_guest_virt(p_data, addr, bytes, pfec));
+          hvm_fetch_from_guest_virt(p_data, addr, bytes, pfec, &pfinfo) :
+          hvm_copy_from_guest_virt(p_data, addr, bytes, pfec, &pfinfo));
 
     switch ( rc )
     {
@@ -878,6 +879,7 @@ static int hvmemul_write(
     struct hvm_emulate_ctxt *hvmemul_ctxt =
         container_of(ctxt, struct hvm_emulate_ctxt, ctxt);
     struct vcpu *curr = current;
+    pagefault_info_t pfinfo;
     unsigned long addr, reps = 1;
     uint32_t pfec = PFEC_page_present | PFEC_write_access;
     struct hvm_vcpu_io *vio = &curr->arch.hvm_vcpu.hvm_io;
@@ -896,7 +898,7 @@ static int hvmemul_write(
          (hvmemul_ctxt->seg_reg[x86_seg_ss].attr.fields.dpl == 3) )
         pfec |= PFEC_user_mode;
 
-    rc = hvm_copy_to_guest_virt(addr, p_data, bytes, pfec);
+    rc = hvm_copy_to_guest_virt(addr, p_data, bytes, pfec, &pfinfo);
 
     switch ( rc )
     {
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index bdfd94e..390f76d 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -2859,6 +2859,7 @@ void hvm_task_switch(
     struct desc_struct *optss_desc = NULL, *nptss_desc = NULL, tss_desc;
     bool_t otd_writable, ntd_writable;
     unsigned long eflags;
+    pagefault_info_t pfinfo;
     int exn_raised, rc;
     struct {
         u16 back_link,__blh;
@@ -2925,7 +2926,7 @@ void hvm_task_switch(
     }
 
     rc = hvm_copy_from_guest_virt(
-        &tss, prev_tr.base, sizeof(tss), PFEC_page_present);
+        &tss, prev_tr.base, sizeof(tss), PFEC_page_present, &pfinfo);
     if ( rc != HVMCOPY_okay )
         goto out;
 
@@ -2963,12 +2964,12 @@ void hvm_task_switch(
                                 &tss.eip,
                                 offsetof(typeof(tss), trace) -
                                 offsetof(typeof(tss), eip),
-                                PFEC_page_present);
+                                PFEC_page_present, &pfinfo);
     if ( rc != HVMCOPY_okay )
         goto out;
 
     rc = hvm_copy_from_guest_virt(
-        &tss, tr.base, sizeof(tss), PFEC_page_present);
+        &tss, tr.base, sizeof(tss), PFEC_page_present, &pfinfo);
     /*
      * Note: The HVMCOPY_gfn_shared case could be optimised, if the callee
      * functions knew we want RO access.
@@ -3008,7 +3009,8 @@ void hvm_task_switch(
         tss.back_link = prev_tr.sel;
 
         rc = hvm_copy_to_guest_virt(tr.base + offsetof(typeof(tss), back_link),
-                                    &tss.back_link, sizeof(tss.back_link), 0);
+                                    &tss.back_link, sizeof(tss.back_link), 0,
+                                    &pfinfo);
         if ( rc == HVMCOPY_bad_gva_to_gfn )
             exn_raised = 1;
         else if ( rc != HVMCOPY_okay )
@@ -3045,7 +3047,8 @@ void hvm_task_switch(
                                         16 << segr.attr.fields.db,
                                         &linear_addr) )
         {
-            rc = hvm_copy_to_guest_virt(linear_addr, &errcode, opsz, 0);
+            rc = hvm_copy_to_guest_virt(linear_addr, &errcode, opsz, 0,
+                                        &pfinfo);
             if ( rc == HVMCOPY_bad_gva_to_gfn )
                 exn_raised = 1;
             else if ( rc != HVMCOPY_okay )
@@ -3068,7 +3071,8 @@ void hvm_task_switch(
 #define HVMCOPY_phys       (0u<<2)
 #define HVMCOPY_virt       (1u<<2)
 static enum hvm_copy_result __hvm_copy(
-    void *buf, paddr_t addr, int size, unsigned int flags, uint32_t pfec)
+    void *buf, paddr_t addr, int size, unsigned int flags, uint32_t pfec,
+    pagefault_info_t *pfinfo)
 {
     struct vcpu *curr = current;
     unsigned long gfn;
@@ -3109,7 +3113,15 @@ static enum hvm_copy_result __hvm_copy(
                 if ( pfec & PFEC_page_shared )
                     return HVMCOPY_gfn_shared;
                 if ( flags & HVMCOPY_fault )
+                {
+                    if ( pfinfo )
+                    {
+                        pfinfo->linear = addr;
+                        pfinfo->ec = pfec;
+                    }
+
                     hvm_inject_page_fault(pfec, addr);
+                }
                 return HVMCOPY_bad_gva_to_gfn;
             }
             gpa |= (paddr_t)gfn << PAGE_SHIFT;
@@ -3279,7 +3291,7 @@ enum hvm_copy_result hvm_copy_to_guest_phys(
 {
     return __hvm_copy(buf, paddr, size,
                       HVMCOPY_to_guest | HVMCOPY_fault | HVMCOPY_phys,
-                      0);
+                      0, NULL);
 }
 
 enum hvm_copy_result hvm_copy_from_guest_phys(
@@ -3287,31 +3299,34 @@ enum hvm_copy_result hvm_copy_from_guest_phys(
 {
     return __hvm_copy(buf, paddr, size,
                       HVMCOPY_from_guest | HVMCOPY_fault | HVMCOPY_phys,
-                      0);
+                      0, NULL);
 }
 
 enum hvm_copy_result hvm_copy_to_guest_virt(
-    unsigned long vaddr, void *buf, int size, uint32_t pfec)
+    unsigned long vaddr, void *buf, int size, uint32_t pfec,
+    pagefault_info_t *pfinfo)
 {
     return __hvm_copy(buf, vaddr, size,
                       HVMCOPY_to_guest | HVMCOPY_fault | HVMCOPY_virt,
-                      PFEC_page_present | PFEC_write_access | pfec);
+                      PFEC_page_present | PFEC_write_access | pfec, pfinfo);
 }
 
 enum hvm_copy_result hvm_copy_from_guest_virt(
-    void *buf, unsigned long vaddr, int size, uint32_t pfec)
+    void *buf, unsigned long vaddr, int size, uint32_t pfec,
+    pagefault_info_t *pfinfo)
 {
     return __hvm_copy(buf, vaddr, size,
                       HVMCOPY_from_guest | HVMCOPY_fault | HVMCOPY_virt,
-                      PFEC_page_present | pfec);
+                      PFEC_page_present | pfec, pfinfo);
 }
 
 enum hvm_copy_result hvm_fetch_from_guest_virt(
-    void *buf, unsigned long vaddr, int size, uint32_t pfec)
+    void *buf, unsigned long vaddr, int size, uint32_t pfec,
+    pagefault_info_t *pfinfo)
 {
     return __hvm_copy(buf, vaddr, size,
                       HVMCOPY_from_guest | HVMCOPY_fault | HVMCOPY_virt,
-                      PFEC_page_present | PFEC_insn_fetch | pfec);
+                      PFEC_page_present | PFEC_insn_fetch | pfec, pfinfo);
 }
 
 enum hvm_copy_result hvm_copy_to_guest_virt_nofault(
@@ -3319,7 +3334,7 @@ enum hvm_copy_result hvm_copy_to_guest_virt_nofault(
 {
     return __hvm_copy(buf, vaddr, size,
                       HVMCOPY_to_guest | HVMCOPY_no_fault | HVMCOPY_virt,
-                      PFEC_page_present | PFEC_write_access | pfec);
+                      PFEC_page_present | PFEC_write_access | pfec, NULL);
 }
 
 enum hvm_copy_result hvm_copy_from_guest_virt_nofault(
@@ -3327,7 +3342,7 @@ enum hvm_copy_result hvm_copy_from_guest_virt_nofault(
 {
     return __hvm_copy(buf, vaddr, size,
                       HVMCOPY_from_guest | HVMCOPY_no_fault | HVMCOPY_virt,
-                      PFEC_page_present | pfec);
+                      PFEC_page_present | pfec, NULL);
 }
 
 enum hvm_copy_result hvm_fetch_from_guest_virt_nofault(
@@ -3335,7 +3350,7 @@ enum hvm_copy_result hvm_fetch_from_guest_virt_nofault(
 {
     return __hvm_copy(buf, vaddr, size,
                       HVMCOPY_from_guest | HVMCOPY_no_fault | HVMCOPY_virt,
-                      PFEC_page_present | PFEC_insn_fetch | pfec);
+                      PFEC_page_present | PFEC_insn_fetch | pfec, NULL);
 }
 
 unsigned long copy_to_user_hvm(void *to, const void *from, unsigned int len)
diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index bcc4a97..7342d12 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -396,6 +396,7 @@ static int decode_vmx_inst(struct cpu_user_regs *regs,
     struct vcpu *v = current;
     union vmx_inst_info info;
     struct segment_register seg;
+    pagefault_info_t pfinfo;
     unsigned long base, index, seg_base, disp, offset;
     int scale, size;
 
@@ -451,7 +452,7 @@ static int decode_vmx_inst(struct cpu_user_regs *regs,
             goto gp_fault;
 
         if ( poperandS != NULL &&
-             hvm_copy_from_guest_virt(poperandS, base, size, 0)
+             hvm_copy_from_guest_virt(poperandS, base, size, 0, &pfinfo)
                   != HVMCOPY_okay )
             return X86EMUL_EXCEPTION;
         decode->mem = base;
@@ -1611,6 +1612,7 @@ int nvmx_handle_vmptrst(struct cpu_user_regs *regs)
     struct vcpu *v = current;
     struct vmx_inst_decoded decode;
     struct nestedvcpu *nvcpu = &vcpu_nestedhvm(v);
+    pagefault_info_t pfinfo;
     unsigned long gpa = 0;
     int rc;
 
@@ -1620,7 +1622,7 @@ int nvmx_handle_vmptrst(struct cpu_user_regs *regs)
 
     gpa = nvcpu->nv_vvmcxaddr;
 
-    rc = hvm_copy_to_guest_virt(decode.mem, &gpa, decode.len, 0);
+    rc = hvm_copy_to_guest_virt(decode.mem, &gpa, decode.len, 0, &pfinfo);
     if ( rc != HVMCOPY_okay )
         return X86EMUL_EXCEPTION;
 
@@ -1679,6 +1681,7 @@ int nvmx_handle_vmread(struct cpu_user_regs *regs)
 {
     struct vcpu *v = current;
     struct vmx_inst_decoded decode;
+    pagefault_info_t pfinfo;
     u64 value = 0;
     int rc;
 
@@ -1690,7 +1693,7 @@ int nvmx_handle_vmread(struct cpu_user_regs *regs)
 
     switch ( decode.type ) {
     case VMX_INST_MEMREG_TYPE_MEMORY:
-        rc = hvm_copy_to_guest_virt(decode.mem, &value, decode.len, 0);
+        rc = hvm_copy_to_guest_virt(decode.mem, &value, decode.len, 0, &pfinfo);
         if ( rc != HVMCOPY_okay )
             return X86EMUL_EXCEPTION;
         break;
diff --git a/xen/arch/x86/mm/shadow/common.c b/xen/arch/x86/mm/shadow/common.c
index e509cc1..e8501ce 100644
--- a/xen/arch/x86/mm/shadow/common.c
+++ b/xen/arch/x86/mm/shadow/common.c
@@ -179,6 +179,7 @@ hvm_read(enum x86_segment seg,
          enum hvm_access_type access_type,
          struct sh_emulate_ctxt *sh_ctxt)
 {
+    pagefault_info_t pfinfo;
     unsigned long addr;
     int rc;
 
@@ -188,9 +189,9 @@ hvm_read(enum x86_segment seg,
         return rc;
 
     if ( access_type == hvm_access_insn_fetch )
-        rc = hvm_fetch_from_guest_virt(p_data, addr, bytes, 0);
+        rc = hvm_fetch_from_guest_virt(p_data, addr, bytes, 0, &pfinfo);
     else
-        rc = hvm_copy_from_guest_virt(p_data, addr, bytes, 0);
+        rc = hvm_copy_from_guest_virt(p_data, addr, bytes, 0, &pfinfo);
 
     switch ( rc )
     {
diff --git a/xen/include/asm-x86/hvm/support.h b/xen/include/asm-x86/hvm/support.h
index 9938450..4aa5a36 100644
--- a/xen/include/asm-x86/hvm/support.h
+++ b/xen/include/asm-x86/hvm/support.h
@@ -83,16 +83,27 @@ enum hvm_copy_result hvm_copy_from_guest_phys(
  *  HVMCOPY_bad_gfn_to_mfn: Some guest physical address did not map to
  *                          ordinary machine memory.
  *  HVMCOPY_bad_gva_to_gfn: Some guest virtual address did not have a valid
- *                          mapping to a guest physical address. In this case
- *                          a page fault exception is automatically queued
- *                          for injection into the current HVM VCPU.
+ *                          mapping to a guest physical address.  The
+ *                          pagefault_info_t structure will be filled in if
+ *                          provided, and a page fault exception is
+ *                          automatically queued for injection into the
+ *                          current HVM VCPU.
  */
+typedef struct pagefault_info
+{
+    unsigned long linear;
+    int ec;
+} pagefault_info_t;
+
 enum hvm_copy_result hvm_copy_to_guest_virt(
-    unsigned long vaddr, void *buf, int size, uint32_t pfec);
+    unsigned long vaddr, void *buf, int size, uint32_t pfec,
+    pagefault_info_t *pfinfo);
 enum hvm_copy_result hvm_copy_from_guest_virt(
-    void *buf, unsigned long vaddr, int size, uint32_t pfec);
+    void *buf, unsigned long vaddr, int size, uint32_t pfec,
+    pagefault_info_t *pfinfo);
 enum hvm_copy_result hvm_fetch_from_guest_virt(
-    void *buf, unsigned long vaddr, int size, uint32_t pfec);
+    void *buf, unsigned long vaddr, int size, uint32_t pfec,
+    pagefault_info_t *pfinfo);
 
 /*
  * As above (copy to/from a guest virtual address), but no fault is generated
-- 
2.1.4



* [PATCH v2 15/19] x86/hvm: Reimplement hvm_copy_*_nofault() in terms of no pagefault_info
  2016-11-28 11:13 [PATCH for-4.9 v2 00/19] XSA-191 followup Andrew Cooper
                   ` (13 preceding siblings ...)
  2016-11-28 11:13 ` [PATCH v2 14/19] x86/hvm: Extend the hvm_copy_*() API with a pagefault_info pointer Andrew Cooper
@ 2016-11-28 11:13 ` Andrew Cooper
  2016-11-28 12:56   ` Paul Durrant
  2016-11-28 11:13 ` [PATCH v2 16/19] x86/hvm: Rename hvm_copy_*_guest_virt() to hvm_copy_*_guest_linear() Andrew Cooper
                   ` (3 subsequent siblings)
  18 siblings, 1 reply; 57+ messages in thread
From: Andrew Cooper @ 2016-11-28 11:13 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Paul Durrant

No functional change.
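
In practice the conversion at each call site is mechanical.  For example, following the copy_to_user_hvm() hunk below (fragment only, surrounding code omitted):

    /* Before: dedicated wrapper which never injected a fault. */
    rc = hvm_copy_to_guest_virt_nofault((unsigned long)to, (void *)from,
                                        len, 0);

    /* After: the normal helper with no pagefault_info buffer supplied,
     * which likewise performs no fault injection. */
    rc = hvm_copy_to_guest_virt((unsigned long)to, (void *)from, len, 0, NULL);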

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tim Deegan <tim@xen.org>
---
CC: Paul Durrant <paul.durrant@citrix.com>
---
 xen/arch/x86/hvm/emulate.c        |  6 ++---
 xen/arch/x86/hvm/hvm.c            | 56 +++++++++------------------------------
 xen/arch/x86/mm/shadow/common.c   |  8 +++---
 xen/include/asm-x86/hvm/support.h | 11 --------
 4 files changed, 19 insertions(+), 62 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 6de94d4..5165bb2 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -1947,9 +1947,9 @@ void hvm_emulate_init_per_insn(
                                         hvm_access_insn_fetch,
                                         hvmemul_ctxt->ctxt.addr_size,
                                         &addr) &&
-             hvm_fetch_from_guest_virt_nofault(hvmemul_ctxt->insn_buf, addr,
-                                               sizeof(hvmemul_ctxt->insn_buf),
-                                               pfec) == HVMCOPY_okay) ?
+             hvm_fetch_from_guest_virt(hvmemul_ctxt->insn_buf, addr,
+                                       sizeof(hvmemul_ctxt->insn_buf),
+                                       pfec, NULL) == HVMCOPY_okay) ?
             sizeof(hvmemul_ctxt->insn_buf) : 0;
     }
     else
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 390f76d..5eae06a 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -3066,8 +3066,6 @@ void hvm_task_switch(
 
 #define HVMCOPY_from_guest (0u<<0)
 #define HVMCOPY_to_guest   (1u<<0)
-#define HVMCOPY_no_fault   (0u<<1)
-#define HVMCOPY_fault      (1u<<1)
 #define HVMCOPY_phys       (0u<<2)
 #define HVMCOPY_virt       (1u<<2)
 static enum hvm_copy_result __hvm_copy(
@@ -3112,13 +3110,10 @@ static enum hvm_copy_result __hvm_copy(
                     return HVMCOPY_gfn_paged_out;
                 if ( pfec & PFEC_page_shared )
                     return HVMCOPY_gfn_shared;
-                if ( flags & HVMCOPY_fault )
+                if ( pfinfo )
                 {
-                    if ( pfinfo )
-                    {
-                        pfinfo->linear = addr;
-                        pfinfo->ec = pfec;
-                    }
+                    pfinfo->linear = addr;
+                    pfinfo->ec = pfec;
 
                     hvm_inject_page_fault(pfec, addr);
                 }
@@ -3290,16 +3285,14 @@ enum hvm_copy_result hvm_copy_to_guest_phys(
     paddr_t paddr, void *buf, int size)
 {
     return __hvm_copy(buf, paddr, size,
-                      HVMCOPY_to_guest | HVMCOPY_fault | HVMCOPY_phys,
-                      0, NULL);
+                      HVMCOPY_to_guest | HVMCOPY_phys, 0, NULL);
 }
 
 enum hvm_copy_result hvm_copy_from_guest_phys(
     void *buf, paddr_t paddr, int size)
 {
     return __hvm_copy(buf, paddr, size,
-                      HVMCOPY_from_guest | HVMCOPY_fault | HVMCOPY_phys,
-                      0, NULL);
+                      HVMCOPY_from_guest | HVMCOPY_phys, 0, NULL);
 }
 
 enum hvm_copy_result hvm_copy_to_guest_virt(
@@ -3307,7 +3300,7 @@ enum hvm_copy_result hvm_copy_to_guest_virt(
     pagefault_info_t *pfinfo)
 {
     return __hvm_copy(buf, vaddr, size,
-                      HVMCOPY_to_guest | HVMCOPY_fault | HVMCOPY_virt,
+                      HVMCOPY_to_guest | HVMCOPY_virt,
                       PFEC_page_present | PFEC_write_access | pfec, pfinfo);
 }
 
@@ -3316,7 +3309,7 @@ enum hvm_copy_result hvm_copy_from_guest_virt(
     pagefault_info_t *pfinfo)
 {
     return __hvm_copy(buf, vaddr, size,
-                      HVMCOPY_from_guest | HVMCOPY_fault | HVMCOPY_virt,
+                      HVMCOPY_from_guest | HVMCOPY_virt,
                       PFEC_page_present | pfec, pfinfo);
 }
 
@@ -3325,34 +3318,10 @@ enum hvm_copy_result hvm_fetch_from_guest_virt(
     pagefault_info_t *pfinfo)
 {
     return __hvm_copy(buf, vaddr, size,
-                      HVMCOPY_from_guest | HVMCOPY_fault | HVMCOPY_virt,
+                      HVMCOPY_from_guest | HVMCOPY_virt,
                       PFEC_page_present | PFEC_insn_fetch | pfec, pfinfo);
 }
 
-enum hvm_copy_result hvm_copy_to_guest_virt_nofault(
-    unsigned long vaddr, void *buf, int size, uint32_t pfec)
-{
-    return __hvm_copy(buf, vaddr, size,
-                      HVMCOPY_to_guest | HVMCOPY_no_fault | HVMCOPY_virt,
-                      PFEC_page_present | PFEC_write_access | pfec, NULL);
-}
-
-enum hvm_copy_result hvm_copy_from_guest_virt_nofault(
-    void *buf, unsigned long vaddr, int size, uint32_t pfec)
-{
-    return __hvm_copy(buf, vaddr, size,
-                      HVMCOPY_from_guest | HVMCOPY_no_fault | HVMCOPY_virt,
-                      PFEC_page_present | pfec, NULL);
-}
-
-enum hvm_copy_result hvm_fetch_from_guest_virt_nofault(
-    void *buf, unsigned long vaddr, int size, uint32_t pfec)
-{
-    return __hvm_copy(buf, vaddr, size,
-                      HVMCOPY_from_guest | HVMCOPY_no_fault | HVMCOPY_virt,
-                      PFEC_page_present | PFEC_insn_fetch | pfec, NULL);
-}
-
 unsigned long copy_to_user_hvm(void *to, const void *from, unsigned int len)
 {
     int rc;
@@ -3364,8 +3333,7 @@ unsigned long copy_to_user_hvm(void *to, const void *from, unsigned int len)
         return 0;
     }
 
-    rc = hvm_copy_to_guest_virt_nofault((unsigned long)to, (void *)from,
-                                        len, 0);
+    rc = hvm_copy_to_guest_virt((unsigned long)to, (void *)from, len, 0, NULL);
     return rc ? len : 0; /* fake a copy_to_user() return code */
 }
 
@@ -3395,7 +3363,7 @@ unsigned long copy_from_user_hvm(void *to, const void *from, unsigned len)
         return 0;
     }
 
-    rc = hvm_copy_from_guest_virt_nofault(to, (unsigned long)from, len, 0);
+    rc = hvm_copy_from_guest_virt(to, (unsigned long)from, len, 0, NULL);
     return rc ? len : 0; /* fake a copy_from_user() return code */
 }
 
@@ -4070,8 +4038,8 @@ void hvm_ud_intercept(struct cpu_user_regs *regs)
                                         (hvm_long_mode_enabled(cur) &&
                                          cs->attr.fields.l) ? 64 :
                                         cs->attr.fields.db ? 32 : 16, &addr) &&
-             (hvm_fetch_from_guest_virt_nofault(sig, addr, sizeof(sig),
-                                                walk) == HVMCOPY_okay) &&
+             (hvm_fetch_from_guest_virt(sig, addr, sizeof(sig),
+                                        walk, NULL) == HVMCOPY_okay) &&
              (memcmp(sig, "\xf\xbxen", sizeof(sig)) == 0) )
         {
             regs->eip += sizeof(sig);
diff --git a/xen/arch/x86/mm/shadow/common.c b/xen/arch/x86/mm/shadow/common.c
index e8501ce..b659324 100644
--- a/xen/arch/x86/mm/shadow/common.c
+++ b/xen/arch/x86/mm/shadow/common.c
@@ -419,8 +419,8 @@ const struct x86_emulate_ops *shadow_init_emulation(
         (!hvm_translate_linear_addr(
             x86_seg_cs, regs->eip, sizeof(sh_ctxt->insn_buf),
             hvm_access_insn_fetch, sh_ctxt, &addr) &&
-         !hvm_fetch_from_guest_virt_nofault(
-             sh_ctxt->insn_buf, addr, sizeof(sh_ctxt->insn_buf), 0))
+         !hvm_fetch_from_guest_virt(
+             sh_ctxt->insn_buf, addr, sizeof(sh_ctxt->insn_buf), 0, NULL))
         ? sizeof(sh_ctxt->insn_buf) : 0;
 
     return &hvm_shadow_emulator_ops;
@@ -447,8 +447,8 @@ void shadow_continue_emulation(struct sh_emulate_ctxt *sh_ctxt,
                 (!hvm_translate_linear_addr(
                     x86_seg_cs, regs->eip, sizeof(sh_ctxt->insn_buf),
                     hvm_access_insn_fetch, sh_ctxt, &addr) &&
-                 !hvm_fetch_from_guest_virt_nofault(
-                     sh_ctxt->insn_buf, addr, sizeof(sh_ctxt->insn_buf), 0))
+                 !hvm_fetch_from_guest_virt(
+                     sh_ctxt->insn_buf, addr, sizeof(sh_ctxt->insn_buf), 0, NULL))
                 ? sizeof(sh_ctxt->insn_buf) : 0;
             sh_ctxt->insn_buf_eip = regs->eip;
         }
diff --git a/xen/include/asm-x86/hvm/support.h b/xen/include/asm-x86/hvm/support.h
index 4aa5a36..114aa04 100644
--- a/xen/include/asm-x86/hvm/support.h
+++ b/xen/include/asm-x86/hvm/support.h
@@ -105,17 +105,6 @@ enum hvm_copy_result hvm_fetch_from_guest_virt(
     void *buf, unsigned long vaddr, int size, uint32_t pfec,
     pagefault_info_t *pfinfo);
 
-/*
- * As above (copy to/from a guest virtual address), but no fault is generated
- * when HVMCOPY_bad_gva_to_gfn is returned.
- */
-enum hvm_copy_result hvm_copy_to_guest_virt_nofault(
-    unsigned long vaddr, void *buf, int size, uint32_t pfec);
-enum hvm_copy_result hvm_copy_from_guest_virt_nofault(
-    void *buf, unsigned long vaddr, int size, uint32_t pfec);
-enum hvm_copy_result hvm_fetch_from_guest_virt_nofault(
-    void *buf, unsigned long vaddr, int size, uint32_t pfec);
-
 #define HVM_HCALL_completed  0 /* hypercall completed - no further action */
 #define HVM_HCALL_preempted  1 /* hypercall preempted - re-execute VMCALL */
 #define HVM_HCALL_invalidate 2 /* invalidate ioemu-dm memory cache        */
-- 
2.1.4



* [PATCH v2 16/19] x86/hvm: Rename hvm_copy_*_guest_virt() to hvm_copy_*_guest_linear()
  2016-11-28 11:13 [PATCH for-4.9 v2 00/19] XSA-191 followup Andrew Cooper
                   ` (14 preceding siblings ...)
  2016-11-28 11:13 ` [PATCH v2 15/19] x86/hvm: Reimplement hvm_copy_*_nofault() in terms of no pagefault_info Andrew Cooper
@ 2016-11-28 11:13 ` Andrew Cooper
  2016-11-28 11:59   ` Paul Durrant
  2016-11-28 11:13 ` [PATCH v2 17/19] x86/hvm: Avoid __hvm_copy() raising #PF behind the emulators back Andrew Cooper
                   ` (2 subsequent siblings)
  18 siblings, 1 reply; 57+ messages in thread
From: Andrew Cooper @ 2016-11-28 11:13 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Paul Durrant

The functions use linear addresses, not virtual addresses, as no segmentation
is used.  (Lots of other code in Xen makes this mistake.)
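
For readers less familiar with the x86 terminology behind the rename, a trivial standalone illustration (not Xen code, all values hypothetical): a virtual (segment-relative) address only becomes a linear address once the segment base has been added, and it is the linear address which paging then translates.

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        /* Hypothetical segment base and offset, purely for illustration. */
        uint64_t seg_base = 0x0000000000400000ULL;
        uint64_t offset   = 0x1234;              /* virtual/effective address */
        uint64_t linear   = seg_base + offset;   /* what these helpers consume */

        printf("virtual %#llx + base %#llx -> linear %#llx\n",
               (unsigned long long)offset, (unsigned long long)seg_base,
               (unsigned long long)linear);

        return 0;
    }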

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
CC: Paul Durrant <paul.durrant@citrix.com>
---
 xen/arch/x86/hvm/emulate.c        | 12 ++++----
 xen/arch/x86/hvm/hvm.c            | 60 +++++++++++++++++++--------------------
 xen/arch/x86/hvm/vmx/vvmx.c       |  6 ++--
 xen/arch/x86/mm/shadow/common.c   |  8 +++---
 xen/include/asm-x86/hvm/support.h | 14 ++++-----
 5 files changed, 50 insertions(+), 50 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 5165bb2..efd6d32 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -791,8 +791,8 @@ static int __hvmemul_read(
         pfec |= PFEC_user_mode;
 
     rc = ((access_type == hvm_access_insn_fetch) ?
-          hvm_fetch_from_guest_virt(p_data, addr, bytes, pfec, &pfinfo) :
-          hvm_copy_from_guest_virt(p_data, addr, bytes, pfec, &pfinfo));
+          hvm_fetch_from_guest_linear(p_data, addr, bytes, pfec, &pfinfo) :
+          hvm_copy_from_guest_linear(p_data, addr, bytes, pfec, &pfinfo));
 
     switch ( rc )
     {
@@ -898,7 +898,7 @@ static int hvmemul_write(
          (hvmemul_ctxt->seg_reg[x86_seg_ss].attr.fields.dpl == 3) )
         pfec |= PFEC_user_mode;
 
-    rc = hvm_copy_to_guest_virt(addr, p_data, bytes, pfec, &pfinfo);
+    rc = hvm_copy_to_guest_linear(addr, p_data, bytes, pfec, &pfinfo);
 
     switch ( rc )
     {
@@ -1947,9 +1947,9 @@ void hvm_emulate_init_per_insn(
                                         hvm_access_insn_fetch,
                                         hvmemul_ctxt->ctxt.addr_size,
                                         &addr) &&
-             hvm_fetch_from_guest_virt(hvmemul_ctxt->insn_buf, addr,
-                                       sizeof(hvmemul_ctxt->insn_buf),
-                                       pfec, NULL) == HVMCOPY_okay) ?
+             hvm_fetch_from_guest_linear(hvmemul_ctxt->insn_buf, addr,
+                                         sizeof(hvmemul_ctxt->insn_buf),
+                                         pfec, NULL) == HVMCOPY_okay) ?
             sizeof(hvmemul_ctxt->insn_buf) : 0;
     }
     else
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 5eae06a..37eaee2 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -2925,7 +2925,7 @@ void hvm_task_switch(
         goto out;
     }
 
-    rc = hvm_copy_from_guest_virt(
+    rc = hvm_copy_from_guest_linear(
         &tss, prev_tr.base, sizeof(tss), PFEC_page_present, &pfinfo);
     if ( rc != HVMCOPY_okay )
         goto out;
@@ -2960,15 +2960,15 @@ void hvm_task_switch(
     hvm_get_segment_register(v, x86_seg_ldtr, &segr);
     tss.ldt = segr.sel;
 
-    rc = hvm_copy_to_guest_virt(prev_tr.base + offsetof(typeof(tss), eip),
-                                &tss.eip,
-                                offsetof(typeof(tss), trace) -
-                                offsetof(typeof(tss), eip),
-                                PFEC_page_present, &pfinfo);
+    rc = hvm_copy_to_guest_linear(prev_tr.base + offsetof(typeof(tss), eip),
+                                  &tss.eip,
+                                  offsetof(typeof(tss), trace) -
+                                  offsetof(typeof(tss), eip),
+                                  PFEC_page_present, &pfinfo);
     if ( rc != HVMCOPY_okay )
         goto out;
 
-    rc = hvm_copy_from_guest_virt(
+    rc = hvm_copy_from_guest_linear(
         &tss, tr.base, sizeof(tss), PFEC_page_present, &pfinfo);
     /*
      * Note: The HVMCOPY_gfn_shared case could be optimised, if the callee
@@ -3008,9 +3008,9 @@ void hvm_task_switch(
         regs->eflags |= X86_EFLAGS_NT;
         tss.back_link = prev_tr.sel;
 
-        rc = hvm_copy_to_guest_virt(tr.base + offsetof(typeof(tss), back_link),
-                                    &tss.back_link, sizeof(tss.back_link), 0,
-                                    &pfinfo);
+        rc = hvm_copy_to_guest_linear(tr.base + offsetof(typeof(tss), back_link),
+                                      &tss.back_link, sizeof(tss.back_link), 0,
+                                      &pfinfo);
         if ( rc == HVMCOPY_bad_gva_to_gfn )
             exn_raised = 1;
         else if ( rc != HVMCOPY_okay )
@@ -3047,8 +3047,8 @@ void hvm_task_switch(
                                         16 << segr.attr.fields.db,
                                         &linear_addr) )
         {
-            rc = hvm_copy_to_guest_virt(linear_addr, &errcode, opsz, 0,
-                                        &pfinfo);
+            rc = hvm_copy_to_guest_linear(linear_addr, &errcode, opsz, 0,
+                                          &pfinfo);
             if ( rc == HVMCOPY_bad_gva_to_gfn )
                 exn_raised = 1;
             else if ( rc != HVMCOPY_okay )
@@ -3067,7 +3067,7 @@ void hvm_task_switch(
 #define HVMCOPY_from_guest (0u<<0)
 #define HVMCOPY_to_guest   (1u<<0)
 #define HVMCOPY_phys       (0u<<2)
-#define HVMCOPY_virt       (1u<<2)
+#define HVMCOPY_linear     (1u<<2)
 static enum hvm_copy_result __hvm_copy(
     void *buf, paddr_t addr, int size, unsigned int flags, uint32_t pfec,
     pagefault_info_t *pfinfo)
@@ -3101,7 +3101,7 @@ static enum hvm_copy_result __hvm_copy(
 
         count = min_t(int, PAGE_SIZE - gpa, todo);
 
-        if ( flags & HVMCOPY_virt )
+        if ( flags & HVMCOPY_linear )
         {
             gfn = paging_gva_to_gfn(curr, addr, &pfec);
             if ( gfn == gfn_x(INVALID_GFN) )
@@ -3295,30 +3295,30 @@ enum hvm_copy_result hvm_copy_from_guest_phys(
                       HVMCOPY_from_guest | HVMCOPY_phys, 0, NULL);
 }
 
-enum hvm_copy_result hvm_copy_to_guest_virt(
-    unsigned long vaddr, void *buf, int size, uint32_t pfec,
+enum hvm_copy_result hvm_copy_to_guest_linear(
+    unsigned long addr, void *buf, int size, uint32_t pfec,
     pagefault_info_t *pfinfo)
 {
-    return __hvm_copy(buf, vaddr, size,
-                      HVMCOPY_to_guest | HVMCOPY_virt,
+    return __hvm_copy(buf, addr, size,
+                      HVMCOPY_to_guest | HVMCOPY_linear,
                       PFEC_page_present | PFEC_write_access | pfec, pfinfo);
 }
 
-enum hvm_copy_result hvm_copy_from_guest_virt(
-    void *buf, unsigned long vaddr, int size, uint32_t pfec,
+enum hvm_copy_result hvm_copy_from_guest_linear(
+    void *buf, unsigned long addr, int size, uint32_t pfec,
     pagefault_info_t *pfinfo)
 {
-    return __hvm_copy(buf, vaddr, size,
-                      HVMCOPY_from_guest | HVMCOPY_virt,
+    return __hvm_copy(buf, addr, size,
+                      HVMCOPY_from_guest | HVMCOPY_linear,
                       PFEC_page_present | pfec, pfinfo);
 }
 
-enum hvm_copy_result hvm_fetch_from_guest_virt(
-    void *buf, unsigned long vaddr, int size, uint32_t pfec,
+enum hvm_copy_result hvm_fetch_from_guest_linear(
+    void *buf, unsigned long addr, int size, uint32_t pfec,
     pagefault_info_t *pfinfo)
 {
-    return __hvm_copy(buf, vaddr, size,
-                      HVMCOPY_from_guest | HVMCOPY_virt,
+    return __hvm_copy(buf, addr, size,
+                      HVMCOPY_from_guest | HVMCOPY_linear,
                       PFEC_page_present | PFEC_insn_fetch | pfec, pfinfo);
 }
 
@@ -3333,7 +3333,7 @@ unsigned long copy_to_user_hvm(void *to, const void *from, unsigned int len)
         return 0;
     }
 
-    rc = hvm_copy_to_guest_virt((unsigned long)to, (void *)from, len, 0, NULL);
+    rc = hvm_copy_to_guest_linear((unsigned long)to, (void *)from, len, 0, NULL);
     return rc ? len : 0; /* fake a copy_to_user() return code */
 }
 
@@ -3363,7 +3363,7 @@ unsigned long copy_from_user_hvm(void *to, const void *from, unsigned len)
         return 0;
     }
 
-    rc = hvm_copy_from_guest_virt(to, (unsigned long)from, len, 0, NULL);
+    rc = hvm_copy_from_guest_linear(to, (unsigned long)from, len, 0, NULL);
     return rc ? len : 0; /* fake a copy_from_user() return code */
 }
 
@@ -4038,8 +4038,8 @@ void hvm_ud_intercept(struct cpu_user_regs *regs)
                                         (hvm_long_mode_enabled(cur) &&
                                          cs->attr.fields.l) ? 64 :
                                         cs->attr.fields.db ? 32 : 16, &addr) &&
-             (hvm_fetch_from_guest_virt(sig, addr, sizeof(sig),
-                                        walk, NULL) == HVMCOPY_okay) &&
+             (hvm_fetch_from_guest_linear(sig, addr, sizeof(sig),
+                                          walk, NULL) == HVMCOPY_okay) &&
              (memcmp(sig, "\xf\xbxen", sizeof(sig)) == 0) )
         {
             regs->eip += sizeof(sig);
diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index 7342d12..fd7ea0a 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -452,7 +452,7 @@ static int decode_vmx_inst(struct cpu_user_regs *regs,
             goto gp_fault;
 
         if ( poperandS != NULL &&
-             hvm_copy_from_guest_virt(poperandS, base, size, 0, &pfinfo)
+             hvm_copy_from_guest_linear(poperandS, base, size, 0, &pfinfo)
                   != HVMCOPY_okay )
             return X86EMUL_EXCEPTION;
         decode->mem = base;
@@ -1622,7 +1622,7 @@ int nvmx_handle_vmptrst(struct cpu_user_regs *regs)
 
     gpa = nvcpu->nv_vvmcxaddr;
 
-    rc = hvm_copy_to_guest_virt(decode.mem, &gpa, decode.len, 0, &pfinfo);
+    rc = hvm_copy_to_guest_linear(decode.mem, &gpa, decode.len, 0, &pfinfo);
     if ( rc != HVMCOPY_okay )
         return X86EMUL_EXCEPTION;
 
@@ -1693,7 +1693,7 @@ int nvmx_handle_vmread(struct cpu_user_regs *regs)
 
     switch ( decode.type ) {
     case VMX_INST_MEMREG_TYPE_MEMORY:
-        rc = hvm_copy_to_guest_virt(decode.mem, &value, decode.len, 0, &pfinfo);
+        rc = hvm_copy_to_guest_linear(decode.mem, &value, decode.len, 0, &pfinfo);
         if ( rc != HVMCOPY_okay )
             return X86EMUL_EXCEPTION;
         break;
diff --git a/xen/arch/x86/mm/shadow/common.c b/xen/arch/x86/mm/shadow/common.c
index b659324..0760e76 100644
--- a/xen/arch/x86/mm/shadow/common.c
+++ b/xen/arch/x86/mm/shadow/common.c
@@ -189,9 +189,9 @@ hvm_read(enum x86_segment seg,
         return rc;
 
     if ( access_type == hvm_access_insn_fetch )
-        rc = hvm_fetch_from_guest_virt(p_data, addr, bytes, 0, &pfinfo);
+        rc = hvm_fetch_from_guest_linear(p_data, addr, bytes, 0, &pfinfo);
     else
-        rc = hvm_copy_from_guest_virt(p_data, addr, bytes, 0, &pfinfo);
+        rc = hvm_copy_from_guest_linear(p_data, addr, bytes, 0, &pfinfo);
 
     switch ( rc )
     {
@@ -419,7 +419,7 @@ const struct x86_emulate_ops *shadow_init_emulation(
         (!hvm_translate_linear_addr(
             x86_seg_cs, regs->eip, sizeof(sh_ctxt->insn_buf),
             hvm_access_insn_fetch, sh_ctxt, &addr) &&
-         !hvm_fetch_from_guest_virt(
+         !hvm_fetch_from_guest_linear(
              sh_ctxt->insn_buf, addr, sizeof(sh_ctxt->insn_buf), 0, NULL))
         ? sizeof(sh_ctxt->insn_buf) : 0;
 
@@ -447,7 +447,7 @@ void shadow_continue_emulation(struct sh_emulate_ctxt *sh_ctxt,
                 (!hvm_translate_linear_addr(
                     x86_seg_cs, regs->eip, sizeof(sh_ctxt->insn_buf),
                     hvm_access_insn_fetch, sh_ctxt, &addr) &&
-                 !hvm_fetch_from_guest_virt(
+                 !hvm_fetch_from_guest_linear(
                      sh_ctxt->insn_buf, addr, sizeof(sh_ctxt->insn_buf), 0, NULL))
                 ? sizeof(sh_ctxt->insn_buf) : 0;
             sh_ctxt->insn_buf_eip = regs->eip;
diff --git a/xen/include/asm-x86/hvm/support.h b/xen/include/asm-x86/hvm/support.h
index 114aa04..78349f8 100644
--- a/xen/include/asm-x86/hvm/support.h
+++ b/xen/include/asm-x86/hvm/support.h
@@ -73,7 +73,7 @@ enum hvm_copy_result hvm_copy_from_guest_phys(
     void *buf, paddr_t paddr, int size);
 
 /*
- * Copy to/from a guest virtual address. @pfec should include PFEC_user_mode
+ * Copy to/from a guest linear address. @pfec should include PFEC_user_mode
  * if emulating a user-mode access (CPL=3). All other flags in @pfec are
  * managed by the called function: it is therefore optional for the caller
  * to set them.
@@ -95,14 +95,14 @@ typedef struct pagefault_info
     int ec;
 } pagefault_info_t;
 
-enum hvm_copy_result hvm_copy_to_guest_virt(
-    unsigned long vaddr, void *buf, int size, uint32_t pfec,
+enum hvm_copy_result hvm_copy_to_guest_linear(
+    unsigned long addr, void *buf, int size, uint32_t pfec,
     pagefault_info_t *pfinfo);
-enum hvm_copy_result hvm_copy_from_guest_virt(
-    void *buf, unsigned long vaddr, int size, uint32_t pfec,
+enum hvm_copy_result hvm_copy_from_guest_linear(
+    void *buf, unsigned long addr, int size, uint32_t pfec,
     pagefault_info_t *pfinfo);
-enum hvm_copy_result hvm_fetch_from_guest_virt(
-    void *buf, unsigned long vaddr, int size, uint32_t pfec,
+enum hvm_copy_result hvm_fetch_from_guest_linear(
+    void *buf, unsigned long addr, int size, uint32_t pfec,
     pagefault_info_t *pfinfo);
 
 #define HVM_HCALL_completed  0 /* hypercall completed - no further action */
-- 
2.1.4



* [PATCH v2 17/19] x86/hvm: Avoid __hvm_copy() raising #PF behind the emulators back
  2016-11-28 11:13 [PATCH for-4.9 v2 00/19] XSA-191 followup Andrew Cooper
                   ` (15 preceding siblings ...)
  2016-11-28 11:13 ` [PATCH v2 16/19] x86/hvm: Rename hvm_copy_*_guest_virt() to hvm_copy_*_guest_linear() Andrew Cooper
@ 2016-11-28 11:13 ` Andrew Cooper
  2016-11-28 11:56   ` Paul Durrant
                     ` (3 more replies)
  2016-11-28 11:13 ` [PATCH v2 18/19] x86/hvm: Prepare to allow use of system segments for memory references Andrew Cooper
  2016-11-28 11:13 ` [PATCH v2 19/19] x86/hvm: Use system-segment relative memory accesses Andrew Cooper
  18 siblings, 4 replies; 57+ messages in thread
From: Andrew Cooper @ 2016-11-28 11:13 UTC (permalink / raw)
  To: Xen-devel
  Cc: Kevin Tian, Jan Beulich, Andrew Cooper, Tim Deegan, Paul Durrant,
	Jun Nakajima

Drop the call to hvm_inject_page_fault() in __hvm_copy(), and require callers
to inject the pagefault themselves.

No functional change.
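
The resulting call-site pattern can be seen in the hunks below; schematically, a non-emulator caller such as hvm_task_switch() now does (fragment only):

    rc = hvm_copy_from_guest_linear(
        &tss, prev_tr.base, sizeof(tss), PFEC_page_present, &pfinfo);
    if ( rc == HVMCOPY_bad_gva_to_gfn )
        hvm_inject_page_fault(pfinfo.ec, pfinfo.linear);
    if ( rc != HVMCOPY_okay )
        goto out;

while emulator-side callers latch the fault into the emulation context via x86_emul_pagefault() and return X86EMUL_EXCEPTION, leaving injection to the emulator's caller.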

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Paul Durrant <paul.durrant@citrix.com>
CC: Tim Deegan <tim@xen.org>
CC: Jun Nakajima <jun.nakajima@intel.com>
CC: Kevin Tian <kevin.tian@intel.com>
---
 xen/arch/x86/hvm/emulate.c        |  2 ++
 xen/arch/x86/hvm/hvm.c            | 11 +++++++++--
 xen/arch/x86/hvm/vmx/vvmx.c       | 20 +++++++++++++++-----
 xen/arch/x86/mm/shadow/common.c   |  1 +
 xen/include/asm-x86/hvm/support.h |  4 +---
 5 files changed, 28 insertions(+), 10 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index efd6d32..f07c026 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -799,6 +799,7 @@ static int __hvmemul_read(
     case HVMCOPY_okay:
         break;
     case HVMCOPY_bad_gva_to_gfn:
+        x86_emul_pagefault(pfinfo.ec, pfinfo.linear, &hvmemul_ctxt->ctxt);
         return X86EMUL_EXCEPTION;
     case HVMCOPY_bad_gfn_to_mfn:
         if ( access_type == hvm_access_insn_fetch )
@@ -905,6 +906,7 @@ static int hvmemul_write(
     case HVMCOPY_okay:
         break;
     case HVMCOPY_bad_gva_to_gfn:
+        x86_emul_pagefault(pfinfo.ec, pfinfo.linear, &hvmemul_ctxt->ctxt);
         return X86EMUL_EXCEPTION;
     case HVMCOPY_bad_gfn_to_mfn:
         return hvmemul_linear_mmio_write(addr, bytes, p_data, pfec, hvmemul_ctxt, 0);
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 37eaee2..ce77520 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -2927,6 +2927,8 @@ void hvm_task_switch(
 
     rc = hvm_copy_from_guest_linear(
         &tss, prev_tr.base, sizeof(tss), PFEC_page_present, &pfinfo);
+    if ( rc == HVMCOPY_bad_gva_to_gfn )
+        hvm_inject_page_fault(pfinfo.ec, pfinfo.linear);
     if ( rc != HVMCOPY_okay )
         goto out;
 
@@ -2965,11 +2967,15 @@ void hvm_task_switch(
                                   offsetof(typeof(tss), trace) -
                                   offsetof(typeof(tss), eip),
                                   PFEC_page_present, &pfinfo);
+    if ( rc == HVMCOPY_bad_gva_to_gfn )
+        hvm_inject_page_fault(pfinfo.ec, pfinfo.linear);
     if ( rc != HVMCOPY_okay )
         goto out;
 
     rc = hvm_copy_from_guest_linear(
         &tss, tr.base, sizeof(tss), PFEC_page_present, &pfinfo);
+    if ( rc == HVMCOPY_bad_gva_to_gfn )
+        hvm_inject_page_fault(pfinfo.ec, pfinfo.linear);
     /*
      * Note: The HVMCOPY_gfn_shared case could be optimised, if the callee
      * functions knew we want RO access.
@@ -3012,7 +3018,10 @@ void hvm_task_switch(
                                       &tss.back_link, sizeof(tss.back_link), 0,
                                       &pfinfo);
         if ( rc == HVMCOPY_bad_gva_to_gfn )
+        {
+            hvm_inject_page_fault(pfinfo.ec, pfinfo.linear);
             exn_raised = 1;
+        }
         else if ( rc != HVMCOPY_okay )
             goto out;
     }
@@ -3114,8 +3123,6 @@ static enum hvm_copy_result __hvm_copy(
                 {
                     pfinfo->linear = addr;
                     pfinfo->ec = pfec;
-
-                    hvm_inject_page_fault(pfec, addr);
                 }
                 return HVMCOPY_bad_gva_to_gfn;
             }
diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index fd7ea0a..e6e9ebd 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -396,7 +396,6 @@ static int decode_vmx_inst(struct cpu_user_regs *regs,
     struct vcpu *v = current;
     union vmx_inst_info info;
     struct segment_register seg;
-    pagefault_info_t pfinfo;
     unsigned long base, index, seg_base, disp, offset;
     int scale, size;
 
@@ -451,10 +450,17 @@ static int decode_vmx_inst(struct cpu_user_regs *regs,
               offset + size - 1 > seg.limit) )
             goto gp_fault;
 
-        if ( poperandS != NULL &&
-             hvm_copy_from_guest_linear(poperandS, base, size, 0, &pfinfo)
-                  != HVMCOPY_okay )
-            return X86EMUL_EXCEPTION;
+        if ( poperandS != NULL )
+        {
+            pagefault_info_t pfinfo;
+            int rc = hvm_copy_from_guest_linear(poperandS, base, size,
+                                                0, &pfinfo);
+
+            if ( rc == HVMCOPY_bad_gva_to_gfn )
+                hvm_inject_page_fault(pfinfo.ec, pfinfo.linear);
+            if ( rc != HVMCOPY_okay )
+                return X86EMUL_EXCEPTION;
+        }
         decode->mem = base;
         decode->len = size;
     }
@@ -1623,6 +1629,8 @@ int nvmx_handle_vmptrst(struct cpu_user_regs *regs)
     gpa = nvcpu->nv_vvmcxaddr;
 
     rc = hvm_copy_to_guest_linear(decode.mem, &gpa, decode.len, 0, &pfinfo);
+    if ( rc == HVMCOPY_bad_gva_to_gfn )
+        hvm_inject_page_fault(pfinfo.ec, pfinfo.linear);
     if ( rc != HVMCOPY_okay )
         return X86EMUL_EXCEPTION;
 
@@ -1694,6 +1702,8 @@ int nvmx_handle_vmread(struct cpu_user_regs *regs)
     switch ( decode.type ) {
     case VMX_INST_MEMREG_TYPE_MEMORY:
         rc = hvm_copy_to_guest_linear(decode.mem, &value, decode.len, 0, &pfinfo);
+        if ( rc == HVMCOPY_bad_gva_to_gfn )
+            hvm_inject_page_fault(pfinfo.ec, pfinfo.linear);
         if ( rc != HVMCOPY_okay )
             return X86EMUL_EXCEPTION;
         break;
diff --git a/xen/arch/x86/mm/shadow/common.c b/xen/arch/x86/mm/shadow/common.c
index 0760e76..fbe49e1 100644
--- a/xen/arch/x86/mm/shadow/common.c
+++ b/xen/arch/x86/mm/shadow/common.c
@@ -198,6 +198,7 @@ hvm_read(enum x86_segment seg,
     case HVMCOPY_okay:
         return X86EMUL_OKAY;
     case HVMCOPY_bad_gva_to_gfn:
+        x86_emul_pagefault(pfinfo.ec, pfinfo.linear, &sh_ctxt->ctxt);
         return X86EMUL_EXCEPTION;
     case HVMCOPY_bad_gfn_to_mfn:
     case HVMCOPY_unhandleable:
diff --git a/xen/include/asm-x86/hvm/support.h b/xen/include/asm-x86/hvm/support.h
index 78349f8..3d767d7 100644
--- a/xen/include/asm-x86/hvm/support.h
+++ b/xen/include/asm-x86/hvm/support.h
@@ -85,9 +85,7 @@ enum hvm_copy_result hvm_copy_from_guest_phys(
  *  HVMCOPY_bad_gva_to_gfn: Some guest virtual address did not have a valid
  *                          mapping to a guest physical address.  The
  *                          pagefault_info_t structure will be filled in if
- *                          provided, and a page fault exception is
- *                          automatically queued for injection into the
- *                          current HVM VCPU.
+ *                          provided.
  */
 typedef struct pagefault_info
 {
-- 
2.1.4



* [PATCH v2 18/19] x86/hvm: Prepare to allow use of system segments for memory references
  2016-11-28 11:13 [PATCH for-4.9 v2 00/19] XSA-191 followup Andrew Cooper
                   ` (16 preceding siblings ...)
  2016-11-28 11:13 ` [PATCH v2 17/19] x86/hvm: Avoid __hvm_copy() raising #PF behind the emulators back Andrew Cooper
@ 2016-11-28 11:13 ` Andrew Cooper
  2016-11-28 11:13 ` [PATCH v2 19/19] x86/hvm: Use system-segment relative memory accesses Andrew Cooper
  18 siblings, 0 replies; 57+ messages in thread
From: Andrew Cooper @ 2016-11-28 11:13 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper

All system segments (GDT/IDT/LDT and TR) describe a linear address and limit,
and act similarly to user segments.  However, all current uses of these tables
in the emulator open-code the address calculations and limit checks.  In
particular, no care is taken for accesses which wrap around the 4GB or
non-canonical boundaries.

Alter hvm_virtual_to_linear_addr() to cope with performing segmentation checks
on system segments.  This involves restricting access checks in the 32bit case
to user segments only, and adding presence/limit checks in the 64bit case.

When suffering a segmentation fault for a system segment, return
X86EMUL_EXCEPTION but leave the fault injection to the caller.  The fault type
depends on the higher level action being performed.
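
As a worked illustration of the 64bit check being added, here is a simplified standalone model (not the Xen code itself): a system segment must be present and the last byte of the access must lie within its limit, otherwise the access fails and the caller chooses between #GP and #TS.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    struct seg
    {
        uint64_t base;
        uint32_t limit;     /* byte-granular limit for this illustration */
        bool present;
    };

    static bool sys_seg_access_ok(const struct seg *reg,
                                  uint64_t offset, unsigned int bytes)
    {
        uint64_t last_byte = offset + bytes - !!bytes;

        return reg->present && last_byte <= reg->limit;
    }

    int main(void)
    {
        /* Hypothetical GDTR covering 16 descriptors (limit 0x7f). */
        struct seg gdtr = { .base = 0xfffff000, .limit = 0x7f, .present = true };

        printf("8-byte read at 0x78: %s\n",
               sys_seg_access_ok(&gdtr, 0x78, 8) ? "ok" : "fault");
        printf("8-byte read at 0x7c: %s\n",
               sys_seg_access_ok(&gdtr, 0x7c, 8) ? "ok" : "fault");

        return 0;
    }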

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <JBeulich@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
---
 xen/arch/x86/hvm/emulate.c             | 14 ++++++++----
 xen/arch/x86/hvm/hvm.c                 | 40 ++++++++++++++++++++++------------
 xen/arch/x86/mm/shadow/common.c        | 12 +++++++---
 xen/arch/x86/x86_emulate/x86_emulate.h | 26 ++++++++++++++--------
 4 files changed, 62 insertions(+), 30 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index f07c026..d3fd492 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -567,10 +567,16 @@ static int hvmemul_virtual_to_linear(
     if ( *reps != 1 )
         return X86EMUL_UNHANDLEABLE;
 
-    /* This is a singleton operation: fail it with an exception. */
-    x86_emul_hw_exception((seg == x86_seg_ss)
-                          ? TRAP_stack_error
-                          : TRAP_gp_fault, 0, &hvmemul_ctxt->ctxt);
+    /*
+     * Leave exception injection to the caller for non-user segments: We
+     * neither know the exact error code to be used, nor can we easily
+     * determine the kind of exception (#GP or #TS) in that case.
+     */
+    if ( is_x86_user_segment(seg) )
+        x86_emul_hw_exception((seg == x86_seg_ss)
+                              ? TRAP_stack_error
+                              : TRAP_gp_fault, 0, &hvmemul_ctxt->ctxt);
+
     return X86EMUL_EXCEPTION;
 }
 
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index ce77520..5abdc3c 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -2497,24 +2497,28 @@ bool_t hvm_virtual_to_linear_addr(
         if ( !reg->attr.fields.p )
             goto out;
 
-        switch ( access_type )
+        /* Read/write restrictions only exist for user segments. */
+        if ( reg->attr.fields.s )
         {
-        case hvm_access_read:
-            if ( (reg->attr.fields.type & 0xa) == 0x8 )
-                goto out; /* execute-only code segment */
-            break;
-        case hvm_access_write:
-            if ( (reg->attr.fields.type & 0xa) != 0x2 )
-                goto out; /* not a writable data segment */
-            break;
-        default:
-            break;
+            switch ( access_type )
+            {
+            case hvm_access_read:
+                if ( (reg->attr.fields.type & 0xa) == 0x8 )
+                    goto out; /* execute-only code segment */
+                break;
+            case hvm_access_write:
+                if ( (reg->attr.fields.type & 0xa) != 0x2 )
+                    goto out; /* not a writable data segment */
+                break;
+            default:
+                break;
+            }
         }
 
         last_byte = (uint32_t)offset + bytes - !!bytes;
 
         /* Is this a grows-down data segment? Special limit check if so. */
-        if ( (reg->attr.fields.type & 0xc) == 0x4 )
+        if ( reg->attr.fields.s && (reg->attr.fields.type & 0xc) == 0x4 )
         {
             /* Is upper limit 0xFFFF or 0xFFFFFFFF? */
             if ( !reg->attr.fields.db )
@@ -2530,10 +2534,18 @@ bool_t hvm_virtual_to_linear_addr(
     else
     {
         /*
-         * LONG MODE: FS and GS add segment base. Addresses must be canonical.
+         * User segments are always treated as present.  System segment may
+         * not be, and also incur limit checks.
          */
+        if ( is_x86_system_segment(seg) &&
+             (!reg->attr.fields.p || (offset + bytes - !!bytes) > reg->limit) )
+            goto out;
 
-        if ( (seg == x86_seg_fs) || (seg == x86_seg_gs) )
+        /*
+         * LONG MODE: FS, GS and system segments: add segment base. All
+         * addresses must be canonical.
+         */
+        if ( seg >= x86_seg_fs )
             addr += reg->base;
 
         last_byte = addr + bytes - !!bytes;
diff --git a/xen/arch/x86/mm/shadow/common.c b/xen/arch/x86/mm/shadow/common.c
index fbe49e1..6c146f8 100644
--- a/xen/arch/x86/mm/shadow/common.c
+++ b/xen/arch/x86/mm/shadow/common.c
@@ -162,9 +162,15 @@ static int hvm_translate_linear_addr(
 
     if ( !okay )
     {
-        x86_emul_hw_exception(
-            (seg == x86_seg_ss) ? TRAP_stack_error : TRAP_gp_fault,
-            0, &sh_ctxt->ctxt);
+        /*
+         * Leave exception injection to the caller for non-user segments: We
+         * neither know the exact error code to be used, nor can we easily
+         * determine the kind of exception (#GP or #TS) in that case.
+         */
+        if ( is_x86_user_segment(seg) )
+            x86_emul_hw_exception(
+                (seg == x86_seg_ss) ? TRAP_stack_error : TRAP_gp_fault,
+                0, &sh_ctxt->ctxt);
         return X86EMUL_EXCEPTION;
     }
 
diff --git a/xen/arch/x86/x86_emulate/x86_emulate.h b/xen/arch/x86/x86_emulate/x86_emulate.h
index 4679711..5af1958 100644
--- a/xen/arch/x86/x86_emulate/x86_emulate.h
+++ b/xen/arch/x86/x86_emulate/x86_emulate.h
@@ -27,7 +27,11 @@
 
 struct x86_emulate_ctxt;
 
-/* Comprehensive enumeration of x86 segment registers. */
+/*
+ * Comprehensive enumeration of x86 segment registers.  Various bits of code
+ * rely on this order (general purpose before system, tr at the beginning of
+ * system).
+ */
 enum x86_segment {
     /* General purpose.  Matches the SReg3 encoding in opcode/ModRM bytes. */
     x86_seg_es,
@@ -36,21 +40,25 @@ enum x86_segment {
     x86_seg_ds,
     x86_seg_fs,
     x86_seg_gs,
-    /* System. */
+    /* System: Valid to use for implicit table references. */
     x86_seg_tr,
     x86_seg_ldtr,
     x86_seg_gdtr,
     x86_seg_idtr,
-    /*
-     * Dummy: used to emulate direct processor accesses to management
-     * structures (TSS, GDT, LDT, IDT, etc.) which use linear addressing
-     * (no segment component) and bypass usual segment- and page-level
-     * protection checks.
-     */
+    /* No Segment: For accesses which are already linear. */
     x86_seg_none
 };
 
-#define is_x86_user_segment(seg) ((unsigned)(seg) <= x86_seg_gs)
+static inline bool is_x86_user_segment(enum x86_segment seg)
+{
+    unsigned int idx = seg;
+
+    return idx <= x86_seg_gs;
+}
+static inline bool is_x86_system_segment(enum x86_segment seg)
+{
+    return seg >= x86_seg_tr && seg < x86_seg_none;
+}
 
 /* Classification of the types of software generated interrupts/exceptions. */
 enum x86_swint_type {
-- 
2.1.4



* [PATCH v2 19/19] x86/hvm: Use system-segment relative memory accesses
  2016-11-28 11:13 [PATCH for-4.9 v2 00/19] XSA-191 followup Andrew Cooper
                   ` (17 preceding siblings ...)
  2016-11-28 11:13 ` [PATCH v2 18/19] x86/hvm: Prepare to allow use of system segments for memory references Andrew Cooper
@ 2016-11-28 11:13 ` Andrew Cooper
  18 siblings, 0 replies; 57+ messages in thread
From: Andrew Cooper @ 2016-11-28 11:13 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Jan Beulich

With hvm_virtual_to_linear_addr() capable of doing proper system-segment
relative memory accesses, avoid open-coding the address and limit calculations
locally.

When a table spans the 4GB boundary (32bit) or non-canonical boundary (64bit),
segmentation errors are now raised.  Previously, the use of x86_seg_none
resulted in segmentation being skipped, and the linear address being truncated
through the pagewalk, and possibly coming out valid on the far side.
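
The flavour of the conversion is visible in the protmode_load_seg() hunk below.  Where the descriptor read was previously open-coded against the table base and limit (fragment only):

    if ( (rc = ops->read_segment((sel & 4) ? x86_seg_ldtr : x86_seg_gdtr,
                                 &desctab, ctxt)) )
        return rc;

    /* Segment not valid for use (cooked meaning of .p)? */
    if ( !desctab.attr.fields.p )
        goto raise_exn;

    /* Check against descriptor table limit. */
    if ( ((sel & 0xfff8) + 7) > desctab.limit )
        goto raise_exn;

    if ( (rc = ops->read(x86_seg_none, desctab.base + (sel & 0xfff8),
                         &desc, sizeof(desc), ctxt)) )
        return rc;

it now becomes a single read relative to the descriptor table segment, with the presence, limit and canonical checks performed behind the ->read() hook:

    switch ( rc = ops->read(sel_seg, sel & 0xfff8, &desc, sizeof(desc), ctxt) )
    {
    case X86EMUL_OKAY:
        break;

    case X86EMUL_EXCEPTION:
        if ( !ctxt->event_pending )
            goto raise_exn;
        /* fallthrough */

    default:
        return rc;
    }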

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <JBeulich@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
---
v2:
 * Shorten exception handling
 * Replace ->cmpxchg() assertion with proper exception handling
---
 xen/arch/x86/hvm/hvm.c                 |   8 +++
 xen/arch/x86/x86_emulate/x86_emulate.c | 123 +++++++++++++++++++++------------
 2 files changed, 85 insertions(+), 46 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 5abdc3c..dd4df47 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -2470,6 +2470,14 @@ bool_t hvm_virtual_to_linear_addr(
     unsigned long addr = offset, last_byte;
     bool_t okay = 0;
 
+    /*
+     * These checks are for a memory access through an active segment.
+     *
+     * It is expected that the access rights of reg are suitable for seg (and
+     * that this is enforced at the point that seg is loaded).
+     */
+    ASSERT(seg < x86_seg_none);
+
     if ( !(current->arch.hvm_vcpu.guest_cr[0] & X86_CR0_PE) )
     {
         /*
diff --git a/xen/arch/x86/x86_emulate/x86_emulate.c b/xen/arch/x86/x86_emulate/x86_emulate.c
index fa6fba1..dad696c 100644
--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -1181,20 +1181,36 @@ static int ioport_access_check(
         return rc;
 
     /* Ensure the TSS has an io-bitmap-offset field. */
-    generate_exception_if(tr.attr.fields.type != 0xb ||
-                          tr.limit < 0x67, EXC_GP, 0);
+    generate_exception_if(tr.attr.fields.type != 0xb, EXC_GP, 0);
 
-    if ( (rc = read_ulong(x86_seg_none, tr.base + 0x66,
-                          &iobmp, 2, ctxt, ops)) )
+    switch ( rc = read_ulong(x86_seg_tr, 0x66, &iobmp, 2, ctxt, ops) )
+    {
+    case X86EMUL_OKAY:
+        break;
+
+    case X86EMUL_EXCEPTION:
+        generate_exception_if(!ctxt->event_pending, EXC_GP, 0);
+        /* fallthrough */
+
+    default:
         return rc;
+    }
 
-    /* Ensure TSS includes two bytes including byte containing first port. */
-    iobmp += first_port / 8;
-    generate_exception_if(tr.limit <= iobmp, EXC_GP, 0);
+    /* Read two bytes including byte containing first port. */
+    switch ( rc = read_ulong(x86_seg_tr, iobmp + first_port / 8,
+                             &iobmp, 2, ctxt, ops) )
+    {
+    case X86EMUL_OKAY:
+        break;
+
+    case X86EMUL_EXCEPTION:
+        generate_exception_if(!ctxt->event_pending, EXC_GP, 0);
+        /* fallthrough */
 
-    if ( (rc = read_ulong(x86_seg_none, tr.base + iobmp,
-                          &iobmp, 2, ctxt, ops)) )
+    default:
         return rc;
+    }
+
     generate_exception_if(iobmp & (((1 << bytes) - 1) << (first_port & 7)),
                           EXC_GP, 0);
 
@@ -1317,9 +1333,12 @@ realmode_load_seg(
     struct x86_emulate_ctxt *ctxt,
     const struct x86_emulate_ops *ops)
 {
-    int rc = ops->read_segment(seg, sreg, ctxt);
+    int rc;
+
+    if ( !ops->read_segment )
+        return X86EMUL_UNHANDLEABLE;
 
-    if ( !rc )
+    if ( (rc = ops->read_segment(seg, sreg, ctxt)) == X86EMUL_OKAY )
     {
         sreg->sel  = sel;
         sreg->base = (uint32_t)sel << 4;
@@ -1336,7 +1355,7 @@ protmode_load_seg(
     struct x86_emulate_ctxt *ctxt,
     const struct x86_emulate_ops *ops)
 {
-    struct segment_register desctab;
+    enum x86_segment sel_seg = (sel & 4) ? x86_seg_ldtr : x86_seg_gdtr;
     struct { uint32_t a, b; } desc;
     uint8_t dpl, rpl;
     int cpl = get_cpl(ctxt, ops);
@@ -1369,21 +1388,19 @@ protmode_load_seg(
     if ( !is_x86_user_segment(seg) && (sel & 4) )
         goto raise_exn;
 
-    if ( (rc = ops->read_segment((sel & 4) ? x86_seg_ldtr : x86_seg_gdtr,
-                                 &desctab, ctxt)) )
-        return rc;
-
-    /* Segment not valid for use (cooked meaning of .p)? */
-    if ( !desctab.attr.fields.p )
-        goto raise_exn;
+    switch ( rc = ops->read(sel_seg, sel & 0xfff8, &desc, sizeof(desc), ctxt) )
+    {
+    case X86EMUL_OKAY:
+        break;
 
-    /* Check against descriptor table limit. */
-    if ( ((sel & 0xfff8) + 7) > desctab.limit )
-        goto raise_exn;
+    case X86EMUL_EXCEPTION:
+        if ( !ctxt->event_pending )
+            goto raise_exn;
+        /* fallthrough */
 
-    if ( (rc = ops->read(x86_seg_none, desctab.base + (sel & 0xfff8),
-                         &desc, sizeof(desc), ctxt)) )
+    default:
         return rc;
+    }
 
     if ( !is_x86_user_segment(seg) )
     {
@@ -1471,9 +1488,20 @@ protmode_load_seg(
     {
         uint32_t new_desc_b = desc.b | a_flag;
 
-        if ( (rc = ops->cmpxchg(x86_seg_none, desctab.base + (sel & 0xfff8) + 4,
-                                &desc.b, &new_desc_b, 4, ctxt)) != 0 )
+        switch ( (rc = ops->cmpxchg(sel_seg, (sel & 0xfff8) + 4, &desc.b,
+                                    &new_desc_b, sizeof(desc.b), ctxt)) )
+        {
+        case X86EMUL_OKAY:
+            break;
+
+        case X86EMUL_EXCEPTION:
+            if ( !ctxt->event_pending )
+                goto raise_exn;
+            /* fallthrough */
+
+        default:
             return rc;
+        }
 
         /* Force the Accessed flag in our local copy. */
         desc.b = new_desc_b;
@@ -1507,8 +1535,7 @@ load_seg(
     struct segment_register reg;
     int rc;
 
-    if ( (ops->read_segment == NULL) ||
-         (ops->write_segment == NULL) )
+    if ( !ops->write_segment )
         return X86EMUL_UNHANDLEABLE;
 
     if ( !sreg )
@@ -1636,8 +1663,7 @@ static int inject_swint(enum x86_swint_type type,
         if ( !in_realmode(ctxt, ops) )
         {
             unsigned int idte_size, idte_offset;
-            struct segment_register idtr;
-            uint32_t idte_ctl;
+            struct { uint32_t a, b, c, d; } idte;
             int lm = in_longmode(ctxt, ops);
 
             if ( lm < 0 )
@@ -1660,24 +1686,30 @@ static int inject_swint(enum x86_swint_type type,
                  ((ctxt->regs->eflags & EFLG_IOPL) != EFLG_IOPL) )
                 goto raise_exn;
 
-            fail_if(ops->read_segment == NULL);
             fail_if(ops->read == NULL);
-            if ( (rc = ops->read_segment(x86_seg_idtr, &idtr, ctxt)) )
-                goto done;
-
-            if ( (idte_offset + idte_size - 1) > idtr.limit )
-                goto raise_exn;
 
             /*
-             * Should strictly speaking read all 8/16 bytes of an entry,
-             * but we currently only care about the dpl and present bits.
+             * Read all 8/16 bytes so the idtr limit check is applied properly
+             * to this entry, even though we only end up looking at the 2nd
+             * word.
              */
-            if ( (rc = ops->read(x86_seg_none, idtr.base + idte_offset + 4,
-                                 &idte_ctl, sizeof(idte_ctl), ctxt)) )
-                goto done;
+            switch ( rc = ops->read(x86_seg_idtr, idte_offset,
+                                    &idte, idte_size, ctxt) )
+            {
+            case X86EMUL_OKAY:
+                break;
+
+            case X86EMUL_EXCEPTION:
+                if ( !ctxt->event_pending )
+                    goto raise_exn;
+                /* fallthrough */
+
+            default:
+                return rc;
+            }
 
             /* Is this entry present? */
-            if ( !(idte_ctl & (1u << 15)) )
+            if ( !(idte.b & (1u << 15)) )
             {
                 fault_type = EXC_NP;
                 goto raise_exn;
@@ -1686,12 +1718,11 @@ static int inject_swint(enum x86_swint_type type,
             /* icebp counts as a hardware event, and bypasses the dpl check. */
             if ( type != x86_swint_icebp )
             {
-                struct segment_register ss;
+                int cpl = get_cpl(ctxt, ops);
 
-                if ( (rc = ops->read_segment(x86_seg_ss, &ss, ctxt)) )
-                    goto done;
+                fail_if(cpl < 0);
 
-                if ( ss.attr.fields.dpl > ((idte_ctl >> 13) & 3) )
+                if ( cpl > ((idte.b >> 13) & 3) )
                     goto raise_exn;
             }
         }
-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH v2 01/19] x86/shadow: Fix #PFs from emulated writes crossing a page boundary
  2016-11-28 11:13 ` [PATCH v2 01/19] x86/shadow: Fix #PFs from emulated writes crossing a page boundary Andrew Cooper
@ 2016-11-28 11:55   ` Tim Deegan
  2016-11-29 15:24   ` Jan Beulich
  1 sibling, 0 replies; 57+ messages in thread
From: Tim Deegan @ 2016-11-28 11:55 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Jan Beulich, Xen-devel

At 11:13 +0000 on 28 Nov (1480331598), Andrew Cooper wrote:
> When translating the second frame of a write crossing a page boundary, mask
> the linear address down to the page boundary.
> 
> This causes the correct %cr2 to be reported to the guest in the case that the
> second frame suffers a pagefault during translation.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Acked-by: Tim Deegan <tim@xen.org>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v2 02/19] x86/emul: Drop X86EMUL_CMPXCHG_FAILED
  2016-11-28 11:13 ` [PATCH v2 02/19] x86/emul: Drop X86EMUL_CMPXCHG_FAILED Andrew Cooper
@ 2016-11-28 11:55   ` Tim Deegan
  2016-11-29 15:29   ` Jan Beulich
  1 sibling, 0 replies; 57+ messages in thread
From: Tim Deegan @ 2016-11-28 11:55 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Jan Beulich, Xen-devel

At 11:13 +0000 on 28 Nov (1480331599), Andrew Cooper wrote:
> X86EMUL_CMPXCHG_FAILED was introduced in c/s d430aae25 in 2005.  Even at the
> time it aliased what is now X86EMUL_RETRY (as well as what is now
> X86EMUL_EXCEPTION).  I am not sure why the distinction was considered useful
> at the time.
> 
> It is only used twice; there is no need to call it out differently from other
> uses of X86EMUL_RETRY.
> 
> No functional change.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Acked-by: Tim Deegan <tim@xen.org>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v2 17/19] x86/hvm: Avoid __hvm_copy() raising #PF behind the emulators back
  2016-11-28 11:13 ` [PATCH v2 17/19] x86/hvm: Avoid __hvm_copy() raising #PF behind the emulators back Andrew Cooper
@ 2016-11-28 11:56   ` Paul Durrant
  2016-11-28 12:58     ` Andrew Cooper
  2016-11-28 14:56   ` Tim Deegan
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 57+ messages in thread
From: Paul Durrant @ 2016-11-28 11:56 UTC (permalink / raw)
  To: Xen-devel
  Cc: Andrew Cooper, Kevin Tian, Tim (Xen.org), Jun Nakajima, Jan Beulich

> -----Original Message-----
> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> Sent: 28 November 2016 11:14
> To: Xen-devel <xen-devel@lists.xen.org>
> Cc: Andrew Cooper <Andrew.Cooper3@citrix.com>; Jan Beulich
> <JBeulich@suse.com>; Paul Durrant <Paul.Durrant@citrix.com>; Tim
> (Xen.org) <tim@xen.org>; Jun Nakajima <jun.nakajima@intel.com>; Kevin
> Tian <kevin.tian@intel.com>
> Subject: [PATCH v2 17/19] x86/hvm: Avoid __hvm_copy() raising #PF behind
> the emulators back
> 
> Drop the call to hvm_inject_page_fault() in __hvm_copy(), and require
> callers
> to inject the pagefault themselves.
> 
> No functional change.

That's not the way it looks on the face of it. You've indeed removed the call to hvm_inject_page_fault(), but some of the callers now call x86_emul_pagefault(). I'd call that a functional change... clearly the change you intended, but still a functional change.

  Paul

> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> CC: Jan Beulich <JBeulich@suse.com>
> CC: Paul Durrant <paul.durrant@citrix.com>
> CC: Tim Deegan <tim@xen.org>
> CC: Jun Nakajima <jun.nakajima@intel.com>
> CC: Kevin Tian <kevin.tian@intel.com>
> ---
>  xen/arch/x86/hvm/emulate.c        |  2 ++
>  xen/arch/x86/hvm/hvm.c            | 11 +++++++++--
>  xen/arch/x86/hvm/vmx/vvmx.c       | 20 +++++++++++++++-----
>  xen/arch/x86/mm/shadow/common.c   |  1 +
>  xen/include/asm-x86/hvm/support.h |  4 +---
>  5 files changed, 28 insertions(+), 10 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
> index efd6d32..f07c026 100644
> --- a/xen/arch/x86/hvm/emulate.c
> +++ b/xen/arch/x86/hvm/emulate.c
> @@ -799,6 +799,7 @@ static int __hvmemul_read(
>      case HVMCOPY_okay:
>          break;
>      case HVMCOPY_bad_gva_to_gfn:
> +        x86_emul_pagefault(pfinfo.ec, pfinfo.linear, &hvmemul_ctxt->ctxt);
>          return X86EMUL_EXCEPTION;
>      case HVMCOPY_bad_gfn_to_mfn:
>          if ( access_type == hvm_access_insn_fetch )
> @@ -905,6 +906,7 @@ static int hvmemul_write(
>      case HVMCOPY_okay:
>          break;
>      case HVMCOPY_bad_gva_to_gfn:
> +        x86_emul_pagefault(pfinfo.ec, pfinfo.linear, &hvmemul_ctxt->ctxt);
>          return X86EMUL_EXCEPTION;
>      case HVMCOPY_bad_gfn_to_mfn:
>          return hvmemul_linear_mmio_write(addr, bytes, p_data, pfec,
> hvmemul_ctxt, 0);
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index 37eaee2..ce77520 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -2927,6 +2927,8 @@ void hvm_task_switch(
> 
>      rc = hvm_copy_from_guest_linear(
>          &tss, prev_tr.base, sizeof(tss), PFEC_page_present, &pfinfo);
> +    if ( rc == HVMCOPY_bad_gva_to_gfn )
> +        hvm_inject_page_fault(pfinfo.ec, pfinfo.linear);
>      if ( rc != HVMCOPY_okay )
>          goto out;
> 
> @@ -2965,11 +2967,15 @@ void hvm_task_switch(
>                                    offsetof(typeof(tss), trace) -
>                                    offsetof(typeof(tss), eip),
>                                    PFEC_page_present, &pfinfo);
> +    if ( rc == HVMCOPY_bad_gva_to_gfn )
> +        hvm_inject_page_fault(pfinfo.ec, pfinfo.linear);
>      if ( rc != HVMCOPY_okay )
>          goto out;
> 
>      rc = hvm_copy_from_guest_linear(
>          &tss, tr.base, sizeof(tss), PFEC_page_present, &pfinfo);
> +    if ( rc == HVMCOPY_bad_gva_to_gfn )
> +        hvm_inject_page_fault(pfinfo.ec, pfinfo.linear);
>      /*
>       * Note: The HVMCOPY_gfn_shared case could be optimised, if the callee
>       * functions knew we want RO access.
> @@ -3012,7 +3018,10 @@ void hvm_task_switch(
>                                        &tss.back_link, sizeof(tss.back_link), 0,
>                                        &pfinfo);
>          if ( rc == HVMCOPY_bad_gva_to_gfn )
> +        {
> +            hvm_inject_page_fault(pfinfo.ec, pfinfo.linear);
>              exn_raised = 1;
> +        }
>          else if ( rc != HVMCOPY_okay )
>              goto out;
>      }
> @@ -3114,8 +3123,6 @@ static enum hvm_copy_result __hvm_copy(
>                  {
>                      pfinfo->linear = addr;
>                      pfinfo->ec = pfec;
> -
> -                    hvm_inject_page_fault(pfec, addr);
>                  }
>                  return HVMCOPY_bad_gva_to_gfn;
>              }
> diff --git a/xen/arch/x86/hvm/vmx/vvmx.c
> b/xen/arch/x86/hvm/vmx/vvmx.c
> index fd7ea0a..e6e9ebd 100644
> --- a/xen/arch/x86/hvm/vmx/vvmx.c
> +++ b/xen/arch/x86/hvm/vmx/vvmx.c
> @@ -396,7 +396,6 @@ static int decode_vmx_inst(struct cpu_user_regs
> *regs,
>      struct vcpu *v = current;
>      union vmx_inst_info info;
>      struct segment_register seg;
> -    pagefault_info_t pfinfo;
>      unsigned long base, index, seg_base, disp, offset;
>      int scale, size;
> 
> @@ -451,10 +450,17 @@ static int decode_vmx_inst(struct cpu_user_regs
> *regs,
>                offset + size - 1 > seg.limit) )
>              goto gp_fault;
> 
> -        if ( poperandS != NULL &&
> -             hvm_copy_from_guest_linear(poperandS, base, size, 0, &pfinfo)
> -                  != HVMCOPY_okay )
> -            return X86EMUL_EXCEPTION;
> +        if ( poperandS != NULL )
> +        {
> +            pagefault_info_t pfinfo;
> +            int rc = hvm_copy_from_guest_linear(poperandS, base, size,
> +                                                0, &pfinfo);
> +
> +            if ( rc == HVMCOPY_bad_gva_to_gfn )
> +                hvm_inject_page_fault(pfinfo.ec, pfinfo.linear);
> +            if ( rc != HVMCOPY_okay )
> +                return X86EMUL_EXCEPTION;
> +        }
>          decode->mem = base;
>          decode->len = size;
>      }
> @@ -1623,6 +1629,8 @@ int nvmx_handle_vmptrst(struct cpu_user_regs
> *regs)
>      gpa = nvcpu->nv_vvmcxaddr;
> 
>      rc = hvm_copy_to_guest_linear(decode.mem, &gpa, decode.len, 0,
> &pfinfo);
> +    if ( rc == HVMCOPY_bad_gva_to_gfn )
> +        hvm_inject_page_fault(pfinfo.ec, pfinfo.linear);
>      if ( rc != HVMCOPY_okay )
>          return X86EMUL_EXCEPTION;
> 
> @@ -1694,6 +1702,8 @@ int nvmx_handle_vmread(struct cpu_user_regs
> *regs)
>      switch ( decode.type ) {
>      case VMX_INST_MEMREG_TYPE_MEMORY:
>          rc = hvm_copy_to_guest_linear(decode.mem, &value, decode.len, 0,
> &pfinfo);
> +        if ( rc == HVMCOPY_bad_gva_to_gfn )
> +            hvm_inject_page_fault(pfinfo.ec, pfinfo.linear);
>          if ( rc != HVMCOPY_okay )
>              return X86EMUL_EXCEPTION;
>          break;
> diff --git a/xen/arch/x86/mm/shadow/common.c
> b/xen/arch/x86/mm/shadow/common.c
> index 0760e76..fbe49e1 100644
> --- a/xen/arch/x86/mm/shadow/common.c
> +++ b/xen/arch/x86/mm/shadow/common.c
> @@ -198,6 +198,7 @@ hvm_read(enum x86_segment seg,
>      case HVMCOPY_okay:
>          return X86EMUL_OKAY;
>      case HVMCOPY_bad_gva_to_gfn:
> +        x86_emul_pagefault(pfinfo.ec, pfinfo.linear, &sh_ctxt->ctxt);
>          return X86EMUL_EXCEPTION;
>      case HVMCOPY_bad_gfn_to_mfn:
>      case HVMCOPY_unhandleable:
> diff --git a/xen/include/asm-x86/hvm/support.h b/xen/include/asm-
> x86/hvm/support.h
> index 78349f8..3d767d7 100644
> --- a/xen/include/asm-x86/hvm/support.h
> +++ b/xen/include/asm-x86/hvm/support.h
> @@ -85,9 +85,7 @@ enum hvm_copy_result hvm_copy_from_guest_phys(
>   *  HVMCOPY_bad_gva_to_gfn: Some guest virtual address did not have a
> valid
>   *                          mapping to a guest physical address.  The
>   *                          pagefault_info_t structure will be filled in if
> - *                          provided, and a page fault exception is
> - *                          automatically queued for injection into the
> - *                          current HVM VCPU.
> + *                          provided.
>   */
>  typedef struct pagefault_info
>  {
> --
> 2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v2 03/19] x86/emul: Simplfy emulation state setup
  2016-11-28 11:13 ` [PATCH v2 03/19] x86/emul: Simplfy emulation state setup Andrew Cooper
@ 2016-11-28 11:58   ` Paul Durrant
  2016-11-28 12:54   ` Paul Durrant
  1 sibling, 0 replies; 57+ messages in thread
From: Paul Durrant @ 2016-11-28 11:58 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, George Dunlap

> -----Original Message-----
> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> Sent: 28 November 2016 11:13
> To: Xen-devel <xen-devel@lists.xen.org>
> Cc: Andrew Cooper <Andrew.Cooper3@citrix.com>; George Dunlap
> <George.Dunlap@citrix.com>; Paul Durrant <Paul.Durrant@citrix.com>
> Subject: [PATCH v2 03/19] x86/emul: Simplfy emulation state setup
> 
> The current code to set up emulation state is ad-hoc and error prone.
> 
>  * Consistently zero all emulation state structures.
>  * Avoid explicitly initialising some state to 0.
>  * Explicitly identify all input and output state in x86_emulate_ctxt.  This
>    involves rearranging some fields.
>  * Have x86_decode() explicitly initialise all output state at its start.
> 
> While making the above changes, two minor tweaks:
> 
>  * Move the calculation of hvmemul_ctxt->ctxt.swint_emulate from
>    _hvm_emulate_one() to hvm_emulate_init_once().  It doesn't need
>    recalculating for each instruction.
>  * Change force_writeback to being a boolean, to match its use.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Acked-by: Tim Deegan <tim@xen.org>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> ---
> CC: George Dunlap <george.dunlap@eu.citrix.com>
> CC: Paul Durrant <paul.durrant@citrix.com>

Reviewed-by: Paul Durrant <paul.durrant@citrix.com>

> 
> v2:
>  * Split x86_emulate_ctxt into three sections
> ---
>  xen/arch/x86/hvm/emulate.c             | 28 +++++++++++++++-------------
>  xen/arch/x86/mm.c                      | 14 ++++++++------
>  xen/arch/x86/mm/shadow/common.c        |  4 ++--
>  xen/arch/x86/x86_emulate/x86_emulate.c |  1 +
>  xen/arch/x86/x86_emulate/x86_emulate.h | 32
> ++++++++++++++++++++++----------
>  5 files changed, 48 insertions(+), 31 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
> index f1f6e2f..3efeead 100644
> --- a/xen/arch/x86/hvm/emulate.c
> +++ b/xen/arch/x86/hvm/emulate.c
> @@ -1770,13 +1770,6 @@ static int _hvm_emulate_one(struct
> hvm_emulate_ctxt *hvmemul_ctxt,
> 
>      vio->mmio_retry = 0;
> 
> -    if ( cpu_has_vmx )
> -        hvmemul_ctxt->ctxt.swint_emulate = x86_swint_emulate_none;
> -    else if ( cpu_has_svm_nrips )
> -        hvmemul_ctxt->ctxt.swint_emulate = x86_swint_emulate_icebp;
> -    else
> -        hvmemul_ctxt->ctxt.swint_emulate = x86_swint_emulate_all;
> -
>      rc = x86_emulate(&hvmemul_ctxt->ctxt, ops);
> 
>      if ( rc == X86EMUL_OKAY && vio->mmio_retry )
> @@ -1947,14 +1940,23 @@ void hvm_emulate_init_once(
>      struct hvm_emulate_ctxt *hvmemul_ctxt,
>      struct cpu_user_regs *regs)
>  {
> -    hvmemul_ctxt->intr_shadow =
> hvm_funcs.get_interrupt_shadow(current);
> -    hvmemul_ctxt->ctxt.regs = regs;
> -    hvmemul_ctxt->ctxt.force_writeback = 1;
> -    hvmemul_ctxt->seg_reg_accessed = 0;
> -    hvmemul_ctxt->seg_reg_dirty = 0;
> -    hvmemul_ctxt->set_context = 0;
> +    struct vcpu *curr = current;
> +
> +    memset(hvmemul_ctxt, 0, sizeof(*hvmemul_ctxt));
> +
> +    hvmemul_ctxt->intr_shadow = hvm_funcs.get_interrupt_shadow(curr);
>      hvmemul_get_seg_reg(x86_seg_cs, hvmemul_ctxt);
>      hvmemul_get_seg_reg(x86_seg_ss, hvmemul_ctxt);
> +
> +    hvmemul_ctxt->ctxt.regs = regs;
> +    hvmemul_ctxt->ctxt.force_writeback = true;
> +
> +    if ( cpu_has_vmx )
> +        hvmemul_ctxt->ctxt.swint_emulate = x86_swint_emulate_none;
> +    else if ( cpu_has_svm_nrips )
> +        hvmemul_ctxt->ctxt.swint_emulate = x86_swint_emulate_icebp;
> +    else
> +        hvmemul_ctxt->ctxt.swint_emulate = x86_swint_emulate_all;
>  }
> 
>  void hvm_emulate_init_per_insn(
> diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
> index 5b0e9f3..d365f59 100644
> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -5337,7 +5337,14 @@ int ptwr_do_page_fault(struct vcpu *v, unsigned
> long addr,
>      struct domain *d = v->domain;
>      struct page_info *page;
>      l1_pgentry_t      pte;
> -    struct ptwr_emulate_ctxt ptwr_ctxt;
> +    struct ptwr_emulate_ctxt ptwr_ctxt = {
> +        .ctxt = {
> +            .regs = regs,
> +            .addr_size = is_pv_32bit_domain(d) ? 32 : BITS_PER_LONG,
> +            .sp_size   = is_pv_32bit_domain(d) ? 32 : BITS_PER_LONG,
> +            .swint_emulate = x86_swint_emulate_none,
> +        },
> +    };
>      int rc;
> 
>      /* Attempt to read the PTE that maps the VA being accessed. */
> @@ -5363,11 +5370,6 @@ int ptwr_do_page_fault(struct vcpu *v, unsigned
> long addr,
>          goto bail;
>      }
> 
> -    ptwr_ctxt.ctxt.regs = regs;
> -    ptwr_ctxt.ctxt.force_writeback = 0;
> -    ptwr_ctxt.ctxt.addr_size = ptwr_ctxt.ctxt.sp_size =
> -        is_pv_32bit_domain(d) ? 32 : BITS_PER_LONG;
> -    ptwr_ctxt.ctxt.swint_emulate = x86_swint_emulate_none;
>      ptwr_ctxt.cr2 = addr;
>      ptwr_ctxt.pte = pte;
> 
> diff --git a/xen/arch/x86/mm/shadow/common.c
> b/xen/arch/x86/mm/shadow/common.c
> index 7e5b8b0..a4a3c4b 100644
> --- a/xen/arch/x86/mm/shadow/common.c
> +++ b/xen/arch/x86/mm/shadow/common.c
> @@ -385,8 +385,9 @@ const struct x86_emulate_ops
> *shadow_init_emulation(
>      struct vcpu *v = current;
>      unsigned long addr;
> 
> +    memset(sh_ctxt, 0, sizeof(*sh_ctxt));
> +
>      sh_ctxt->ctxt.regs = regs;
> -    sh_ctxt->ctxt.force_writeback = 0;
>      sh_ctxt->ctxt.swint_emulate = x86_swint_emulate_none;
> 
>      if ( is_pv_vcpu(v) )
> @@ -396,7 +397,6 @@ const struct x86_emulate_ops
> *shadow_init_emulation(
>      }
> 
>      /* Segment cache initialisation. Primed with CS. */
> -    sh_ctxt->valid_seg_regs = 0;
>      creg = hvm_get_seg_reg(x86_seg_cs, sh_ctxt);
> 
>      /* Work out the emulation mode. */
> diff --git a/xen/arch/x86/x86_emulate/x86_emulate.c
> b/xen/arch/x86/x86_emulate/x86_emulate.c
> index d82e85d..532bd32 100644
> --- a/xen/arch/x86/x86_emulate/x86_emulate.c
> +++ b/xen/arch/x86/x86_emulate/x86_emulate.c
> @@ -1904,6 +1904,7 @@ x86_decode(
>      state->regs = ctxt->regs;
>      state->eip = ctxt->regs->eip;
> 
> +    /* Initialise output state in x86_emulate_ctxt */
>      ctxt->retire.byte = 0;
> 
>      op_bytes = def_op_bytes = ad_bytes = def_ad_bytes = ctxt-
> >addr_size/8;
> diff --git a/xen/arch/x86/x86_emulate/x86_emulate.h
> b/xen/arch/x86/x86_emulate/x86_emulate.h
> index ec824ce..ab566c0 100644
> --- a/xen/arch/x86/x86_emulate/x86_emulate.h
> +++ b/xen/arch/x86/x86_emulate/x86_emulate.h
> @@ -410,6 +410,23 @@ struct cpu_user_regs;
> 
>  struct x86_emulate_ctxt
>  {
> +    /*
> +     * Input-only state:
> +     */
> +
> +    /* Software event injection support. */
> +    enum x86_swint_emulation swint_emulate;
> +
> +    /* Set this if writes may have side effects. */
> +    bool force_writeback;
> +
> +    /* Caller data that can be used by x86_emulate_ops' routines. */
> +    void *data;
> +
> +    /*
> +     * Input/output state:
> +     */
> +
>      /* Register state before/after emulation. */
>      struct cpu_user_regs *regs;
> 
> @@ -419,14 +436,12 @@ struct x86_emulate_ctxt
>      /* Stack pointer width in bits (16, 32 or 64). */
>      unsigned int sp_size;
> 
> -    /* Canonical opcode (see below). */
> -    unsigned int opcode;
> -
> -    /* Software event injection support. */
> -    enum x86_swint_emulation swint_emulate;
> +    /*
> +     * Output-only state:
> +     */
> 
> -    /* Set this if writes may have side effects. */
> -    uint8_t force_writeback;
> +    /* Canonical opcode (see below) (valid only on X86EMUL_OKAY). */
> +    unsigned int opcode;
> 
>      /* Retirement state, set by the emulator (valid only on X86EMUL_OKAY).
> */
>      union {
> @@ -437,9 +452,6 @@ struct x86_emulate_ctxt
>          } flags;
>          uint8_t byte;
>      } retire;
> -
> -    /* Caller data that can be used by x86_emulate_ops' routines. */
> -    void *data;
>  };
> 
>  /*
> --
> 2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v2 06/19] x86/pv: Implement pv_inject_{event, page_fault, hw_exception}()
  2016-11-28 11:13 ` [PATCH v2 06/19] x86/pv: Implement pv_inject_{event, page_fault, hw_exception}() Andrew Cooper
@ 2016-11-28 11:58   ` Tim Deegan
  2016-11-28 11:59     ` Andrew Cooper
  2016-11-29 16:00   ` Jan Beulich
  1 sibling, 1 reply; 57+ messages in thread
From: Tim Deegan @ 2016-11-28 11:58 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Jan Beulich, Xen-devel

At 11:13 +0000 on 28 Nov (1480331603), Andrew Cooper wrote:
> To help with event injection improvements for the PV uses of x86_emulate(),
> implement an event injection API which matches its hvm counterpart.
> 
> This is started with taking do_guest_trap() and modifying its calling API to
> pv_inject_event(), subsequently implementing the former in terms of the
> latter.
> 
> The existing propagate_page_fault() is fairly similar to
> pv_inject_page_fault(), although it has a return value.  Only a single caller
> makes use of the return value, and non-NULL is only returned if the passed cr2
> is non-canonical.  Opencode this single case in
> handle_gdt_ldt_mapping_fault(), allowing propagate_page_fault() to become
> void.
> 
> The #PF specific bits are moved into pv_inject_event(), and
> pv_inject_page_fault() is implemented as a static inline wrapper.
> reserved_bit_page_fault() is pure code motion.
> 
> No functional change.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Acked-by: Tim Deegan <tim@xen.org>

with one note:

> +    if ( vector == TRAP_page_fault )
> +    {
> +        v->arch.pv_vcpu.ctrlreg[2] = event->cr2;
> +        arch_set_cr2(v, event->cr2);
> +
> +        /* Re-set error_code.user flag appropriately for the guest. */
> +        error_code &= ~PFEC_user_mode;
> +        if ( !guest_kernel_mode(v, regs) )
> +            error_code |= PFEC_user_mode;

I can see that you're just moving this code, but isn't it wrong for
what are now called "implicit" accesses?  My Ack stands on this patch
regardless.

Cheers,

Tim.



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v2 16/19] x86/hvm: Rename hvm_copy_*_guest_virt() to hvm_copy_*_guest_linear()
  2016-11-28 11:13 ` [PATCH v2 16/19] x86/hvm: Rename hvm_copy_*_guest_virt() to hvm_copy_*_guest_linear() Andrew Cooper
@ 2016-11-28 11:59   ` Paul Durrant
  0 siblings, 0 replies; 57+ messages in thread
From: Paul Durrant @ 2016-11-28 11:59 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper

> -----Original Message-----
> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> Sent: 28 November 2016 11:14
> To: Xen-devel <xen-devel@lists.xen.org>
> Cc: Andrew Cooper <Andrew.Cooper3@citrix.com>; Paul Durrant
> <Paul.Durrant@citrix.com>
> Subject: [PATCH v2 16/19] x86/hvm: Rename hvm_copy_*_guest_virt() to
> hvm_copy_*_guest_linear()
> 
> The functions use linear addresses, not virtual addresses, as no
> segmentation
> is used.  (Lots of other code in Xen makes this mistake.)
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Acked-by: Tim Deegan <tim@xen.org>
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> ---
> CC: Paul Durrant <paul.durrant@citrix.com>

Reviewed-by: Paul Durrant <paul.durrant@citrix.com>

> ---
>  xen/arch/x86/hvm/emulate.c        | 12 ++++----
>  xen/arch/x86/hvm/hvm.c            | 60 +++++++++++++++++++------------------
> --
>  xen/arch/x86/hvm/vmx/vvmx.c       |  6 ++--
>  xen/arch/x86/mm/shadow/common.c   |  8 +++---
>  xen/include/asm-x86/hvm/support.h | 14 ++++-----
>  5 files changed, 50 insertions(+), 50 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
> index 5165bb2..efd6d32 100644
> --- a/xen/arch/x86/hvm/emulate.c
> +++ b/xen/arch/x86/hvm/emulate.c
> @@ -791,8 +791,8 @@ static int __hvmemul_read(
>          pfec |= PFEC_user_mode;
> 
>      rc = ((access_type == hvm_access_insn_fetch) ?
> -          hvm_fetch_from_guest_virt(p_data, addr, bytes, pfec, &pfinfo) :
> -          hvm_copy_from_guest_virt(p_data, addr, bytes, pfec, &pfinfo));
> +          hvm_fetch_from_guest_linear(p_data, addr, bytes, pfec, &pfinfo) :
> +          hvm_copy_from_guest_linear(p_data, addr, bytes, pfec, &pfinfo));
> 
>      switch ( rc )
>      {
> @@ -898,7 +898,7 @@ static int hvmemul_write(
>           (hvmemul_ctxt->seg_reg[x86_seg_ss].attr.fields.dpl == 3) )
>          pfec |= PFEC_user_mode;
> 
> -    rc = hvm_copy_to_guest_virt(addr, p_data, bytes, pfec, &pfinfo);
> +    rc = hvm_copy_to_guest_linear(addr, p_data, bytes, pfec, &pfinfo);
> 
>      switch ( rc )
>      {
> @@ -1947,9 +1947,9 @@ void hvm_emulate_init_per_insn(
>                                          hvm_access_insn_fetch,
>                                          hvmemul_ctxt->ctxt.addr_size,
>                                          &addr) &&
> -             hvm_fetch_from_guest_virt(hvmemul_ctxt->insn_buf, addr,
> -                                       sizeof(hvmemul_ctxt->insn_buf),
> -                                       pfec, NULL) == HVMCOPY_okay) ?
> +             hvm_fetch_from_guest_linear(hvmemul_ctxt->insn_buf, addr,
> +                                         sizeof(hvmemul_ctxt->insn_buf),
> +                                         pfec, NULL) == HVMCOPY_okay) ?
>              sizeof(hvmemul_ctxt->insn_buf) : 0;
>      }
>      else
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index 5eae06a..37eaee2 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -2925,7 +2925,7 @@ void hvm_task_switch(
>          goto out;
>      }
> 
> -    rc = hvm_copy_from_guest_virt(
> +    rc = hvm_copy_from_guest_linear(
>          &tss, prev_tr.base, sizeof(tss), PFEC_page_present, &pfinfo);
>      if ( rc != HVMCOPY_okay )
>          goto out;
> @@ -2960,15 +2960,15 @@ void hvm_task_switch(
>      hvm_get_segment_register(v, x86_seg_ldtr, &segr);
>      tss.ldt = segr.sel;
> 
> -    rc = hvm_copy_to_guest_virt(prev_tr.base + offsetof(typeof(tss), eip),
> -                                &tss.eip,
> -                                offsetof(typeof(tss), trace) -
> -                                offsetof(typeof(tss), eip),
> -                                PFEC_page_present, &pfinfo);
> +    rc = hvm_copy_to_guest_linear(prev_tr.base + offsetof(typeof(tss), eip),
> +                                  &tss.eip,
> +                                  offsetof(typeof(tss), trace) -
> +                                  offsetof(typeof(tss), eip),
> +                                  PFEC_page_present, &pfinfo);
>      if ( rc != HVMCOPY_okay )
>          goto out;
> 
> -    rc = hvm_copy_from_guest_virt(
> +    rc = hvm_copy_from_guest_linear(
>          &tss, tr.base, sizeof(tss), PFEC_page_present, &pfinfo);
>      /*
>       * Note: The HVMCOPY_gfn_shared case could be optimised, if the callee
> @@ -3008,9 +3008,9 @@ void hvm_task_switch(
>          regs->eflags |= X86_EFLAGS_NT;
>          tss.back_link = prev_tr.sel;
> 
> -        rc = hvm_copy_to_guest_virt(tr.base + offsetof(typeof(tss), back_link),
> -                                    &tss.back_link, sizeof(tss.back_link), 0,
> -                                    &pfinfo);
> +        rc = hvm_copy_to_guest_linear(tr.base + offsetof(typeof(tss),
> back_link),
> +                                      &tss.back_link, sizeof(tss.back_link), 0,
> +                                      &pfinfo);
>          if ( rc == HVMCOPY_bad_gva_to_gfn )
>              exn_raised = 1;
>          else if ( rc != HVMCOPY_okay )
> @@ -3047,8 +3047,8 @@ void hvm_task_switch(
>                                          16 << segr.attr.fields.db,
>                                          &linear_addr) )
>          {
> -            rc = hvm_copy_to_guest_virt(linear_addr, &errcode, opsz, 0,
> -                                        &pfinfo);
> +            rc = hvm_copy_to_guest_linear(linear_addr, &errcode, opsz, 0,
> +                                          &pfinfo);
>              if ( rc == HVMCOPY_bad_gva_to_gfn )
>                  exn_raised = 1;
>              else if ( rc != HVMCOPY_okay )
> @@ -3067,7 +3067,7 @@ void hvm_task_switch(
>  #define HVMCOPY_from_guest (0u<<0)
>  #define HVMCOPY_to_guest   (1u<<0)
>  #define HVMCOPY_phys       (0u<<2)
> -#define HVMCOPY_virt       (1u<<2)
> +#define HVMCOPY_linear     (1u<<2)
>  static enum hvm_copy_result __hvm_copy(
>      void *buf, paddr_t addr, int size, unsigned int flags, uint32_t pfec,
>      pagefault_info_t *pfinfo)
> @@ -3101,7 +3101,7 @@ static enum hvm_copy_result __hvm_copy(
> 
>          count = min_t(int, PAGE_SIZE - gpa, todo);
> 
> -        if ( flags & HVMCOPY_virt )
> +        if ( flags & HVMCOPY_linear )
>          {
>              gfn = paging_gva_to_gfn(curr, addr, &pfec);
>              if ( gfn == gfn_x(INVALID_GFN) )
> @@ -3295,30 +3295,30 @@ enum hvm_copy_result
> hvm_copy_from_guest_phys(
>                        HVMCOPY_from_guest | HVMCOPY_phys, 0, NULL);
>  }
> 
> -enum hvm_copy_result hvm_copy_to_guest_virt(
> -    unsigned long vaddr, void *buf, int size, uint32_t pfec,
> +enum hvm_copy_result hvm_copy_to_guest_linear(
> +    unsigned long addr, void *buf, int size, uint32_t pfec,
>      pagefault_info_t *pfinfo)
>  {
> -    return __hvm_copy(buf, vaddr, size,
> -                      HVMCOPY_to_guest | HVMCOPY_virt,
> +    return __hvm_copy(buf, addr, size,
> +                      HVMCOPY_to_guest | HVMCOPY_linear,
>                        PFEC_page_present | PFEC_write_access | pfec, pfinfo);
>  }
> 
> -enum hvm_copy_result hvm_copy_from_guest_virt(
> -    void *buf, unsigned long vaddr, int size, uint32_t pfec,
> +enum hvm_copy_result hvm_copy_from_guest_linear(
> +    void *buf, unsigned long addr, int size, uint32_t pfec,
>      pagefault_info_t *pfinfo)
>  {
> -    return __hvm_copy(buf, vaddr, size,
> -                      HVMCOPY_from_guest | HVMCOPY_virt,
> +    return __hvm_copy(buf, addr, size,
> +                      HVMCOPY_from_guest | HVMCOPY_linear,
>                        PFEC_page_present | pfec, pfinfo);
>  }
> 
> -enum hvm_copy_result hvm_fetch_from_guest_virt(
> -    void *buf, unsigned long vaddr, int size, uint32_t pfec,
> +enum hvm_copy_result hvm_fetch_from_guest_linear(
> +    void *buf, unsigned long addr, int size, uint32_t pfec,
>      pagefault_info_t *pfinfo)
>  {
> -    return __hvm_copy(buf, vaddr, size,
> -                      HVMCOPY_from_guest | HVMCOPY_virt,
> +    return __hvm_copy(buf, addr, size,
> +                      HVMCOPY_from_guest | HVMCOPY_linear,
>                        PFEC_page_present | PFEC_insn_fetch | pfec, pfinfo);
>  }
> 
> @@ -3333,7 +3333,7 @@ unsigned long copy_to_user_hvm(void *to, const
> void *from, unsigned int len)
>          return 0;
>      }
> 
> -    rc = hvm_copy_to_guest_virt((unsigned long)to, (void *)from, len, 0,
> NULL);
> +    rc = hvm_copy_to_guest_linear((unsigned long)to, (void *)from, len, 0,
> NULL);
>      return rc ? len : 0; /* fake a copy_to_user() return code */
>  }
> 
> @@ -3363,7 +3363,7 @@ unsigned long copy_from_user_hvm(void *to,
> const void *from, unsigned len)
>          return 0;
>      }
> 
> -    rc = hvm_copy_from_guest_virt(to, (unsigned long)from, len, 0, NULL);
> +    rc = hvm_copy_from_guest_linear(to, (unsigned long)from, len, 0, NULL);
>      return rc ? len : 0; /* fake a copy_from_user() return code */
>  }
> 
> @@ -4038,8 +4038,8 @@ void hvm_ud_intercept(struct cpu_user_regs
> *regs)
>                                          (hvm_long_mode_enabled(cur) &&
>                                           cs->attr.fields.l) ? 64 :
>                                          cs->attr.fields.db ? 32 : 16, &addr) &&
> -             (hvm_fetch_from_guest_virt(sig, addr, sizeof(sig),
> -                                        walk, NULL) == HVMCOPY_okay) &&
> +             (hvm_fetch_from_guest_linear(sig, addr, sizeof(sig),
> +                                          walk, NULL) == HVMCOPY_okay) &&
>               (memcmp(sig, "\xf\xbxen", sizeof(sig)) == 0) )
>          {
>              regs->eip += sizeof(sig);
> diff --git a/xen/arch/x86/hvm/vmx/vvmx.c
> b/xen/arch/x86/hvm/vmx/vvmx.c
> index 7342d12..fd7ea0a 100644
> --- a/xen/arch/x86/hvm/vmx/vvmx.c
> +++ b/xen/arch/x86/hvm/vmx/vvmx.c
> @@ -452,7 +452,7 @@ static int decode_vmx_inst(struct cpu_user_regs
> *regs,
>              goto gp_fault;
> 
>          if ( poperandS != NULL &&
> -             hvm_copy_from_guest_virt(poperandS, base, size, 0, &pfinfo)
> +             hvm_copy_from_guest_linear(poperandS, base, size, 0, &pfinfo)
>                    != HVMCOPY_okay )
>              return X86EMUL_EXCEPTION;
>          decode->mem = base;
> @@ -1622,7 +1622,7 @@ int nvmx_handle_vmptrst(struct cpu_user_regs
> *regs)
> 
>      gpa = nvcpu->nv_vvmcxaddr;
> 
> -    rc = hvm_copy_to_guest_virt(decode.mem, &gpa, decode.len, 0,
> &pfinfo);
> +    rc = hvm_copy_to_guest_linear(decode.mem, &gpa, decode.len, 0,
> &pfinfo);
>      if ( rc != HVMCOPY_okay )
>          return X86EMUL_EXCEPTION;
> 
> @@ -1693,7 +1693,7 @@ int nvmx_handle_vmread(struct cpu_user_regs
> *regs)
> 
>      switch ( decode.type ) {
>      case VMX_INST_MEMREG_TYPE_MEMORY:
> -        rc = hvm_copy_to_guest_virt(decode.mem, &value, decode.len, 0,
> &pfinfo);
> +        rc = hvm_copy_to_guest_linear(decode.mem, &value, decode.len, 0,
> &pfinfo);
>          if ( rc != HVMCOPY_okay )
>              return X86EMUL_EXCEPTION;
>          break;
> diff --git a/xen/arch/x86/mm/shadow/common.c
> b/xen/arch/x86/mm/shadow/common.c
> index b659324..0760e76 100644
> --- a/xen/arch/x86/mm/shadow/common.c
> +++ b/xen/arch/x86/mm/shadow/common.c
> @@ -189,9 +189,9 @@ hvm_read(enum x86_segment seg,
>          return rc;
> 
>      if ( access_type == hvm_access_insn_fetch )
> -        rc = hvm_fetch_from_guest_virt(p_data, addr, bytes, 0, &pfinfo);
> +        rc = hvm_fetch_from_guest_linear(p_data, addr, bytes, 0, &pfinfo);
>      else
> -        rc = hvm_copy_from_guest_virt(p_data, addr, bytes, 0, &pfinfo);
> +        rc = hvm_copy_from_guest_linear(p_data, addr, bytes, 0, &pfinfo);
> 
>      switch ( rc )
>      {
> @@ -419,7 +419,7 @@ const struct x86_emulate_ops
> *shadow_init_emulation(
>          (!hvm_translate_linear_addr(
>              x86_seg_cs, regs->eip, sizeof(sh_ctxt->insn_buf),
>              hvm_access_insn_fetch, sh_ctxt, &addr) &&
> -         !hvm_fetch_from_guest_virt(
> +         !hvm_fetch_from_guest_linear(
>               sh_ctxt->insn_buf, addr, sizeof(sh_ctxt->insn_buf), 0, NULL))
>          ? sizeof(sh_ctxt->insn_buf) : 0;
> 
> @@ -447,7 +447,7 @@ void shadow_continue_emulation(struct
> sh_emulate_ctxt *sh_ctxt,
>                  (!hvm_translate_linear_addr(
>                      x86_seg_cs, regs->eip, sizeof(sh_ctxt->insn_buf),
>                      hvm_access_insn_fetch, sh_ctxt, &addr) &&
> -                 !hvm_fetch_from_guest_virt(
> +                 !hvm_fetch_from_guest_linear(
>                       sh_ctxt->insn_buf, addr, sizeof(sh_ctxt->insn_buf), 0, NULL))
>                  ? sizeof(sh_ctxt->insn_buf) : 0;
>              sh_ctxt->insn_buf_eip = regs->eip;
> diff --git a/xen/include/asm-x86/hvm/support.h b/xen/include/asm-
> x86/hvm/support.h
> index 114aa04..78349f8 100644
> --- a/xen/include/asm-x86/hvm/support.h
> +++ b/xen/include/asm-x86/hvm/support.h
> @@ -73,7 +73,7 @@ enum hvm_copy_result hvm_copy_from_guest_phys(
>      void *buf, paddr_t paddr, int size);
> 
>  /*
> - * Copy to/from a guest virtual address. @pfec should include
> PFEC_user_mode
> + * Copy to/from a guest linear address. @pfec should include
> PFEC_user_mode
>   * if emulating a user-mode access (CPL=3). All other flags in @pfec are
>   * managed by the called function: it is therefore optional for the caller
>   * to set them.
> @@ -95,14 +95,14 @@ typedef struct pagefault_info
>      int ec;
>  } pagefault_info_t;
> 
> -enum hvm_copy_result hvm_copy_to_guest_virt(
> -    unsigned long vaddr, void *buf, int size, uint32_t pfec,
> +enum hvm_copy_result hvm_copy_to_guest_linear(
> +    unsigned long addr, void *buf, int size, uint32_t pfec,
>      pagefault_info_t *pfinfo);
> -enum hvm_copy_result hvm_copy_from_guest_virt(
> -    void *buf, unsigned long vaddr, int size, uint32_t pfec,
> +enum hvm_copy_result hvm_copy_from_guest_linear(
> +    void *buf, unsigned long addr, int size, uint32_t pfec,
>      pagefault_info_t *pfinfo);
> -enum hvm_copy_result hvm_fetch_from_guest_virt(
> -    void *buf, unsigned long vaddr, int size, uint32_t pfec,
> +enum hvm_copy_result hvm_fetch_from_guest_linear(
> +    void *buf, unsigned long addr, int size, uint32_t pfec,
>      pagefault_info_t *pfinfo);
> 
>  #define HVM_HCALL_completed  0 /* hypercall completed - no further
> action */
> --
> 2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v2 06/19] x86/pv: Implement pv_inject_{event, page_fault, hw_exception}()
  2016-11-28 11:58   ` Tim Deegan
@ 2016-11-28 11:59     ` Andrew Cooper
  0 siblings, 0 replies; 57+ messages in thread
From: Andrew Cooper @ 2016-11-28 11:59 UTC (permalink / raw)
  To: Tim Deegan; +Cc: Jan Beulich, Xen-devel

On 28/11/16 11:58, Tim Deegan wrote:
> At 11:13 +0000 on 28 Nov (1480331603), Andrew Cooper wrote:
>> To help with event injection improvements for the PV uses of x86_emulate(),
>> implement an event injection API which matches its hvm counterpart.
>>
>> This is started with taking do_guest_trap() and modifying its calling API to
>> pv_inject_event(), subsequently implementing the former in terms of the
>> latter.
>>
>> The existing propagate_page_fault() is fairly similar to
>> pv_inject_page_fault(), although it has a return value.  Only a single caller
>> makes use of the return value, and non-NULL is only returned if the passed cr2
>> is non-canonical.  Opencode this single case in
>> handle_gdt_ldt_mapping_fault(), allowing propagate_page_fault() to become
>> void.
>>
>> The #PF specific bits are moved into pv_inject_event(), and
>> pv_inject_page_fault() is implemented as a static inline wrapper.
>> reserved_bit_page_fault() is pure code motion.
>>
>> No functional change.
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Acked-by: Tim Deegan <tim@xen.org>
>
> with one note:
>
>> +    if ( vector == TRAP_page_fault )
>> +    {
>> +        v->arch.pv_vcpu.ctrlreg[2] = event->cr2;
>> +        arch_set_cr2(v, event->cr2);
>> +
>> +        /* Re-set error_code.user flag appropriately for the guest. */
>> +        error_code &= ~PFEC_user_mode;
>> +        if ( !guest_kernel_mode(v, regs) )
>> +            error_code |= PFEC_user_mode;
> I can see that you're just moving this code, but isn't it wrong for
> what are now called "implicit" accesses?  My Ack stands on this patch
> regardless.

One swamp at a time. :)

I have an equally sized series for that mess.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v2 08/19] x86/emul: Rework emulator event injection
  2016-11-28 11:13 ` [PATCH v2 08/19] x86/emul: Rework emulator event injection Andrew Cooper
@ 2016-11-28 12:04   ` Tim Deegan
  2016-11-28 12:48     ` Andrew Cooper
  0 siblings, 1 reply; 57+ messages in thread
From: Tim Deegan @ 2016-11-28 12:04 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Jun Nakajima, George Dunlap, Xen-devel, Jan Beulich,
	Paul Durrant, Suravee Suthikulpanit

At 11:13 +0000 on 28 Nov (1480331605), Andrew Cooper wrote:
> The emulator needs to gain an understanding of interrupts and exceptions
> generated by its actions.
> 
> Move hvm_emulate_ctxt.{exn_pending,trap} into struct x86_emulate_ctxt so they
> are visible to the emulator.  This removes the need for the
> inject_{hw_exception,sw_interrupt}() hooks, which are dropped and replaced
> with x86_emul_{hw_exception,software_event,reset_event}() instead.
> 
> For exceptions raised by x86_emulate() itself (rather than its callbacks), the
> shadow pagetable and PV uses of x86_emulate() previously failed with
> X86EMUL_UNHANDLEABLE due to the lack of inject_*() hooks.
> 
> This behaviour has changed, and such cases will now return X86EMUL_EXCEPTION
> with event_pending set.  Until the callers of x86_emulate() have been updated
> to inject events back into the guest, divert the event_pending case back into
> the X86EMUL_UNHANDLEABLE path to maintain the same guest-visible behaviour.
> 
> No overall functional change.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>

> --- a/xen/arch/x86/mm/shadow/multi.c
> +++ b/xen/arch/x86/mm/shadow/multi.c
> @@ -3374,11 +3374,23 @@ static int sh_page_fault(struct vcpu *v,
>      r = x86_emulate(&emul_ctxt.ctxt, emul_ops);
>  
>      /*
> +     * TODO: Make this true:
> +     *
> +    ASSERT(emul_ctxt.ctxt.event_pending == (rc == X86EMUL_EXCEPTION));
> +     *
> +     * Some codepaths still raise exceptions behind the back of the
> +     * emulator. (i.e. return X86EMUL_EXCEPTION but without event_pending
> +     * being set).  In the meantime, use a slightly relaxed check...
> +     */
> +    if ( emul_ctxt.ctxt.event_pending )
> +        ASSERT(r == X86EMUL_EXCEPTION);
> +

Here I'll grumble about adding this twice in the same function, when
IMO it ought to be asserted in the emulator instead.  Given that it
mostly disappears later I'll let it stand if you prefer.
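
Roughly speaking, a single check at x86_emulate()'s exit, along these
lines (just a sketch; the exact placement and form are guesswork on my
part, not something from the posted series):

    /* At the tail of x86_emulate(), once every exception path complies: */
    if ( rc == X86EMUL_EXCEPTION )
        ASSERT(ctxt->event_pending);
    else
        ASSERT(!ctxt->event_pending);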

> +    /*
>       * NB. We do not unshadow on X86EMUL_EXCEPTION. It's not clear that it
>       * would be a good unshadow hint. If we *do* decide to unshadow-on-fault
>       * then it must be 'failable': we cannot require the unshadow to succeed.
>       */
> -    if ( r == X86EMUL_UNHANDLEABLE )
> +    if ( r == X86EMUL_UNHANDLEABLE || emul_ctxt.ctxt.event_pending )

No thank you.  The comment there explains why we don't want to
unshadow for an injection; please let it stand.  Or if the new
semantics have changed, update the comment.

Cheers,

Tim.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v2 11/19] x86/emul: Avoid raising faults behind the emulators back
  2016-11-28 11:13 ` [PATCH v2 11/19] x86/emul: Avoid raising faults behind the emulators back Andrew Cooper
@ 2016-11-28 12:47   ` Paul Durrant
  2016-11-29 16:02   ` Jan Beulich
  1 sibling, 0 replies; 57+ messages in thread
From: Paul Durrant @ 2016-11-28 12:47 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Jan Beulich

> -----Original Message-----
> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> Sent: 28 November 2016 11:13
> To: Xen-devel <xen-devel@lists.xen.org>
> Cc: Andrew Cooper <Andrew.Cooper3@citrix.com>; Jan Beulich
> <JBeulich@suse.com>; Paul Durrant <Paul.Durrant@citrix.com>
> Subject: [PATCH v2 11/19] x86/emul: Avoid raising faults behind the
> emulators back
> 
> Introduce a new x86_emul_pagefault() similar to x86_emul_hw_exception(),
> and
> use this instead of hvm_inject_page_fault() from emulation codepaths.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> CC: Jan Beulich <JBeulich@suse.com>
> CC: Paul Durrant <paul.durrant@citrix.com>

Reviewed-by: Paul Durrant <paul.durrant@citrix.com>

> 
> v2:
>  * Change x86_emul_pagefault()'s error_code parameter to being signed
>  * Split out shadow changes
> ---
>  xen/arch/x86/hvm/emulate.c             |  4 ++--
>  xen/arch/x86/x86_emulate/x86_emulate.h | 13 +++++++++++++
>  2 files changed, 15 insertions(+), 2 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
> index 7745c5b..35d1d1c 100644
> --- a/xen/arch/x86/hvm/emulate.c
> +++ b/xen/arch/x86/hvm/emulate.c
> @@ -459,7 +459,7 @@ static int hvmemul_linear_to_phys(
>      {
>          if ( pfec & (PFEC_page_paged | PFEC_page_shared) )
>              return X86EMUL_RETRY;
> -        hvm_inject_page_fault(pfec, addr);
> +        x86_emul_pagefault(pfec, addr, &hvmemul_ctxt->ctxt);
>          return X86EMUL_EXCEPTION;
>      }
> 
> @@ -483,7 +483,7 @@ static int hvmemul_linear_to_phys(
>                  ASSERT(!reverse);
>                  if ( npfn != gfn_x(INVALID_GFN) )
>                      return X86EMUL_UNHANDLEABLE;
> -                hvm_inject_page_fault(pfec, addr & PAGE_MASK);
> +                x86_emul_pagefault(pfec, addr & PAGE_MASK, &hvmemul_ctxt-
> >ctxt);
>                  return X86EMUL_EXCEPTION;
>              }
>              *reps = done;
> diff --git a/xen/arch/x86/x86_emulate/x86_emulate.h
> b/xen/arch/x86/x86_emulate/x86_emulate.h
> index 8019ee1..4679711 100644
> --- a/xen/arch/x86/x86_emulate/x86_emulate.h
> +++ b/xen/arch/x86/x86_emulate/x86_emulate.h
> @@ -624,6 +624,19 @@ static inline void x86_emul_hw_exception(
>      ctxt->event_pending = true;
>  }
> 
> +static inline void x86_emul_pagefault(
> +    int error_code, unsigned long cr2, struct x86_emulate_ctxt *ctxt)
> +{
> +    ASSERT(!ctxt->event_pending);
> +
> +    ctxt->event.vector = 14; /* TRAP_page_fault */
> +    ctxt->event.type = X86_EVENTTYPE_HW_EXCEPTION;
> +    ctxt->event.error_code = error_code;
> +    ctxt->event.cr2 = cr2;
> +
> +    ctxt->event_pending = true;
> +}
> +
>  static inline void x86_emul_software_event(
>      enum x86_swint_type type, uint8_t vector, uint8_t insn_len,
>      struct x86_emulate_ctxt *ctxt)
> --
> 2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v2 08/19] x86/emul: Rework emulator event injection
  2016-11-28 12:04   ` Tim Deegan
@ 2016-11-28 12:48     ` Andrew Cooper
  2016-11-28 14:24       ` Tim Deegan
  0 siblings, 1 reply; 57+ messages in thread
From: Andrew Cooper @ 2016-11-28 12:48 UTC (permalink / raw)
  To: Tim Deegan
  Cc: Jun Nakajima, George Dunlap, Xen-devel, Jan Beulich,
	Paul Durrant, Suravee Suthikulpanit

On 28/11/16 12:04, Tim Deegan wrote:
> At 11:13 +0000 on 28 Nov (1480331605), Andrew Cooper wrote:
>> The emulator needs to gain an understanding of interrupts and exceptions
>> generated by its actions.
>>
>> Move hvm_emulate_ctxt.{exn_pending,trap} into struct x86_emulate_ctxt so they
>> are visible to the emulator.  This removes the need for the
>> inject_{hw_exception,sw_interrupt}() hooks, which are dropped and replaced
>> with x86_emul_{hw_exception,software_event,reset_event}() instead.
>>
>> For exceptions raised by x86_emulate() itself (rather than its callbacks), the
>> shadow pagetable and PV uses of x86_emulate() previously failed with
>> X86EMUL_UNHANDLEABLE due to the lack of inject_*() hooks.
>>
>> This behaviour has changed, and such cases will now return X86EMUL_EXCEPTION
>> with event_pending set.  Until the callers of x86_emulate() have been updated
>> to inject events back into the guest, divert the event_pending case back into
>> the X86EMUL_UNHANDLEABLE path to maintain the same guest-visible behaviour.
>>
>> No overall functional change.
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
>> --- a/xen/arch/x86/mm/shadow/multi.c
>> +++ b/xen/arch/x86/mm/shadow/multi.c
>> @@ -3374,11 +3374,23 @@ static int sh_page_fault(struct vcpu *v,
>>      r = x86_emulate(&emul_ctxt.ctxt, emul_ops);
>>  
>>      /*
>> +     * TODO: Make this true:
>> +     *
>> +    ASSERT(emul_ctxt.ctxt.event_pending == (rc == X86EMUL_EXCEPTION));
>> +     *
>> +     * Some codepaths still raise exceptions behind the back of the
>> +     * emulator. (i.e. return X86EMUL_EXCEPTION but without event_pending
>> +     * being set).  In the meantime, use a slightly relaxed check...
>> +     */
>> +    if ( emul_ctxt.ctxt.event_pending )
>> +        ASSERT(r == X86EMUL_EXCEPTION);
>> +
> Here I'll grumble about adding this twice in the same function, when
> IMO it ought to be asserted in the emulator instead.  Given that it
> mostly disappears later I'll let it stand if you prefer.

It disappears from different call-sites at different points.  The
PV-only cases are fully resolved in this series, whereas neither the
shadow nor the HVM cases are fully resolved.

>
>> +    /*
>>       * NB. We do not unshadow on X86EMUL_EXCEPTION. It's not clear that it
>>       * would be a good unshadow hint. If we *do* decide to unshadow-on-fault
>>       * then it must be 'failable': we cannot require the unshadow to succeed.
>>       */
>> -    if ( r == X86EMUL_UNHANDLEABLE )
>> +    if ( r == X86EMUL_UNHANDLEABLE || emul_ctxt.ctxt.event_pending )
> No thank you.  The comment there explains why we don't want to
> unshadow for an injection; please let it stand.  Or if the new
> semantics have changed, update the comment.

This addition makes no functional change in behaviour compared to before,
which is the point I was trying to get across.

We previously hit this path for exceptions raised from within the
emulator, such as singlestepping.  Exceptions raised behind the back of the
emulator do not set event_pending yet.

The behaviour of exceptions raised within the emulator is fixed later in
patch 13, but this change does not alter the guest-observed behaviour.
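
Once the callers are updated, the pattern at a call site ends up roughly
like this (a sketch only, not the literal code from the later patches; the
PV paths use pv_inject_event() rather than hvm_inject_event()):

    r = x86_emulate(&emul_ctxt.ctxt, emul_ops);

    if ( r == X86EMUL_EXCEPTION )
    {
        ASSERT(emul_ctxt.ctxt.event_pending);
        hvm_inject_event(&emul_ctxt.ctxt.event);
    }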

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v2 03/19] x86/emul: Simplfy emulation state setup
  2016-11-28 11:13 ` [PATCH v2 03/19] x86/emul: Simplfy emulation state setup Andrew Cooper
  2016-11-28 11:58   ` Paul Durrant
@ 2016-11-28 12:54   ` Paul Durrant
  1 sibling, 0 replies; 57+ messages in thread
From: Paul Durrant @ 2016-11-28 12:54 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, George Dunlap

> -----Original Message-----
> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> Sent: 28 November 2016 11:13
> To: Xen-devel <xen-devel@lists.xen.org>
> Cc: Andrew Cooper <Andrew.Cooper3@citrix.com>; George Dunlap
> <George.Dunlap@citrix.com>; Paul Durrant <Paul.Durrant@citrix.com>
> Subject: [PATCH v2 03/19] x86/emul: Simplfy emulation state setup
> 
> The current code to set up emulation state is ad-hoc and error prone.
> 
>  * Consistently zero all emulation state structures.
>  * Avoid explicitly initialising some state to 0.
>  * Explicitly identify all input and output state in x86_emulate_ctxt.  This
>    involves rearranging some fields.
>  * Have x86_decode() explicitly initialise all output state at its start.
> 
> While making the above changes, two minor tweaks:
> 
>  * Move the calculation of hvmemul_ctxt->ctxt.swint_emulate from
>    _hvm_emulate_one() to hvm_emulate_init_once().  It doesn't need
>    recalculating for each instruction.
>  * Change force_writeback to being a boolean, to match its use.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Acked-by: Tim Deegan <tim@xen.org>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> ---
> CC: George Dunlap <george.dunlap@eu.citrix.com>
> CC: Paul Durrant <paul.durrant@citrix.com>

Reviewed-by: Paul Durrant <paul.durrant@citrix.com>

> 
> v2:
>  * Split x86_emulate_ctxt into three sections
> ---
>  xen/arch/x86/hvm/emulate.c             | 28 +++++++++++++++-------------
>  xen/arch/x86/mm.c                      | 14 ++++++++------
>  xen/arch/x86/mm/shadow/common.c        |  4 ++--
>  xen/arch/x86/x86_emulate/x86_emulate.c |  1 +
>  xen/arch/x86/x86_emulate/x86_emulate.h | 32
> ++++++++++++++++++++++----------
>  5 files changed, 48 insertions(+), 31 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
> index f1f6e2f..3efeead 100644
> --- a/xen/arch/x86/hvm/emulate.c
> +++ b/xen/arch/x86/hvm/emulate.c
> @@ -1770,13 +1770,6 @@ static int _hvm_emulate_one(struct
> hvm_emulate_ctxt *hvmemul_ctxt,
> 
>      vio->mmio_retry = 0;
> 
> -    if ( cpu_has_vmx )
> -        hvmemul_ctxt->ctxt.swint_emulate = x86_swint_emulate_none;
> -    else if ( cpu_has_svm_nrips )
> -        hvmemul_ctxt->ctxt.swint_emulate = x86_swint_emulate_icebp;
> -    else
> -        hvmemul_ctxt->ctxt.swint_emulate = x86_swint_emulate_all;
> -
>      rc = x86_emulate(&hvmemul_ctxt->ctxt, ops);
> 
>      if ( rc == X86EMUL_OKAY && vio->mmio_retry )
> @@ -1947,14 +1940,23 @@ void hvm_emulate_init_once(
>      struct hvm_emulate_ctxt *hvmemul_ctxt,
>      struct cpu_user_regs *regs)
>  {
> -    hvmemul_ctxt->intr_shadow =
> hvm_funcs.get_interrupt_shadow(current);
> -    hvmemul_ctxt->ctxt.regs = regs;
> -    hvmemul_ctxt->ctxt.force_writeback = 1;
> -    hvmemul_ctxt->seg_reg_accessed = 0;
> -    hvmemul_ctxt->seg_reg_dirty = 0;
> -    hvmemul_ctxt->set_context = 0;
> +    struct vcpu *curr = current;
> +
> +    memset(hvmemul_ctxt, 0, sizeof(*hvmemul_ctxt));
> +
> +    hvmemul_ctxt->intr_shadow = hvm_funcs.get_interrupt_shadow(curr);
>      hvmemul_get_seg_reg(x86_seg_cs, hvmemul_ctxt);
>      hvmemul_get_seg_reg(x86_seg_ss, hvmemul_ctxt);
> +
> +    hvmemul_ctxt->ctxt.regs = regs;
> +    hvmemul_ctxt->ctxt.force_writeback = true;
> +
> +    if ( cpu_has_vmx )
> +        hvmemul_ctxt->ctxt.swint_emulate = x86_swint_emulate_none;
> +    else if ( cpu_has_svm_nrips )
> +        hvmemul_ctxt->ctxt.swint_emulate = x86_swint_emulate_icebp;
> +    else
> +        hvmemul_ctxt->ctxt.swint_emulate = x86_swint_emulate_all;
>  }
> 
>  void hvm_emulate_init_per_insn(
> diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
> index 5b0e9f3..d365f59 100644
> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -5337,7 +5337,14 @@ int ptwr_do_page_fault(struct vcpu *v, unsigned
> long addr,
>      struct domain *d = v->domain;
>      struct page_info *page;
>      l1_pgentry_t      pte;
> -    struct ptwr_emulate_ctxt ptwr_ctxt;
> +    struct ptwr_emulate_ctxt ptwr_ctxt = {
> +        .ctxt = {
> +            .regs = regs,
> +            .addr_size = is_pv_32bit_domain(d) ? 32 : BITS_PER_LONG,
> +            .sp_size   = is_pv_32bit_domain(d) ? 32 : BITS_PER_LONG,
> +            .swint_emulate = x86_swint_emulate_none,
> +        },
> +    };
>      int rc;
> 
>      /* Attempt to read the PTE that maps the VA being accessed. */
> @@ -5363,11 +5370,6 @@ int ptwr_do_page_fault(struct vcpu *v, unsigned long addr,
>          goto bail;
>      }
> 
> -    ptwr_ctxt.ctxt.regs = regs;
> -    ptwr_ctxt.ctxt.force_writeback = 0;
> -    ptwr_ctxt.ctxt.addr_size = ptwr_ctxt.ctxt.sp_size =
> -        is_pv_32bit_domain(d) ? 32 : BITS_PER_LONG;
> -    ptwr_ctxt.ctxt.swint_emulate = x86_swint_emulate_none;
>      ptwr_ctxt.cr2 = addr;
>      ptwr_ctxt.pte = pte;
> 
> diff --git a/xen/arch/x86/mm/shadow/common.c b/xen/arch/x86/mm/shadow/common.c
> index 7e5b8b0..a4a3c4b 100644
> --- a/xen/arch/x86/mm/shadow/common.c
> +++ b/xen/arch/x86/mm/shadow/common.c
> @@ -385,8 +385,9 @@ const struct x86_emulate_ops *shadow_init_emulation(
>      struct vcpu *v = current;
>      unsigned long addr;
> 
> +    memset(sh_ctxt, 0, sizeof(*sh_ctxt));
> +
>      sh_ctxt->ctxt.regs = regs;
> -    sh_ctxt->ctxt.force_writeback = 0;
>      sh_ctxt->ctxt.swint_emulate = x86_swint_emulate_none;
> 
>      if ( is_pv_vcpu(v) )
> @@ -396,7 +397,6 @@ const struct x86_emulate_ops *shadow_init_emulation(
>      }
> 
>      /* Segment cache initialisation. Primed with CS. */
> -    sh_ctxt->valid_seg_regs = 0;
>      creg = hvm_get_seg_reg(x86_seg_cs, sh_ctxt);
> 
>      /* Work out the emulation mode. */
> diff --git a/xen/arch/x86/x86_emulate/x86_emulate.c b/xen/arch/x86/x86_emulate/x86_emulate.c
> index d82e85d..532bd32 100644
> --- a/xen/arch/x86/x86_emulate/x86_emulate.c
> +++ b/xen/arch/x86/x86_emulate/x86_emulate.c
> @@ -1904,6 +1904,7 @@ x86_decode(
>      state->regs = ctxt->regs;
>      state->eip = ctxt->regs->eip;
> 
> +    /* Initialise output state in x86_emulate_ctxt */
>      ctxt->retire.byte = 0;
> 
>      op_bytes = def_op_bytes = ad_bytes = def_ad_bytes = ctxt->addr_size/8;
> diff --git a/xen/arch/x86/x86_emulate/x86_emulate.h b/xen/arch/x86/x86_emulate/x86_emulate.h
> index ec824ce..ab566c0 100644
> --- a/xen/arch/x86/x86_emulate/x86_emulate.h
> +++ b/xen/arch/x86/x86_emulate/x86_emulate.h
> @@ -410,6 +410,23 @@ struct cpu_user_regs;
> 
>  struct x86_emulate_ctxt
>  {
> +    /*
> +     * Input-only state:
> +     */
> +
> +    /* Software event injection support. */
> +    enum x86_swint_emulation swint_emulate;
> +
> +    /* Set this if writes may have side effects. */
> +    bool force_writeback;
> +
> +    /* Caller data that can be used by x86_emulate_ops' routines. */
> +    void *data;
> +
> +    /*
> +     * Input/output state:
> +     */
> +
>      /* Register state before/after emulation. */
>      struct cpu_user_regs *regs;
> 
> @@ -419,14 +436,12 @@ struct x86_emulate_ctxt
>      /* Stack pointer width in bits (16, 32 or 64). */
>      unsigned int sp_size;
> 
> -    /* Canonical opcode (see below). */
> -    unsigned int opcode;
> -
> -    /* Software event injection support. */
> -    enum x86_swint_emulation swint_emulate;
> +    /*
> +     * Output-only state:
> +     */
> 
> -    /* Set this if writes may have side effects. */
> -    uint8_t force_writeback;
> +    /* Canonical opcode (see below) (valid only on X86EMUL_OKAY). */
> +    unsigned int opcode;
> 
>      /* Retirement state, set by the emulator (valid only on X86EMUL_OKAY). */
>      union {
> @@ -437,9 +452,6 @@ struct x86_emulate_ctxt
>          } flags;
>          uint8_t byte;
>      } retire;
> -
> -    /* Caller data that can be used by x86_emulate_ops' routines. */
> -    void *data;
>  };
> 
>  /*
> --
> 2.1.4



* Re: [PATCH v2 15/19] x86/hvm: Reimplement hvm_copy_*_nofault() in terms of no pagefault_info
  2016-11-28 11:13 ` [PATCH v2 15/19] x86/hvm: Reimplement hvm_copy_*_nofault() in terms of no pagefault_info Andrew Cooper
@ 2016-11-28 12:56   ` Paul Durrant
  0 siblings, 0 replies; 57+ messages in thread
From: Paul Durrant @ 2016-11-28 12:56 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper

> -----Original Message-----
> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> Sent: 28 November 2016 11:14
> To: Xen-devel <xen-devel@lists.xen.org>
> Cc: Andrew Cooper <Andrew.Cooper3@citrix.com>; Paul Durrant
> <Paul.Durrant@citrix.com>
> Subject: [PATCH v2 15/19] x86/hvm: Reimplement hvm_copy_*_nofault() in
> terms of no pagefault_info
> 
> No functional change.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> Acked-by: Tim Deegan <tim@xen.org>
> ---
> CC: Paul Durrant <paul.durrant@citrix.com>

Reviewed-by: Paul Durrant <paul.durrant@citrix.com>

> ---
>  xen/arch/x86/hvm/emulate.c        |  6 ++---
>  xen/arch/x86/hvm/hvm.c            | 56 +++++++++------------------------------
>  xen/arch/x86/mm/shadow/common.c   |  8 +++---
>  xen/include/asm-x86/hvm/support.h | 11 --------
>  4 files changed, 19 insertions(+), 62 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
> index 6de94d4..5165bb2 100644
> --- a/xen/arch/x86/hvm/emulate.c
> +++ b/xen/arch/x86/hvm/emulate.c
> @@ -1947,9 +1947,9 @@ void hvm_emulate_init_per_insn(
>                                          hvm_access_insn_fetch,
>                                          hvmemul_ctxt->ctxt.addr_size,
>                                          &addr) &&
> -             hvm_fetch_from_guest_virt_nofault(hvmemul_ctxt->insn_buf, addr,
> -                                               sizeof(hvmemul_ctxt->insn_buf),
> -                                               pfec) == HVMCOPY_okay) ?
> +             hvm_fetch_from_guest_virt(hvmemul_ctxt->insn_buf, addr,
> +                                       sizeof(hvmemul_ctxt->insn_buf),
> +                                       pfec, NULL) == HVMCOPY_okay) ?
>              sizeof(hvmemul_ctxt->insn_buf) : 0;
>      }
>      else
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index 390f76d..5eae06a 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -3066,8 +3066,6 @@ void hvm_task_switch(
> 
>  #define HVMCOPY_from_guest (0u<<0)
>  #define HVMCOPY_to_guest   (1u<<0)
> -#define HVMCOPY_no_fault   (0u<<1)
> -#define HVMCOPY_fault      (1u<<1)
>  #define HVMCOPY_phys       (0u<<2)
>  #define HVMCOPY_virt       (1u<<2)
>  static enum hvm_copy_result __hvm_copy(
> @@ -3112,13 +3110,10 @@ static enum hvm_copy_result __hvm_copy(
>                      return HVMCOPY_gfn_paged_out;
>                  if ( pfec & PFEC_page_shared )
>                      return HVMCOPY_gfn_shared;
> -                if ( flags & HVMCOPY_fault )
> +                if ( pfinfo )
>                  {
> -                    if ( pfinfo )
> -                    {
> -                        pfinfo->linear = addr;
> -                        pfinfo->ec = pfec;
> -                    }
> +                    pfinfo->linear = addr;
> +                    pfinfo->ec = pfec;
> 
>                      hvm_inject_page_fault(pfec, addr);
>                  }
> @@ -3290,16 +3285,14 @@ enum hvm_copy_result hvm_copy_to_guest_phys(
>      paddr_t paddr, void *buf, int size)
>  {
>      return __hvm_copy(buf, paddr, size,
> -                      HVMCOPY_to_guest | HVMCOPY_fault | HVMCOPY_phys,
> -                      0, NULL);
> +                      HVMCOPY_to_guest | HVMCOPY_phys, 0, NULL);
>  }
> 
>  enum hvm_copy_result hvm_copy_from_guest_phys(
>      void *buf, paddr_t paddr, int size)
>  {
>      return __hvm_copy(buf, paddr, size,
> -                      HVMCOPY_from_guest | HVMCOPY_fault | HVMCOPY_phys,
> -                      0, NULL);
> +                      HVMCOPY_from_guest | HVMCOPY_phys, 0, NULL);
>  }
> 
>  enum hvm_copy_result hvm_copy_to_guest_virt(
> @@ -3307,7 +3300,7 @@ enum hvm_copy_result hvm_copy_to_guest_virt(
>      pagefault_info_t *pfinfo)
>  {
>      return __hvm_copy(buf, vaddr, size,
> -                      HVMCOPY_to_guest | HVMCOPY_fault | HVMCOPY_virt,
> +                      HVMCOPY_to_guest | HVMCOPY_virt,
>                        PFEC_page_present | PFEC_write_access | pfec, pfinfo);
>  }
> 
> @@ -3316,7 +3309,7 @@ enum hvm_copy_result hvm_copy_from_guest_virt(
>      pagefault_info_t *pfinfo)
>  {
>      return __hvm_copy(buf, vaddr, size,
> -                      HVMCOPY_from_guest | HVMCOPY_fault | HVMCOPY_virt,
> +                      HVMCOPY_from_guest | HVMCOPY_virt,
>                        PFEC_page_present | pfec, pfinfo);
>  }
> 
> @@ -3325,34 +3318,10 @@ enum hvm_copy_result hvm_fetch_from_guest_virt(
>      pagefault_info_t *pfinfo)
>  {
>      return __hvm_copy(buf, vaddr, size,
> -                      HVMCOPY_from_guest | HVMCOPY_fault | HVMCOPY_virt,
> +                      HVMCOPY_from_guest | HVMCOPY_virt,
>                        PFEC_page_present | PFEC_insn_fetch | pfec, pfinfo);
>  }
> 
> -enum hvm_copy_result hvm_copy_to_guest_virt_nofault(
> -    unsigned long vaddr, void *buf, int size, uint32_t pfec)
> -{
> -    return __hvm_copy(buf, vaddr, size,
> -                      HVMCOPY_to_guest | HVMCOPY_no_fault | HVMCOPY_virt,
> -                      PFEC_page_present | PFEC_write_access | pfec, NULL);
> -}
> -
> -enum hvm_copy_result hvm_copy_from_guest_virt_nofault(
> -    void *buf, unsigned long vaddr, int size, uint32_t pfec)
> -{
> -    return __hvm_copy(buf, vaddr, size,
> -                      HVMCOPY_from_guest | HVMCOPY_no_fault | HVMCOPY_virt,
> -                      PFEC_page_present | pfec, NULL);
> -}
> -
> -enum hvm_copy_result hvm_fetch_from_guest_virt_nofault(
> -    void *buf, unsigned long vaddr, int size, uint32_t pfec)
> -{
> -    return __hvm_copy(buf, vaddr, size,
> -                      HVMCOPY_from_guest | HVMCOPY_no_fault | HVMCOPY_virt,
> -                      PFEC_page_present | PFEC_insn_fetch | pfec, NULL);
> -}
> -
>  unsigned long copy_to_user_hvm(void *to, const void *from, unsigned int len)
>  {
>      int rc;
> @@ -3364,8 +3333,7 @@ unsigned long copy_to_user_hvm(void *to, const void *from, unsigned int len)
>          return 0;
>      }
> 
> -    rc = hvm_copy_to_guest_virt_nofault((unsigned long)to, (void *)from,
> -                                        len, 0);
> +    rc = hvm_copy_to_guest_virt((unsigned long)to, (void *)from, len, 0, NULL);
>      return rc ? len : 0; /* fake a copy_to_user() return code */
>  }
> 
> @@ -3395,7 +3363,7 @@ unsigned long copy_from_user_hvm(void *to, const void *from, unsigned len)
>          return 0;
>      }
> 
> -    rc = hvm_copy_from_guest_virt_nofault(to, (unsigned long)from, len, 0);
> +    rc = hvm_copy_from_guest_virt(to, (unsigned long)from, len, 0, NULL);
>      return rc ? len : 0; /* fake a copy_from_user() return code */
>  }
> 
> @@ -4070,8 +4038,8 @@ void hvm_ud_intercept(struct cpu_user_regs *regs)
>                                          (hvm_long_mode_enabled(cur) &&
>                                           cs->attr.fields.l) ? 64 :
>                                          cs->attr.fields.db ? 32 : 16, &addr) &&
> -             (hvm_fetch_from_guest_virt_nofault(sig, addr, sizeof(sig),
> -                                                walk) == HVMCOPY_okay) &&
> +             (hvm_fetch_from_guest_virt(sig, addr, sizeof(sig),
> +                                        walk, NULL) == HVMCOPY_okay) &&
>               (memcmp(sig, "\xf\xbxen", sizeof(sig)) == 0) )
>          {
>              regs->eip += sizeof(sig);
> diff --git a/xen/arch/x86/mm/shadow/common.c b/xen/arch/x86/mm/shadow/common.c
> index e8501ce..b659324 100644
> --- a/xen/arch/x86/mm/shadow/common.c
> +++ b/xen/arch/x86/mm/shadow/common.c
> @@ -419,8 +419,8 @@ const struct x86_emulate_ops *shadow_init_emulation(
>          (!hvm_translate_linear_addr(
>              x86_seg_cs, regs->eip, sizeof(sh_ctxt->insn_buf),
>              hvm_access_insn_fetch, sh_ctxt, &addr) &&
> -         !hvm_fetch_from_guest_virt_nofault(
> -             sh_ctxt->insn_buf, addr, sizeof(sh_ctxt->insn_buf), 0))
> +         !hvm_fetch_from_guest_virt(
> +             sh_ctxt->insn_buf, addr, sizeof(sh_ctxt->insn_buf), 0, NULL))
>          ? sizeof(sh_ctxt->insn_buf) : 0;
> 
>      return &hvm_shadow_emulator_ops;
> @@ -447,8 +447,8 @@ void shadow_continue_emulation(struct sh_emulate_ctxt *sh_ctxt,
>                  (!hvm_translate_linear_addr(
>                      x86_seg_cs, regs->eip, sizeof(sh_ctxt->insn_buf),
>                      hvm_access_insn_fetch, sh_ctxt, &addr) &&
> -                 !hvm_fetch_from_guest_virt_nofault(
> -                     sh_ctxt->insn_buf, addr, sizeof(sh_ctxt->insn_buf), 0))
> +                 !hvm_fetch_from_guest_virt(
> +                     sh_ctxt->insn_buf, addr, sizeof(sh_ctxt->insn_buf), 0, NULL))
>                  ? sizeof(sh_ctxt->insn_buf) : 0;
>              sh_ctxt->insn_buf_eip = regs->eip;
>          }
> diff --git a/xen/include/asm-x86/hvm/support.h b/xen/include/asm-x86/hvm/support.h
> index 4aa5a36..114aa04 100644
> --- a/xen/include/asm-x86/hvm/support.h
> +++ b/xen/include/asm-x86/hvm/support.h
> @@ -105,17 +105,6 @@ enum hvm_copy_result hvm_fetch_from_guest_virt(
>      void *buf, unsigned long vaddr, int size, uint32_t pfec,
>      pagefault_info_t *pfinfo);
> 
> -/*
> - * As above (copy to/from a guest virtual address), but no fault is generated
> - * when HVMCOPY_bad_gva_to_gfn is returned.
> - */
> -enum hvm_copy_result hvm_copy_to_guest_virt_nofault(
> -    unsigned long vaddr, void *buf, int size, uint32_t pfec);
> -enum hvm_copy_result hvm_copy_from_guest_virt_nofault(
> -    void *buf, unsigned long vaddr, int size, uint32_t pfec);
> -enum hvm_copy_result hvm_fetch_from_guest_virt_nofault(
> -    void *buf, unsigned long vaddr, int size, uint32_t pfec);
> -
>  #define HVM_HCALL_completed  0 /* hypercall completed - no further action */
>  #define HVM_HCALL_preempted  1 /* hypercall preempted - re-execute VMCALL */
>  #define HVM_HCALL_invalidate 2 /* invalidate ioemu-dm memory cache */
> --
> 2.1.4



* Re: [PATCH v2 17/19] x86/hvm: Avoid __hvm_copy() raising #PF behind the emulators back
  2016-11-28 11:56   ` Paul Durrant
@ 2016-11-28 12:58     ` Andrew Cooper
  2016-11-28 13:01       ` Paul Durrant
  0 siblings, 1 reply; 57+ messages in thread
From: Andrew Cooper @ 2016-11-28 12:58 UTC (permalink / raw)
  To: Paul Durrant, Xen-devel
  Cc: Kevin Tian, Tim (Xen.org), Jun Nakajima, Jan Beulich

On 28/11/16 11:56, Paul Durrant wrote:
>> -----Original Message-----
>> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
>> Sent: 28 November 2016 11:14
>> To: Xen-devel <xen-devel@lists.xen.org>
>> Cc: Andrew Cooper <Andrew.Cooper3@citrix.com>; Jan Beulich
>> <JBeulich@suse.com>; Paul Durrant <Paul.Durrant@citrix.com>; Tim
>> (Xen.org) <tim@xen.org>; Jun Nakajima <jun.nakajima@intel.com>; Kevin
>> Tian <kevin.tian@intel.com>
>> Subject: [PATCH v2 17/19] x86/hvm: Avoid __hvm_copy() raising #PF behind
>> the emulators back
>>
>> Drop the call to hvm_inject_page_fault() in __hvm_copy(), and require
>> callers
>> to inject the pagefault themselves.
>>
>> No functional change.
> That's not the way it looks on the face of it. You've indeed removed the call to hvm_inject_page_fault() but some of the callers now call x86_emul_pagefault(). I'd call that a functional change... clearly the change you intended, but still a functional change.

Hmm - I suppose I am confusing no functional change in the hypervisor
with no functional change as observed by a guest.

~Andrew


* Re: [PATCH v2 17/19] x86/hvm: Avoid __hvm_copy() raising #PF behind the emulators back
  2016-11-28 12:58     ` Andrew Cooper
@ 2016-11-28 13:01       ` Paul Durrant
  2016-11-28 13:03         ` Andrew Cooper
  0 siblings, 1 reply; 57+ messages in thread
From: Paul Durrant @ 2016-11-28 13:01 UTC (permalink / raw)
  To: Andrew Cooper, Xen-devel
  Cc: Kevin Tian, Tim (Xen.org), Jun Nakajima, Jan Beulich

> -----Original Message-----
> From: Andrew Cooper
> Sent: 28 November 2016 12:58
> To: Paul Durrant <Paul.Durrant@citrix.com>; Xen-devel <xen-
> devel@lists.xen.org>
> Cc: Jan Beulich <JBeulich@suse.com>; Tim (Xen.org) <tim@xen.org>; Jun
> Nakajima <jun.nakajima@intel.com>; Kevin Tian <kevin.tian@intel.com>
> Subject: Re: [PATCH v2 17/19] x86/hvm: Avoid __hvm_copy() raising #PF
> behind the emulators back
> 
> On 28/11/16 11:56, Paul Durrant wrote:
> >> -----Original Message-----
> >> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> >> Sent: 28 November 2016 11:14
> >> To: Xen-devel <xen-devel@lists.xen.org>
> >> Cc: Andrew Cooper <Andrew.Cooper3@citrix.com>; Jan Beulich
> >> <JBeulich@suse.com>; Paul Durrant <Paul.Durrant@citrix.com>; Tim
> >> (Xen.org) <tim@xen.org>; Jun Nakajima <jun.nakajima@intel.com>;
> Kevin
> >> Tian <kevin.tian@intel.com>
> >> Subject: [PATCH v2 17/19] x86/hvm: Avoid __hvm_copy() raising #PF
> behind
> >> the emulators back
> >>
> >> Drop the call to hvm_inject_page_fault() in __hvm_copy(), and require
> >> callers
> >> to inject the pagefault themselves.
> >>
> >> No functional change.
> > That's not the way it looks on the face of it. You've indeed removed the call
> to hvm_inject_page_fault() but some of the callers now call
> x86_emul_pagefault(). I'd call that a functional change... clearly the change
> you intended, but still a functional change.
> 
> Hmm - I suppose I am confusing no functional change in the hypervisor
> with no functional change as observed by a guest.
> 

Yes, I was thinking from the PoV of someone looking at this patch years later and saying 'hang on a minute...'. Saying 'no guest-observable behavioural change' is much clearer I think.

  Paul

> ~Andrew


* Re: [PATCH v2 17/19] x86/hvm: Avoid __hvm_copy() raising #PF behind the emulators back
  2016-11-28 13:01       ` Paul Durrant
@ 2016-11-28 13:03         ` Andrew Cooper
  0 siblings, 0 replies; 57+ messages in thread
From: Andrew Cooper @ 2016-11-28 13:03 UTC (permalink / raw)
  To: Paul Durrant, Xen-devel
  Cc: Kevin Tian, Tim (Xen.org), Jun Nakajima, Jan Beulich

On 28/11/16 13:01, Paul Durrant wrote:
>> -----Original Message-----
>> From: Andrew Cooper
>> Sent: 28 November 2016 12:58
>> To: Paul Durrant <Paul.Durrant@citrix.com>; Xen-devel <xen-
>> devel@lists.xen.org>
>> Cc: Jan Beulich <JBeulich@suse.com>; Tim (Xen.org) <tim@xen.org>; Jun
>> Nakajima <jun.nakajima@intel.com>; Kevin Tian <kevin.tian@intel.com>
>> Subject: Re: [PATCH v2 17/19] x86/hvm: Avoid __hvm_copy() raising #PF
>> behind the emulators back
>>
>> On 28/11/16 11:56, Paul Durrant wrote:
>>>> -----Original Message-----
>>>> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
>>>> Sent: 28 November 2016 11:14
>>>> To: Xen-devel <xen-devel@lists.xen.org>
>>>> Cc: Andrew Cooper <Andrew.Cooper3@citrix.com>; Jan Beulich
>>>> <JBeulich@suse.com>; Paul Durrant <Paul.Durrant@citrix.com>; Tim
>>>> (Xen.org) <tim@xen.org>; Jun Nakajima <jun.nakajima@intel.com>;
>> Kevin
>>>> Tian <kevin.tian@intel.com>
>>>> Subject: [PATCH v2 17/19] x86/hvm: Avoid __hvm_copy() raising #PF
>> behind
>>>> the emulators back
>>>>
>>>> Drop the call to hvm_inject_page_fault() in __hvm_copy(), and require
>>>> callers
>>>> to inject the pagefault themselves.
>>>>
>>>> No functional change.
>>> That's not the way it looks on the face of it. You've indeed removed the call
>> to hvm_inject_page_fault() but some of the callers now call
>> x86_emul_pagefault(). I'd call that a functional change... clearly the change
>> you intended, but still a functional change.
>>
>> Hmm - I suppose I am confusing no functional change in the hypervisor
>> with no functional change as observed by a guest.
>>
> Yes, I was thinking from the PoV of someone looking at this patch years later and saying 'hang on a minute...'. Saying 'no guest-observable behavioural change' is much clearer I think.

I will double check all of the patches and clarify it in the commit
messages.

~Andrew


* Re: [PATCH v2 10/19] x86/hvm: Reposition the modification of raw segment data from the VMCB/VMCS
  2016-11-28 11:13 ` [PATCH v2 10/19] x86/hvm: Reposition the modification of raw segment data from the VMCB/VMCS Andrew Cooper
@ 2016-11-28 14:18   ` Boris Ostrovsky
  0 siblings, 0 replies; 57+ messages in thread
From: Boris Ostrovsky @ 2016-11-28 14:18 UTC (permalink / raw)
  To: Andrew Cooper, Xen-devel; +Cc: Suravee Suthikulpanit

On 11/28/2016 06:13 AM, Andrew Cooper wrote:
> Intel VT-x and AMD SVM provide access to the full segment descriptor cache via
> fields in the VMCB/VMCS.  However, the bits which are actually checked by
> hardware and preserved across vmentry/exit are inconsistent, and the vendor
> accessor functions perform inconsistent modification to the raw values.
>
> Convert {svm,vmx}_{get,set}_segment_register() into raw accessors, and alter
> hvm_{get,set}_segment_register() to cook the values consistently.  This allows
> the common emulation code to better rely on finding architecturally-expected
> values.
>
> While moving the code performing the cooking, fix the %ss.db quirk.  A NULL
> selector is indicated by .p being clear, not the value of the .type field.
>
> This does cause some functional changes because of the modifications being
> applied uniformly.  A side effect of this fixes latent bugs where
> vmx_set_segment_register() didn't correctly fix up .G for segments, and
> inconsistent fixing up of the GDTR/IDTR limits.
>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>



* Re: [PATCH v2 08/19] x86/emul: Rework emulator event injection
  2016-11-28 12:48     ` Andrew Cooper
@ 2016-11-28 14:24       ` Tim Deegan
  2016-11-28 14:34         ` Andrew Cooper
  0 siblings, 1 reply; 57+ messages in thread
From: Tim Deegan @ 2016-11-28 14:24 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Jun Nakajima, George Dunlap, Xen-devel, Jan Beulich,
	Paul Durrant, Suravee Suthikulpanit

At 12:48 +0000 on 28 Nov (1480337304), Andrew Cooper wrote:
> On 28/11/16 12:04, Tim Deegan wrote:
> > At 11:13 +0000 on 28 Nov (1480331605), Andrew Cooper wrote:
> >> +    /*
> >>       * NB. We do not unshadow on X86EMUL_EXCEPTION. It's not clear that it
> >>       * would be a good unshadow hint. If we *do* decide to unshadow-on-fault
> >>       * then it must be 'failable': we cannot require the unshadow to succeed.
> >>       */
> >> -    if ( r == X86EMUL_UNHANDLEABLE )
> >> +    if ( r == X86EMUL_UNHANDLEABLE || emul_ctxt.ctxt.event_pending )
> > No thank you.  The comment there explains why we don't want to
> > unshadow for an injection; please let it stand.  Or if the new
> > semantics have changed, update the comment.
> 
> This addition is no functional behavioural change from before, which is
> the point I was trying to get across.

I can understand why you want to do it that way but it makes the
series (and this code!) read quite oddly.  If you keep this, then
please add a second comment above this line that explains why the code
temporarily disagrees with the first comment. :)  You can delete that
comment again in patch #13.

With that, Acked-by: Tim Deegan <tim@xen.org>


* Re: [PATCH v2 08/19] x86/emul: Rework emulator event injection
  2016-11-28 14:24       ` Tim Deegan
@ 2016-11-28 14:34         ` Andrew Cooper
  0 siblings, 0 replies; 57+ messages in thread
From: Andrew Cooper @ 2016-11-28 14:34 UTC (permalink / raw)
  To: Tim Deegan
  Cc: Jun Nakajima, George Dunlap, Xen-devel, Jan Beulich,
	Paul Durrant, Suravee Suthikulpanit

On 28/11/16 14:24, Tim Deegan wrote:
> At 12:48 +0000 on 28 Nov (1480337304), Andrew Cooper wrote:
>> On 28/11/16 12:04, Tim Deegan wrote:
>>> At 11:13 +0000 on 28 Nov (1480331605), Andrew Cooper wrote:
>>>> +    /*
>>>>       * NB. We do not unshadow on X86EMUL_EXCEPTION. It's not clear that it
>>>>       * would be a good unshadow hint. If we *do* decide to unshadow-on-fault
>>>>       * then it must be 'failable': we cannot require the unshadow to succeed.
>>>>       */
>>>> -    if ( r == X86EMUL_UNHANDLEABLE )
>>>> +    if ( r == X86EMUL_UNHANDLEABLE || emul_ctxt.ctxt.event_pending )
>>> No thank you.  The comment there explains why we don't want to
>>> unshadow for an injection; please let it stand.  Or if the new
>>> semantics have changed, update the comment.
>> This addition is no functional behavioural change from before, which is
>> the point I was trying to get across.
> I can understand why you want to do it that way but it makes the
> series (and this code!) read quite oddly.  If you keep this, then
> please add a second comment above this line that explains why the code
> temporarily disagrees with the first comment. :)  You can delete that
> comment again in patch #13.
>
> With that, Acked-by: Tim Deegan <tim@xen.org>

How about this?

~Andrew

--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -3374,11 +3374,33 @@ static int sh_page_fault(struct vcpu *v,
     r = x86_emulate(&emul_ctxt.ctxt, emul_ops);
 
     /*
+     * TODO: Make this true:
+     *
+    ASSERT(emul_ctxt.ctxt.event_pending == (rc == X86EMUL_EXCEPTION));
+     *
+     * Some codepaths still raise exceptions behind the back of the
+     * emulator. (i.e. return X86EMUL_EXCEPTION but without event_pending
+     * being set).  In the meantime, use a slightly relaxed check...
+     */
+    if ( emul_ctxt.ctxt.event_pending )
+        ASSERT(r == X86EMUL_EXCEPTION);
+
+    /*
     * NB. We do not unshadow on X86EMUL_EXCEPTION. It's not clear that it
     * would be a good unshadow hint. If we *do* decide to unshadow-on-fault
     * then it must be 'failable': we cannot require the unshadow to succeed.
+     *
+     * Note: Despite the above comment, this path has actually been handling
+     * exception circumstances raised by the emulator itself (e.g. singlestep)
+     * because of the lack of the inject_hw_exception() hook.
+     *
+     * With this change, exceptions raised behind the back of the emulator
+     * still return without setting event_pending, but exceptions raised by
+     * the emulator do.  Force these exceptions back onto the UNHANDLEABLE
+     * path for now, so they are similarly ignored.  A future change will fix
+     * this properly.
      */
-    if ( r == X86EMUL_UNHANDLEABLE )
+    if ( r == X86EMUL_UNHANDLEABLE || emul_ctxt.ctxt.event_pending )
     {
         perfc_incr(shadow_fault_emulate_failed);




* Re: [PATCH v2 13/19] x86/shadow: Avoid raising faults behind the emulators back
  2016-11-28 11:13 ` [PATCH v2 13/19] x86/shadow: " Andrew Cooper
@ 2016-11-28 14:49   ` Tim Deegan
  2016-11-28 16:04     ` Andrew Cooper
  0 siblings, 1 reply; 57+ messages in thread
From: Tim Deegan @ 2016-11-28 14:49 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Jan Beulich, Xen-devel

Hi,

At 11:13 +0000 on 28 Nov (1480331610), Andrew Cooper wrote:
> Use x86_emul_{hw_exception,pagefault}() rather than
> {pv,hvm}_inject_page_fault() and hvm_inject_hw_exception() to cause raised
> faults to be known to the emulator.  This requires altering the callers of
> x86_emulate() to properly re-inject the event.
> 
> While fixing this, fix the singlestep behaviour.  Previously, an otherwise
> successful emulation would fail if singlestepping was active, as the emulator
> couldn't raise #DB.  This is unreasonable from the point of view of the guest.
> 
> We therefore tolerate #PF/#GP/#SS and #DB being raised by the emulator, but
> reject anything else as unexpected.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

> +    if ( r == X86EMUL_EXCEPTION && emul_ctxt.ctxt.event_pending )
> +    {
> +        /*
> +         * This emulation covers writes to shadow pagetables.  We tolerate #PF
> +         * (from hitting adjacent pages), #GP/#SS (from segmentation errors),
> +         * and #DB (from singlestepping).  Anything else is an emulation bug,
> +         * or a guest playing with the instruction stream under Xen's feet.
> +         */
> +        if ( emul_ctxt.ctxt.event.type == X86_EVENTTYPE_HW_EXCEPTION &&
> +             (emul_ctxt.ctxt.event.vector < 32) &&
> +             ((1u << emul_ctxt.ctxt.event.vector) &
> +              ((1u << TRAP_debug) | (1u << TRAP_stack_error) |
> +               (1u << TRAP_gp_fault) | (1u << TRAP_page_fault))) )
> +        {
> +            if ( is_hvm_vcpu(v) )
> +                hvm_inject_event(&emul_ctxt.ctxt.event);
> +            else
> +                pv_inject_event(&emul_ctxt.ctxt.event);
> +        }
> +        else
> +        {
> +            if ( is_hvm_vcpu(v) )
> +                hvm_inject_hw_exception(TRAP_gp_fault, 0);
> +            else
> +                pv_inject_hw_exception(TRAP_gp_fault, 0);
> +        }

I don't think it's OK to lob #GP into a HVM guest here -- we can hit
this path even if the guest isn't behaving strangely.  I think it
would be better to run this check before the X86EMUL_UNHANDLEABLE one
and convert injections that we choose not to handle into
X86EMUL_UNHANDLEABLE.

Which I guess brings us back full circle to the behaviour we had
before, and perhaps the right change to the earlier patch is to
start down this road, with an explanatory comment.

e.g. start with something like this, immediately after the first call
to x86_emulate():

    /*
     * Events raised within the emulator itself used to return
     * X86EMUL_UNHANDLEABLE because we didn't supply injection
     * callbacks.  Now the emulator supplies those to us via
     * ctxt.event_pending instead.  Preserve the old behaviour
     * for now.
     */
     if (emul_ctxt.ctxt.event_pending)
         r = X86EMUL_UNHANDLEABLE;

And in this patch, replace it with the hunk you have above, but
setting r = X86EMUL_UNHANDLEABLE instead of injecting #GP.

Does that make sense?

Also, I'm a little confused after all this as to whether the emulator
can still return with X86EMUL_OKAY and the event_pending set.  If it
can do that then we need to inject the event come what may, because
any other side-effects will have been committed already.
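
Purely as an illustration of what "inject come what may" would mean at
the call site (reusing the helpers this series already adds; a sketch,
not a request for this exact code):

    if ( emul_ctxt.ctxt.event_pending )
    {
        if ( is_hvm_vcpu(v) )
            hvm_inject_event(&emul_ctxt.ctxt.event);
        else
            pv_inject_event(&emul_ctxt.ctxt.event);
    }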

> @@ -3475,6 +3503,37 @@ static int sh_page_fault(struct vcpu *v,
>              {
>                  perfc_incr(shadow_em_ex_fail);
>                  TRACE_SHADOW_PATH_FLAG(TRCE_SFLAG_EMULATION_LAST_FAILED);
> +
> +                if ( r == X86EMUL_EXCEPTION && emul_ctxt.ctxt.event_pending )
> +                {
> +                    /*
> +                     * This emulation covers writes to shadow pagetables.  We
> +                     * tolerate #PF (from hitting adjacent pages), #GP/#SS
> +                     * (from segmentation errors), and #DB (from
> +                     * singlestepping).  Anything else is an emulation bug, or
> +                     * a guest playing with the instruction stream under Xen's
> +                     * feet.
> +                     */
> +                    if ( emul_ctxt.ctxt.event.type == X86_EVENTTYPE_HW_EXCEPTION &&
> +                         (emul_ctxt.ctxt.event.vector < 32) &&
> +                         ((1u << emul_ctxt.ctxt.event.vector) &
> +                          ((1u << TRAP_debug) | (1u << TRAP_stack_error) |
> +                           (1u << TRAP_gp_fault) | (1u << TRAP_page_fault))) )
> +                    {
> +                        if ( is_hvm_vcpu(v) )
> +                            hvm_inject_event(&emul_ctxt.ctxt.event);
> +                        else
> +                            pv_inject_event(&emul_ctxt.ctxt.event);
> +                    }
> +                    else
> +                    {
> +                        if ( is_hvm_vcpu(v) )
> +                            hvm_inject_hw_exception(TRAP_gp_fault, 0);
> +                        else
> +                            pv_inject_hw_exception(TRAP_gp_fault, 0);
> +                    }
> +                }
> +

This looks like code duplication, but rather than trying to merge the
two cases, I think we can drop this one entirely.  This emulation is
optimistically trying to find the second half of a PAE PTE write -
it's OK just to stop emulating if we hit anything this exciting.
So we can lose the whole hunk.

(Again, unless we need to handle X86EMUL_OKAY and event_pending).

Cheers,

Tim.


* Re: [PATCH v2 17/19] x86/hvm: Avoid __hvm_copy() raising #PF behind the emulators back
  2016-11-28 11:13 ` [PATCH v2 17/19] x86/hvm: Avoid __hvm_copy() raising #PF behind the emulators back Andrew Cooper
  2016-11-28 11:56   ` Paul Durrant
@ 2016-11-28 14:56   ` Tim Deegan
  2016-11-28 16:32     ` Andrew Cooper
  2016-11-29  1:22   ` Tian, Kevin
  2016-11-29 16:24   ` Jan Beulich
  3 siblings, 1 reply; 57+ messages in thread
From: Tim Deegan @ 2016-11-28 14:56 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Kevin Tian, Paul Durrant, Jun Nakajima, Jan Beulich, Xen-devel

At 11:13 +0000 on 28 Nov (1480331614), Andrew Cooper wrote:
> Drop the call to hvm_inject_page_fault() in __hvm_copy(), and require callers
> to inject the pagefault themselves.

This seems like it'd be easy to forget to DTRT with the fault,
especially in code being ported forward across this series.

Would it be better to have hvm_copy &c take a callback function
instead of a pfinfo pointer?
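
Purely as a sketch of what I mean -- pf_handler_t and the _cb-suffixed
name are hypothetical, not anything already in the series:

    typedef void (*pf_handler_t)(unsigned long linear, uint32_t ec,
                                 void *data);

    enum hvm_copy_result hvm_copy_from_guest_virt_cb(
        void *buf, unsigned long vaddr, int size, uint32_t pfec,
        pf_handler_t handler, void *data);

A caller that wants today's behaviour would pass a trivial handler that
calls hvm_inject_page_fault(ec, linear); a caller that must not raise
#PF behind the emulator's back would pass one that just records the
fault.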

(In any case, this can have my ack for the shadow-code change.)

Cheers,

Tim.


* Re: [PATCH v2 13/19] x86/shadow: Avoid raising faults behind the emulators back
  2016-11-28 14:49   ` Tim Deegan
@ 2016-11-28 16:04     ` Andrew Cooper
  2016-11-28 17:21       ` Tim Deegan
  0 siblings, 1 reply; 57+ messages in thread
From: Andrew Cooper @ 2016-11-28 16:04 UTC (permalink / raw)
  To: Tim Deegan; +Cc: Jan Beulich, Xen-devel

On 28/11/16 14:49, Tim Deegan wrote:
> Hi,
>
> At 11:13 +0000 on 28 Nov (1480331610), Andrew Cooper wrote:
>> Use x86_emul_{hw_exception,pagefault}() rather than
>> {pv,hvm}_inject_page_fault() and hvm_inject_hw_exception() to cause raised
>> faults to be known to the emulator.  This requires altering the callers of
>> x86_emulate() to properly re-inject the event.
>>
>> While fixing this, fix the singlestep behaviour.  Previously, an otherwise
>> successful emulation would fail if singlestepping was active, as the emulator
>> couldn't raise #DB.  This is unreasonable from the point of view of the guest.
>>
>> We therefore tolerate #PF/#GP/#SS and #DB being raised by the emulator, but
>> reject anything else as unexpected.
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> +    if ( r == X86EMUL_EXCEPTION && emul_ctxt.ctxt.event_pending )
>> +    {
>> +        /*
>> +         * This emulation covers writes to shadow pagetables.  We tolerate #PF
>> +         * (from hitting adjacent pages), #GP/#SS (from segmentation errors),
>> +         * and #DB (from singlestepping).  Anything else is an emulation bug,
>> +         * or a guest playing with the instruction stream under Xen's feet.
>> +         */
>> +        if ( emul_ctxt.ctxt.event.type == X86_EVENTTYPE_HW_EXCEPTION &&
>> +             (emul_ctxt.ctxt.event.vector < 32) &&
>> +             ((1u << emul_ctxt.ctxt.event.vector) &
>> +              ((1u << TRAP_debug) | (1u << TRAP_stack_error) |
>> +               (1u << TRAP_gp_fault) | (1u << TRAP_page_fault))) )
>> +        {
>> +            if ( is_hvm_vcpu(v) )
>> +                hvm_inject_event(&emul_ctxt.ctxt.event);
>> +            else
>> +                pv_inject_event(&emul_ctxt.ctxt.event);
>> +        }
>> +        else
>> +        {
>> +            if ( is_hvm_vcpu(v) )
>> +                hvm_inject_hw_exception(TRAP_gp_fault, 0);
>> +            else
>> +                pv_inject_hw_exception(TRAP_gp_fault, 0);
>> +        }
> I don't think it's OK to lob #GP into a HVM guest here -- we can hit
> this path even if the guest isn't behaving strangely.

What circumstances are you thinking of?

Unless the guest is playing with the instruction stream, event_pending
will only be set in the paths altered by this patch, which are the four
whitelisted vectors.

> I think it would be better to run this check before the X86EMUL_UNHANDLEABLE one
> and convert injections that we choose not to handle into
> X86EMUL_UNHANDLEABLE.
>
> Which I guess brings us back full circle to the behaviour we had
> before, and perhaps the right change to the earlier patch is to
> start down this road, with an explanatory comment.
>
> e.g. start with something like this, immediately after the first call
> to x86_emulate():
>
>     /*
>      * Events raised within the emulator itself used to return
>      * X86EMUL_UNHANDLEABLE because we didn't supply injection
>      * callbacks.  Now the emulator supplies those to us via
>      * ctxt.event_pending instead.  Preserve the old behaviour
>      * for now.
>      */
>      if (emul_ctxt.ctxt.event_pending)
>          r = X86EMUL_UNHANDLEABLE;
>
> And in this patch, replace it with the hunk you have above, but
> setting r = X86EMUL_UNHANDLEABLE instead of injecting #GP.
>
> Does that make sense?

It does make sense, but that goes against the comment of not unshadowing
on exception.

A lot of this revolves around how likely we are to hit the #GP[0] case. 
I assert that we shouldn't be able to hit it unless the guest is playing
games, or we have a bug in emulation.

If we don't go throwing a #GP back, we should at least leave something
obvious in the log.

>
> Also, I'm a little confused after all this as to whether the emulator
> can still return with X86EMUL_OKAY and the event_pending set.

I have now determined this not to be the case.  Apologies for my
misinformation in v1.

All exception and software interrupt cases end up returning X86EMUL_EXCEPTION.

    case 0xcd: /* int imm8 */
        swint_type = x86_swint_int;
    swint:
        rc = inject_swint(swint_type, (uint8_t)src.val,
                          _regs.eip - ctxt->regs->eip,
                          ctxt, ops) ? : X86EMUL_EXCEPTION;
        goto done;


This is why I introduced the slightly relaxed check:

if ( emul_ctxt.ctxt.event_pending )
    ASSERT(r == X86EMUL_EXCEPTION);

in the earlier patch.

> If it
> can do that then we need to inject the event come what may, because
> any other side-effects will have been committted already. 

Because of the shadow pagetables' use of x86_swint_emulate_none, all
software interrupts have fault semantics, so %eip isn't moved forward;
this is left to hardware on the re-inject path.  There are no other
pieces of state change for any software interrupts.

With the benefit of hindsight, I plan to make all trap injection have
fault semantics out of x86_emulate(), leaving the possible adjustment of
%eip to the lower level vendor functions, as x86_event has an insn_len
field.  In reality, it's only the svm path which needs to adjust %eip,
and only in certain circumstances.
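
For reference, x86_event is roughly the following shape (quoting from
memory, so treat it as a sketch rather than the exact definition):

    struct x86_event {
        int16_t       vector;
        uint8_t       type;       /* X86_EVENTTYPE_* */
        uint8_t       insn_len;   /* Instruction length, for sw events. */
        int32_t       error_code; /* X86_EVENT_NO_EC if n/a. */
        unsigned long cr2;        /* Only for TRAP_page_fault. */
    };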

The odd-case-out is singlestep, which completes writeback then raises an
exception.

>
>> @@ -3475,6 +3503,37 @@ static int sh_page_fault(struct vcpu *v,
>>              {
>>                  perfc_incr(shadow_em_ex_fail);
>>                  TRACE_SHADOW_PATH_FLAG(TRCE_SFLAG_EMULATION_LAST_FAILED);
>> +
>> +                if ( r == X86EMUL_EXCEPTION && emul_ctxt.ctxt.event_pending )
>> +                {
>> +                    /*
>> +                     * This emulation covers writes to shadow pagetables.  We
>> +                     * tolerate #PF (from hitting adjacent pages), #GP/#SS
>> +                     * (from segmentation errors), and #DB (from
>> +                     * singlestepping).  Anything else is an emulation bug, or
>> +                     * a guest playing with the instruction stream under Xen's
>> +                     * feet.
>> +                     */
>> +                    if ( emul_ctxt.ctxt.event.type == X86_EVENTTYPE_HW_EXCEPTION &&
>> +                         (emul_ctxt.ctxt.event.vector < 32) &&
>> +                         ((1u << emul_ctxt.ctxt.event.vector) &
>> +                          ((1u << TRAP_debug) | (1u << TRAP_stack_error) |
>> +                           (1u << TRAP_gp_fault) | (1u << TRAP_page_fault))) )
>> +                    {
>> +                        if ( is_hvm_vcpu(v) )
>> +                            hvm_inject_event(&emul_ctxt.ctxt.event);
>> +                        else
>> +                            pv_inject_event(&emul_ctxt.ctxt.event);
>> +                    }
>> +                    else
>> +                    {
>> +                        if ( is_hvm_vcpu(v) )
>> +                            hvm_inject_hw_exception(TRAP_gp_fault, 0);
>> +                        else
>> +                            pv_inject_hw_exception(TRAP_gp_fault, 0);
>> +                    }
>> +                }
>> +
> This looks like code duplication, but rather than trying to merge the
> two cases, I think we can drop this one entirely.  This emulation is
> optimistically trying to find the second half of a PAE PTE write -
> it's OK just to stop emulating if we hit anything this exciting.
> So we can lose the whole hunk.

At the very least we should retain the singlestep #DB injection, as it
still has trap semantics.

~Andrew


* Re: [PATCH v2 17/19] x86/hvm: Avoid __hvm_copy() raising #PF behind the emulators back
  2016-11-28 14:56   ` Tim Deegan
@ 2016-11-28 16:32     ` Andrew Cooper
  2016-11-28 16:42       ` Tim Deegan
  0 siblings, 1 reply; 57+ messages in thread
From: Andrew Cooper @ 2016-11-28 16:32 UTC (permalink / raw)
  To: Tim Deegan; +Cc: Kevin Tian, Paul Durrant, Jun Nakajima, Jan Beulich, Xen-devel

On 28/11/16 14:56, Tim Deegan wrote:
> At 11:13 +0000 on 28 Nov (1480331614), Andrew Cooper wrote:
>> Drop the call to hvm_inject_page_fault() in __hvm_copy(), and require callers
>> to inject the pagefault themselves.
> This seems like it'd be easy to forget to DTRT with the fault,
> especially in code being ported forward across this series.

Code ported across the series will have an API change to accommodate.

>
> Would it be better to have hvm_copy &c take a callback function
> instead of a pfinfo pointer?

I considered both of these options, but this option seemed cleaner at
the time.

I am not fussed either way, but I don't see that a new function pointer
would be any less easy to get wrong.

~Andrew


* Re: [PATCH v2 17/19] x86/hvm: Avoid __hvm_copy() raising #PF behind the emulators back
  2016-11-28 16:32     ` Andrew Cooper
@ 2016-11-28 16:42       ` Tim Deegan
  0 siblings, 0 replies; 57+ messages in thread
From: Tim Deegan @ 2016-11-28 16:42 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Kevin Tian, Paul Durrant, Jun Nakajima, Jan Beulich, Xen-devel

At 16:32 +0000 on 28 Nov (1480350762), Andrew Cooper wrote:
> On 28/11/16 14:56, Tim Deegan wrote:
> > At 11:13 +0000 on 28 Nov (1480331614), Andrew Cooper wrote:
> >> Drop the call to hvm_inject_page_fault() in __hvm_copy(), and require callers
> >> to inject the pagefault themselves.
> > This seems like it'd be easy to forget to DTRT with the fault,
> > especially in code being ported forward across this series.
> 
> Code ported across the series will have an API change to accommodate.
> 
> >
> > Would it be better to have hvm_copy &c take a callback function
> > instead of a pfinfo pointer?
> 
> I considered both of these options, but this option seemed cleaner at
> the time.
> 
> I am not fussed either way, but I don't see that a new function pointer
> would be any less easy to get wrong.

Righto.  As I said, Ack to the shadow bits anyway.

Tim.


* Re: [PATCH v2 13/19] x86/shadow: Avoid raising faults behind the emulators back
  2016-11-28 16:04     ` Andrew Cooper
@ 2016-11-28 17:21       ` Tim Deegan
  2016-11-28 17:36         ` Andrew Cooper
  0 siblings, 1 reply; 57+ messages in thread
From: Tim Deegan @ 2016-11-28 17:21 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Jan Beulich, Xen-devel

At 16:04 +0000 on 28 Nov (1480349059), Andrew Cooper wrote:
> On 28/11/16 14:49, Tim Deegan wrote:
> > At 11:13 +0000 on 28 Nov (1480331610), Andrew Cooper wrote:
> >> +        /*
> >> +         * This emulation covers writes to shadow pagetables.  We tolerate #PF
> >> +         * (from hitting adjacent pages), #GP/#SS (from segmentation errors),
> >> +         * and #DB (from singlestepping).  Anything else is an emulation bug,
> >> +         * or a guest playing with the instruction stream under Xen's feet.
> >> +         */
> >> +        if ( emul_ctxt.ctxt.event.type == X86_EVENTTYPE_HW_EXCEPTION &&
> >> +             (emul_ctxt.ctxt.event.vector < 32) &&
> >> +             ((1u << emul_ctxt.ctxt.event.vector) &
> >> +              ((1u << TRAP_debug) | (1u << TRAP_stack_error) |
> >> +               (1u << TRAP_gp_fault) | (1u << TRAP_page_fault))) )
> >> +        {
> >> +            if ( is_hvm_vcpu(v) )
> >> +                hvm_inject_event(&emul_ctxt.ctxt.event);
> >> +            else
> >> +                pv_inject_event(&emul_ctxt.ctxt.event);
> >> +        }
> >> +        else
> >> +        {
> >> +            if ( is_hvm_vcpu(v) )
> >> +                hvm_inject_hw_exception(TRAP_gp_fault, 0);
> >> +            else
> >> +                pv_inject_hw_exception(TRAP_gp_fault, 0);
> >> +        }
> > I don't think it's OK to lob #GP into a HVM guest here -- we can hit
> > this path even if the guest isn't behaving strangely.
> 
> What circumstances are you thinking of?
> 
> Unless the guest is playing with the instruction stream,  event_pending
> will only be set by the set in the paths altered by this patch, which is
> the four whitelisted vectors.

I don't see any path through the emulator that both writes to memory
and raises a different exception to these four, so I retract that -
the guest does have to be acting strangely.  Likely either modifying
code while another CPU is running on it or executing from a page whose
PT mappings are changing (including changing a text mapping and then
executing from it without flushing TLB first).

But even if the guest is doing something demented with multithreaded
self-modifying code, I think we're better off abandoning ship here
and retrying the instruction than raising a spurious #GP.  HVM
guests don't have any contract with us that lets us #GP them here.
And we can hit this path for writes to non-pagetables, if we haven't
unshadowed an ex-pagetable yet.

I don't know of any kernel-space code that behaves quite so loosely
with its mappings, but obfuscation tricks to stop reverse engineering
can do some pretty strange things.

> > I think it would be better to run this check before the X86EMUL_UNHANDLEABLE one
> > and convert injections that we choose not to handle into
> > X86EMUL_UNHANDLEABLE.
> >
> > Which I guess brings us back full circle to the behaviour we had
> > before, and perhaps the right change to the earlier patch is to
> > start down this road, with an explanatory comment.
> >
> > e.g. start with something like this, immediately after the first call
> > to x86_emulate():
> >
> >     /*
> >      * Events raised within the emulator itself used to return
> >      * X86EMUL_UNHANDLEABLE because we didn't supply injection
> >      * callbacks.  Now the emulator supplies those to us via
> >      * ctxt.event_pending instead.  Preserve the old behaviour
> >      * for now.
> >      */
> >      if (emul_ctxt.ctxt.event_pending)
> >          r = X86EMUL_UNHANDLEABLE;
> >
> > And in this patch, replace it with the hunk you have above, but
> > setting r = X86EMUL_UNHANDLEABLE instead of injecting #GP.
> >
> > Does that make sense?
> 
> It does make sense, but that goes against the comment of not unshadowing
> on exception.

Yep, that comment would need to be updated.  The final behaviour is
something like what we had before: some kinds of event (#PF in
particular) get injected, and for the others we unshadow and retry.
You've expanded the set of events that we'll inject, and the mechanism has
moved around.

> A lot of this revolves around how likely we are to hit the #GP[0] case. 
> I assert that we shouldn't be able to hit it unless the guest is playing
> games, or we have a bug in emulation.
> 
> If we don't go throwing a #GP back, we should at least leave something
> obvious in the log.
> 
> >
> > Also, I'm a little confused after all this as to whether the emulator
> > can still return with X86EMUL_OKAY and the event_pending set.
> 
> I have now determined this not to be the case.  Apologies for my
> misinformation in v1.

Phew!

> > This looks like code duplication, but rather than trying to merge the
> > two cases, I think we can drop this one entirely.  This emulation is
> > optimistically trying to find the second half of a PAE PTE write -
> > it's OK just to stop emulating if we hit anything this exciting.
> > So we can lose the whole hunk.
> 
> At the very least we should retain the singlestep #DB injection, as it
> still has trap semantics.

Argh!  Meaning it returns X86EMUL_EXCEPTION but has already updated
register state?  So yeah, we have to inject that.  But it can go away
when you change everything to have fault semantics, right?

Tim.


* Re: [PATCH v2 13/19] x86/shadow: Avoid raising faults behind the emulators back
  2016-11-28 17:21       ` Tim Deegan
@ 2016-11-28 17:36         ` Andrew Cooper
  0 siblings, 0 replies; 57+ messages in thread
From: Andrew Cooper @ 2016-11-28 17:36 UTC (permalink / raw)
  To: Tim Deegan; +Cc: Jan Beulich, Xen-devel

On 28/11/16 17:21, Tim Deegan wrote:
>>> I think it would be better to run this check before the X86EMUL_UNHANDLEABLE one
>>> and convert injections that we choose not to handle into
>>> X86EMUL_UNHANDLEABLE.
>>>
>>> Which I guess brings us back full circle to the behaviour we had
>>> before, and perhaps the right change to the earlier patch is to
>>> start down this road, with an explanatory comment.
>>>
>>> e.g. start with something like this, immediately after the first call
>>> to x86_emulate():
>>>
>>>     /*
>>>      * Events raised within the emulator itself used to return
>>>      * X86EMUL_UNHANDLEABLE because we didn't supply injection
>>>      * callbacks.  Now the emulator supplies those to us via
>>>      * ctxt.event_pending instead.  Preserve the old behaviour
>>>      * for now.
>>>      */
>>>      if (emul_ctxt.ctxt.event_pending)
>>>          r = X86EMUL_UNHANDLEABLE;
>>>
>>> And in this patch, replace it with the hunk you have above, but
>>> setting r = X86EMUL_UNHANDLEABLE instead of injecting #GP.
>>>
>>> Does that make sense?
>> It does make sense, but that goes against the comment of not unshadowing
>> on exception.
> Yep, that comment would need to be updated.  The final behaviour is
> something like what we had before: some kinds of event (#PF in
> particular) get injected, and for the others we unshadow and retry.
> You've expanded the set of events that we'll inject, and the mechanism has
> moved around.

Ok.  I will drop the #GP's and cause this to fall into the unhandleable
path.

>
>> A lot of this revolves around how likely we are to hit the #GP[0] case. 
>> I assert that we shouldn't be able to hit it unless the guest is playing
>> games, or we have a bug in emulation.
>>
>> If we don't go throwing a #GP back, we should at least leave something
>> obvious in the log.
>>
>>> Also, I'm a little confused after all this as to whether the emulator
>>> can still return with X86EMUL_OKAY and the event_pending set.
>> I have now determined this not to be the case.  Apologies for my
>> misinformation in v1.
> Phew!

Yes.  Despite being the last person to fix this several times, it is
very opaque code.

>
>>> This looks like code duplication, but rather than trying to merge the
>>> two cases, I think we can drop this one entirely.  This emulation is
>>> optimistically trying to find the second half of a PAE PTE write -
>>> it's OK just to stop emulating if we hit anything this exciting.
>>> So we can lose the whole hunk.
>> At the very least we should retain the singlestep #DB injection, as it
>> still has trap semantics.
> Argh!  Meaning it returns X86EMUL_EXCEPTION but has already updated
> register state?

Yes

> So yeah, we have to inject that.  But it can go away
> when you change everything to have fault semantics, right?

No.  Singlestep comes out as a hardware exception, not a software interrupt.

Implemented in this way, it is always going to be a special case; we
must manually raise the singlestep event if necessary, as hardware won't
do it automatically on re-entry to the guest.
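
Roughly the following shape at the end of the emulation path (a sketch
only; the exact placement and helpers are to be determined):

    /* Singlestep was requested across the emulated instruction? */
    if ( r == X86EMUL_OKAY && (regs->eflags & X86_EFLAGS_TF) )
    {
        if ( is_hvm_vcpu(v) )
            hvm_inject_hw_exception(TRAP_debug, X86_EVENT_NO_EC);
        else
            pv_inject_hw_exception(TRAP_debug, X86_EVENT_NO_EC);
    }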

~Andrew


* Re: [PATCH v2 17/19] x86/hvm: Avoid __hvm_copy() raising #PF behind the emulators back
  2016-11-28 11:13 ` [PATCH v2 17/19] x86/hvm: Avoid __hvm_copy() raising #PF behind the emulators back Andrew Cooper
  2016-11-28 11:56   ` Paul Durrant
  2016-11-28 14:56   ` Tim Deegan
@ 2016-11-29  1:22   ` Tian, Kevin
  2016-11-29 16:24   ` Jan Beulich
  3 siblings, 0 replies; 57+ messages in thread
From: Tian, Kevin @ 2016-11-29  1:22 UTC (permalink / raw)
  To: Andrew Cooper, Xen-devel
  Cc: Paul Durrant, Tim Deegan, Nakajima, Jun, Jan Beulich

> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> Sent: Monday, November 28, 2016 7:14 PM
> 
> Drop the call to hvm_inject_page_fault() in __hvm_copy(), and require callers
> to inject the pagefault themselves.
> 
> No functional change.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Acked-by: Kevin Tian <kevin.tian@intel.com>


* Re: [PATCH v2 01/19] x86/shadow: Fix #PFs from emulated writes crossing a page boundary
  2016-11-28 11:13 ` [PATCH v2 01/19] x86/shadow: Fix #PFs from emulated writes crossing a page boundary Andrew Cooper
  2016-11-28 11:55   ` Tim Deegan
@ 2016-11-29 15:24   ` Jan Beulich
  1 sibling, 0 replies; 57+ messages in thread
From: Jan Beulich @ 2016-11-29 15:24 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Tim Deegan, Xen-devel

>>> On 28.11.16 at 12:13, <andrew.cooper3@citrix.com> wrote:
> When translating the second frame of a write crossing a page boundary, mask
> the linear address down to the page boundary.
> 
> This causes the correct %cr2 to be reported to the guest when the second
> frame suffers a pagefault during translation.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
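
The idea, sketched rather than quoted (second_mfn is a placeholder, and
emulate_gva_to_mfn() is assumed to be the existing translation helper):
translate the second page at its page-aligned start, so a failure there
reports a %cr2 within that page:

    /*
     * Second page of a write straddling a boundary: translate it at the
     * page boundary rather than at the original linear address.
     */
    second_mfn = emulate_gva_to_mfn(v, (vaddr + bytes - 1) & PAGE_MASK,
                                    sh_ctxt);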

Reviewed-by: Jan Beulich <jbeulich@suse.com>



* Re: [PATCH v2 02/19] x86/emul: Drop X86EMUL_CMPXCHG_FAILED
  2016-11-28 11:13 ` [PATCH v2 02/19] x86/emul: Drop X86EMUL_CMPXCHG_FAILED Andrew Cooper
  2016-11-28 11:55   ` Tim Deegan
@ 2016-11-29 15:29   ` Jan Beulich
  1 sibling, 0 replies; 57+ messages in thread
From: Jan Beulich @ 2016-11-29 15:29 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Tim Deegan, Xen-devel

>>> On 28.11.16 at 12:13, <andrew.cooper3@citrix.com> wrote:
> X86EMUL_CMPXCHG_FAILED was introduced in c/s d430aae25 in 2005.  Even at the
> time it aliased what is now X86EMUL_RETRY (as well as what is now
> X86EMUL_EXCEPTION).  I am not sure why the distinction was considered useful
> at the time.

I have always guessed that this is so one could make them have
distinct values if need be, since "cmpxchg failure" does not
necessarily mean "retry" to all possible callers. So I'm not fully
convinced this is a good move, but I'm also not worried enough
to really object.
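
For context, the overlap being discussed is roughly the following
(illustrative values only, not an exact quote of x86_emulate.h):

    #define X86EMUL_OKAY            0
    #define X86EMUL_UNHANDLEABLE    1
    #define X86EMUL_EXCEPTION       2
    #define X86EMUL_RETRY           3
    /* Same numeric value as X86EMUL_RETRY; removed by this patch. */
    #define X86EMUL_CMPXCHG_FAILED  3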

> It is only used twice; there is no need to call it out differently from other
> uses of X86EMUL_RETRY.
> 
> No functional change.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Acked-by: Jan Beulich <jbeulich@suse.com>



* Re: [PATCH v2 06/19] x86/pv: Implement pv_inject_{event, page_fault, hw_exception}()
  2016-11-28 11:13 ` [PATCH v2 06/19] x86/pv: Implement pv_inject_{event, page_fault, hw_exception}() Andrew Cooper
  2016-11-28 11:58   ` Tim Deegan
@ 2016-11-29 16:00   ` Jan Beulich
  2016-11-29 16:50     ` Andrew Cooper
  1 sibling, 1 reply; 57+ messages in thread
From: Jan Beulich @ 2016-11-29 16:00 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Tim Deegan, Xen-devel

>>> On 28.11.16 at 12:13, <andrew.cooper3@citrix.com> wrote:
> The existing propagate_page_fault() is fairly similar to
> pv_inject_page_fault(), although it has a return value.  Only a single caller
> makes use of the return value, and non-NULL is only returned if the passed cr2
> is non-canonical.  Opencode this single case in
> handle_gdt_ldt_mapping_fault(), allowing propagate_page_fault() to become
> void.

I can only say that back then it was quite intentional to not open
code this in the caller, no matter that it was (and still is) just one.

>      if ( unlikely(null_trap_bounce(v, tb)) )
> -        gprintk(XENLOG_WARNING,
> -                "Unhandled %s fault/trap [#%d, ec=%04x]\n",
> -                trapstr(trapnr), trapnr, regs->error_code);
> +    {
> +        if ( vector == TRAP_page_fault )
> +        {
> +            printk("%pv: unhandled page fault (ec=%04X)\n", v, error_code);
> +            show_page_walk(event->cr2);
> +
> +            if ( unlikely(error_code & PFEC_reserved_bit) )
> +                reserved_bit_page_fault(event->cr2, regs);

I think you want to move the show_page_walk() into an else here,
to avoid logging two of them. But then again - why do you move
this behind the null_trap_bounce() check? It had been logged
independently in the original code, and for (imo) a good reason
(reserved bit faults are always a sign of a hypervisor problem
after all).
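
I.e. something like this (a sketch against the quoted hunk;
reserved_bit_page_fault() already produces a page walk of its own):

    if ( unlikely(error_code & PFEC_reserved_bit) )
        reserved_bit_page_fault(event->cr2, regs);
    else
        show_page_walk(event->cr2);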

> +        }
> +        else
> +            gprintk(XENLOG_WARNING,
> +                    "Unhandled %s fault/trap [#%d, ec=%04x]\n",
> +                    trapstr(vector), vector, error_code);

Which tells us that we need to finally get our log level handling
straightened, to avoid inconsistencies like the one here comparing
with the code a few lines up.

> +static inline void do_guest_trap(unsigned int trapnr,
> +                                 const struct cpu_user_regs *regs)
> +{
> +    const struct x86_event event = {

I don't mind the const, but I don't think it's very useful here.

Jan



* Re: [PATCH v2 11/19] x86/emul: Avoid raising faults behind the emulators back
  2016-11-28 11:13 ` [PATCH v2 11/19] x86/emul: Avoid raising faults behind the emulators back Andrew Cooper
  2016-11-28 12:47   ` Paul Durrant
@ 2016-11-29 16:02   ` Jan Beulich
  1 sibling, 0 replies; 57+ messages in thread
From: Jan Beulich @ 2016-11-29 16:02 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Paul Durrant, Xen-devel

>>> On 28.11.16 at 12:13, <andrew.cooper3@citrix.com> wrote:
> Introduce a new x86_emul_pagefault() similar to x86_emul_hw_exception(), and
> use this instead of hvm_inject_page_fault() from emulation codepaths.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
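
For readers following along, the shape of such a helper is roughly the
following (a sketch, not the exact declaration):

    static inline void x86_emul_pagefault(
        int error_code, unsigned long cr2, struct x86_emulate_ctxt *ctxt)
    {
        ASSERT(!ctxt->event_pending);

        /* Record the fault for the caller, rather than injecting it. */
        ctxt->event.vector = 14; /* #PF */
        ctxt->event.type = X86_EVENTTYPE_HW_EXCEPTION;
        ctxt->event.error_code = error_code;
        ctxt->event.cr2 = cr2;

        ctxt->event_pending = true;
    }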

Reviewed-by: Jan Beulich <jbeulich@suse.com>



* Re: [PATCH v2 17/19] x86/hvm: Avoid __hvm_copy() raising #PF behind the emulators back
  2016-11-28 11:13 ` [PATCH v2 17/19] x86/hvm: Avoid __hvm_copy() raising #PF behind the emulators back Andrew Cooper
                     ` (2 preceding siblings ...)
  2016-11-29  1:22   ` Tian, Kevin
@ 2016-11-29 16:24   ` Jan Beulich
  2016-11-29 16:30     ` Andrew Cooper
  3 siblings, 1 reply; 57+ messages in thread
From: Jan Beulich @ 2016-11-29 16:24 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Kevin Tian, Paul Durrant, Tim Deegan, Jun Nakajima, Xen-devel

>>> On 28.11.16 at 12:13, <andrew.cooper3@citrix.com> wrote:
> @@ -3012,7 +3018,10 @@ void hvm_task_switch(
>                                        &tss.back_link, sizeof(tss.back_link), 0,
>                                        &pfinfo);
>          if ( rc == HVMCOPY_bad_gva_to_gfn )
> +        {
> +            hvm_inject_page_fault(pfinfo.ec, pfinfo.linear);
>              exn_raised = 1;
> +        }
>          else if ( rc != HVMCOPY_okay )
>              goto out;
>      }

There's another one a few lines down from here (storing the error
code on the stack). Or did an earlier patch replace this, and I didn't
notice?

Jan



* Re: [PATCH v2 17/19] x86/hvm: Avoid __hvm_copy() raising #PF behind the emulators back
  2016-11-29 16:24   ` Jan Beulich
@ 2016-11-29 16:30     ` Andrew Cooper
  2016-11-29 16:36       ` Jan Beulich
  0 siblings, 1 reply; 57+ messages in thread
From: Andrew Cooper @ 2016-11-29 16:30 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Kevin Tian, Paul Durrant, Tim Deegan, Jun Nakajima, Xen-devel

On 29/11/16 16:24, Jan Beulich wrote:
>>>> On 28.11.16 at 12:13, <andrew.cooper3@citrix.com> wrote:
>> @@ -3012,7 +3018,10 @@ void hvm_task_switch(
>>                                        &tss.back_link, sizeof(tss.back_link), 0,
>>                                        &pfinfo);
>>          if ( rc == HVMCOPY_bad_gva_to_gfn )
>> +        {
>> +            hvm_inject_page_fault(pfinfo.ec, pfinfo.linear);
>>              exn_raised = 1;
>> +        }
>>          else if ( rc != HVMCOPY_okay )
>>              goto out;
>>      }
> There's another one a few lines down from here (storing the error
> code on the stack). Or did an earlier patch replace this, and I didn't
> notice?

Ah - that was a rebasing error on my behalf.  Before your TSS
adjustments, that was a no-fault call.

I will fix it up.

~Andrew


* Re: [PATCH v2 17/19] x86/hvm: Avoid __hvm_copy() raising #PF behind the emulators back
  2016-11-29 16:30     ` Andrew Cooper
@ 2016-11-29 16:36       ` Jan Beulich
  2016-11-29 16:38         ` Andrew Cooper
  0 siblings, 1 reply; 57+ messages in thread
From: Jan Beulich @ 2016-11-29 16:36 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Kevin Tian, Paul Durrant, Tim Deegan, Jun Nakajima, Xen-devel

>>> On 29.11.16 at 17:30, <andrew.cooper3@citrix.com> wrote:
> On 29/11/16 16:24, Jan Beulich wrote:
>>>>> On 28.11.16 at 12:13, <andrew.cooper3@citrix.com> wrote:
>>> @@ -3012,7 +3018,10 @@ void hvm_task_switch(
>>>                                        &tss.back_link, sizeof(tss.back_link), 0,
>>>                                        &pfinfo);
>>>          if ( rc == HVMCOPY_bad_gva_to_gfn )
>>> +        {
>>> +            hvm_inject_page_fault(pfinfo.ec, pfinfo.linear);
>>>              exn_raised = 1;
>>> +        }
>>>          else if ( rc != HVMCOPY_okay )
>>>              goto out;
>>>      }
>> There's another one a few lines down from here (storing the error
>> code on the stack). Or did an earlier patch replace this, and I didn't
>> notice?
> 
> Ah - that was a rebasing error on my behalf.  Before your TSS
> adjustments, that was a no-fault call.
> 
> I will fix it up.

Okay, assuming it'll follow the same model, feel free to put my R-b
then on the result.

Jan



* Re: [PATCH v2 17/19] x86/hvm: Avoid __hvm_copy() raising #PF behind the emulators back
  2016-11-29 16:36       ` Jan Beulich
@ 2016-11-29 16:38         ` Andrew Cooper
  0 siblings, 0 replies; 57+ messages in thread
From: Andrew Cooper @ 2016-11-29 16:38 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Kevin Tian, Paul Durrant, Tim Deegan, Jun Nakajima, Xen-devel

On 29/11/16 16:36, Jan Beulich wrote:
>>>> On 29.11.16 at 17:30, <andrew.cooper3@citrix.com> wrote:
>> On 29/11/16 16:24, Jan Beulich wrote:
>>>>>> On 28.11.16 at 12:13, <andrew.cooper3@citrix.com> wrote:
>>>> @@ -3012,7 +3018,10 @@ void hvm_task_switch(
>>>>                                        &tss.back_link, sizeof(tss.back_link), 0,
>>>>                                        &pfinfo);
>>>>          if ( rc == HVMCOPY_bad_gva_to_gfn )
>>>> +        {
>>>> +            hvm_inject_page_fault(pfinfo.ec, pfinfo.linear);
>>>>              exn_raised = 1;
>>>> +        }
>>>>          else if ( rc != HVMCOPY_okay )
>>>>              goto out;
>>>>      }
>>> There's another one a few lines down from here (storing the error
>>> code on the stack). Or did an earlier patch replace this, and I didn't
>>> notice?
>> Ah - that was a rebasing error on my behalf.  Before your TSS
>> adjustments, that was a no-fault call.
>>
>> I will fix it up.
> Okay, assuming it'll follow the same model,

Yes.  The new hunk is identical to this quoted hunk (other than
indentation).

> feel free to put my R-b then on the result.

Thanks.

~Andrew


* Re: [PATCH v2 06/19] x86/pv: Implement pv_inject_{event, page_fault, hw_exception}()
  2016-11-29 16:00   ` Jan Beulich
@ 2016-11-29 16:50     ` Andrew Cooper
  2016-11-30  8:41       ` Jan Beulich
  0 siblings, 1 reply; 57+ messages in thread
From: Andrew Cooper @ 2016-11-29 16:50 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Tim Deegan, Xen-devel

On 29/11/16 16:00, Jan Beulich wrote:
>>>> On 28.11.16 at 12:13, <andrew.cooper3@citrix.com> wrote:
>> The existing propagate_page_fault() is fairly similar to
>> pv_inject_page_fault(), although it has a return value.  Only a single caller
>> makes use of the return value, and non-NULL is only returned if the passed cr2
>> is non-canonical.  Opencode this single case in
>> handle_gdt_ldt_mapping_fault(), allowing propagate_page_fault() to become
>> void.
> I can only say that back then it was quite intentional to not open
> code this in the caller, no matter that it was (and still is) just one.

Is that an objection to me making this change?

>
>>      if ( unlikely(null_trap_bounce(v, tb)) )
>> -        gprintk(XENLOG_WARNING,
>> -                "Unhandled %s fault/trap [#%d, ec=%04x]\n",
>> -                trapstr(trapnr), trapnr, regs->error_code);
>> +    {
>> +        if ( vector == TRAP_page_fault )
>> +        {
>> +            printk("%pv: unhandled page fault (ec=%04X)\n", v, error_code);
>> +            show_page_walk(event->cr2);
>> +
>> +            if ( unlikely(error_code & PFEC_reserved_bit) )
>> +                reserved_bit_page_fault(event->cr2, regs);
> I think you want to move the show_page_walk() into an else here,
> to avoid logging two of them. But then again - why do you move
> this behind the null_trap_bounce() check? It had been logged
> independently in the original code, and for (imo) a good reason
> (reserved bit faults are always a sign of a hypervisor problem
> after all).

TBH, I found it odd that it was in propagate_page_fault() to start with.

It is the kind of thing which should be in the pagefault handler itself,
not in the reinjection-to-pv-guests code.

Would moving it into fixup_page_fault() be ok?  It should probably go
between the hap/shadow and memadd checks, as shadow guests can have
reserved bits set.

>
>> +        }
>> +        else
>> +            gprintk(XENLOG_WARNING,
>> +                    "Unhandled %s fault/trap [#%d, ec=%04x]\n",
>> +                    trapstr(vector), vector, error_code);
> Which tells us that we need to finally get our log level handling
> straightened, to avoid inconsistencies like the one here comparing
> with the code a few lines up.

Yes.  I chose not to unpick that swamp right now, but if
reserved_bit_page_fault() gets moved out, this bit can be simplified to
this single gprintk().

~Andrew


* Re: [PATCH v2 06/19] x86/pv: Implement pv_inject_{event, page_fault, hw_exception}()
  2016-11-29 16:50     ` Andrew Cooper
@ 2016-11-30  8:41       ` Jan Beulich
  2016-11-30 13:17         ` Andrew Cooper
  0 siblings, 1 reply; 57+ messages in thread
From: Jan Beulich @ 2016-11-30  8:41 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Tim Deegan, Xen-devel

>>> On 29.11.16 at 17:50, <andrew.cooper3@citrix.com> wrote:
> On 29/11/16 16:00, Jan Beulich wrote:
>>>>> On 28.11.16 at 12:13, <andrew.cooper3@citrix.com> wrote:
>>> The existing propagate_page_fault() is fairly similar to
>>> pv_inject_page_fault(), although it has a return value.  Only a single caller
>>> makes use of the return value, and non-NULL is only returned if the passed cr2
>>> is non-canonical.  Opencode this single case in
>>> handle_gdt_ldt_mapping_fault(), allowing propagate_page_fault() to become
>>> void.
>> I can only say that back then it was quite intentional to not open
>> code this in the caller, no matter that it was (and still is) just one.
> 
> Is that an objection to me making this change?

Well, no, not really. It's more like "I'd prefer it to stay as is, but I can
see why you want to change it, and not changing it would likely make
your overall modification harder".

>>>      if ( unlikely(null_trap_bounce(v, tb)) )
>>> -        gprintk(XENLOG_WARNING,
>>> -                "Unhandled %s fault/trap [#%d, ec=%04x]\n",
>>> -                trapstr(trapnr), trapnr, regs->error_code);
>>> +    {
>>> +        if ( vector == TRAP_page_fault )
>>> +        {
>>> +            printk("%pv: unhandled page fault (ec=%04X)\n", v, error_code);
>>> +            show_page_walk(event->cr2);
>>> +
>>> +            if ( unlikely(error_code & PFEC_reserved_bit) )
>>> +                reserved_bit_page_fault(event->cr2, regs);
>> I think you want to move the show_page_walk() into an else here,
>> to avoid logging two of them. But then again - why do you move
>> this behind the null_trap_bounce() check? It had been logged
>> independently in the original code, and for (imo) a good reason
>> (reserved bit faults are always a sign of a hypervisor problem
>> after all).
> 
> TBH, I found it odd that it was in propagate_page_fault() to start with.
> 
> It is the kind of thing which should be in the pagefault handler itself,
> not in the reinjection-to-pv-guests code.
> 
> Would moving it into fixup_page_fault() be ok?  It should probably go
> between the hap/shadow and memadd checks, as shadow guests can have
> reserved bits set.

That would move it ahead in the flow quite a bit, which I'm not sure
is a good idea (due to possible [hypothetical] other uses of reserved
bits). Also note that there already is a call to reserved_bit_page_fault()
in the !guest_mode() case, so if anything I would see it moved right
ahead of the pv_inject_page_fault() invocation from do_page_fault().
(This would then shrink patch size too, as you wouldn't have to move
around reserved_bit_page_fault() itself.)
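
I.e. roughly this, in do_page_fault() (a sketch assuming the existing
variable names there):

    if ( unlikely(regs->error_code & PFEC_reserved_bit) )
        reserved_bit_page_fault(addr, regs);

    pv_inject_page_fault(regs->error_code, addr);
    return;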

Btw, looking at the full patch again I notice that the error code
parameter to pv_inject_page_fault() is signed, which is contrary to
what I think I recall you saying on the HVM side of things (the
error code being non-optional here, and hence better being unsigned,
as X86_EVENT_NO_EC is not allowed).

Jan



* Re: [PATCH v2 06/19] x86/pv: Implement pv_inject_{event, page_fault, hw_exception}()
  2016-11-30  8:41       ` Jan Beulich
@ 2016-11-30 13:17         ` Andrew Cooper
  0 siblings, 0 replies; 57+ messages in thread
From: Andrew Cooper @ 2016-11-30 13:17 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Tim Deegan, Xen-devel

On 30/11/16 08:41, Jan Beulich wrote:
>>>>      if ( unlikely(null_trap_bounce(v, tb)) )
>>>> -        gprintk(XENLOG_WARNING,
>>>> -                "Unhandled %s fault/trap [#%d, ec=%04x]\n",
>>>> -                trapstr(trapnr), trapnr, regs->error_code);
>>>> +    {
>>>> +        if ( vector == TRAP_page_fault )
>>>> +        {
>>>> +            printk("%pv: unhandled page fault (ec=%04X)\n", v, error_code);
>>>> +            show_page_walk(event->cr2);
>>>> +
>>>> +            if ( unlikely(error_code & PFEC_reserved_bit) )
>>>> +                reserved_bit_page_fault(event->cr2, regs);
>>> I think you want to move the show_page_walk() into an else here,
>>> to avoid logging two of them. But then again - why do you move
>>> this behind the null_trap_bounce() check? It had been logged
>>> independently in the original code, and for (imo) a good reason
>>> (reserved bit faults are always a sign of a hypervisor problem
>>> after all).
>> TBH, I found it odd that it was in propagate_page_fault() to start with.
>>
>> It is the kind of thing which should be in the pagefault handler itself,
>> not in the reinjection-to-pv-guests code.
>>
>> Would moving it into fixup_page_fault() be ok?  It should probably go
>> between the hap/shadow and memadd checks, as shadow guests can have
>> reserved bits set.
> That would move it ahead in the flow quite a bit, which I'm not sure
> is a good idea (due to possible [hypothetical] other uses of reserved
> bits). Also note that there already is a call to reserved_bit_page_fault()
> in the !guest_mode() case, so if anything I would see it moved right
> ahead of the pv_inject_page_fault() invocation from do_page_fault().
> (This would then shrink patch size too, as you wouldn't have to move
> around reserved_bit_page_fault() itself.)

Done.  This looks much cleaner.

>
> Btw, looking at the full patch again I notice that the error code
> parameter to pv_inject_page_fault() is signed, which is contrary to
> what I think I recall you saying on the HVM side of things (the
> error code being non-optional here, and hence better being unsigned,
> as X86_EVENT_NO_EC is not allowed).

hvm_inject_page_fault() has always had a signed error code, and the API
is maintained by x86_emul_* and pv_inject_* for consistency.

struct pfinfo currently has an unsigned ec field, because all the
existing code uses uint32_t pfec.  In reality, this isn't a problem at
the pfinfo / *_inject_page_fault() boundary.
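
(For reference, the structure in question looks roughly like this -- a
sketch, with the signedness of ec being exactly the open point:)

    typedef struct pagefault_info
    {
        unsigned long linear;   /* faulting linear address */
        unsigned int ec;        /* pagefault error code */
    } pagefault_info_t;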

I have a number of patches focusing on pagefault, and in particular
trying to clean up a lot of misuse of pfec.

~Andrew


end of thread, other threads:[~2016-11-30 13:17 UTC | newest]

Thread overview: 57+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-11-28 11:13 [PATCH for-4.9 v2 00/19] XSA-191 followup Andrew Cooper
2016-11-28 11:13 ` [PATCH v2 01/19] x86/shadow: Fix #PFs from emulated writes crossing a page boundary Andrew Cooper
2016-11-28 11:55   ` Tim Deegan
2016-11-29 15:24   ` Jan Beulich
2016-11-28 11:13 ` [PATCH v2 02/19] x86/emul: Drop X86EMUL_CMPXCHG_FAILED Andrew Cooper
2016-11-28 11:55   ` Tim Deegan
2016-11-29 15:29   ` Jan Beulich
2016-11-28 11:13 ` [PATCH v2 03/19] x86/emul: Simplfy emulation state setup Andrew Cooper
2016-11-28 11:58   ` Paul Durrant
2016-11-28 12:54   ` Paul Durrant
2016-11-28 11:13 ` [PATCH v2 04/19] x86/emul: Rename hvm_trap to x86_event and move it into the emulation infrastructure Andrew Cooper
2016-11-28 11:13 ` [PATCH v2 05/19] x86/emul: Rename HVM_DELIVER_NO_ERROR_CODE to X86_EVENT_NO_EC Andrew Cooper
2016-11-28 11:13 ` [PATCH v2 06/19] x86/pv: Implement pv_inject_{event, page_fault, hw_exception}() Andrew Cooper
2016-11-28 11:58   ` Tim Deegan
2016-11-28 11:59     ` Andrew Cooper
2016-11-29 16:00   ` Jan Beulich
2016-11-29 16:50     ` Andrew Cooper
2016-11-30  8:41       ` Jan Beulich
2016-11-30 13:17         ` Andrew Cooper
2016-11-28 11:13 ` [PATCH v2 07/19] x86/emul: Remove opencoded exception generation Andrew Cooper
2016-11-28 11:13 ` [PATCH v2 08/19] x86/emul: Rework emulator event injection Andrew Cooper
2016-11-28 12:04   ` Tim Deegan
2016-11-28 12:48     ` Andrew Cooper
2016-11-28 14:24       ` Tim Deegan
2016-11-28 14:34         ` Andrew Cooper
2016-11-28 11:13 ` [PATCH v2 09/19] x86/vmx: Use hvm_{get, set}_segment_register() rather than vmx_{get, set}_segment_register() Andrew Cooper
2016-11-28 11:13 ` [PATCH v2 10/19] x86/hvm: Reposition the modification of raw segment data from the VMCB/VMCS Andrew Cooper
2016-11-28 14:18   ` Boris Ostrovsky
2016-11-28 11:13 ` [PATCH v2 11/19] x86/emul: Avoid raising faults behind the emulators back Andrew Cooper
2016-11-28 12:47   ` Paul Durrant
2016-11-29 16:02   ` Jan Beulich
2016-11-28 11:13 ` [PATCH v2 12/19] x86/pv: " Andrew Cooper
2016-11-28 11:13 ` [PATCH v2 13/19] x86/shadow: " Andrew Cooper
2016-11-28 14:49   ` Tim Deegan
2016-11-28 16:04     ` Andrew Cooper
2016-11-28 17:21       ` Tim Deegan
2016-11-28 17:36         ` Andrew Cooper
2016-11-28 11:13 ` [PATCH v2 14/19] x86/hvm: Extend the hvm_copy_*() API with a pagefault_info pointer Andrew Cooper
2016-11-28 11:13 ` [PATCH v2 15/19] x86/hvm: Reimplement hvm_copy_*_nofault() in terms of no pagefault_info Andrew Cooper
2016-11-28 12:56   ` Paul Durrant
2016-11-28 11:13 ` [PATCH v2 16/19] x86/hvm: Rename hvm_copy_*_guest_virt() to hvm_copy_*_guest_linear() Andrew Cooper
2016-11-28 11:59   ` Paul Durrant
2016-11-28 11:13 ` [PATCH v2 17/19] x86/hvm: Avoid __hvm_copy() raising #PF behind the emulators back Andrew Cooper
2016-11-28 11:56   ` Paul Durrant
2016-11-28 12:58     ` Andrew Cooper
2016-11-28 13:01       ` Paul Durrant
2016-11-28 13:03         ` Andrew Cooper
2016-11-28 14:56   ` Tim Deegan
2016-11-28 16:32     ` Andrew Cooper
2016-11-28 16:42       ` Tim Deegan
2016-11-29  1:22   ` Tian, Kevin
2016-11-29 16:24   ` Jan Beulich
2016-11-29 16:30     ` Andrew Cooper
2016-11-29 16:36       ` Jan Beulich
2016-11-29 16:38         ` Andrew Cooper
2016-11-28 11:13 ` [PATCH v2 18/19] x86/hvm: Prepare to allow use of system segments for memory references Andrew Cooper
2016-11-28 11:13 ` [PATCH v2 19/19] x86/hvm: Use system-segment relative memory accesses Andrew Cooper
