* [PATCH v2 00/18] Refactor x86 trap handling code
@ 2017-05-05 14:48 Wei Liu
  2017-05-05 14:48 ` [PATCH v2 01/18] x86/traps: factor out pv_percpu_traps_init Wei Liu
                   ` (18 more replies)
  0 siblings, 19 replies; 30+ messages in thread
From: Wei Liu @ 2017-05-05 14:48 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich

This series splits the PV code related to trap handling into files under the
pv/ directory.

The patches refactoring the various entry.S files have been dropped from this
version, as Andrew is going to work on those.

   git://xenbits.xen.org/people/liuw/xen.git wip.move-traps-v2

Wei Liu (18):
  x86/traps: factor out pv_percpu_traps_init
  x86/traps: export trapstr
  x86/traps: lift do_guest_trap to domain.h
  x86/traps: move all PV emulation code to pv/emulate_ops.c
  x86/pv: clean up emulate_ops.c
  x86/traps: move PV hypercall handlers to pv/traps.c
  x86/traps: move pv_inject_event to pv/traps.c
  x86/traps: move set_guest_{machinecheck,nmi}_trapbounce
  x86/traps: move {un,}register_guest_nmi_callback
  x86/traps: declare percpu softirq_trap
  x86/traps: move guest_has_trap_callback to pv/traps.c
  x86/traps: move send_guest_trap to pv/traps.c
  x86/traps: move PV specific code in x86_64/traps.c
  x86/traps: merge x86_64/compat/traps.c into pv/traps.c
  x86: clean up pv/traps.c
  x86: guest_has_trap_callback should return bool
  x86: fix coding style issues in asm-x86/traps.h
  x86: clean up traps.c

 xen/arch/x86/pv/Makefile           |    1 +
 xen/arch/x86/pv/emulate_ops.c      | 1931 ++++++++++++++++++++++++++++
 xen/arch/x86/pv/traps.c            | 1062 +++++++++++++++
 xen/arch/x86/traps.c               | 2500 +++---------------------------------
 xen/arch/x86/x86_64/compat/traps.c |  415 ------
 xen/arch/x86/x86_64/traps.c        |  362 +-----
 xen/include/asm-x86/domain.h       |   12 +
 xen/include/asm-x86/pv/domain.h    |    7 +
 xen/include/asm-x86/pv/traps.h     |   58 +
 xen/include/asm-x86/traps.h        |   29 +-
 10 files changed, 3261 insertions(+), 3116 deletions(-)
 create mode 100644 xen/arch/x86/pv/emulate_ops.c
 delete mode 100644 xen/arch/x86/x86_64/compat/traps.c
 create mode 100644 xen/include/asm-x86/pv/traps.h

-- 
2.11.0



* [PATCH v2 01/18] x86/traps: factor out pv_percpu_traps_init
  2017-05-05 14:48 [PATCH v2 00/18] Refactor x86 trap handling code Wei Liu
@ 2017-05-05 14:48 ` Wei Liu
  2017-05-05 16:12   ` Andrew Cooper
  2017-05-05 14:48 ` [PATCH v2 02/18] x86/traps: export trapstr Wei Liu
                   ` (17 subsequent siblings)
  18 siblings, 1 reply; 30+ messages in thread
From: Wei Liu @ 2017-05-05 14:48 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich

The body of subarch_percpu_traps_init sets up the PV syscall trampoline.
Move that code into a dedicated function.

Leave the BUILD_BUG_ON in the original function as it is not tied to PV.

No functional change.
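
For reference, BUILD_BUG_ON is a compile-time assertion, so leaving it
behind costs nothing at run time. A minimal sketch of the classic
pattern (Xen's actual definition differs in detail):

    /* Compilation fails on the negative bitfield width when cond != 0. */
    #define BUILD_BUG_ON(cond) ((void)sizeof(struct { int : -!!(cond); }))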

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/arch/x86/x86_64/traps.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/x86_64/traps.c b/xen/arch/x86/x86_64/traps.c
index ad4d6c1f44..a237f4d5c2 100644
--- a/xen/arch/x86/x86_64/traps.c
+++ b/xen/arch/x86/x86_64/traps.c
@@ -374,16 +374,13 @@ DEFINE_PER_CPU(struct stubs, stubs);
 void lstar_enter(void);
 void cstar_enter(void);
 
-void subarch_percpu_traps_init(void)
+static void pv_percpu_traps_init(void)
 {
     unsigned long stack_bottom = get_stack_bottom();
     unsigned long stub_va = this_cpu(stubs.addr);
     unsigned char *stub_page;
     unsigned int offset;
 
-    /* IST_MAX IST pages + 1 syscall page + 1 guard page + primary stack. */
-    BUILD_BUG_ON((IST_MAX + 2) * PAGE_SIZE + PRIMARY_STACK_SIZE > STACK_SIZE);
-
     stub_page = map_domain_page(_mfn(this_cpu(stubs.mfn)));
 
     /*
@@ -422,6 +419,14 @@ void subarch_percpu_traps_init(void)
     wrmsrl(MSR_SYSCALL_MASK, XEN_SYSCALL_MASK);
 }
 
+void subarch_percpu_traps_init(void)
+{
+    /* IST_MAX IST pages + 1 syscall page + 1 guard page + primary stack. */
+    BUILD_BUG_ON((IST_MAX + 2) * PAGE_SIZE + PRIMARY_STACK_SIZE > STACK_SIZE);
+
+    pv_percpu_traps_init();
+}
+
 void init_int80_direct_trap(struct vcpu *v)
 {
     struct trap_info *ti = &v->arch.pv_vcpu.trap_ctxt[0x80];
-- 
2.11.0



* [PATCH v2 02/18] x86/traps: export trapstr
  2017-05-05 14:48 [PATCH v2 00/18] Refactor x86 trap handling code Wei Liu
  2017-05-05 14:48 ` [PATCH v2 01/18] x86/traps: factor out pv_percpu_traps_init Wei Liu
@ 2017-05-05 14:48 ` Wei Liu
  2017-05-05 14:48 ` [PATCH v2 03/18] x86/traps: lift do_guest_trap to domain.h Wei Liu
                   ` (16 subsequent siblings)
  18 siblings, 0 replies; 30+ messages in thread
From: Wei Liu @ 2017-05-05 14:48 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich

It will be used by both common and PV-specific code. Export it via traps.h.

No functional change.
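
As a hypothetical illustration (not part of this patch), exporting the
helper lets the PV code print symbolic trap names, e.g.:

    /* Illustrative only: */
    printk("trap %u (%s)\n", trapnr, trapstr(trapnr));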

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/arch/x86/traps.c        | 2 +-
 xen/include/asm-x86/traps.h | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 27fdf12edb..8023392a06 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -544,7 +544,7 @@ static int nmi_show_execution_state(const struct cpu_user_regs *regs, int cpu)
     return 1;
 }
 
-static const char *trapstr(unsigned int trapnr)
+const char *trapstr(unsigned int trapnr)
 {
     static const char * const strings[] = {
         "divide error", "debug", "nmi", "bkpt", "overflow", "bounds",
diff --git a/xen/include/asm-x86/traps.h b/xen/include/asm-x86/traps.h
index e3884d8406..f1d2513e6b 100644
--- a/xen/include/asm-x86/traps.h
+++ b/xen/include/asm-x86/traps.h
@@ -51,4 +51,6 @@ uint32_t guest_io_read(unsigned int port, unsigned int bytes,
 void guest_io_write(unsigned int port, unsigned int bytes, uint32_t data,
                     struct domain *);
 
+const char *trapstr(unsigned int trapnr);
+
 #endif /* ASM_TRAP_H */
-- 
2.11.0



* [PATCH v2 03/18] x86/traps: lift do_guest_trap to domain.h
  2017-05-05 14:48 [PATCH v2 00/18] Refactor x86 trap handling code Wei Liu
  2017-05-05 14:48 ` [PATCH v2 01/18] x86/traps: factor out pv_percpu_traps_init Wei Liu
  2017-05-05 14:48 ` [PATCH v2 02/18] x86/traps: export trapstr Wei Liu
@ 2017-05-05 14:48 ` Wei Liu
  2017-05-12 18:16   ` Andrew Cooper
  2017-05-05 14:48 ` [PATCH v2 04/18] x86/traps: move all PV emulation code to pv/emulate_ops.c Wei Liu
                   ` (15 subsequent siblings)
  18 siblings, 1 reply; 30+ messages in thread
From: Wei Liu @ 2017-05-05 14:48 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich

Put it alongside the other pv_inject functions and rename it to
pv_inject_trap.

We need this because the function is used by both the PV emulation code and
the PV trap handling code, which will be split into separate files.

No functional change.
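
For illustration: the helper forwards regs->error_code only for vectors
that architecturally push an error code (those set in TRAP_HAVE_EC), and
uses X86_EVENT_NO_EC otherwise:

    /* Illustrative only: */
    pv_inject_trap(TRAP_gp_fault, regs); /* #GP (13): uses regs->error_code */
    pv_inject_trap(TRAP_debug, regs);    /* #DB (1): gets X86_EVENT_NO_EC */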

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/arch/x86/traps.c         | 58 ++++++++++++++++++--------------------------
 xen/include/asm-x86/domain.h | 12 +++++++++
 2 files changed, 35 insertions(+), 35 deletions(-)

diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 8023392a06..2639be7850 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -684,18 +684,6 @@ void pv_inject_event(const struct x86_event *event)
     }
 }
 
-static inline void do_guest_trap(unsigned int trapnr,
-                                 const struct cpu_user_regs *regs)
-{
-    const struct x86_event event = {
-        .vector = trapnr,
-        .error_code = (((trapnr < 32) && (TRAP_HAVE_EC & (1u << trapnr)))
-                       ? regs->error_code : X86_EVENT_NO_EC),
-    };
-
-    pv_inject_event(&event);
-}
-
 static void instruction_done(struct cpu_user_regs *regs, unsigned long rip)
 {
     regs->rip = rip;
@@ -703,7 +691,7 @@ static void instruction_done(struct cpu_user_regs *regs, unsigned long rip)
     if ( regs->eflags & X86_EFLAGS_TF )
     {
         current->arch.debugreg[6] |= DR_STEP | DR_STATUS_RESERVED_ONE;
-        do_guest_trap(TRAP_debug, regs);
+        pv_inject_trap(TRAP_debug, regs);
     }
 }
 
@@ -751,7 +739,7 @@ int set_guest_machinecheck_trapbounce(void)
     struct vcpu *v = current;
     struct trap_bounce *tb = &v->arch.pv_vcpu.trap_bounce;
  
-    do_guest_trap(TRAP_machine_check, guest_cpu_user_regs());
+    pv_inject_trap(TRAP_machine_check, guest_cpu_user_regs());
     tb->flags &= ~TBF_EXCEPTION; /* not needed for MCE delivery path */
     return !null_trap_bounce(v, tb);
 }
@@ -764,7 +752,7 @@ int set_guest_nmi_trapbounce(void)
 {
     struct vcpu *v = current;
     struct trap_bounce *tb = &v->arch.pv_vcpu.trap_bounce;
-    do_guest_trap(TRAP_nmi, guest_cpu_user_regs());
+    pv_inject_trap(TRAP_nmi, guest_cpu_user_regs());
     tb->flags &= ~TBF_EXCEPTION; /* not needed for NMI delivery path */
     return !null_trap_bounce(v, tb);
 }
@@ -794,7 +782,7 @@ void do_trap(struct cpu_user_regs *regs)
 
     if ( guest_mode(regs) )
     {
-        do_guest_trap(trapnr, regs);
+        pv_inject_trap(trapnr, regs);
         return;
     }
 
@@ -1060,7 +1048,7 @@ static int emulate_forced_invalid_op(struct cpu_user_regs *regs)
     if ( current->arch.cpuid_faulting && !guest_kernel_mode(current, regs) )
     {
         regs->rip = eip;
-        do_guest_trap(TRAP_gp_fault, regs);
+        pv_inject_trap(TRAP_gp_fault, regs);
         return EXCRET_fault_fixed;
     }
 
@@ -1096,7 +1084,7 @@ void do_invalid_op(struct cpu_user_regs *regs)
     {
         if ( !emulate_invalid_rdtscp(regs) &&
              !emulate_forced_invalid_op(regs) )
-            do_guest_trap(TRAP_invalid_op, regs);
+            pv_inject_trap(TRAP_invalid_op, regs);
         return;
     }
 
@@ -1224,7 +1212,7 @@ void do_int3(struct cpu_user_regs *regs)
         return;
     }
 
-    do_guest_trap(TRAP_int3, regs);
+    pv_inject_trap(TRAP_int3, regs);
 }
 
 static void reserved_bit_page_fault(
@@ -3038,7 +3026,7 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
         {
             curr->arch.debugreg[6] |= ctxt.bpmatch | DR_STATUS_RESERVED_ONE;
             if ( !(curr->arch.pv_vcpu.trap_bounce.flags & TBF_EXCEPTION) )
-                do_guest_trap(TRAP_debug, regs);
+                pv_inject_trap(TRAP_debug, regs);
         }
         /* fall through */
     case X86EMUL_RETRY:
@@ -3153,12 +3141,12 @@ static void emulate_gate_op(struct cpu_user_regs *regs)
          (((ar >> 13) & 3) < (regs->cs & 3)) ||
          ((ar & _SEGMENT_TYPE) != 0xc00) )
     {
-        do_guest_trap(TRAP_gp_fault, regs);
+        pv_inject_trap(TRAP_gp_fault, regs);
         return;
     }
     if ( !(ar & _SEGMENT_P) )
     {
-        do_guest_trap(TRAP_no_segment, regs);
+        pv_inject_trap(TRAP_no_segment, regs);
         return;
     }
     dpl = (ar >> 13) & 3;
@@ -3174,7 +3162,7 @@ static void emulate_gate_op(struct cpu_user_regs *regs)
          !(ar & _SEGMENT_P) ||
          !(ar & _SEGMENT_CODE) )
     {
-        do_guest_trap(TRAP_gp_fault, regs);
+        pv_inject_trap(TRAP_gp_fault, regs);
         return;
     }
 
@@ -3187,7 +3175,7 @@ static void emulate_gate_op(struct cpu_user_regs *regs)
         if ( PTR_ERR(state) == -X86EMUL_EXCEPTION )
             pv_inject_event(&ctxt.ctxt.event);
         else
-            do_guest_trap(TRAP_gp_fault, regs);
+            pv_inject_trap(TRAP_gp_fault, regs);
         return;
     }
 
@@ -3237,7 +3225,7 @@ static void emulate_gate_op(struct cpu_user_regs *regs)
          (opnd_sel & ~3) != regs->error_code ||
          dpl < (opnd_sel & 3) )
     {
-        do_guest_trap(TRAP_gp_fault, regs);
+        pv_inject_trap(TRAP_gp_fault, regs);
         return;
     }
 
@@ -3285,7 +3273,7 @@ static void emulate_gate_op(struct cpu_user_regs *regs)
             /* Inner stack known only for kernel ring. */
             if ( (sel & 3) != GUEST_KERNEL_RPL(v->domain) )
             {
-                do_guest_trap(TRAP_gp_fault, regs);
+                pv_inject_trap(TRAP_gp_fault, regs);
                 return;
             }
             esp = v->arch.pv_vcpu.kernel_sp;
@@ -3309,7 +3297,7 @@ static void emulate_gate_op(struct cpu_user_regs *regs)
             stkp = (unsigned int *)(unsigned long)((unsigned int)base + esp);
             if ( !compat_access_ok(stkp - 4 - nparm, (4 + nparm) * 4) )
             {
-                do_guest_trap(TRAP_gp_fault, regs);
+                pv_inject_trap(TRAP_gp_fault, regs);
                 return;
             }
             push(regs->ss);
@@ -3324,12 +3312,12 @@ static void emulate_gate_op(struct cpu_user_regs *regs)
                      (ar & _SEGMENT_CODE) ||
                      !(ar & _SEGMENT_WR) ||
                      !check_stack_limit(ar, limit, esp + nparm * 4, nparm * 4) )
-                    return do_guest_trap(TRAP_gp_fault, regs);
+                    return pv_inject_trap(TRAP_gp_fault, regs);
                 ustkp = (unsigned int *)(unsigned long)
                         ((unsigned int)base + regs->esp + nparm * 4);
                 if ( !compat_access_ok(ustkp - nparm, nparm * 4) )
                 {
-                    do_guest_trap(TRAP_gp_fault, regs);
+                    pv_inject_trap(TRAP_gp_fault, regs);
                     return;
                 }
                 do
@@ -3355,7 +3343,7 @@ static void emulate_gate_op(struct cpu_user_regs *regs)
             if ( !read_descriptor(ss, v, &base, &limit, &ar, 0) ||
                  ((ar >> 13) & 3) != (sel & 3) )
             {
-                do_guest_trap(TRAP_gp_fault, regs);
+                pv_inject_trap(TRAP_gp_fault, regs);
                 return;
             }
             if ( !check_stack_limit(ar, limit, esp, 2 * 4) )
@@ -3366,7 +3354,7 @@ static void emulate_gate_op(struct cpu_user_regs *regs)
             stkp = (unsigned int *)(unsigned long)((unsigned int)base + esp);
             if ( !compat_access_ok(stkp - 2, 2 * 4) )
             {
-                do_guest_trap(TRAP_gp_fault, regs);
+                pv_inject_trap(TRAP_gp_fault, regs);
                 return;
             }
         }
@@ -3427,7 +3415,7 @@ void do_general_protection(struct cpu_user_regs *regs)
         if ( permit_softint(TI_GET_DPL(ti), v, regs) )
         {
             regs->rip += 2;
-            do_guest_trap(vector, regs);
+            pv_inject_trap(vector, regs);
             return;
         }
     }
@@ -3446,7 +3434,7 @@ void do_general_protection(struct cpu_user_regs *regs)
     }
 
     /* Pass on GPF as is. */
-    do_guest_trap(TRAP_gp_fault, regs);
+    pv_inject_trap(TRAP_gp_fault, regs);
     return;
 
  gp_in_kernel:
@@ -3666,7 +3654,7 @@ void do_device_not_available(struct cpu_user_regs *regs)
 
     if ( curr->arch.pv_vcpu.ctrlreg[0] & X86_CR0_TS )
     {
-        do_guest_trap(TRAP_no_device, regs);
+        pv_inject_trap(TRAP_no_device, regs);
         curr->arch.pv_vcpu.ctrlreg[0] &= ~X86_CR0_TS;
     }
     else
@@ -3739,7 +3727,7 @@ void do_debug(struct cpu_user_regs *regs)
     v->arch.debugreg[6] = read_debugreg(6);
 
     ler_enable();
-    do_guest_trap(TRAP_debug, regs);
+    pv_inject_trap(TRAP_debug, regs);
     return;
 
  out:
diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index 6ab987f231..355a8afcdb 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -655,6 +655,18 @@ static inline void pv_inject_page_fault(int errcode, unsigned long cr2)
     pv_inject_event(&event);
 }
 
+static inline void pv_inject_trap(unsigned int trapnr,
+                                  const struct cpu_user_regs *regs)
+{
+    const struct x86_event event = {
+        .vector = trapnr,
+        .error_code = (((trapnr < 32) && (TRAP_HAVE_EC & (1u << trapnr)))
+                       ? regs->error_code : X86_EVENT_NO_EC),
+    };
+
+    pv_inject_event(&event);
+}
+
 #endif /* __ASM_DOMAIN_H__ */
 
 /*
-- 
2.11.0



* [PATCH v2 04/18] x86/traps: move all PV emulation code to pv/emulate_ops.c
  2017-05-05 14:48 [PATCH v2 00/18] Refactor x86 trap handling code Wei Liu
                   ` (2 preceding siblings ...)
  2017-05-05 14:48 ` [PATCH v2 03/18] x86/traps: lift do_guest_trap to domain.h Wei Liu
@ 2017-05-05 14:48 ` Wei Liu
  2017-05-11 10:18   ` Andrew Cooper
  2017-05-05 14:48 ` [PATCH v2 05/18] x86/pv: clean up emulate_ops.c Wei Liu
                   ` (14 subsequent siblings)
  18 siblings, 1 reply; 30+ messages in thread
From: Wei Liu @ 2017-05-05 14:48 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich

Move the following emulation code from traps.c:

1. invalid op
2. rdtsc
3. privileged instructions
4. gate operations
5. PV cpuid emulation

Export the emulate_* functions via pv/traps.h.

No functional change.
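
Hypothetical caller, illustrative only (assuming emulate_privileged_op
is among the exported emulate_* functions): users of the moved code are
expected to pull in the new header rather than rely on file-local
declarations:

    #include <asm/pv/traps.h>

    /* ... */
    emulate_privileged_op(regs);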

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/arch/x86/pv/Makefile       |    1 +
 xen/arch/x86/pv/emulate_ops.c  | 1929 ++++++++++++++++++++++++++++++++++++++++
 xen/arch/x86/traps.c           | 1880 +--------------------------------------
 xen/include/asm-x86/pv/traps.h |   54 ++
 4 files changed, 1985 insertions(+), 1879 deletions(-)
 create mode 100644 xen/arch/x86/pv/emulate_ops.c
 create mode 100644 xen/include/asm-x86/pv/traps.h

diff --git a/xen/arch/x86/pv/Makefile b/xen/arch/x86/pv/Makefile
index 489a9f59cb..ef3cecc463 100644
--- a/xen/arch/x86/pv/Makefile
+++ b/xen/arch/x86/pv/Makefile
@@ -3,3 +3,4 @@ obj-y += traps.o
 
 obj-bin-y += dom0_build.init.o
 obj-y += domain.o
+obj-y += emulate_ops.o
diff --git a/xen/arch/x86/pv/emulate_ops.c b/xen/arch/x86/pv/emulate_ops.c
new file mode 100644
index 0000000000..5f0965e05b
--- /dev/null
+++ b/xen/arch/x86/pv/emulate_ops.c
@@ -0,0 +1,1929 @@
+/******************************************************************************
+ * arch/x86/pv/emulate_ops.c
+ *
+ * PV emulation code
+ *
+ * Modifications to Linux original are copyright (c) 2002-2004, K A Fraser
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <xen/errno.h>
+#include <xen/event.h>
+#include <xen/guest_access.h>
+#include <xen/iocap.h>
+#include <xen/spinlock.h>
+#include <xen/trace.h>
+
+#include <asm/apic.h>
+#include <asm/debugreg.h>
+#include <asm/hpet.h>
+#include <asm/hypercall.h>
+#include <asm/mc146818rtc.h>
+#include <asm/p2m.h>
+#include <asm/pv/traps.h>
+#include <asm/shared.h>
+#include <asm/traps.h>
+#include <asm/x86_emulate.h>
+
+#include <xsm/xsm.h>
+
+/* I/O emulation support. Helper routines for, and type of, the stack stub.*/
+void host_to_guest_gpr_switch(struct cpu_user_regs *);
+unsigned long guest_to_host_gpr_switch(unsigned long);
+
+void (*pv_post_outb_hook)(unsigned int port, u8 value);
+
+typedef void io_emul_stub_t(struct cpu_user_regs *);
+
+struct priv_op_ctxt {
+    struct x86_emulate_ctxt ctxt;
+    struct {
+        unsigned long base, limit;
+    } cs;
+    char *io_emul_stub;
+    unsigned int bpmatch;
+    unsigned int tsc;
+#define TSC_BASE 1
+#define TSC_AUX 2
+};
+
+static void instruction_done(struct cpu_user_regs *regs, unsigned long rip)
+{
+    regs->rip = rip;
+    regs->eflags &= ~X86_EFLAGS_RF;
+    if ( regs->eflags & X86_EFLAGS_TF )
+    {
+        current->arch.debugreg[6] |= DR_STEP | DR_STATUS_RESERVED_ONE;
+        pv_inject_trap(TRAP_debug, regs);
+    }
+}
+
+static int pv_emul_virt_to_linear(unsigned long base, unsigned long offset,
+                                  unsigned int bytes, unsigned long limit,
+                                  enum x86_segment seg,
+                                  struct x86_emulate_ctxt *ctxt,
+                                  unsigned long *addr)
+{
+    int rc = X86EMUL_OKAY;
+
+    *addr = base + offset;
+
+    if ( ctxt->addr_size < 64 )
+    {
+        if ( limit < bytes - 1 || offset > limit - bytes + 1 )
+            rc = X86EMUL_EXCEPTION;
+        *addr = (uint32_t)*addr;
+    }
+    else if ( !__addr_ok(*addr) )
+        rc = X86EMUL_EXCEPTION;
+
+    if ( unlikely(rc == X86EMUL_EXCEPTION) )
+        x86_emul_hw_exception(seg != x86_seg_ss ? TRAP_gp_fault
+                                                : TRAP_stack_error,
+                              0, ctxt);
+
+    return rc;
+}
+
+static int priv_op_insn_fetch(enum x86_segment seg,
+                              unsigned long offset,
+                              void *p_data,
+                              unsigned int bytes,
+                              struct x86_emulate_ctxt *ctxt)
+{
+    const struct priv_op_ctxt *poc =
+        container_of(ctxt, struct priv_op_ctxt, ctxt);
+    unsigned int rc;
+    unsigned long addr = poc->cs.base + offset;
+
+    ASSERT(seg == x86_seg_cs);
+
+    /* We don't mean to emulate any branches. */
+    if ( !bytes )
+        return X86EMUL_UNHANDLEABLE;
+
+    rc = pv_emul_virt_to_linear(poc->cs.base, offset, bytes, poc->cs.limit,
+                                x86_seg_cs, ctxt, &addr);
+    if ( rc != X86EMUL_OKAY )
+        return rc;
+
+    if ( (rc = __copy_from_user(p_data, (void *)addr, bytes)) != 0 )
+    {
+        /*
+         * TODO: This should report PFEC_insn_fetch when goc->insn_fetch &&
+         * cpu_has_nx, but we'd then need a "fetch" variant of
+         * __copy_from_user() respecting NX, SMEP, and protection keys.
+         */
+        x86_emul_pagefault(0, addr + bytes - rc, ctxt);
+        return X86EMUL_EXCEPTION;
+    }
+
+    return X86EMUL_OKAY;
+}
+
+static int read_descriptor(unsigned int sel,
+                           const struct vcpu *v,
+                           unsigned long *base,
+                           unsigned long *limit,
+                           unsigned int *ar,
+                           bool_t insn_fetch)
+{
+    struct desc_struct desc;
+
+    if ( sel < 4)
+        desc.b = desc.a = 0;
+    else if ( __get_user(desc,
+                         (const struct desc_struct *)(!(sel & 4)
+                                                      ? GDT_VIRT_START(v)
+                                                      : LDT_VIRT_START(v))
+                         + (sel >> 3)) )
+        return 0;
+    if ( !insn_fetch )
+        desc.b &= ~_SEGMENT_L;
+
+    *ar = desc.b & 0x00f0ff00;
+    if ( !(desc.b & _SEGMENT_L) )
+    {
+        *base = ((desc.a >> 16) + ((desc.b & 0xff) << 16) +
+                 (desc.b & 0xff000000));
+        *limit = (desc.a & 0xffff) | (desc.b & 0x000f0000);
+        if ( desc.b & _SEGMENT_G )
+            *limit = ((*limit + 1) << 12) - 1;
+#ifndef NDEBUG
+        if ( sel > 3 )
+        {
+            unsigned int a, l;
+            unsigned char valid;
+
+            asm volatile (
+                "larl %2,%0 ; setz %1"
+                : "=r" (a), "=qm" (valid) : "rm" (sel));
+            BUG_ON(valid && ((a & 0x00f0ff00) != *ar));
+            asm volatile (
+                "lsll %2,%0 ; setz %1"
+                : "=r" (l), "=qm" (valid) : "rm" (sel));
+            BUG_ON(valid && (l != *limit));
+        }
+#endif
+    }
+    else
+    {
+        *base = 0UL;
+        *limit = ~0UL;
+    }
+
+    return 1;
+}
+
+static int priv_op_read_segment(enum x86_segment seg,
+                                struct segment_register *reg,
+                                struct x86_emulate_ctxt *ctxt)
+{
+    /* Check if this is an attempt to access the I/O bitmap. */
+    if ( seg == x86_seg_tr )
+    {
+        switch ( ctxt->opcode )
+        {
+        case 0x6c ... 0x6f: /* ins / outs */
+        case 0xe4 ... 0xe7: /* in / out (immediate port) */
+        case 0xec ... 0xef: /* in / out (port in %dx) */
+            /* Defer the check to priv_op_{read,write}_io(). */
+            return X86EMUL_DONE;
+        }
+    }
+
+    if ( ctxt->addr_size < 64 )
+    {
+        unsigned long limit;
+        unsigned int sel, ar;
+
+        switch ( seg )
+        {
+        case x86_seg_cs: sel = ctxt->regs->cs; break;
+        case x86_seg_ds: sel = read_sreg(ds);  break;
+        case x86_seg_es: sel = read_sreg(es);  break;
+        case x86_seg_fs: sel = read_sreg(fs);  break;
+        case x86_seg_gs: sel = read_sreg(gs);  break;
+        case x86_seg_ss: sel = ctxt->regs->ss; break;
+        default: return X86EMUL_UNHANDLEABLE;
+        }
+
+        if ( !read_descriptor(sel, current, &reg->base, &limit, &ar, 0) )
+            return X86EMUL_UNHANDLEABLE;
+
+        reg->limit = limit;
+        reg->attr.bytes = ar >> 8;
+    }
+    else
+    {
+        switch ( seg )
+        {
+        default:
+            if ( !is_x86_user_segment(seg) )
+                return X86EMUL_UNHANDLEABLE;
+            reg->base = 0;
+            break;
+        case x86_seg_fs:
+            reg->base = rdfsbase();
+            break;
+        case x86_seg_gs:
+            reg->base = rdgsbase();
+            break;
+        }
+
+        reg->limit = ~0U;
+
+        reg->attr.bytes = 0;
+        reg->attr.fields.type = _SEGMENT_WR >> 8;
+        if ( seg == x86_seg_cs )
+        {
+            reg->attr.fields.type |= _SEGMENT_CODE >> 8;
+            reg->attr.fields.l = 1;
+        }
+        else
+            reg->attr.fields.db = 1;
+        reg->attr.fields.s   = 1;
+        reg->attr.fields.dpl = 3;
+        reg->attr.fields.p   = 1;
+        reg->attr.fields.g   = 1;
+    }
+
+    /*
+     * For x86_emulate.c's mode_ring0() to work, fake a DPL of zero.
+     * Also do this for consistency for non-conforming code segments.
+     */
+    if ( (seg == x86_seg_ss ||
+          (seg == x86_seg_cs &&
+           !(reg->attr.fields.type & (_SEGMENT_EC >> 8)))) &&
+         guest_kernel_mode(current, ctxt->regs) )
+        reg->attr.fields.dpl = 0;
+
+    return X86EMUL_OKAY;
+}
+
+/* Perform IOPL check between the vcpu's shadowed IOPL, and the assumed cpl. */
+static bool_t iopl_ok(const struct vcpu *v, const struct cpu_user_regs *regs)
+{
+    unsigned int cpl = guest_kernel_mode(v, regs) ?
+        (VM_ASSIST(v->domain, architectural_iopl) ? 0 : 1) : 3;
+
+    ASSERT((v->arch.pv_vcpu.iopl & ~X86_EFLAGS_IOPL) == 0);
+
+    return IOPL(cpl) <= v->arch.pv_vcpu.iopl;
+}
+
+static io_emul_stub_t *io_emul_stub_setup(struct priv_op_ctxt *ctxt, u8 opcode,
+                                          unsigned int port, unsigned int bytes)
+{
+    if ( !ctxt->io_emul_stub )
+        ctxt->io_emul_stub = map_domain_page(_mfn(this_cpu(stubs.mfn))) +
+                                             (this_cpu(stubs.addr) &
+                                              ~PAGE_MASK) +
+                                             STUB_BUF_SIZE / 2;
+
+    /* movq $host_to_guest_gpr_switch,%rcx */
+    ctxt->io_emul_stub[0] = 0x48;
+    ctxt->io_emul_stub[1] = 0xb9;
+    *(void **)&ctxt->io_emul_stub[2] = (void *)host_to_guest_gpr_switch;
+    /* callq *%rcx */
+    ctxt->io_emul_stub[10] = 0xff;
+    ctxt->io_emul_stub[11] = 0xd1;
+    /* data16 or nop */
+    ctxt->io_emul_stub[12] = (bytes != 2) ? 0x90 : 0x66;
+    /* <io-access opcode> */
+    ctxt->io_emul_stub[13] = opcode;
+    /* imm8 or nop */
+    ctxt->io_emul_stub[14] = !(opcode & 8) ? port : 0x90;
+    /* ret (jumps to guest_to_host_gpr_switch) */
+    ctxt->io_emul_stub[15] = 0xc3;
+    BUILD_BUG_ON(STUB_BUF_SIZE / 2 < 16);
+
+    if ( ioemul_handle_quirk )
+        ioemul_handle_quirk(opcode, &ctxt->io_emul_stub[12], ctxt->ctxt.regs);
+
+    /* Handy function-typed pointer to the stub. */
+    return (void *)(this_cpu(stubs.addr) + STUB_BUF_SIZE / 2);
+}
+
+/* Has the guest requested sufficient permission for this I/O access? */
+static int guest_io_okay(
+    unsigned int port, unsigned int bytes,
+    struct vcpu *v, struct cpu_user_regs *regs)
+{
+    /* If in user mode, switch to kernel mode just to read I/O bitmap. */
+    int user_mode = !(v->arch.flags & TF_kernel_mode);
+#define TOGGLE_MODE() if ( user_mode ) toggle_guest_mode(v)
+
+    if ( iopl_ok(v, regs) )
+        return 1;
+
+    if ( v->arch.pv_vcpu.iobmp_limit > (port + bytes) )
+    {
+        union { uint8_t bytes[2]; uint16_t mask; } x;
+
+        /*
+         * Grab permission bytes from guest space. Inaccessible bytes are
+         * read as 0xff (no access allowed).
+         */
+        TOGGLE_MODE();
+        switch ( __copy_from_guest_offset(x.bytes, v->arch.pv_vcpu.iobmp,
+                                          port>>3, 2) )
+        {
+        default: x.bytes[0] = ~0;
+            /* fallthrough */
+        case 1:  x.bytes[1] = ~0;
+            /* fallthrough */
+        case 0:  break;
+        }
+        TOGGLE_MODE();
+
+        if ( (x.mask & (((1<<bytes)-1) << (port&7))) == 0 )
+            return 1;
+    }
+
+    return 0;
+}
+
+static unsigned int check_guest_io_breakpoint(struct vcpu *v,
+    unsigned int port, unsigned int len)
+{
+    unsigned int width, i, match = 0;
+    unsigned long start;
+
+    if ( !(v->arch.debugreg[5]) ||
+         !(v->arch.pv_vcpu.ctrlreg[4] & X86_CR4_DE) )
+        return 0;
+
+    for ( i = 0; i < 4; i++ )
+    {
+        if ( !(v->arch.debugreg[5] &
+               (3 << (i * DR_ENABLE_SIZE))) )
+            continue;
+
+        start = v->arch.debugreg[i];
+        width = 0;
+
+        switch ( (v->arch.debugreg[7] >>
+                  (DR_CONTROL_SHIFT + i * DR_CONTROL_SIZE)) & 0xc )
+        {
+        case DR_LEN_1: width = 1; break;
+        case DR_LEN_2: width = 2; break;
+        case DR_LEN_4: width = 4; break;
+        case DR_LEN_8: width = 8; break;
+        }
+
+        if ( (start < (port + len)) && ((start + width) > port) )
+            match |= 1 << i;
+    }
+
+    return match;
+}
+
+/* Has the administrator granted sufficient permission for this I/O access? */
+static bool_t admin_io_okay(unsigned int port, unsigned int bytes,
+                            const struct domain *d)
+{
+    /*
+     * Port 0xcf8 (CONFIG_ADDRESS) is only visible for DWORD accesses.
+     * We never permit direct access to that register.
+     */
+    if ( (port == 0xcf8) && (bytes == 4) )
+        return 0;
+
+    /* We also never permit direct access to the RTC/CMOS registers. */
+    if ( ((port & ~1) == RTC_PORT(0)) )
+        return 0;
+
+    return ioports_access_permitted(d, port, port + bytes - 1);
+}
+
+static bool_t pci_cfg_ok(struct domain *currd, unsigned int start,
+                         unsigned int size, uint32_t *write)
+{
+    uint32_t machine_bdf;
+
+    if ( !is_hardware_domain(currd) )
+        return 0;
+
+    if ( !CF8_ENABLED(currd->arch.pci_cf8) )
+        return 1;
+
+    machine_bdf = CF8_BDF(currd->arch.pci_cf8);
+    if ( write )
+    {
+        const unsigned long *ro_map = pci_get_ro_map(0);
+
+        if ( ro_map && test_bit(machine_bdf, ro_map) )
+            return 0;
+    }
+    start |= CF8_ADDR_LO(currd->arch.pci_cf8);
+    /* AMD extended configuration space access? */
+    if ( CF8_ADDR_HI(currd->arch.pci_cf8) &&
+         boot_cpu_data.x86_vendor == X86_VENDOR_AMD &&
+         boot_cpu_data.x86 >= 0x10 && boot_cpu_data.x86 <= 0x17 )
+    {
+        uint64_t msr_val;
+
+        if ( rdmsr_safe(MSR_AMD64_NB_CFG, msr_val) )
+            return 0;
+        if ( msr_val & (1ULL << AMD64_NB_CFG_CF8_EXT_ENABLE_BIT) )
+            start |= CF8_ADDR_HI(currd->arch.pci_cf8);
+    }
+
+    return !write ?
+           xsm_pci_config_permission(XSM_HOOK, currd, machine_bdf,
+                                     start, start + size - 1, 0) == 0 :
+           pci_conf_write_intercept(0, machine_bdf, start, size, write) >= 0;
+}
+
+uint32_t guest_io_read(unsigned int port, unsigned int bytes,
+                       struct domain *currd)
+{
+    uint32_t data = 0;
+    unsigned int shift = 0;
+
+    if ( admin_io_okay(port, bytes, currd) )
+    {
+        switch ( bytes )
+        {
+        case 1: return inb(port);
+        case 2: return inw(port);
+        case 4: return inl(port);
+        }
+    }
+
+    while ( bytes != 0 )
+    {
+        unsigned int size = 1;
+        uint32_t sub_data = ~0;
+
+        if ( (port == 0x42) || (port == 0x43) || (port == 0x61) )
+        {
+            sub_data = pv_pit_handler(port, 0, 0);
+        }
+        else if ( port == RTC_PORT(0) )
+        {
+            sub_data = currd->arch.cmos_idx;
+        }
+        else if ( (port == RTC_PORT(1)) &&
+                  ioports_access_permitted(currd, RTC_PORT(0), RTC_PORT(1)) )
+        {
+            unsigned long flags;
+
+            spin_lock_irqsave(&rtc_lock, flags);
+            outb(currd->arch.cmos_idx & 0x7f, RTC_PORT(0));
+            sub_data = inb(RTC_PORT(1));
+            spin_unlock_irqrestore(&rtc_lock, flags);
+        }
+        else if ( (port == 0xcf8) && (bytes == 4) )
+        {
+            size = 4;
+            sub_data = currd->arch.pci_cf8;
+        }
+        else if ( (port & 0xfffc) == 0xcfc )
+        {
+            size = min(bytes, 4 - (port & 3));
+            if ( size == 3 )
+                size = 2;
+            if ( pci_cfg_ok(currd, port & 3, size, NULL) )
+                sub_data = pci_conf_read(currd->arch.pci_cf8, port & 3, size);
+        }
+
+        if ( size == 4 )
+            return sub_data;
+
+        data |= (sub_data & ((1u << (size * 8)) - 1)) << shift;
+        shift += size * 8;
+        port += size;
+        bytes -= size;
+    }
+
+    return data;
+}
+
+void guest_io_write(unsigned int port, unsigned int bytes, uint32_t data,
+                    struct domain *currd)
+{
+    if ( admin_io_okay(port, bytes, currd) )
+    {
+        switch ( bytes ) {
+        case 1:
+            outb((uint8_t)data, port);
+            if ( pv_post_outb_hook )
+                pv_post_outb_hook(port, (uint8_t)data);
+            break;
+        case 2:
+            outw((uint16_t)data, port);
+            break;
+        case 4:
+            outl(data, port);
+            break;
+        }
+        return;
+    }
+
+    while ( bytes != 0 )
+    {
+        unsigned int size = 1;
+
+        if ( (port == 0x42) || (port == 0x43) || (port == 0x61) )
+        {
+            pv_pit_handler(port, (uint8_t)data, 1);
+        }
+        else if ( port == RTC_PORT(0) )
+        {
+            currd->arch.cmos_idx = data;
+        }
+        else if ( (port == RTC_PORT(1)) &&
+                  ioports_access_permitted(currd, RTC_PORT(0), RTC_PORT(1)) )
+        {
+            unsigned long flags;
+
+            if ( pv_rtc_handler )
+                pv_rtc_handler(currd->arch.cmos_idx & 0x7f, data);
+            spin_lock_irqsave(&rtc_lock, flags);
+            outb(currd->arch.cmos_idx & 0x7f, RTC_PORT(0));
+            outb(data, RTC_PORT(1));
+            spin_unlock_irqrestore(&rtc_lock, flags);
+        }
+        else if ( (port == 0xcf8) && (bytes == 4) )
+        {
+            size = 4;
+            currd->arch.pci_cf8 = data;
+        }
+        else if ( (port & 0xfffc) == 0xcfc )
+        {
+            size = min(bytes, 4 - (port & 3));
+            if ( size == 3 )
+                size = 2;
+            if ( pci_cfg_ok(currd, port & 3, size, &data) )
+                pci_conf_write(currd->arch.pci_cf8, port & 3, size, data);
+        }
+
+        if ( size == 4 )
+            return;
+
+        port += size;
+        bytes -= size;
+        data >>= size * 8;
+    }
+}
+
+static int priv_op_read_io(unsigned int port, unsigned int bytes,
+                           unsigned long *val, struct x86_emulate_ctxt *ctxt)
+{
+    struct priv_op_ctxt *poc = container_of(ctxt, struct priv_op_ctxt, ctxt);
+    struct vcpu *curr = current;
+    struct domain *currd = current->domain;
+
+    /* INS must not come here. */
+    ASSERT((ctxt->opcode & ~9) == 0xe4);
+
+    if ( !guest_io_okay(port, bytes, curr, ctxt->regs) )
+        return X86EMUL_UNHANDLEABLE;
+
+    poc->bpmatch = check_guest_io_breakpoint(curr, port, bytes);
+
+    if ( admin_io_okay(port, bytes, currd) )
+    {
+        io_emul_stub_t *io_emul =
+            io_emul_stub_setup(poc, ctxt->opcode, port, bytes);
+
+        mark_regs_dirty(ctxt->regs);
+        io_emul(ctxt->regs);
+        return X86EMUL_DONE;
+    }
+
+    *val = guest_io_read(port, bytes, currd);
+
+    return X86EMUL_OKAY;
+}
+
+static int priv_op_write_io(unsigned int port, unsigned int bytes,
+                            unsigned long val, struct x86_emulate_ctxt *ctxt)
+{
+    struct priv_op_ctxt *poc = container_of(ctxt, struct priv_op_ctxt, ctxt);
+    struct vcpu *curr = current;
+    struct domain *currd = current->domain;
+
+    /* OUTS must not come here. */
+    ASSERT((ctxt->opcode & ~9) == 0xe6);
+
+    if ( !guest_io_okay(port, bytes, curr, ctxt->regs) )
+        return X86EMUL_UNHANDLEABLE;
+
+    poc->bpmatch = check_guest_io_breakpoint(curr, port, bytes);
+
+    if ( admin_io_okay(port, bytes, currd) )
+    {
+        io_emul_stub_t *io_emul =
+            io_emul_stub_setup(poc, ctxt->opcode, port, bytes);
+
+        mark_regs_dirty(ctxt->regs);
+        io_emul(ctxt->regs);
+        if ( (bytes == 1) && pv_post_outb_hook )
+            pv_post_outb_hook(port, val);
+        return X86EMUL_DONE;
+    }
+
+    guest_io_write(port, bytes, val, currd);
+
+    return X86EMUL_OKAY;
+}
+
+static int priv_op_rep_ins(uint16_t port,
+                           enum x86_segment seg, unsigned long offset,
+                           unsigned int bytes_per_rep, unsigned long *reps,
+                           struct x86_emulate_ctxt *ctxt)
+{
+    struct priv_op_ctxt *poc = container_of(ctxt, struct priv_op_ctxt, ctxt);
+    struct vcpu *curr = current;
+    struct domain *currd = current->domain;
+    unsigned long goal = *reps;
+    struct segment_register sreg;
+    int rc;
+
+    ASSERT(seg == x86_seg_es);
+
+    *reps = 0;
+
+    if ( !guest_io_okay(port, bytes_per_rep, curr, ctxt->regs) )
+        return X86EMUL_UNHANDLEABLE;
+
+    rc = priv_op_read_segment(x86_seg_es, &sreg, ctxt);
+    if ( rc != X86EMUL_OKAY )
+        return rc;
+
+    if ( !sreg.attr.fields.p )
+        return X86EMUL_UNHANDLEABLE;
+    if ( !sreg.attr.fields.s ||
+         (sreg.attr.fields.type & (_SEGMENT_CODE >> 8)) ||
+         !(sreg.attr.fields.type & (_SEGMENT_WR >> 8)) )
+    {
+        x86_emul_hw_exception(TRAP_gp_fault, 0, ctxt);
+        return X86EMUL_EXCEPTION;
+    }
+
+    poc->bpmatch = check_guest_io_breakpoint(curr, port, bytes_per_rep);
+
+    while ( *reps < goal )
+    {
+        unsigned int data = guest_io_read(port, bytes_per_rep, currd);
+        unsigned long addr;
+
+        rc = pv_emul_virt_to_linear(sreg.base, offset, bytes_per_rep,
+                                    sreg.limit, x86_seg_es, ctxt, &addr);
+        if ( rc != X86EMUL_OKAY )
+            return rc;
+
+        if ( (rc = __copy_to_user((void *)addr, &data, bytes_per_rep)) != 0 )
+        {
+            x86_emul_pagefault(PFEC_write_access,
+                               addr + bytes_per_rep - rc, ctxt);
+            return X86EMUL_EXCEPTION;
+        }
+
+        ++*reps;
+
+        if ( poc->bpmatch || hypercall_preempt_check() )
+            break;
+
+        /* x86_emulate() clips the repetition count to ensure we don't wrap. */
+        if ( unlikely(ctxt->regs->eflags & X86_EFLAGS_DF) )
+            offset -= bytes_per_rep;
+        else
+            offset += bytes_per_rep;
+    }
+
+    return X86EMUL_OKAY;
+}
+
+static int priv_op_rep_outs(enum x86_segment seg, unsigned long offset,
+                            uint16_t port,
+                            unsigned int bytes_per_rep, unsigned long *reps,
+                            struct x86_emulate_ctxt *ctxt)
+{
+    struct priv_op_ctxt *poc = container_of(ctxt, struct priv_op_ctxt, ctxt);
+    struct vcpu *curr = current;
+    struct domain *currd = current->domain;
+    unsigned long goal = *reps;
+    struct segment_register sreg;
+    int rc;
+
+    *reps = 0;
+
+    if ( !guest_io_okay(port, bytes_per_rep, curr, ctxt->regs) )
+        return X86EMUL_UNHANDLEABLE;
+
+    rc = priv_op_read_segment(seg, &sreg, ctxt);
+    if ( rc != X86EMUL_OKAY )
+        return rc;
+
+    if ( !sreg.attr.fields.p )
+        return X86EMUL_UNHANDLEABLE;
+    if ( !sreg.attr.fields.s ||
+         ((sreg.attr.fields.type & (_SEGMENT_CODE >> 8)) &&
+          !(sreg.attr.fields.type & (_SEGMENT_WR >> 8))) )
+    {
+        x86_emul_hw_exception(seg != x86_seg_ss ? TRAP_gp_fault
+                                                : TRAP_stack_error,
+                              0, ctxt);
+        return X86EMUL_EXCEPTION;
+    }
+
+    poc->bpmatch = check_guest_io_breakpoint(curr, port, bytes_per_rep);
+
+    while ( *reps < goal )
+    {
+        unsigned int data = 0;
+        unsigned long addr;
+
+        rc = pv_emul_virt_to_linear(sreg.base, offset, bytes_per_rep,
+                                    sreg.limit, seg, ctxt, &addr);
+        if ( rc != X86EMUL_OKAY )
+            return rc;
+
+        if ( (rc = __copy_from_user(&data, (void *)addr, bytes_per_rep)) != 0 )
+        {
+            x86_emul_pagefault(0, addr + bytes_per_rep - rc, ctxt);
+            return X86EMUL_EXCEPTION;
+        }
+
+        guest_io_write(port, bytes_per_rep, data, currd);
+
+        ++*reps;
+
+        if ( poc->bpmatch || hypercall_preempt_check() )
+            break;
+
+        /* x86_emulate() clips the repetition count to ensure we don't wrap. */
+        if ( unlikely(ctxt->regs->eflags & X86_EFLAGS_DF) )
+            offset -= bytes_per_rep;
+        else
+            offset += bytes_per_rep;
+    }
+
+    return X86EMUL_OKAY;
+}
+
+static int priv_op_read_cr(unsigned int reg, unsigned long *val,
+                           struct x86_emulate_ctxt *ctxt)
+{
+    const struct vcpu *curr = current;
+
+    switch ( reg )
+    {
+    case 0: /* Read CR0 */
+        *val = (read_cr0() & ~X86_CR0_TS) | curr->arch.pv_vcpu.ctrlreg[0];
+        return X86EMUL_OKAY;
+
+    case 2: /* Read CR2 */
+    case 4: /* Read CR4 */
+        *val = curr->arch.pv_vcpu.ctrlreg[reg];
+        return X86EMUL_OKAY;
+
+    case 3: /* Read CR3 */
+    {
+        const struct domain *currd = curr->domain;
+        unsigned long mfn;
+
+        if ( !is_pv_32bit_domain(currd) )
+        {
+            mfn = pagetable_get_pfn(curr->arch.guest_table);
+            *val = xen_pfn_to_cr3(mfn_to_gmfn(currd, mfn));
+        }
+        else
+        {
+            l4_pgentry_t *pl4e =
+                map_domain_page(_mfn(pagetable_get_pfn(curr->arch.guest_table)));
+
+            mfn = l4e_get_pfn(*pl4e);
+            unmap_domain_page(pl4e);
+            *val = compat_pfn_to_cr3(mfn_to_gmfn(currd, mfn));
+        }
+        /* PTs should not be shared */
+        BUG_ON(page_get_owner(mfn_to_page(mfn)) == dom_cow);
+        return X86EMUL_OKAY;
+    }
+    }
+
+    return X86EMUL_UNHANDLEABLE;
+}
+
+static int priv_op_write_cr(unsigned int reg, unsigned long val,
+                            struct x86_emulate_ctxt *ctxt)
+{
+    struct vcpu *curr = current;
+
+    switch ( reg )
+    {
+    case 0: /* Write CR0 */
+        if ( (val ^ read_cr0()) & ~X86_CR0_TS )
+        {
+            gdprintk(XENLOG_WARNING,
+                    "Attempt to change unmodifiable CR0 flags\n");
+            break;
+        }
+        do_fpu_taskswitch(!!(val & X86_CR0_TS));
+        return X86EMUL_OKAY;
+
+    case 2: /* Write CR2 */
+        curr->arch.pv_vcpu.ctrlreg[2] = val;
+        arch_set_cr2(curr, val);
+        return X86EMUL_OKAY;
+
+    case 3: /* Write CR3 */
+    {
+        struct domain *currd = curr->domain;
+        unsigned long gfn;
+        struct page_info *page;
+        int rc;
+
+        gfn = !is_pv_32bit_domain(currd)
+              ? xen_cr3_to_pfn(val) : compat_cr3_to_pfn(val);
+        page = get_page_from_gfn(currd, gfn, NULL, P2M_ALLOC);
+        if ( !page )
+            break;
+        rc = new_guest_cr3(page_to_mfn(page));
+        put_page(page);
+
+        switch ( rc )
+        {
+        case 0:
+            return X86EMUL_OKAY;
+        case -ERESTART: /* retry after preemption */
+            return X86EMUL_RETRY;
+        }
+        break;
+    }
+
+    case 4: /* Write CR4 */
+        curr->arch.pv_vcpu.ctrlreg[4] = pv_guest_cr4_fixup(curr, val);
+        write_cr4(pv_guest_cr4_to_real_cr4(curr));
+        ctxt_switch_levelling(curr);
+        return X86EMUL_OKAY;
+    }
+
+    return X86EMUL_UNHANDLEABLE;
+}
+
+static int priv_op_read_dr(unsigned int reg, unsigned long *val,
+                           struct x86_emulate_ctxt *ctxt)
+{
+    unsigned long res = do_get_debugreg(reg);
+
+    if ( IS_ERR_VALUE(res) )
+        return X86EMUL_UNHANDLEABLE;
+
+    *val = res;
+
+    return X86EMUL_OKAY;
+}
+
+static int priv_op_write_dr(unsigned int reg, unsigned long val,
+                            struct x86_emulate_ctxt *ctxt)
+{
+    return do_set_debugreg(reg, val) == 0
+           ? X86EMUL_OKAY : X86EMUL_UNHANDLEABLE;
+}
+
+static inline uint64_t guest_misc_enable(uint64_t val)
+{
+    val &= ~(MSR_IA32_MISC_ENABLE_PERF_AVAIL |
+             MSR_IA32_MISC_ENABLE_MONITOR_ENABLE);
+    val |= MSR_IA32_MISC_ENABLE_BTS_UNAVAIL |
+           MSR_IA32_MISC_ENABLE_PEBS_UNAVAIL |
+           MSR_IA32_MISC_ENABLE_XTPR_DISABLE;
+    return val;
+}
+
+static inline bool is_cpufreq_controller(const struct domain *d)
+{
+    return ((cpufreq_controller == FREQCTL_dom0_kernel) &&
+            is_hardware_domain(d));
+}
+
+static int priv_op_read_msr(unsigned int reg, uint64_t *val,
+                            struct x86_emulate_ctxt *ctxt)
+{
+    struct priv_op_ctxt *poc = container_of(ctxt, struct priv_op_ctxt, ctxt);
+    const struct vcpu *curr = current;
+    const struct domain *currd = curr->domain;
+    bool vpmu_msr = false;
+
+    switch ( reg )
+    {
+        int rc;
+
+    case MSR_FS_BASE:
+        if ( is_pv_32bit_domain(currd) )
+            break;
+        *val = cpu_has_fsgsbase ? __rdfsbase() : curr->arch.pv_vcpu.fs_base;
+        return X86EMUL_OKAY;
+
+    case MSR_GS_BASE:
+        if ( is_pv_32bit_domain(currd) )
+            break;
+        *val = cpu_has_fsgsbase ? __rdgsbase()
+                                : curr->arch.pv_vcpu.gs_base_kernel;
+        return X86EMUL_OKAY;
+
+    case MSR_SHADOW_GS_BASE:
+        if ( is_pv_32bit_domain(currd) )
+            break;
+        *val = curr->arch.pv_vcpu.gs_base_user;
+        return X86EMUL_OKAY;
+
+    /*
+     * In order to fully retain original behavior, defer calling
+     * pv_soft_rdtsc() until after emulation. This may want/need to be
+     * reconsidered.
+     */
+    case MSR_IA32_TSC:
+        poc->tsc |= TSC_BASE;
+        goto normal;
+
+    case MSR_TSC_AUX:
+        poc->tsc |= TSC_AUX;
+        if ( cpu_has_rdtscp )
+            goto normal;
+        *val = 0;
+        return X86EMUL_OKAY;
+
+    case MSR_EFER:
+        *val = read_efer();
+        if ( is_pv_32bit_domain(currd) )
+            *val &= ~(EFER_LME | EFER_LMA | EFER_LMSLE);
+        return X86EMUL_OKAY;
+
+    case MSR_K7_FID_VID_CTL:
+    case MSR_K7_FID_VID_STATUS:
+    case MSR_K8_PSTATE_LIMIT:
+    case MSR_K8_PSTATE_CTRL:
+    case MSR_K8_PSTATE_STATUS:
+    case MSR_K8_PSTATE0:
+    case MSR_K8_PSTATE1:
+    case MSR_K8_PSTATE2:
+    case MSR_K8_PSTATE3:
+    case MSR_K8_PSTATE4:
+    case MSR_K8_PSTATE5:
+    case MSR_K8_PSTATE6:
+    case MSR_K8_PSTATE7:
+        if ( boot_cpu_data.x86_vendor != X86_VENDOR_AMD )
+            break;
+        if ( unlikely(is_cpufreq_controller(currd)) )
+            goto normal;
+        *val = 0;
+        return X86EMUL_OKAY;
+
+    case MSR_IA32_UCODE_REV:
+        BUILD_BUG_ON(MSR_IA32_UCODE_REV != MSR_AMD_PATCHLEVEL);
+        if ( boot_cpu_data.x86_vendor == X86_VENDOR_INTEL )
+        {
+            if ( wrmsr_safe(MSR_IA32_UCODE_REV, 0) )
+                break;
+            /* As documented in the SDM: Do a CPUID 1 here */
+            cpuid_eax(1);
+        }
+        goto normal;
+
+    case MSR_IA32_MISC_ENABLE:
+        if ( rdmsr_safe(reg, *val) )
+            break;
+        *val = guest_misc_enable(*val);
+        return X86EMUL_OKAY;
+
+    case MSR_AMD64_DR0_ADDRESS_MASK:
+        if ( !boot_cpu_has(X86_FEATURE_DBEXT) )
+            break;
+        *val = curr->arch.pv_vcpu.dr_mask[0];
+        return X86EMUL_OKAY;
+
+    case MSR_AMD64_DR1_ADDRESS_MASK ... MSR_AMD64_DR3_ADDRESS_MASK:
+        if ( !boot_cpu_has(X86_FEATURE_DBEXT) )
+            break;
+        *val = curr->arch.pv_vcpu.dr_mask[reg - MSR_AMD64_DR1_ADDRESS_MASK + 1];
+        return X86EMUL_OKAY;
+
+    case MSR_IA32_PERF_CAPABILITIES:
+        /* No extra capabilities are supported. */
+        *val = 0;
+        return X86EMUL_OKAY;
+
+    case MSR_INTEL_PLATFORM_INFO:
+        if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL ||
+             rdmsr_safe(MSR_INTEL_PLATFORM_INFO, *val) )
+            break;
+        *val = 0;
+        if ( this_cpu(cpuid_faulting_enabled) )
+            *val |= MSR_PLATFORM_INFO_CPUID_FAULTING;
+        return X86EMUL_OKAY;
+
+    case MSR_INTEL_MISC_FEATURES_ENABLES:
+        if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL ||
+             rdmsr_safe(MSR_INTEL_MISC_FEATURES_ENABLES, *val) )
+            break;
+        *val = 0;
+        if ( curr->arch.cpuid_faulting )
+            *val |= MSR_MISC_FEATURES_CPUID_FAULTING;
+        return X86EMUL_OKAY;
+
+    case MSR_P6_PERFCTR(0)...MSR_P6_PERFCTR(7):
+    case MSR_P6_EVNTSEL(0)...MSR_P6_EVNTSEL(3):
+    case MSR_CORE_PERF_FIXED_CTR0...MSR_CORE_PERF_FIXED_CTR2:
+    case MSR_CORE_PERF_FIXED_CTR_CTRL...MSR_CORE_PERF_GLOBAL_OVF_CTRL:
+        if ( boot_cpu_data.x86_vendor == X86_VENDOR_INTEL )
+        {
+            vpmu_msr = true;
+            /* fall through */
+    case MSR_AMD_FAM15H_EVNTSEL0...MSR_AMD_FAM15H_PERFCTR5:
+    case MSR_K7_EVNTSEL0...MSR_K7_PERFCTR3:
+            if ( vpmu_msr || (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) )
+            {
+                if ( vpmu_do_rdmsr(reg, val) )
+                    break;
+                return X86EMUL_OKAY;
+            }
+        }
+        /* fall through */
+    default:
+        if ( rdmsr_hypervisor_regs(reg, val) )
+            return X86EMUL_OKAY;
+
+        rc = vmce_rdmsr(reg, val);
+        if ( rc < 0 )
+            break;
+        if ( rc )
+            return X86EMUL_OKAY;
+        /* fall through */
+    normal:
+        /* Everyone can read the MSR space. */
+        /* gdprintk(XENLOG_WARNING, "Domain attempted RDMSR %08x\n", reg); */
+        if ( rdmsr_safe(reg, *val) )
+            break;
+        return X86EMUL_OKAY;
+    }
+
+    return X86EMUL_UNHANDLEABLE;
+}
+
+#include "../x86_64/mmconfig.h"
+
+static int priv_op_write_msr(unsigned int reg, uint64_t val,
+                             struct x86_emulate_ctxt *ctxt)
+{
+    struct vcpu *curr = current;
+    const struct domain *currd = curr->domain;
+    bool vpmu_msr = false;
+
+    switch ( reg )
+    {
+        uint64_t temp;
+        int rc;
+
+    case MSR_FS_BASE:
+        if ( is_pv_32bit_domain(currd) || !is_canonical_address(val) )
+            break;
+        wrfsbase(val);
+        curr->arch.pv_vcpu.fs_base = val;
+        return X86EMUL_OKAY;
+
+    case MSR_GS_BASE:
+        if ( is_pv_32bit_domain(currd) || !is_canonical_address(val) )
+            break;
+        wrgsbase(val);
+        curr->arch.pv_vcpu.gs_base_kernel = val;
+        return X86EMUL_OKAY;
+
+    case MSR_SHADOW_GS_BASE:
+        if ( is_pv_32bit_domain(currd) || !is_canonical_address(val) )
+            break;
+        wrmsrl(MSR_SHADOW_GS_BASE, val);
+        curr->arch.pv_vcpu.gs_base_user = val;
+        return X86EMUL_OKAY;
+
+    case MSR_K7_FID_VID_STATUS:
+    case MSR_K7_FID_VID_CTL:
+    case MSR_K8_PSTATE_LIMIT:
+    case MSR_K8_PSTATE_CTRL:
+    case MSR_K8_PSTATE_STATUS:
+    case MSR_K8_PSTATE0:
+    case MSR_K8_PSTATE1:
+    case MSR_K8_PSTATE2:
+    case MSR_K8_PSTATE3:
+    case MSR_K8_PSTATE4:
+    case MSR_K8_PSTATE5:
+    case MSR_K8_PSTATE6:
+    case MSR_K8_PSTATE7:
+    case MSR_K8_HWCR:
+        if ( boot_cpu_data.x86_vendor != X86_VENDOR_AMD )
+            break;
+        if ( likely(!is_cpufreq_controller(currd)) ||
+             wrmsr_safe(reg, val) == 0 )
+            return X86EMUL_OKAY;
+        break;
+
+    case MSR_AMD64_NB_CFG:
+        if ( boot_cpu_data.x86_vendor != X86_VENDOR_AMD ||
+             boot_cpu_data.x86 < 0x10 || boot_cpu_data.x86 > 0x17 )
+            break;
+        if ( !is_hardware_domain(currd) || !is_pinned_vcpu(curr) )
+            return X86EMUL_OKAY;
+        if ( (rdmsr_safe(MSR_AMD64_NB_CFG, temp) != 0) ||
+             ((val ^ temp) & ~(1ULL << AMD64_NB_CFG_CF8_EXT_ENABLE_BIT)) )
+            goto invalid;
+        if ( wrmsr_safe(MSR_AMD64_NB_CFG, val) == 0 )
+            return X86EMUL_OKAY;
+        break;
+
+    case MSR_FAM10H_MMIO_CONF_BASE:
+        if ( boot_cpu_data.x86_vendor != X86_VENDOR_AMD ||
+             boot_cpu_data.x86 < 0x10 || boot_cpu_data.x86 > 0x17 )
+            break;
+        if ( !is_hardware_domain(currd) || !is_pinned_vcpu(curr) )
+            return X86EMUL_OKAY;
+        if ( rdmsr_safe(MSR_FAM10H_MMIO_CONF_BASE, temp) != 0 )
+            break;
+        if ( (pci_probe & PCI_PROBE_MASK) == PCI_PROBE_MMCONF ?
+             temp != val :
+             ((temp ^ val) &
+              ~(FAM10H_MMIO_CONF_ENABLE |
+                (FAM10H_MMIO_CONF_BUSRANGE_MASK <<
+                 FAM10H_MMIO_CONF_BUSRANGE_SHIFT) |
+                ((u64)FAM10H_MMIO_CONF_BASE_MASK <<
+                 FAM10H_MMIO_CONF_BASE_SHIFT))) )
+            goto invalid;
+        if ( wrmsr_safe(MSR_FAM10H_MMIO_CONF_BASE, val) == 0 )
+            return X86EMUL_OKAY;
+        break;
+
+    case MSR_IA32_UCODE_REV:
+        if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL )
+            break;
+        if ( !is_hardware_domain(currd) || !is_pinned_vcpu(curr) )
+            return X86EMUL_OKAY;
+        if ( rdmsr_safe(reg, temp) )
+            break;
+        if ( val )
+            goto invalid;
+        return X86EMUL_OKAY;
+
+    case MSR_IA32_MISC_ENABLE:
+        if ( rdmsr_safe(reg, temp) )
+            break;
+        if ( val != guest_misc_enable(temp) )
+            goto invalid;
+        return X86EMUL_OKAY;
+
+    case MSR_IA32_MPERF:
+    case MSR_IA32_APERF:
+        if ( (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL) &&
+             (boot_cpu_data.x86_vendor != X86_VENDOR_AMD) )
+            break;
+        if ( likely(!is_cpufreq_controller(currd)) ||
+             wrmsr_safe(reg, val) == 0 )
+            return X86EMUL_OKAY;
+        break;
+
+    case MSR_IA32_PERF_CTL:
+        if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL )
+            break;
+        if ( likely(!is_cpufreq_controller(currd)) ||
+             wrmsr_safe(reg, val) == 0 )
+            return X86EMUL_OKAY;
+        break;
+
+    case MSR_IA32_THERM_CONTROL:
+    case MSR_IA32_ENERGY_PERF_BIAS:
+        if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL )
+            break;
+        if ( !is_hardware_domain(currd) || !is_pinned_vcpu(curr) ||
+             wrmsr_safe(reg, val) == 0 )
+            return X86EMUL_OKAY;
+        break;
+
+    case MSR_AMD64_DR0_ADDRESS_MASK:
+        if ( !boot_cpu_has(X86_FEATURE_DBEXT) || (val >> 32) )
+            break;
+        curr->arch.pv_vcpu.dr_mask[0] = val;
+        if ( curr->arch.debugreg[7] & DR7_ACTIVE_MASK )
+            wrmsrl(MSR_AMD64_DR0_ADDRESS_MASK, val);
+        return X86EMUL_OKAY;
+
+    case MSR_AMD64_DR1_ADDRESS_MASK ... MSR_AMD64_DR3_ADDRESS_MASK:
+        if ( !boot_cpu_has(X86_FEATURE_DBEXT) || (val >> 32) )
+            break;
+        curr->arch.pv_vcpu.dr_mask[reg - MSR_AMD64_DR1_ADDRESS_MASK + 1] = val;
+        if ( curr->arch.debugreg[7] & DR7_ACTIVE_MASK )
+            wrmsrl(reg, val);
+        return X86EMUL_OKAY;
+
+    case MSR_INTEL_PLATFORM_INFO:
+        if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL ||
+             val || rdmsr_safe(MSR_INTEL_PLATFORM_INFO, val) )
+            break;
+        return X86EMUL_OKAY;
+
+    case MSR_INTEL_MISC_FEATURES_ENABLES:
+        if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL ||
+             (val & ~MSR_MISC_FEATURES_CPUID_FAULTING) ||
+             rdmsr_safe(MSR_INTEL_MISC_FEATURES_ENABLES, temp) )
+            break;
+        if ( (val & MSR_MISC_FEATURES_CPUID_FAULTING) &&
+             !this_cpu(cpuid_faulting_enabled) )
+            break;
+        curr->arch.cpuid_faulting = !!(val & MSR_MISC_FEATURES_CPUID_FAULTING);
+        return X86EMUL_OKAY;
+
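+    /*
+     * Note the unusual structure here: the AMD case labels live inside
+     * the if() body, so Intel PMU MSRs set vpmu_msr first while AMD
+     * PMU MSRs drop straight through to the inner vendor check.
+     */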
+    case MSR_P6_PERFCTR(0)...MSR_P6_PERFCTR(7):
+    case MSR_P6_EVNTSEL(0)...MSR_P6_EVNTSEL(3):
+    case MSR_CORE_PERF_FIXED_CTR0...MSR_CORE_PERF_FIXED_CTR2:
+    case MSR_CORE_PERF_FIXED_CTR_CTRL...MSR_CORE_PERF_GLOBAL_OVF_CTRL:
+        if ( boot_cpu_data.x86_vendor == X86_VENDOR_INTEL )
+        {
+            vpmu_msr = true;
+    case MSR_AMD_FAM15H_EVNTSEL0...MSR_AMD_FAM15H_PERFCTR5:
+    case MSR_K7_EVNTSEL0...MSR_K7_PERFCTR3:
+            if ( vpmu_msr || (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) )
+            {
+                if ( (vpmu_mode & XENPMU_MODE_ALL) &&
+                     !is_hardware_domain(currd) )
+                    return X86EMUL_OKAY;
+
+                if ( vpmu_do_wrmsr(reg, val, 0) )
+                    break;
+                return X86EMUL_OKAY;
+            }
+        }
+        /* fall through */
+    default:
+        if ( wrmsr_hypervisor_regs(reg, val) == 1 )
+            return X86EMUL_OKAY;
+
+        rc = vmce_wrmsr(reg, val);
+        if ( rc < 0 )
+            break;
+        if ( rc )
+            return X86EMUL_OKAY;
+
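+        /*
+         * Unhandled writes are dropped, logging a warning when the
+         * value written differs from the MSR's current contents (the
+         * invalid label below doubles as the goto target for the
+         * checks above).
+         */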
+        if ( (rdmsr_safe(reg, temp) != 0) || (val != temp) )
+    invalid:
+            gdprintk(XENLOG_WARNING,
+                     "Domain attempted WRMSR %08x from 0x%016"PRIx64" to 0x%016"PRIx64"\n",
+                     reg, temp, val);
+        return X86EMUL_OKAY;
+    }
+
+    return X86EMUL_UNHANDLEABLE;
+}
+
+static int priv_op_wbinvd(struct x86_emulate_ctxt *ctxt)
+{
+    /* Ignore the instruction if unprivileged. */
+    if ( !cache_flush_permitted(current->domain) )
+        /*
+         * Non-physdev domain attempted WBINVD; ignore for now since
+         * newer linux uses this in some start-of-day timing loops.
+         */
+        ;
+    else
+        wbinvd();
+
+    return X86EMUL_OKAY;
+}
+
+static int priv_op_validate(const struct x86_emulate_state *state,
+                            struct x86_emulate_ctxt *ctxt)
+{
+    switch ( ctxt->opcode )
+    {
+    case 0x6c ... 0x6f: /* ins / outs */
+    case 0xe4 ... 0xe7: /* in / out (immediate port) */
+    case 0xec ... 0xef: /* in / out (port in %dx) */
+    case X86EMUL_OPC(0x0f, 0x06): /* clts */
+    case X86EMUL_OPC(0x0f, 0x09): /* wbinvd */
+    case X86EMUL_OPC(0x0f, 0x20) ...
+         X86EMUL_OPC(0x0f, 0x23): /* mov to/from cr/dr */
+    case X86EMUL_OPC(0x0f, 0x30): /* wrmsr */
+    case X86EMUL_OPC(0x0f, 0x31): /* rdtsc */
+    case X86EMUL_OPC(0x0f, 0x32): /* rdmsr */
+    case X86EMUL_OPC(0x0f, 0xa2): /* cpuid */
+        return X86EMUL_OKAY;
+
+    case 0xfa: case 0xfb: /* cli / sti */
+        if ( !iopl_ok(current, ctxt->regs) )
+            break;
+        /*
+         * This is just too dangerous to allow, in my opinion. Consider if the
+         * caller then tries to reenable interrupts using POPF: we can't trap
+         * that and we'll end up with hard-to-debug lockups. Fast & loose will
+         * do for us. :-)
+        vcpu_info(current, evtchn_upcall_mask) = (ctxt->opcode == 0xfa);
+         */
+        return X86EMUL_DONE;
+
+    case X86EMUL_OPC(0x0f, 0x01):
+    {
+        unsigned int modrm_rm, modrm_reg;
+
+        if ( x86_insn_modrm(state, &modrm_rm, &modrm_reg) != 3 ||
+             (modrm_rm & 7) != 1 )
+            break;
+        switch ( modrm_reg & 7 )
+        {
+        case 2: /* xsetbv */
+        case 7: /* rdtscp */
+            return X86EMUL_OKAY;
+        }
+        break;
+    }
+    }
+
+    return X86EMUL_UNHANDLEABLE;
+}
+
+static const struct x86_emulate_ops priv_op_ops = {
+    .insn_fetch          = priv_op_insn_fetch,
+    .read                = x86emul_unhandleable_rw,
+    .validate            = priv_op_validate,
+    .read_io             = priv_op_read_io,
+    .write_io            = priv_op_write_io,
+    .rep_ins             = priv_op_rep_ins,
+    .rep_outs            = priv_op_rep_outs,
+    .read_segment        = priv_op_read_segment,
+    .read_cr             = priv_op_read_cr,
+    .write_cr            = priv_op_write_cr,
+    .read_dr             = priv_op_read_dr,
+    .write_dr            = priv_op_write_dr,
+    .read_msr            = priv_op_read_msr,
+    .write_msr           = priv_op_write_msr,
+    .cpuid               = pv_emul_cpuid,
+    .wbinvd              = priv_op_wbinvd,
+};
+
+int emulate_privileged_op(struct cpu_user_regs *regs)
+{
+    struct vcpu *curr = current;
+    struct domain *currd = curr->domain;
+    struct priv_op_ctxt ctxt = {
+        .ctxt.regs = regs,
+        .ctxt.vendor = currd->arch.cpuid->x86_vendor,
+        .ctxt.lma = !is_pv_32bit_domain(currd),
+    };
+    int rc;
+    unsigned int eflags, ar;
+
+    if ( !read_descriptor(regs->cs, curr, &ctxt.cs.base, &ctxt.cs.limit,
+                          &ar, 1) ||
+         !(ar & _SEGMENT_S) ||
+         !(ar & _SEGMENT_P) ||
+         !(ar & _SEGMENT_CODE) )
+        return 0;
+
+    /* Mirror virtualized state into EFLAGS. */
+    ASSERT(regs->eflags & X86_EFLAGS_IF);
+    if ( vcpu_info(curr, evtchn_upcall_mask) )
+        regs->eflags &= ~X86_EFLAGS_IF;
+    else
+        regs->eflags |= X86_EFLAGS_IF;
+    ASSERT(!(regs->eflags & X86_EFLAGS_IOPL));
+    regs->eflags |= curr->arch.pv_vcpu.iopl;
+    eflags = regs->eflags;
+
+    ctxt.ctxt.addr_size = ar & _SEGMENT_L ? 64 : ar & _SEGMENT_DB ? 32 : 16;
+    /* Leave zero in ctxt.ctxt.sp_size, as it's not needed. */
+    rc = x86_emulate(&ctxt.ctxt, &priv_op_ops);
+
+    if ( ctxt.io_emul_stub )
+        unmap_domain_page(ctxt.io_emul_stub);
+
+    /*
+     * Un-mirror virtualized state from EFLAGS.
+     * Nothing we allow to be emulated can change anything other than the
+     * arithmetic bits, and the resume flag.
+     */
+    ASSERT(!((regs->eflags ^ eflags) &
+             ~(X86_EFLAGS_RF | X86_EFLAGS_ARITH_MASK)));
+    regs->eflags |= X86_EFLAGS_IF;
+    regs->eflags &= ~X86_EFLAGS_IOPL;
+
+    switch ( rc )
+    {
+    case X86EMUL_OKAY:
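+        /*
+         * RDTSC(P) reads are deferred during emulation (see the TSC_*
+         * flags set in priv_op_read_msr()); perform the real read now.
+         */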
+        if ( ctxt.tsc & TSC_BASE )
+        {
+            if ( ctxt.tsc & TSC_AUX )
+                pv_soft_rdtsc(curr, regs, 1);
+            else if ( currd->arch.vtsc )
+                pv_soft_rdtsc(curr, regs, 0);
+            else
+                msr_split(regs, rdtsc());
+        }
+
+        if ( ctxt.ctxt.retire.singlestep )
+            ctxt.bpmatch |= DR_STEP;
+        if ( ctxt.bpmatch )
+        {
+            curr->arch.debugreg[6] |= ctxt.bpmatch | DR_STATUS_RESERVED_ONE;
+            if ( !(curr->arch.pv_vcpu.trap_bounce.flags & TBF_EXCEPTION) )
+                pv_inject_trap(TRAP_debug, regs);
+        }
+        /* fall through */
+    case X86EMUL_RETRY:
+        return EXCRET_fault_fixed;
+
+    case X86EMUL_EXCEPTION:
+        pv_inject_event(&ctxt.ctxt.event);
+        return EXCRET_fault_fixed;
+    }
+
+    return 0;
+}
+
+struct gate_op_ctxt {
+    struct x86_emulate_ctxt ctxt;
+    struct {
+        unsigned long base, limit;
+    } cs;
+    bool insn_fetch;
+};
+
+static int gate_op_read(
+    enum x86_segment seg,
+    unsigned long offset,
+    void *p_data,
+    unsigned int bytes,
+    struct x86_emulate_ctxt *ctxt)
+{
+    const struct gate_op_ctxt *goc =
+        container_of(ctxt, struct gate_op_ctxt, ctxt);
+    unsigned int rc = bytes, sel = 0;
+    unsigned long addr = offset, limit = 0;
+
+    switch ( seg )
+    {
+    case x86_seg_cs:
+        addr += goc->cs.base;
+        limit = goc->cs.limit;
+        break;
+    case x86_seg_ds:
+        sel = read_sreg(ds);
+        break;
+    case x86_seg_es:
+        sel = read_sreg(es);
+        break;
+    case x86_seg_fs:
+        sel = read_sreg(fs);
+        break;
+    case x86_seg_gs:
+        sel = read_sreg(gs);
+        break;
+    case x86_seg_ss:
+        sel = ctxt->regs->ss;
+        break;
+    default:
+        return X86EMUL_UNHANDLEABLE;
+    }
+    if ( sel )
+    {
+        unsigned int ar;
+
+        ASSERT(!goc->insn_fetch);
+        if ( !read_descriptor(sel, current, &addr, &limit, &ar, 0) ||
+             !(ar & _SEGMENT_S) ||
+             !(ar & _SEGMENT_P) ||
+             ((ar & _SEGMENT_CODE) && !(ar & _SEGMENT_WR)) )
+            return X86EMUL_UNHANDLEABLE;
+        addr += offset;
+    }
+    else if ( seg != x86_seg_cs )
+        return X86EMUL_UNHANDLEABLE;
+
+    /* We don't mean to emulate any branches. */
+    if ( limit < bytes - 1 || offset > limit - bytes + 1 )
+        return X86EMUL_UNHANDLEABLE;
+
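+    /* Gate emulation only happens for 32-bit PV guests; truncate. */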
+    addr = (uint32_t)addr;
+
+    if ( (rc = __copy_from_user(p_data, (void *)addr, bytes)) )
+    {
+        /*
+         * TODO: This should report PFEC_insn_fetch when goc->insn_fetch &&
+         * cpu_has_nx, but we'd then need a "fetch" variant of
+         * __copy_from_user() respecting NX, SMEP, and protection keys.
+         */
+        x86_emul_pagefault(0, addr + bytes - rc, ctxt);
+        return X86EMUL_EXCEPTION;
+    }
+
+    return X86EMUL_OKAY;
+}
+
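+/*
+ * Decode a call gate descriptor into its target selector, offset and
+ * access rights.  Returns 0 on any validation failure, 1 on success.
+ */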
+static int read_gate_descriptor(unsigned int gate_sel,
+                                const struct vcpu *v,
+                                unsigned int *sel,
+                                unsigned long *off,
+                                unsigned int *ar)
+{
+    struct desc_struct desc;
+    const struct desc_struct *pdesc;
+
+    pdesc = (const struct desc_struct *)
+        (!(gate_sel & 4) ? GDT_VIRT_START(v) : LDT_VIRT_START(v))
+        + (gate_sel >> 3);
+    if ( (gate_sel < 4) ||
+         ((gate_sel >= FIRST_RESERVED_GDT_BYTE) && !(gate_sel & 4)) ||
+         __get_user(desc, pdesc) )
+        return 0;
+
+    *sel = (desc.a >> 16) & 0x0000fffc;
+    *off = (desc.a & 0x0000ffff) | (desc.b & 0xffff0000);
+    *ar = desc.b & 0x0000ffff;
+
+    /*
+     * check_descriptor() clears the DPL field and stores the
+     * guest requested DPL in the selector's RPL field.
+     */
+    if ( *ar & _SEGMENT_DPL )
+        return 0;
+    *ar |= (desc.a >> (16 - 13)) & _SEGMENT_DPL;
+
+    if ( !is_pv_32bit_vcpu(v) )
+    {
+        if ( (*ar & 0x1f00) != 0x0c00 ||
+             (gate_sel >= FIRST_RESERVED_GDT_BYTE - 8 && !(gate_sel & 4)) ||
+             __get_user(desc, pdesc + 1) ||
+             (desc.b & 0x1f00) )
+            return 0;
+
+        *off |= (unsigned long)desc.a << 32;
+        return 1;
+    }
+
+    switch ( *ar & 0x1f00 )
+    {
+    case 0x0400:
+        *off &= 0xffff;
+        break;
+    case 0x0c00:
+        break;
+    default:
+        return 0;
+    }
+
+    return 1;
+}
+
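+/*
+ * Check that esp has room for a decr-byte push within the segment
+ * limit, for both expand-up and expand-down (_SEGMENT_EC) segments,
+ * and that the push does not wrap.
+ */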
+static inline int check_stack_limit(unsigned int ar, unsigned int limit,
+                                    unsigned int esp, unsigned int decr)
+{
+    return (((esp - decr) < (esp - 1)) &&
+            (!(ar & _SEGMENT_EC) ? (esp - 1) <= limit : (esp - decr) > limit));
+}
+
+void emulate_gate_op(struct cpu_user_regs *regs)
+{
+    struct vcpu *v = current;
+    unsigned int sel, ar, dpl, nparm, insn_len;
+    struct gate_op_ctxt ctxt = { .ctxt.regs = regs, .insn_fetch = true };
+    struct x86_emulate_state *state;
+    unsigned long off, base, limit;
+    uint16_t opnd_sel = 0;
+    int jump = -1, rc = X86EMUL_OKAY;
+
+    /* Check whether this fault is due to the use of a call gate. */
+    if ( !read_gate_descriptor(regs->error_code, v, &sel, &off, &ar) ||
+         (((ar >> 13) & 3) < (regs->cs & 3)) ||
+         ((ar & _SEGMENT_TYPE) != 0xc00) )
+    {
+        pv_inject_trap(TRAP_gp_fault, regs);
+        return;
+    }
+    if ( !(ar & _SEGMENT_P) )
+    {
+        pv_inject_trap(TRAP_no_segment, regs);
+        return;
+    }
+    dpl = (ar >> 13) & 3;
+    nparm = ar & 0x1f;
+
+    /*
+     * Decode instruction (and perhaps operand) to determine RPL,
+     * whether this is a jump or a call, and the call return offset.
+     */
+    if ( !read_descriptor(regs->cs, v, &ctxt.cs.base, &ctxt.cs.limit,
+                          &ar, 0) ||
+         !(ar & _SEGMENT_S) ||
+         !(ar & _SEGMENT_P) ||
+         !(ar & _SEGMENT_CODE) )
+    {
+        pv_inject_trap(TRAP_gp_fault, regs);
+        return;
+    }
+
+    ctxt.ctxt.addr_size = ar & _SEGMENT_DB ? 32 : 16;
+    /* Leave zero in ctxt.ctxt.sp_size, as it's not needed for decoding. */
+    state = x86_decode_insn(&ctxt.ctxt, gate_op_read);
+    ctxt.insn_fetch = false;
+    if ( IS_ERR_OR_NULL(state) )
+    {
+        if ( PTR_ERR(state) == -X86EMUL_EXCEPTION )
+            pv_inject_event(&ctxt.ctxt.event);
+        else
+            pv_inject_trap(TRAP_gp_fault, regs);
+        return;
+    }
+
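+    /*
+     * jump ends up as 1 for far JMP (0xea, 0xff /5), 0 for far CALL
+     * (0x9a, 0xff /3), and stays -1 for anything else.
+     */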
+    switch ( ctxt.ctxt.opcode )
+    {
+        unsigned int modrm_345;
+
+    case 0xea:
+        ++jump;
+        /* fall through */
+    case 0x9a:
+        ++jump;
+        opnd_sel = x86_insn_immediate(state, 1);
+        break;
+    case 0xff:
+        if ( x86_insn_modrm(state, NULL, &modrm_345) >= 3 )
+            break;
+        switch ( modrm_345 & 7 )
+        {
+            enum x86_segment seg;
+
+        case 5:
+            ++jump;
+            /* fall through */
+        case 3:
+            ++jump;
+            base = x86_insn_operand_ea(state, &seg);
+            rc = gate_op_read(seg,
+                              base + (x86_insn_opsize(state) >> 3),
+                              &opnd_sel, sizeof(opnd_sel), &ctxt.ctxt);
+            break;
+        }
+        break;
+    }
+
+    insn_len = x86_insn_length(state, &ctxt.ctxt);
+    x86_emulate_free_state(state);
+
+    if ( rc == X86EMUL_EXCEPTION )
+    {
+        pv_inject_event(&ctxt.ctxt.event);
+        return;
+    }
+
+    if ( rc != X86EMUL_OKAY ||
+         jump < 0 ||
+         (opnd_sel & ~3) != regs->error_code ||
+         dpl < (opnd_sel & 3) )
+    {
+        pv_inject_trap(TRAP_gp_fault, regs);
+        return;
+    }
+
+    if ( !read_descriptor(sel, v, &base, &limit, &ar, 0) ||
+         !(ar & _SEGMENT_S) ||
+         !(ar & _SEGMENT_CODE) ||
+         (!jump || (ar & _SEGMENT_EC) ?
+          ((ar >> 13) & 3) > (regs->cs & 3) :
+          ((ar >> 13) & 3) != (regs->cs & 3)) )
+    {
+        pv_inject_hw_exception(TRAP_gp_fault, sel);
+        return;
+    }
+    if ( !(ar & _SEGMENT_P) )
+    {
+        pv_inject_hw_exception(TRAP_no_segment, sel);
+        return;
+    }
+    if ( off > limit )
+    {
+        pv_inject_hw_exception(TRAP_gp_fault, 0);
+        return;
+    }
+
+    if ( !jump )
+    {
+        unsigned int ss, esp, *stkp;
+        int rc;
+#define push(item) do \
+        { \
+            --stkp; \
+            esp -= 4; \
+            rc = __put_user(item, stkp); \
+            if ( rc ) \
+            { \
+                pv_inject_page_fault(PFEC_write_access, \
+                                     (unsigned long)(stkp + 1) - rc); \
+                return; \
+            } \
+        } while ( 0 )
+
+        if ( ((ar >> 13) & 3) < (regs->cs & 3) )
+        {
+            sel |= (ar >> 13) & 3;
+            /* Inner stack known only for kernel ring. */
+            if ( (sel & 3) != GUEST_KERNEL_RPL(v->domain) )
+            {
+                pv_inject_trap(TRAP_gp_fault, regs);
+                return;
+            }
+            esp = v->arch.pv_vcpu.kernel_sp;
+            ss = v->arch.pv_vcpu.kernel_ss;
+            if ( (ss & 3) != (sel & 3) ||
+                 !read_descriptor(ss, v, &base, &limit, &ar, 0) ||
+                 ((ar >> 13) & 3) != (sel & 3) ||
+                 !(ar & _SEGMENT_S) ||
+                 (ar & _SEGMENT_CODE) ||
+                 !(ar & _SEGMENT_WR) )
+            {
+                pv_inject_hw_exception(TRAP_invalid_tss, ss & ~3);
+                return;
+            }
+            if ( !(ar & _SEGMENT_P) ||
+                 !check_stack_limit(ar, limit, esp, (4 + nparm) * 4) )
+            {
+                pv_inject_hw_exception(TRAP_stack_error, ss & ~3);
+                return;
+            }
+            stkp = (unsigned int *)(unsigned long)((unsigned int)base + esp);
+            if ( !compat_access_ok(stkp - 4 - nparm, (4 + nparm) * 4) )
+            {
+                pv_inject_trap(TRAP_gp_fault, regs);
+                return;
+            }
+            push(regs->ss);
+            push(regs->rsp);
+            if ( nparm )
+            {
+                const unsigned int *ustkp;
+
+                if ( !read_descriptor(regs->ss, v, &base, &limit, &ar, 0) ||
+                     ((ar >> 13) & 3) != (regs->cs & 3) ||
+                     !(ar & _SEGMENT_S) ||
+                     (ar & _SEGMENT_CODE) ||
+                     !(ar & _SEGMENT_WR) ||
+                     !check_stack_limit(ar, limit, esp + nparm * 4, nparm * 4) )
+                {
+                    pv_inject_trap(TRAP_gp_fault, regs);
+                    return;
+                }
+                ustkp = (unsigned int *)(unsigned long)
+                        ((unsigned int)base + regs->esp + nparm * 4);
+                if ( !compat_access_ok(ustkp - nparm, nparm * 4) )
+                {
+                    pv_inject_trap(TRAP_gp_fault, regs);
+                    return;
+                }
+                do
+                {
+                    unsigned int parm;
+
+                    --ustkp;
+                    rc = __get_user(parm, ustkp);
+                    if ( rc )
+                    {
+                        pv_inject_page_fault(0, (unsigned long)(ustkp + 1) - rc);
+                        return;
+                    }
+                    push(parm);
+                } while ( --nparm );
+            }
+        }
+        else
+        {
+            sel |= (regs->cs & 3);
+            esp = regs->rsp;
+            ss = regs->ss;
+            if ( !read_descriptor(ss, v, &base, &limit, &ar, 0) ||
+                 ((ar >> 13) & 3) != (sel & 3) )
+            {
+                pv_inject_trap(TRAP_gp_fault, regs);
+                return;
+            }
+            if ( !check_stack_limit(ar, limit, esp, 2 * 4) )
+            {
+                pv_inject_hw_exception(TRAP_stack_error, 0);
+                return;
+            }
+            stkp = (unsigned int *)(unsigned long)((unsigned int)base + esp);
+            if ( !compat_access_ok(stkp - 2, 2 * 4) )
+            {
+                pv_inject_trap(TRAP_gp_fault, regs);
+                return;
+            }
+        }
+        push(regs->cs);
+        push(regs->rip + insn_len);
+#undef push
+        regs->rsp = esp;
+        regs->ss = ss;
+    }
+    else
+        sel |= (regs->cs & 3);
+
+    regs->cs = sel;
+    instruction_done(regs, off);
+}
+
+int emulate_invalid_rdtscp(struct cpu_user_regs *regs)
+{
+    char opcode[3];
+    unsigned long eip, rc;
+    struct vcpu *v = current;
+
+    eip = regs->rip;
+    if ( (rc = copy_from_user(opcode, (char *)eip, sizeof(opcode))) != 0 )
+    {
+        pv_inject_page_fault(0, eip + sizeof(opcode) - rc);
+        return EXCRET_fault_fixed;
+    }
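+    /* RDTSCP is opcode 0f 01 f9. */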
+    if ( memcmp(opcode, "\xf\x1\xf9", sizeof(opcode)) )
+        return 0;
+    eip += sizeof(opcode);
+    pv_soft_rdtsc(v, regs, 1);
+    instruction_done(regs, eip);
+    return EXCRET_fault_fixed;
+}
+
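+/*
+ * Handle the PV forced emulation interface: a guest executes
+ * ud2 ; .ascii "xen" ; cpuid (bytes 0f 0b 78 65 6e 0f a2) to have Xen
+ * emulate the CPUID and return policy-filtered results.
+ */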
+int emulate_forced_invalid_op(struct cpu_user_regs *regs)
+{
+    char sig[5], instr[2];
+    unsigned long eip, rc;
+    struct cpuid_leaf res;
+
+    eip = regs->rip;
+
+    /* Check for forced emulation signature: ud2 ; .ascii "xen". */
+    if ( (rc = copy_from_user(sig, (char *)eip, sizeof(sig))) != 0 )
+    {
+        pv_inject_page_fault(0, eip + sizeof(sig) - rc);
+        return EXCRET_fault_fixed;
+    }
+    if ( memcmp(sig, "\xf\xbxen", sizeof(sig)) )
+        return 0;
+    eip += sizeof(sig);
+
+    /* We only emulate CPUID. */
+    if ( ( rc = copy_from_user(instr, (char *)eip, sizeof(instr))) != 0 )
+    {
+        pv_inject_page_fault(0, eip + sizeof(instr) - rc);
+        return EXCRET_fault_fixed;
+    }
+    if ( memcmp(instr, "\xf\xa2", sizeof(instr)) )
+        return 0;
+
+    /* If cpuid faulting is enabled and CPL>0 inject a #GP in place of #UD. */
+    if ( current->arch.cpuid_faulting && !guest_kernel_mode(current, regs) )
+    {
+        regs->rip = eip;
+        pv_inject_trap(TRAP_gp_fault, regs);
+        return EXCRET_fault_fixed;
+    }
+
+    eip += sizeof(instr);
+
+    guest_cpuid(current, regs->eax, regs->ecx, &res);
+
+    regs->rax = res.a;
+    regs->rbx = res.b;
+    regs->rcx = res.c;
+    regs->rdx = res.d;
+
+    instruction_done(regs, eip);
+
+    trace_trap_one_addr(TRC_PV_FORCED_INVALID_OP, regs->rip);
+
+    return EXCRET_fault_fixed;
+}
+
+int pv_emul_cpuid(uint32_t leaf, uint32_t subleaf,
+                  struct cpuid_leaf *res, struct x86_emulate_ctxt *ctxt)
+{
+    guest_cpuid(current, leaf, subleaf, res);
+
+    return X86EMUL_OKAY;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 2639be7850..a076de2be4 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -77,6 +77,7 @@
 #include <public/arch-x86/cpuid.h>
 #include <asm/cpuid.h>
 #include <xsm/xsm.h>
+#include <asm/pv/traps.h>
 
 /*
  * opt_nmi: one of 'ignore', 'dom0', or 'fatal'.
@@ -684,52 +685,6 @@ void pv_inject_event(const struct x86_event *event)
     }
 }
 
-static void instruction_done(struct cpu_user_regs *regs, unsigned long rip)
-{
-    regs->rip = rip;
-    regs->eflags &= ~X86_EFLAGS_RF;
-    if ( regs->eflags & X86_EFLAGS_TF )
-    {
-        current->arch.debugreg[6] |= DR_STEP | DR_STATUS_RESERVED_ONE;
-        pv_inject_trap(TRAP_debug, regs);
-    }
-}
-
-static unsigned int check_guest_io_breakpoint(struct vcpu *v,
-    unsigned int port, unsigned int len)
-{
-    unsigned int width, i, match = 0;
-    unsigned long start;
-
-    if ( !(v->arch.debugreg[5]) ||
-         !(v->arch.pv_vcpu.ctrlreg[4] & X86_CR4_DE) )
-        return 0;
-
-    for ( i = 0; i < 4; i++ )
-    {
-        if ( !(v->arch.debugreg[5] &
-               (3 << (i * DR_ENABLE_SIZE))) )
-            continue;
-
-        start = v->arch.debugreg[i];
-        width = 0;
-
-        switch ( (v->arch.debugreg[7] >>
-                  (DR_CONTROL_SHIFT + i * DR_CONTROL_SIZE)) & 0xc )
-        {
-        case DR_LEN_1: width = 1; break;
-        case DR_LEN_2: width = 2; break;
-        case DR_LEN_4: width = 4; break;
-        case DR_LEN_8: width = 8; break;
-        }
-
-        if ( (start < (port + len)) && ((start + width) > port) )
-            match |= 1 << i;
-    }
-
-    return match;
-}
-
 /*
  * Called from asm to set up the MCE trapbounce info.
  * Returns 0 if no callback is set up, else 1.
@@ -997,77 +952,6 @@ void cpuid_hypervisor_leaves(const struct vcpu *v, uint32_t leaf,
     }
 }
 
-static int emulate_invalid_rdtscp(struct cpu_user_regs *regs)
-{
-    char opcode[3];
-    unsigned long eip, rc;
-    struct vcpu *v = current;
-
-    eip = regs->rip;
-    if ( (rc = copy_from_user(opcode, (char *)eip, sizeof(opcode))) != 0 )
-    {
-        pv_inject_page_fault(0, eip + sizeof(opcode) - rc);
-        return EXCRET_fault_fixed;
-    }
-    if ( memcmp(opcode, "\xf\x1\xf9", sizeof(opcode)) )
-        return 0;
-    eip += sizeof(opcode);
-    pv_soft_rdtsc(v, regs, 1);
-    instruction_done(regs, eip);
-    return EXCRET_fault_fixed;
-}
-
-static int emulate_forced_invalid_op(struct cpu_user_regs *regs)
-{
-    char sig[5], instr[2];
-    unsigned long eip, rc;
-    struct cpuid_leaf res;
-
-    eip = regs->rip;
-
-    /* Check for forced emulation signature: ud2 ; .ascii "xen". */
-    if ( (rc = copy_from_user(sig, (char *)eip, sizeof(sig))) != 0 )
-    {
-        pv_inject_page_fault(0, eip + sizeof(sig) - rc);
-        return EXCRET_fault_fixed;
-    }
-    if ( memcmp(sig, "\xf\xbxen", sizeof(sig)) )
-        return 0;
-    eip += sizeof(sig);
-
-    /* We only emulate CPUID. */
-    if ( ( rc = copy_from_user(instr, (char *)eip, sizeof(instr))) != 0 )
-    {
-        pv_inject_page_fault(0, eip + sizeof(instr) - rc);
-        return EXCRET_fault_fixed;
-    }
-    if ( memcmp(instr, "\xf\xa2", sizeof(instr)) )
-        return 0;
-
-    /* If cpuid faulting is enabled and CPL>0 inject a #GP in place of #UD. */
-    if ( current->arch.cpuid_faulting && !guest_kernel_mode(current, regs) )
-    {
-        regs->rip = eip;
-        pv_inject_trap(TRAP_gp_fault, regs);
-        return EXCRET_fault_fixed;
-    }
-
-    eip += sizeof(instr);
-
-    guest_cpuid(current, regs->eax, regs->ecx, &res);
-
-    regs->rax = res.a;
-    regs->rbx = res.b;
-    regs->rcx = res.c;
-    regs->rdx = res.d;
-
-    instruction_done(regs, eip);
-
-    trace_trap_one_addr(TRC_PV_FORCED_INVALID_OP, regs->rip);
-
-    return EXCRET_fault_fixed;
-}
-
 void do_invalid_op(struct cpu_user_regs *regs)
 {
     const struct bug_frame *bug = NULL;
@@ -1609,1768 +1493,6 @@ long do_fpu_taskswitch(int set)
     return 0;
 }
 
-static int read_descriptor(unsigned int sel,
-                           const struct vcpu *v,
-                           unsigned long *base,
-                           unsigned long *limit,
-                           unsigned int *ar,
-                           bool_t insn_fetch)
-{
-    struct desc_struct desc;
-
-    if ( sel < 4)
-        desc.b = desc.a = 0;
-    else if ( __get_user(desc,
-                         (const struct desc_struct *)(!(sel & 4)
-                                                      ? GDT_VIRT_START(v)
-                                                      : LDT_VIRT_START(v))
-                         + (sel >> 3)) )
-        return 0;
-    if ( !insn_fetch )
-        desc.b &= ~_SEGMENT_L;
-
-    *ar = desc.b & 0x00f0ff00;
-    if ( !(desc.b & _SEGMENT_L) )
-    {
-        *base = ((desc.a >> 16) + ((desc.b & 0xff) << 16) +
-                 (desc.b & 0xff000000));
-        *limit = (desc.a & 0xffff) | (desc.b & 0x000f0000);
-        if ( desc.b & _SEGMENT_G )
-            *limit = ((*limit + 1) << 12) - 1;
-#ifndef NDEBUG
-        if ( sel > 3 )
-        {
-            unsigned int a, l;
-            unsigned char valid;
-
-            asm volatile (
-                "larl %2,%0 ; setz %1"
-                : "=r" (a), "=qm" (valid) : "rm" (sel));
-            BUG_ON(valid && ((a & 0x00f0ff00) != *ar));
-            asm volatile (
-                "lsll %2,%0 ; setz %1"
-                : "=r" (l), "=qm" (valid) : "rm" (sel));
-            BUG_ON(valid && (l != *limit));
-        }
-#endif
-    }
-    else
-    {
-        *base = 0UL;
-        *limit = ~0UL;
-    }
-
-    return 1;
-}
-
-static int read_gate_descriptor(unsigned int gate_sel,
-                                const struct vcpu *v,
-                                unsigned int *sel,
-                                unsigned long *off,
-                                unsigned int *ar)
-{
-    struct desc_struct desc;
-    const struct desc_struct *pdesc;
-
-
-    pdesc = (const struct desc_struct *)
-        (!(gate_sel & 4) ? GDT_VIRT_START(v) : LDT_VIRT_START(v))
-        + (gate_sel >> 3);
-    if ( (gate_sel < 4) ||
-         ((gate_sel >= FIRST_RESERVED_GDT_BYTE) && !(gate_sel & 4)) ||
-         __get_user(desc, pdesc) )
-        return 0;
-
-    *sel = (desc.a >> 16) & 0x0000fffc;
-    *off = (desc.a & 0x0000ffff) | (desc.b & 0xffff0000);
-    *ar = desc.b & 0x0000ffff;
-
-    /*
-     * check_descriptor() clears the DPL field and stores the
-     * guest requested DPL in the selector's RPL field.
-     */
-    if ( *ar & _SEGMENT_DPL )
-        return 0;
-    *ar |= (desc.a >> (16 - 13)) & _SEGMENT_DPL;
-
-    if ( !is_pv_32bit_vcpu(v) )
-    {
-        if ( (*ar & 0x1f00) != 0x0c00 ||
-             (gate_sel >= FIRST_RESERVED_GDT_BYTE - 8 && !(gate_sel & 4)) ||
-             __get_user(desc, pdesc + 1) ||
-             (desc.b & 0x1f00) )
-            return 0;
-
-        *off |= (unsigned long)desc.a << 32;
-        return 1;
-    }
-
-    switch ( *ar & 0x1f00 )
-    {
-    case 0x0400:
-        *off &= 0xffff;
-        break;
-    case 0x0c00:
-        break;
-    default:
-        return 0;
-    }
-
-    return 1;
-}
-
-static int pv_emul_virt_to_linear(unsigned long base, unsigned long offset,
-                                  unsigned int bytes, unsigned long limit,
-                                  enum x86_segment seg,
-                                  struct x86_emulate_ctxt *ctxt,
-                                  unsigned long *addr)
-{
-    int rc = X86EMUL_OKAY;
-
-    *addr = base + offset;
-
-    if ( ctxt->addr_size < 64 )
-    {
-        if ( limit < bytes - 1 || offset > limit - bytes + 1 )
-            rc = X86EMUL_EXCEPTION;
-        *addr = (uint32_t)*addr;
-    }
-    else if ( !__addr_ok(*addr) )
-        rc = X86EMUL_EXCEPTION;
-
-    if ( unlikely(rc == X86EMUL_EXCEPTION) )
-        x86_emul_hw_exception(seg != x86_seg_ss ? TRAP_gp_fault
-                                                : TRAP_stack_error,
-                              0, ctxt);
-
-    return rc;
-}
-
-struct priv_op_ctxt {
-    struct x86_emulate_ctxt ctxt;
-    struct {
-        unsigned long base, limit;
-    } cs;
-    char *io_emul_stub;
-    unsigned int bpmatch;
-    unsigned int tsc;
-#define TSC_BASE 1
-#define TSC_AUX 2
-};
-
-static int priv_op_insn_fetch(enum x86_segment seg,
-                              unsigned long offset,
-                              void *p_data,
-                              unsigned int bytes,
-                              struct x86_emulate_ctxt *ctxt)
-{
-    const struct priv_op_ctxt *poc =
-        container_of(ctxt, struct priv_op_ctxt, ctxt);
-    unsigned int rc;
-    unsigned long addr = poc->cs.base + offset;
-
-    ASSERT(seg == x86_seg_cs);
-
-    /* We don't mean to emulate any branches. */
-    if ( !bytes )
-        return X86EMUL_UNHANDLEABLE;
-
-    rc = pv_emul_virt_to_linear(poc->cs.base, offset, bytes, poc->cs.limit,
-                                x86_seg_cs, ctxt, &addr);
-    if ( rc != X86EMUL_OKAY )
-        return rc;
-
-    if ( (rc = __copy_from_user(p_data, (void *)addr, bytes)) != 0 )
-    {
-        /*
-         * TODO: This should report PFEC_insn_fetch when goc->insn_fetch &&
-         * cpu_has_nx, but we'd then need a "fetch" variant of
-         * __copy_from_user() respecting NX, SMEP, and protection keys.
-         */
-        x86_emul_pagefault(0, addr + bytes - rc, ctxt);
-        return X86EMUL_EXCEPTION;
-    }
-
-    return X86EMUL_OKAY;
-}
-
-static int priv_op_read_segment(enum x86_segment seg,
-                                struct segment_register *reg,
-                                struct x86_emulate_ctxt *ctxt)
-{
-    /* Check if this is an attempt to access the I/O bitmap. */
-    if ( seg == x86_seg_tr )
-    {
-        switch ( ctxt->opcode )
-        {
-        case 0x6c ... 0x6f: /* ins / outs */
-        case 0xe4 ... 0xe7: /* in / out (immediate port) */
-        case 0xec ... 0xef: /* in / out (port in %dx) */
-            /* Defer the check to priv_op_{read,write}_io(). */
-            return X86EMUL_DONE;
-        }
-    }
-
-    if ( ctxt->addr_size < 64 )
-    {
-        unsigned long limit;
-        unsigned int sel, ar;
-
-        switch ( seg )
-        {
-        case x86_seg_cs: sel = ctxt->regs->cs; break;
-        case x86_seg_ds: sel = read_sreg(ds);  break;
-        case x86_seg_es: sel = read_sreg(es);  break;
-        case x86_seg_fs: sel = read_sreg(fs);  break;
-        case x86_seg_gs: sel = read_sreg(gs);  break;
-        case x86_seg_ss: sel = ctxt->regs->ss; break;
-        default: return X86EMUL_UNHANDLEABLE;
-        }
-
-        if ( !read_descriptor(sel, current, &reg->base, &limit, &ar, 0) )
-            return X86EMUL_UNHANDLEABLE;
-
-        reg->limit = limit;
-        reg->attr.bytes = ar >> 8;
-    }
-    else
-    {
-        switch ( seg )
-        {
-        default:
-            if ( !is_x86_user_segment(seg) )
-                return X86EMUL_UNHANDLEABLE;
-            reg->base = 0;
-            break;
-        case x86_seg_fs:
-            reg->base = rdfsbase();
-            break;
-        case x86_seg_gs:
-            reg->base = rdgsbase();
-            break;
-        }
-
-        reg->limit = ~0U;
-
-        reg->attr.bytes = 0;
-        reg->attr.fields.type = _SEGMENT_WR >> 8;
-        if ( seg == x86_seg_cs )
-        {
-            reg->attr.fields.type |= _SEGMENT_CODE >> 8;
-            reg->attr.fields.l = 1;
-        }
-        else
-            reg->attr.fields.db = 1;
-        reg->attr.fields.s   = 1;
-        reg->attr.fields.dpl = 3;
-        reg->attr.fields.p   = 1;
-        reg->attr.fields.g   = 1;
-    }
-
-    /*
-     * For x86_emulate.c's mode_ring0() to work, fake a DPL of zero.
-     * Also do this for consistency for non-conforming code segments.
-     */
-    if ( (seg == x86_seg_ss ||
-          (seg == x86_seg_cs &&
-           !(reg->attr.fields.type & (_SEGMENT_EC >> 8)))) &&
-         guest_kernel_mode(current, ctxt->regs) )
-        reg->attr.fields.dpl = 0;
-
-    return X86EMUL_OKAY;
-}
-
-/* Perform IOPL check between the vcpu's shadowed IOPL, and the assumed cpl. */
-static bool_t iopl_ok(const struct vcpu *v, const struct cpu_user_regs *regs)
-{
-    unsigned int cpl = guest_kernel_mode(v, regs) ?
-        (VM_ASSIST(v->domain, architectural_iopl) ? 0 : 1) : 3;
-
-    ASSERT((v->arch.pv_vcpu.iopl & ~X86_EFLAGS_IOPL) == 0);
-
-    return IOPL(cpl) <= v->arch.pv_vcpu.iopl;
-}
-
-/* Has the guest requested sufficient permission for this I/O access? */
-static int guest_io_okay(
-    unsigned int port, unsigned int bytes,
-    struct vcpu *v, struct cpu_user_regs *regs)
-{
-    /* If in user mode, switch to kernel mode just to read I/O bitmap. */
-    int user_mode = !(v->arch.flags & TF_kernel_mode);
-#define TOGGLE_MODE() if ( user_mode ) toggle_guest_mode(v)
-
-    if ( iopl_ok(v, regs) )
-        return 1;
-
-    if ( v->arch.pv_vcpu.iobmp_limit > (port + bytes) )
-    {
-        union { uint8_t bytes[2]; uint16_t mask; } x;
-
-        /*
-         * Grab permission bytes from guest space. Inaccessible bytes are
-         * read as 0xff (no access allowed).
-         */
-        TOGGLE_MODE();
-        switch ( __copy_from_guest_offset(x.bytes, v->arch.pv_vcpu.iobmp,
-                                          port>>3, 2) )
-        {
-        default: x.bytes[0] = ~0;
-            /* fallthrough */
-        case 1:  x.bytes[1] = ~0;
-            /* fallthrough */
-        case 0:  break;
-        }
-        TOGGLE_MODE();
-
-        if ( (x.mask & (((1<<bytes)-1) << (port&7))) == 0 )
-            return 1;
-    }
-
-    return 0;
-}
-
-/* Has the administrator granted sufficient permission for this I/O access? */
-static bool_t admin_io_okay(unsigned int port, unsigned int bytes,
-                            const struct domain *d)
-{
-    /*
-     * Port 0xcf8 (CONFIG_ADDRESS) is only visible for DWORD accesses.
-     * We never permit direct access to that register.
-     */
-    if ( (port == 0xcf8) && (bytes == 4) )
-        return 0;
-
-    /* We also never permit direct access to the RTC/CMOS registers. */
-    if ( ((port & ~1) == RTC_PORT(0)) )
-        return 0;
-
-    return ioports_access_permitted(d, port, port + bytes - 1);
-}
-
-static bool_t pci_cfg_ok(struct domain *currd, unsigned int start,
-                         unsigned int size, uint32_t *write)
-{
-    uint32_t machine_bdf;
-
-    if ( !is_hardware_domain(currd) )
-        return 0;
-
-    if ( !CF8_ENABLED(currd->arch.pci_cf8) )
-        return 1;
-
-    machine_bdf = CF8_BDF(currd->arch.pci_cf8);
-    if ( write )
-    {
-        const unsigned long *ro_map = pci_get_ro_map(0);
-
-        if ( ro_map && test_bit(machine_bdf, ro_map) )
-            return 0;
-    }
-    start |= CF8_ADDR_LO(currd->arch.pci_cf8);
-    /* AMD extended configuration space access? */
-    if ( CF8_ADDR_HI(currd->arch.pci_cf8) &&
-         boot_cpu_data.x86_vendor == X86_VENDOR_AMD &&
-         boot_cpu_data.x86 >= 0x10 && boot_cpu_data.x86 <= 0x17 )
-    {
-        uint64_t msr_val;
-
-        if ( rdmsr_safe(MSR_AMD64_NB_CFG, msr_val) )
-            return 0;
-        if ( msr_val & (1ULL << AMD64_NB_CFG_CF8_EXT_ENABLE_BIT) )
-            start |= CF8_ADDR_HI(currd->arch.pci_cf8);
-    }
-
-    return !write ?
-           xsm_pci_config_permission(XSM_HOOK, currd, machine_bdf,
-                                     start, start + size - 1, 0) == 0 :
-           pci_conf_write_intercept(0, machine_bdf, start, size, write) >= 0;
-}
-
-uint32_t guest_io_read(unsigned int port, unsigned int bytes,
-                       struct domain *currd)
-{
-    uint32_t data = 0;
-    unsigned int shift = 0;
-
-    if ( admin_io_okay(port, bytes, currd) )
-    {
-        switch ( bytes )
-        {
-        case 1: return inb(port);
-        case 2: return inw(port);
-        case 4: return inl(port);
-        }
-    }
-
-    while ( bytes != 0 )
-    {
-        unsigned int size = 1;
-        uint32_t sub_data = ~0;
-
-        if ( (port == 0x42) || (port == 0x43) || (port == 0x61) )
-        {
-            sub_data = pv_pit_handler(port, 0, 0);
-        }
-        else if ( port == RTC_PORT(0) )
-        {
-            sub_data = currd->arch.cmos_idx;
-        }
-        else if ( (port == RTC_PORT(1)) &&
-                  ioports_access_permitted(currd, RTC_PORT(0), RTC_PORT(1)) )
-        {
-            unsigned long flags;
-
-            spin_lock_irqsave(&rtc_lock, flags);
-            outb(currd->arch.cmos_idx & 0x7f, RTC_PORT(0));
-            sub_data = inb(RTC_PORT(1));
-            spin_unlock_irqrestore(&rtc_lock, flags);
-        }
-        else if ( (port == 0xcf8) && (bytes == 4) )
-        {
-            size = 4;
-            sub_data = currd->arch.pci_cf8;
-        }
-        else if ( (port & 0xfffc) == 0xcfc )
-        {
-            size = min(bytes, 4 - (port & 3));
-            if ( size == 3 )
-                size = 2;
-            if ( pci_cfg_ok(currd, port & 3, size, NULL) )
-                sub_data = pci_conf_read(currd->arch.pci_cf8, port & 3, size);
-        }
-
-        if ( size == 4 )
-            return sub_data;
-
-        data |= (sub_data & ((1u << (size * 8)) - 1)) << shift;
-        shift += size * 8;
-        port += size;
-        bytes -= size;
-    }
-
-    return data;
-}
-
-void guest_io_write(unsigned int port, unsigned int bytes, uint32_t data,
-                    struct domain *currd)
-{
-    if ( admin_io_okay(port, bytes, currd) )
-    {
-        switch ( bytes ) {
-        case 1:
-            outb((uint8_t)data, port);
-            if ( pv_post_outb_hook )
-                pv_post_outb_hook(port, (uint8_t)data);
-            break;
-        case 2:
-            outw((uint16_t)data, port);
-            break;
-        case 4:
-            outl(data, port);
-            break;
-        }
-        return;
-    }
-
-    while ( bytes != 0 )
-    {
-        unsigned int size = 1;
-
-        if ( (port == 0x42) || (port == 0x43) || (port == 0x61) )
-        {
-            pv_pit_handler(port, (uint8_t)data, 1);
-        }
-        else if ( port == RTC_PORT(0) )
-        {
-            currd->arch.cmos_idx = data;
-        }
-        else if ( (port == RTC_PORT(1)) &&
-                  ioports_access_permitted(currd, RTC_PORT(0), RTC_PORT(1)) )
-        {
-            unsigned long flags;
-
-            if ( pv_rtc_handler )
-                pv_rtc_handler(currd->arch.cmos_idx & 0x7f, data);
-            spin_lock_irqsave(&rtc_lock, flags);
-            outb(currd->arch.cmos_idx & 0x7f, RTC_PORT(0));
-            outb(data, RTC_PORT(1));
-            spin_unlock_irqrestore(&rtc_lock, flags);
-        }
-        else if ( (port == 0xcf8) && (bytes == 4) )
-        {
-            size = 4;
-            currd->arch.pci_cf8 = data;
-        }
-        else if ( (port & 0xfffc) == 0xcfc )
-        {
-            size = min(bytes, 4 - (port & 3));
-            if ( size == 3 )
-                size = 2;
-            if ( pci_cfg_ok(currd, port & 3, size, &data) )
-                pci_conf_write(currd->arch.pci_cf8, port & 3, size, data);
-        }
-
-        if ( size == 4 )
-            return;
-
-        port += size;
-        bytes -= size;
-        data >>= size * 8;
-    }
-}
-
-/* I/O emulation support. Helper routines for, and type of, the stack stub.*/
-void host_to_guest_gpr_switch(struct cpu_user_regs *);
-unsigned long guest_to_host_gpr_switch(unsigned long);
-
-void (*pv_post_outb_hook)(unsigned int port, u8 value);
-
-typedef void io_emul_stub_t(struct cpu_user_regs *);
-
-static io_emul_stub_t *io_emul_stub_setup(struct priv_op_ctxt *ctxt, u8 opcode,
-                                          unsigned int port, unsigned int bytes)
-{
-    if ( !ctxt->io_emul_stub )
-        ctxt->io_emul_stub = map_domain_page(_mfn(this_cpu(stubs.mfn))) +
-                                             (this_cpu(stubs.addr) &
-                                              ~PAGE_MASK) +
-                                             STUB_BUF_SIZE / 2;
-
-    /* movq $host_to_guest_gpr_switch,%rcx */
-    ctxt->io_emul_stub[0] = 0x48;
-    ctxt->io_emul_stub[1] = 0xb9;
-    *(void **)&ctxt->io_emul_stub[2] = (void *)host_to_guest_gpr_switch;
-    /* callq *%rcx */
-    ctxt->io_emul_stub[10] = 0xff;
-    ctxt->io_emul_stub[11] = 0xd1;
-    /* data16 or nop */
-    ctxt->io_emul_stub[12] = (bytes != 2) ? 0x90 : 0x66;
-    /* <io-access opcode> */
-    ctxt->io_emul_stub[13] = opcode;
-    /* imm8 or nop */
-    ctxt->io_emul_stub[14] = !(opcode & 8) ? port : 0x90;
-    /* ret (jumps to guest_to_host_gpr_switch) */
-    ctxt->io_emul_stub[15] = 0xc3;
-    BUILD_BUG_ON(STUB_BUF_SIZE / 2 < 16);
-
-    if ( ioemul_handle_quirk )
-        ioemul_handle_quirk(opcode, &ctxt->io_emul_stub[12], ctxt->ctxt.regs);
-
-    /* Handy function-typed pointer to the stub. */
-    return (void *)(this_cpu(stubs.addr) + STUB_BUF_SIZE / 2);
-}
-
-static int priv_op_read_io(unsigned int port, unsigned int bytes,
-                           unsigned long *val, struct x86_emulate_ctxt *ctxt)
-{
-    struct priv_op_ctxt *poc = container_of(ctxt, struct priv_op_ctxt, ctxt);
-    struct vcpu *curr = current;
-    struct domain *currd = current->domain;
-
-    /* INS must not come here. */
-    ASSERT((ctxt->opcode & ~9) == 0xe4);
-
-    if ( !guest_io_okay(port, bytes, curr, ctxt->regs) )
-        return X86EMUL_UNHANDLEABLE;
-
-    poc->bpmatch = check_guest_io_breakpoint(curr, port, bytes);
-
-    if ( admin_io_okay(port, bytes, currd) )
-    {
-        io_emul_stub_t *io_emul =
-            io_emul_stub_setup(poc, ctxt->opcode, port, bytes);
-
-        mark_regs_dirty(ctxt->regs);
-        io_emul(ctxt->regs);
-        return X86EMUL_DONE;
-    }
-
-    *val = guest_io_read(port, bytes, currd);
-
-    return X86EMUL_OKAY;
-}
-
-static int priv_op_write_io(unsigned int port, unsigned int bytes,
-                            unsigned long val, struct x86_emulate_ctxt *ctxt)
-{
-    struct priv_op_ctxt *poc = container_of(ctxt, struct priv_op_ctxt, ctxt);
-    struct vcpu *curr = current;
-    struct domain *currd = current->domain;
-
-    /* OUTS must not come here. */
-    ASSERT((ctxt->opcode & ~9) == 0xe6);
-
-    if ( !guest_io_okay(port, bytes, curr, ctxt->regs) )
-        return X86EMUL_UNHANDLEABLE;
-
-    poc->bpmatch = check_guest_io_breakpoint(curr, port, bytes);
-
-    if ( admin_io_okay(port, bytes, currd) )
-    {
-        io_emul_stub_t *io_emul =
-            io_emul_stub_setup(poc, ctxt->opcode, port, bytes);
-
-        mark_regs_dirty(ctxt->regs);
-        io_emul(ctxt->regs);
-        if ( (bytes == 1) && pv_post_outb_hook )
-            pv_post_outb_hook(port, val);
-        return X86EMUL_DONE;
-    }
-
-    guest_io_write(port, bytes, val, currd);
-
-    return X86EMUL_OKAY;
-}
-
-static int priv_op_rep_ins(uint16_t port,
-                           enum x86_segment seg, unsigned long offset,
-                           unsigned int bytes_per_rep, unsigned long *reps,
-                           struct x86_emulate_ctxt *ctxt)
-{
-    struct priv_op_ctxt *poc = container_of(ctxt, struct priv_op_ctxt, ctxt);
-    struct vcpu *curr = current;
-    struct domain *currd = current->domain;
-    unsigned long goal = *reps;
-    struct segment_register sreg;
-    int rc;
-
-    ASSERT(seg == x86_seg_es);
-
-    *reps = 0;
-
-    if ( !guest_io_okay(port, bytes_per_rep, curr, ctxt->regs) )
-        return X86EMUL_UNHANDLEABLE;
-
-    rc = priv_op_read_segment(x86_seg_es, &sreg, ctxt);
-    if ( rc != X86EMUL_OKAY )
-        return rc;
-
-    if ( !sreg.attr.fields.p )
-        return X86EMUL_UNHANDLEABLE;
-    if ( !sreg.attr.fields.s ||
-         (sreg.attr.fields.type & (_SEGMENT_CODE >> 8)) ||
-         !(sreg.attr.fields.type & (_SEGMENT_WR >> 8)) )
-    {
-        x86_emul_hw_exception(TRAP_gp_fault, 0, ctxt);
-        return X86EMUL_EXCEPTION;
-    }
-
-    poc->bpmatch = check_guest_io_breakpoint(curr, port, bytes_per_rep);
-
-    while ( *reps < goal )
-    {
-        unsigned int data = guest_io_read(port, bytes_per_rep, currd);
-        unsigned long addr;
-
-        rc = pv_emul_virt_to_linear(sreg.base, offset, bytes_per_rep,
-                                    sreg.limit, x86_seg_es, ctxt, &addr);
-        if ( rc != X86EMUL_OKAY )
-            return rc;
-
-        if ( (rc = __copy_to_user((void *)addr, &data, bytes_per_rep)) != 0 )
-        {
-            x86_emul_pagefault(PFEC_write_access,
-                               addr + bytes_per_rep - rc, ctxt);
-            return X86EMUL_EXCEPTION;
-        }
-
-        ++*reps;
-
-        if ( poc->bpmatch || hypercall_preempt_check() )
-            break;
-
-        /* x86_emulate() clips the repetition count to ensure we don't wrap. */
-        if ( unlikely(ctxt->regs->eflags & X86_EFLAGS_DF) )
-            offset -= bytes_per_rep;
-        else
-            offset += bytes_per_rep;
-    }
-
-    return X86EMUL_OKAY;
-}
-
-static int priv_op_rep_outs(enum x86_segment seg, unsigned long offset,
-                            uint16_t port,
-                            unsigned int bytes_per_rep, unsigned long *reps,
-                            struct x86_emulate_ctxt *ctxt)
-{
-    struct priv_op_ctxt *poc = container_of(ctxt, struct priv_op_ctxt, ctxt);
-    struct vcpu *curr = current;
-    struct domain *currd = current->domain;
-    unsigned long goal = *reps;
-    struct segment_register sreg;
-    int rc;
-
-    *reps = 0;
-
-    if ( !guest_io_okay(port, bytes_per_rep, curr, ctxt->regs) )
-        return X86EMUL_UNHANDLEABLE;
-
-    rc = priv_op_read_segment(seg, &sreg, ctxt);
-    if ( rc != X86EMUL_OKAY )
-        return rc;
-
-    if ( !sreg.attr.fields.p )
-        return X86EMUL_UNHANDLEABLE;
-    if ( !sreg.attr.fields.s ||
-         ((sreg.attr.fields.type & (_SEGMENT_CODE >> 8)) &&
-          !(sreg.attr.fields.type & (_SEGMENT_WR >> 8))) )
-    {
-        x86_emul_hw_exception(seg != x86_seg_ss ? TRAP_gp_fault
-                                                : TRAP_stack_error,
-                              0, ctxt);
-        return X86EMUL_EXCEPTION;
-    }
-
-    poc->bpmatch = check_guest_io_breakpoint(curr, port, bytes_per_rep);
-
-    while ( *reps < goal )
-    {
-        unsigned int data = 0;
-        unsigned long addr;
-
-        rc = pv_emul_virt_to_linear(sreg.base, offset, bytes_per_rep,
-                                    sreg.limit, seg, ctxt, &addr);
-        if ( rc != X86EMUL_OKAY )
-            return rc;
-
-        if ( (rc = __copy_from_user(&data, (void *)addr, bytes_per_rep)) != 0 )
-        {
-            x86_emul_pagefault(0, addr + bytes_per_rep - rc, ctxt);
-            return X86EMUL_EXCEPTION;
-        }
-
-        guest_io_write(port, bytes_per_rep, data, currd);
-
-        ++*reps;
-
-        if ( poc->bpmatch || hypercall_preempt_check() )
-            break;
-
-        /* x86_emulate() clips the repetition count to ensure we don't wrap. */
-        if ( unlikely(ctxt->regs->eflags & X86_EFLAGS_DF) )
-            offset -= bytes_per_rep;
-        else
-            offset += bytes_per_rep;
-    }
-
-    return X86EMUL_OKAY;
-}
-
-static int priv_op_read_cr(unsigned int reg, unsigned long *val,
-                           struct x86_emulate_ctxt *ctxt)
-{
-    const struct vcpu *curr = current;
-
-    switch ( reg )
-    {
-    case 0: /* Read CR0 */
-        *val = (read_cr0() & ~X86_CR0_TS) | curr->arch.pv_vcpu.ctrlreg[0];
-        return X86EMUL_OKAY;
-
-    case 2: /* Read CR2 */
-    case 4: /* Read CR4 */
-        *val = curr->arch.pv_vcpu.ctrlreg[reg];
-        return X86EMUL_OKAY;
-
-    case 3: /* Read CR3 */
-    {
-        const struct domain *currd = curr->domain;
-        unsigned long mfn;
-
-        if ( !is_pv_32bit_domain(currd) )
-        {
-            mfn = pagetable_get_pfn(curr->arch.guest_table);
-            *val = xen_pfn_to_cr3(mfn_to_gmfn(currd, mfn));
-        }
-        else
-        {
-            l4_pgentry_t *pl4e =
-                map_domain_page(_mfn(pagetable_get_pfn(curr->arch.guest_table)));
-
-            mfn = l4e_get_pfn(*pl4e);
-            unmap_domain_page(pl4e);
-            *val = compat_pfn_to_cr3(mfn_to_gmfn(currd, mfn));
-        }
-        /* PTs should not be shared */
-        BUG_ON(page_get_owner(mfn_to_page(mfn)) == dom_cow);
-        return X86EMUL_OKAY;
-    }
-    }
-
-    return X86EMUL_UNHANDLEABLE;
-}
-
-static int priv_op_write_cr(unsigned int reg, unsigned long val,
-                            struct x86_emulate_ctxt *ctxt)
-{
-    struct vcpu *curr = current;
-
-    switch ( reg )
-    {
-    case 0: /* Write CR0 */
-        if ( (val ^ read_cr0()) & ~X86_CR0_TS )
-        {
-            gdprintk(XENLOG_WARNING,
-                    "Attempt to change unmodifiable CR0 flags\n");
-            break;
-        }
-        do_fpu_taskswitch(!!(val & X86_CR0_TS));
-        return X86EMUL_OKAY;
-
-    case 2: /* Write CR2 */
-        curr->arch.pv_vcpu.ctrlreg[2] = val;
-        arch_set_cr2(curr, val);
-        return X86EMUL_OKAY;
-
-    case 3: /* Write CR3 */
-    {
-        struct domain *currd = curr->domain;
-        unsigned long gfn;
-        struct page_info *page;
-        int rc;
-
-        gfn = !is_pv_32bit_domain(currd)
-              ? xen_cr3_to_pfn(val) : compat_cr3_to_pfn(val);
-        page = get_page_from_gfn(currd, gfn, NULL, P2M_ALLOC);
-        if ( !page )
-            break;
-        rc = new_guest_cr3(page_to_mfn(page));
-        put_page(page);
-
-        switch ( rc )
-        {
-        case 0:
-            return X86EMUL_OKAY;
-        case -ERESTART: /* retry after preemption */
-            return X86EMUL_RETRY;
-        }
-        break;
-    }
-
-    case 4: /* Write CR4 */
-        curr->arch.pv_vcpu.ctrlreg[4] = pv_guest_cr4_fixup(curr, val);
-        write_cr4(pv_guest_cr4_to_real_cr4(curr));
-        ctxt_switch_levelling(curr);
-        return X86EMUL_OKAY;
-    }
-
-    return X86EMUL_UNHANDLEABLE;
-}
-
-static int priv_op_read_dr(unsigned int reg, unsigned long *val,
-                           struct x86_emulate_ctxt *ctxt)
-{
-    unsigned long res = do_get_debugreg(reg);
-
-    if ( IS_ERR_VALUE(res) )
-        return X86EMUL_UNHANDLEABLE;
-
-    *val = res;
-
-    return X86EMUL_OKAY;
-}
-
-static int priv_op_write_dr(unsigned int reg, unsigned long val,
-                            struct x86_emulate_ctxt *ctxt)
-{
-    return do_set_debugreg(reg, val) == 0
-           ? X86EMUL_OKAY : X86EMUL_UNHANDLEABLE;
-}
-
-static inline uint64_t guest_misc_enable(uint64_t val)
-{
-    val &= ~(MSR_IA32_MISC_ENABLE_PERF_AVAIL |
-             MSR_IA32_MISC_ENABLE_MONITOR_ENABLE);
-    val |= MSR_IA32_MISC_ENABLE_BTS_UNAVAIL |
-           MSR_IA32_MISC_ENABLE_PEBS_UNAVAIL |
-           MSR_IA32_MISC_ENABLE_XTPR_DISABLE;
-    return val;
-}
-
-static inline bool is_cpufreq_controller(const struct domain *d)
-{
-    return ((cpufreq_controller == FREQCTL_dom0_kernel) &&
-            is_hardware_domain(d));
-}
-
-static int priv_op_read_msr(unsigned int reg, uint64_t *val,
-                            struct x86_emulate_ctxt *ctxt)
-{
-    struct priv_op_ctxt *poc = container_of(ctxt, struct priv_op_ctxt, ctxt);
-    const struct vcpu *curr = current;
-    const struct domain *currd = curr->domain;
-    bool vpmu_msr = false;
-
-    switch ( reg )
-    {
-        int rc;
-
-    case MSR_FS_BASE:
-        if ( is_pv_32bit_domain(currd) )
-            break;
-        *val = cpu_has_fsgsbase ? __rdfsbase() : curr->arch.pv_vcpu.fs_base;
-        return X86EMUL_OKAY;
-
-    case MSR_GS_BASE:
-        if ( is_pv_32bit_domain(currd) )
-            break;
-        *val = cpu_has_fsgsbase ? __rdgsbase()
-                                : curr->arch.pv_vcpu.gs_base_kernel;
-        return X86EMUL_OKAY;
-
-    case MSR_SHADOW_GS_BASE:
-        if ( is_pv_32bit_domain(currd) )
-            break;
-        *val = curr->arch.pv_vcpu.gs_base_user;
-        return X86EMUL_OKAY;
-
-    /*
-     * In order to fully retain original behavior, defer calling
-     * pv_soft_rdtsc() until after emulation. This may want/need to be
-     * reconsidered.
-     */
-    case MSR_IA32_TSC:
-        poc->tsc |= TSC_BASE;
-        goto normal;
-
-    case MSR_TSC_AUX:
-        poc->tsc |= TSC_AUX;
-        if ( cpu_has_rdtscp )
-            goto normal;
-        *val = 0;
-        return X86EMUL_OKAY;
-
-    case MSR_EFER:
-        *val = read_efer();
-        if ( is_pv_32bit_domain(currd) )
-            *val &= ~(EFER_LME | EFER_LMA | EFER_LMSLE);
-        return X86EMUL_OKAY;
-
-    case MSR_K7_FID_VID_CTL:
-    case MSR_K7_FID_VID_STATUS:
-    case MSR_K8_PSTATE_LIMIT:
-    case MSR_K8_PSTATE_CTRL:
-    case MSR_K8_PSTATE_STATUS:
-    case MSR_K8_PSTATE0:
-    case MSR_K8_PSTATE1:
-    case MSR_K8_PSTATE2:
-    case MSR_K8_PSTATE3:
-    case MSR_K8_PSTATE4:
-    case MSR_K8_PSTATE5:
-    case MSR_K8_PSTATE6:
-    case MSR_K8_PSTATE7:
-        if ( boot_cpu_data.x86_vendor != X86_VENDOR_AMD )
-            break;
-        if ( unlikely(is_cpufreq_controller(currd)) )
-            goto normal;
-        *val = 0;
-        return X86EMUL_OKAY;
-
-    case MSR_IA32_UCODE_REV:
-        BUILD_BUG_ON(MSR_IA32_UCODE_REV != MSR_AMD_PATCHLEVEL);
-        if ( boot_cpu_data.x86_vendor == X86_VENDOR_INTEL )
-        {
-            if ( wrmsr_safe(MSR_IA32_UCODE_REV, 0) )
-                break;
-            /* As documented in the SDM: Do a CPUID 1 here */
-            cpuid_eax(1);
-        }
-        goto normal;
-
-    case MSR_IA32_MISC_ENABLE:
-        if ( rdmsr_safe(reg, *val) )
-            break;
-        *val = guest_misc_enable(*val);
-        return X86EMUL_OKAY;
-
-    case MSR_AMD64_DR0_ADDRESS_MASK:
-        if ( !boot_cpu_has(X86_FEATURE_DBEXT) )
-            break;
-        *val = curr->arch.pv_vcpu.dr_mask[0];
-        return X86EMUL_OKAY;
-
-    case MSR_AMD64_DR1_ADDRESS_MASK ... MSR_AMD64_DR3_ADDRESS_MASK:
-        if ( !boot_cpu_has(X86_FEATURE_DBEXT) )
-            break;
-        *val = curr->arch.pv_vcpu.dr_mask[reg - MSR_AMD64_DR1_ADDRESS_MASK + 1];
-        return X86EMUL_OKAY;
-
-    case MSR_IA32_PERF_CAPABILITIES:
-        /* No extra capabilities are supported. */
-        *val = 0;
-        return X86EMUL_OKAY;
-
-    case MSR_INTEL_PLATFORM_INFO:
-        if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL ||
-             rdmsr_safe(MSR_INTEL_PLATFORM_INFO, *val) )
-            break;
-        *val = 0;
-        if ( this_cpu(cpuid_faulting_enabled) )
-            *val |= MSR_PLATFORM_INFO_CPUID_FAULTING;
-        return X86EMUL_OKAY;
-
-    case MSR_INTEL_MISC_FEATURES_ENABLES:
-        if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL ||
-             rdmsr_safe(MSR_INTEL_MISC_FEATURES_ENABLES, *val) )
-            break;
-        *val = 0;
-        if ( curr->arch.cpuid_faulting )
-            *val |= MSR_MISC_FEATURES_CPUID_FAULTING;
-        return X86EMUL_OKAY;
-
-    case MSR_P6_PERFCTR(0)...MSR_P6_PERFCTR(7):
-    case MSR_P6_EVNTSEL(0)...MSR_P6_EVNTSEL(3):
-    case MSR_CORE_PERF_FIXED_CTR0...MSR_CORE_PERF_FIXED_CTR2:
-    case MSR_CORE_PERF_FIXED_CTR_CTRL...MSR_CORE_PERF_GLOBAL_OVF_CTRL:
-        if ( boot_cpu_data.x86_vendor == X86_VENDOR_INTEL )
-        {
-            vpmu_msr = true;
-            /* fall through */
-    case MSR_AMD_FAM15H_EVNTSEL0...MSR_AMD_FAM15H_PERFCTR5:
-    case MSR_K7_EVNTSEL0...MSR_K7_PERFCTR3:
-            if ( vpmu_msr || (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) )
-            {
-                if ( vpmu_do_rdmsr(reg, val) )
-                    break;
-                return X86EMUL_OKAY;
-            }
-        }
-        /* fall through */
-    default:
-        if ( rdmsr_hypervisor_regs(reg, val) )
-            return X86EMUL_OKAY;
-
-        rc = vmce_rdmsr(reg, val);
-        if ( rc < 0 )
-            break;
-        if ( rc )
-            return X86EMUL_OKAY;
-        /* fall through */
-    normal:
-        /* Everyone can read the MSR space. */
-        /* gdprintk(XENLOG_WARNING, "Domain attempted RDMSR %08x\n", reg); */
-        if ( rdmsr_safe(reg, *val) )
-            break;
-        return X86EMUL_OKAY;
-    }
-
-    return X86EMUL_UNHANDLEABLE;
-}
-
-#include "x86_64/mmconfig.h"
-
-static int priv_op_write_msr(unsigned int reg, uint64_t val,
-                             struct x86_emulate_ctxt *ctxt)
-{
-    struct vcpu *curr = current;
-    const struct domain *currd = curr->domain;
-    bool vpmu_msr = false;
-
-    switch ( reg )
-    {
-        uint64_t temp;
-        int rc;
-
-    case MSR_FS_BASE:
-        if ( is_pv_32bit_domain(currd) || !is_canonical_address(val) )
-            break;
-        wrfsbase(val);
-        curr->arch.pv_vcpu.fs_base = val;
-        return X86EMUL_OKAY;
-
-    case MSR_GS_BASE:
-        if ( is_pv_32bit_domain(currd) || !is_canonical_address(val) )
-            break;
-        wrgsbase(val);
-        curr->arch.pv_vcpu.gs_base_kernel = val;
-        return X86EMUL_OKAY;
-
-    case MSR_SHADOW_GS_BASE:
-        if ( is_pv_32bit_domain(currd) || !is_canonical_address(val) )
-            break;
-        wrmsrl(MSR_SHADOW_GS_BASE, val);
-        curr->arch.pv_vcpu.gs_base_user = val;
-        return X86EMUL_OKAY;
-
-    case MSR_K7_FID_VID_STATUS:
-    case MSR_K7_FID_VID_CTL:
-    case MSR_K8_PSTATE_LIMIT:
-    case MSR_K8_PSTATE_CTRL:
-    case MSR_K8_PSTATE_STATUS:
-    case MSR_K8_PSTATE0:
-    case MSR_K8_PSTATE1:
-    case MSR_K8_PSTATE2:
-    case MSR_K8_PSTATE3:
-    case MSR_K8_PSTATE4:
-    case MSR_K8_PSTATE5:
-    case MSR_K8_PSTATE6:
-    case MSR_K8_PSTATE7:
-    case MSR_K8_HWCR:
-        if ( boot_cpu_data.x86_vendor != X86_VENDOR_AMD )
-            break;
-        if ( likely(!is_cpufreq_controller(currd)) ||
-             wrmsr_safe(reg, val) == 0 )
-            return X86EMUL_OKAY;
-        break;
-
-    case MSR_AMD64_NB_CFG:
-        if ( boot_cpu_data.x86_vendor != X86_VENDOR_AMD ||
-             boot_cpu_data.x86 < 0x10 || boot_cpu_data.x86 > 0x17 )
-            break;
-        if ( !is_hardware_domain(currd) || !is_pinned_vcpu(curr) )
-            return X86EMUL_OKAY;
-        if ( (rdmsr_safe(MSR_AMD64_NB_CFG, temp) != 0) ||
-             ((val ^ temp) & ~(1ULL << AMD64_NB_CFG_CF8_EXT_ENABLE_BIT)) )
-            goto invalid;
-        if ( wrmsr_safe(MSR_AMD64_NB_CFG, val) == 0 )
-            return X86EMUL_OKAY;
-        break;
-
-    case MSR_FAM10H_MMIO_CONF_BASE:
-        if ( boot_cpu_data.x86_vendor != X86_VENDOR_AMD ||
-             boot_cpu_data.x86 < 0x10 || boot_cpu_data.x86 > 0x17 )
-            break;
-        if ( !is_hardware_domain(currd) || !is_pinned_vcpu(curr) )
-            return X86EMUL_OKAY;
-        if ( rdmsr_safe(MSR_FAM10H_MMIO_CONF_BASE, temp) != 0 )
-            break;
-        if ( (pci_probe & PCI_PROBE_MASK) == PCI_PROBE_MMCONF ?
-             temp != val :
-             ((temp ^ val) &
-              ~(FAM10H_MMIO_CONF_ENABLE |
-                (FAM10H_MMIO_CONF_BUSRANGE_MASK <<
-                 FAM10H_MMIO_CONF_BUSRANGE_SHIFT) |
-                ((u64)FAM10H_MMIO_CONF_BASE_MASK <<
-                 FAM10H_MMIO_CONF_BASE_SHIFT))) )
-            goto invalid;
-        if ( wrmsr_safe(MSR_FAM10H_MMIO_CONF_BASE, val) == 0 )
-            return X86EMUL_OKAY;
-        break;
-
-    case MSR_IA32_UCODE_REV:
-        if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL )
-            break;
-        if ( !is_hardware_domain(currd) || !is_pinned_vcpu(curr) )
-            return X86EMUL_OKAY;
-        if ( rdmsr_safe(reg, temp) )
-            break;
-        if ( val )
-            goto invalid;
-        return X86EMUL_OKAY;
-
-    case MSR_IA32_MISC_ENABLE:
-        if ( rdmsr_safe(reg, temp) )
-            break;
-        if ( val != guest_misc_enable(temp) )
-            goto invalid;
-        return X86EMUL_OKAY;
-
-    case MSR_IA32_MPERF:
-    case MSR_IA32_APERF:
-        if ( (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL) &&
-             (boot_cpu_data.x86_vendor != X86_VENDOR_AMD) )
-            break;
-        if ( likely(!is_cpufreq_controller(currd)) ||
-             wrmsr_safe(reg, val) == 0 )
-            return X86EMUL_OKAY;
-        break;
-
-    case MSR_IA32_PERF_CTL:
-        if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL )
-            break;
-        if ( likely(!is_cpufreq_controller(currd)) ||
-             wrmsr_safe(reg, val) == 0 )
-            return X86EMUL_OKAY;
-        break;
-
-    case MSR_IA32_THERM_CONTROL:
-    case MSR_IA32_ENERGY_PERF_BIAS:
-        if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL )
-            break;
-        if ( !is_hardware_domain(currd) || !is_pinned_vcpu(curr) ||
-             wrmsr_safe(reg, val) == 0 )
-            return X86EMUL_OKAY;
-        break;
-
-    case MSR_AMD64_DR0_ADDRESS_MASK:
-        if ( !boot_cpu_has(X86_FEATURE_DBEXT) || (val >> 32) )
-            break;
-        curr->arch.pv_vcpu.dr_mask[0] = val;
-        if ( curr->arch.debugreg[7] & DR7_ACTIVE_MASK )
-            wrmsrl(MSR_AMD64_DR0_ADDRESS_MASK, val);
-        return X86EMUL_OKAY;
-
-    case MSR_AMD64_DR1_ADDRESS_MASK ... MSR_AMD64_DR3_ADDRESS_MASK:
-        if ( !boot_cpu_has(X86_FEATURE_DBEXT) || (val >> 32) )
-            break;
-        curr->arch.pv_vcpu.dr_mask[reg - MSR_AMD64_DR1_ADDRESS_MASK + 1] = val;
-        if ( curr->arch.debugreg[7] & DR7_ACTIVE_MASK )
-            wrmsrl(reg, val);
-        return X86EMUL_OKAY;
-
-    case MSR_INTEL_PLATFORM_INFO:
-        if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL ||
-             val || rdmsr_safe(MSR_INTEL_PLATFORM_INFO, val) )
-            break;
-        return X86EMUL_OKAY;
-
-    case MSR_INTEL_MISC_FEATURES_ENABLES:
-        if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL ||
-             (val & ~MSR_MISC_FEATURES_CPUID_FAULTING) ||
-             rdmsr_safe(MSR_INTEL_MISC_FEATURES_ENABLES, temp) )
-            break;
-        if ( (val & MSR_MISC_FEATURES_CPUID_FAULTING) &&
-             !this_cpu(cpuid_faulting_enabled) )
-            break;
-        curr->arch.cpuid_faulting = !!(val & MSR_MISC_FEATURES_CPUID_FAULTING);
-        return X86EMUL_OKAY;
-
-    case MSR_P6_PERFCTR(0)...MSR_P6_PERFCTR(7):
-    case MSR_P6_EVNTSEL(0)...MSR_P6_EVNTSEL(3):
-    case MSR_CORE_PERF_FIXED_CTR0...MSR_CORE_PERF_FIXED_CTR2:
-    case MSR_CORE_PERF_FIXED_CTR_CTRL...MSR_CORE_PERF_GLOBAL_OVF_CTRL:
-        if ( boot_cpu_data.x86_vendor == X86_VENDOR_INTEL )
-        {
-            vpmu_msr = true;
-    case MSR_AMD_FAM15H_EVNTSEL0...MSR_AMD_FAM15H_PERFCTR5:
-    case MSR_K7_EVNTSEL0...MSR_K7_PERFCTR3:
-            if ( vpmu_msr || (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) )
-            {
-                if ( (vpmu_mode & XENPMU_MODE_ALL) &&
-                     !is_hardware_domain(currd) )
-                    return X86EMUL_OKAY;
-
-                if ( vpmu_do_wrmsr(reg, val, 0) )
-                    break;
-                return X86EMUL_OKAY;
-            }
-        }
-        /* fall through */
-    default:
-        if ( wrmsr_hypervisor_regs(reg, val) == 1 )
-            return X86EMUL_OKAY;
-
-        rc = vmce_wrmsr(reg, val);
-        if ( rc < 0 )
-            break;
-        if ( rc )
-            return X86EMUL_OKAY;
-
-        if ( (rdmsr_safe(reg, temp) != 0) || (val != temp) )
-    invalid:
-            gdprintk(XENLOG_WARNING,
-                     "Domain attempted WRMSR %08x from 0x%016"PRIx64" to 0x%016"PRIx64"\n",
-                     reg, temp, val);
-        return X86EMUL_OKAY;
-    }
-
-    return X86EMUL_UNHANDLEABLE;
-}
-
-static int priv_op_wbinvd(struct x86_emulate_ctxt *ctxt)
-{
-    /* Ignore the instruction if unprivileged. */
-    if ( !cache_flush_permitted(current->domain) )
-        /*
-         * Non-physdev domain attempted WBINVD; ignore for now since
-         * newer linux uses this in some start-of-day timing loops.
-         */
-        ;
-    else
-        wbinvd();
-
-    return X86EMUL_OKAY;
-}
-
-int pv_emul_cpuid(uint32_t leaf, uint32_t subleaf,
-                  struct cpuid_leaf *res, struct x86_emulate_ctxt *ctxt)
-{
-    guest_cpuid(current, leaf, subleaf, res);
-
-    return X86EMUL_OKAY;
-}
-
-static int priv_op_validate(const struct x86_emulate_state *state,
-                            struct x86_emulate_ctxt *ctxt)
-{
-    switch ( ctxt->opcode )
-    {
-    case 0x6c ... 0x6f: /* ins / outs */
-    case 0xe4 ... 0xe7: /* in / out (immediate port) */
-    case 0xec ... 0xef: /* in / out (port in %dx) */
-    case X86EMUL_OPC(0x0f, 0x06): /* clts */
-    case X86EMUL_OPC(0x0f, 0x09): /* wbinvd */
-    case X86EMUL_OPC(0x0f, 0x20) ...
-         X86EMUL_OPC(0x0f, 0x23): /* mov to/from cr/dr */
-    case X86EMUL_OPC(0x0f, 0x30): /* wrmsr */
-    case X86EMUL_OPC(0x0f, 0x31): /* rdtsc */
-    case X86EMUL_OPC(0x0f, 0x32): /* rdmsr */
-    case X86EMUL_OPC(0x0f, 0xa2): /* cpuid */
-        return X86EMUL_OKAY;
-
-    case 0xfa: case 0xfb: /* cli / sti */
-        if ( !iopl_ok(current, ctxt->regs) )
-            break;
-        /*
-         * This is just too dangerous to allow, in my opinion. Consider if the
-         * caller then tries to reenable interrupts using POPF: we can't trap
-         * that and we'll end up with hard-to-debug lockups. Fast & loose will
-         * do for us. :-)
-        vcpu_info(current, evtchn_upcall_mask) = (ctxt->opcode == 0xfa);
-         */
-        return X86EMUL_DONE;
-
-    case X86EMUL_OPC(0x0f, 0x01):
-    {
-        unsigned int modrm_rm, modrm_reg;
-
-        if ( x86_insn_modrm(state, &modrm_rm, &modrm_reg) != 3 ||
-             (modrm_rm & 7) != 1 )
-            break;
-        switch ( modrm_reg & 7 )
-        {
-        case 2: /* xsetbv */
-        case 7: /* rdtscp */
-            return X86EMUL_OKAY;
-        }
-        break;
-    }
-    }
-
-    return X86EMUL_UNHANDLEABLE;
-}
-
-static const struct x86_emulate_ops priv_op_ops = {
-    .insn_fetch          = priv_op_insn_fetch,
-    .read                = x86emul_unhandleable_rw,
-    .validate            = priv_op_validate,
-    .read_io             = priv_op_read_io,
-    .write_io            = priv_op_write_io,
-    .rep_ins             = priv_op_rep_ins,
-    .rep_outs            = priv_op_rep_outs,
-    .read_segment        = priv_op_read_segment,
-    .read_cr             = priv_op_read_cr,
-    .write_cr            = priv_op_write_cr,
-    .read_dr             = priv_op_read_dr,
-    .write_dr            = priv_op_write_dr,
-    .read_msr            = priv_op_read_msr,
-    .write_msr           = priv_op_write_msr,
-    .cpuid               = pv_emul_cpuid,
-    .wbinvd              = priv_op_wbinvd,
-};
-
-static int emulate_privileged_op(struct cpu_user_regs *regs)
-{
-    struct vcpu *curr = current;
-    struct domain *currd = curr->domain;
-    struct priv_op_ctxt ctxt = {
-        .ctxt.regs = regs,
-        .ctxt.vendor = currd->arch.cpuid->x86_vendor,
-        .ctxt.lma = !is_pv_32bit_domain(currd),
-    };
-    int rc;
-    unsigned int eflags, ar;
-
-    if ( !read_descriptor(regs->cs, curr, &ctxt.cs.base, &ctxt.cs.limit,
-                          &ar, 1) ||
-         !(ar & _SEGMENT_S) ||
-         !(ar & _SEGMENT_P) ||
-         !(ar & _SEGMENT_CODE) )
-        return 0;
-
-    /* Mirror virtualized state into EFLAGS. */
-    ASSERT(regs->eflags & X86_EFLAGS_IF);
-    if ( vcpu_info(curr, evtchn_upcall_mask) )
-        regs->eflags &= ~X86_EFLAGS_IF;
-    else
-        regs->eflags |= X86_EFLAGS_IF;
-    ASSERT(!(regs->eflags & X86_EFLAGS_IOPL));
-    regs->eflags |= curr->arch.pv_vcpu.iopl;
-    eflags = regs->eflags;
-
-    ctxt.ctxt.addr_size = ar & _SEGMENT_L ? 64 : ar & _SEGMENT_DB ? 32 : 16;
-    /* Leave zero in ctxt.ctxt.sp_size, as it's not needed. */
-    rc = x86_emulate(&ctxt.ctxt, &priv_op_ops);
-
-    if ( ctxt.io_emul_stub )
-        unmap_domain_page(ctxt.io_emul_stub);
-
-    /*
-     * Un-mirror virtualized state from EFLAGS.
-     * Nothing we allow to be emulated can change anything other than the
-     * arithmetic bits, and the resume flag.
-     */
-    ASSERT(!((regs->eflags ^ eflags) &
-             ~(X86_EFLAGS_RF | X86_EFLAGS_ARITH_MASK)));
-    regs->eflags |= X86_EFLAGS_IF;
-    regs->eflags &= ~X86_EFLAGS_IOPL;
-
-    switch ( rc )
-    {
-    case X86EMUL_OKAY:
-        if ( ctxt.tsc & TSC_BASE )
-        {
-            if ( ctxt.tsc & TSC_AUX )
-                pv_soft_rdtsc(curr, regs, 1);
-            else if ( currd->arch.vtsc )
-                pv_soft_rdtsc(curr, regs, 0);
-            else
-                msr_split(regs, rdtsc());
-        }
-
-        if ( ctxt.ctxt.retire.singlestep )
-            ctxt.bpmatch |= DR_STEP;
-        if ( ctxt.bpmatch )
-        {
-            curr->arch.debugreg[6] |= ctxt.bpmatch | DR_STATUS_RESERVED_ONE;
-            if ( !(curr->arch.pv_vcpu.trap_bounce.flags & TBF_EXCEPTION) )
-                pv_inject_trap(TRAP_debug, regs);
-        }
-        /* fall through */
-    case X86EMUL_RETRY:
-        return EXCRET_fault_fixed;
-
-    case X86EMUL_EXCEPTION:
-        pv_inject_event(&ctxt.ctxt.event);
-        return EXCRET_fault_fixed;
-    }
-
-    return 0;
-}
-
-static inline int check_stack_limit(unsigned int ar, unsigned int limit,
-                                    unsigned int esp, unsigned int decr)
-{
-    return (((esp - decr) < (esp - 1)) &&
-            (!(ar & _SEGMENT_EC) ? (esp - 1) <= limit : (esp - decr) > limit));
-}
-
-struct gate_op_ctxt {
-    struct x86_emulate_ctxt ctxt;
-    struct {
-        unsigned long base, limit;
-    } cs;
-    bool insn_fetch;
-};
-
-static int gate_op_read(
-    enum x86_segment seg,
-    unsigned long offset,
-    void *p_data,
-    unsigned int bytes,
-    struct x86_emulate_ctxt *ctxt)
-{
-    const struct gate_op_ctxt *goc =
-        container_of(ctxt, struct gate_op_ctxt, ctxt);
-    unsigned int rc = bytes, sel = 0;
-    unsigned long addr = offset, limit = 0;
-
-    switch ( seg )
-    {
-    case x86_seg_cs:
-        addr += goc->cs.base;
-        limit = goc->cs.limit;
-        break;
-    case x86_seg_ds:
-        sel = read_sreg(ds);
-        break;
-    case x86_seg_es:
-        sel = read_sreg(es);
-        break;
-    case x86_seg_fs:
-        sel = read_sreg(fs);
-        break;
-    case x86_seg_gs:
-        sel = read_sreg(gs);
-        break;
-    case x86_seg_ss:
-        sel = ctxt->regs->ss;
-        break;
-    default:
-        return X86EMUL_UNHANDLEABLE;
-    }
-    if ( sel )
-    {
-        unsigned int ar;
-
-        ASSERT(!goc->insn_fetch);
-        if ( !read_descriptor(sel, current, &addr, &limit, &ar, 0) ||
-             !(ar & _SEGMENT_S) ||
-             !(ar & _SEGMENT_P) ||
-             ((ar & _SEGMENT_CODE) && !(ar & _SEGMENT_WR)) )
-            return X86EMUL_UNHANDLEABLE;
-        addr += offset;
-    }
-    else if ( seg != x86_seg_cs )
-        return X86EMUL_UNHANDLEABLE;
-
-    /* We don't mean to emulate any branches. */
-    if ( limit < bytes - 1 || offset > limit - bytes + 1 )
-        return X86EMUL_UNHANDLEABLE;
-
-    addr = (uint32_t)addr;
-
-    if ( (rc = __copy_from_user(p_data, (void *)addr, bytes)) )
-    {
-        /*
-         * TODO: This should report PFEC_insn_fetch when goc->insn_fetch &&
-         * cpu_has_nx, but we'd then need a "fetch" variant of
-         * __copy_from_user() respecting NX, SMEP, and protection keys.
-         */
-        x86_emul_pagefault(0, addr + bytes - rc, ctxt);
-        return X86EMUL_EXCEPTION;
-    }
-
-    return X86EMUL_OKAY;
-}
-
-static void emulate_gate_op(struct cpu_user_regs *regs)
-{
-    struct vcpu *v = current;
-    unsigned int sel, ar, dpl, nparm, insn_len;
-    struct gate_op_ctxt ctxt = { .ctxt.regs = regs, .insn_fetch = true };
-    struct x86_emulate_state *state;
-    unsigned long off, base, limit;
-    uint16_t opnd_sel = 0;
-    int jump = -1, rc = X86EMUL_OKAY;
-
-    /* Check whether this fault is due to the use of a call gate. */
-    if ( !read_gate_descriptor(regs->error_code, v, &sel, &off, &ar) ||
-         (((ar >> 13) & 3) < (regs->cs & 3)) ||
-         ((ar & _SEGMENT_TYPE) != 0xc00) )
-    {
-        pv_inject_trap(TRAP_gp_fault, regs);
-        return;
-    }
-    if ( !(ar & _SEGMENT_P) )
-    {
-        pv_inject_trap(TRAP_no_segment, regs);
-        return;
-    }
-    dpl = (ar >> 13) & 3;
-    nparm = ar & 0x1f;
-
-    /*
-     * Decode instruction (and perhaps operand) to determine RPL,
-     * whether this is a jump or a call, and the call return offset.
-     */
-    if ( !read_descriptor(regs->cs, v, &ctxt.cs.base, &ctxt.cs.limit,
-                          &ar, 0) ||
-         !(ar & _SEGMENT_S) ||
-         !(ar & _SEGMENT_P) ||
-         !(ar & _SEGMENT_CODE) )
-    {
-        pv_inject_trap(TRAP_gp_fault, regs);
-        return;
-    }
-
-    ctxt.ctxt.addr_size = ar & _SEGMENT_DB ? 32 : 16;
-    /* Leave zero in ctxt.ctxt.sp_size, as it's not needed for decoding. */
-    state = x86_decode_insn(&ctxt.ctxt, gate_op_read);
-    ctxt.insn_fetch = false;
-    if ( IS_ERR_OR_NULL(state) )
-    {
-        if ( PTR_ERR(state) == -X86EMUL_EXCEPTION )
-            pv_inject_event(&ctxt.ctxt.event);
-        else
-            pv_inject_trap(TRAP_gp_fault, regs);
-        return;
-    }
-
-    switch ( ctxt.ctxt.opcode )
-    {
-        unsigned int modrm_345;
-
-    case 0xea:
-        ++jump;
-        /* fall through */
-    case 0x9a:
-        ++jump;
-        opnd_sel = x86_insn_immediate(state, 1);
-        break;
-    case 0xff:
-        if ( x86_insn_modrm(state, NULL, &modrm_345) >= 3 )
-            break;
-        switch ( modrm_345 & 7 )
-        {
-            enum x86_segment seg;
-
-        case 5:
-            ++jump;
-            /* fall through */
-        case 3:
-            ++jump;
-            base = x86_insn_operand_ea(state, &seg);
-            rc = gate_op_read(seg,
-                              base + (x86_insn_opsize(state) >> 3),
-                              &opnd_sel, sizeof(opnd_sel), &ctxt.ctxt);
-            break;
-        }
-        break;
-    }
-
-    insn_len = x86_insn_length(state, &ctxt.ctxt);
-    x86_emulate_free_state(state);
-
-    if ( rc == X86EMUL_EXCEPTION )
-    {
-        pv_inject_event(&ctxt.ctxt.event);
-        return;
-    }
-
-    if ( rc != X86EMUL_OKAY ||
-         jump < 0 ||
-         (opnd_sel & ~3) != regs->error_code ||
-         dpl < (opnd_sel & 3) )
-    {
-        pv_inject_trap(TRAP_gp_fault, regs);
-        return;
-    }
-
-    if ( !read_descriptor(sel, v, &base, &limit, &ar, 0) ||
-         !(ar & _SEGMENT_S) ||
-         !(ar & _SEGMENT_CODE) ||
-         (!jump || (ar & _SEGMENT_EC) ?
-          ((ar >> 13) & 3) > (regs->cs & 3) :
-          ((ar >> 13) & 3) != (regs->cs & 3)) )
-    {
-        pv_inject_hw_exception(TRAP_gp_fault, sel);
-        return;
-    }
-    if ( !(ar & _SEGMENT_P) )
-    {
-        pv_inject_hw_exception(TRAP_no_segment, sel);
-        return;
-    }
-    if ( off > limit )
-    {
-        pv_inject_hw_exception(TRAP_gp_fault, 0);
-        return;
-    }
-
-    if ( !jump )
-    {
-        unsigned int ss, esp, *stkp;
-        int rc;
-#define push(item) do \
-        { \
-            --stkp; \
-            esp -= 4; \
-            rc = __put_user(item, stkp); \
-            if ( rc ) \
-            { \
-                pv_inject_page_fault(PFEC_write_access, \
-                                     (unsigned long)(stkp + 1) - rc); \
-                return; \
-            } \
-        } while ( 0 )
-
-        if ( ((ar >> 13) & 3) < (regs->cs & 3) )
-        {
-            sel |= (ar >> 13) & 3;
-            /* Inner stack known only for kernel ring. */
-            if ( (sel & 3) != GUEST_KERNEL_RPL(v->domain) )
-            {
-                pv_inject_trap(TRAP_gp_fault, regs);
-                return;
-            }
-            esp = v->arch.pv_vcpu.kernel_sp;
-            ss = v->arch.pv_vcpu.kernel_ss;
-            if ( (ss & 3) != (sel & 3) ||
-                 !read_descriptor(ss, v, &base, &limit, &ar, 0) ||
-                 ((ar >> 13) & 3) != (sel & 3) ||
-                 !(ar & _SEGMENT_S) ||
-                 (ar & _SEGMENT_CODE) ||
-                 !(ar & _SEGMENT_WR) )
-            {
-                pv_inject_hw_exception(TRAP_invalid_tss, ss & ~3);
-                return;
-            }
-            if ( !(ar & _SEGMENT_P) ||
-                 !check_stack_limit(ar, limit, esp, (4 + nparm) * 4) )
-            {
-                pv_inject_hw_exception(TRAP_stack_error, ss & ~3);
-                return;
-            }
-            stkp = (unsigned int *)(unsigned long)((unsigned int)base + esp);
-            if ( !compat_access_ok(stkp - 4 - nparm, (4 + nparm) * 4) )
-            {
-                pv_inject_trap(TRAP_gp_fault, regs);
-                return;
-            }
-            push(regs->ss);
-            push(regs->rsp);
-            if ( nparm )
-            {
-                const unsigned int *ustkp;
-
-                if ( !read_descriptor(regs->ss, v, &base, &limit, &ar, 0) ||
-                     ((ar >> 13) & 3) != (regs->cs & 3) ||
-                     !(ar & _SEGMENT_S) ||
-                     (ar & _SEGMENT_CODE) ||
-                     !(ar & _SEGMENT_WR) ||
-                     !check_stack_limit(ar, limit, esp + nparm * 4, nparm * 4) )
-                    return pv_inject_trap(TRAP_gp_fault, regs);
-                ustkp = (unsigned int *)(unsigned long)
-                        ((unsigned int)base + regs->esp + nparm * 4);
-                if ( !compat_access_ok(ustkp - nparm, nparm * 4) )
-                {
-                    pv_inject_trap(TRAP_gp_fault, regs);
-                    return;
-                }
-                do
-                {
-                    unsigned int parm;
-
-                    --ustkp;
-                    rc = __get_user(parm, ustkp);
-                    if ( rc )
-                    {
-                        pv_inject_page_fault(0, (unsigned long)(ustkp + 1) - rc);
-                        return;
-                    }
-                    push(parm);
-                } while ( --nparm );
-            }
-        }
-        else
-        {
-            sel |= (regs->cs & 3);
-            esp = regs->rsp;
-            ss = regs->ss;
-            if ( !read_descriptor(ss, v, &base, &limit, &ar, 0) ||
-                 ((ar >> 13) & 3) != (sel & 3) )
-            {
-                pv_inject_trap(TRAP_gp_fault, regs);
-                return;
-            }
-            if ( !check_stack_limit(ar, limit, esp, 2 * 4) )
-            {
-                pv_inject_hw_exception(TRAP_stack_error, 0);
-                return;
-            }
-            stkp = (unsigned int *)(unsigned long)((unsigned int)base + esp);
-            if ( !compat_access_ok(stkp - 2, 2 * 4) )
-            {
-                pv_inject_trap(TRAP_gp_fault, regs);
-                return;
-            }
-        }
-        push(regs->cs);
-        push(regs->rip + insn_len);
-#undef push
-        regs->rsp = esp;
-        regs->ss = ss;
-    }
-    else
-        sel |= (regs->cs & 3);
-
-    regs->cs = sel;
-    instruction_done(regs, off);
-}
-
 void do_general_protection(struct cpu_user_regs *regs)
 {
     struct vcpu *v = current;
diff --git a/xen/include/asm-x86/pv/traps.h b/xen/include/asm-x86/pv/traps.h
new file mode 100644
index 0000000000..f41287add7
--- /dev/null
+++ b/xen/include/asm-x86/pv/traps.h
@@ -0,0 +1,54 @@
+/*
+ * pv/traps.h
+ *
+ * PV guest traps interface definitions
+ *
+ * Copyright (C) 2017 Wei Liu <wei.liu2@citrix.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms and conditions of the GNU General Public
+ * License, version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef __X86_PV_TRAPS_H__
+#define __X86_PV_TRAPS_H__
+
+#ifdef CONFIG_PV
+
+#include <public/xen.h>
+
+int emulate_privileged_op(struct cpu_user_regs *regs);
+void emulate_gate_op(struct cpu_user_regs *regs);
+int emulate_forced_invalid_op(struct cpu_user_regs *regs);
+int emulate_invalid_rdtscp(struct cpu_user_regs *regs);
+
+#else  /* !CONFIG_PV */
+
+#include <xen/errno.h>
+
+static inline int emulate_privileged_op(struct cpu_user_regs *regs) { return -EOPNOTSUPP; }
+static inline void emulate_gate_op(struct cpu_user_regs *regs) {}
+static inline int emulate_forced_invalid_op(struct cpu_user_regs *regs) { return -EOPNOTSUPP; }
+static inline int emulate_invalid_rdtscp(struct cpu_user_regs *regs) { return -EOPNOTSUPP; }
+
+#endif	/* CONFIG_PV */
+
+#endif	/* __X86_PV_TRAPS_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 30+ messages in thread
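
The stub variants in the new header are what let common trap-handling code
stay CONFIG_PV-agnostic. A minimal sketch of a call site, assuming an
is_pv_vcpu() guard as used elsewhere in Xen (the caller below is
illustrative, not taken from the series):

    #include <asm/pv/traps.h>

    /* Hypothetical caller in common code; builds with CONFIG_PV=y or =n. */
    void handle_gp(struct cpu_user_regs *regs)
    {
        /*
         * With CONFIG_PV=n, is_pv_vcpu() is constant-false, so the inline
         * stub of emulate_privileged_op() is never reached and the
         * compiler can discard the whole branch.
         */
        if ( is_pv_vcpu(current) && emulate_privileged_op(regs) )
            return;

        /* ... remaining #GP handling ... */
    }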

* [PATCH v2 05/18] x86/pv: clean up emulate_ops.c
  2017-05-05 14:48 [PATCH v2 00/18] Refactor x86 trap handling code Wei Liu
                   ` (3 preceding siblings ...)
  2017-05-05 14:48 ` [PATCH v2 04/18] x86/traps: move all PV emulation code to pv/emulate_ops.h Wei Liu
@ 2017-05-05 14:48 ` Wei Liu
  2017-05-11 11:22   ` Andrew Cooper
  2017-05-05 14:48 ` [PATCH v2 06/18] x86/traps: move PV hypercall handlers to pv/traps.c Wei Liu
                   ` (13 subsequent siblings)
  18 siblings, 1 reply; 30+ messages in thread
From: Wei Liu @ 2017-05-05 14:48 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich

Replace bool_t with bool.

Change check_stack_limit to return bool.

Fix some coding style issues.

Undef TOGGLE_MODE when it is no longer needed.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
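For reference, the converted check_stack_limit() reads more naturally with
the ternary unfolded; an equivalent restatement of the same expression, with
the _SEGMENT_EC (expand-down segment) case spelled out:

    static inline bool check_stack_limit(unsigned int ar, unsigned int limit,
                                         unsigned int esp, unsigned int decr)
    {
        /* Reject decr == 0 and pushes that would wrap through zero. */
        if ( (esp - decr) >= (esp - 1) )
            return false;

        if ( !(ar & _SEGMENT_EC) )
            return (esp - 1) <= limit;  /* expand-up: valid range is [0, limit] */

        return (esp - decr) > limit;    /* expand-down: valid range is (limit, max] */
    }
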
 xen/arch/x86/pv/emulate_ops.c | 28 +++++++++++++++-------------
 1 file changed, 15 insertions(+), 13 deletions(-)

diff --git a/xen/arch/x86/pv/emulate_ops.c b/xen/arch/x86/pv/emulate_ops.c
index 5f0965e05b..97c8d14859 100644
--- a/xen/arch/x86/pv/emulate_ops.c
+++ b/xen/arch/x86/pv/emulate_ops.c
@@ -138,7 +138,7 @@ static int read_descriptor(unsigned int sel,
                            unsigned long *base,
                            unsigned long *limit,
                            unsigned int *ar,
-                           bool_t insn_fetch)
+                           bool insn_fetch)
 {
     struct desc_struct desc;
 
@@ -274,7 +274,7 @@ static int priv_op_read_segment(enum x86_segment seg,
 }
 
 /* Perform IOPL check between the vcpu's shadowed IOPL, and the assumed cpl. */
-static bool_t iopl_ok(const struct vcpu *v, const struct cpu_user_regs *regs)
+static bool iopl_ok(const struct vcpu *v, const struct cpu_user_regs *regs)
 {
     unsigned int cpl = guest_kernel_mode(v, regs) ?
         (VM_ASSIST(v->domain, architectural_iopl) ? 0 : 1) : 3;
@@ -318,9 +318,8 @@ static io_emul_stub_t *io_emul_stub_setup(struct priv_op_ctxt *ctxt, u8 opcode,
 }
 
 /* Has the guest requested sufficient permission for this I/O access? */
-static int guest_io_okay(
-    unsigned int port, unsigned int bytes,
-    struct vcpu *v, struct cpu_user_regs *regs)
+static int guest_io_okay(unsigned int port, unsigned int bytes,
+                         struct vcpu *v, struct cpu_user_regs *regs)
 {
     /* If in user mode, switch to kernel mode just to read I/O bitmap. */
     int user_mode = !(v->arch.flags & TF_kernel_mode);
@@ -353,11 +352,13 @@ static int guest_io_okay(
             return 1;
     }
 
+#undef TOGGLE_MODE
     return 0;
 }
 
 static unsigned int check_guest_io_breakpoint(struct vcpu *v,
-    unsigned int port, unsigned int len)
+                                              unsigned int port,
+                                              unsigned int len)
 {
     unsigned int width, i, match = 0;
     unsigned long start;
@@ -392,8 +393,8 @@ static unsigned int check_guest_io_breakpoint(struct vcpu *v,
 }
 
 /* Has the administrator granted sufficient permission for this I/O access? */
-static bool_t admin_io_okay(unsigned int port, unsigned int bytes,
-                            const struct domain *d)
+static bool admin_io_okay(unsigned int port, unsigned int bytes,
+                          const struct domain *d)
 {
     /*
      * Port 0xcf8 (CONFIG_ADDRESS) is only visible for DWORD accesses.
@@ -409,8 +410,8 @@ static bool_t admin_io_okay(unsigned int port, unsigned int bytes,
     return ioports_access_permitted(d, port, port + bytes - 1);
 }
 
-static bool_t pci_cfg_ok(struct domain *currd, unsigned int start,
-                         unsigned int size, uint32_t *write)
+static bool pci_cfg_ok(struct domain *currd, unsigned int start,
+                       unsigned int size, uint32_t *write)
 {
     uint32_t machine_bdf;
 
@@ -518,7 +519,8 @@ void guest_io_write(unsigned int port, unsigned int bytes, uint32_t data,
 {
     if ( admin_io_okay(port, bytes, currd) )
     {
-        switch ( bytes ) {
+        switch ( bytes )
+        {
         case 1:
             outb((uint8_t)data, port);
             if ( pv_post_outb_hook )
@@ -1587,8 +1589,8 @@ static int read_gate_descriptor(unsigned int gate_sel,
     return 1;
 }
 
-static inline int check_stack_limit(unsigned int ar, unsigned int limit,
-                                    unsigned int esp, unsigned int decr)
+static inline bool check_stack_limit(unsigned int ar, unsigned int limit,
+                                     unsigned int esp, unsigned int decr)
 {
     return (((esp - decr) < (esp - 1)) &&
             (!(ar & _SEGMENT_EC) ? (esp - 1) <= limit : (esp - decr) > limit));
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v2 06/18] x86/traps: move PV hypercall handlers to pv/traps.c
  2017-05-05 14:48 [PATCH v2 00/18] Refactor x86 trap handling code Wei Liu
                   ` (4 preceding siblings ...)
  2017-05-05 14:48 ` [PATCH v2 05/18] x86/pv: clean up emulate_ops.c Wei Liu
@ 2017-05-05 14:48 ` Wei Liu
  2017-05-05 14:48 ` [PATCH v2 07/18] x86/traps: move pv_inject_event " Wei Liu
                   ` (12 subsequent siblings)
  18 siblings, 0 replies; 30+ messages in thread
From: Wei Liu @ 2017-05-05 14:48 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich

The following handlers are moved:
1. do_set_trap_table
2. do_set_debugreg
3. do_get_debugreg
4. do_fpu_taskswitch

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
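Of the four, do_set_trap_table() is the interesting one to re-review after
the move, since it carries Xen's hypercall preemption pattern. In outline
(shape as in the patch, entry handling elided):

    for ( ; ; )
    {
        /* ... copy one trap_info entry from the guest and install it ... */

        guest_handle_add_offset(traps, 1);  /* step past the consumed entry */

        if ( hypercall_preempt_check() )
        {
            /*
             * Arrange for the guest to re-issue the hypercall with the
             * advanced handle instead of monopolising the CPU.
             */
            rc = hypercall_create_continuation(
                __HYPERVISOR_set_trap_table, "h", traps);
            break;
        }
    }
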
 xen/arch/x86/pv/traps.c | 97 +++++++++++++++++++++++++++++++++++++++++++++++++
 xen/arch/x86/traps.c    | 94 -----------------------------------------------
 2 files changed, 97 insertions(+), 94 deletions(-)

diff --git a/xen/arch/x86/pv/traps.c b/xen/arch/x86/pv/traps.c
index 51125a8d86..350e7a1da4 100644
--- a/xen/arch/x86/pv/traps.c
+++ b/xen/arch/x86/pv/traps.c
@@ -19,9 +19,13 @@
  * Copyright (c) 2017 Citrix Systems Ltd.
  */
 
+#include <xen/event.h>
+#include <xen/guest_access.h>
 #include <xen/hypercall.h>
+#include <xen/sched.h>
 
 #include <asm/apic.h>
+#include <asm/debugreg.h>
 
 void do_entry_int82(struct cpu_user_regs *regs)
 {
@@ -31,6 +35,99 @@ void do_entry_int82(struct cpu_user_regs *regs)
     pv_hypercall(regs);
 }
 
+long do_fpu_taskswitch(int set)
+{
+    struct vcpu *v = current;
+
+    if ( set )
+    {
+        v->arch.pv_vcpu.ctrlreg[0] |= X86_CR0_TS;
+        stts();
+    }
+    else
+    {
+        v->arch.pv_vcpu.ctrlreg[0] &= ~X86_CR0_TS;
+        if ( v->fpu_dirtied )
+            clts();
+    }
+
+    return 0;
+}
+
+long do_set_trap_table(XEN_GUEST_HANDLE_PARAM(const_trap_info_t) traps)
+{
+    struct trap_info cur;
+    struct vcpu *curr = current;
+    struct trap_info *dst = curr->arch.pv_vcpu.trap_ctxt;
+    long rc = 0;
+
+    /* If no table is presented then clear the entire virtual IDT. */
+    if ( guest_handle_is_null(traps) )
+    {
+        memset(dst, 0, NR_VECTORS * sizeof(*dst));
+        init_int80_direct_trap(curr);
+        return 0;
+    }
+
+    for ( ; ; )
+    {
+        if ( copy_from_guest(&cur, traps, 1) )
+        {
+            rc = -EFAULT;
+            break;
+        }
+
+        if ( cur.address == 0 )
+            break;
+
+        if ( !is_canonical_address(cur.address) )
+            return -EINVAL;
+
+        fixup_guest_code_selector(curr->domain, cur.cs);
+
+        memcpy(&dst[cur.vector], &cur, sizeof(cur));
+
+        if ( cur.vector == 0x80 )
+            init_int80_direct_trap(curr);
+
+        guest_handle_add_offset(traps, 1);
+
+        if ( hypercall_preempt_check() )
+        {
+            rc = hypercall_create_continuation(
+                __HYPERVISOR_set_trap_table, "h", traps);
+            break;
+        }
+    }
+
+    return rc;
+}
+
+long do_set_debugreg(int reg, unsigned long value)
+{
+    return set_debugreg(current, reg, value);
+}
+
+unsigned long do_get_debugreg(int reg)
+{
+    struct vcpu *curr = current;
+
+    switch ( reg )
+    {
+    case 0 ... 3:
+    case 6:
+        return curr->arch.debugreg[reg];
+    case 7:
+        return (curr->arch.debugreg[7] |
+                curr->arch.debugreg[5]);
+    case 4 ... 5:
+        return ((curr->arch.pv_vcpu.ctrlreg[4] & X86_CR4_DE) ?
+                curr->arch.debugreg[reg + 2] : 0);
+    }
+
+    return -EINVAL;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index a076de2be4..9e0b321f08 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -1474,25 +1474,6 @@ void __init do_early_page_fault(struct cpu_user_regs *regs)
     }
 }
 
-long do_fpu_taskswitch(int set)
-{
-    struct vcpu *v = current;
-
-    if ( set )
-    {
-        v->arch.pv_vcpu.ctrlreg[0] |= X86_CR0_TS;
-        stts();
-    }
-    else
-    {
-        v->arch.pv_vcpu.ctrlreg[0] &= ~X86_CR0_TS;
-        if ( v->fpu_dirtied )
-            clts();
-    }
-
-    return 0;
-}
-
 void do_general_protection(struct cpu_user_regs *regs)
 {
     struct vcpu *v = current;
@@ -2111,56 +2092,6 @@ int send_guest_trap(struct domain *d, uint16_t vcpuid, unsigned int trap_nr)
     return -EIO;
 }
 
-
-long do_set_trap_table(XEN_GUEST_HANDLE_PARAM(const_trap_info_t) traps)
-{
-    struct trap_info cur;
-    struct vcpu *curr = current;
-    struct trap_info *dst = curr->arch.pv_vcpu.trap_ctxt;
-    long rc = 0;
-
-    /* If no table is presented then clear the entire virtual IDT. */
-    if ( guest_handle_is_null(traps) )
-    {
-        memset(dst, 0, NR_VECTORS * sizeof(*dst));
-        init_int80_direct_trap(curr);
-        return 0;
-    }
-
-    for ( ; ; )
-    {
-        if ( copy_from_guest(&cur, traps, 1) )
-        {
-            rc = -EFAULT;
-            break;
-        }
-
-        if ( cur.address == 0 )
-            break;
-
-        if ( !is_canonical_address(cur.address) )
-            return -EINVAL;
-
-        fixup_guest_code_selector(curr->domain, cur.cs);
-
-        memcpy(&dst[cur.vector], &cur, sizeof(cur));
-
-        if ( cur.vector == 0x80 )
-            init_int80_direct_trap(curr);
-
-        guest_handle_add_offset(traps, 1);
-
-        if ( hypercall_preempt_check() )
-        {
-            rc = hypercall_create_continuation(
-                __HYPERVISOR_set_trap_table, "h", traps);
-            break;
-        }
-    }
-
-    return rc;
-}
-
 void activate_debugregs(const struct vcpu *curr)
 {
     ASSERT(curr == current);
@@ -2284,31 +2215,6 @@ long set_debugreg(struct vcpu *v, unsigned int reg, unsigned long value)
     return 0;
 }
 
-long do_set_debugreg(int reg, unsigned long value)
-{
-    return set_debugreg(current, reg, value);
-}
-
-unsigned long do_get_debugreg(int reg)
-{
-    struct vcpu *curr = current;
-
-    switch ( reg )
-    {
-    case 0 ... 3:
-    case 6:
-        return curr->arch.debugreg[reg];
-    case 7:
-        return (curr->arch.debugreg[7] |
-                curr->arch.debugreg[5]);
-    case 4 ... 5:
-        return ((curr->arch.pv_vcpu.ctrlreg[4] & X86_CR4_DE) ?
-                curr->arch.debugreg[reg + 2] : 0);
-    }
-
-    return -EINVAL;
-}
-
 void asm_domain_crash_synchronous(unsigned long addr)
 {
     /*
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v2 07/18] x86/traps: move pv_inject_event to pv/traps.c
  2017-05-05 14:48 [PATCH v2 00/18] Refactor x86 trap handling code Wei Liu
                   ` (5 preceding siblings ...)
  2017-05-05 14:48 ` [PATCH v2 06/18] x86/traps: move PV hypercall handlers to pv/traps.c Wei Liu
@ 2017-05-05 14:48 ` Wei Liu
  2017-05-05 14:48 ` [PATCH v2 08/18] x86/traps: move set_guest_{machinecheck, nmi}_trapbounce Wei Liu
                   ` (11 subsequent siblings)
  18 siblings, 0 replies; 30+ messages in thread
From: Wei Liu @ 2017-05-05 14:48 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
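Call sites reach pv_inject_event() through wrappers such as pv_inject_trap()
and pv_inject_hw_exception() seen earlier in the series. Roughly, injecting
a #GP with a selector error code amounts to the following sketch (field
names assumed from struct x86_event as used in the function body):

    const struct x86_event event = {
        .vector = TRAP_gp_fault,
        .type = X86_EVENTTYPE_HW_EXCEPTION,
        .error_code = sel & ~3,  /* 'sel' is the faulting selector here */
    };

    pv_inject_event(&event);
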
 xen/arch/x86/pv/traps.c | 63 +++++++++++++++++++++++++++++++++++++++++++++++++
 xen/arch/x86/traps.c    | 59 ---------------------------------------------
 2 files changed, 63 insertions(+), 59 deletions(-)

diff --git a/xen/arch/x86/pv/traps.c b/xen/arch/x86/pv/traps.c
index 350e7a1da4..9de5798e58 100644
--- a/xen/arch/x86/pv/traps.c
+++ b/xen/arch/x86/pv/traps.c
@@ -23,9 +23,13 @@
 #include <xen/guest_access.h>
 #include <xen/hypercall.h>
 #include <xen/sched.h>
+#include <xen/lib.h>
+#include <xen/trace.h>
 
 #include <asm/apic.h>
 #include <asm/debugreg.h>
+#include <asm/shared.h>
+#include <asm/traps.h>
 
 void do_entry_int82(struct cpu_user_regs *regs)
 {
@@ -128,6 +132,65 @@ unsigned long do_get_debugreg(int reg)
     return -EINVAL;
 }
 
+void pv_inject_event(const struct x86_event *event)
+{
+    struct vcpu *v = current;
+    struct cpu_user_regs *regs = guest_cpu_user_regs();
+    struct trap_bounce *tb;
+    const struct trap_info *ti;
+    const uint8_t vector = event->vector;
+    const bool use_error_code =
+        ((vector < 32) && (TRAP_HAVE_EC & (1u << vector)));
+    unsigned int error_code = event->error_code;
+
+    ASSERT(vector == event->vector); /* Confirm no truncation. */
+    if ( use_error_code )
+        ASSERT(error_code != X86_EVENT_NO_EC);
+    else
+        ASSERT(error_code == X86_EVENT_NO_EC);
+
+    tb = &v->arch.pv_vcpu.trap_bounce;
+    ti = &v->arch.pv_vcpu.trap_ctxt[vector];
+
+    tb->flags = TBF_EXCEPTION;
+    tb->cs    = ti->cs;
+    tb->eip   = ti->address;
+
+    if ( vector == TRAP_page_fault )
+    {
+        v->arch.pv_vcpu.ctrlreg[2] = event->cr2;
+        arch_set_cr2(v, event->cr2);
+
+        /* Re-set error_code.user flag appropriately for the guest. */
+        error_code &= ~PFEC_user_mode;
+        if ( !guest_kernel_mode(v, regs) )
+            error_code |= PFEC_user_mode;
+
+        trace_pv_page_fault(event->cr2, error_code);
+    }
+    else
+        trace_pv_trap(vector, regs->rip, use_error_code, error_code);
+
+    if ( use_error_code )
+    {
+        tb->flags |= TBF_EXCEPTION_ERRCODE;
+        tb->error_code = error_code;
+    }
+
+    if ( TI_GET_IF(ti) )
+        tb->flags |= TBF_INTERRUPT;
+
+    if ( unlikely(null_trap_bounce(v, tb)) )
+    {
+        gprintk(XENLOG_WARNING,
+                "Unhandled %s fault/trap [#%d, ec=%04x]\n",
+                trapstr(vector), vector, error_code);
+
+        if ( vector == TRAP_page_fault )
+            show_page_walk(event->cr2);
+    }
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 9e0b321f08..c38a8a1171 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -626,65 +626,6 @@ void fatal_trap(const struct cpu_user_regs *regs, bool_t show_remote)
           (regs->eflags & X86_EFLAGS_IF) ? "" : ", IN INTERRUPT CONTEXT");
 }
 
-void pv_inject_event(const struct x86_event *event)
-{
-    struct vcpu *v = current;
-    struct cpu_user_regs *regs = guest_cpu_user_regs();
-    struct trap_bounce *tb;
-    const struct trap_info *ti;
-    const uint8_t vector = event->vector;
-    const bool use_error_code =
-        ((vector < 32) && (TRAP_HAVE_EC & (1u << vector)));
-    unsigned int error_code = event->error_code;
-
-    ASSERT(vector == event->vector); /* Confirm no truncation. */
-    if ( use_error_code )
-        ASSERT(error_code != X86_EVENT_NO_EC);
-    else
-        ASSERT(error_code == X86_EVENT_NO_EC);
-
-    tb = &v->arch.pv_vcpu.trap_bounce;
-    ti = &v->arch.pv_vcpu.trap_ctxt[vector];
-
-    tb->flags = TBF_EXCEPTION;
-    tb->cs    = ti->cs;
-    tb->eip   = ti->address;
-
-    if ( vector == TRAP_page_fault )
-    {
-        v->arch.pv_vcpu.ctrlreg[2] = event->cr2;
-        arch_set_cr2(v, event->cr2);
-
-        /* Re-set error_code.user flag appropriately for the guest. */
-        error_code &= ~PFEC_user_mode;
-        if ( !guest_kernel_mode(v, regs) )
-            error_code |= PFEC_user_mode;
-
-        trace_pv_page_fault(event->cr2, error_code);
-    }
-    else
-        trace_pv_trap(vector, regs->rip, use_error_code, error_code);
-
-    if ( use_error_code )
-    {
-        tb->flags |= TBF_EXCEPTION_ERRCODE;
-        tb->error_code = error_code;
-    }
-
-    if ( TI_GET_IF(ti) )
-        tb->flags |= TBF_INTERRUPT;
-
-    if ( unlikely(null_trap_bounce(v, tb)) )
-    {
-        gprintk(XENLOG_WARNING,
-                "Unhandled %s fault/trap [#%d, ec=%04x]\n",
-                trapstr(vector), vector, error_code);
-
-        if ( vector == TRAP_page_fault )
-            show_page_walk(event->cr2);
-    }
-}
-
 /*
  * Called from asm to set up the MCE trapbounce info.
  * Returns 0 if no callback is set up, else 1.
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v2 08/18] x86/traps: move set_guest_{machinecheck, nmi}_trapbounce
  2017-05-05 14:48 [PATCH v2 00/18] Refactor x86 trap handling code Wei Liu
                   ` (6 preceding siblings ...)
  2017-05-05 14:48 ` [PATCH v2 07/18] x86/traps: move pv_inject_event " Wei Liu
@ 2017-05-05 14:48 ` Wei Liu
  2017-05-05 14:48 ` [PATCH v2 09/18] x86/traps: move {un, }register_guest_nmi_callback Wei Liu
                   ` (10 subsequent siblings)
  18 siblings, 0 replies; 30+ messages in thread
From: Wei Liu @ 2017-05-05 14:48 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/arch/x86/pv/traps.c | 27 +++++++++++++++++++++++++++
 xen/arch/x86/traps.c    | 27 ---------------------------
 2 files changed, 27 insertions(+), 27 deletions(-)

diff --git a/xen/arch/x86/pv/traps.c b/xen/arch/x86/pv/traps.c
index 9de5798e58..e96123c71f 100644
--- a/xen/arch/x86/pv/traps.c
+++ b/xen/arch/x86/pv/traps.c
@@ -191,6 +191,33 @@ void pv_inject_event(const struct x86_event *event)
     }
 }
 
+/*
+ * Called from asm to set up the MCE trapbounce info.
+ * Returns 0 if no callback is set up, else 1.
+ */
+int set_guest_machinecheck_trapbounce(void)
+{
+    struct vcpu *v = current;
+    struct trap_bounce *tb = &v->arch.pv_vcpu.trap_bounce;
+
+    pv_inject_trap(TRAP_machine_check, guest_cpu_user_regs());
+    tb->flags &= ~TBF_EXCEPTION; /* not needed for MCE delivery path */
+    return !null_trap_bounce(v, tb);
+}
+
+/*
+ * Called from asm to set up the NMI trapbounce info.
+ * Returns 0 if no callback is set up, else 1.
+ */
+int set_guest_nmi_trapbounce(void)
+{
+    struct vcpu *v = current;
+    struct trap_bounce *tb = &v->arch.pv_vcpu.trap_bounce;
+    pv_inject_trap(TRAP_nmi, guest_cpu_user_regs());
+    tb->flags &= ~TBF_EXCEPTION; /* not needed for NMI delivery path */
+    return !null_trap_bounce(v, tb);
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index c38a8a1171..e711810388 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -626,33 +626,6 @@ void fatal_trap(const struct cpu_user_regs *regs, bool_t show_remote)
           (regs->eflags & X86_EFLAGS_IF) ? "" : ", IN INTERRUPT CONTEXT");
 }
 
-/*
- * Called from asm to set up the MCE trapbounce info.
- * Returns 0 if no callback is set up, else 1.
- */
-int set_guest_machinecheck_trapbounce(void)
-{
-    struct vcpu *v = current;
-    struct trap_bounce *tb = &v->arch.pv_vcpu.trap_bounce;
- 
-    pv_inject_trap(TRAP_machine_check, guest_cpu_user_regs());
-    tb->flags &= ~TBF_EXCEPTION; /* not needed for MCE delivery path */
-    return !null_trap_bounce(v, tb);
-}
-
-/*
- * Called from asm to set up the NMI trapbounce info.
- * Returns 0 if no callback is set up, else 1.
- */
-int set_guest_nmi_trapbounce(void)
-{
-    struct vcpu *v = current;
-    struct trap_bounce *tb = &v->arch.pv_vcpu.trap_bounce;
-    pv_inject_trap(TRAP_nmi, guest_cpu_user_regs());
-    tb->flags &= ~TBF_EXCEPTION; /* not needed for NMI delivery path */
-    return !null_trap_bounce(v, tb);
-}
-
 void do_reserved_trap(struct cpu_user_regs *regs)
 {
     unsigned int trapnr = regs->entry_vector;
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v2 09/18] x86/traps: move {un, }register_guest_nmi_callback
  2017-05-05 14:48 [PATCH v2 00/18] Refactor x86 trap handling code Wei Liu
                   ` (7 preceding siblings ...)
  2017-05-05 14:48 ` [PATCH v2 08/18] x86/traps: move set_guest_{machinecheck, nmi}_trapbounce Wei Liu
@ 2017-05-05 14:48 ` Wei Liu
  2017-05-05 14:48 ` [PATCH v2 10/18] x86/traps: declare percpu softirq_trap Wei Liu
                   ` (9 subsequent siblings)
  18 siblings, 0 replies; 30+ messages in thread
From: Wei Liu @ 2017-05-05 14:48 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
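For context, guests reach these paths through the public NMI interface;
roughly, per public/nmi.h (the guest-side wrapper name is assumed):

    struct xennmi_callback cb = {
        .handler_address = (unsigned long)guest_nmi_entry,  /* guest handler */
        .pad = 0,
    };

    /* Lands in register_guest_nmi_callback() above ... */
    rc = HYPERVISOR_nmi_op(XENNMI_register_callback, &cb);

    /* ... and XENNMI_unregister_callback reaches the unregister path. */
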
 xen/arch/x86/pv/traps.c | 37 +++++++++++++++++++++++++++++++++++++
 xen/arch/x86/traps.c    | 36 ------------------------------------
 2 files changed, 37 insertions(+), 36 deletions(-)

diff --git a/xen/arch/x86/pv/traps.c b/xen/arch/x86/pv/traps.c
index e96123c71f..54d77922c5 100644
--- a/xen/arch/x86/pv/traps.c
+++ b/xen/arch/x86/pv/traps.c
@@ -28,6 +28,7 @@
 
 #include <asm/apic.h>
 #include <asm/debugreg.h>
+#include <asm/nmi.h>
 #include <asm/shared.h>
 #include <asm/traps.h>
 
@@ -218,6 +219,42 @@ int set_guest_nmi_trapbounce(void)
     return !null_trap_bounce(v, tb);
 }
 
+long register_guest_nmi_callback(unsigned long address)
+{
+    struct vcpu *v = current;
+    struct domain *d = v->domain;
+    struct trap_info *t = &v->arch.pv_vcpu.trap_ctxt[TRAP_nmi];
+
+    if ( !is_canonical_address(address) )
+        return -EINVAL;
+
+    t->vector  = TRAP_nmi;
+    t->flags   = 0;
+    t->cs      = (is_pv_32bit_domain(d) ?
+                  FLAT_COMPAT_KERNEL_CS : FLAT_KERNEL_CS);
+    t->address = address;
+    TI_SET_IF(t, 1);
+
+    /*
+     * If no handler was registered we can 'lose the NMI edge'. Re-assert it
+     * now.
+     */
+    if ( (v->vcpu_id == 0) && (arch_get_nmi_reason(d) != 0) )
+        v->nmi_pending = 1;
+
+    return 0;
+}
+
+long unregister_guest_nmi_callback(void)
+{
+    struct vcpu *v = current;
+    struct trap_info *t = &v->arch.pv_vcpu.trap_ctxt[TRAP_nmi];
+
+    memset(t, 0, sizeof(*t));
+
+    return 0;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index e711810388..24343f46dc 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -1905,42 +1905,6 @@ void __init trap_init(void)
     open_softirq(PCI_SERR_SOFTIRQ, pci_serr_softirq);
 }
 
-long register_guest_nmi_callback(unsigned long address)
-{
-    struct vcpu *v = current;
-    struct domain *d = v->domain;
-    struct trap_info *t = &v->arch.pv_vcpu.trap_ctxt[TRAP_nmi];
-
-    if ( !is_canonical_address(address) )
-        return -EINVAL;
-
-    t->vector  = TRAP_nmi;
-    t->flags   = 0;
-    t->cs      = (is_pv_32bit_domain(d) ?
-                  FLAT_COMPAT_KERNEL_CS : FLAT_KERNEL_CS);
-    t->address = address;
-    TI_SET_IF(t, 1);
-
-    /*
-     * If no handler was registered we can 'lose the NMI edge'. Re-assert it
-     * now.
-     */
-    if ( (v->vcpu_id == 0) && (arch_get_nmi_reason(d) != 0) )
-        v->nmi_pending = 1;
-
-    return 0;
-}
-
-long unregister_guest_nmi_callback(void)
-{
-    struct vcpu *v = current;
-    struct trap_info *t = &v->arch.pv_vcpu.trap_ctxt[TRAP_nmi];
-
-    memset(t, 0, sizeof(*t));
-
-    return 0;
-}
-
 int guest_has_trap_callback(struct domain *d, uint16_t vcpuid, unsigned int trap_nr)
 {
     struct vcpu *v;
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v2 10/18] x86/traps: declare percpu softirq_trap
  2017-05-05 14:48 [PATCH v2 00/18] Refactor x86 trap handling code Wei Liu
                   ` (8 preceding siblings ...)
  2017-05-05 14:48 ` [PATCH v2 09/18] x86/traps: move {un, }register_guest_nmi_callback Wei Liu
@ 2017-05-05 14:48 ` Wei Liu
  2017-05-05 14:48 ` [PATCH v2 11/18] x86/traps: move guest_has_trap_callback to pv/traps.c Wei Liu
                   ` (8 subsequent siblings)
  18 siblings, 0 replies; 30+ messages in thread
From: Wei Liu @ 2017-05-05 14:48 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich

It needs to be non-static when we split PV-specific code out.

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
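The DECLARE/DEFINE split follows the usual per-cpu pattern, in general form:

    /* In a header, visible to every user of the variable: */
    DECLARE_PER_CPU(struct softirq_trap, softirq_trap);

    /* In exactly one translation unit (traps.c, for now): */
    DEFINE_PER_CPU(struct softirq_trap, softirq_trap);

    /* Access is unchanged for all users: */
    struct softirq_trap *st = &per_cpu(softirq_trap, smp_processor_id());
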
 xen/arch/x86/traps.c        | 2 +-
 xen/include/asm-x86/traps.h | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 24343f46dc..ecc767dca1 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -1473,7 +1473,7 @@ void do_general_protection(struct cpu_user_regs *regs)
     panic("GENERAL PROTECTION FAULT\n[error_code=%04x]", regs->error_code);
 }
 
-static DEFINE_PER_CPU(struct softirq_trap, softirq_trap);
+DEFINE_PER_CPU(struct softirq_trap, softirq_trap);
 
 static void nmi_mce_softirq(void)
 {
diff --git a/xen/include/asm-x86/traps.h b/xen/include/asm-x86/traps.h
index f1d2513e6b..4e8760482f 100644
--- a/xen/include/asm-x86/traps.h
+++ b/xen/include/asm-x86/traps.h
@@ -24,6 +24,7 @@ struct softirq_trap {
 	struct vcpu *vcpu;	/* vcpu to inject trap */
 	int processor;		/* physical cpu to inject trap */
 };
+DECLARE_PER_CPU(struct softirq_trap, softirq_trap);
 
 struct cpu_user_regs;
 
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v2 11/18] x86/traps: move guest_has_trap_callback to pv/traps.c
  2017-05-05 14:48 [PATCH v2 00/18] Refactor x86 trap handling code Wei Liu
                   ` (9 preceding siblings ...)
  2017-05-05 14:48 ` [PATCH v2 10/18] x86/traps: declare percpu softirq_trap Wei Liu
@ 2017-05-05 14:48 ` Wei Liu
  2017-05-05 14:48 ` [PATCH v2 12/18] x86/traps: move send_guest_trap " Wei Liu
                   ` (7 subsequent siblings)
  18 siblings, 0 replies; 30+ messages in thread
From: Wei Liu @ 2017-05-05 14:48 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/arch/x86/pv/traps.c | 18 ++++++++++++++++++
 xen/arch/x86/traps.c    | 18 ------------------
 2 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/xen/arch/x86/pv/traps.c b/xen/arch/x86/pv/traps.c
index 54d77922c5..4e9a376000 100644
--- a/xen/arch/x86/pv/traps.c
+++ b/xen/arch/x86/pv/traps.c
@@ -255,6 +255,24 @@ long unregister_guest_nmi_callback(void)
     return 0;
 }
 
+int guest_has_trap_callback(struct domain *d, uint16_t vcpuid,
+                            unsigned int trap_nr)
+{
+    struct vcpu *v;
+    struct trap_info *t;
+
+    BUG_ON(d == NULL);
+    BUG_ON(vcpuid >= d->max_vcpus);
+
+    /* Sanity check - XXX should be more fine grained. */
+    BUG_ON(trap_nr >= NR_VECTORS);
+
+    v = d->vcpu[vcpuid];
+    t = &v->arch.pv_vcpu.trap_ctxt[trap_nr];
+
+    return (t->address != 0);
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index ecc767dca1..2517b10ddf 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -1905,24 +1905,6 @@ void __init trap_init(void)
     open_softirq(PCI_SERR_SOFTIRQ, pci_serr_softirq);
 }
 
-int guest_has_trap_callback(struct domain *d, uint16_t vcpuid, unsigned int trap_nr)
-{
-    struct vcpu *v;
-    struct trap_info *t;
-
-    BUG_ON(d == NULL);
-    BUG_ON(vcpuid >= d->max_vcpus);
-
-    /* Sanity check - XXX should be more fine grained. */
-    BUG_ON(trap_nr >= NR_VECTORS);
-
-    v = d->vcpu[vcpuid];
-    t = &v->arch.pv_vcpu.trap_ctxt[trap_nr];
-
-    return (t->address != 0);
-}
-
-
 int send_guest_trap(struct domain *d, uint16_t vcpuid, unsigned int trap_nr)
 {
     struct vcpu *v;
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v2 12/18] x86/traps: move send_guest_trap to pv/traps.c
  2017-05-05 14:48 [PATCH v2 00/18] Refactor x86 trap handling code Wei Liu
                   ` (10 preceding siblings ...)
  2017-05-05 14:48 ` [PATCH v2 11/18] x86/traps: move guest_has_trap_callback to pv/traps.c Wei Liu
@ 2017-05-05 14:48 ` Wei Liu
  2017-05-05 14:48 ` [PATCH v2 13/18] x86/traps: move PV specific code in x86_64/traps.c Wei Liu
                   ` (6 subsequent siblings)
  18 siblings, 0 replies; 30+ messages in thread
From: Wei Liu @ 2017-05-05 14:48 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich

Fix some coding style issues while moving.

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
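The per-vector logic follows a claim/raise/release shape that is worth
keeping in mind while reviewing the move (outline only, names as in the
patch):

    if ( cmpxchgptr(&st->vcpu, NULL, v) )  /* atomically claim this CPU's slot */
        return -EBUSY;                     /* another injection is in flight */

    if ( !test_and_set_bool(v->nmi_pending) )
    {
        st->domain = d;
        st->processor = v->processor;
        raise_softirq(NMI_MCE_SOFTIRQ);    /* deliver from softirq context */
        return 0;
    }

    st->vcpu = NULL;                       /* already pending: release the slot */
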
 xen/arch/x86/pv/traps.c | 50 +++++++++++++++++++++++++++++++++++++++++++++++++
 xen/arch/x86/traps.c    | 47 ----------------------------------------------
 2 files changed, 50 insertions(+), 47 deletions(-)

diff --git a/xen/arch/x86/pv/traps.c b/xen/arch/x86/pv/traps.c
index 4e9a376000..f2627b4215 100644
--- a/xen/arch/x86/pv/traps.c
+++ b/xen/arch/x86/pv/traps.c
@@ -273,6 +273,56 @@ int guest_has_trap_callback(struct domain *d, uint16_t vcpuid,
     return (t->address != 0);
 }
 
+int send_guest_trap(struct domain *d, uint16_t vcpuid, unsigned int trap_nr)
+{
+    struct vcpu *v;
+    struct softirq_trap *st = &per_cpu(softirq_trap, smp_processor_id());
+
+    BUG_ON(d == NULL);
+    BUG_ON(vcpuid >= d->max_vcpus);
+    v = d->vcpu[vcpuid];
+
+    switch ( trap_nr )
+    {
+    case TRAP_nmi:
+        if ( cmpxchgptr(&st->vcpu, NULL, v) )
+            return -EBUSY;
+        if ( !test_and_set_bool(v->nmi_pending) )
+        {
+            st->domain = d;
+            st->processor = v->processor;
+
+            /* not safe to wake up a vcpu here */
+            raise_softirq(NMI_MCE_SOFTIRQ);
+            return 0;
+        }
+        st->vcpu = NULL;
+        break;
+
+    case TRAP_machine_check:
+        if ( cmpxchgptr(&st->vcpu, NULL, v) )
+            return -EBUSY;
+
+        /* We are called by the machine check (exception or polling) handlers
+         * on the physical CPU that reported a machine check error. */
+
+        if ( !test_and_set_bool(v->mce_pending) )
+        {
+            st->domain = d;
+            st->processor = v->processor;
+
+            /* not safe to wake up a vcpu here */
+            raise_softirq(NMI_MCE_SOFTIRQ);
+            return 0;
+        }
+        st->vcpu = NULL;
+        break;
+    }
+
+    /* delivery failed */
+    return -EIO;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 2517b10ddf..c0d5b89b13 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -1905,53 +1905,6 @@ void __init trap_init(void)
     open_softirq(PCI_SERR_SOFTIRQ, pci_serr_softirq);
 }
 
-int send_guest_trap(struct domain *d, uint16_t vcpuid, unsigned int trap_nr)
-{
-    struct vcpu *v;
-    struct softirq_trap *st = &per_cpu(softirq_trap, smp_processor_id());
-
-    BUG_ON(d == NULL);
-    BUG_ON(vcpuid >= d->max_vcpus);
-    v = d->vcpu[vcpuid];
-
-    switch (trap_nr) {
-    case TRAP_nmi:
-        if ( cmpxchgptr(&st->vcpu, NULL, v) )
-            return -EBUSY;
-        if ( !test_and_set_bool(v->nmi_pending) ) {
-               st->domain = d;
-               st->processor = v->processor;
-
-               /* not safe to wake up a vcpu here */
-               raise_softirq(NMI_MCE_SOFTIRQ);
-               return 0;
-        }
-        st->vcpu = NULL;
-        break;
-
-    case TRAP_machine_check:
-        if ( cmpxchgptr(&st->vcpu, NULL, v) )
-            return -EBUSY;
-
-        /* We are called by the machine check (exception or polling) handlers
-         * on the physical CPU that reported a machine check error. */
-
-        if ( !test_and_set_bool(v->mce_pending) ) {
-                st->domain = d;
-                st->processor = v->processor;
-
-                /* not safe to wake up a vcpu here */
-                raise_softirq(NMI_MCE_SOFTIRQ);
-                return 0;
-        }
-        st->vcpu = NULL;
-        break;
-    }
-
-    /* delivery failed */
-    return -EIO;
-}
-
 void activate_debugregs(const struct vcpu *curr)
 {
     ASSERT(curr == current);
-- 
2.11.0



* [PATCH v2 13/18] x86/traps: move PV specific code in x86_64/traps.c
  2017-05-05 14:48 [PATCH v2 00/18] Refactor x86 trap handling code Wei Liu
                   ` (11 preceding siblings ...)
  2017-05-05 14:48 ` [PATCH v2 12/18] x86/traps: move send_guest_trap " Wei Liu
@ 2017-05-05 14:48 ` Wei Liu
  2017-05-05 14:48 ` [PATCH v2 14/18] x86/traps: merge x86_64/compat/traps.c into pv/traps.c Wei Liu
                   ` (5 subsequent siblings)
  18 siblings, 0 replies; 30+ messages in thread
From: Wei Liu @ 2017-05-05 14:48 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich

Move the PV-specific code in x86_64/traps.c to pv/traps.c.

This in turn requires exporting pv_percpu_traps_init and
hypercall_page_initialise_ring3_kernel.

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/arch/x86/pv/traps.c         | 363 ++++++++++++++++++++++++++++++++++++++++
 xen/arch/x86/x86_64/traps.c     | 363 +---------------------------------------
 xen/include/asm-x86/pv/domain.h |   5 +
 xen/include/asm-x86/pv/traps.h  |   4 +
 4 files changed, 374 insertions(+), 361 deletions(-)

diff --git a/xen/arch/x86/pv/traps.c b/xen/arch/x86/pv/traps.c
index f2627b4215..178ac2773c 100644
--- a/xen/arch/x86/pv/traps.c
+++ b/xen/arch/x86/pv/traps.c
@@ -32,6 +32,8 @@
 #include <asm/shared.h>
 #include <asm/traps.h>
 
+#include <public/callback.h>
+
 void do_entry_int82(struct cpu_user_regs *regs)
 {
     if ( unlikely(untrusted_msi) )
@@ -323,6 +325,367 @@ int send_guest_trap(struct domain *d, uint16_t vcpuid, unsigned int trap_nr)
     return -EIO;
 }
 
+void toggle_guest_mode(struct vcpu *v)
+{
+    if ( is_pv_32bit_vcpu(v) )
+        return;
+    if ( cpu_has_fsgsbase )
+    {
+        if ( v->arch.flags & TF_kernel_mode )
+            v->arch.pv_vcpu.gs_base_kernel = __rdgsbase();
+        else
+            v->arch.pv_vcpu.gs_base_user = __rdgsbase();
+    }
+    v->arch.flags ^= TF_kernel_mode;
+    asm volatile ( "swapgs" );
+    update_cr3(v);
+    /* Don't flush user global mappings from the TLB. Don't tick TLB clock. */
+    asm volatile ( "mov %0, %%cr3" : : "r" (v->arch.cr3) : "memory" );
+
+    if ( !(v->arch.flags & TF_kernel_mode) )
+        return;
+
+    if ( v->arch.pv_vcpu.need_update_runstate_area &&
+         update_runstate_area(v) )
+        v->arch.pv_vcpu.need_update_runstate_area = 0;
+
+    if ( v->arch.pv_vcpu.pending_system_time.version &&
+         update_secondary_system_time(v,
+                                      &v->arch.pv_vcpu.pending_system_time) )
+        v->arch.pv_vcpu.pending_system_time.version = 0;
+}
+
+unsigned long do_iret(void)
+{
+    struct cpu_user_regs *regs = guest_cpu_user_regs();
+    struct iret_context iret_saved;
+    struct vcpu *v = current;
+
+    if ( unlikely(copy_from_user(&iret_saved, (void *)regs->rsp,
+                                 sizeof(iret_saved))) )
+    {
+        gprintk(XENLOG_ERR,
+                "Fault while reading IRET context from guest stack\n");
+        goto exit_and_crash;
+    }
+
+    /* Returning to user mode? */
+    if ( (iret_saved.cs & 3) == 3 )
+    {
+        if ( unlikely(pagetable_is_null(v->arch.guest_table_user)) )
+        {
+            gprintk(XENLOG_ERR,
+                    "Guest switching to user mode with no user page tables\n");
+            goto exit_and_crash;
+        }
+        toggle_guest_mode(v);
+    }
+
+    if ( VM_ASSIST(v->domain, architectural_iopl) )
+        v->arch.pv_vcpu.iopl = iret_saved.rflags & X86_EFLAGS_IOPL;
+
+    regs->rip    = iret_saved.rip;
+    regs->cs     = iret_saved.cs | 3; /* force guest privilege */
+    regs->rflags = ((iret_saved.rflags & ~(X86_EFLAGS_IOPL|X86_EFLAGS_VM))
+                    | X86_EFLAGS_IF);
+    regs->rsp    = iret_saved.rsp;
+    regs->ss     = iret_saved.ss | 3; /* force guest privilege */
+
+    if ( !(iret_saved.flags & VGCF_in_syscall) )
+    {
+        regs->entry_vector &= ~TRAP_syscall;
+        regs->r11 = iret_saved.r11;
+        regs->rcx = iret_saved.rcx;
+    }
+
+    /* Restore upcall mask from supplied EFLAGS.IF. */
+    vcpu_info(v, evtchn_upcall_mask) = !(iret_saved.rflags & X86_EFLAGS_IF);
+
+    async_exception_cleanup(v);
+
+    /* Saved %rax gets written back to regs->rax in entry.S. */
+    return iret_saved.rax;
+
+ exit_and_crash:
+    domain_crash(v->domain);
+    return 0;
+}
+
+static unsigned int write_stub_trampoline(
+    unsigned char *stub, unsigned long stub_va,
+    unsigned long stack_bottom, unsigned long target_va)
+{
+    /* movabsq %rax, stack_bottom - 8 */
+    stub[0] = 0x48;
+    stub[1] = 0xa3;
+    *(uint64_t *)&stub[2] = stack_bottom - 8;
+
+    /* movq %rsp, %rax */
+    stub[10] = 0x48;
+    stub[11] = 0x89;
+    stub[12] = 0xe0;
+
+    /* movabsq $stack_bottom - 8, %rsp */
+    stub[13] = 0x48;
+    stub[14] = 0xbc;
+    *(uint64_t *)&stub[15] = stack_bottom - 8;
+
+    /* pushq %rax */
+    stub[23] = 0x50;
+
+    /* jmp target_va */
+    stub[24] = 0xe9;
+    *(int32_t *)&stub[25] = target_va - (stub_va + 29);
+
+    /* Round up to a multiple of 16 bytes. */
+    return 32;
+}
+
+DEFINE_PER_CPU(struct stubs, stubs);
+void lstar_enter(void);
+void cstar_enter(void);
+
+void pv_percpu_traps_init(void)
+{
+    unsigned long stack_bottom = get_stack_bottom();
+    unsigned long stub_va = this_cpu(stubs.addr);
+    unsigned char *stub_page;
+    unsigned int offset;
+
+    stub_page = map_domain_page(_mfn(this_cpu(stubs.mfn)));
+
+    /*
+     * Trampoline for SYSCALL entry from 64-bit mode.  The VT-x HVM vcpu
+     * context switch logic relies on the SYSCALL trampoline being at the
+     * start of the stubs.
+     */
+    wrmsrl(MSR_LSTAR, stub_va);
+    offset = write_stub_trampoline(stub_page + (stub_va & ~PAGE_MASK),
+                                   stub_va, stack_bottom,
+                                   (unsigned long)lstar_enter);
+    stub_va += offset;
+
+    if ( boot_cpu_data.x86_vendor == X86_VENDOR_INTEL ||
+         boot_cpu_data.x86_vendor == X86_VENDOR_CENTAUR )
+    {
+        /* SYSENTER entry. */
+        wrmsrl(MSR_IA32_SYSENTER_ESP, stack_bottom);
+        wrmsrl(MSR_IA32_SYSENTER_EIP, (unsigned long)sysenter_entry);
+        wrmsr(MSR_IA32_SYSENTER_CS, __HYPERVISOR_CS, 0);
+    }
+
+    /* Trampoline for SYSCALL entry from compatibility mode. */
+    wrmsrl(MSR_CSTAR, stub_va);
+    offset += write_stub_trampoline(stub_page + (stub_va & ~PAGE_MASK),
+                                    stub_va, stack_bottom,
+                                    (unsigned long)cstar_enter);
+
+    /* Don't consume more than half of the stub space here. */
+    ASSERT(offset <= STUB_BUF_SIZE / 2);
+
+    unmap_domain_page(stub_page);
+
+    /* Common SYSCALL parameters. */
+    wrmsrl(MSR_STAR, XEN_MSR_STAR);
+    wrmsrl(MSR_SYSCALL_MASK, XEN_SYSCALL_MASK);
+}
+
+void init_int80_direct_trap(struct vcpu *v)
+{
+    struct trap_info *ti = &v->arch.pv_vcpu.trap_ctxt[0x80];
+    struct trap_bounce *tb = &v->arch.pv_vcpu.int80_bounce;
+
+    tb->flags = TBF_EXCEPTION;
+    tb->cs    = ti->cs;
+    tb->eip   = ti->address;
+
+    if ( null_trap_bounce(v, tb) )
+        tb->flags = 0;
+}
+
+static long register_guest_callback(struct callback_register *reg)
+{
+    long ret = 0;
+    struct vcpu *v = current;
+
+    if ( !is_canonical_address(reg->address) )
+        return -EINVAL;
+
+    switch ( reg->type )
+    {
+    case CALLBACKTYPE_event:
+        v->arch.pv_vcpu.event_callback_eip    = reg->address;
+        break;
+
+    case CALLBACKTYPE_failsafe:
+        v->arch.pv_vcpu.failsafe_callback_eip = reg->address;
+        if ( reg->flags & CALLBACKF_mask_events )
+            set_bit(_VGCF_failsafe_disables_events,
+                    &v->arch.vgc_flags);
+        else
+            clear_bit(_VGCF_failsafe_disables_events,
+                      &v->arch.vgc_flags);
+        break;
+
+    case CALLBACKTYPE_syscall:
+        v->arch.pv_vcpu.syscall_callback_eip  = reg->address;
+        if ( reg->flags & CALLBACKF_mask_events )
+            set_bit(_VGCF_syscall_disables_events,
+                    &v->arch.vgc_flags);
+        else
+            clear_bit(_VGCF_syscall_disables_events,
+                      &v->arch.vgc_flags);
+        break;
+
+    case CALLBACKTYPE_syscall32:
+        v->arch.pv_vcpu.syscall32_callback_eip = reg->address;
+        v->arch.pv_vcpu.syscall32_disables_events =
+            !!(reg->flags & CALLBACKF_mask_events);
+        break;
+
+    case CALLBACKTYPE_sysenter:
+        v->arch.pv_vcpu.sysenter_callback_eip = reg->address;
+        v->arch.pv_vcpu.sysenter_disables_events =
+            !!(reg->flags & CALLBACKF_mask_events);
+        break;
+
+    case CALLBACKTYPE_nmi:
+        ret = register_guest_nmi_callback(reg->address);
+        break;
+
+    default:
+        ret = -ENOSYS;
+        break;
+    }
+
+    return ret;
+}
+
+static long unregister_guest_callback(struct callback_unregister *unreg)
+{
+    long ret;
+
+    switch ( unreg->type )
+    {
+    case CALLBACKTYPE_event:
+    case CALLBACKTYPE_failsafe:
+    case CALLBACKTYPE_syscall:
+    case CALLBACKTYPE_syscall32:
+    case CALLBACKTYPE_sysenter:
+        ret = -EINVAL;
+        break;
+
+    case CALLBACKTYPE_nmi:
+        ret = unregister_guest_nmi_callback();
+        break;
+
+    default:
+        ret = -ENOSYS;
+        break;
+    }
+
+    return ret;
+}
+
+
+long do_callback_op(int cmd, XEN_GUEST_HANDLE_PARAM(const_void) arg)
+{
+    long ret;
+
+    switch ( cmd )
+    {
+    case CALLBACKOP_register:
+    {
+        struct callback_register reg;
+
+        ret = -EFAULT;
+        if ( copy_from_guest(&reg, arg, 1) )
+            break;
+
+        ret = register_guest_callback(&reg);
+    }
+    break;
+
+    case CALLBACKOP_unregister:
+    {
+        struct callback_unregister unreg;
+
+        ret = -EFAULT;
+        if ( copy_from_guest(&unreg, arg, 1) )
+            break;
+
+        ret = unregister_guest_callback(&unreg);
+    }
+    break;
+
+    default:
+        ret = -ENOSYS;
+        break;
+    }
+
+    return ret;
+}
+
+long do_set_callbacks(unsigned long event_address,
+                      unsigned long failsafe_address,
+                      unsigned long syscall_address)
+{
+    struct callback_register event = {
+        .type = CALLBACKTYPE_event,
+        .address = event_address,
+    };
+    struct callback_register failsafe = {
+        .type = CALLBACKTYPE_failsafe,
+        .address = failsafe_address,
+    };
+    struct callback_register syscall = {
+        .type = CALLBACKTYPE_syscall,
+        .address = syscall_address,
+    };
+
+    register_guest_callback(&event);
+    register_guest_callback(&failsafe);
+    register_guest_callback(&syscall);
+
+    return 0;
+}
+
+void hypercall_page_initialise_ring3_kernel(void *hypercall_page)
+{
+    char *p;
+    int i;
+
+    /* Fill in all the transfer points with template machine code. */
+    for ( i = 0; i < (PAGE_SIZE / 32); i++ )
+    {
+        if ( i == __HYPERVISOR_iret )
+            continue;
+
+        p = (char *)(hypercall_page + (i * 32));
+        *(u8  *)(p+ 0) = 0x51;    /* push %rcx */
+        *(u16 *)(p+ 1) = 0x5341;  /* push %r11 */
+        *(u8  *)(p+ 3) = 0xb8;    /* mov  $<i>,%eax */
+        *(u32 *)(p+ 4) = i;
+        *(u16 *)(p+ 8) = 0x050f;  /* syscall */
+        *(u16 *)(p+10) = 0x5b41;  /* pop  %r11 */
+        *(u8  *)(p+12) = 0x59;    /* pop  %rcx */
+        *(u8  *)(p+13) = 0xc3;    /* ret */
+    }
+
+    /*
+     * HYPERVISOR_iret is special because it doesn't return and expects a
+     * special stack frame. Guests jump at this transfer point instead of
+     * calling it.
+     */
+    p = (char *)(hypercall_page + (__HYPERVISOR_iret * 32));
+    *(u8  *)(p+ 0) = 0x51;    /* push %rcx */
+    *(u16 *)(p+ 1) = 0x5341;  /* push %r11 */
+    *(u8  *)(p+ 3) = 0x50;    /* push %rax */
+    *(u8  *)(p+ 4) = 0xb8;    /* mov  $__HYPERVISOR_iret,%eax */
+    *(u32 *)(p+ 5) = __HYPERVISOR_iret;
+    *(u16 *)(p+ 9) = 0x050f;  /* syscall */
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/x86_64/traps.c b/xen/arch/x86/x86_64/traps.c
index a237f4d5c2..2027a6a4ae 100644
--- a/xen/arch/x86/x86_64/traps.c
+++ b/xen/arch/x86/x86_64/traps.c
@@ -23,6 +23,8 @@
 #include <asm/shared.h>
 #include <asm/hvm/hvm.h>
 #include <asm/hvm/support.h>
+#include <asm/pv/domain.h>
+#include <asm/pv/traps.h>
 #include <public/callback.h>
 
 
@@ -254,171 +256,6 @@ void do_double_fault(struct cpu_user_regs *regs)
     panic("DOUBLE FAULT -- system shutdown");
 }
 
-void toggle_guest_mode(struct vcpu *v)
-{
-    if ( is_pv_32bit_vcpu(v) )
-        return;
-    if ( cpu_has_fsgsbase )
-    {
-        if ( v->arch.flags & TF_kernel_mode )
-            v->arch.pv_vcpu.gs_base_kernel = __rdgsbase();
-        else
-            v->arch.pv_vcpu.gs_base_user = __rdgsbase();
-    }
-    v->arch.flags ^= TF_kernel_mode;
-    asm volatile ( "swapgs" );
-    update_cr3(v);
-    /* Don't flush user global mappings from the TLB. Don't tick TLB clock. */
-    asm volatile ( "mov %0, %%cr3" : : "r" (v->arch.cr3) : "memory" );
-
-    if ( !(v->arch.flags & TF_kernel_mode) )
-        return;
-
-    if ( v->arch.pv_vcpu.need_update_runstate_area &&
-         update_runstate_area(v) )
-        v->arch.pv_vcpu.need_update_runstate_area = 0;
-
-    if ( v->arch.pv_vcpu.pending_system_time.version &&
-         update_secondary_system_time(v,
-                                      &v->arch.pv_vcpu.pending_system_time) )
-        v->arch.pv_vcpu.pending_system_time.version = 0;
-}
-
-unsigned long do_iret(void)
-{
-    struct cpu_user_regs *regs = guest_cpu_user_regs();
-    struct iret_context iret_saved;
-    struct vcpu *v = current;
-
-    if ( unlikely(copy_from_user(&iret_saved, (void *)regs->rsp,
-                                 sizeof(iret_saved))) )
-    {
-        gprintk(XENLOG_ERR,
-                "Fault while reading IRET context from guest stack\n");
-        goto exit_and_crash;
-    }
-
-    /* Returning to user mode? */
-    if ( (iret_saved.cs & 3) == 3 )
-    {
-        if ( unlikely(pagetable_is_null(v->arch.guest_table_user)) )
-        {
-            gprintk(XENLOG_ERR,
-                    "Guest switching to user mode with no user page tables\n");
-            goto exit_and_crash;
-        }
-        toggle_guest_mode(v);
-    }
-
-    if ( VM_ASSIST(v->domain, architectural_iopl) )
-        v->arch.pv_vcpu.iopl = iret_saved.rflags & X86_EFLAGS_IOPL;
-
-    regs->rip    = iret_saved.rip;
-    regs->cs     = iret_saved.cs | 3; /* force guest privilege */
-    regs->rflags = ((iret_saved.rflags & ~(X86_EFLAGS_IOPL|X86_EFLAGS_VM))
-                    | X86_EFLAGS_IF);
-    regs->rsp    = iret_saved.rsp;
-    regs->ss     = iret_saved.ss | 3; /* force guest privilege */
-
-    if ( !(iret_saved.flags & VGCF_in_syscall) )
-    {
-        regs->entry_vector &= ~TRAP_syscall;
-        regs->r11 = iret_saved.r11;
-        regs->rcx = iret_saved.rcx;
-    }
-
-    /* Restore upcall mask from supplied EFLAGS.IF. */
-    vcpu_info(v, evtchn_upcall_mask) = !(iret_saved.rflags & X86_EFLAGS_IF);
-
-    async_exception_cleanup(v);
-
-    /* Saved %rax gets written back to regs->rax in entry.S. */
-    return iret_saved.rax;
-
- exit_and_crash:
-    domain_crash(v->domain);
-    return 0;
-}
-
-static unsigned int write_stub_trampoline(
-    unsigned char *stub, unsigned long stub_va,
-    unsigned long stack_bottom, unsigned long target_va)
-{
-    /* movabsq %rax, stack_bottom - 8 */
-    stub[0] = 0x48;
-    stub[1] = 0xa3;
-    *(uint64_t *)&stub[2] = stack_bottom - 8;
-
-    /* movq %rsp, %rax */
-    stub[10] = 0x48;
-    stub[11] = 0x89;
-    stub[12] = 0xe0;
-
-    /* movabsq $stack_bottom - 8, %rsp */
-    stub[13] = 0x48;
-    stub[14] = 0xbc;
-    *(uint64_t *)&stub[15] = stack_bottom - 8;
-
-    /* pushq %rax */
-    stub[23] = 0x50;
-
-    /* jmp target_va */
-    stub[24] = 0xe9;
-    *(int32_t *)&stub[25] = target_va - (stub_va + 29);
-
-    /* Round up to a multiple of 16 bytes. */
-    return 32;
-}
-
-DEFINE_PER_CPU(struct stubs, stubs);
-void lstar_enter(void);
-void cstar_enter(void);
-
-static void pv_percpu_traps_init(void)
-{
-    unsigned long stack_bottom = get_stack_bottom();
-    unsigned long stub_va = this_cpu(stubs.addr);
-    unsigned char *stub_page;
-    unsigned int offset;
-
-    stub_page = map_domain_page(_mfn(this_cpu(stubs.mfn)));
-
-    /*
-     * Trampoline for SYSCALL entry from 64-bit mode.  The VT-x HVM vcpu
-     * context switch logic relies on the SYSCALL trampoline being at the
-     * start of the stubs.
-     */
-    wrmsrl(MSR_LSTAR, stub_va);
-    offset = write_stub_trampoline(stub_page + (stub_va & ~PAGE_MASK),
-                                   stub_va, stack_bottom,
-                                   (unsigned long)lstar_enter);
-    stub_va += offset;
-
-    if ( boot_cpu_data.x86_vendor == X86_VENDOR_INTEL ||
-         boot_cpu_data.x86_vendor == X86_VENDOR_CENTAUR )
-    {
-        /* SYSENTER entry. */
-        wrmsrl(MSR_IA32_SYSENTER_ESP, stack_bottom);
-        wrmsrl(MSR_IA32_SYSENTER_EIP, (unsigned long)sysenter_entry);
-        wrmsr(MSR_IA32_SYSENTER_CS, __HYPERVISOR_CS, 0);
-    }
-
-    /* Trampoline for SYSCALL entry from compatibility mode. */
-    wrmsrl(MSR_CSTAR, stub_va);
-    offset += write_stub_trampoline(stub_page + (stub_va & ~PAGE_MASK),
-                                    stub_va, stack_bottom,
-                                    (unsigned long)cstar_enter);
-
-    /* Don't consume more than half of the stub space here. */
-    ASSERT(offset <= STUB_BUF_SIZE / 2);
-
-    unmap_domain_page(stub_page);
-
-    /* Common SYSCALL parameters. */
-    wrmsrl(MSR_STAR, XEN_MSR_STAR);
-    wrmsrl(MSR_SYSCALL_MASK, XEN_SYSCALL_MASK);
-}
-
 void subarch_percpu_traps_init(void)
 {
     /* IST_MAX IST pages + 1 syscall page + 1 guard page + primary stack. */
@@ -427,202 +264,6 @@ void subarch_percpu_traps_init(void)
     pv_percpu_traps_init();
 }
 
-void init_int80_direct_trap(struct vcpu *v)
-{
-    struct trap_info *ti = &v->arch.pv_vcpu.trap_ctxt[0x80];
-    struct trap_bounce *tb = &v->arch.pv_vcpu.int80_bounce;
-
-    tb->flags = TBF_EXCEPTION;
-    tb->cs    = ti->cs;
-    tb->eip   = ti->address;
-
-    if ( null_trap_bounce(v, tb) )
-        tb->flags = 0;
-}
-
-static long register_guest_callback(struct callback_register *reg)
-{
-    long ret = 0;
-    struct vcpu *v = current;
-
-    if ( !is_canonical_address(reg->address) )
-        return -EINVAL;
-
-    switch ( reg->type )
-    {
-    case CALLBACKTYPE_event:
-        v->arch.pv_vcpu.event_callback_eip    = reg->address;
-        break;
-
-    case CALLBACKTYPE_failsafe:
-        v->arch.pv_vcpu.failsafe_callback_eip = reg->address;
-        if ( reg->flags & CALLBACKF_mask_events )
-            set_bit(_VGCF_failsafe_disables_events,
-                    &v->arch.vgc_flags);
-        else
-            clear_bit(_VGCF_failsafe_disables_events,
-                      &v->arch.vgc_flags);
-        break;
-
-    case CALLBACKTYPE_syscall:
-        v->arch.pv_vcpu.syscall_callback_eip  = reg->address;
-        if ( reg->flags & CALLBACKF_mask_events )
-            set_bit(_VGCF_syscall_disables_events,
-                    &v->arch.vgc_flags);
-        else
-            clear_bit(_VGCF_syscall_disables_events,
-                      &v->arch.vgc_flags);
-        break;
-
-    case CALLBACKTYPE_syscall32:
-        v->arch.pv_vcpu.syscall32_callback_eip = reg->address;
-        v->arch.pv_vcpu.syscall32_disables_events =
-            !!(reg->flags & CALLBACKF_mask_events);
-        break;
-
-    case CALLBACKTYPE_sysenter:
-        v->arch.pv_vcpu.sysenter_callback_eip = reg->address;
-        v->arch.pv_vcpu.sysenter_disables_events =
-            !!(reg->flags & CALLBACKF_mask_events);
-        break;
-
-    case CALLBACKTYPE_nmi:
-        ret = register_guest_nmi_callback(reg->address);
-        break;
-
-    default:
-        ret = -ENOSYS;
-        break;
-    }
-
-    return ret;
-}
-
-static long unregister_guest_callback(struct callback_unregister *unreg)
-{
-    long ret;
-
-    switch ( unreg->type )
-    {
-    case CALLBACKTYPE_event:
-    case CALLBACKTYPE_failsafe:
-    case CALLBACKTYPE_syscall:
-    case CALLBACKTYPE_syscall32:
-    case CALLBACKTYPE_sysenter:
-        ret = -EINVAL;
-        break;
-
-    case CALLBACKTYPE_nmi:
-        ret = unregister_guest_nmi_callback();
-        break;
-
-    default:
-        ret = -ENOSYS;
-        break;
-    }
-
-    return ret;
-}
-
-
-long do_callback_op(int cmd, XEN_GUEST_HANDLE_PARAM(const_void) arg)
-{
-    long ret;
-
-    switch ( cmd )
-    {
-    case CALLBACKOP_register:
-    {
-        struct callback_register reg;
-
-        ret = -EFAULT;
-        if ( copy_from_guest(&reg, arg, 1) )
-            break;
-
-        ret = register_guest_callback(&reg);
-    }
-    break;
-
-    case CALLBACKOP_unregister:
-    {
-        struct callback_unregister unreg;
-
-        ret = -EFAULT;
-        if ( copy_from_guest(&unreg, arg, 1) )
-            break;
-
-        ret = unregister_guest_callback(&unreg);
-    }
-    break;
-
-    default:
-        ret = -ENOSYS;
-        break;
-    }
-
-    return ret;
-}
-
-long do_set_callbacks(unsigned long event_address,
-                      unsigned long failsafe_address,
-                      unsigned long syscall_address)
-{
-    struct callback_register event = {
-        .type = CALLBACKTYPE_event,
-        .address = event_address,
-    };
-    struct callback_register failsafe = {
-        .type = CALLBACKTYPE_failsafe,
-        .address = failsafe_address,
-    };
-    struct callback_register syscall = {
-        .type = CALLBACKTYPE_syscall,
-        .address = syscall_address,
-    };
-
-    register_guest_callback(&event);
-    register_guest_callback(&failsafe);
-    register_guest_callback(&syscall);
-
-    return 0;
-}
-
-static void hypercall_page_initialise_ring3_kernel(void *hypercall_page)
-{
-    char *p;
-    int i;
-
-    /* Fill in all the transfer points with template machine code. */
-    for ( i = 0; i < (PAGE_SIZE / 32); i++ )
-    {
-        if ( i == __HYPERVISOR_iret )
-            continue;
-
-        p = (char *)(hypercall_page + (i * 32));
-        *(u8  *)(p+ 0) = 0x51;    /* push %rcx */
-        *(u16 *)(p+ 1) = 0x5341;  /* push %r11 */
-        *(u8  *)(p+ 3) = 0xb8;    /* mov  $<i>,%eax */
-        *(u32 *)(p+ 4) = i;
-        *(u16 *)(p+ 8) = 0x050f;  /* syscall */
-        *(u16 *)(p+10) = 0x5b41;  /* pop  %r11 */
-        *(u8  *)(p+12) = 0x59;    /* pop  %rcx */
-        *(u8  *)(p+13) = 0xc3;    /* ret */
-    }
-
-    /*
-     * HYPERVISOR_iret is special because it doesn't return and expects a
-     * special stack frame. Guests jump at this transfer point instead of
-     * calling it.
-     */
-    p = (char *)(hypercall_page + (__HYPERVISOR_iret * 32));
-    *(u8  *)(p+ 0) = 0x51;    /* push %rcx */
-    *(u16 *)(p+ 1) = 0x5341;  /* push %r11 */
-    *(u8  *)(p+ 3) = 0x50;    /* push %rax */
-    *(u8  *)(p+ 4) = 0xb8;    /* mov  $__HYPERVISOR_iret,%eax */
-    *(u32 *)(p+ 5) = __HYPERVISOR_iret;
-    *(u16 *)(p+ 9) = 0x050f;  /* syscall */
-}
-
 #include "compat/traps.c"
 
 void hypercall_page_initialise(struct domain *d, void *hypercall_page)
diff --git a/xen/include/asm-x86/pv/domain.h b/xen/include/asm-x86/pv/domain.h
index acdf140fbd..dfa60b080c 100644
--- a/xen/include/asm-x86/pv/domain.h
+++ b/xen/include/asm-x86/pv/domain.h
@@ -29,6 +29,8 @@ void pv_domain_destroy(struct domain *d);
 int pv_domain_initialise(struct domain *d, unsigned int domcr_flags,
                          struct xen_arch_domainconfig *config);
 
+void hypercall_page_initialise_ring3_kernel(void *hypercall_page);
+
 #else  /* !CONFIG_PV */
 
 #include <xen/errno.h>
@@ -42,6 +44,9 @@ static inline int pv_domain_initialise(struct domain *d,
 {
     return -EOPNOTSUPP;
 }
+
+static inline void hypercall_page_initialise_ring3_kernel(void *hypercall_page) {}
+
 #endif	/* CONFIG_PV */
 
 void paravirt_ctxt_switch_from(struct vcpu *v);
diff --git a/xen/include/asm-x86/pv/traps.h b/xen/include/asm-x86/pv/traps.h
index f41287add7..43d9112b6d 100644
--- a/xen/include/asm-x86/pv/traps.h
+++ b/xen/include/asm-x86/pv/traps.h
@@ -30,6 +30,8 @@ void emulate_gate_op(struct cpu_user_regs *regs);
 int emulate_forced_invalid_op(struct cpu_user_regs *regs);
 int emulate_invalid_rdtscp(struct cpu_user_regs *regs);
 
+void pv_percpu_traps_init(void);
+
 #else  /* !CONFIG_PV */
 
 #include <xen/errno.h>
@@ -39,6 +41,8 @@ void emulate_gate_op(struct cpu_user_regs *regs) {}
 int emulate_forced_invalid_op(struct cpu_user_regs *regs) { return -EOPNOTSUPP; }
 int emulate_invalid_rdtscp(struct cpu_user_regs *regs) { return -EOPNOTSUPP; }
 
+static inline void pv_percpu_traps_init(void) {}
+
 #endif	/* CONFIG_PV */
 
 #endif	/* __X86_PV_TRAPS_H__ */
-- 
2.11.0



* [PATCH v2 14/18] x86/traps: merge x86_64/compat/traps.c into pv/traps.c
  2017-05-05 14:48 [PATCH v2 00/18] Refactor x86 trap handling code Wei Liu
                   ` (12 preceding siblings ...)
  2017-05-05 14:48 ` [PATCH v2 13/18] x86/traps: move PV specific code in x86_64/traps.c Wei Liu
@ 2017-05-05 14:48 ` Wei Liu
  2017-05-05 14:48 ` [PATCH v2 15/18] x86: clean up pv/traps.c Wei Liu
                   ` (4 subsequent siblings)
  18 siblings, 0 replies; 30+ messages in thread
From: Wei Liu @ 2017-05-05 14:48 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich

Export hypercall_page_initialise_ring1_kernel now that the code has moved.

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/arch/x86/pv/traps.c            | 406 ++++++++++++++++++++++++++++++++++++
 xen/arch/x86/x86_64/compat/traps.c | 415 -------------------------------------
 xen/arch/x86/x86_64/traps.c        |   2 -
 xen/include/asm-x86/pv/domain.h    |   2 +
 4 files changed, 408 insertions(+), 417 deletions(-)
 delete mode 100644 xen/arch/x86/x86_64/compat/traps.c
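
As a review aid for compat_iret() below, the guest stack layout it
consumes, reconstructed from the __get_user() offsets (a sketch, not
an authoritative ABI statement):

    /* 32-bit words relative to the guest %esp on entry:
     *   +0   EAX    (clobbered by the hypercall, restored here)
     *   +4   EIP    +8   CS    +12  EFLAGS
     *   +16  ESP    +20  SS    (read only when returning to ring 2/3)
     */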

diff --git a/xen/arch/x86/pv/traps.c b/xen/arch/x86/pv/traps.c
index 178ac2773c..8cabef7a44 100644
--- a/xen/arch/x86/pv/traps.c
+++ b/xen/arch/x86/pv/traps.c
@@ -32,6 +32,9 @@
 #include <asm/shared.h>
 #include <asm/traps.h>
 
+#include <compat/callback.h>
+#include <compat/arch-x86_32.h>
+
 #include <public/callback.h>
 
 void do_entry_int82(struct cpu_user_regs *regs)
@@ -686,6 +689,409 @@ void hypercall_page_initialise_ring3_kernel(void *hypercall_page)
     *(u16 *)(p+ 9) = 0x050f;  /* syscall */
 }
 
+/* Compat guest interfaces */
+
+void compat_show_guest_stack(struct vcpu *v, const struct cpu_user_regs *regs,
+                             int debug_stack_lines)
+{
+    unsigned int i, *stack, addr, mask = STACK_SIZE;
+
+    stack = (unsigned int *)(unsigned long)regs->esp;
+    printk("Guest stack trace from esp=%08lx:\n ", (unsigned long)stack);
+
+    if ( !__compat_access_ok(v->domain, stack, sizeof(*stack)) )
+    {
+        printk("Guest-inaccessible memory.\n");
+        return;
+    }
+
+    if ( v != current )
+    {
+        struct vcpu *vcpu;
+        unsigned long mfn;
+
+        ASSERT(guest_kernel_mode(v, regs));
+        mfn = read_cr3() >> PAGE_SHIFT;
+        for_each_vcpu( v->domain, vcpu )
+            if ( pagetable_get_pfn(vcpu->arch.guest_table) == mfn )
+                break;
+        if ( !vcpu )
+        {
+            stack = do_page_walk(v, (unsigned long)stack);
+            if ( (unsigned long)stack < PAGE_SIZE )
+            {
+                printk("Inaccessible guest memory.\n");
+                return;
+            }
+            mask = PAGE_SIZE;
+        }
+    }
+
+    for ( i = 0; i < debug_stack_lines * 8; i++ )
+    {
+        if ( (((long)stack - 1) ^ ((long)(stack + 1) - 1)) & mask )
+            break;
+        if ( __get_user(addr, stack) )
+        {
+            if ( i != 0 )
+                printk("\n    ");
+            printk("Fault while accessing guest memory.");
+            i = 1;
+            break;
+        }
+        if ( (i != 0) && ((i % 8) == 0) )
+            printk("\n ");
+        printk(" %08x", addr);
+        stack++;
+    }
+    if ( mask == PAGE_SIZE )
+    {
+        BUILD_BUG_ON(PAGE_SIZE == STACK_SIZE);
+        unmap_domain_page(stack);
+    }
+    if ( i == 0 )
+        printk("Stack empty.");
+    printk("\n");
+}
+
+unsigned int compat_iret(void)
+{
+    struct cpu_user_regs *regs = guest_cpu_user_regs();
+    struct vcpu *v = current;
+    u32 eflags;
+
+    /* Trim stack pointer to 32 bits. */
+    regs->rsp = (u32)regs->rsp;
+
+    /* Restore EAX (clobbered by hypercall). */
+    if ( unlikely(__get_user(regs->eax, (u32 *)regs->rsp)) )
+    {
+        domain_crash(v->domain);
+        return 0;
+    }
+
+    /* Restore CS and EIP. */
+    if ( unlikely(__get_user(regs->eip, (u32 *)regs->rsp + 1)) ||
+        unlikely(__get_user(regs->cs, (u32 *)regs->rsp + 2)) )
+    {
+        domain_crash(v->domain);
+        return 0;
+    }
+
+    /*
+     * Fix up and restore EFLAGS. We fix up in a local staging area
+     * to avoid firing the BUG_ON(IOPL) check in arch_get_info_guest.
+     */
+    if ( unlikely(__get_user(eflags, (u32 *)regs->rsp + 3)) )
+    {
+        domain_crash(v->domain);
+        return 0;
+    }
+
+    if ( VM_ASSIST(v->domain, architectural_iopl) )
+        v->arch.pv_vcpu.iopl = eflags & X86_EFLAGS_IOPL;
+
+    regs->eflags = (eflags & ~X86_EFLAGS_IOPL) | X86_EFLAGS_IF;
+
+    if ( unlikely(eflags & X86_EFLAGS_VM) )
+    {
+        /*
+         * Cannot return to VM86 mode: inject a GP fault instead. Note that
+         * the GP fault is reported on the first VM86 mode instruction, not on
+         * the IRET (which is why we can simply leave the stack frame as-is
+         * (except for perhaps having to copy it), which in turn seems better
+         * than teaching create_bounce_frame() to needlessly deal with vm86
+         * mode frames).
+         */
+        const struct trap_info *ti;
+        u32 x, ksp = v->arch.pv_vcpu.kernel_sp - 40;
+        unsigned int i;
+        int rc = 0;
+
+        gdprintk(XENLOG_ERR, "VM86 mode unavailable (ksp:%08X->%08X)\n",
+                 regs->esp, ksp);
+        if ( ksp < regs->esp )
+        {
+            for (i = 1; i < 10; ++i)
+            {
+                rc |= __get_user(x, (u32 *)regs->rsp + i);
+                rc |= __put_user(x, (u32 *)(unsigned long)ksp + i);
+            }
+        }
+        else if ( ksp > regs->esp )
+        {
+            for ( i = 9; i > 0; --i )
+            {
+                rc |= __get_user(x, (u32 *)regs->rsp + i);
+                rc |= __put_user(x, (u32 *)(unsigned long)ksp + i);
+            }
+        }
+        if ( rc )
+        {
+            domain_crash(v->domain);
+            return 0;
+        }
+        regs->esp = ksp;
+        regs->ss = v->arch.pv_vcpu.kernel_ss;
+
+        ti = &v->arch.pv_vcpu.trap_ctxt[TRAP_gp_fault];
+        if ( TI_GET_IF(ti) )
+            eflags &= ~X86_EFLAGS_IF;
+        regs->eflags &= ~(X86_EFLAGS_VM|X86_EFLAGS_RF|
+                          X86_EFLAGS_NT|X86_EFLAGS_TF);
+        if ( unlikely(__put_user(0, (u32 *)regs->rsp)) )
+        {
+            domain_crash(v->domain);
+            return 0;
+        }
+        regs->eip = ti->address;
+        regs->cs = ti->cs;
+    }
+    else if ( unlikely(ring_0(regs)) )
+    {
+        domain_crash(v->domain);
+        return 0;
+    }
+    else if ( ring_1(regs) )
+        regs->esp += 16;
+    /* Return to ring 2/3: restore ESP and SS. */
+    else if ( __get_user(regs->ss, (u32 *)regs->rsp + 5) ||
+              __get_user(regs->esp, (u32 *)regs->rsp + 4) )
+    {
+        domain_crash(v->domain);
+        return 0;
+    }
+
+    /* Restore upcall mask from supplied EFLAGS.IF. */
+    vcpu_info(v, evtchn_upcall_mask) = !(eflags & X86_EFLAGS_IF);
+
+    async_exception_cleanup(v);
+
+    /*
+     * The hypercall exit path will overwrite EAX with this return
+     * value.
+     */
+    return regs->eax;
+}
+
+static long compat_register_guest_callback(
+    struct compat_callback_register *reg)
+{
+    long ret = 0;
+    struct vcpu *v = current;
+
+    fixup_guest_code_selector(v->domain, reg->address.cs);
+
+    switch ( reg->type )
+    {
+    case CALLBACKTYPE_event:
+        v->arch.pv_vcpu.event_callback_cs     = reg->address.cs;
+        v->arch.pv_vcpu.event_callback_eip    = reg->address.eip;
+        break;
+
+    case CALLBACKTYPE_failsafe:
+        v->arch.pv_vcpu.failsafe_callback_cs  = reg->address.cs;
+        v->arch.pv_vcpu.failsafe_callback_eip = reg->address.eip;
+        if ( reg->flags & CALLBACKF_mask_events )
+            set_bit(_VGCF_failsafe_disables_events,
+                    &v->arch.vgc_flags);
+        else
+            clear_bit(_VGCF_failsafe_disables_events,
+                      &v->arch.vgc_flags);
+        break;
+
+    case CALLBACKTYPE_syscall32:
+        v->arch.pv_vcpu.syscall32_callback_cs     = reg->address.cs;
+        v->arch.pv_vcpu.syscall32_callback_eip    = reg->address.eip;
+        v->arch.pv_vcpu.syscall32_disables_events =
+            (reg->flags & CALLBACKF_mask_events) != 0;
+        break;
+
+    case CALLBACKTYPE_sysenter:
+        v->arch.pv_vcpu.sysenter_callback_cs     = reg->address.cs;
+        v->arch.pv_vcpu.sysenter_callback_eip    = reg->address.eip;
+        v->arch.pv_vcpu.sysenter_disables_events =
+            (reg->flags & CALLBACKF_mask_events) != 0;
+        break;
+
+    case CALLBACKTYPE_nmi:
+        ret = register_guest_nmi_callback(reg->address.eip);
+        break;
+
+    default:
+        ret = -ENOSYS;
+        break;
+    }
+
+    return ret;
+}
+
+static long compat_unregister_guest_callback(
+    struct compat_callback_unregister *unreg)
+{
+    long ret;
+
+    switch ( unreg->type )
+    {
+    case CALLBACKTYPE_event:
+    case CALLBACKTYPE_failsafe:
+    case CALLBACKTYPE_syscall32:
+    case CALLBACKTYPE_sysenter:
+        ret = -EINVAL;
+        break;
+
+    case CALLBACKTYPE_nmi:
+        ret = unregister_guest_nmi_callback();
+        break;
+
+    default:
+        ret = -ENOSYS;
+        break;
+    }
+
+    return ret;
+}
+
+
+long compat_callback_op(int cmd, XEN_GUEST_HANDLE(void) arg)
+{
+    long ret;
+
+    switch ( cmd )
+    {
+    case CALLBACKOP_register:
+    {
+        struct compat_callback_register reg;
+
+        ret = -EFAULT;
+        if ( copy_from_guest(&reg, arg, 1) )
+            break;
+
+        ret = compat_register_guest_callback(&reg);
+    }
+    break;
+
+    case CALLBACKOP_unregister:
+    {
+        struct compat_callback_unregister unreg;
+
+        ret = -EFAULT;
+        if ( copy_from_guest(&unreg, arg, 1) )
+            break;
+
+        ret = compat_unregister_guest_callback(&unreg);
+    }
+    break;
+
+    default:
+        ret = -EINVAL;
+        break;
+    }
+
+    return ret;
+}
+
+long compat_set_callbacks(unsigned long event_selector,
+                          unsigned long event_address,
+                          unsigned long failsafe_selector,
+                          unsigned long failsafe_address)
+{
+    struct compat_callback_register event = {
+        .type = CALLBACKTYPE_event,
+        .address = {
+            .cs = event_selector,
+            .eip = event_address
+        }
+    };
+    struct compat_callback_register failsafe = {
+        .type = CALLBACKTYPE_failsafe,
+        .address = {
+            .cs = failsafe_selector,
+            .eip = failsafe_address
+        }
+    };
+
+    compat_register_guest_callback(&event);
+    compat_register_guest_callback(&failsafe);
+
+    return 0;
+}
+
+int compat_set_trap_table(XEN_GUEST_HANDLE(trap_info_compat_t) traps)
+{
+    struct compat_trap_info cur;
+    struct trap_info *dst = current->arch.pv_vcpu.trap_ctxt;
+    long rc = 0;
+
+    /* If no table is presented then clear the entire virtual IDT. */
+    if ( guest_handle_is_null(traps) )
+    {
+        memset(dst, 0, NR_VECTORS * sizeof(*dst));
+        return 0;
+    }
+
+    for ( ; ; )
+    {
+        if ( copy_from_guest(&cur, traps, 1) )
+        {
+            rc = -EFAULT;
+            break;
+        }
+
+        if ( cur.address == 0 )
+            break;
+
+        fixup_guest_code_selector(current->domain, cur.cs);
+
+        XLAT_trap_info(dst + cur.vector, &cur);
+
+        if ( cur.vector == 0x80 )
+            init_int80_direct_trap(current);
+
+        guest_handle_add_offset(traps, 1);
+
+        if ( hypercall_preempt_check() )
+        {
+            rc = hypercall_create_continuation(
+                __HYPERVISOR_set_trap_table, "h", traps);
+            break;
+        }
+    }
+
+    return rc;
+}
+
+void hypercall_page_initialise_ring1_kernel(void *hypercall_page)
+{
+    char *p;
+    int i;
+
+    /* Fill in all the transfer points with template machine code. */
+
+    for ( i = 0; i < (PAGE_SIZE / 32); i++ )
+    {
+        if ( i == __HYPERVISOR_iret )
+            continue;
+
+        p = (char *)(hypercall_page + (i * 32));
+        *(u8  *)(p+ 0) = 0xb8;    /* mov  $<i>,%eax */
+        *(u32 *)(p+ 1) = i;
+        *(u16 *)(p+ 5) = (HYPERCALL_VECTOR << 8) | 0xcd; /* int  $xx */
+        *(u8  *)(p+ 7) = 0xc3;    /* ret */
+    }
+
+    /*
+     * HYPERVISOR_iret is special because it doesn't return and expects a
+     * special stack frame. Guests jump at this transfer point instead of
+     * calling it.
+     */
+    p = (char *)(hypercall_page + (__HYPERVISOR_iret * 32));
+    *(u8  *)(p+ 0) = 0x50;    /* push %eax */
+    *(u8  *)(p+ 1) = 0xb8;    /* mov  $__HYPERVISOR_iret,%eax */
+    *(u32 *)(p+ 2) = __HYPERVISOR_iret;
+    *(u16 *)(p+ 6) = (HYPERCALL_VECTOR << 8) | 0xcd; /* int  $xx */
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/x86_64/compat/traps.c b/xen/arch/x86/x86_64/compat/traps.c
deleted file mode 100644
index 8e9a11ccb0..0000000000
--- a/xen/arch/x86/x86_64/compat/traps.c
+++ /dev/null
@@ -1,415 +0,0 @@
-#include <xen/event.h>
-#include <asm/regs.h>
-#include <compat/callback.h>
-#include <compat/arch-x86_32.h>
-
-void compat_show_guest_stack(struct vcpu *v, const struct cpu_user_regs *regs,
-                             int debug_stack_lines)
-{
-    unsigned int i, *stack, addr, mask = STACK_SIZE;
-
-    stack = (unsigned int *)(unsigned long)regs->esp;
-    printk("Guest stack trace from esp=%08lx:\n ", (unsigned long)stack);
-
-    if ( !__compat_access_ok(v->domain, stack, sizeof(*stack)) )
-    {
-        printk("Guest-inaccessible memory.\n");
-        return;
-    }
-
-    if ( v != current )
-    {
-        struct vcpu *vcpu;
-        unsigned long mfn;
-
-        ASSERT(guest_kernel_mode(v, regs));
-        mfn = read_cr3() >> PAGE_SHIFT;
-        for_each_vcpu( v->domain, vcpu )
-            if ( pagetable_get_pfn(vcpu->arch.guest_table) == mfn )
-                break;
-        if ( !vcpu )
-        {
-            stack = do_page_walk(v, (unsigned long)stack);
-            if ( (unsigned long)stack < PAGE_SIZE )
-            {
-                printk("Inaccessible guest memory.\n");
-                return;
-            }
-            mask = PAGE_SIZE;
-        }
-    }
-
-    for ( i = 0; i < debug_stack_lines * 8; i++ )
-    {
-        if ( (((long)stack - 1) ^ ((long)(stack + 1) - 1)) & mask )
-            break;
-        if ( __get_user(addr, stack) )
-        {
-            if ( i != 0 )
-                printk("\n    ");
-            printk("Fault while accessing guest memory.");
-            i = 1;
-            break;
-        }
-        if ( (i != 0) && ((i % 8) == 0) )
-            printk("\n ");
-        printk(" %08x", addr);
-        stack++;
-    }
-    if ( mask == PAGE_SIZE )
-    {
-        BUILD_BUG_ON(PAGE_SIZE == STACK_SIZE);
-        unmap_domain_page(stack);
-    }
-    if ( i == 0 )
-        printk("Stack empty.");
-    printk("\n");
-}
-
-unsigned int compat_iret(void)
-{
-    struct cpu_user_regs *regs = guest_cpu_user_regs();
-    struct vcpu *v = current;
-    u32 eflags;
-
-    /* Trim stack pointer to 32 bits. */
-    regs->rsp = (u32)regs->rsp;
-
-    /* Restore EAX (clobbered by hypercall). */
-    if ( unlikely(__get_user(regs->eax, (u32 *)regs->rsp)) )
-    {
-        domain_crash(v->domain);
-        return 0;
-    }
-
-    /* Restore CS and EIP. */
-    if ( unlikely(__get_user(regs->eip, (u32 *)regs->rsp + 1)) ||
-        unlikely(__get_user(regs->cs, (u32 *)regs->rsp + 2)) )
-    {
-        domain_crash(v->domain);
-        return 0;
-    }
-
-    /*
-     * Fix up and restore EFLAGS. We fix up in a local staging area
-     * to avoid firing the BUG_ON(IOPL) check in arch_get_info_guest.
-     */
-    if ( unlikely(__get_user(eflags, (u32 *)regs->rsp + 3)) )
-    {
-        domain_crash(v->domain);
-        return 0;
-    }
-
-    if ( VM_ASSIST(v->domain, architectural_iopl) )
-        v->arch.pv_vcpu.iopl = eflags & X86_EFLAGS_IOPL;
-
-    regs->eflags = (eflags & ~X86_EFLAGS_IOPL) | X86_EFLAGS_IF;
-
-    if ( unlikely(eflags & X86_EFLAGS_VM) )
-    {
-        /*
-         * Cannot return to VM86 mode: inject a GP fault instead. Note that
-         * the GP fault is reported on the first VM86 mode instruction, not on
-         * the IRET (which is why we can simply leave the stack frame as-is
-         * (except for perhaps having to copy it), which in turn seems better
-         * than teaching create_bounce_frame() to needlessly deal with vm86
-         * mode frames).
-         */
-        const struct trap_info *ti;
-        u32 x, ksp = v->arch.pv_vcpu.kernel_sp - 40;
-        unsigned int i;
-        int rc = 0;
-
-        gdprintk(XENLOG_ERR, "VM86 mode unavailable (ksp:%08X->%08X)\n",
-                 regs->esp, ksp);
-        if ( ksp < regs->esp )
-        {
-            for (i = 1; i < 10; ++i)
-            {
-                rc |= __get_user(x, (u32 *)regs->rsp + i);
-                rc |= __put_user(x, (u32 *)(unsigned long)ksp + i);
-            }
-        }
-        else if ( ksp > regs->esp )
-        {
-            for ( i = 9; i > 0; --i )
-            {
-                rc |= __get_user(x, (u32 *)regs->rsp + i);
-                rc |= __put_user(x, (u32 *)(unsigned long)ksp + i);
-            }
-        }
-        if ( rc )
-        {
-            domain_crash(v->domain);
-            return 0;
-        }
-        regs->esp = ksp;
-        regs->ss = v->arch.pv_vcpu.kernel_ss;
-
-        ti = &v->arch.pv_vcpu.trap_ctxt[TRAP_gp_fault];
-        if ( TI_GET_IF(ti) )
-            eflags &= ~X86_EFLAGS_IF;
-        regs->eflags &= ~(X86_EFLAGS_VM|X86_EFLAGS_RF|
-                          X86_EFLAGS_NT|X86_EFLAGS_TF);
-        if ( unlikely(__put_user(0, (u32 *)regs->rsp)) )
-        {
-            domain_crash(v->domain);
-            return 0;
-        }
-        regs->eip = ti->address;
-        regs->cs = ti->cs;
-    }
-    else if ( unlikely(ring_0(regs)) )
-    {
-        domain_crash(v->domain);
-        return 0;
-    }
-    else if ( ring_1(regs) )
-        regs->esp += 16;
-    /* Return to ring 2/3: restore ESP and SS. */
-    else if ( __get_user(regs->ss, (u32 *)regs->rsp + 5) ||
-              __get_user(regs->esp, (u32 *)regs->rsp + 4) )
-    {
-        domain_crash(v->domain);
-        return 0;
-    }
-
-    /* Restore upcall mask from supplied EFLAGS.IF. */
-    vcpu_info(v, evtchn_upcall_mask) = !(eflags & X86_EFLAGS_IF);
-
-    async_exception_cleanup(v);
-
-    /*
-     * The hypercall exit path will overwrite EAX with this return
-     * value.
-     */
-    return regs->eax;
-}
-
-static long compat_register_guest_callback(
-    struct compat_callback_register *reg)
-{
-    long ret = 0;
-    struct vcpu *v = current;
-
-    fixup_guest_code_selector(v->domain, reg->address.cs);
-
-    switch ( reg->type )
-    {
-    case CALLBACKTYPE_event:
-        v->arch.pv_vcpu.event_callback_cs     = reg->address.cs;
-        v->arch.pv_vcpu.event_callback_eip    = reg->address.eip;
-        break;
-
-    case CALLBACKTYPE_failsafe:
-        v->arch.pv_vcpu.failsafe_callback_cs  = reg->address.cs;
-        v->arch.pv_vcpu.failsafe_callback_eip = reg->address.eip;
-        if ( reg->flags & CALLBACKF_mask_events )
-            set_bit(_VGCF_failsafe_disables_events,
-                    &v->arch.vgc_flags);
-        else
-            clear_bit(_VGCF_failsafe_disables_events,
-                      &v->arch.vgc_flags);
-        break;
-
-    case CALLBACKTYPE_syscall32:
-        v->arch.pv_vcpu.syscall32_callback_cs     = reg->address.cs;
-        v->arch.pv_vcpu.syscall32_callback_eip    = reg->address.eip;
-        v->arch.pv_vcpu.syscall32_disables_events =
-            (reg->flags & CALLBACKF_mask_events) != 0;
-        break;
-
-    case CALLBACKTYPE_sysenter:
-        v->arch.pv_vcpu.sysenter_callback_cs     = reg->address.cs;
-        v->arch.pv_vcpu.sysenter_callback_eip    = reg->address.eip;
-        v->arch.pv_vcpu.sysenter_disables_events =
-            (reg->flags & CALLBACKF_mask_events) != 0;
-        break;
-
-    case CALLBACKTYPE_nmi:
-        ret = register_guest_nmi_callback(reg->address.eip);
-        break;
-
-    default:
-        ret = -ENOSYS;
-        break;
-    }
-
-    return ret;
-}
-
-static long compat_unregister_guest_callback(
-    struct compat_callback_unregister *unreg)
-{
-    long ret;
-
-    switch ( unreg->type )
-    {
-    case CALLBACKTYPE_event:
-    case CALLBACKTYPE_failsafe:
-    case CALLBACKTYPE_syscall32:
-    case CALLBACKTYPE_sysenter:
-        ret = -EINVAL;
-        break;
-
-    case CALLBACKTYPE_nmi:
-        ret = unregister_guest_nmi_callback();
-        break;
-
-    default:
-        ret = -ENOSYS;
-        break;
-    }
-
-    return ret;
-}
-
-
-long compat_callback_op(int cmd, XEN_GUEST_HANDLE(void) arg)
-{
-    long ret;
-
-    switch ( cmd )
-    {
-    case CALLBACKOP_register:
-    {
-        struct compat_callback_register reg;
-
-        ret = -EFAULT;
-        if ( copy_from_guest(&reg, arg, 1) )
-            break;
-
-        ret = compat_register_guest_callback(&reg);
-    }
-    break;
-
-    case CALLBACKOP_unregister:
-    {
-        struct compat_callback_unregister unreg;
-
-        ret = -EFAULT;
-        if ( copy_from_guest(&unreg, arg, 1) )
-            break;
-
-        ret = compat_unregister_guest_callback(&unreg);
-    }
-    break;
-
-    default:
-        ret = -EINVAL;
-        break;
-    }
-
-    return ret;
-}
-
-long compat_set_callbacks(unsigned long event_selector,
-                          unsigned long event_address,
-                          unsigned long failsafe_selector,
-                          unsigned long failsafe_address)
-{
-    struct compat_callback_register event = {
-        .type = CALLBACKTYPE_event,
-        .address = {
-            .cs = event_selector,
-            .eip = event_address
-        }
-    };
-    struct compat_callback_register failsafe = {
-        .type = CALLBACKTYPE_failsafe,
-        .address = {
-            .cs = failsafe_selector,
-            .eip = failsafe_address
-        }
-    };
-
-    compat_register_guest_callback(&event);
-    compat_register_guest_callback(&failsafe);
-
-    return 0;
-}
-
-int compat_set_trap_table(XEN_GUEST_HANDLE(trap_info_compat_t) traps)
-{
-    struct compat_trap_info cur;
-    struct trap_info *dst = current->arch.pv_vcpu.trap_ctxt;
-    long rc = 0;
-
-    /* If no table is presented then clear the entire virtual IDT. */
-    if ( guest_handle_is_null(traps) )
-    {
-        memset(dst, 0, NR_VECTORS * sizeof(*dst));
-        return 0;
-    }
-
-    for ( ; ; )
-    {
-        if ( copy_from_guest(&cur, traps, 1) )
-        {
-            rc = -EFAULT;
-            break;
-        }
-
-        if ( cur.address == 0 )
-            break;
-
-        fixup_guest_code_selector(current->domain, cur.cs);
-
-        XLAT_trap_info(dst + cur.vector, &cur);
-
-        if ( cur.vector == 0x80 )
-            init_int80_direct_trap(current);
-
-        guest_handle_add_offset(traps, 1);
-
-        if ( hypercall_preempt_check() )
-        {
-            rc = hypercall_create_continuation(
-                __HYPERVISOR_set_trap_table, "h", traps);
-            break;
-        }
-    }
-
-    return rc;
-}
-
-static void hypercall_page_initialise_ring1_kernel(void *hypercall_page)
-{
-    char *p;
-    int i;
-
-    /* Fill in all the transfer points with template machine code. */
-
-    for ( i = 0; i < (PAGE_SIZE / 32); i++ )
-    {
-        if ( i == __HYPERVISOR_iret )
-            continue;
-
-        p = (char *)(hypercall_page + (i * 32));
-        *(u8  *)(p+ 0) = 0xb8;    /* mov  $<i>,%eax */
-        *(u32 *)(p+ 1) = i;
-        *(u16 *)(p+ 5) = (HYPERCALL_VECTOR << 8) | 0xcd; /* int  $xx */
-        *(u8  *)(p+ 7) = 0xc3;    /* ret */
-    }
-
-    /*
-     * HYPERVISOR_iret is special because it doesn't return and expects a
-     * special stack frame. Guests jump at this transfer point instead of
-     * calling it.
-     */
-    p = (char *)(hypercall_page + (__HYPERVISOR_iret * 32));
-    *(u8  *)(p+ 0) = 0x50;    /* push %eax */
-    *(u8  *)(p+ 1) = 0xb8;    /* mov  $__HYPERVISOR_iret,%eax */
-    *(u32 *)(p+ 2) = __HYPERVISOR_iret;
-    *(u16 *)(p+ 6) = (HYPERCALL_VECTOR << 8) | 0xcd; /* int  $xx */
-}
-
-/*
- * Local variables:
- * mode: C
- * c-file-style: "BSD"
- * c-basic-offset: 4
- * tab-width: 4
- * indent-tabs-mode: nil
- * End:
- */
diff --git a/xen/arch/x86/x86_64/traps.c b/xen/arch/x86/x86_64/traps.c
index 2027a6a4ae..d6f862084a 100644
--- a/xen/arch/x86/x86_64/traps.c
+++ b/xen/arch/x86/x86_64/traps.c
@@ -264,8 +264,6 @@ void subarch_percpu_traps_init(void)
     pv_percpu_traps_init();
 }
 
-#include "compat/traps.c"
-
 void hypercall_page_initialise(struct domain *d, void *hypercall_page)
 {
     memset(hypercall_page, 0xCC, PAGE_SIZE);
diff --git a/xen/include/asm-x86/pv/domain.h b/xen/include/asm-x86/pv/domain.h
index dfa60b080c..67e370ebf3 100644
--- a/xen/include/asm-x86/pv/domain.h
+++ b/xen/include/asm-x86/pv/domain.h
@@ -30,6 +30,7 @@ int pv_domain_initialise(struct domain *d, unsigned int domcr_flags,
                          struct xen_arch_domainconfig *config);
 
 void hypercall_page_initialise_ring3_kernel(void *hypercall_page);
+void hypercall_page_initialise_ring1_kernel(void *hypercall_page);
 
 #else  /* !CONFIG_PV */
 
@@ -46,6 +47,7 @@ static inline int pv_domain_initialise(struct domain *d,
 }
 
 static inline void hypercall_page_initialise_ring3_kernel(void *hypercall_page) {}
+static inline void hypercall_page_initialise_ring1_kernel(void *hypercall_page) {}
 
 #endif	/* CONFIG_PV */
 
-- 
2.11.0



* [PATCH v2 15/18] x86: clean up pv/traps.c
  2017-05-05 14:48 [PATCH v2 00/18] Refactor x86 trap handling code Wei Liu
                   ` (13 preceding siblings ...)
  2017-05-05 14:48 ` [PATCH v2 14/18] x86/traps: merge x86_64/compat/traps.c into pv/traps.c Wei Liu
@ 2017-05-05 14:48 ` Wei Liu
  2017-05-05 14:48 ` [PATCH v2 16/18] x86: guest_has_trap_callback should return bool Wei Liu
                   ` (3 subsequent siblings)
  18 siblings, 0 replies; 30+ messages in thread
From: Wei Liu @ 2017-05-05 14:48 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich

Fix coding style issues.

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/arch/x86/pv/traps.c | 25 +++++++++++++------------
 1 file changed, 13 insertions(+), 12 deletions(-)
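
The style applied is the hypervisor's usual 4-space, no-hard-tabs
convention; for instance, the NMI branch below ends up in the shape
(surrounding indentation trimmed):

    if ( !test_and_set_bool(v->nmi_pending) )
    {
        st->domain = d;
        st->processor = v->processor;

        /* not safe to wake up a vcpu here */
        raise_softirq(NMI_MCE_SOFTIRQ);
        return 0;
    }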

diff --git a/xen/arch/x86/pv/traps.c b/xen/arch/x86/pv/traps.c
index 8cabef7a44..8aa4d9e335 100644
--- a/xen/arch/x86/pv/traps.c
+++ b/xen/arch/x86/pv/traps.c
@@ -294,12 +294,12 @@ int send_guest_trap(struct domain *d, uint16_t vcpuid, unsigned int trap_nr)
             return -EBUSY;
         if ( !test_and_set_bool(v->nmi_pending) )
         {
-               st->domain = d;
-               st->processor = v->processor;
+            st->domain = d;
+            st->processor = v->processor;
 
-               /* not safe to wake up a vcpu here */
-               raise_softirq(NMI_MCE_SOFTIRQ);
-               return 0;
+            /* not safe to wake up a vcpu here */
+            raise_softirq(NMI_MCE_SOFTIRQ);
+            return 0;
         }
         st->vcpu = NULL;
         break;
@@ -309,16 +309,17 @@ int send_guest_trap(struct domain *d, uint16_t vcpuid, unsigned int trap_nr)
             return -EBUSY;
 
         /* We are called by the machine check (exception or polling) handlers
-         * on the physical CPU that reported a machine check error. */
+         * on the physical CPU that reported a machine check error.
+         */
 
         if ( !test_and_set_bool(v->mce_pending) )
         {
-                st->domain = d;
-                st->processor = v->processor;
+            st->domain = d;
+            st->processor = v->processor;
 
-                /* not safe to wake up a vcpu here */
-                raise_softirq(NMI_MCE_SOFTIRQ);
-                return 0;
+            /* not safe to wake up a vcpu here */
+            raise_softirq(NMI_MCE_SOFTIRQ);
+            return 0;
         }
         st->vcpu = NULL;
         break;
@@ -772,7 +773,7 @@ unsigned int compat_iret(void)
 
     /* Restore CS and EIP. */
     if ( unlikely(__get_user(regs->eip, (u32 *)regs->rsp + 1)) ||
-        unlikely(__get_user(regs->cs, (u32 *)regs->rsp + 2)) )
+         unlikely(__get_user(regs->cs, (u32 *)regs->rsp + 2)) )
     {
         domain_crash(v->domain);
         return 0;
-- 
2.11.0



* [PATCH v2 16/18] x86: guest_has_trap_callback should return bool
  2017-05-05 14:48 [PATCH v2 00/18] Refactor x86 trap handling code Wei Liu
                   ` (14 preceding siblings ...)
  2017-05-05 14:48 ` [PATCH v2 15/18] x86: clean up pv/traps.c Wei Liu
@ 2017-05-05 14:48 ` Wei Liu
  2017-05-05 14:48 ` [PATCH v2 17/18] x86: fix coding style issues in asm-x86/traps.h Wei Liu
                   ` (2 subsequent siblings)
  18 siblings, 0 replies; 30+ messages in thread
From: Wei Liu @ 2017-05-05 14:48 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/arch/x86/pv/traps.c     | 4 ++--
 xen/include/asm-x86/traps.h | 6 +++---
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/pv/traps.c b/xen/arch/x86/pv/traps.c
index 8aa4d9e335..c946d855c3 100644
--- a/xen/arch/x86/pv/traps.c
+++ b/xen/arch/x86/pv/traps.c
@@ -260,8 +260,8 @@ long unregister_guest_nmi_callback(void)
     return 0;
 }
 
-int guest_has_trap_callback(struct domain *d, uint16_t vcpuid,
-                            unsigned int trap_nr)
+bool guest_has_trap_callback(struct domain *d, uint16_t vcpuid,
+                             unsigned int trap_nr)
 {
     struct vcpu *v;
     struct trap_info *t;
diff --git a/xen/include/asm-x86/traps.h b/xen/include/asm-x86/traps.h
index 4e8760482f..7f36f6c1a7 100644
--- a/xen/include/asm-x86/traps.h
+++ b/xen/include/asm-x86/traps.h
@@ -33,10 +33,10 @@ void async_exception_cleanup(struct vcpu *);
 /**
  * guest_has_trap_callback
  *
- * returns true (non-zero) if guest registered a trap handler
+ * returns true if guest registered a trap handler
  */
-extern int guest_has_trap_callback(struct domain *d, uint16_t vcpuid,
-				unsigned int trap_nr);
+extern bool guest_has_trap_callback(struct domain *d, uint16_t vcpuid,
+                                    unsigned int trap_nr);
 
 /**
  * send_guest_trap
-- 
2.11.0


* [PATCH v2 17/18] x86: fix coding style issues in asm-x86/traps.h
  2017-05-05 14:48 [PATCH v2 00/18] Refactor x86 trap handling code Wei Liu
                   ` (15 preceding siblings ...)
  2017-05-05 14:48 ` [PATCH v2 16/18] x86: guest_has_trap_callback should return bool Wei Liu
@ 2017-05-05 14:48 ` Wei Liu
  2017-05-05 14:48 ` [PATCH v2 18/18] x86: clean up traps.c Wei Liu
  2017-05-09 10:50 ` [PATCH v2 00/18] Refactor x86 trap handling code Andrew Cooper
  18 siblings, 0 replies; 30+ messages in thread
From: Wei Liu @ 2017-05-05 14:48 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich

And provide an Emacs block.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/include/asm-x86/traps.h | 20 +++++++++++++++-----
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/xen/include/asm-x86/traps.h b/xen/include/asm-x86/traps.h
index 7f36f6c1a7..0676e81d1a 100644
--- a/xen/include/asm-x86/traps.h
+++ b/xen/include/asm-x86/traps.h
@@ -20,16 +20,16 @@
 #define ASM_TRAP_H
 
 struct softirq_trap {
-	struct domain *domain;  /* domain to inject trap */
-	struct vcpu *vcpu;	/* vcpu to inject trap */
-	int processor;		/* physical cpu to inject trap */
+    struct domain *domain;  /* domain to inject trap */
+    struct vcpu *vcpu;      /* vcpu to inject trap */
+    int processor;          /* physical cpu to inject trap */
 };
 DECLARE_PER_CPU(struct softirq_trap, softirq_trap);
 
 struct cpu_user_regs;
 
 void async_exception_cleanup(struct vcpu *);
- 
+
 /**
  * guest_has_trap_callback
  *
@@ -45,7 +45,7 @@ extern bool guest_has_trap_callback(struct domain *d, uint16_t vcpuid,
  * return 0 on successful delivery
  */
 extern int send_guest_trap(struct domain *d, uint16_t vcpuid,
-				unsigned int trap_nr);
+                           unsigned int trap_nr);
 
 uint32_t guest_io_read(unsigned int port, unsigned int bytes,
                        struct domain *);
@@ -55,3 +55,13 @@ void guest_io_write(unsigned int port, unsigned int bytes, uint32_t data,
 const char *trapstr(unsigned int trapnr);
 
 #endif /* ASM_TRAP_H */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.11.0


* [PATCH v2 18/18] x86: clean up traps.c
  2017-05-05 14:48 [PATCH v2 00/18] Refactor x86 trap handling code Wei Liu
                   ` (16 preceding siblings ...)
  2017-05-05 14:48 ` [PATCH v2 17/18] x86: fix coding style issues in asm-x86/traps.h Wei Liu
@ 2017-05-05 14:48 ` Wei Liu
  2017-05-09 10:50 ` [PATCH v2 00/18] Refactor x86 trap handling code Andrew Cooper
  18 siblings, 0 replies; 30+ messages in thread
From: Wei Liu @ 2017-05-05 14:48 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich

Replace bool_t with bool. Delete trailing whitespace. Fix some coding
style issues.

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/arch/x86/traps.c | 77 +++++++++++++++++++++++++++-------------------------
 1 file changed, 40 insertions(+), 37 deletions(-)

diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index c0d5b89b13..f3f964c469 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -1,18 +1,18 @@
 /******************************************************************************
  * arch/x86/traps.c
- * 
+ *
  * Modifications to Linux original are copyright (c) 2002-2004, K A Fraser
- * 
+ *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
  * the Free Software Foundation; either version 2 of the License, or
  * (at your option) any later version.
- * 
+ *
  * This program is distributed in the hope that it will be useful,
  * but WITHOUT ANY WARRANTY; without even the implied warranty of
  * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
  * GNU General Public License for more details.
- * 
+ *
  * You should have received a copy of the GNU General Public License
  * along with this program; If not, see <http://www.gnu.org/licenses/>.
  */
@@ -112,7 +112,7 @@ void (*ioemul_handle_quirk)(
 static int debug_stack_lines = 20;
 integer_param("debug_stack_lines", debug_stack_lines);
 
-static bool_t opt_ler;
+static bool opt_ler;
 boolean_param("ler", opt_ler);
 
 #define stack_words_per_line 4
@@ -527,7 +527,7 @@ void vcpu_show_execution_state(struct vcpu *v)
 }
 
 static cpumask_t show_state_mask;
-static bool_t opt_show_all;
+static bool opt_show_all;
 boolean_param("async-show-all", opt_show_all);
 
 static int nmi_show_execution_state(const struct cpu_user_regs *regs, int cpu)
@@ -538,8 +538,8 @@ static int nmi_show_execution_state(const struct cpu_user_regs *regs, int cpu)
     if ( opt_show_all )
         show_execution_state(regs);
     else
-        printk(XENLOG_ERR "CPU%d @ %04x:%08lx (%pS)\n", cpu, regs->cs, regs->rip,
-               guest_mode(regs) ? _p(regs->rip) : NULL);
+        printk(XENLOG_ERR "CPU%d @ %04x:%08lx (%pS)\n", cpu, regs->cs,
+               regs->rip, guest_mode(regs) ? _p(regs->rip) : NULL);
     cpumask_clear_cpu(cpu, &show_state_mask);
 
     return 1;
@@ -564,7 +564,7 @@ const char *trapstr(unsigned int trapnr)
  * are disabled). In such situations we can't do much that is safe. We try to
  * print out some tracing and then we just spin.
  */
-void fatal_trap(const struct cpu_user_regs *regs, bool_t show_remote)
+void fatal_trap(const struct cpu_user_regs *regs, bool show_remote)
 {
     static DEFINE_PER_CPU(char, depth);
     unsigned int trapnr = regs->entry_vector;
@@ -1013,8 +1013,8 @@ void do_int3(struct cpu_user_regs *regs)
     pv_inject_trap(TRAP_int3, regs);
 }
 
-static void reserved_bit_page_fault(
-    unsigned long addr, struct cpu_user_regs *regs)
+static void reserved_bit_page_fault(unsigned long addr,
+                                    struct cpu_user_regs *regs)
 {
     printk("%pv: reserved bit in page table (ec=%04X)\n",
            current, regs->error_code);
@@ -1022,8 +1022,8 @@ static void reserved_bit_page_fault(
     show_execution_state(regs);
 }
 
-static int handle_gdt_ldt_mapping_fault(
-    unsigned long offset, struct cpu_user_regs *regs)
+static int handle_gdt_ldt_mapping_fault(unsigned long offset,
+                                        struct cpu_user_regs *regs)
 {
     struct vcpu *curr = current;
     /* Which vcpu's area did we fault in, and is it in the ldt sub-area? */
@@ -1091,8 +1091,8 @@ enum pf_type {
     spurious_fault
 };
 
-static enum pf_type __page_fault_type(
-    unsigned long addr, const struct cpu_user_regs *regs)
+static enum pf_type __page_fault_type(unsigned long addr,
+                                      const struct cpu_user_regs *regs)
 {
     unsigned long mfn, cr3 = read_cr3();
     l4_pgentry_t l4e, *l4t;
@@ -1198,8 +1198,8 @@ leaf:
     return spurious_fault;
 }
 
-static enum pf_type spurious_page_fault(
-    unsigned long addr, const struct cpu_user_regs *regs)
+static enum pf_type spurious_page_fault(unsigned long addr,
+                                        const struct cpu_user_regs *regs)
 {
     unsigned long flags;
     enum pf_type pf_type;
@@ -1308,7 +1308,8 @@ void do_page_fault(struct cpu_user_regs *regs)
         if ( (pf_type == smep_fault) || (pf_type == smap_fault) )
         {
             console_start_sync();
-            printk("Xen SM%cP violation\n", (pf_type == smep_fault) ? 'E' : 'A');
+            printk("Xen SM%cP violation\n",
+                   (pf_type == smep_fault) ? 'E' : 'A');
             fatal_trap(regs, 0);
         }
 
@@ -1358,9 +1359,9 @@ void do_page_fault(struct cpu_user_regs *regs)
 
 /*
  * Early #PF handler to print CR2, error code, and stack.
- * 
+ *
  * We also deal with spurious faults here, even though they should never happen
- * during early boot (an issue was seen once, but was most likely a hardware 
+ * during early boot (an issue was seen once, but was most likely a hardware
  * problem).
  */
 void __init do_early_page_fault(struct cpu_user_regs *regs)
@@ -1404,7 +1405,7 @@ void do_general_protection(struct cpu_user_regs *regs)
 
     /*
      * Cunning trick to allow arbitrary "INT n" handling.
-     * 
+     *
      * We set DPL == 0 on all vectors in the IDT. This prevents any INT <n>
      * instruction from trapping to the appropriate vector, when that might not
      * be expected by Xen or the guest OS. For example, that entry might be for
@@ -1412,12 +1413,12 @@ void do_general_protection(struct cpu_user_regs *regs)
      * expect an error code on the stack (which a software trap never
      * provides), or might be a hardware interrupt handler that doesn't like
      * being called spuriously.
-     * 
+     *
      * Instead, a GPF occurs with the faulting IDT vector in the error code.
-     * Bit 1 is set to indicate that an IDT entry caused the fault. Bit 0 is 
+     * Bit 1 is set to indicate that an IDT entry caused the fault. Bit 0 is
      * clear (which got already checked above) to indicate that it's a software
      * fault, not a hardware one.
-     * 
+     *
      * NOTE: Vectors 3 and 4 are dealt with from their own handler. This is
      * okay because they can only be triggered by an explicit DPL-checked
      * instruction. The DPL specified by the guest OS for these vectors is NOT
@@ -1596,7 +1597,8 @@ static void io_check_error(const struct cpu_user_regs *regs)
     outb((inb(0x61) & 0x07) | 0x00, 0x61); /* enable IOCK */
 }
 
-static void unknown_nmi_error(const struct cpu_user_regs *regs, unsigned char reason)
+static void unknown_nmi_error(const struct cpu_user_regs *regs,
+                              unsigned char reason)
 {
     switch ( opt_nmi[0] )
     {
@@ -1616,14 +1618,14 @@ static int dummy_nmi_callback(const struct cpu_user_regs *regs, int cpu)
 {
     return 0;
 }
- 
+
 static nmi_callback_t *nmi_callback = dummy_nmi_callback;
 
 void do_nmi(const struct cpu_user_regs *regs)
 {
     unsigned int cpu = smp_processor_id();
     unsigned char reason;
-    bool_t handle_unknown = 0;
+    bool handle_unknown = false;
 
     ++nmi_count(cpu);
 
@@ -1632,7 +1634,7 @@ void do_nmi(const struct cpu_user_regs *regs)
 
     if ( (nmi_watchdog == NMI_NONE) ||
          (!nmi_watchdog_tick(regs) && watchdog_force) )
-        handle_unknown = 1;
+        handle_unknown = true;
 
     /* Only the BSP gets external NMIs from the system. */
     if ( cpu == 0 )
@@ -1752,7 +1754,8 @@ void do_debug(struct cpu_user_regs *regs)
     return;
 }
 
-static void __init noinline __set_intr_gate(unsigned int n, uint32_t dpl, void *addr)
+static void __init noinline __set_intr_gate(unsigned int n,
+                                            uint32_t dpl, void *addr)
 {
     _set_gate(&idt_table[n], SYS_DESC_irq_gate, dpl, addr);
 }
@@ -1939,28 +1942,28 @@ long set_debugreg(struct vcpu *v, unsigned int reg, unsigned long value)
 
     switch ( reg )
     {
-    case 0: 
+    case 0:
         if ( !access_ok(value, sizeof(long)) )
             return -EPERM;
-        if ( v == curr ) 
+        if ( v == curr )
             write_debugreg(0, value);
         break;
-    case 1: 
+    case 1:
         if ( !access_ok(value, sizeof(long)) )
             return -EPERM;
-        if ( v == curr ) 
+        if ( v == curr )
             write_debugreg(1, value);
         break;
-    case 2: 
+    case 2:
         if ( !access_ok(value, sizeof(long)) )
             return -EPERM;
-        if ( v == curr ) 
+        if ( v == curr )
             write_debugreg(2, value);
         break;
     case 3:
         if ( !access_ok(value, sizeof(long)) )
             return -EPERM;
-        if ( v == curr ) 
+        if ( v == curr )
             write_debugreg(3, value);
         break;
     case 6:
@@ -1970,7 +1973,7 @@ long set_debugreg(struct vcpu *v, unsigned int reg, unsigned long value)
          */
         value &= ~DR_STATUS_RESERVED_ZERO; /* reserved bits => 0 */
         value |=  DR_STATUS_RESERVED_ONE;  /* reserved bits => 1 */
-        if ( v == curr ) 
+        if ( v == curr )
             write_debugreg(6, value);
         break;
     case 7:
-- 
2.11.0
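
The "Cunning trick" comment reflowed in the patch above is compact enough to demonstrate standalone: with every IDT entry at DPL 0, a guest's "int $n" raises #GP, and the error code identifies the targeted vector (bit 1 set for an IDT entry, bit 0 clear for a software origin, the index in the upper bits). The decoder below is a hedged sketch following that description and the architectural selector error-code layout, not any Xen header.

#include <stdbool.h>
#include <stdio.h>

/*
 * #GP selector error code layout (hedged; per the comment above):
 *   bit 0 (EXT) - set for externally/hardware-originated events
 *   bit 1 (IDT) - set when the index refers to an IDT entry
 *   bit 2 (TI)  - GDT/LDT selector, meaningless when IDT is set
 *   bits 3+     - the descriptor index, i.e. the vector here
 */
static bool is_soft_int_gp(unsigned int ec, unsigned int *vector)
{
    if ( (ec & 3) != 2 )        /* want IDT = 1, EXT = 0 */
        return false;

    *vector = ec >> 3;
    return true;
}

int main(void)
{
    unsigned int vec;

    /* e.g. a guest "int $0x80": the error code would be (0x80 << 3) | 2 */
    if ( is_soft_int_gp((0x80 << 3) | 2, &vec) )
        printf("software INT targeting vector %#x\n", vec);
    return 0;
}

On a hit, the handler can then bounce the event to the guest's registered handler for that vector, subject to the DPL check the comment mentions.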


* Re: [PATCH v2 01/18] x86/traps: factor out pv_percpu_traps_init
  2017-05-05 14:48 ` [PATCH v2 01/18] x86/traps: factor out pv_percpu_traps_init Wei Liu
@ 2017-05-05 16:12   ` Andrew Cooper
  2017-05-12 12:00     ` Wei Liu
  0 siblings, 1 reply; 30+ messages in thread
From: Andrew Cooper @ 2017-05-05 16:12 UTC (permalink / raw)
  To: Wei Liu, Xen-devel; +Cc: Jan Beulich

On 05/05/17 15:48, Wei Liu wrote:
> The body of subarch_percpu_traps_init is for setting up PV syscall
> trampoline. Move that into a dedicated function.
>
> Leave the BUILD_BUG_ON in the original function as it is not tied to PV.
>
> No functional change.
>
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>

The trampolines are only intended to be used by PV guests, but they still
constitute part of Xen setting up its entry points in a safe manner.

At the time I was adjusting the MSR_CSTAR handling, it was decided to
retain it pointing at a safe location, even on hardware which shouldn't
be able to use it.

Therefore, my plan was to keep the trampolines as general x86
infrastructure, and have the (lifted into C) syscall entry points gain
something like BUG_ON(!IS_DEFINED(CONFIG_PV))

~Andrew
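
To make the suggestion concrete, here is a hedged sketch of the shape such a C-lifted entry point could take. Everything in it is a standalone approximation invented for illustration: Xen's real IS_DEFINED machinery is token-based rather than a plain expansion, BUG_ON does not reduce to assert, and the do_syscall name is hypothetical.

#include <assert.h>
#include <stdio.h>

#define CONFIG_PV 1                 /* flip to 0 to watch the BUG_ON fire */
#define IS_DEFINED(x) (x)           /* crude stand-in for the real macro */
#define BUG_ON(cond) assert(!(cond))

/* Hypothetical C-lifted syscall entry point. */
static void do_syscall(unsigned long nr)
{
    /*
     * The trampoline stays as common x86 infrastructure (MSR_CSTAR
     * always points somewhere safe), but actually reaching this point
     * on a !PV build would be a bug.
     */
    BUG_ON(!IS_DEFINED(CONFIG_PV));

    printf("PV syscall %lu\n", nr);
}

int main(void)
{
    do_syscall(24);
    return 0;
}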

* Re: [PATCH v2 00/18] Refactor x86 trap handling code
  2017-05-05 14:48 [PATCH v2 00/18] Refactor x86 trap handling code Wei Liu
                   ` (17 preceding siblings ...)
  2017-05-05 14:48 ` [PATCH v2 18/18] x86: clean up traps.c Wei Liu
@ 2017-05-09 10:50 ` Andrew Cooper
  18 siblings, 0 replies; 30+ messages in thread
From: Andrew Cooper @ 2017-05-09 10:50 UTC (permalink / raw)
  To: Wei Liu, Xen-devel; +Cc: Jan Beulich

On 05/05/17 15:48, Wei Liu wrote:

>   x86/traps: export trapstr
>   x86/traps: lift do_guest_trap to domain.h

These two Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> and
pulled into x86-next.

~Andrew

* Re: [PATCH v2 04/18] x86/traps: move all PV emulation code to pv/emulate_ops.h
  2017-05-05 14:48 ` [PATCH v2 04/18] x86/traps: move all PV emulation code to pv/emulate_ops.h Wei Liu
@ 2017-05-11 10:18   ` Andrew Cooper
  2017-05-12 12:00     ` Wei Liu
  0 siblings, 1 reply; 30+ messages in thread
From: Andrew Cooper @ 2017-05-11 10:18 UTC (permalink / raw)
  To: Wei Liu, Xen-devel; +Cc: Jan Beulich

On 05/05/17 15:48, Wei Liu wrote:
> Move the following emulation code from traps.c:
>
> 1. invalid op
> 2. rdtsc
> 3. privilege instructions
> 4. gate operation
> 5. pv cpuid emulation
>
> Export the emulate_* functions via pv/traps.h.
>
> No functional change.
>
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>

As you are moving all emulation code, arch/x86/x86_64/gpr_switch.S
should move along as well.

> ---
>  xen/arch/x86/pv/Makefile       |    1 +
>  xen/arch/x86/pv/emulate_ops.c  | 1929 ++++++++++++++++++++++++++++++++++++++++
>  xen/arch/x86/traps.c           | 1880 +--------------------------------------
>  xen/include/asm-x86/pv/traps.h |   54 ++
>  4 files changed, 1985 insertions(+), 1879 deletions(-)
>  create mode 100644 xen/arch/x86/pv/emulate_ops.c
>  create mode 100644 xen/include/asm-x86/pv/traps.h
>
> diff --git a/xen/arch/x86/pv/Makefile b/xen/arch/x86/pv/Makefile
> index 489a9f59cb..ef3cecc463 100644
> --- a/xen/arch/x86/pv/Makefile
> +++ b/xen/arch/x86/pv/Makefile
> @@ -3,3 +3,4 @@ obj-y += traps.o
>  
>  obj-bin-y += dom0_build.init.o
>  obj-y += domain.o
> +obj-y += emulate_ops.o
> diff --git a/xen/arch/x86/pv/emulate_ops.c b/xen/arch/x86/pv/emulate_ops.c
> new file mode 100644
> index 0000000000..5f0965e05b
> --- /dev/null
> +++ b/xen/arch/x86/pv/emulate_ops.c

Why emulate_ops?  What about just plain emulate.c?

>
> <snip>
>
> +    return X86EMUL_UNHANDLEABLE;
> +}
> +
> +#include "../x86_64/mmconfig.h"

Please can this be moved sensibly to the head of the file?

~Andrew

* [PATCH v2 05/18] x86/pv: clean up emulate_ops.c
  2017-05-05 14:48 ` [PATCH v2 05/18] x86/pv: clean up emulate_ops.c Wei Liu
@ 2017-05-11 11:22   ` Andrew Cooper
  2017-05-12 12:02     ` Wei Liu
  0 siblings, 1 reply; 30+ messages in thread
From: Andrew Cooper @ 2017-05-11 11:22 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper

Please can you fold in the following delta?
---
 xen/arch/x86/pv/emulate_ops.c | 101 +++++++++++++++++++++---------------------
 1 file changed, 51 insertions(+), 50 deletions(-)

diff --git a/xen/arch/x86/pv/emulate_ops.c b/xen/arch/x86/pv/emulate_ops.c
index 97c8d14..9341dec 100644
--- a/xen/arch/x86/pv/emulate_ops.c
+++ b/xen/arch/x86/pv/emulate_ops.c
@@ -39,7 +39,7 @@
 
 #include <xsm/xsm.h>
 
-/* I/O emulation support. Helper routines for, and type of, the stack stub.*/
+/* I/O emulation support. Helper routines for, and type of, the stack stub. */
 void host_to_guest_gpr_switch(struct cpu_user_regs *);
 unsigned long guest_to_host_gpr_switch(unsigned long);
 
@@ -318,15 +318,14 @@ static io_emul_stub_t *io_emul_stub_setup(struct priv_op_ctxt *ctxt, u8 opcode,
 }
 
 /* Has the guest requested sufficient permission for this I/O access? */
-static int guest_io_okay(unsigned int port, unsigned int bytes,
-                         struct vcpu *v, struct cpu_user_regs *regs)
+static bool guest_io_okay(unsigned int port, unsigned int bytes,
+                          struct vcpu *v, struct cpu_user_regs *regs)
 {
     /* If in user mode, switch to kernel mode just to read I/O bitmap. */
-    int user_mode = !(v->arch.flags & TF_kernel_mode);
-#define TOGGLE_MODE() if ( user_mode ) toggle_guest_mode(v)
+    const bool user_mode = !(v->arch.flags & TF_kernel_mode);
 
     if ( iopl_ok(v, regs) )
-        return 1;
+        return true;
 
     if ( v->arch.pv_vcpu.iobmp_limit > (port + bytes) )
     {
@@ -336,9 +335,11 @@ static int guest_io_okay(unsigned int port, unsigned int bytes,
          * Grab permission bytes from guest space. Inaccessible bytes are
          * read as 0xff (no access allowed).
          */
-        TOGGLE_MODE();
+        if ( user_mode )
+            toggle_guest_mode(v);
+
         switch ( __copy_from_guest_offset(x.bytes, v->arch.pv_vcpu.iobmp,
-                                          port>>3, 2) )
+                                          port >> 3, 2) )
         {
         default: x.bytes[0] = ~0;
             /* fallthrough */
@@ -346,14 +347,15 @@ static int guest_io_okay(unsigned int port, unsigned int bytes,
             /* fallthrough */
         case 0:  break;
         }
-        TOGGLE_MODE();
 
-        if ( (x.mask & (((1<<bytes)-1) << (port&7))) == 0 )
-            return 1;
+        if ( user_mode )
+            toggle_guest_mode(v);
+
+        if ( (x.mask & (((1u << bytes) - 1) << (port & 7))) == 0 )
+            return true;
     }
 
-#undef TOGGLE_MODE
-    return 0;
+    return false;
 }
 
 static unsigned int check_guest_io_breakpoint(struct vcpu *v,
@@ -386,7 +388,7 @@ static unsigned int check_guest_io_breakpoint(struct vcpu *v,
         }
 
         if ( (start < (port + len)) && ((start + width) > port) )
-            match |= 1 << i;
+            match |= 1u << i;
     }
 
     return match;
@@ -401,11 +403,11 @@ static bool admin_io_okay(unsigned int port, unsigned int bytes,
      * We never permit direct access to that register.
      */
     if ( (port == 0xcf8) && (bytes == 4) )
-        return 0;
+        return false;
 
     /* We also never permit direct access to the RTC/CMOS registers. */
     if ( ((port & ~1) == RTC_PORT(0)) )
-        return 0;
+        return false;
 
     return ioports_access_permitted(d, port, port + bytes - 1);
 }
@@ -416,10 +418,10 @@ static bool pci_cfg_ok(struct domain *currd, unsigned int start,
     uint32_t machine_bdf;
 
     if ( !is_hardware_domain(currd) )
-        return 0;
+        return false;
 
     if ( !CF8_ENABLED(currd->arch.pci_cf8) )
-        return 1;
+        return true;
 
     machine_bdf = CF8_BDF(currd->arch.pci_cf8);
     if ( write )
@@ -427,7 +429,7 @@ static bool pci_cfg_ok(struct domain *currd, unsigned int start,
         const unsigned long *ro_map = pci_get_ro_map(0);
 
         if ( ro_map && test_bit(machine_bdf, ro_map) )
-            return 0;
+            return false;
     }
     start |= CF8_ADDR_LO(currd->arch.pci_cf8);
     /* AMD extended configuration space access? */
@@ -438,7 +440,7 @@ static bool pci_cfg_ok(struct domain *currd, unsigned int start,
         uint64_t msr_val;
 
         if ( rdmsr_safe(MSR_AMD64_NB_CFG, msr_val) )
-            return 0;
+            return false;
         if ( msr_val & (1ULL << AMD64_NB_CFG_CF8_EXT_ENABLE_BIT) )
             start |= CF8_ADDR_HI(currd->arch.pci_cf8);
     }
@@ -835,7 +837,7 @@ static int priv_op_write_cr(unsigned int reg, unsigned long val,
         if ( (val ^ read_cr0()) & ~X86_CR0_TS )
         {
             gdprintk(XENLOG_WARNING,
-                    "Attempt to change unmodifiable CR0 flags\n");
+                     "Attempt to change unmodifiable CR0 flags\n");
             break;
         }
         do_fpu_taskswitch(!!(val & X86_CR0_TS));
@@ -948,11 +950,11 @@ static int priv_op_read_msr(unsigned int reg, uint64_t *val,
         *val = curr->arch.pv_vcpu.gs_base_user;
         return X86EMUL_OKAY;
 
-    /*
-     * In order to fully retain original behavior, defer calling
-     * pv_soft_rdtsc() until after emulation. This may want/need to be
-     * reconsidered.
-     */
+        /*
+         * In order to fully retain original behavior, defer calling
+         * pv_soft_rdtsc() until after emulation. This may want/need to be
+         * reconsidered.
+         */
     case MSR_IA32_TSC:
         poc->tsc |= TSC_BASE;
         goto normal;
@@ -1042,16 +1044,16 @@ static int priv_op_read_msr(unsigned int reg, uint64_t *val,
             *val |= MSR_MISC_FEATURES_CPUID_FAULTING;
         return X86EMUL_OKAY;
 
-    case MSR_P6_PERFCTR(0)...MSR_P6_PERFCTR(7):
-    case MSR_P6_EVNTSEL(0)...MSR_P6_EVNTSEL(3):
-    case MSR_CORE_PERF_FIXED_CTR0...MSR_CORE_PERF_FIXED_CTR2:
-    case MSR_CORE_PERF_FIXED_CTR_CTRL...MSR_CORE_PERF_GLOBAL_OVF_CTRL:
+    case MSR_P6_PERFCTR(0) ... MSR_P6_PERFCTR(7):
+    case MSR_P6_EVNTSEL(0) ... MSR_P6_EVNTSEL(3):
+    case MSR_CORE_PERF_FIXED_CTR0 ... MSR_CORE_PERF_FIXED_CTR2:
+    case MSR_CORE_PERF_FIXED_CTR_CTRL ... MSR_CORE_PERF_GLOBAL_OVF_CTRL:
         if ( boot_cpu_data.x86_vendor == X86_VENDOR_INTEL )
         {
             vpmu_msr = true;
             /* fall through */
-    case MSR_AMD_FAM15H_EVNTSEL0...MSR_AMD_FAM15H_PERFCTR5:
-    case MSR_K7_EVNTSEL0...MSR_K7_PERFCTR3:
+    case MSR_AMD_FAM15H_EVNTSEL0 ... MSR_AMD_FAM15H_PERFCTR5:
+    case MSR_K7_EVNTSEL0 ... MSR_K7_PERFCTR3:
             if ( vpmu_msr || (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) )
             {
                 if ( vpmu_do_rdmsr(reg, val) )
@@ -1249,15 +1251,15 @@ static int priv_op_write_msr(unsigned int reg, uint64_t val,
         curr->arch.cpuid_faulting = !!(val & MSR_MISC_FEATURES_CPUID_FAULTING);
         return X86EMUL_OKAY;
 
-    case MSR_P6_PERFCTR(0)...MSR_P6_PERFCTR(7):
-    case MSR_P6_EVNTSEL(0)...MSR_P6_EVNTSEL(3):
-    case MSR_CORE_PERF_FIXED_CTR0...MSR_CORE_PERF_FIXED_CTR2:
-    case MSR_CORE_PERF_FIXED_CTR_CTRL...MSR_CORE_PERF_GLOBAL_OVF_CTRL:
+    case MSR_P6_PERFCTR(0) ... MSR_P6_PERFCTR(7):
+    case MSR_P6_EVNTSEL(0) ... MSR_P6_EVNTSEL(3):
+    case MSR_CORE_PERF_FIXED_CTR0 ... MSR_CORE_PERF_FIXED_CTR2:
+    case MSR_CORE_PERF_FIXED_CTR_CTRL ... MSR_CORE_PERF_GLOBAL_OVF_CTRL:
         if ( boot_cpu_data.x86_vendor == X86_VENDOR_INTEL )
         {
             vpmu_msr = true;
-    case MSR_AMD_FAM15H_EVNTSEL0...MSR_AMD_FAM15H_PERFCTR5:
-    case MSR_K7_EVNTSEL0...MSR_K7_PERFCTR3:
+    case MSR_AMD_FAM15H_EVNTSEL0 ... MSR_AMD_FAM15H_PERFCTR5:
+    case MSR_K7_EVNTSEL0 ... MSR_K7_PERFCTR3:
             if ( vpmu_msr || (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) )
             {
                 if ( (vpmu_mode & XENPMU_MODE_ALL) &&
@@ -1542,7 +1544,6 @@ static int read_gate_descriptor(unsigned int gate_sel,
     struct desc_struct desc;
     const struct desc_struct *pdesc;
 
-
     pdesc = (const struct desc_struct *)
         (!(gate_sel & 4) ? GDT_VIRT_START(v) : LDT_VIRT_START(v))
         + (gate_sel >> 3);
@@ -1724,17 +1725,17 @@ void emulate_gate_op(struct cpu_user_regs *regs)
     {
         unsigned int ss, esp, *stkp;
         int rc;
-#define push(item) do \
-        { \
-            --stkp; \
-            esp -= 4; \
-            rc = __put_user(item, stkp); \
-            if ( rc ) \
-            { \
-                pv_inject_page_fault(PFEC_write_access, \
-                                     (unsigned long)(stkp + 1) - rc); \
-                return; \
-            } \
+#define push(item) do                                                   \
+        {                                                               \
+            --stkp;                                                     \
+            esp -= 4;                                                   \
+            rc = __put_user(item, stkp);                                \
+            if ( rc )                                                   \
+            {                                                           \
+                pv_inject_page_fault(PFEC_write_access,                 \
+                                     (unsigned long)(stkp + 1) - rc);   \
+                return;                                                 \
+            }                                                           \
         } while ( 0 )
 
         if ( ((ar >> 13) & 3) < (regs->cs & 3) )
-- 
2.1.4
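
One line in the delta rewards a closer look: the rewritten test (x.mask & (((1u << bytes) - 1) << (port & 7))) == 0 requires every permission bit covering the access to be clear across the two bitmap bytes straddling the port. Below is a hedged standalone model of just that bit arithmetic, with a local array standing in for the guest's I/O bitmap and set bits denying access, mirroring the "inaccessible bytes read as 0xff" convention in the comment.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

static bool io_okay(const uint8_t *bitmap, unsigned int port,
                    unsigned int bytes)
{
    /* Two consecutive bytes cover any 1/2/4-byte access within them. */
    uint16_t mask = bitmap[port >> 3] | (bitmap[(port >> 3) + 1] << 8);

    return (mask & (((1u << bytes) - 1) << (port & 7))) == 0;
}

int main(void)
{
    static uint8_t bitmap[8192];            /* all ports allowed ... */

    bitmap[0x70 >> 3] |= 1u << (0x70 & 7);  /* ... except port 0x70 */

    printf("port 0x71, 1 byte:  %s\n",
           io_okay(bitmap, 0x71, 1) ? "ok" : "denied");
    printf("port 0x70, 2 bytes: %s\n",
           io_okay(bitmap, 0x70, 2) ? "ok" : "denied");
    return 0;
}

The 1u constants here match the delta's own change from 1 to 1u, keeping the shift arithmetic unsigned.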


* Re: [PATCH v2 01/18] x86/traps: factor out pv_percpu_traps_init
  2017-05-05 16:12   ` Andrew Cooper
@ 2017-05-12 12:00     ` Wei Liu
  0 siblings, 0 replies; 30+ messages in thread
From: Wei Liu @ 2017-05-12 12:00 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel, Wei Liu, Jan Beulich

On Fri, May 05, 2017 at 05:12:25PM +0100, Andrew Cooper wrote:
> On 05/05/17 15:48, Wei Liu wrote:
> > The body of subarch_percpu_traps_init is for setting up PV syscall
> > trampoline. Move that into a dedicated function.
> >
> > Leave the BUILD_BUG_ON in the original function as it is not tied to PV.
> >
> > No functional change.
> >
> > Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> 
> The trampolines are only intended to be used by PV guests, but they still
> constitute part of Xen setting up its entry points in a safe manner.
> 
> At the time I was adjusting the MSR_CSTAR handling, it was decided to
> retain it pointing at a safe location, even on hardware which shouldn't
> be able to use it.
> 
> Therefore, my plan was to keep the trampolines as general x86
> infrastructure, and have the (lifted into C) syscall entry points gain
> something like BUG_ON(!IS_DEFINED(CONFIG_PV))
> 

Fine by me. I will drop this patch in next version.

> ~Andrew

* Re: [PATCH v2 04/18] x86/traps: move all PV emulation code to pv/emulate_ops.h
  2017-05-11 10:18   ` Andrew Cooper
@ 2017-05-12 12:00     ` Wei Liu
  0 siblings, 0 replies; 30+ messages in thread
From: Wei Liu @ 2017-05-12 12:00 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel, Wei Liu, Jan Beulich

On Thu, May 11, 2017 at 11:18:39AM +0100, Andrew Cooper wrote:
> On 05/05/17 15:48, Wei Liu wrote:
> > Move the following emulation code from traps.c:
> >
> > 1. invalid op
> > 2. rdtsc
> > 3. privilege instructions
> > 4. gate operation
> > 5. pv cpuid emulation
> >
> > Export the emulate_* functions via pv/traps.h.
> >
> > No functional change.
> >
> > Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> 
> As you are moving all emulation code, arch/x86/x86_64/gpr_switch.S
> should move along as well.
> 

Sure. An extra patch it is.

> > ---
> >  xen/arch/x86/pv/Makefile       |    1 +
> >  xen/arch/x86/pv/emulate_ops.c  | 1929 ++++++++++++++++++++++++++++++++++++++++
> >  xen/arch/x86/traps.c           | 1880 +--------------------------------------
> >  xen/include/asm-x86/pv/traps.h |   54 ++
> >  4 files changed, 1985 insertions(+), 1879 deletions(-)
> >  create mode 100644 xen/arch/x86/pv/emulate_ops.c
> >  create mode 100644 xen/include/asm-x86/pv/traps.h
> >
> > diff --git a/xen/arch/x86/pv/Makefile b/xen/arch/x86/pv/Makefile
> > index 489a9f59cb..ef3cecc463 100644
> > --- a/xen/arch/x86/pv/Makefile
> > +++ b/xen/arch/x86/pv/Makefile
> > @@ -3,3 +3,4 @@ obj-y += traps.o
> >  
> >  obj-bin-y += dom0_build.init.o
> >  obj-y += domain.o
> > +obj-y += emulate_ops.o
> > diff --git a/xen/arch/x86/pv/emulate_ops.c b/xen/arch/x86/pv/emulate_ops.c
> > new file mode 100644
> > index 0000000000..5f0965e05b
> > --- /dev/null
> > +++ b/xen/arch/x86/pv/emulate_ops.c
> 
> Why emulate_ops?  What about just plain emulate.c ?
> 
> >
> > <snip>
> >
> > +    return X86EMUL_UNHANDLEABLE;
> > +}
> > +
> > +#include "../x86_64/mmconfig.h"
> 
> Please can this be moved sensibly to the head of the file?
> 

I'm fine with both suggestions.

> ~Andrew

* Re: [PATCH v2 05/18] x86/pv: clean up emulate_ops.c
  2017-05-11 11:22   ` Andrew Cooper
@ 2017-05-12 12:02     ` Wei Liu
  2017-05-12 12:03       ` Wei Liu
  0 siblings, 1 reply; 30+ messages in thread
From: Wei Liu @ 2017-05-12 12:02 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Wei Liu, Xen-devel

On Thu, May 11, 2017 at 12:22:29PM +0100, Andrew Cooper wrote:
> Please can you fold in the following delta?

Actually I'm going to combine your patch with my #5.

* Re: [PATCH v2 05/18] x86/pv: clean up emulate_ops.c
  2017-05-12 12:02     ` Wei Liu
@ 2017-05-12 12:03       ` Wei Liu
  0 siblings, 0 replies; 30+ messages in thread
From: Wei Liu @ 2017-05-12 12:03 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Wei Liu, Xen-devel

On Fri, May 12, 2017 at 01:02:04PM +0100, Wei Liu wrote:
> On Thu, May 11, 2017 at 12:22:29PM +0100, Andrew Cooper wrote:
> > Please can you fold in the following delta?
> 
> Actually I'm going to combine your patch with mine #5.

Oops, I thought you were replying to #4...

* Re: [PATCH v2 03/18] x86/traps: lift do_guest_trap to domain.h
  2017-05-05 14:48 ` [PATCH v2 03/18] x86/traps: lift do_guest_trap to domain.h Wei Liu
@ 2017-05-12 18:16   ` Andrew Cooper
  2017-05-12 18:26     ` Andrew Cooper
  2017-05-15  7:50     ` Jan Beulich
  0 siblings, 2 replies; 30+ messages in thread
From: Andrew Cooper @ 2017-05-12 18:16 UTC (permalink / raw)
  To: Wei Liu, Xen-devel; +Cc: Jan Beulich

On 05/05/17 15:48, Wei Liu wrote:
> Put it along side with other pv_inject functions and rename it to
> pv_inject_trap.
>
> We need this because this function is used by PV emulation code and PV
> trap handling code, which will be split into different files.
>
> No functional change.
>
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>

On further thought, it might be better to use this patch instead, which
drops do_guest_trap() in favour of better alternatives, and in a manner
more consistent with HVM guests.

Thoughts?

~Andrew

[-- Attachment #2: 0001-x86-pv-Drop-the-use-of-do_guest_trap.patch --]
[-- Type: text/x-diff, Size: 10543 bytes --]

From 34ed6f6f4ea6113874a8733f036aa8f8370ed9e1 Mon Sep 17 00:00:00 2001
From: Andrew Cooper <andrew.cooper3@citrix.com>
Date: Fri, 12 May 2017 16:05:31 +0000
Subject: [PATCH] x86/pv: Drop the use of do_guest_trap()

do_guest_trap() was introduced for compatibility at the time that x86_event
was being introduced, but all of the callers of do_guest_trap() have better
options.

Most callsites are switched to pv_inject_hw_exception(), which allows the
error code to be passed directly, rather than via a cpu_user_regs parameter.

For the int $N emulation code, introduce pv_inject_sw_interrupt() and teach
pv_inject_event() to cope with low vector numbers not necessarily having an
error code.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Wei Liu <wei.liu2@citrix.com>
---
 xen/arch/x86/traps.c         | 62 +++++++++++++++++++-------------------------
 xen/include/asm-x86/domain.h | 11 ++++++++
 2 files changed, 38 insertions(+), 35 deletions(-)

diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 27fdf12..6ac4821 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -633,6 +633,7 @@ void pv_inject_event(const struct x86_event *event)
     const struct trap_info *ti;
     const uint8_t vector = event->vector;
     const bool use_error_code =
+        (event->type == X86_EVENTTYPE_HW_EXCEPTION) &&
         ((vector < 32) && (TRAP_HAVE_EC & (1u << vector)));
     unsigned int error_code = event->error_code;
 
@@ -684,18 +685,6 @@ void pv_inject_event(const struct x86_event *event)
     }
 }
 
-static inline void do_guest_trap(unsigned int trapnr,
-                                 const struct cpu_user_regs *regs)
-{
-    const struct x86_event event = {
-        .vector = trapnr,
-        .error_code = (((trapnr < 32) && (TRAP_HAVE_EC & (1u << trapnr)))
-                       ? regs->error_code : X86_EVENT_NO_EC),
-    };
-
-    pv_inject_event(&event);
-}
-
 static void instruction_done(struct cpu_user_regs *regs, unsigned long rip)
 {
     regs->rip = rip;
@@ -703,7 +692,7 @@ static void instruction_done(struct cpu_user_regs *regs, unsigned long rip)
     if ( regs->eflags & X86_EFLAGS_TF )
     {
         current->arch.debugreg[6] |= DR_STEP | DR_STATUS_RESERVED_ONE;
-        do_guest_trap(TRAP_debug, regs);
+        pv_inject_hw_exception(TRAP_debug, X86_EVENT_NO_EC);
     }
 }
 
@@ -751,7 +740,7 @@ int set_guest_machinecheck_trapbounce(void)
     struct vcpu *v = current;
     struct trap_bounce *tb = &v->arch.pv_vcpu.trap_bounce;
  
-    do_guest_trap(TRAP_machine_check, guest_cpu_user_regs());
+    pv_inject_hw_exception(TRAP_machine_check, X86_EVENT_NO_EC);
     tb->flags &= ~TBF_EXCEPTION; /* not needed for MCE delivery path */
     return !null_trap_bounce(v, tb);
 }
@@ -764,7 +753,7 @@ int set_guest_nmi_trapbounce(void)
 {
     struct vcpu *v = current;
     struct trap_bounce *tb = &v->arch.pv_vcpu.trap_bounce;
-    do_guest_trap(TRAP_nmi, guest_cpu_user_regs());
+    pv_inject_hw_exception(TRAP_nmi, X86_EVENT_NO_EC);
     tb->flags &= ~TBF_EXCEPTION; /* not needed for NMI delivery path */
     return !null_trap_bounce(v, tb);
 }
@@ -794,7 +783,10 @@ void do_trap(struct cpu_user_regs *regs)
 
     if ( guest_mode(regs) )
     {
-        do_guest_trap(trapnr, regs);
+        int ec = ((trapnr < 32) && (TRAP_HAVE_EC & (1u << trapnr)))
+            ? regs->error_code : X86_EVENT_NO_EC;
+
+        pv_inject_hw_exception(trapnr, ec);
         return;
     }
 
@@ -1060,7 +1052,7 @@ static int emulate_forced_invalid_op(struct cpu_user_regs *regs)
     if ( current->arch.cpuid_faulting && !guest_kernel_mode(current, regs) )
     {
         regs->rip = eip;
-        do_guest_trap(TRAP_gp_fault, regs);
+        pv_inject_hw_exception(TRAP_gp_fault, regs->error_code);
         return EXCRET_fault_fixed;
     }
 
@@ -1096,7 +1088,7 @@ void do_invalid_op(struct cpu_user_regs *regs)
     {
         if ( !emulate_invalid_rdtscp(regs) &&
              !emulate_forced_invalid_op(regs) )
-            do_guest_trap(TRAP_invalid_op, regs);
+            pv_inject_hw_exception(TRAP_invalid_op, X86_EVENT_NO_EC);
         return;
     }
 
@@ -1224,7 +1216,7 @@ void do_int3(struct cpu_user_regs *regs)
         return;
     }
 
-    do_guest_trap(TRAP_int3, regs);
+    pv_inject_hw_exception(TRAP_int3, X86_EVENT_NO_EC);
 }
 
 static void reserved_bit_page_fault(
@@ -3038,7 +3030,7 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
         {
             curr->arch.debugreg[6] |= ctxt.bpmatch | DR_STATUS_RESERVED_ONE;
             if ( !(curr->arch.pv_vcpu.trap_bounce.flags & TBF_EXCEPTION) )
-                do_guest_trap(TRAP_debug, regs);
+                pv_inject_hw_exception(TRAP_debug, X86_EVENT_NO_EC);
         }
         /* fall through */
     case X86EMUL_RETRY:
@@ -3153,12 +3145,12 @@ static void emulate_gate_op(struct cpu_user_regs *regs)
          (((ar >> 13) & 3) < (regs->cs & 3)) ||
          ((ar & _SEGMENT_TYPE) != 0xc00) )
     {
-        do_guest_trap(TRAP_gp_fault, regs);
+        pv_inject_hw_exception(TRAP_gp_fault, regs->error_code);
         return;
     }
     if ( !(ar & _SEGMENT_P) )
     {
-        do_guest_trap(TRAP_no_segment, regs);
+        pv_inject_hw_exception(TRAP_no_segment, regs->error_code);
         return;
     }
     dpl = (ar >> 13) & 3;
@@ -3174,7 +3166,7 @@ static void emulate_gate_op(struct cpu_user_regs *regs)
          !(ar & _SEGMENT_P) ||
          !(ar & _SEGMENT_CODE) )
     {
-        do_guest_trap(TRAP_gp_fault, regs);
+        pv_inject_hw_exception(TRAP_gp_fault, regs->error_code);
         return;
     }
 
@@ -3187,7 +3179,7 @@ static void emulate_gate_op(struct cpu_user_regs *regs)
         if ( PTR_ERR(state) == -X86EMUL_EXCEPTION )
             pv_inject_event(&ctxt.ctxt.event);
         else
-            do_guest_trap(TRAP_gp_fault, regs);
+            pv_inject_hw_exception(TRAP_gp_fault, regs->error_code);
         return;
     }
 
@@ -3237,7 +3229,7 @@ static void emulate_gate_op(struct cpu_user_regs *regs)
          (opnd_sel & ~3) != regs->error_code ||
          dpl < (opnd_sel & 3) )
     {
-        do_guest_trap(TRAP_gp_fault, regs);
+        pv_inject_hw_exception(TRAP_gp_fault, regs->error_code);
         return;
     }
 
@@ -3285,7 +3277,7 @@ static void emulate_gate_op(struct cpu_user_regs *regs)
             /* Inner stack known only for kernel ring. */
             if ( (sel & 3) != GUEST_KERNEL_RPL(v->domain) )
             {
-                do_guest_trap(TRAP_gp_fault, regs);
+                pv_inject_hw_exception(TRAP_gp_fault, regs->error_code);
                 return;
             }
             esp = v->arch.pv_vcpu.kernel_sp;
@@ -3309,7 +3301,7 @@ static void emulate_gate_op(struct cpu_user_regs *regs)
             stkp = (unsigned int *)(unsigned long)((unsigned int)base + esp);
             if ( !compat_access_ok(stkp - 4 - nparm, (4 + nparm) * 4) )
             {
-                do_guest_trap(TRAP_gp_fault, regs);
+                pv_inject_hw_exception(TRAP_gp_fault, regs->error_code);
                 return;
             }
             push(regs->ss);
@@ -3324,12 +3316,12 @@ static void emulate_gate_op(struct cpu_user_regs *regs)
                      (ar & _SEGMENT_CODE) ||
                      !(ar & _SEGMENT_WR) ||
                      !check_stack_limit(ar, limit, esp + nparm * 4, nparm * 4) )
-                    return do_guest_trap(TRAP_gp_fault, regs);
+                    return pv_inject_hw_exception(TRAP_gp_fault, regs->error_code);
                 ustkp = (unsigned int *)(unsigned long)
                         ((unsigned int)base + regs->esp + nparm * 4);
                 if ( !compat_access_ok(ustkp - nparm, nparm * 4) )
                 {
-                    do_guest_trap(TRAP_gp_fault, regs);
+                    pv_inject_hw_exception(TRAP_gp_fault, regs->error_code);
                     return;
                 }
                 do
@@ -3355,7 +3347,7 @@ static void emulate_gate_op(struct cpu_user_regs *regs)
             if ( !read_descriptor(ss, v, &base, &limit, &ar, 0) ||
                  ((ar >> 13) & 3) != (sel & 3) )
             {
-                do_guest_trap(TRAP_gp_fault, regs);
+                pv_inject_hw_exception(TRAP_gp_fault, regs->error_code);
                 return;
             }
             if ( !check_stack_limit(ar, limit, esp, 2 * 4) )
@@ -3366,7 +3358,7 @@ static void emulate_gate_op(struct cpu_user_regs *regs)
             stkp = (unsigned int *)(unsigned long)((unsigned int)base + esp);
             if ( !compat_access_ok(stkp - 2, 2 * 4) )
             {
-                do_guest_trap(TRAP_gp_fault, regs);
+                pv_inject_hw_exception(TRAP_gp_fault, regs->error_code);
                 return;
             }
         }
@@ -3427,7 +3419,7 @@ void do_general_protection(struct cpu_user_regs *regs)
         if ( permit_softint(TI_GET_DPL(ti), v, regs) )
         {
             regs->rip += 2;
-            do_guest_trap(vector, regs);
+            pv_inject_sw_interrupt(vector);
             return;
         }
     }
@@ -3446,7 +3438,7 @@ void do_general_protection(struct cpu_user_regs *regs)
     }
 
     /* Pass on GPF as is. */
-    do_guest_trap(TRAP_gp_fault, regs);
+    pv_inject_hw_exception(TRAP_gp_fault, regs->error_code);
     return;
 
  gp_in_kernel:
@@ -3666,7 +3658,7 @@ void do_device_not_available(struct cpu_user_regs *regs)
 
     if ( curr->arch.pv_vcpu.ctrlreg[0] & X86_CR0_TS )
     {
-        do_guest_trap(TRAP_no_device, regs);
+        pv_inject_hw_exception(TRAP_no_device, X86_EVENT_NO_EC);
         curr->arch.pv_vcpu.ctrlreg[0] &= ~X86_CR0_TS;
     }
     else
@@ -3739,7 +3731,7 @@ void do_debug(struct cpu_user_regs *regs)
     v->arch.debugreg[6] = read_debugreg(6);
 
     ler_enable();
-    do_guest_trap(TRAP_debug, regs);
+    pv_inject_hw_exception(TRAP_debug, X86_EVENT_NO_EC);
     return;
 
  out:
diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index 6ab987f..924caac 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -655,6 +655,17 @@ static inline void pv_inject_page_fault(int errcode, unsigned long cr2)
     pv_inject_event(&event);
 }
 
+static inline void pv_inject_sw_interrupt(unsigned int vector)
+{
+    const struct x86_event event = {
+        .vector = vector,
+        .type = X86_EVENTTYPE_SW_INTERRUPT,
+        .error_code = X86_EVENT_NO_EC,
+    };
+
+    pv_inject_event(&event);
+}
+
 #endif /* __ASM_DOMAIN_H__ */
 
 /*
-- 
2.1.4


* Re: [PATCH v2 03/18] x86/traps: lift do_guest_trap to domain.h
  2017-05-12 18:16   ` Andrew Cooper
@ 2017-05-12 18:26     ` Andrew Cooper
  2017-05-15  7:50     ` Jan Beulich
  1 sibling, 0 replies; 30+ messages in thread
From: Andrew Cooper @ 2017-05-12 18:26 UTC (permalink / raw)
  To: Wei Liu, Xen-devel; +Cc: Jan Beulich

On 12/05/17 19:16, Andrew Cooper wrote:
> On 05/05/17 15:48, Wei Liu wrote:
>> Put it along side with other pv_inject functions and rename it to
>> pv_inject_trap.
>>
>> We need this because this function is used by PV emulation code and PV
>> trap handling code, which will be split into different files.
>>
>> No functional change.
>>
>> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> On further thought, it might be better to use this patch instead, which
> drops do_guest_trap() in favour of better alternatives, and in a manner
> more consistent with HVM guests.
>
> Thoughts?

On yet further thought, this patch actually fixes guest stack corruption
if the PV kernel uses int $x for vectors with an error code (or
configures such vectors to be accessible to userspace).

This was actually a regression introduced by me in 2e426d6eecf, and
6480cc6280e really should have made the regression much more obvious...

~Andrew
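
To spell out the corruption: a software "int $n" pushes no hardware error code, so whether the injected bounce frame contains one must depend on the event type, not merely on whether the vector number is in the error-code set. The sketch below models only that decision; TRAP_HAVE_EC mirrors the usual x86 set (#DF, #TS, #NP, #SS, #GP, #PF, #AC), and the enum is an invented stand-in, not Xen's x86_event type.

#include <stdbool.h>
#include <stdio.h>

/* Vectors 8, 10-14 and 17: the exceptions that push an error code. */
#define TRAP_HAVE_EC                                          \
    ((1u << 8) | (1u << 10) | (1u << 11) | (1u << 12) |       \
     (1u << 13) | (1u << 14) | (1u << 17))

enum ev_type { HW_EXCEPTION, SW_INTERRUPT };

static bool frame_has_error_code(enum ev_type type, unsigned int vector)
{
    /*
     * The type check is the fix: without it, a guest-issued "int $13"
     * is treated like a hardware #GP, a spurious error code lands on
     * the bounce frame, and everything above it is shifted.
     */
    return (type == HW_EXCEPTION) &&
           (vector < 32) && (TRAP_HAVE_EC & (1u << vector));
}

int main(void)
{
    /* Hardware #GP: error code present on the frame. */
    printf("%d\n", frame_has_error_code(HW_EXCEPTION, 13));   /* 1 */
    /* Guest "int $13": no error code, the frame stays as expected. */
    printf("%d\n", frame_has_error_code(SW_INTERRUPT, 13));   /* 0 */
    return 0;
}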

* Re: [PATCH v2 03/18] x86/traps: lift do_guest_trap to domain.h
  2017-05-12 18:16   ` Andrew Cooper
  2017-05-12 18:26     ` Andrew Cooper
@ 2017-05-15  7:50     ` Jan Beulich
  1 sibling, 0 replies; 30+ messages in thread
From: Jan Beulich @ 2017-05-15  7:50 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel, Wei Liu

>>> On 12.05.17 at 20:16, <andrew.cooper3@citrix.com> wrote:
> On 05/05/17 15:48, Wei Liu wrote:
>> Put it along side with other pv_inject functions and rename it to
>> pv_inject_trap.
>>
>> We need this because this function is used by PV emulation code and PV
>> trap handling code, which will be split into different files.
>>
>> No functional change.
>>
>> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> 
> On further thought, it might be better to use this patch instead, which
> drops do_guest_trap() in favour of better alternatives, and in a manner
> more consistent with HVM guests.
> 
> Thoughts?

Yes please.

Jan


end of thread, other threads:[~2017-05-15  7:50 UTC | newest]

Thread overview: 30+ messages
2017-05-05 14:48 [PATCH v2 00/18] Refactor x86 trap handling code Wei Liu
2017-05-05 14:48 ` [PATCH v2 01/18] x86/traps: factor out pv_percpu_traps_init Wei Liu
2017-05-05 16:12   ` Andrew Cooper
2017-05-12 12:00     ` Wei Liu
2017-05-05 14:48 ` [PATCH v2 02/18] x86/traps: export trapstr Wei Liu
2017-05-05 14:48 ` [PATCH v2 03/18] x86/traps: lift do_guest_trap to domain.h Wei Liu
2017-05-12 18:16   ` Andrew Cooper
2017-05-12 18:26     ` Andrew Cooper
2017-05-15  7:50     ` Jan Beulich
2017-05-05 14:48 ` [PATCH v2 04/18] x86/traps: move all PV emulation code to pv/emulate_ops.h Wei Liu
2017-05-11 10:18   ` Andrew Cooper
2017-05-12 12:00     ` Wei Liu
2017-05-05 14:48 ` [PATCH v2 05/18] x86/pv: clean up emulate_ops.c Wei Liu
2017-05-11 11:22   ` Andrew Cooper
2017-05-12 12:02     ` Wei Liu
2017-05-12 12:03       ` Wei Liu
2017-05-05 14:48 ` [PATCH v2 06/18] x86/traps: move PV hypercall handlers to pv/traps.c Wei Liu
2017-05-05 14:48 ` [PATCH v2 07/18] x86/traps: move pv_inject_event " Wei Liu
2017-05-05 14:48 ` [PATCH v2 08/18] x86/traps: move set_guest_{machinecheck, nmi}_trapbounce Wei Liu
2017-05-05 14:48 ` [PATCH v2 09/18] x86/traps: move {un, }register_guest_nmi_callback Wei Liu
2017-05-05 14:48 ` [PATCH v2 10/18] x86/traps: delcare percpu softirq_trap Wei Liu
2017-05-05 14:48 ` [PATCH v2 11/18] x86/traps: move guest_has_trap_callback to pv/traps.c Wei Liu
2017-05-05 14:48 ` [PATCH v2 12/18] x86/traps: move send_guest_trap " Wei Liu
2017-05-05 14:48 ` [PATCH v2 13/18] x86/traps: move PV specific code in x86_64/traps.c Wei Liu
2017-05-05 14:48 ` [PATCH v2 14/18] x86/traps: merge x86_64/compat/traps.c into pv/traps.c Wei Liu
2017-05-05 14:48 ` [PATCH v2 15/18] x86: clean up pv/traps.c Wei Liu
2017-05-05 14:48 ` [PATCH v2 16/18] x86: guest_has_trap_callback should return bool Wei Liu
2017-05-05 14:48 ` [PATCH v2 17/18] x86: fix coding style issues in asm-x86/traps.h Wei Liu
2017-05-05 14:48 ` [PATCH v2 18/18] x86: clean up traps.c Wei Liu
2017-05-09 10:50 ` [PATCH v2 00/18] Refactor x86 trap handling code Andrew Cooper
