[PATCH for-next v3 00/22] x86: refactor trap handling code
From: Wei Liu @ 2017-05-18 17:09 UTC
  To: Xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich

V3 of the series. Rebased on top of the x86-next branch.

Patches are broken down into the smallest chunks possible to ease review and
future rebasing.

   git://xenbits.xen.org/people/liuw/xen.git wip.move-traps-v3
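
For convenience, the branch can also be fetched directly from that URL,
e.g. (checking out FETCH_HEAD leaves you on a detached HEAD):

   git fetch git://xenbits.xen.org/people/liuw/xen.git wip.move-traps-v3
   git checkout FETCH_HEAD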

Wei Liu (22):
  x86/traps: move privilege instruction emulation code
  x86/traps: move gate op emulation code
  x86/traps: move emulate_invalid_rdtscp
  x86/traps: move emulate_forced_invalid_op
  x86/pv: clean up emulate.c
  x86/traps: move PV hypercall handlers to pv/traps.c
  x86/traps: move pv_inject_event to pv/traps.c
  x86/traps: move set_guest_{machinecheck,nmi}_trapbounce
  x86/traps: move {un,}register_guest_nmi_callback
  x86/traps: declare percpu softirq_trap
  x86/traps: move guest_has_trap_callback to pv/traps.c
  x86/traps: move send_guest_trap to pv/traps.c
  x86/traps: move toggle_guest_mode
  x86/traps: move do_iret to pv/traps.c
  x86/traps: move init_int80_direct_trap
  x86/traps: move callback_op code
  x86/traps: move hypercall_page_initialise_ring3_kernel
  x86/traps: merge x86_64/compat/traps.c into pv/traps.c
  x86: clean up pv/traps.c
  x86: guest_has_trap_callback should return bool
  x86: fix coding style issues in asm-x86/traps.h
  x86: clean up traps.c

 xen/arch/x86/pv/Makefile           |    1 +
 xen/arch/x86/pv/emulate.c          | 1947 ++++++++++++++++++++++++++++
 xen/arch/x86/pv/traps.c            | 1014 +++++++++++++++
 xen/arch/x86/traps.c               | 2483 +++---------------------------------
 xen/arch/x86/x86_64/compat/traps.c |  416 ------
 xen/arch/x86/x86_64/traps.c        |  288 +----
 xen/include/asm-x86/pv/domain.h    |    7 +
 xen/include/asm-x86/pv/traps.h     |   54 +
 xen/include/asm-x86/traps.h        |   27 +-
 9 files changed, 3202 insertions(+), 3035 deletions(-)
 create mode 100644 xen/arch/x86/pv/emulate.c
 delete mode 100644 xen/arch/x86/x86_64/compat/traps.c
 create mode 100644 xen/include/asm-x86/pv/traps.h

-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[PATCH for-next v3 01/22] x86/traps: move privilege instruction emulation code
From: Wei Liu @ 2017-05-18 17:09 UTC
  To: Xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich

Move relevant code to pv/emulate.c. Export emulate_privileged_op in
pv/traps.h.

Note that read_descriptor is duplicated in emulate.c. The duplication
will be gone once all emulation code is moved.

No functional change.
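
As a rough sketch, the new export amounts to a declaration along these
lines in xen/include/asm-x86/pv/traps.h (the signature matches the
definition of emulate_privileged_op() in the new pv/emulate.c below; the
real header also carries the usual licence text and include guards):

    /* Emulate a privileged instruction on behalf of a PV guest. */
    int emulate_privileged_op(struct cpu_user_regs *regs);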

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/arch/x86/pv/Makefile       |    1 +
 xen/arch/x86/pv/emulate.c      | 1470 ++++++++++++++++++++++++++++++++++++++++
 xen/arch/x86/traps.c           | 1358 +------------------------------------
 xen/include/asm-x86/pv/traps.h |   48 ++
 4 files changed, 1521 insertions(+), 1356 deletions(-)
 create mode 100644 xen/arch/x86/pv/emulate.c
 create mode 100644 xen/include/asm-x86/pv/traps.h

diff --git a/xen/arch/x86/pv/Makefile b/xen/arch/x86/pv/Makefile
index 489a9f59cb..564202cbb7 100644
--- a/xen/arch/x86/pv/Makefile
+++ b/xen/arch/x86/pv/Makefile
@@ -3,3 +3,4 @@ obj-y += traps.o
 
 obj-bin-y += dom0_build.init.o
 obj-y += domain.o
+obj-y += emulate.o
diff --git a/xen/arch/x86/pv/emulate.c b/xen/arch/x86/pv/emulate.c
new file mode 100644
index 0000000000..fb0d066a3b
--- /dev/null
+++ b/xen/arch/x86/pv/emulate.c
@@ -0,0 +1,1470 @@
+/******************************************************************************
+ * arch/x86/pv/emulate.c
+ *
+ * PV emulation code
+ *
+ * Modifications to Linux original are copyright (c) 2002-2004, K A Fraser
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <xen/errno.h>
+#include <xen/event.h>
+#include <xen/guest_access.h>
+#include <xen/iocap.h>
+#include <xen/spinlock.h>
+#include <xen/trace.h>
+
+#include <asm/apic.h>
+#include <asm/debugreg.h>
+#include <asm/hpet.h>
+#include <asm/hypercall.h>
+#include <asm/mc146818rtc.h>
+#include <asm/p2m.h>
+#include <asm/pv/traps.h>
+#include <asm/shared.h>
+#include <asm/traps.h>
+#include <asm/x86_emulate.h>
+
+#include <xsm/xsm.h>
+
+#include "../x86_64/mmconfig.h"
+
+/******************
+ * Helper functions
+ */
+
+static int read_descriptor(unsigned int sel,
+                           const struct vcpu *v,
+                           unsigned long *base,
+                           unsigned long *limit,
+                           unsigned int *ar,
+                           bool_t insn_fetch)
+{
+    struct desc_struct desc;
+
+    if ( sel < 4)
+        desc.b = desc.a = 0;
+    else if ( __get_user(desc,
+                         (const struct desc_struct *)(!(sel & 4)
+                                                      ? GDT_VIRT_START(v)
+                                                      : LDT_VIRT_START(v))
+                         + (sel >> 3)) )
+        return 0;
+    if ( !insn_fetch )
+        desc.b &= ~_SEGMENT_L;
+
+    *ar = desc.b & 0x00f0ff00;
+    if ( !(desc.b & _SEGMENT_L) )
+    {
+        *base = ((desc.a >> 16) + ((desc.b & 0xff) << 16) +
+                 (desc.b & 0xff000000));
+        *limit = (desc.a & 0xffff) | (desc.b & 0x000f0000);
+        if ( desc.b & _SEGMENT_G )
+            *limit = ((*limit + 1) << 12) - 1;
+#ifndef NDEBUG
+        if ( sel > 3 )
+        {
+            unsigned int a, l;
+            unsigned char valid;
+
+            asm volatile (
+                "larl %2,%0 ; setz %1"
+                : "=r" (a), "=qm" (valid) : "rm" (sel));
+            BUG_ON(valid && ((a & 0x00f0ff00) != *ar));
+            asm volatile (
+                "lsll %2,%0 ; setz %1"
+                : "=r" (l), "=qm" (valid) : "rm" (sel));
+            BUG_ON(valid && (l != *limit));
+        }
+#endif
+    }
+    else
+    {
+        *base = 0UL;
+        *limit = ~0UL;
+    }
+
+    return 1;
+}
+
+/***********************
+ * I/O emulation support
+ */
+
+struct priv_op_ctxt {
+    struct x86_emulate_ctxt ctxt;
+    struct {
+        unsigned long base, limit;
+    } cs;
+    char *io_emul_stub;
+    unsigned int bpmatch;
+    unsigned int tsc;
+#define TSC_BASE 1
+#define TSC_AUX 2
+};
+
+/* I/O emulation support. Helper routines for, and type of, the stack stub.*/
+void host_to_guest_gpr_switch(struct cpu_user_regs *);
+unsigned long guest_to_host_gpr_switch(unsigned long);
+
+void (*pv_post_outb_hook)(unsigned int port, u8 value);
+
+typedef void io_emul_stub_t(struct cpu_user_regs *);
+
+static io_emul_stub_t *io_emul_stub_setup(struct priv_op_ctxt *ctxt, u8 opcode,
+                                          unsigned int port, unsigned int bytes)
+{
+    if ( !ctxt->io_emul_stub )
+        ctxt->io_emul_stub = map_domain_page(_mfn(this_cpu(stubs.mfn))) +
+                                             (this_cpu(stubs.addr) &
+                                              ~PAGE_MASK) +
+                                             STUB_BUF_SIZE / 2;
+
+    /* movq $host_to_guest_gpr_switch,%rcx */
+    ctxt->io_emul_stub[0] = 0x48;
+    ctxt->io_emul_stub[1] = 0xb9;
+    *(void **)&ctxt->io_emul_stub[2] = (void *)host_to_guest_gpr_switch;
+    /* callq *%rcx */
+    ctxt->io_emul_stub[10] = 0xff;
+    ctxt->io_emul_stub[11] = 0xd1;
+    /* data16 or nop */
+    ctxt->io_emul_stub[12] = (bytes != 2) ? 0x90 : 0x66;
+    /* <io-access opcode> */
+    ctxt->io_emul_stub[13] = opcode;
+    /* imm8 or nop */
+    ctxt->io_emul_stub[14] = !(opcode & 8) ? port : 0x90;
+    /* ret (jumps to guest_to_host_gpr_switch) */
+    ctxt->io_emul_stub[15] = 0xc3;
+    BUILD_BUG_ON(STUB_BUF_SIZE / 2 < 16);
+
+    if ( ioemul_handle_quirk )
+        ioemul_handle_quirk(opcode, &ctxt->io_emul_stub[12], ctxt->ctxt.regs);
+
+    /* Handy function-typed pointer to the stub. */
+    return (void *)(this_cpu(stubs.addr) + STUB_BUF_SIZE / 2);
+}
+
+
+/* Perform IOPL check between the vcpu's shadowed IOPL, and the assumed cpl. */
+static bool_t iopl_ok(const struct vcpu *v, const struct cpu_user_regs *regs)
+{
+    unsigned int cpl = guest_kernel_mode(v, regs) ?
+        (VM_ASSIST(v->domain, architectural_iopl) ? 0 : 1) : 3;
+
+    ASSERT((v->arch.pv_vcpu.iopl & ~X86_EFLAGS_IOPL) == 0);
+
+    return IOPL(cpl) <= v->arch.pv_vcpu.iopl;
+}
+
+/* Has the guest requested sufficient permission for this I/O access? */
+static int guest_io_okay(
+    unsigned int port, unsigned int bytes,
+    struct vcpu *v, struct cpu_user_regs *regs)
+{
+    /* If in user mode, switch to kernel mode just to read I/O bitmap. */
+    int user_mode = !(v->arch.flags & TF_kernel_mode);
+#define TOGGLE_MODE() if ( user_mode ) toggle_guest_mode(v)
+
+    if ( iopl_ok(v, regs) )
+        return 1;
+
+    if ( v->arch.pv_vcpu.iobmp_limit > (port + bytes) )
+    {
+        union { uint8_t bytes[2]; uint16_t mask; } x;
+
+        /*
+         * Grab permission bytes from guest space. Inaccessible bytes are
+         * read as 0xff (no access allowed).
+         */
+        TOGGLE_MODE();
+        switch ( __copy_from_guest_offset(x.bytes, v->arch.pv_vcpu.iobmp,
+                                          port>>3, 2) )
+        {
+        default: x.bytes[0] = ~0;
+            /* fallthrough */
+        case 1:  x.bytes[1] = ~0;
+            /* fallthrough */
+        case 0:  break;
+        }
+        TOGGLE_MODE();
+
+        if ( (x.mask & (((1<<bytes)-1) << (port&7))) == 0 )
+            return 1;
+    }
+
+    return 0;
+}
+
+/* Has the administrator granted sufficient permission for this I/O access? */
+static bool_t admin_io_okay(unsigned int port, unsigned int bytes,
+                            const struct domain *d)
+{
+    /*
+     * Port 0xcf8 (CONFIG_ADDRESS) is only visible for DWORD accesses.
+     * We never permit direct access to that register.
+     */
+    if ( (port == 0xcf8) && (bytes == 4) )
+        return 0;
+
+    /* We also never permit direct access to the RTC/CMOS registers. */
+    if ( ((port & ~1) == RTC_PORT(0)) )
+        return 0;
+
+    return ioports_access_permitted(d, port, port + bytes - 1);
+}
+
+static bool_t pci_cfg_ok(struct domain *currd, unsigned int start,
+                         unsigned int size, uint32_t *write)
+{
+    uint32_t machine_bdf;
+
+    if ( !is_hardware_domain(currd) )
+        return 0;
+
+    if ( !CF8_ENABLED(currd->arch.pci_cf8) )
+        return 1;
+
+    machine_bdf = CF8_BDF(currd->arch.pci_cf8);
+    if ( write )
+    {
+        const unsigned long *ro_map = pci_get_ro_map(0);
+
+        if ( ro_map && test_bit(machine_bdf, ro_map) )
+            return 0;
+    }
+    start |= CF8_ADDR_LO(currd->arch.pci_cf8);
+    /* AMD extended configuration space access? */
+    if ( CF8_ADDR_HI(currd->arch.pci_cf8) &&
+         boot_cpu_data.x86_vendor == X86_VENDOR_AMD &&
+         boot_cpu_data.x86 >= 0x10 && boot_cpu_data.x86 <= 0x17 )
+    {
+        uint64_t msr_val;
+
+        if ( rdmsr_safe(MSR_AMD64_NB_CFG, msr_val) )
+            return 0;
+        if ( msr_val & (1ULL << AMD64_NB_CFG_CF8_EXT_ENABLE_BIT) )
+            start |= CF8_ADDR_HI(currd->arch.pci_cf8);
+    }
+
+    return !write ?
+           xsm_pci_config_permission(XSM_HOOK, currd, machine_bdf,
+                                     start, start + size - 1, 0) == 0 :
+           pci_conf_write_intercept(0, machine_bdf, start, size, write) >= 0;
+}
+
+uint32_t guest_io_read(unsigned int port, unsigned int bytes,
+                       struct domain *currd)
+{
+    uint32_t data = 0;
+    unsigned int shift = 0;
+
+    if ( admin_io_okay(port, bytes, currd) )
+    {
+        switch ( bytes )
+        {
+        case 1: return inb(port);
+        case 2: return inw(port);
+        case 4: return inl(port);
+        }
+    }
+
+    while ( bytes != 0 )
+    {
+        unsigned int size = 1;
+        uint32_t sub_data = ~0;
+
+        if ( (port == 0x42) || (port == 0x43) || (port == 0x61) )
+        {
+            sub_data = pv_pit_handler(port, 0, 0);
+        }
+        else if ( port == RTC_PORT(0) )
+        {
+            sub_data = currd->arch.cmos_idx;
+        }
+        else if ( (port == RTC_PORT(1)) &&
+                  ioports_access_permitted(currd, RTC_PORT(0), RTC_PORT(1)) )
+        {
+            unsigned long flags;
+
+            spin_lock_irqsave(&rtc_lock, flags);
+            outb(currd->arch.cmos_idx & 0x7f, RTC_PORT(0));
+            sub_data = inb(RTC_PORT(1));
+            spin_unlock_irqrestore(&rtc_lock, flags);
+        }
+        else if ( (port == 0xcf8) && (bytes == 4) )
+        {
+            size = 4;
+            sub_data = currd->arch.pci_cf8;
+        }
+        else if ( (port & 0xfffc) == 0xcfc )
+        {
+            size = min(bytes, 4 - (port & 3));
+            if ( size == 3 )
+                size = 2;
+            if ( pci_cfg_ok(currd, port & 3, size, NULL) )
+                sub_data = pci_conf_read(currd->arch.pci_cf8, port & 3, size);
+        }
+
+        if ( size == 4 )
+            return sub_data;
+
+        data |= (sub_data & ((1u << (size * 8)) - 1)) << shift;
+        shift += size * 8;
+        port += size;
+        bytes -= size;
+    }
+
+    return data;
+}
+
+static unsigned int check_guest_io_breakpoint(struct vcpu *v,
+    unsigned int port, unsigned int len)
+{
+    unsigned int width, i, match = 0;
+    unsigned long start;
+
+    if ( !(v->arch.debugreg[5]) ||
+         !(v->arch.pv_vcpu.ctrlreg[4] & X86_CR4_DE) )
+        return 0;
+
+    for ( i = 0; i < 4; i++ )
+    {
+        if ( !(v->arch.debugreg[5] &
+               (3 << (i * DR_ENABLE_SIZE))) )
+            continue;
+
+        start = v->arch.debugreg[i];
+        width = 0;
+
+        switch ( (v->arch.debugreg[7] >>
+                  (DR_CONTROL_SHIFT + i * DR_CONTROL_SIZE)) & 0xc )
+        {
+        case DR_LEN_1: width = 1; break;
+        case DR_LEN_2: width = 2; break;
+        case DR_LEN_4: width = 4; break;
+        case DR_LEN_8: width = 8; break;
+        }
+
+        if ( (start < (port + len)) && ((start + width) > port) )
+            match |= 1 << i;
+    }
+
+    return match;
+}
+
+static int priv_op_read_io(unsigned int port, unsigned int bytes,
+                           unsigned long *val, struct x86_emulate_ctxt *ctxt)
+{
+    struct priv_op_ctxt *poc = container_of(ctxt, struct priv_op_ctxt, ctxt);
+    struct vcpu *curr = current;
+    struct domain *currd = current->domain;
+
+    /* INS must not come here. */
+    ASSERT((ctxt->opcode & ~9) == 0xe4);
+
+    if ( !guest_io_okay(port, bytes, curr, ctxt->regs) )
+        return X86EMUL_UNHANDLEABLE;
+
+    poc->bpmatch = check_guest_io_breakpoint(curr, port, bytes);
+
+    if ( admin_io_okay(port, bytes, currd) )
+    {
+        io_emul_stub_t *io_emul =
+            io_emul_stub_setup(poc, ctxt->opcode, port, bytes);
+
+        mark_regs_dirty(ctxt->regs);
+        io_emul(ctxt->regs);
+        return X86EMUL_DONE;
+    }
+
+    *val = guest_io_read(port, bytes, currd);
+
+    return X86EMUL_OKAY;
+}
+
+void guest_io_write(unsigned int port, unsigned int bytes, uint32_t data,
+                    struct domain *currd)
+{
+    if ( admin_io_okay(port, bytes, currd) )
+    {
+        switch ( bytes ) {
+        case 1:
+            outb((uint8_t)data, port);
+            if ( pv_post_outb_hook )
+                pv_post_outb_hook(port, (uint8_t)data);
+            break;
+        case 2:
+            outw((uint16_t)data, port);
+            break;
+        case 4:
+            outl(data, port);
+            break;
+        }
+        return;
+    }
+
+    while ( bytes != 0 )
+    {
+        unsigned int size = 1;
+
+        if ( (port == 0x42) || (port == 0x43) || (port == 0x61) )
+        {
+            pv_pit_handler(port, (uint8_t)data, 1);
+        }
+        else if ( port == RTC_PORT(0) )
+        {
+            currd->arch.cmos_idx = data;
+        }
+        else if ( (port == RTC_PORT(1)) &&
+                  ioports_access_permitted(currd, RTC_PORT(0), RTC_PORT(1)) )
+        {
+            unsigned long flags;
+
+            if ( pv_rtc_handler )
+                pv_rtc_handler(currd->arch.cmos_idx & 0x7f, data);
+            spin_lock_irqsave(&rtc_lock, flags);
+            outb(currd->arch.cmos_idx & 0x7f, RTC_PORT(0));
+            outb(data, RTC_PORT(1));
+            spin_unlock_irqrestore(&rtc_lock, flags);
+        }
+        else if ( (port == 0xcf8) && (bytes == 4) )
+        {
+            size = 4;
+            currd->arch.pci_cf8 = data;
+        }
+        else if ( (port & 0xfffc) == 0xcfc )
+        {
+            size = min(bytes, 4 - (port & 3));
+            if ( size == 3 )
+                size = 2;
+            if ( pci_cfg_ok(currd, port & 3, size, &data) )
+                pci_conf_write(currd->arch.pci_cf8, port & 3, size, data);
+        }
+
+        if ( size == 4 )
+            return;
+
+        port += size;
+        bytes -= size;
+        data >>= size * 8;
+    }
+}
+
+static int priv_op_write_io(unsigned int port, unsigned int bytes,
+                            unsigned long val, struct x86_emulate_ctxt *ctxt)
+{
+    struct priv_op_ctxt *poc = container_of(ctxt, struct priv_op_ctxt, ctxt);
+    struct vcpu *curr = current;
+    struct domain *currd = current->domain;
+
+    /* OUTS must not come here. */
+    ASSERT((ctxt->opcode & ~9) == 0xe6);
+
+    if ( !guest_io_okay(port, bytes, curr, ctxt->regs) )
+        return X86EMUL_UNHANDLEABLE;
+
+    poc->bpmatch = check_guest_io_breakpoint(curr, port, bytes);
+
+    if ( admin_io_okay(port, bytes, currd) )
+    {
+        io_emul_stub_t *io_emul =
+            io_emul_stub_setup(poc, ctxt->opcode, port, bytes);
+
+        mark_regs_dirty(ctxt->regs);
+        io_emul(ctxt->regs);
+        if ( (bytes == 1) && pv_post_outb_hook )
+            pv_post_outb_hook(port, val);
+        return X86EMUL_DONE;
+    }
+
+    guest_io_write(port, bytes, val, currd);
+
+    return X86EMUL_OKAY;
+}
+
+static int priv_op_read_segment(enum x86_segment seg,
+                                struct segment_register *reg,
+                                struct x86_emulate_ctxt *ctxt)
+{
+    /* Check if this is an attempt to access the I/O bitmap. */
+    if ( seg == x86_seg_tr )
+    {
+        switch ( ctxt->opcode )
+        {
+        case 0x6c ... 0x6f: /* ins / outs */
+        case 0xe4 ... 0xe7: /* in / out (immediate port) */
+        case 0xec ... 0xef: /* in / out (port in %dx) */
+            /* Defer the check to priv_op_{read,write}_io(). */
+            return X86EMUL_DONE;
+        }
+    }
+
+    if ( ctxt->addr_size < 64 )
+    {
+        unsigned long limit;
+        unsigned int sel, ar;
+
+        switch ( seg )
+        {
+        case x86_seg_cs: sel = ctxt->regs->cs; break;
+        case x86_seg_ds: sel = read_sreg(ds);  break;
+        case x86_seg_es: sel = read_sreg(es);  break;
+        case x86_seg_fs: sel = read_sreg(fs);  break;
+        case x86_seg_gs: sel = read_sreg(gs);  break;
+        case x86_seg_ss: sel = ctxt->regs->ss; break;
+        default: return X86EMUL_UNHANDLEABLE;
+        }
+
+        if ( !read_descriptor(sel, current, &reg->base, &limit, &ar, 0) )
+            return X86EMUL_UNHANDLEABLE;
+
+        reg->limit = limit;
+        reg->attr.bytes = ar >> 8;
+    }
+    else
+    {
+        switch ( seg )
+        {
+        default:
+            if ( !is_x86_user_segment(seg) )
+                return X86EMUL_UNHANDLEABLE;
+            reg->base = 0;
+            break;
+        case x86_seg_fs:
+            reg->base = rdfsbase();
+            break;
+        case x86_seg_gs:
+            reg->base = rdgsbase();
+            break;
+        }
+
+        reg->limit = ~0U;
+
+        reg->attr.bytes = 0;
+        reg->attr.fields.type = _SEGMENT_WR >> 8;
+        if ( seg == x86_seg_cs )
+        {
+            reg->attr.fields.type |= _SEGMENT_CODE >> 8;
+            reg->attr.fields.l = 1;
+        }
+        else
+            reg->attr.fields.db = 1;
+        reg->attr.fields.s   = 1;
+        reg->attr.fields.dpl = 3;
+        reg->attr.fields.p   = 1;
+        reg->attr.fields.g   = 1;
+    }
+
+    /*
+     * For x86_emulate.c's mode_ring0() to work, fake a DPL of zero.
+     * Also do this for consistency for non-conforming code segments.
+     */
+    if ( (seg == x86_seg_ss ||
+          (seg == x86_seg_cs &&
+           !(reg->attr.fields.type & (_SEGMENT_EC >> 8)))) &&
+         guest_kernel_mode(current, ctxt->regs) )
+        reg->attr.fields.dpl = 0;
+
+    return X86EMUL_OKAY;
+}
+
+static int pv_emul_virt_to_linear(unsigned long base, unsigned long offset,
+                                  unsigned int bytes, unsigned long limit,
+                                  enum x86_segment seg,
+                                  struct x86_emulate_ctxt *ctxt,
+                                  unsigned long *addr)
+{
+    int rc = X86EMUL_OKAY;
+
+    *addr = base + offset;
+
+    if ( ctxt->addr_size < 64 )
+    {
+        if ( limit < bytes - 1 || offset > limit - bytes + 1 )
+            rc = X86EMUL_EXCEPTION;
+        *addr = (uint32_t)*addr;
+    }
+    else if ( !__addr_ok(*addr) )
+        rc = X86EMUL_EXCEPTION;
+
+    if ( unlikely(rc == X86EMUL_EXCEPTION) )
+        x86_emul_hw_exception(seg != x86_seg_ss ? TRAP_gp_fault
+                                                : TRAP_stack_error,
+                              0, ctxt);
+
+    return rc;
+}
+
+static int priv_op_rep_ins(uint16_t port,
+                           enum x86_segment seg, unsigned long offset,
+                           unsigned int bytes_per_rep, unsigned long *reps,
+                           struct x86_emulate_ctxt *ctxt)
+{
+    struct priv_op_ctxt *poc = container_of(ctxt, struct priv_op_ctxt, ctxt);
+    struct vcpu *curr = current;
+    struct domain *currd = current->domain;
+    unsigned long goal = *reps;
+    struct segment_register sreg;
+    int rc;
+
+    ASSERT(seg == x86_seg_es);
+
+    *reps = 0;
+
+    if ( !guest_io_okay(port, bytes_per_rep, curr, ctxt->regs) )
+        return X86EMUL_UNHANDLEABLE;
+
+    rc = priv_op_read_segment(x86_seg_es, &sreg, ctxt);
+    if ( rc != X86EMUL_OKAY )
+        return rc;
+
+    if ( !sreg.attr.fields.p )
+        return X86EMUL_UNHANDLEABLE;
+    if ( !sreg.attr.fields.s ||
+         (sreg.attr.fields.type & (_SEGMENT_CODE >> 8)) ||
+         !(sreg.attr.fields.type & (_SEGMENT_WR >> 8)) )
+    {
+        x86_emul_hw_exception(TRAP_gp_fault, 0, ctxt);
+        return X86EMUL_EXCEPTION;
+    }
+
+    poc->bpmatch = check_guest_io_breakpoint(curr, port, bytes_per_rep);
+
+    while ( *reps < goal )
+    {
+        unsigned int data = guest_io_read(port, bytes_per_rep, currd);
+        unsigned long addr;
+
+        rc = pv_emul_virt_to_linear(sreg.base, offset, bytes_per_rep,
+                                    sreg.limit, x86_seg_es, ctxt, &addr);
+        if ( rc != X86EMUL_OKAY )
+            return rc;
+
+        if ( (rc = __copy_to_user((void *)addr, &data, bytes_per_rep)) != 0 )
+        {
+            x86_emul_pagefault(PFEC_write_access,
+                               addr + bytes_per_rep - rc, ctxt);
+            return X86EMUL_EXCEPTION;
+        }
+
+        ++*reps;
+
+        if ( poc->bpmatch || hypercall_preempt_check() )
+            break;
+
+        /* x86_emulate() clips the repetition count to ensure we don't wrap. */
+        if ( unlikely(ctxt->regs->eflags & X86_EFLAGS_DF) )
+            offset -= bytes_per_rep;
+        else
+            offset += bytes_per_rep;
+    }
+
+    return X86EMUL_OKAY;
+}
+
+static int priv_op_rep_outs(enum x86_segment seg, unsigned long offset,
+                            uint16_t port,
+                            unsigned int bytes_per_rep, unsigned long *reps,
+                            struct x86_emulate_ctxt *ctxt)
+{
+    struct priv_op_ctxt *poc = container_of(ctxt, struct priv_op_ctxt, ctxt);
+    struct vcpu *curr = current;
+    struct domain *currd = current->domain;
+    unsigned long goal = *reps;
+    struct segment_register sreg;
+    int rc;
+
+    *reps = 0;
+
+    if ( !guest_io_okay(port, bytes_per_rep, curr, ctxt->regs) )
+        return X86EMUL_UNHANDLEABLE;
+
+    rc = priv_op_read_segment(seg, &sreg, ctxt);
+    if ( rc != X86EMUL_OKAY )
+        return rc;
+
+    if ( !sreg.attr.fields.p )
+        return X86EMUL_UNHANDLEABLE;
+    if ( !sreg.attr.fields.s ||
+         ((sreg.attr.fields.type & (_SEGMENT_CODE >> 8)) &&
+          !(sreg.attr.fields.type & (_SEGMENT_WR >> 8))) )
+    {
+        x86_emul_hw_exception(seg != x86_seg_ss ? TRAP_gp_fault
+                                                : TRAP_stack_error,
+                              0, ctxt);
+        return X86EMUL_EXCEPTION;
+    }
+
+    poc->bpmatch = check_guest_io_breakpoint(curr, port, bytes_per_rep);
+
+    while ( *reps < goal )
+    {
+        unsigned int data = 0;
+        unsigned long addr;
+
+        rc = pv_emul_virt_to_linear(sreg.base, offset, bytes_per_rep,
+                                    sreg.limit, seg, ctxt, &addr);
+        if ( rc != X86EMUL_OKAY )
+            return rc;
+
+        if ( (rc = __copy_from_user(&data, (void *)addr, bytes_per_rep)) != 0 )
+        {
+            x86_emul_pagefault(0, addr + bytes_per_rep - rc, ctxt);
+            return X86EMUL_EXCEPTION;
+        }
+
+        guest_io_write(port, bytes_per_rep, data, currd);
+
+        ++*reps;
+
+        if ( poc->bpmatch || hypercall_preempt_check() )
+            break;
+
+        /* x86_emulate() clips the repetition count to ensure we don't wrap. */
+        if ( unlikely(ctxt->regs->eflags & X86_EFLAGS_DF) )
+            offset -= bytes_per_rep;
+        else
+            offset += bytes_per_rep;
+    }
+
+    return X86EMUL_OKAY;
+}
+
+static int priv_op_read_cr(unsigned int reg, unsigned long *val,
+                           struct x86_emulate_ctxt *ctxt)
+{
+    const struct vcpu *curr = current;
+
+    switch ( reg )
+    {
+    case 0: /* Read CR0 */
+        *val = (read_cr0() & ~X86_CR0_TS) | curr->arch.pv_vcpu.ctrlreg[0];
+        return X86EMUL_OKAY;
+
+    case 2: /* Read CR2 */
+    case 4: /* Read CR4 */
+        *val = curr->arch.pv_vcpu.ctrlreg[reg];
+        return X86EMUL_OKAY;
+
+    case 3: /* Read CR3 */
+    {
+        const struct domain *currd = curr->domain;
+        unsigned long mfn;
+
+        if ( !is_pv_32bit_domain(currd) )
+        {
+            mfn = pagetable_get_pfn(curr->arch.guest_table);
+            *val = xen_pfn_to_cr3(mfn_to_gmfn(currd, mfn));
+        }
+        else
+        {
+            l4_pgentry_t *pl4e =
+                map_domain_page(_mfn(pagetable_get_pfn(curr->arch.guest_table)));
+
+            mfn = l4e_get_pfn(*pl4e);
+            unmap_domain_page(pl4e);
+            *val = compat_pfn_to_cr3(mfn_to_gmfn(currd, mfn));
+        }
+        /* PTs should not be shared */
+        BUG_ON(page_get_owner(mfn_to_page(mfn)) == dom_cow);
+        return X86EMUL_OKAY;
+    }
+    }
+
+    return X86EMUL_UNHANDLEABLE;
+}
+
+static int priv_op_write_cr(unsigned int reg, unsigned long val,
+                            struct x86_emulate_ctxt *ctxt)
+{
+    struct vcpu *curr = current;
+
+    switch ( reg )
+    {
+    case 0: /* Write CR0 */
+        if ( (val ^ read_cr0()) & ~X86_CR0_TS )
+        {
+            gdprintk(XENLOG_WARNING,
+                    "Attempt to change unmodifiable CR0 flags\n");
+            break;
+        }
+        do_fpu_taskswitch(!!(val & X86_CR0_TS));
+        return X86EMUL_OKAY;
+
+    case 2: /* Write CR2 */
+        curr->arch.pv_vcpu.ctrlreg[2] = val;
+        arch_set_cr2(curr, val);
+        return X86EMUL_OKAY;
+
+    case 3: /* Write CR3 */
+    {
+        struct domain *currd = curr->domain;
+        unsigned long gfn;
+        struct page_info *page;
+        int rc;
+
+        gfn = !is_pv_32bit_domain(currd)
+              ? xen_cr3_to_pfn(val) : compat_cr3_to_pfn(val);
+        page = get_page_from_gfn(currd, gfn, NULL, P2M_ALLOC);
+        if ( !page )
+            break;
+        rc = new_guest_cr3(page_to_mfn(page));
+        put_page(page);
+
+        switch ( rc )
+        {
+        case 0:
+            return X86EMUL_OKAY;
+        case -ERESTART: /* retry after preemption */
+            return X86EMUL_RETRY;
+        }
+        break;
+    }
+
+    case 4: /* Write CR4 */
+        curr->arch.pv_vcpu.ctrlreg[4] = pv_guest_cr4_fixup(curr, val);
+        write_cr4(pv_guest_cr4_to_real_cr4(curr));
+        ctxt_switch_levelling(curr);
+        return X86EMUL_OKAY;
+    }
+
+    return X86EMUL_UNHANDLEABLE;
+}
+
+static int priv_op_read_dr(unsigned int reg, unsigned long *val,
+                           struct x86_emulate_ctxt *ctxt)
+{
+    unsigned long res = do_get_debugreg(reg);
+
+    if ( IS_ERR_VALUE(res) )
+        return X86EMUL_UNHANDLEABLE;
+
+    *val = res;
+
+    return X86EMUL_OKAY;
+}
+
+static int priv_op_write_dr(unsigned int reg, unsigned long val,
+                            struct x86_emulate_ctxt *ctxt)
+{
+    return do_set_debugreg(reg, val) == 0
+           ? X86EMUL_OKAY : X86EMUL_UNHANDLEABLE;
+}
+
+static inline uint64_t guest_misc_enable(uint64_t val)
+{
+    val &= ~(MSR_IA32_MISC_ENABLE_PERF_AVAIL |
+             MSR_IA32_MISC_ENABLE_MONITOR_ENABLE);
+    val |= MSR_IA32_MISC_ENABLE_BTS_UNAVAIL |
+           MSR_IA32_MISC_ENABLE_PEBS_UNAVAIL |
+           MSR_IA32_MISC_ENABLE_XTPR_DISABLE;
+    return val;
+}
+
+static inline bool is_cpufreq_controller(const struct domain *d)
+{
+    return ((cpufreq_controller == FREQCTL_dom0_kernel) &&
+            is_hardware_domain(d));
+}
+
+static int priv_op_read_msr(unsigned int reg, uint64_t *val,
+                            struct x86_emulate_ctxt *ctxt)
+{
+    struct priv_op_ctxt *poc = container_of(ctxt, struct priv_op_ctxt, ctxt);
+    const struct vcpu *curr = current;
+    const struct domain *currd = curr->domain;
+    bool vpmu_msr = false;
+
+    switch ( reg )
+    {
+        int rc;
+
+    case MSR_FS_BASE:
+        if ( is_pv_32bit_domain(currd) )
+            break;
+        *val = cpu_has_fsgsbase ? __rdfsbase() : curr->arch.pv_vcpu.fs_base;
+        return X86EMUL_OKAY;
+
+    case MSR_GS_BASE:
+        if ( is_pv_32bit_domain(currd) )
+            break;
+        *val = cpu_has_fsgsbase ? __rdgsbase()
+                                : curr->arch.pv_vcpu.gs_base_kernel;
+        return X86EMUL_OKAY;
+
+    case MSR_SHADOW_GS_BASE:
+        if ( is_pv_32bit_domain(currd) )
+            break;
+        *val = curr->arch.pv_vcpu.gs_base_user;
+        return X86EMUL_OKAY;
+
+    /*
+     * In order to fully retain original behavior, defer calling
+     * pv_soft_rdtsc() until after emulation. This may want/need to be
+     * reconsidered.
+     */
+    case MSR_IA32_TSC:
+        poc->tsc |= TSC_BASE;
+        goto normal;
+
+    case MSR_TSC_AUX:
+        poc->tsc |= TSC_AUX;
+        if ( cpu_has_rdtscp )
+            goto normal;
+        *val = 0;
+        return X86EMUL_OKAY;
+
+    case MSR_EFER:
+        *val = read_efer();
+        if ( is_pv_32bit_domain(currd) )
+            *val &= ~(EFER_LME | EFER_LMA | EFER_LMSLE);
+        return X86EMUL_OKAY;
+
+    case MSR_K7_FID_VID_CTL:
+    case MSR_K7_FID_VID_STATUS:
+    case MSR_K8_PSTATE_LIMIT:
+    case MSR_K8_PSTATE_CTRL:
+    case MSR_K8_PSTATE_STATUS:
+    case MSR_K8_PSTATE0:
+    case MSR_K8_PSTATE1:
+    case MSR_K8_PSTATE2:
+    case MSR_K8_PSTATE3:
+    case MSR_K8_PSTATE4:
+    case MSR_K8_PSTATE5:
+    case MSR_K8_PSTATE6:
+    case MSR_K8_PSTATE7:
+        if ( boot_cpu_data.x86_vendor != X86_VENDOR_AMD )
+            break;
+        if ( unlikely(is_cpufreq_controller(currd)) )
+            goto normal;
+        *val = 0;
+        return X86EMUL_OKAY;
+
+    case MSR_IA32_UCODE_REV:
+        BUILD_BUG_ON(MSR_IA32_UCODE_REV != MSR_AMD_PATCHLEVEL);
+        if ( boot_cpu_data.x86_vendor == X86_VENDOR_INTEL )
+        {
+            if ( wrmsr_safe(MSR_IA32_UCODE_REV, 0) )
+                break;
+            /* As documented in the SDM: Do a CPUID 1 here */
+            cpuid_eax(1);
+        }
+        goto normal;
+
+    case MSR_IA32_MISC_ENABLE:
+        if ( rdmsr_safe(reg, *val) )
+            break;
+        *val = guest_misc_enable(*val);
+        return X86EMUL_OKAY;
+
+    case MSR_AMD64_DR0_ADDRESS_MASK:
+        if ( !boot_cpu_has(X86_FEATURE_DBEXT) )
+            break;
+        *val = curr->arch.pv_vcpu.dr_mask[0];
+        return X86EMUL_OKAY;
+
+    case MSR_AMD64_DR1_ADDRESS_MASK ... MSR_AMD64_DR3_ADDRESS_MASK:
+        if ( !boot_cpu_has(X86_FEATURE_DBEXT) )
+            break;
+        *val = curr->arch.pv_vcpu.dr_mask[reg - MSR_AMD64_DR1_ADDRESS_MASK + 1];
+        return X86EMUL_OKAY;
+
+    case MSR_IA32_PERF_CAPABILITIES:
+        /* No extra capabilities are supported. */
+        *val = 0;
+        return X86EMUL_OKAY;
+
+    case MSR_INTEL_PLATFORM_INFO:
+        if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL ||
+             rdmsr_safe(MSR_INTEL_PLATFORM_INFO, *val) )
+            break;
+        *val = 0;
+        if ( this_cpu(cpuid_faulting_enabled) )
+            *val |= MSR_PLATFORM_INFO_CPUID_FAULTING;
+        return X86EMUL_OKAY;
+
+    case MSR_INTEL_MISC_FEATURES_ENABLES:
+        if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL ||
+             rdmsr_safe(MSR_INTEL_MISC_FEATURES_ENABLES, *val) )
+            break;
+        *val = 0;
+        if ( curr->arch.cpuid_faulting )
+            *val |= MSR_MISC_FEATURES_CPUID_FAULTING;
+        return X86EMUL_OKAY;
+
+    case MSR_P6_PERFCTR(0)...MSR_P6_PERFCTR(7):
+    case MSR_P6_EVNTSEL(0)...MSR_P6_EVNTSEL(3):
+    case MSR_CORE_PERF_FIXED_CTR0...MSR_CORE_PERF_FIXED_CTR2:
+    case MSR_CORE_PERF_FIXED_CTR_CTRL...MSR_CORE_PERF_GLOBAL_OVF_CTRL:
+        if ( boot_cpu_data.x86_vendor == X86_VENDOR_INTEL )
+        {
+            vpmu_msr = true;
+            /* fall through */
+    case MSR_AMD_FAM15H_EVNTSEL0...MSR_AMD_FAM15H_PERFCTR5:
+    case MSR_K7_EVNTSEL0...MSR_K7_PERFCTR3:
+            if ( vpmu_msr || (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) )
+            {
+                if ( vpmu_do_rdmsr(reg, val) )
+                    break;
+                return X86EMUL_OKAY;
+            }
+        }
+        /* fall through */
+    default:
+        if ( rdmsr_hypervisor_regs(reg, val) )
+            return X86EMUL_OKAY;
+
+        rc = vmce_rdmsr(reg, val);
+        if ( rc < 0 )
+            break;
+        if ( rc )
+            return X86EMUL_OKAY;
+        /* fall through */
+    normal:
+        /* Everyone can read the MSR space. */
+        /* gdprintk(XENLOG_WARNING, "Domain attempted RDMSR %08x\n", reg); */
+        if ( rdmsr_safe(reg, *val) )
+            break;
+        return X86EMUL_OKAY;
+    }
+
+    return X86EMUL_UNHANDLEABLE;
+}
+
+static int priv_op_write_msr(unsigned int reg, uint64_t val,
+                             struct x86_emulate_ctxt *ctxt)
+{
+    struct vcpu *curr = current;
+    const struct domain *currd = curr->domain;
+    bool vpmu_msr = false;
+
+    switch ( reg )
+    {
+        uint64_t temp;
+        int rc;
+
+    case MSR_FS_BASE:
+        if ( is_pv_32bit_domain(currd) || !is_canonical_address(val) )
+            break;
+        wrfsbase(val);
+        curr->arch.pv_vcpu.fs_base = val;
+        return X86EMUL_OKAY;
+
+    case MSR_GS_BASE:
+        if ( is_pv_32bit_domain(currd) || !is_canonical_address(val) )
+            break;
+        wrgsbase(val);
+        curr->arch.pv_vcpu.gs_base_kernel = val;
+        return X86EMUL_OKAY;
+
+    case MSR_SHADOW_GS_BASE:
+        if ( is_pv_32bit_domain(currd) || !is_canonical_address(val) )
+            break;
+        wrmsrl(MSR_SHADOW_GS_BASE, val);
+        curr->arch.pv_vcpu.gs_base_user = val;
+        return X86EMUL_OKAY;
+
+    case MSR_K7_FID_VID_STATUS:
+    case MSR_K7_FID_VID_CTL:
+    case MSR_K8_PSTATE_LIMIT:
+    case MSR_K8_PSTATE_CTRL:
+    case MSR_K8_PSTATE_STATUS:
+    case MSR_K8_PSTATE0:
+    case MSR_K8_PSTATE1:
+    case MSR_K8_PSTATE2:
+    case MSR_K8_PSTATE3:
+    case MSR_K8_PSTATE4:
+    case MSR_K8_PSTATE5:
+    case MSR_K8_PSTATE6:
+    case MSR_K8_PSTATE7:
+    case MSR_K8_HWCR:
+        if ( boot_cpu_data.x86_vendor != X86_VENDOR_AMD )
+            break;
+        if ( likely(!is_cpufreq_controller(currd)) ||
+             wrmsr_safe(reg, val) == 0 )
+            return X86EMUL_OKAY;
+        break;
+
+    case MSR_AMD64_NB_CFG:
+        if ( boot_cpu_data.x86_vendor != X86_VENDOR_AMD ||
+             boot_cpu_data.x86 < 0x10 || boot_cpu_data.x86 > 0x17 )
+            break;
+        if ( !is_hardware_domain(currd) || !is_pinned_vcpu(curr) )
+            return X86EMUL_OKAY;
+        if ( (rdmsr_safe(MSR_AMD64_NB_CFG, temp) != 0) ||
+             ((val ^ temp) & ~(1ULL << AMD64_NB_CFG_CF8_EXT_ENABLE_BIT)) )
+            goto invalid;
+        if ( wrmsr_safe(MSR_AMD64_NB_CFG, val) == 0 )
+            return X86EMUL_OKAY;
+        break;
+
+    case MSR_FAM10H_MMIO_CONF_BASE:
+        if ( boot_cpu_data.x86_vendor != X86_VENDOR_AMD ||
+             boot_cpu_data.x86 < 0x10 || boot_cpu_data.x86 > 0x17 )
+            break;
+        if ( !is_hardware_domain(currd) || !is_pinned_vcpu(curr) )
+            return X86EMUL_OKAY;
+        if ( rdmsr_safe(MSR_FAM10H_MMIO_CONF_BASE, temp) != 0 )
+            break;
+        if ( (pci_probe & PCI_PROBE_MASK) == PCI_PROBE_MMCONF ?
+             temp != val :
+             ((temp ^ val) &
+              ~(FAM10H_MMIO_CONF_ENABLE |
+                (FAM10H_MMIO_CONF_BUSRANGE_MASK <<
+                 FAM10H_MMIO_CONF_BUSRANGE_SHIFT) |
+                ((u64)FAM10H_MMIO_CONF_BASE_MASK <<
+                 FAM10H_MMIO_CONF_BASE_SHIFT))) )
+            goto invalid;
+        if ( wrmsr_safe(MSR_FAM10H_MMIO_CONF_BASE, val) == 0 )
+            return X86EMUL_OKAY;
+        break;
+
+    case MSR_IA32_UCODE_REV:
+        if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL )
+            break;
+        if ( !is_hardware_domain(currd) || !is_pinned_vcpu(curr) )
+            return X86EMUL_OKAY;
+        if ( rdmsr_safe(reg, temp) )
+            break;
+        if ( val )
+            goto invalid;
+        return X86EMUL_OKAY;
+
+    case MSR_IA32_MISC_ENABLE:
+        if ( rdmsr_safe(reg, temp) )
+            break;
+        if ( val != guest_misc_enable(temp) )
+            goto invalid;
+        return X86EMUL_OKAY;
+
+    case MSR_IA32_MPERF:
+    case MSR_IA32_APERF:
+        if ( (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL) &&
+             (boot_cpu_data.x86_vendor != X86_VENDOR_AMD) )
+            break;
+        if ( likely(!is_cpufreq_controller(currd)) ||
+             wrmsr_safe(reg, val) == 0 )
+            return X86EMUL_OKAY;
+        break;
+
+    case MSR_IA32_PERF_CTL:
+        if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL )
+            break;
+        if ( likely(!is_cpufreq_controller(currd)) ||
+             wrmsr_safe(reg, val) == 0 )
+            return X86EMUL_OKAY;
+        break;
+
+    case MSR_IA32_THERM_CONTROL:
+    case MSR_IA32_ENERGY_PERF_BIAS:
+        if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL )
+            break;
+        if ( !is_hardware_domain(currd) || !is_pinned_vcpu(curr) ||
+             wrmsr_safe(reg, val) == 0 )
+            return X86EMUL_OKAY;
+        break;
+
+    case MSR_AMD64_DR0_ADDRESS_MASK:
+        if ( !boot_cpu_has(X86_FEATURE_DBEXT) || (val >> 32) )
+            break;
+        curr->arch.pv_vcpu.dr_mask[0] = val;
+        if ( curr->arch.debugreg[7] & DR7_ACTIVE_MASK )
+            wrmsrl(MSR_AMD64_DR0_ADDRESS_MASK, val);
+        return X86EMUL_OKAY;
+
+    case MSR_AMD64_DR1_ADDRESS_MASK ... MSR_AMD64_DR3_ADDRESS_MASK:
+        if ( !boot_cpu_has(X86_FEATURE_DBEXT) || (val >> 32) )
+            break;
+        curr->arch.pv_vcpu.dr_mask[reg - MSR_AMD64_DR1_ADDRESS_MASK + 1] = val;
+        if ( curr->arch.debugreg[7] & DR7_ACTIVE_MASK )
+            wrmsrl(reg, val);
+        return X86EMUL_OKAY;
+
+    case MSR_INTEL_PLATFORM_INFO:
+        if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL ||
+             val || rdmsr_safe(MSR_INTEL_PLATFORM_INFO, val) )
+            break;
+        return X86EMUL_OKAY;
+
+    case MSR_INTEL_MISC_FEATURES_ENABLES:
+        if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL ||
+             (val & ~MSR_MISC_FEATURES_CPUID_FAULTING) ||
+             rdmsr_safe(MSR_INTEL_MISC_FEATURES_ENABLES, temp) )
+            break;
+        if ( (val & MSR_MISC_FEATURES_CPUID_FAULTING) &&
+             !this_cpu(cpuid_faulting_enabled) )
+            break;
+        curr->arch.cpuid_faulting = !!(val & MSR_MISC_FEATURES_CPUID_FAULTING);
+        return X86EMUL_OKAY;
+
+    case MSR_P6_PERFCTR(0)...MSR_P6_PERFCTR(7):
+    case MSR_P6_EVNTSEL(0)...MSR_P6_EVNTSEL(3):
+    case MSR_CORE_PERF_FIXED_CTR0...MSR_CORE_PERF_FIXED_CTR2:
+    case MSR_CORE_PERF_FIXED_CTR_CTRL...MSR_CORE_PERF_GLOBAL_OVF_CTRL:
+        if ( boot_cpu_data.x86_vendor == X86_VENDOR_INTEL )
+        {
+            vpmu_msr = true;
+    case MSR_AMD_FAM15H_EVNTSEL0...MSR_AMD_FAM15H_PERFCTR5:
+    case MSR_K7_EVNTSEL0...MSR_K7_PERFCTR3:
+            if ( vpmu_msr || (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) )
+            {
+                if ( (vpmu_mode & XENPMU_MODE_ALL) &&
+                     !is_hardware_domain(currd) )
+                    return X86EMUL_OKAY;
+
+                if ( vpmu_do_wrmsr(reg, val, 0) )
+                    break;
+                return X86EMUL_OKAY;
+            }
+        }
+        /* fall through */
+    default:
+        if ( wrmsr_hypervisor_regs(reg, val) == 1 )
+            return X86EMUL_OKAY;
+
+        rc = vmce_wrmsr(reg, val);
+        if ( rc < 0 )
+            break;
+        if ( rc )
+            return X86EMUL_OKAY;
+
+        if ( (rdmsr_safe(reg, temp) != 0) || (val != temp) )
+    invalid:
+            gdprintk(XENLOG_WARNING,
+                     "Domain attempted WRMSR %08x from 0x%016"PRIx64" to 0x%016"PRIx64"\n",
+                     reg, temp, val);
+        return X86EMUL_OKAY;
+    }
+
+    return X86EMUL_UNHANDLEABLE;
+}
+
+static int priv_op_wbinvd(struct x86_emulate_ctxt *ctxt)
+{
+    /* Ignore the instruction if unprivileged. */
+    if ( !cache_flush_permitted(current->domain) )
+        /*
+         * Non-physdev domain attempted WBINVD; ignore for now since
+         * newer linux uses this in some start-of-day timing loops.
+         */
+        ;
+    else
+        wbinvd();
+
+    return X86EMUL_OKAY;
+}
+
+int pv_emul_cpuid(uint32_t leaf, uint32_t subleaf,
+                  struct cpuid_leaf *res, struct x86_emulate_ctxt *ctxt)
+{
+    guest_cpuid(current, leaf, subleaf, res);
+
+    return X86EMUL_OKAY;
+}
+
+static int priv_op_validate(const struct x86_emulate_state *state,
+                            struct x86_emulate_ctxt *ctxt)
+{
+    switch ( ctxt->opcode )
+    {
+    case 0x6c ... 0x6f: /* ins / outs */
+    case 0xe4 ... 0xe7: /* in / out (immediate port) */
+    case 0xec ... 0xef: /* in / out (port in %dx) */
+    case X86EMUL_OPC(0x0f, 0x06): /* clts */
+    case X86EMUL_OPC(0x0f, 0x09): /* wbinvd */
+    case X86EMUL_OPC(0x0f, 0x20) ...
+         X86EMUL_OPC(0x0f, 0x23): /* mov to/from cr/dr */
+    case X86EMUL_OPC(0x0f, 0x30): /* wrmsr */
+    case X86EMUL_OPC(0x0f, 0x31): /* rdtsc */
+    case X86EMUL_OPC(0x0f, 0x32): /* rdmsr */
+    case X86EMUL_OPC(0x0f, 0xa2): /* cpuid */
+        return X86EMUL_OKAY;
+
+    case 0xfa: case 0xfb: /* cli / sti */
+        if ( !iopl_ok(current, ctxt->regs) )
+            break;
+        /*
+         * This is just too dangerous to allow, in my opinion. Consider if the
+         * caller then tries to reenable interrupts using POPF: we can't trap
+         * that and we'll end up with hard-to-debug lockups. Fast & loose will
+         * do for us. :-)
+        vcpu_info(current, evtchn_upcall_mask) = (ctxt->opcode == 0xfa);
+         */
+        return X86EMUL_DONE;
+
+    case X86EMUL_OPC(0x0f, 0x01):
+    {
+        unsigned int modrm_rm, modrm_reg;
+
+        if ( x86_insn_modrm(state, &modrm_rm, &modrm_reg) != 3 ||
+             (modrm_rm & 7) != 1 )
+            break;
+        switch ( modrm_reg & 7 )
+        {
+        case 2: /* xsetbv */
+        case 7: /* rdtscp */
+            return X86EMUL_OKAY;
+        }
+        break;
+    }
+    }
+
+    return X86EMUL_UNHANDLEABLE;
+}
+
+static int priv_op_insn_fetch(enum x86_segment seg,
+                              unsigned long offset,
+                              void *p_data,
+                              unsigned int bytes,
+                              struct x86_emulate_ctxt *ctxt)
+{
+    const struct priv_op_ctxt *poc =
+        container_of(ctxt, struct priv_op_ctxt, ctxt);
+    unsigned int rc;
+    unsigned long addr = poc->cs.base + offset;
+
+    ASSERT(seg == x86_seg_cs);
+
+    /* We don't mean to emulate any branches. */
+    if ( !bytes )
+        return X86EMUL_UNHANDLEABLE;
+
+    rc = pv_emul_virt_to_linear(poc->cs.base, offset, bytes, poc->cs.limit,
+                                x86_seg_cs, ctxt, &addr);
+    if ( rc != X86EMUL_OKAY )
+        return rc;
+
+    if ( (rc = __copy_from_user(p_data, (void *)addr, bytes)) != 0 )
+    {
+        /*
+         * TODO: This should report PFEC_insn_fetch when goc->insn_fetch &&
+         * cpu_has_nx, but we'd then need a "fetch" variant of
+         * __copy_from_user() respecting NX, SMEP, and protection keys.
+         */
+        x86_emul_pagefault(0, addr + bytes - rc, ctxt);
+        return X86EMUL_EXCEPTION;
+    }
+
+    return X86EMUL_OKAY;
+}
+
+
+static const struct x86_emulate_ops priv_op_ops = {
+    .insn_fetch          = priv_op_insn_fetch,
+    .read                = x86emul_unhandleable_rw,
+    .validate            = priv_op_validate,
+    .read_io             = priv_op_read_io,
+    .write_io            = priv_op_write_io,
+    .rep_ins             = priv_op_rep_ins,
+    .rep_outs            = priv_op_rep_outs,
+    .read_segment        = priv_op_read_segment,
+    .read_cr             = priv_op_read_cr,
+    .write_cr            = priv_op_write_cr,
+    .read_dr             = priv_op_read_dr,
+    .write_dr            = priv_op_write_dr,
+    .read_msr            = priv_op_read_msr,
+    .write_msr           = priv_op_write_msr,
+    .cpuid               = pv_emul_cpuid,
+    .wbinvd              = priv_op_wbinvd,
+};
+
+int emulate_privileged_op(struct cpu_user_regs *regs)
+{
+    struct vcpu *curr = current;
+    struct domain *currd = curr->domain;
+    struct priv_op_ctxt ctxt = {
+        .ctxt.regs = regs,
+        .ctxt.vendor = currd->arch.cpuid->x86_vendor,
+        .ctxt.lma = !is_pv_32bit_domain(currd),
+    };
+    int rc;
+    unsigned int eflags, ar;
+
+    if ( !read_descriptor(regs->cs, curr, &ctxt.cs.base, &ctxt.cs.limit,
+                          &ar, 1) ||
+         !(ar & _SEGMENT_S) ||
+         !(ar & _SEGMENT_P) ||
+         !(ar & _SEGMENT_CODE) )
+        return 0;
+
+    /* Mirror virtualized state into EFLAGS. */
+    ASSERT(regs->eflags & X86_EFLAGS_IF);
+    if ( vcpu_info(curr, evtchn_upcall_mask) )
+        regs->eflags &= ~X86_EFLAGS_IF;
+    else
+        regs->eflags |= X86_EFLAGS_IF;
+    ASSERT(!(regs->eflags & X86_EFLAGS_IOPL));
+    regs->eflags |= curr->arch.pv_vcpu.iopl;
+    eflags = regs->eflags;
+
+    ctxt.ctxt.addr_size = ar & _SEGMENT_L ? 64 : ar & _SEGMENT_DB ? 32 : 16;
+    /* Leave zero in ctxt.ctxt.sp_size, as it's not needed. */
+    rc = x86_emulate(&ctxt.ctxt, &priv_op_ops);
+
+    if ( ctxt.io_emul_stub )
+        unmap_domain_page(ctxt.io_emul_stub);
+
+    /*
+     * Un-mirror virtualized state from EFLAGS.
+     * Nothing we allow to be emulated can change anything other than the
+     * arithmetic bits, and the resume flag.
+     */
+    ASSERT(!((regs->eflags ^ eflags) &
+             ~(X86_EFLAGS_RF | X86_EFLAGS_ARITH_MASK)));
+    regs->eflags |= X86_EFLAGS_IF;
+    regs->eflags &= ~X86_EFLAGS_IOPL;
+
+    switch ( rc )
+    {
+    case X86EMUL_OKAY:
+        if ( ctxt.tsc & TSC_BASE )
+        {
+            if ( ctxt.tsc & TSC_AUX )
+                pv_soft_rdtsc(curr, regs, 1);
+            else if ( currd->arch.vtsc )
+                pv_soft_rdtsc(curr, regs, 0);
+            else
+                msr_split(regs, rdtsc());
+        }
+
+        if ( ctxt.ctxt.retire.singlestep )
+            ctxt.bpmatch |= DR_STEP;
+        if ( ctxt.bpmatch )
+        {
+            curr->arch.debugreg[6] |= ctxt.bpmatch | DR_STATUS_RESERVED_ONE;
+            if ( !(curr->arch.pv_vcpu.trap_bounce.flags & TBF_EXCEPTION) )
+                pv_inject_hw_exception(TRAP_debug, X86_EVENT_NO_EC);
+        }
+        /* fall through */
+    case X86EMUL_RETRY:
+        return EXCRET_fault_fixed;
+
+    case X86EMUL_EXCEPTION:
+        pv_inject_event(&ctxt.ctxt.event);
+        return EXCRET_fault_fixed;
+    }
+
+    return 0;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index cd8ca20398..cd43e9f44c 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -78,6 +78,8 @@
 #include <asm/cpuid.h>
 #include <xsm/xsm.h>
 
+#include <asm/pv/traps.h>
+
 /*
  * opt_nmi: one of 'ignore', 'dom0', or 'fatal'.
  *  fatal:  Xen prints diagnostic message and then hangs.
@@ -705,41 +707,6 @@ static void instruction_done(struct cpu_user_regs *regs, unsigned long rip)
     }
 }
 
-static unsigned int check_guest_io_breakpoint(struct vcpu *v,
-    unsigned int port, unsigned int len)
-{
-    unsigned int width, i, match = 0;
-    unsigned long start;
-
-    if ( !(v->arch.debugreg[5]) ||
-         !(v->arch.pv_vcpu.ctrlreg[4] & X86_CR4_DE) )
-        return 0;
-
-    for ( i = 0; i < 4; i++ )
-    {
-        if ( !(v->arch.debugreg[5] &
-               (3 << (i * DR_ENABLE_SIZE))) )
-            continue;
-
-        start = v->arch.debugreg[i];
-        width = 0;
-
-        switch ( (v->arch.debugreg[7] >>
-                  (DR_CONTROL_SHIFT + i * DR_CONTROL_SIZE)) & 0xc )
-        {
-        case DR_LEN_1: width = 1; break;
-        case DR_LEN_2: width = 2; break;
-        case DR_LEN_4: width = 4; break;
-        case DR_LEN_8: width = 8; break;
-        }
-
-        if ( (start < (port + len)) && ((start + width) > port) )
-            match |= 1 << i;
-    }
-
-    return match;
-}
-
 /*
  * Called from asm to set up the MCE trapbounce info.
  * Returns 0 if no callback is set up, else 1.
@@ -1733,1327 +1700,6 @@ static int read_gate_descriptor(unsigned int gate_sel,
     return 1;
 }
 
-static int pv_emul_virt_to_linear(unsigned long base, unsigned long offset,
-                                  unsigned int bytes, unsigned long limit,
-                                  enum x86_segment seg,
-                                  struct x86_emulate_ctxt *ctxt,
-                                  unsigned long *addr)
-{
-    int rc = X86EMUL_OKAY;
-
-    *addr = base + offset;
-
-    if ( ctxt->addr_size < 64 )
-    {
-        if ( limit < bytes - 1 || offset > limit - bytes + 1 )
-            rc = X86EMUL_EXCEPTION;
-        *addr = (uint32_t)*addr;
-    }
-    else if ( !__addr_ok(*addr) )
-        rc = X86EMUL_EXCEPTION;
-
-    if ( unlikely(rc == X86EMUL_EXCEPTION) )
-        x86_emul_hw_exception(seg != x86_seg_ss ? TRAP_gp_fault
-                                                : TRAP_stack_error,
-                              0, ctxt);
-
-    return rc;
-}
-
-struct priv_op_ctxt {
-    struct x86_emulate_ctxt ctxt;
-    struct {
-        unsigned long base, limit;
-    } cs;
-    char *io_emul_stub;
-    unsigned int bpmatch;
-    unsigned int tsc;
-#define TSC_BASE 1
-#define TSC_AUX 2
-};
-
-static int priv_op_insn_fetch(enum x86_segment seg,
-                              unsigned long offset,
-                              void *p_data,
-                              unsigned int bytes,
-                              struct x86_emulate_ctxt *ctxt)
-{
-    const struct priv_op_ctxt *poc =
-        container_of(ctxt, struct priv_op_ctxt, ctxt);
-    unsigned int rc;
-    unsigned long addr = poc->cs.base + offset;
-
-    ASSERT(seg == x86_seg_cs);
-
-    /* We don't mean to emulate any branches. */
-    if ( !bytes )
-        return X86EMUL_UNHANDLEABLE;
-
-    rc = pv_emul_virt_to_linear(poc->cs.base, offset, bytes, poc->cs.limit,
-                                x86_seg_cs, ctxt, &addr);
-    if ( rc != X86EMUL_OKAY )
-        return rc;
-
-    if ( (rc = __copy_from_user(p_data, (void *)addr, bytes)) != 0 )
-    {
-        /*
-         * TODO: This should report PFEC_insn_fetch when goc->insn_fetch &&
-         * cpu_has_nx, but we'd then need a "fetch" variant of
-         * __copy_from_user() respecting NX, SMEP, and protection keys.
-         */
-        x86_emul_pagefault(0, addr + bytes - rc, ctxt);
-        return X86EMUL_EXCEPTION;
-    }
-
-    return X86EMUL_OKAY;
-}
-
-static int priv_op_read_segment(enum x86_segment seg,
-                                struct segment_register *reg,
-                                struct x86_emulate_ctxt *ctxt)
-{
-    /* Check if this is an attempt to access the I/O bitmap. */
-    if ( seg == x86_seg_tr )
-    {
-        switch ( ctxt->opcode )
-        {
-        case 0x6c ... 0x6f: /* ins / outs */
-        case 0xe4 ... 0xe7: /* in / out (immediate port) */
-        case 0xec ... 0xef: /* in / out (port in %dx) */
-            /* Defer the check to priv_op_{read,write}_io(). */
-            return X86EMUL_DONE;
-        }
-    }
-
-    if ( ctxt->addr_size < 64 )
-    {
-        unsigned long limit;
-        unsigned int sel, ar;
-
-        switch ( seg )
-        {
-        case x86_seg_cs: sel = ctxt->regs->cs; break;
-        case x86_seg_ds: sel = read_sreg(ds);  break;
-        case x86_seg_es: sel = read_sreg(es);  break;
-        case x86_seg_fs: sel = read_sreg(fs);  break;
-        case x86_seg_gs: sel = read_sreg(gs);  break;
-        case x86_seg_ss: sel = ctxt->regs->ss; break;
-        default: return X86EMUL_UNHANDLEABLE;
-        }
-
-        if ( !read_descriptor(sel, current, &reg->base, &limit, &ar, 0) )
-            return X86EMUL_UNHANDLEABLE;
-
-        reg->limit = limit;
-        reg->attr.bytes = ar >> 8;
-    }
-    else
-    {
-        switch ( seg )
-        {
-        default:
-            if ( !is_x86_user_segment(seg) )
-                return X86EMUL_UNHANDLEABLE;
-            reg->base = 0;
-            break;
-        case x86_seg_fs:
-            reg->base = rdfsbase();
-            break;
-        case x86_seg_gs:
-            reg->base = rdgsbase();
-            break;
-        }
-
-        reg->limit = ~0U;
-
-        reg->attr.bytes = 0;
-        reg->attr.fields.type = _SEGMENT_WR >> 8;
-        if ( seg == x86_seg_cs )
-        {
-            reg->attr.fields.type |= _SEGMENT_CODE >> 8;
-            reg->attr.fields.l = 1;
-        }
-        else
-            reg->attr.fields.db = 1;
-        reg->attr.fields.s   = 1;
-        reg->attr.fields.dpl = 3;
-        reg->attr.fields.p   = 1;
-        reg->attr.fields.g   = 1;
-    }
-
-    /*
-     * For x86_emulate.c's mode_ring0() to work, fake a DPL of zero.
-     * Also do this for consistency for non-conforming code segments.
-     */
-    if ( (seg == x86_seg_ss ||
-          (seg == x86_seg_cs &&
-           !(reg->attr.fields.type & (_SEGMENT_EC >> 8)))) &&
-         guest_kernel_mode(current, ctxt->regs) )
-        reg->attr.fields.dpl = 0;
-
-    return X86EMUL_OKAY;
-}
-
-/* Perform IOPL check between the vcpu's shadowed IOPL, and the assumed cpl. */
-static bool_t iopl_ok(const struct vcpu *v, const struct cpu_user_regs *regs)
-{
-    unsigned int cpl = guest_kernel_mode(v, regs) ?
-        (VM_ASSIST(v->domain, architectural_iopl) ? 0 : 1) : 3;
-
-    ASSERT((v->arch.pv_vcpu.iopl & ~X86_EFLAGS_IOPL) == 0);
-
-    return IOPL(cpl) <= v->arch.pv_vcpu.iopl;
-}
-
-/* Has the guest requested sufficient permission for this I/O access? */
-static int guest_io_okay(
-    unsigned int port, unsigned int bytes,
-    struct vcpu *v, struct cpu_user_regs *regs)
-{
-    /* If in user mode, switch to kernel mode just to read I/O bitmap. */
-    int user_mode = !(v->arch.flags & TF_kernel_mode);
-#define TOGGLE_MODE() if ( user_mode ) toggle_guest_mode(v)
-
-    if ( iopl_ok(v, regs) )
-        return 1;
-
-    if ( v->arch.pv_vcpu.iobmp_limit > (port + bytes) )
-    {
-        union { uint8_t bytes[2]; uint16_t mask; } x;
-
-        /*
-         * Grab permission bytes from guest space. Inaccessible bytes are
-         * read as 0xff (no access allowed).
-         */
-        TOGGLE_MODE();
-        switch ( __copy_from_guest_offset(x.bytes, v->arch.pv_vcpu.iobmp,
-                                          port>>3, 2) )
-        {
-        default: x.bytes[0] = ~0;
-            /* fallthrough */
-        case 1:  x.bytes[1] = ~0;
-            /* fallthrough */
-        case 0:  break;
-        }
-        TOGGLE_MODE();
-
-        if ( (x.mask & (((1<<bytes)-1) << (port&7))) == 0 )
-            return 1;
-    }
-
-    return 0;
-}
-
-/* Has the administrator granted sufficient permission for this I/O access? */
-static bool_t admin_io_okay(unsigned int port, unsigned int bytes,
-                            const struct domain *d)
-{
-    /*
-     * Port 0xcf8 (CONFIG_ADDRESS) is only visible for DWORD accesses.
-     * We never permit direct access to that register.
-     */
-    if ( (port == 0xcf8) && (bytes == 4) )
-        return 0;
-
-    /* We also never permit direct access to the RTC/CMOS registers. */
-    if ( ((port & ~1) == RTC_PORT(0)) )
-        return 0;
-
-    return ioports_access_permitted(d, port, port + bytes - 1);
-}
-
-static bool_t pci_cfg_ok(struct domain *currd, unsigned int start,
-                         unsigned int size, uint32_t *write)
-{
-    uint32_t machine_bdf;
-
-    if ( !is_hardware_domain(currd) )
-        return 0;
-
-    if ( !CF8_ENABLED(currd->arch.pci_cf8) )
-        return 1;
-
-    machine_bdf = CF8_BDF(currd->arch.pci_cf8);
-    if ( write )
-    {
-        const unsigned long *ro_map = pci_get_ro_map(0);
-
-        if ( ro_map && test_bit(machine_bdf, ro_map) )
-            return 0;
-    }
-    start |= CF8_ADDR_LO(currd->arch.pci_cf8);
-    /* AMD extended configuration space access? */
-    if ( CF8_ADDR_HI(currd->arch.pci_cf8) &&
-         boot_cpu_data.x86_vendor == X86_VENDOR_AMD &&
-         boot_cpu_data.x86 >= 0x10 && boot_cpu_data.x86 <= 0x17 )
-    {
-        uint64_t msr_val;
-
-        if ( rdmsr_safe(MSR_AMD64_NB_CFG, msr_val) )
-            return 0;
-        if ( msr_val & (1ULL << AMD64_NB_CFG_CF8_EXT_ENABLE_BIT) )
-            start |= CF8_ADDR_HI(currd->arch.pci_cf8);
-    }
-
-    return !write ?
-           xsm_pci_config_permission(XSM_HOOK, currd, machine_bdf,
-                                     start, start + size - 1, 0) == 0 :
-           pci_conf_write_intercept(0, machine_bdf, start, size, write) >= 0;
-}
-
-uint32_t guest_io_read(unsigned int port, unsigned int bytes,
-                       struct domain *currd)
-{
-    uint32_t data = 0;
-    unsigned int shift = 0;
-
-    if ( admin_io_okay(port, bytes, currd) )
-    {
-        switch ( bytes )
-        {
-        case 1: return inb(port);
-        case 2: return inw(port);
-        case 4: return inl(port);
-        }
-    }
-
-    while ( bytes != 0 )
-    {
-        unsigned int size = 1;
-        uint32_t sub_data = ~0;
-
-        if ( (port == 0x42) || (port == 0x43) || (port == 0x61) )
-        {
-            sub_data = pv_pit_handler(port, 0, 0);
-        }
-        else if ( port == RTC_PORT(0) )
-        {
-            sub_data = currd->arch.cmos_idx;
-        }
-        else if ( (port == RTC_PORT(1)) &&
-                  ioports_access_permitted(currd, RTC_PORT(0), RTC_PORT(1)) )
-        {
-            unsigned long flags;
-
-            spin_lock_irqsave(&rtc_lock, flags);
-            outb(currd->arch.cmos_idx & 0x7f, RTC_PORT(0));
-            sub_data = inb(RTC_PORT(1));
-            spin_unlock_irqrestore(&rtc_lock, flags);
-        }
-        else if ( (port == 0xcf8) && (bytes == 4) )
-        {
-            size = 4;
-            sub_data = currd->arch.pci_cf8;
-        }
-        else if ( (port & 0xfffc) == 0xcfc )
-        {
-            size = min(bytes, 4 - (port & 3));
-            if ( size == 3 )
-                size = 2;
-            if ( pci_cfg_ok(currd, port & 3, size, NULL) )
-                sub_data = pci_conf_read(currd->arch.pci_cf8, port & 3, size);
-        }
-
-        if ( size == 4 )
-            return sub_data;
-
-        data |= (sub_data & ((1u << (size * 8)) - 1)) << shift;
-        shift += size * 8;
-        port += size;
-        bytes -= size;
-    }
-
-    return data;
-}
-
-void guest_io_write(unsigned int port, unsigned int bytes, uint32_t data,
-                    struct domain *currd)
-{
-    if ( admin_io_okay(port, bytes, currd) )
-    {
-        switch ( bytes ) {
-        case 1:
-            outb((uint8_t)data, port);
-            if ( pv_post_outb_hook )
-                pv_post_outb_hook(port, (uint8_t)data);
-            break;
-        case 2:
-            outw((uint16_t)data, port);
-            break;
-        case 4:
-            outl(data, port);
-            break;
-        }
-        return;
-    }
-
-    while ( bytes != 0 )
-    {
-        unsigned int size = 1;
-
-        if ( (port == 0x42) || (port == 0x43) || (port == 0x61) )
-        {
-            pv_pit_handler(port, (uint8_t)data, 1);
-        }
-        else if ( port == RTC_PORT(0) )
-        {
-            currd->arch.cmos_idx = data;
-        }
-        else if ( (port == RTC_PORT(1)) &&
-                  ioports_access_permitted(currd, RTC_PORT(0), RTC_PORT(1)) )
-        {
-            unsigned long flags;
-
-            if ( pv_rtc_handler )
-                pv_rtc_handler(currd->arch.cmos_idx & 0x7f, data);
-            spin_lock_irqsave(&rtc_lock, flags);
-            outb(currd->arch.cmos_idx & 0x7f, RTC_PORT(0));
-            outb(data, RTC_PORT(1));
-            spin_unlock_irqrestore(&rtc_lock, flags);
-        }
-        else if ( (port == 0xcf8) && (bytes == 4) )
-        {
-            size = 4;
-            currd->arch.pci_cf8 = data;
-        }
-        else if ( (port & 0xfffc) == 0xcfc )
-        {
-            size = min(bytes, 4 - (port & 3));
-            if ( size == 3 )
-                size = 2;
-            if ( pci_cfg_ok(currd, port & 3, size, &data) )
-                pci_conf_write(currd->arch.pci_cf8, port & 3, size, data);
-        }
-
-        if ( size == 4 )
-            return;
-
-        port += size;
-        bytes -= size;
-        data >>= size * 8;
-    }
-}
-
-/* I/O emulation support. Helper routines for, and type of, the stack stub.*/
-void host_to_guest_gpr_switch(struct cpu_user_regs *);
-unsigned long guest_to_host_gpr_switch(unsigned long);
-
-void (*pv_post_outb_hook)(unsigned int port, u8 value);
-
-typedef void io_emul_stub_t(struct cpu_user_regs *);
-
-static io_emul_stub_t *io_emul_stub_setup(struct priv_op_ctxt *ctxt, u8 opcode,
-                                          unsigned int port, unsigned int bytes)
-{
-    if ( !ctxt->io_emul_stub )
-        ctxt->io_emul_stub = map_domain_page(_mfn(this_cpu(stubs.mfn))) +
-                                             (this_cpu(stubs.addr) &
-                                              ~PAGE_MASK) +
-                                             STUB_BUF_SIZE / 2;
-
-    /* movq $host_to_guest_gpr_switch,%rcx */
-    ctxt->io_emul_stub[0] = 0x48;
-    ctxt->io_emul_stub[1] = 0xb9;
-    *(void **)&ctxt->io_emul_stub[2] = (void *)host_to_guest_gpr_switch;
-    /* callq *%rcx */
-    ctxt->io_emul_stub[10] = 0xff;
-    ctxt->io_emul_stub[11] = 0xd1;
-    /* data16 or nop */
-    ctxt->io_emul_stub[12] = (bytes != 2) ? 0x90 : 0x66;
-    /* <io-access opcode> */
-    ctxt->io_emul_stub[13] = opcode;
-    /* imm8 or nop */
-    ctxt->io_emul_stub[14] = !(opcode & 8) ? port : 0x90;
-    /* ret (jumps to guest_to_host_gpr_switch) */
-    ctxt->io_emul_stub[15] = 0xc3;
-    BUILD_BUG_ON(STUB_BUF_SIZE / 2 < 16);
-
-    if ( ioemul_handle_quirk )
-        ioemul_handle_quirk(opcode, &ctxt->io_emul_stub[12], ctxt->ctxt.regs);
-
-    /* Handy function-typed pointer to the stub. */
-    return (void *)(this_cpu(stubs.addr) + STUB_BUF_SIZE / 2);
-}
-
-static int priv_op_read_io(unsigned int port, unsigned int bytes,
-                           unsigned long *val, struct x86_emulate_ctxt *ctxt)
-{
-    struct priv_op_ctxt *poc = container_of(ctxt, struct priv_op_ctxt, ctxt);
-    struct vcpu *curr = current;
-    struct domain *currd = current->domain;
-
-    /* INS must not come here. */
-    ASSERT((ctxt->opcode & ~9) == 0xe4);
-
-    if ( !guest_io_okay(port, bytes, curr, ctxt->regs) )
-        return X86EMUL_UNHANDLEABLE;
-
-    poc->bpmatch = check_guest_io_breakpoint(curr, port, bytes);
-
-    if ( admin_io_okay(port, bytes, currd) )
-    {
-        io_emul_stub_t *io_emul =
-            io_emul_stub_setup(poc, ctxt->opcode, port, bytes);
-
-        mark_regs_dirty(ctxt->regs);
-        io_emul(ctxt->regs);
-        return X86EMUL_DONE;
-    }
-
-    *val = guest_io_read(port, bytes, currd);
-
-    return X86EMUL_OKAY;
-}
-
-static int priv_op_write_io(unsigned int port, unsigned int bytes,
-                            unsigned long val, struct x86_emulate_ctxt *ctxt)
-{
-    struct priv_op_ctxt *poc = container_of(ctxt, struct priv_op_ctxt, ctxt);
-    struct vcpu *curr = current;
-    struct domain *currd = current->domain;
-
-    /* OUTS must not come here. */
-    ASSERT((ctxt->opcode & ~9) == 0xe6);
-
-    if ( !guest_io_okay(port, bytes, curr, ctxt->regs) )
-        return X86EMUL_UNHANDLEABLE;
-
-    poc->bpmatch = check_guest_io_breakpoint(curr, port, bytes);
-
-    if ( admin_io_okay(port, bytes, currd) )
-    {
-        io_emul_stub_t *io_emul =
-            io_emul_stub_setup(poc, ctxt->opcode, port, bytes);
-
-        mark_regs_dirty(ctxt->regs);
-        io_emul(ctxt->regs);
-        if ( (bytes == 1) && pv_post_outb_hook )
-            pv_post_outb_hook(port, val);
-        return X86EMUL_DONE;
-    }
-
-    guest_io_write(port, bytes, val, currd);
-
-    return X86EMUL_OKAY;
-}
-
-static int priv_op_rep_ins(uint16_t port,
-                           enum x86_segment seg, unsigned long offset,
-                           unsigned int bytes_per_rep, unsigned long *reps,
-                           struct x86_emulate_ctxt *ctxt)
-{
-    struct priv_op_ctxt *poc = container_of(ctxt, struct priv_op_ctxt, ctxt);
-    struct vcpu *curr = current;
-    struct domain *currd = current->domain;
-    unsigned long goal = *reps;
-    struct segment_register sreg;
-    int rc;
-
-    ASSERT(seg == x86_seg_es);
-
-    *reps = 0;
-
-    if ( !guest_io_okay(port, bytes_per_rep, curr, ctxt->regs) )
-        return X86EMUL_UNHANDLEABLE;
-
-    rc = priv_op_read_segment(x86_seg_es, &sreg, ctxt);
-    if ( rc != X86EMUL_OKAY )
-        return rc;
-
-    if ( !sreg.attr.fields.p )
-        return X86EMUL_UNHANDLEABLE;
-    if ( !sreg.attr.fields.s ||
-         (sreg.attr.fields.type & (_SEGMENT_CODE >> 8)) ||
-         !(sreg.attr.fields.type & (_SEGMENT_WR >> 8)) )
-    {
-        x86_emul_hw_exception(TRAP_gp_fault, 0, ctxt);
-        return X86EMUL_EXCEPTION;
-    }
-
-    poc->bpmatch = check_guest_io_breakpoint(curr, port, bytes_per_rep);
-
-    while ( *reps < goal )
-    {
-        unsigned int data = guest_io_read(port, bytes_per_rep, currd);
-        unsigned long addr;
-
-        rc = pv_emul_virt_to_linear(sreg.base, offset, bytes_per_rep,
-                                    sreg.limit, x86_seg_es, ctxt, &addr);
-        if ( rc != X86EMUL_OKAY )
-            return rc;
-
-        if ( (rc = __copy_to_user((void *)addr, &data, bytes_per_rep)) != 0 )
-        {
-            x86_emul_pagefault(PFEC_write_access,
-                               addr + bytes_per_rep - rc, ctxt);
-            return X86EMUL_EXCEPTION;
-        }
-
-        ++*reps;
-
-        if ( poc->bpmatch || hypercall_preempt_check() )
-            break;
-
-        /* x86_emulate() clips the repetition count to ensure we don't wrap. */
-        if ( unlikely(ctxt->regs->eflags & X86_EFLAGS_DF) )
-            offset -= bytes_per_rep;
-        else
-            offset += bytes_per_rep;
-    }
-
-    return X86EMUL_OKAY;
-}
-
-static int priv_op_rep_outs(enum x86_segment seg, unsigned long offset,
-                            uint16_t port,
-                            unsigned int bytes_per_rep, unsigned long *reps,
-                            struct x86_emulate_ctxt *ctxt)
-{
-    struct priv_op_ctxt *poc = container_of(ctxt, struct priv_op_ctxt, ctxt);
-    struct vcpu *curr = current;
-    struct domain *currd = current->domain;
-    unsigned long goal = *reps;
-    struct segment_register sreg;
-    int rc;
-
-    *reps = 0;
-
-    if ( !guest_io_okay(port, bytes_per_rep, curr, ctxt->regs) )
-        return X86EMUL_UNHANDLEABLE;
-
-    rc = priv_op_read_segment(seg, &sreg, ctxt);
-    if ( rc != X86EMUL_OKAY )
-        return rc;
-
-    if ( !sreg.attr.fields.p )
-        return X86EMUL_UNHANDLEABLE;
-    if ( !sreg.attr.fields.s ||
-         ((sreg.attr.fields.type & (_SEGMENT_CODE >> 8)) &&
-          !(sreg.attr.fields.type & (_SEGMENT_WR >> 8))) )
-    {
-        x86_emul_hw_exception(seg != x86_seg_ss ? TRAP_gp_fault
-                                                : TRAP_stack_error,
-                              0, ctxt);
-        return X86EMUL_EXCEPTION;
-    }
-
-    poc->bpmatch = check_guest_io_breakpoint(curr, port, bytes_per_rep);
-
-    while ( *reps < goal )
-    {
-        unsigned int data = 0;
-        unsigned long addr;
-
-        rc = pv_emul_virt_to_linear(sreg.base, offset, bytes_per_rep,
-                                    sreg.limit, seg, ctxt, &addr);
-        if ( rc != X86EMUL_OKAY )
-            return rc;
-
-        if ( (rc = __copy_from_user(&data, (void *)addr, bytes_per_rep)) != 0 )
-        {
-            x86_emul_pagefault(0, addr + bytes_per_rep - rc, ctxt);
-            return X86EMUL_EXCEPTION;
-        }
-
-        guest_io_write(port, bytes_per_rep, data, currd);
-
-        ++*reps;
-
-        if ( poc->bpmatch || hypercall_preempt_check() )
-            break;
-
-        /* x86_emulate() clips the repetition count to ensure we don't wrap. */
-        if ( unlikely(ctxt->regs->eflags & X86_EFLAGS_DF) )
-            offset -= bytes_per_rep;
-        else
-            offset += bytes_per_rep;
-    }
-
-    return X86EMUL_OKAY;
-}
-
-static int priv_op_read_cr(unsigned int reg, unsigned long *val,
-                           struct x86_emulate_ctxt *ctxt)
-{
-    const struct vcpu *curr = current;
-
-    switch ( reg )
-    {
-    case 0: /* Read CR0 */
-        *val = (read_cr0() & ~X86_CR0_TS) | curr->arch.pv_vcpu.ctrlreg[0];
-        return X86EMUL_OKAY;
-
-    case 2: /* Read CR2 */
-    case 4: /* Read CR4 */
-        *val = curr->arch.pv_vcpu.ctrlreg[reg];
-        return X86EMUL_OKAY;
-
-    case 3: /* Read CR3 */
-    {
-        const struct domain *currd = curr->domain;
-        unsigned long mfn;
-
-        if ( !is_pv_32bit_domain(currd) )
-        {
-            mfn = pagetable_get_pfn(curr->arch.guest_table);
-            *val = xen_pfn_to_cr3(mfn_to_gmfn(currd, mfn));
-        }
-        else
-        {
-            l4_pgentry_t *pl4e =
-                map_domain_page(_mfn(pagetable_get_pfn(curr->arch.guest_table)));
-
-            mfn = l4e_get_pfn(*pl4e);
-            unmap_domain_page(pl4e);
-            *val = compat_pfn_to_cr3(mfn_to_gmfn(currd, mfn));
-        }
-        /* PTs should not be shared */
-        BUG_ON(page_get_owner(mfn_to_page(mfn)) == dom_cow);
-        return X86EMUL_OKAY;
-    }
-    }
-
-    return X86EMUL_UNHANDLEABLE;
-}
-
-static int priv_op_write_cr(unsigned int reg, unsigned long val,
-                            struct x86_emulate_ctxt *ctxt)
-{
-    struct vcpu *curr = current;
-
-    switch ( reg )
-    {
-    case 0: /* Write CR0 */
-        if ( (val ^ read_cr0()) & ~X86_CR0_TS )
-        {
-            gdprintk(XENLOG_WARNING,
-                    "Attempt to change unmodifiable CR0 flags\n");
-            break;
-        }
-        do_fpu_taskswitch(!!(val & X86_CR0_TS));
-        return X86EMUL_OKAY;
-
-    case 2: /* Write CR2 */
-        curr->arch.pv_vcpu.ctrlreg[2] = val;
-        arch_set_cr2(curr, val);
-        return X86EMUL_OKAY;
-
-    case 3: /* Write CR3 */
-    {
-        struct domain *currd = curr->domain;
-        unsigned long gfn;
-        struct page_info *page;
-        int rc;
-
-        gfn = !is_pv_32bit_domain(currd)
-              ? xen_cr3_to_pfn(val) : compat_cr3_to_pfn(val);
-        page = get_page_from_gfn(currd, gfn, NULL, P2M_ALLOC);
-        if ( !page )
-            break;
-        rc = new_guest_cr3(page_to_mfn(page));
-        put_page(page);
-
-        switch ( rc )
-        {
-        case 0:
-            return X86EMUL_OKAY;
-        case -ERESTART: /* retry after preemption */
-            return X86EMUL_RETRY;
-        }
-        break;
-    }
-
-    case 4: /* Write CR4 */
-        curr->arch.pv_vcpu.ctrlreg[4] = pv_guest_cr4_fixup(curr, val);
-        write_cr4(pv_guest_cr4_to_real_cr4(curr));
-        ctxt_switch_levelling(curr);
-        return X86EMUL_OKAY;
-    }
-
-    return X86EMUL_UNHANDLEABLE;
-}
-
-static int priv_op_read_dr(unsigned int reg, unsigned long *val,
-                           struct x86_emulate_ctxt *ctxt)
-{
-    unsigned long res = do_get_debugreg(reg);
-
-    if ( IS_ERR_VALUE(res) )
-        return X86EMUL_UNHANDLEABLE;
-
-    *val = res;
-
-    return X86EMUL_OKAY;
-}
-
-static int priv_op_write_dr(unsigned int reg, unsigned long val,
-                            struct x86_emulate_ctxt *ctxt)
-{
-    return do_set_debugreg(reg, val) == 0
-           ? X86EMUL_OKAY : X86EMUL_UNHANDLEABLE;
-}
-
-static inline uint64_t guest_misc_enable(uint64_t val)
-{
-    val &= ~(MSR_IA32_MISC_ENABLE_PERF_AVAIL |
-             MSR_IA32_MISC_ENABLE_MONITOR_ENABLE);
-    val |= MSR_IA32_MISC_ENABLE_BTS_UNAVAIL |
-           MSR_IA32_MISC_ENABLE_PEBS_UNAVAIL |
-           MSR_IA32_MISC_ENABLE_XTPR_DISABLE;
-    return val;
-}
-
-static inline bool is_cpufreq_controller(const struct domain *d)
-{
-    return ((cpufreq_controller == FREQCTL_dom0_kernel) &&
-            is_hardware_domain(d));
-}
-
-static int priv_op_read_msr(unsigned int reg, uint64_t *val,
-                            struct x86_emulate_ctxt *ctxt)
-{
-    struct priv_op_ctxt *poc = container_of(ctxt, struct priv_op_ctxt, ctxt);
-    const struct vcpu *curr = current;
-    const struct domain *currd = curr->domain;
-    bool vpmu_msr = false;
-
-    switch ( reg )
-    {
-        int rc;
-
-    case MSR_FS_BASE:
-        if ( is_pv_32bit_domain(currd) )
-            break;
-        *val = cpu_has_fsgsbase ? __rdfsbase() : curr->arch.pv_vcpu.fs_base;
-        return X86EMUL_OKAY;
-
-    case MSR_GS_BASE:
-        if ( is_pv_32bit_domain(currd) )
-            break;
-        *val = cpu_has_fsgsbase ? __rdgsbase()
-                                : curr->arch.pv_vcpu.gs_base_kernel;
-        return X86EMUL_OKAY;
-
-    case MSR_SHADOW_GS_BASE:
-        if ( is_pv_32bit_domain(currd) )
-            break;
-        *val = curr->arch.pv_vcpu.gs_base_user;
-        return X86EMUL_OKAY;
-
-    /*
-     * In order to fully retain original behavior, defer calling
-     * pv_soft_rdtsc() until after emulation. This may want/need to be
-     * reconsidered.
-     */
-    case MSR_IA32_TSC:
-        poc->tsc |= TSC_BASE;
-        goto normal;
-
-    case MSR_TSC_AUX:
-        poc->tsc |= TSC_AUX;
-        if ( cpu_has_rdtscp )
-            goto normal;
-        *val = 0;
-        return X86EMUL_OKAY;
-
-    case MSR_EFER:
-        *val = read_efer();
-        if ( is_pv_32bit_domain(currd) )
-            *val &= ~(EFER_LME | EFER_LMA | EFER_LMSLE);
-        return X86EMUL_OKAY;
-
-    case MSR_K7_FID_VID_CTL:
-    case MSR_K7_FID_VID_STATUS:
-    case MSR_K8_PSTATE_LIMIT:
-    case MSR_K8_PSTATE_CTRL:
-    case MSR_K8_PSTATE_STATUS:
-    case MSR_K8_PSTATE0:
-    case MSR_K8_PSTATE1:
-    case MSR_K8_PSTATE2:
-    case MSR_K8_PSTATE3:
-    case MSR_K8_PSTATE4:
-    case MSR_K8_PSTATE5:
-    case MSR_K8_PSTATE6:
-    case MSR_K8_PSTATE7:
-        if ( boot_cpu_data.x86_vendor != X86_VENDOR_AMD )
-            break;
-        if ( unlikely(is_cpufreq_controller(currd)) )
-            goto normal;
-        *val = 0;
-        return X86EMUL_OKAY;
-
-    case MSR_IA32_UCODE_REV:
-        BUILD_BUG_ON(MSR_IA32_UCODE_REV != MSR_AMD_PATCHLEVEL);
-        if ( boot_cpu_data.x86_vendor == X86_VENDOR_INTEL )
-        {
-            if ( wrmsr_safe(MSR_IA32_UCODE_REV, 0) )
-                break;
-            /* As documented in the SDM: Do a CPUID 1 here */
-            cpuid_eax(1);
-        }
-        goto normal;
-
-    case MSR_IA32_MISC_ENABLE:
-        if ( rdmsr_safe(reg, *val) )
-            break;
-        *val = guest_misc_enable(*val);
-        return X86EMUL_OKAY;
-
-    case MSR_AMD64_DR0_ADDRESS_MASK:
-        if ( !boot_cpu_has(X86_FEATURE_DBEXT) )
-            break;
-        *val = curr->arch.pv_vcpu.dr_mask[0];
-        return X86EMUL_OKAY;
-
-    case MSR_AMD64_DR1_ADDRESS_MASK ... MSR_AMD64_DR3_ADDRESS_MASK:
-        if ( !boot_cpu_has(X86_FEATURE_DBEXT) )
-            break;
-        *val = curr->arch.pv_vcpu.dr_mask[reg - MSR_AMD64_DR1_ADDRESS_MASK + 1];
-        return X86EMUL_OKAY;
-
-    case MSR_IA32_PERF_CAPABILITIES:
-        /* No extra capabilities are supported. */
-        *val = 0;
-        return X86EMUL_OKAY;
-
-    case MSR_INTEL_PLATFORM_INFO:
-        if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL ||
-             rdmsr_safe(MSR_INTEL_PLATFORM_INFO, *val) )
-            break;
-        *val = 0;
-        if ( this_cpu(cpuid_faulting_enabled) )
-            *val |= MSR_PLATFORM_INFO_CPUID_FAULTING;
-        return X86EMUL_OKAY;
-
-    case MSR_INTEL_MISC_FEATURES_ENABLES:
-        if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL ||
-             rdmsr_safe(MSR_INTEL_MISC_FEATURES_ENABLES, *val) )
-            break;
-        *val = 0;
-        if ( curr->arch.cpuid_faulting )
-            *val |= MSR_MISC_FEATURES_CPUID_FAULTING;
-        return X86EMUL_OKAY;
-
-    case MSR_P6_PERFCTR(0)...MSR_P6_PERFCTR(7):
-    case MSR_P6_EVNTSEL(0)...MSR_P6_EVNTSEL(3):
-    case MSR_CORE_PERF_FIXED_CTR0...MSR_CORE_PERF_FIXED_CTR2:
-    case MSR_CORE_PERF_FIXED_CTR_CTRL...MSR_CORE_PERF_GLOBAL_OVF_CTRL:
-        if ( boot_cpu_data.x86_vendor == X86_VENDOR_INTEL )
-        {
-            vpmu_msr = true;
-            /* fall through */
-    case MSR_AMD_FAM15H_EVNTSEL0...MSR_AMD_FAM15H_PERFCTR5:
-    case MSR_K7_EVNTSEL0...MSR_K7_PERFCTR3:
-            if ( vpmu_msr || (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) )
-            {
-                if ( vpmu_do_rdmsr(reg, val) )
-                    break;
-                return X86EMUL_OKAY;
-            }
-        }
-        /* fall through */
-    default:
-        if ( rdmsr_hypervisor_regs(reg, val) )
-            return X86EMUL_OKAY;
-
-        rc = vmce_rdmsr(reg, val);
-        if ( rc < 0 )
-            break;
-        if ( rc )
-            return X86EMUL_OKAY;
-        /* fall through */
-    normal:
-        /* Everyone can read the MSR space. */
-        /* gdprintk(XENLOG_WARNING, "Domain attempted RDMSR %08x\n", reg); */
-        if ( rdmsr_safe(reg, *val) )
-            break;
-        return X86EMUL_OKAY;
-    }
-
-    return X86EMUL_UNHANDLEABLE;
-}
-
-#include "x86_64/mmconfig.h"
-
-static int priv_op_write_msr(unsigned int reg, uint64_t val,
-                             struct x86_emulate_ctxt *ctxt)
-{
-    struct vcpu *curr = current;
-    const struct domain *currd = curr->domain;
-    bool vpmu_msr = false;
-
-    switch ( reg )
-    {
-        uint64_t temp;
-        int rc;
-
-    case MSR_FS_BASE:
-        if ( is_pv_32bit_domain(currd) || !is_canonical_address(val) )
-            break;
-        wrfsbase(val);
-        curr->arch.pv_vcpu.fs_base = val;
-        return X86EMUL_OKAY;
-
-    case MSR_GS_BASE:
-        if ( is_pv_32bit_domain(currd) || !is_canonical_address(val) )
-            break;
-        wrgsbase(val);
-        curr->arch.pv_vcpu.gs_base_kernel = val;
-        return X86EMUL_OKAY;
-
-    case MSR_SHADOW_GS_BASE:
-        if ( is_pv_32bit_domain(currd) || !is_canonical_address(val) )
-            break;
-        wrmsrl(MSR_SHADOW_GS_BASE, val);
-        curr->arch.pv_vcpu.gs_base_user = val;
-        return X86EMUL_OKAY;
-
-    case MSR_K7_FID_VID_STATUS:
-    case MSR_K7_FID_VID_CTL:
-    case MSR_K8_PSTATE_LIMIT:
-    case MSR_K8_PSTATE_CTRL:
-    case MSR_K8_PSTATE_STATUS:
-    case MSR_K8_PSTATE0:
-    case MSR_K8_PSTATE1:
-    case MSR_K8_PSTATE2:
-    case MSR_K8_PSTATE3:
-    case MSR_K8_PSTATE4:
-    case MSR_K8_PSTATE5:
-    case MSR_K8_PSTATE6:
-    case MSR_K8_PSTATE7:
-    case MSR_K8_HWCR:
-        if ( boot_cpu_data.x86_vendor != X86_VENDOR_AMD )
-            break;
-        if ( likely(!is_cpufreq_controller(currd)) ||
-             wrmsr_safe(reg, val) == 0 )
-            return X86EMUL_OKAY;
-        break;
-
-    case MSR_AMD64_NB_CFG:
-        if ( boot_cpu_data.x86_vendor != X86_VENDOR_AMD ||
-             boot_cpu_data.x86 < 0x10 || boot_cpu_data.x86 > 0x17 )
-            break;
-        if ( !is_hardware_domain(currd) || !is_pinned_vcpu(curr) )
-            return X86EMUL_OKAY;
-        if ( (rdmsr_safe(MSR_AMD64_NB_CFG, temp) != 0) ||
-             ((val ^ temp) & ~(1ULL << AMD64_NB_CFG_CF8_EXT_ENABLE_BIT)) )
-            goto invalid;
-        if ( wrmsr_safe(MSR_AMD64_NB_CFG, val) == 0 )
-            return X86EMUL_OKAY;
-        break;
-
-    case MSR_FAM10H_MMIO_CONF_BASE:
-        if ( boot_cpu_data.x86_vendor != X86_VENDOR_AMD ||
-             boot_cpu_data.x86 < 0x10 || boot_cpu_data.x86 > 0x17 )
-            break;
-        if ( !is_hardware_domain(currd) || !is_pinned_vcpu(curr) )
-            return X86EMUL_OKAY;
-        if ( rdmsr_safe(MSR_FAM10H_MMIO_CONF_BASE, temp) != 0 )
-            break;
-        if ( (pci_probe & PCI_PROBE_MASK) == PCI_PROBE_MMCONF ?
-             temp != val :
-             ((temp ^ val) &
-              ~(FAM10H_MMIO_CONF_ENABLE |
-                (FAM10H_MMIO_CONF_BUSRANGE_MASK <<
-                 FAM10H_MMIO_CONF_BUSRANGE_SHIFT) |
-                ((u64)FAM10H_MMIO_CONF_BASE_MASK <<
-                 FAM10H_MMIO_CONF_BASE_SHIFT))) )
-            goto invalid;
-        if ( wrmsr_safe(MSR_FAM10H_MMIO_CONF_BASE, val) == 0 )
-            return X86EMUL_OKAY;
-        break;
-
-    case MSR_IA32_UCODE_REV:
-        if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL )
-            break;
-        if ( !is_hardware_domain(currd) || !is_pinned_vcpu(curr) )
-            return X86EMUL_OKAY;
-        if ( rdmsr_safe(reg, temp) )
-            break;
-        if ( val )
-            goto invalid;
-        return X86EMUL_OKAY;
-
-    case MSR_IA32_MISC_ENABLE:
-        if ( rdmsr_safe(reg, temp) )
-            break;
-        if ( val != guest_misc_enable(temp) )
-            goto invalid;
-        return X86EMUL_OKAY;
-
-    case MSR_IA32_MPERF:
-    case MSR_IA32_APERF:
-        if ( (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL) &&
-             (boot_cpu_data.x86_vendor != X86_VENDOR_AMD) )
-            break;
-        if ( likely(!is_cpufreq_controller(currd)) ||
-             wrmsr_safe(reg, val) == 0 )
-            return X86EMUL_OKAY;
-        break;
-
-    case MSR_IA32_PERF_CTL:
-        if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL )
-            break;
-        if ( likely(!is_cpufreq_controller(currd)) ||
-             wrmsr_safe(reg, val) == 0 )
-            return X86EMUL_OKAY;
-        break;
-
-    case MSR_IA32_THERM_CONTROL:
-    case MSR_IA32_ENERGY_PERF_BIAS:
-        if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL )
-            break;
-        if ( !is_hardware_domain(currd) || !is_pinned_vcpu(curr) ||
-             wrmsr_safe(reg, val) == 0 )
-            return X86EMUL_OKAY;
-        break;
-
-    case MSR_AMD64_DR0_ADDRESS_MASK:
-        if ( !boot_cpu_has(X86_FEATURE_DBEXT) || (val >> 32) )
-            break;
-        curr->arch.pv_vcpu.dr_mask[0] = val;
-        if ( curr->arch.debugreg[7] & DR7_ACTIVE_MASK )
-            wrmsrl(MSR_AMD64_DR0_ADDRESS_MASK, val);
-        return X86EMUL_OKAY;
-
-    case MSR_AMD64_DR1_ADDRESS_MASK ... MSR_AMD64_DR3_ADDRESS_MASK:
-        if ( !boot_cpu_has(X86_FEATURE_DBEXT) || (val >> 32) )
-            break;
-        curr->arch.pv_vcpu.dr_mask[reg - MSR_AMD64_DR1_ADDRESS_MASK + 1] = val;
-        if ( curr->arch.debugreg[7] & DR7_ACTIVE_MASK )
-            wrmsrl(reg, val);
-        return X86EMUL_OKAY;
-
-    case MSR_INTEL_PLATFORM_INFO:
-        if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL ||
-             val || rdmsr_safe(MSR_INTEL_PLATFORM_INFO, val) )
-            break;
-        return X86EMUL_OKAY;
-
-    case MSR_INTEL_MISC_FEATURES_ENABLES:
-        if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL ||
-             (val & ~MSR_MISC_FEATURES_CPUID_FAULTING) ||
-             rdmsr_safe(MSR_INTEL_MISC_FEATURES_ENABLES, temp) )
-            break;
-        if ( (val & MSR_MISC_FEATURES_CPUID_FAULTING) &&
-             !this_cpu(cpuid_faulting_enabled) )
-            break;
-        curr->arch.cpuid_faulting = !!(val & MSR_MISC_FEATURES_CPUID_FAULTING);
-        return X86EMUL_OKAY;
-
-    case MSR_P6_PERFCTR(0)...MSR_P6_PERFCTR(7):
-    case MSR_P6_EVNTSEL(0)...MSR_P6_EVNTSEL(3):
-    case MSR_CORE_PERF_FIXED_CTR0...MSR_CORE_PERF_FIXED_CTR2:
-    case MSR_CORE_PERF_FIXED_CTR_CTRL...MSR_CORE_PERF_GLOBAL_OVF_CTRL:
-        if ( boot_cpu_data.x86_vendor == X86_VENDOR_INTEL )
-        {
-            vpmu_msr = true;
-    case MSR_AMD_FAM15H_EVNTSEL0...MSR_AMD_FAM15H_PERFCTR5:
-    case MSR_K7_EVNTSEL0...MSR_K7_PERFCTR3:
-            if ( vpmu_msr || (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) )
-            {
-                if ( (vpmu_mode & XENPMU_MODE_ALL) &&
-                     !is_hardware_domain(currd) )
-                    return X86EMUL_OKAY;
-
-                if ( vpmu_do_wrmsr(reg, val, 0) )
-                    break;
-                return X86EMUL_OKAY;
-            }
-        }
-        /* fall through */
-    default:
-        if ( wrmsr_hypervisor_regs(reg, val) == 1 )
-            return X86EMUL_OKAY;
-
-        rc = vmce_wrmsr(reg, val);
-        if ( rc < 0 )
-            break;
-        if ( rc )
-            return X86EMUL_OKAY;
-
-        if ( (rdmsr_safe(reg, temp) != 0) || (val != temp) )
-    invalid:
-            gdprintk(XENLOG_WARNING,
-                     "Domain attempted WRMSR %08x from 0x%016"PRIx64" to 0x%016"PRIx64"\n",
-                     reg, temp, val);
-        return X86EMUL_OKAY;
-    }
-
-    return X86EMUL_UNHANDLEABLE;
-}
-
-static int priv_op_wbinvd(struct x86_emulate_ctxt *ctxt)
-{
-    /* Ignore the instruction if unprivileged. */
-    if ( !cache_flush_permitted(current->domain) )
-        /*
-         * Non-physdev domain attempted WBINVD; ignore for now since
-         * newer linux uses this in some start-of-day timing loops.
-         */
-        ;
-    else
-        wbinvd();
-
-    return X86EMUL_OKAY;
-}
-
-int pv_emul_cpuid(uint32_t leaf, uint32_t subleaf,
-                  struct cpuid_leaf *res, struct x86_emulate_ctxt *ctxt)
-{
-    guest_cpuid(current, leaf, subleaf, res);
-
-    return X86EMUL_OKAY;
-}
-
-static int priv_op_validate(const struct x86_emulate_state *state,
-                            struct x86_emulate_ctxt *ctxt)
-{
-    switch ( ctxt->opcode )
-    {
-    case 0x6c ... 0x6f: /* ins / outs */
-    case 0xe4 ... 0xe7: /* in / out (immediate port) */
-    case 0xec ... 0xef: /* in / out (port in %dx) */
-    case X86EMUL_OPC(0x0f, 0x06): /* clts */
-    case X86EMUL_OPC(0x0f, 0x09): /* wbinvd */
-    case X86EMUL_OPC(0x0f, 0x20) ...
-         X86EMUL_OPC(0x0f, 0x23): /* mov to/from cr/dr */
-    case X86EMUL_OPC(0x0f, 0x30): /* wrmsr */
-    case X86EMUL_OPC(0x0f, 0x31): /* rdtsc */
-    case X86EMUL_OPC(0x0f, 0x32): /* rdmsr */
-    case X86EMUL_OPC(0x0f, 0xa2): /* cpuid */
-        return X86EMUL_OKAY;
-
-    case 0xfa: case 0xfb: /* cli / sti */
-        if ( !iopl_ok(current, ctxt->regs) )
-            break;
-        /*
-         * This is just too dangerous to allow, in my opinion. Consider if the
-         * caller then tries to reenable interrupts using POPF: we can't trap
-         * that and we'll end up with hard-to-debug lockups. Fast & loose will
-         * do for us. :-)
-        vcpu_info(current, evtchn_upcall_mask) = (ctxt->opcode == 0xfa);
-         */
-        return X86EMUL_DONE;
-
-    case X86EMUL_OPC(0x0f, 0x01):
-    {
-        unsigned int modrm_rm, modrm_reg;
-
-        if ( x86_insn_modrm(state, &modrm_rm, &modrm_reg) != 3 ||
-             (modrm_rm & 7) != 1 )
-            break;
-        switch ( modrm_reg & 7 )
-        {
-        case 2: /* xsetbv */
-        case 7: /* rdtscp */
-            return X86EMUL_OKAY;
-        }
-        break;
-    }
-    }
-
-    return X86EMUL_UNHANDLEABLE;
-}
-
-static const struct x86_emulate_ops priv_op_ops = {
-    .insn_fetch          = priv_op_insn_fetch,
-    .read                = x86emul_unhandleable_rw,
-    .validate            = priv_op_validate,
-    .read_io             = priv_op_read_io,
-    .write_io            = priv_op_write_io,
-    .rep_ins             = priv_op_rep_ins,
-    .rep_outs            = priv_op_rep_outs,
-    .read_segment        = priv_op_read_segment,
-    .read_cr             = priv_op_read_cr,
-    .write_cr            = priv_op_write_cr,
-    .read_dr             = priv_op_read_dr,
-    .write_dr            = priv_op_write_dr,
-    .read_msr            = priv_op_read_msr,
-    .write_msr           = priv_op_write_msr,
-    .cpuid               = pv_emul_cpuid,
-    .wbinvd              = priv_op_wbinvd,
-};
-
-static int emulate_privileged_op(struct cpu_user_regs *regs)
-{
-    struct vcpu *curr = current;
-    struct domain *currd = curr->domain;
-    struct priv_op_ctxt ctxt = {
-        .ctxt.regs = regs,
-        .ctxt.vendor = currd->arch.cpuid->x86_vendor,
-        .ctxt.lma = !is_pv_32bit_domain(currd),
-    };
-    int rc;
-    unsigned int eflags, ar;
-
-    if ( !read_descriptor(regs->cs, curr, &ctxt.cs.base, &ctxt.cs.limit,
-                          &ar, 1) ||
-         !(ar & _SEGMENT_S) ||
-         !(ar & _SEGMENT_P) ||
-         !(ar & _SEGMENT_CODE) )
-        return 0;
-
-    /* Mirror virtualized state into EFLAGS. */
-    ASSERT(regs->eflags & X86_EFLAGS_IF);
-    if ( vcpu_info(curr, evtchn_upcall_mask) )
-        regs->eflags &= ~X86_EFLAGS_IF;
-    else
-        regs->eflags |= X86_EFLAGS_IF;
-    ASSERT(!(regs->eflags & X86_EFLAGS_IOPL));
-    regs->eflags |= curr->arch.pv_vcpu.iopl;
-    eflags = regs->eflags;
-
-    ctxt.ctxt.addr_size = ar & _SEGMENT_L ? 64 : ar & _SEGMENT_DB ? 32 : 16;
-    /* Leave zero in ctxt.ctxt.sp_size, as it's not needed. */
-    rc = x86_emulate(&ctxt.ctxt, &priv_op_ops);
-
-    if ( ctxt.io_emul_stub )
-        unmap_domain_page(ctxt.io_emul_stub);
-
-    /*
-     * Un-mirror virtualized state from EFLAGS.
-     * Nothing we allow to be emulated can change anything other than the
-     * arithmetic bits, and the resume flag.
-     */
-    ASSERT(!((regs->eflags ^ eflags) &
-             ~(X86_EFLAGS_RF | X86_EFLAGS_ARITH_MASK)));
-    regs->eflags |= X86_EFLAGS_IF;
-    regs->eflags &= ~X86_EFLAGS_IOPL;
-
-    switch ( rc )
-    {
-    case X86EMUL_OKAY:
-        if ( ctxt.tsc & TSC_BASE )
-        {
-            if ( ctxt.tsc & TSC_AUX )
-                pv_soft_rdtsc(curr, regs, 1);
-            else if ( currd->arch.vtsc )
-                pv_soft_rdtsc(curr, regs, 0);
-            else
-                msr_split(regs, rdtsc());
-        }
-
-        if ( ctxt.ctxt.retire.singlestep )
-            ctxt.bpmatch |= DR_STEP;
-        if ( ctxt.bpmatch )
-        {
-            curr->arch.debugreg[6] |= ctxt.bpmatch | DR_STATUS_RESERVED_ONE;
-            if ( !(curr->arch.pv_vcpu.trap_bounce.flags & TBF_EXCEPTION) )
-                pv_inject_hw_exception(TRAP_debug, X86_EVENT_NO_EC);
-        }
-        /* fall through */
-    case X86EMUL_RETRY:
-        return EXCRET_fault_fixed;
-
-    case X86EMUL_EXCEPTION:
-        pv_inject_event(&ctxt.ctxt.event);
-        return EXCRET_fault_fixed;
-    }
-
-    return 0;
-}
-
 static inline int check_stack_limit(unsigned int ar, unsigned int limit,
                                     unsigned int esp, unsigned int decr)
 {
diff --git a/xen/include/asm-x86/pv/traps.h b/xen/include/asm-x86/pv/traps.h
new file mode 100644
index 0000000000..32c7bac587
--- /dev/null
+++ b/xen/include/asm-x86/pv/traps.h
@@ -0,0 +1,48 @@
+/*
+ * pv/traps.h
+ *
+ * PV guest traps interface definitions
+ *
+ * Copyright (C) 2017 Wei Liu <wei.liu2@citrix.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms and conditions of the GNU General Public
+ * License, version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef __X86_PV_TRAPS_H__
+#define __X86_PV_TRAPS_H__
+
+#ifdef CONFIG_PV
+
+#include <public/xen.h>
+
+int emulate_privileged_op(struct cpu_user_regs *regs);
+
+#else  /* !CONFIG_PV */
+
+#include <xen/errno.h>
+
+static inline int emulate_privileged_op(struct cpu_user_regs *regs) { return -EOPNOTSUPP; }
+
+#endif /* CONFIG_PV */
+
+#endif /* __X86_PV_TRAPS_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
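
The !CONFIG_PV stub lets common code call the emulator without sprinkling
ifdef-ery around the call sites. As a minimal sketch of such a caller (the
function below is hypothetical and not part of this patch; EXCRET_fault_fixed
is the existing success value returned by emulate_privileged_op()):

    #include <asm/pv/traps.h>

    /* Hypothetical #GP path: try PV privileged-op emulation first. */
    static int try_pv_privop(struct cpu_user_regs *regs)
    {
        /* When !CONFIG_PV the header stub returns -EOPNOTSUPP instead. */
        return emulate_privileged_op(regs) == EXCRET_fault_fixed;
    }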
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH for-next v3 02/22] x86/traps: move gate op emulation code
  2017-05-18 17:09 [PATCH for-next v3 00/22] x86: refactor trap handling code Wei Liu
  2017-05-18 17:09 ` [PATCH for-next v3 01/22] x86/traps: move privilege instruction emulation code Wei Liu
@ 2017-05-18 17:09 ` Wei Liu
  2017-05-29 15:15   ` Jan Beulich
  2017-05-18 17:09 ` [PATCH for-next v3 03/22] x86/traps: move emulate_invalid_rdtscp Wei Liu
                   ` (19 subsequent siblings)
  21 siblings, 1 reply; 65+ messages in thread
From: Wei Liu @ 2017-05-18 17:09 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich

Move the gate op emulation code to pv/emulate.c. Export emulate_gate_op
in pv/traps.h.

Delete the now-unused read_descriptor from x86/traps.c. Duplicate
instruction_done in pv/emulate.c.

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
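For review convenience, the call gate descriptor decoding that moves with
this patch can be summarised as below. This is only a restatement of
read_gate_descriptor(); the masks match the moved code, and desc_hi is a
descriptive stand-in for the `pdesc + 1` read:

    /* Call gate: desc.a is the low descriptor word, desc.b the high one. */
    sel = (desc.a >> 16) & 0x0000fffc;                   /* target selector */
    off = (desc.a & 0x0000ffff) | (desc.b & 0xffff0000); /* entry offset    */
    ar  = desc.b & 0x0000ffff;                           /* access rights   */

    /* 64-bit gates carry the upper offset bits in a second descriptor. */
    off |= (unsigned long)desc_hi.a << 32;

For 32-bit guests a 16-bit (type 0x0400) gate instead truncates the offset
to 0xffff.
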
 xen/arch/x86/pv/emulate.c      | 403 +++++++++++++++++++++++++++++++++++++
 xen/arch/x86/traps.c           | 441 -----------------------------------------
 xen/include/asm-x86/pv/traps.h |   2 +
 3 files changed, 405 insertions(+), 441 deletions(-)

diff --git a/xen/arch/x86/pv/emulate.c b/xen/arch/x86/pv/emulate.c
index fb0d066a3b..6d096b74b2 100644
--- a/xen/arch/x86/pv/emulate.c
+++ b/xen/arch/x86/pv/emulate.c
@@ -45,6 +45,17 @@
  * Helper functions
  */
 
+static void instruction_done(struct cpu_user_regs *regs, unsigned long rip)
+{
+    regs->rip = rip;
+    regs->eflags &= ~X86_EFLAGS_RF;
+    if ( regs->eflags & X86_EFLAGS_TF )
+    {
+        current->arch.debugreg[6] |= DR_STEP | DR_STATUS_RESERVED_ONE;
+        pv_inject_hw_exception(TRAP_debug, X86_EVENT_NO_EC);
+    }
+}
+
 static int read_descriptor(unsigned int sel,
                            const struct vcpu *v,
                            unsigned long *base,
@@ -1459,6 +1470,398 @@ int emulate_privileged_op(struct cpu_user_regs *regs)
     return 0;
 }
 
+/*******************
+ * Gate OP emulation
+ */
+
+
+static int read_gate_descriptor(unsigned int gate_sel,
+                                const struct vcpu *v,
+                                unsigned int *sel,
+                                unsigned long *off,
+                                unsigned int *ar)
+{
+    struct desc_struct desc;
+    const struct desc_struct *pdesc;
+
+
+    pdesc = (const struct desc_struct *)
+        (!(gate_sel & 4) ? GDT_VIRT_START(v) : LDT_VIRT_START(v))
+        + (gate_sel >> 3);
+    if ( (gate_sel < 4) ||
+         ((gate_sel >= FIRST_RESERVED_GDT_BYTE) && !(gate_sel & 4)) ||
+         __get_user(desc, pdesc) )
+        return 0;
+
+    *sel = (desc.a >> 16) & 0x0000fffc;
+    *off = (desc.a & 0x0000ffff) | (desc.b & 0xffff0000);
+    *ar = desc.b & 0x0000ffff;
+
+    /*
+     * check_descriptor() clears the DPL field and stores the
+     * guest requested DPL in the selector's RPL field.
+     */
+    if ( *ar & _SEGMENT_DPL )
+        return 0;
+    *ar |= (desc.a >> (16 - 13)) & _SEGMENT_DPL;
+
+    if ( !is_pv_32bit_vcpu(v) )
+    {
+        if ( (*ar & 0x1f00) != 0x0c00 ||
+             (gate_sel >= FIRST_RESERVED_GDT_BYTE - 8 && !(gate_sel & 4)) ||
+             __get_user(desc, pdesc + 1) ||
+             (desc.b & 0x1f00) )
+            return 0;
+
+        *off |= (unsigned long)desc.a << 32;
+        return 1;
+    }
+
+    switch ( *ar & 0x1f00 )
+    {
+    case 0x0400:
+        *off &= 0xffff;
+        break;
+    case 0x0c00:
+        break;
+    default:
+        return 0;
+    }
+
+    return 1;
+}
+
+static inline int check_stack_limit(unsigned int ar, unsigned int limit,
+                                    unsigned int esp, unsigned int decr)
+{
+    return (((esp - decr) < (esp - 1)) &&
+            (!(ar & _SEGMENT_EC) ? (esp - 1) <= limit : (esp - decr) > limit));
+}
+
+struct gate_op_ctxt {
+    struct x86_emulate_ctxt ctxt;
+    struct {
+        unsigned long base, limit;
+    } cs;
+    bool insn_fetch;
+};
+
+static int gate_op_read(
+    enum x86_segment seg,
+    unsigned long offset,
+    void *p_data,
+    unsigned int bytes,
+    struct x86_emulate_ctxt *ctxt)
+{
+    const struct gate_op_ctxt *goc =
+        container_of(ctxt, struct gate_op_ctxt, ctxt);
+    unsigned int rc = bytes, sel = 0;
+    unsigned long addr = offset, limit = 0;
+
+    switch ( seg )
+    {
+    case x86_seg_cs:
+        addr += goc->cs.base;
+        limit = goc->cs.limit;
+        break;
+    case x86_seg_ds:
+        sel = read_sreg(ds);
+        break;
+    case x86_seg_es:
+        sel = read_sreg(es);
+        break;
+    case x86_seg_fs:
+        sel = read_sreg(fs);
+        break;
+    case x86_seg_gs:
+        sel = read_sreg(gs);
+        break;
+    case x86_seg_ss:
+        sel = ctxt->regs->ss;
+        break;
+    default:
+        return X86EMUL_UNHANDLEABLE;
+    }
+    if ( sel )
+    {
+        unsigned int ar;
+
+        ASSERT(!goc->insn_fetch);
+        if ( !read_descriptor(sel, current, &addr, &limit, &ar, 0) ||
+             !(ar & _SEGMENT_S) ||
+             !(ar & _SEGMENT_P) ||
+             ((ar & _SEGMENT_CODE) && !(ar & _SEGMENT_WR)) )
+            return X86EMUL_UNHANDLEABLE;
+        addr += offset;
+    }
+    else if ( seg != x86_seg_cs )
+        return X86EMUL_UNHANDLEABLE;
+
+    /* We don't mean to emulate any branches. */
+    if ( limit < bytes - 1 || offset > limit - bytes + 1 )
+        return X86EMUL_UNHANDLEABLE;
+
+    addr = (uint32_t)addr;
+
+    if ( (rc = __copy_from_user(p_data, (void *)addr, bytes)) )
+    {
+        /*
+         * TODO: This should report PFEC_insn_fetch when goc->insn_fetch &&
+         * cpu_has_nx, but we'd then need a "fetch" variant of
+         * __copy_from_user() respecting NX, SMEP, and protection keys.
+         */
+        x86_emul_pagefault(0, addr + bytes - rc, ctxt);
+        return X86EMUL_EXCEPTION;
+    }
+
+    return X86EMUL_OKAY;
+}
+
+void emulate_gate_op(struct cpu_user_regs *regs)
+{
+    struct vcpu *v = current;
+    unsigned int sel, ar, dpl, nparm, insn_len;
+    struct gate_op_ctxt ctxt = { .ctxt.regs = regs, .insn_fetch = true };
+    struct x86_emulate_state *state;
+    unsigned long off, base, limit;
+    uint16_t opnd_sel = 0;
+    int jump = -1, rc = X86EMUL_OKAY;
+
+    /* Check whether this fault is due to the use of a call gate. */
+    if ( !read_gate_descriptor(regs->error_code, v, &sel, &off, &ar) ||
+         (((ar >> 13) & 3) < (regs->cs & 3)) ||
+         ((ar & _SEGMENT_TYPE) != 0xc00) )
+    {
+        pv_inject_hw_exception(TRAP_gp_fault, regs->error_code);
+        return;
+    }
+    if ( !(ar & _SEGMENT_P) )
+    {
+        pv_inject_hw_exception(TRAP_no_segment, regs->error_code);
+        return;
+    }
+    dpl = (ar >> 13) & 3;
+    nparm = ar & 0x1f;
+
+    /*
+     * Decode instruction (and perhaps operand) to determine RPL,
+     * whether this is a jump or a call, and the call return offset.
+     */
+    if ( !read_descriptor(regs->cs, v, &ctxt.cs.base, &ctxt.cs.limit,
+                          &ar, 0) ||
+         !(ar & _SEGMENT_S) ||
+         !(ar & _SEGMENT_P) ||
+         !(ar & _SEGMENT_CODE) )
+    {
+        pv_inject_hw_exception(TRAP_gp_fault, regs->error_code);
+        return;
+    }
+
+    ctxt.ctxt.addr_size = ar & _SEGMENT_DB ? 32 : 16;
+    /* Leave zero in ctxt.ctxt.sp_size, as it's not needed for decoding. */
+    state = x86_decode_insn(&ctxt.ctxt, gate_op_read);
+    ctxt.insn_fetch = false;
+    if ( IS_ERR_OR_NULL(state) )
+    {
+        if ( PTR_ERR(state) == -X86EMUL_EXCEPTION )
+            pv_inject_event(&ctxt.ctxt.event);
+        else
+            pv_inject_hw_exception(TRAP_gp_fault, regs->error_code);
+        return;
+    }
+
+    switch ( ctxt.ctxt.opcode )
+    {
+        unsigned int modrm_345;
+
+    case 0xea:
+        ++jump;
+        /* fall through */
+    case 0x9a:
+        ++jump;
+        opnd_sel = x86_insn_immediate(state, 1);
+        break;
+    case 0xff:
+        if ( x86_insn_modrm(state, NULL, &modrm_345) >= 3 )
+            break;
+        switch ( modrm_345 & 7 )
+        {
+            enum x86_segment seg;
+
+        case 5:
+            ++jump;
+            /* fall through */
+        case 3:
+            ++jump;
+            base = x86_insn_operand_ea(state, &seg);
+            rc = gate_op_read(seg,
+                              base + (x86_insn_opsize(state) >> 3),
+                              &opnd_sel, sizeof(opnd_sel), &ctxt.ctxt);
+            break;
+        }
+        break;
+    }
+
+    insn_len = x86_insn_length(state, &ctxt.ctxt);
+    x86_emulate_free_state(state);
+
+    if ( rc == X86EMUL_EXCEPTION )
+    {
+        pv_inject_event(&ctxt.ctxt.event);
+        return;
+    }
+
+    if ( rc != X86EMUL_OKAY ||
+         jump < 0 ||
+         (opnd_sel & ~3) != regs->error_code ||
+         dpl < (opnd_sel & 3) )
+    {
+        pv_inject_hw_exception(TRAP_gp_fault, regs->error_code);
+        return;
+    }
+
+    if ( !read_descriptor(sel, v, &base, &limit, &ar, 0) ||
+         !(ar & _SEGMENT_S) ||
+         !(ar & _SEGMENT_CODE) ||
+         (!jump || (ar & _SEGMENT_EC) ?
+          ((ar >> 13) & 3) > (regs->cs & 3) :
+          ((ar >> 13) & 3) != (regs->cs & 3)) )
+    {
+        pv_inject_hw_exception(TRAP_gp_fault, sel);
+        return;
+    }
+    if ( !(ar & _SEGMENT_P) )
+    {
+        pv_inject_hw_exception(TRAP_no_segment, sel);
+        return;
+    }
+    if ( off > limit )
+    {
+        pv_inject_hw_exception(TRAP_gp_fault, 0);
+        return;
+    }
+
+    if ( !jump )
+    {
+        unsigned int ss, esp, *stkp;
+        int rc;
+#define push(item) do \
+        { \
+            --stkp; \
+            esp -= 4; \
+            rc = __put_user(item, stkp); \
+            if ( rc ) \
+            { \
+                pv_inject_page_fault(PFEC_write_access, \
+                                     (unsigned long)(stkp + 1) - rc); \
+                return; \
+            } \
+        } while ( 0 )
+
+        if ( ((ar >> 13) & 3) < (regs->cs & 3) )
+        {
+            sel |= (ar >> 13) & 3;
+            /* Inner stack known only for kernel ring. */
+            if ( (sel & 3) != GUEST_KERNEL_RPL(v->domain) )
+            {
+                pv_inject_hw_exception(TRAP_gp_fault, regs->error_code);
+                return;
+            }
+            esp = v->arch.pv_vcpu.kernel_sp;
+            ss = v->arch.pv_vcpu.kernel_ss;
+            if ( (ss & 3) != (sel & 3) ||
+                 !read_descriptor(ss, v, &base, &limit, &ar, 0) ||
+                 ((ar >> 13) & 3) != (sel & 3) ||
+                 !(ar & _SEGMENT_S) ||
+                 (ar & _SEGMENT_CODE) ||
+                 !(ar & _SEGMENT_WR) )
+            {
+                pv_inject_hw_exception(TRAP_invalid_tss, ss & ~3);
+                return;
+            }
+            if ( !(ar & _SEGMENT_P) ||
+                 !check_stack_limit(ar, limit, esp, (4 + nparm) * 4) )
+            {
+                pv_inject_hw_exception(TRAP_stack_error, ss & ~3);
+                return;
+            }
+            stkp = (unsigned int *)(unsigned long)((unsigned int)base + esp);
+            if ( !compat_access_ok(stkp - 4 - nparm, (4 + nparm) * 4) )
+            {
+                pv_inject_hw_exception(TRAP_gp_fault, regs->error_code);
+                return;
+            }
+            push(regs->ss);
+            push(regs->rsp);
+            if ( nparm )
+            {
+                const unsigned int *ustkp;
+
+                if ( !read_descriptor(regs->ss, v, &base, &limit, &ar, 0) ||
+                     ((ar >> 13) & 3) != (regs->cs & 3) ||
+                     !(ar & _SEGMENT_S) ||
+                     (ar & _SEGMENT_CODE) ||
+                     !(ar & _SEGMENT_WR) ||
+                     !check_stack_limit(ar, limit, esp + nparm * 4, nparm * 4) )
+                    return pv_inject_hw_exception(TRAP_gp_fault, regs->error_code);
+                ustkp = (unsigned int *)(unsigned long)
+                        ((unsigned int)base + regs->esp + nparm * 4);
+                if ( !compat_access_ok(ustkp - nparm, nparm * 4) )
+                {
+                    pv_inject_hw_exception(TRAP_gp_fault, regs->error_code);
+                    return;
+                }
+                do
+                {
+                    unsigned int parm;
+
+                    --ustkp;
+                    rc = __get_user(parm, ustkp);
+                    if ( rc )
+                    {
+                        pv_inject_page_fault(0, (unsigned long)(ustkp + 1) - rc);
+                        return;
+                    }
+                    push(parm);
+                } while ( --nparm );
+            }
+        }
+        else
+        {
+            sel |= (regs->cs & 3);
+            esp = regs->rsp;
+            ss = regs->ss;
+            if ( !read_descriptor(ss, v, &base, &limit, &ar, 0) ||
+                 ((ar >> 13) & 3) != (sel & 3) )
+            {
+                pv_inject_hw_exception(TRAP_gp_fault, regs->error_code);
+                return;
+            }
+            if ( !check_stack_limit(ar, limit, esp, 2 * 4) )
+            {
+                pv_inject_hw_exception(TRAP_stack_error, 0);
+                return;
+            }
+            stkp = (unsigned int *)(unsigned long)((unsigned int)base + esp);
+            if ( !compat_access_ok(stkp - 2, 2 * 4) )
+            {
+                pv_inject_hw_exception(TRAP_gp_fault, regs->error_code);
+                return;
+            }
+        }
+        push(regs->cs);
+        push(regs->rip + insn_len);
+#undef push
+        regs->rsp = esp;
+        regs->ss = ss;
+    }
+    else
+        sel |= (regs->cs & 3);
+
+    regs->cs = sel;
+    instruction_done(regs, off);
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index cd43e9f44c..b55c425071 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -1590,447 +1590,6 @@ long do_fpu_taskswitch(int set)
     return 0;
 }
 
-static int read_descriptor(unsigned int sel,
-                           const struct vcpu *v,
-                           unsigned long *base,
-                           unsigned long *limit,
-                           unsigned int *ar,
-                           bool_t insn_fetch)
-{
-    struct desc_struct desc;
-
-    if ( sel < 4)
-        desc.b = desc.a = 0;
-    else if ( __get_user(desc,
-                         (const struct desc_struct *)(!(sel & 4)
-                                                      ? GDT_VIRT_START(v)
-                                                      : LDT_VIRT_START(v))
-                         + (sel >> 3)) )
-        return 0;
-    if ( !insn_fetch )
-        desc.b &= ~_SEGMENT_L;
-
-    *ar = desc.b & 0x00f0ff00;
-    if ( !(desc.b & _SEGMENT_L) )
-    {
-        *base = ((desc.a >> 16) + ((desc.b & 0xff) << 16) +
-                 (desc.b & 0xff000000));
-        *limit = (desc.a & 0xffff) | (desc.b & 0x000f0000);
-        if ( desc.b & _SEGMENT_G )
-            *limit = ((*limit + 1) << 12) - 1;
-#ifndef NDEBUG
-        if ( sel > 3 )
-        {
-            unsigned int a, l;
-            unsigned char valid;
-
-            asm volatile (
-                "larl %2,%0 ; setz %1"
-                : "=r" (a), "=qm" (valid) : "rm" (sel));
-            BUG_ON(valid && ((a & 0x00f0ff00) != *ar));
-            asm volatile (
-                "lsll %2,%0 ; setz %1"
-                : "=r" (l), "=qm" (valid) : "rm" (sel));
-            BUG_ON(valid && (l != *limit));
-        }
-#endif
-    }
-    else
-    {
-        *base = 0UL;
-        *limit = ~0UL;
-    }
-
-    return 1;
-}
-
-static int read_gate_descriptor(unsigned int gate_sel,
-                                const struct vcpu *v,
-                                unsigned int *sel,
-                                unsigned long *off,
-                                unsigned int *ar)
-{
-    struct desc_struct desc;
-    const struct desc_struct *pdesc;
-
-
-    pdesc = (const struct desc_struct *)
-        (!(gate_sel & 4) ? GDT_VIRT_START(v) : LDT_VIRT_START(v))
-        + (gate_sel >> 3);
-    if ( (gate_sel < 4) ||
-         ((gate_sel >= FIRST_RESERVED_GDT_BYTE) && !(gate_sel & 4)) ||
-         __get_user(desc, pdesc) )
-        return 0;
-
-    *sel = (desc.a >> 16) & 0x0000fffc;
-    *off = (desc.a & 0x0000ffff) | (desc.b & 0xffff0000);
-    *ar = desc.b & 0x0000ffff;
-
-    /*
-     * check_descriptor() clears the DPL field and stores the
-     * guest requested DPL in the selector's RPL field.
-     */
-    if ( *ar & _SEGMENT_DPL )
-        return 0;
-    *ar |= (desc.a >> (16 - 13)) & _SEGMENT_DPL;
-
-    if ( !is_pv_32bit_vcpu(v) )
-    {
-        if ( (*ar & 0x1f00) != 0x0c00 ||
-             (gate_sel >= FIRST_RESERVED_GDT_BYTE - 8 && !(gate_sel & 4)) ||
-             __get_user(desc, pdesc + 1) ||
-             (desc.b & 0x1f00) )
-            return 0;
-
-        *off |= (unsigned long)desc.a << 32;
-        return 1;
-    }
-
-    switch ( *ar & 0x1f00 )
-    {
-    case 0x0400:
-        *off &= 0xffff;
-        break;
-    case 0x0c00:
-        break;
-    default:
-        return 0;
-    }
-
-    return 1;
-}
-
-static inline int check_stack_limit(unsigned int ar, unsigned int limit,
-                                    unsigned int esp, unsigned int decr)
-{
-    return (((esp - decr) < (esp - 1)) &&
-            (!(ar & _SEGMENT_EC) ? (esp - 1) <= limit : (esp - decr) > limit));
-}
-
-struct gate_op_ctxt {
-    struct x86_emulate_ctxt ctxt;
-    struct {
-        unsigned long base, limit;
-    } cs;
-    bool insn_fetch;
-};
-
-static int gate_op_read(
-    enum x86_segment seg,
-    unsigned long offset,
-    void *p_data,
-    unsigned int bytes,
-    struct x86_emulate_ctxt *ctxt)
-{
-    const struct gate_op_ctxt *goc =
-        container_of(ctxt, struct gate_op_ctxt, ctxt);
-    unsigned int rc = bytes, sel = 0;
-    unsigned long addr = offset, limit = 0;
-
-    switch ( seg )
-    {
-    case x86_seg_cs:
-        addr += goc->cs.base;
-        limit = goc->cs.limit;
-        break;
-    case x86_seg_ds:
-        sel = read_sreg(ds);
-        break;
-    case x86_seg_es:
-        sel = read_sreg(es);
-        break;
-    case x86_seg_fs:
-        sel = read_sreg(fs);
-        break;
-    case x86_seg_gs:
-        sel = read_sreg(gs);
-        break;
-    case x86_seg_ss:
-        sel = ctxt->regs->ss;
-        break;
-    default:
-        return X86EMUL_UNHANDLEABLE;
-    }
-    if ( sel )
-    {
-        unsigned int ar;
-
-        ASSERT(!goc->insn_fetch);
-        if ( !read_descriptor(sel, current, &addr, &limit, &ar, 0) ||
-             !(ar & _SEGMENT_S) ||
-             !(ar & _SEGMENT_P) ||
-             ((ar & _SEGMENT_CODE) && !(ar & _SEGMENT_WR)) )
-            return X86EMUL_UNHANDLEABLE;
-        addr += offset;
-    }
-    else if ( seg != x86_seg_cs )
-        return X86EMUL_UNHANDLEABLE;
-
-    /* We don't mean to emulate any branches. */
-    if ( limit < bytes - 1 || offset > limit - bytes + 1 )
-        return X86EMUL_UNHANDLEABLE;
-
-    addr = (uint32_t)addr;
-
-    if ( (rc = __copy_from_user(p_data, (void *)addr, bytes)) )
-    {
-        /*
-         * TODO: This should report PFEC_insn_fetch when goc->insn_fetch &&
-         * cpu_has_nx, but we'd then need a "fetch" variant of
-         * __copy_from_user() respecting NX, SMEP, and protection keys.
-         */
-        x86_emul_pagefault(0, addr + bytes - rc, ctxt);
-        return X86EMUL_EXCEPTION;
-    }
-
-    return X86EMUL_OKAY;
-}
-
-static void emulate_gate_op(struct cpu_user_regs *regs)
-{
-    struct vcpu *v = current;
-    unsigned int sel, ar, dpl, nparm, insn_len;
-    struct gate_op_ctxt ctxt = { .ctxt.regs = regs, .insn_fetch = true };
-    struct x86_emulate_state *state;
-    unsigned long off, base, limit;
-    uint16_t opnd_sel = 0;
-    int jump = -1, rc = X86EMUL_OKAY;
-
-    /* Check whether this fault is due to the use of a call gate. */
-    if ( !read_gate_descriptor(regs->error_code, v, &sel, &off, &ar) ||
-         (((ar >> 13) & 3) < (regs->cs & 3)) ||
-         ((ar & _SEGMENT_TYPE) != 0xc00) )
-    {
-        pv_inject_hw_exception(TRAP_gp_fault, regs->error_code);
-        return;
-    }
-    if ( !(ar & _SEGMENT_P) )
-    {
-        pv_inject_hw_exception(TRAP_no_segment, regs->error_code);
-        return;
-    }
-    dpl = (ar >> 13) & 3;
-    nparm = ar & 0x1f;
-
-    /*
-     * Decode instruction (and perhaps operand) to determine RPL,
-     * whether this is a jump or a call, and the call return offset.
-     */
-    if ( !read_descriptor(regs->cs, v, &ctxt.cs.base, &ctxt.cs.limit,
-                          &ar, 0) ||
-         !(ar & _SEGMENT_S) ||
-         !(ar & _SEGMENT_P) ||
-         !(ar & _SEGMENT_CODE) )
-    {
-        pv_inject_hw_exception(TRAP_gp_fault, regs->error_code);
-        return;
-    }
-
-    ctxt.ctxt.addr_size = ar & _SEGMENT_DB ? 32 : 16;
-    /* Leave zero in ctxt.ctxt.sp_size, as it's not needed for decoding. */
-    state = x86_decode_insn(&ctxt.ctxt, gate_op_read);
-    ctxt.insn_fetch = false;
-    if ( IS_ERR_OR_NULL(state) )
-    {
-        if ( PTR_ERR(state) == -X86EMUL_EXCEPTION )
-            pv_inject_event(&ctxt.ctxt.event);
-        else
-            pv_inject_hw_exception(TRAP_gp_fault, regs->error_code);
-        return;
-    }
-
-    switch ( ctxt.ctxt.opcode )
-    {
-        unsigned int modrm_345;
-
-    case 0xea:
-        ++jump;
-        /* fall through */
-    case 0x9a:
-        ++jump;
-        opnd_sel = x86_insn_immediate(state, 1);
-        break;
-    case 0xff:
-        if ( x86_insn_modrm(state, NULL, &modrm_345) >= 3 )
-            break;
-        switch ( modrm_345 & 7 )
-        {
-            enum x86_segment seg;
-
-        case 5:
-            ++jump;
-            /* fall through */
-        case 3:
-            ++jump;
-            base = x86_insn_operand_ea(state, &seg);
-            rc = gate_op_read(seg,
-                              base + (x86_insn_opsize(state) >> 3),
-                              &opnd_sel, sizeof(opnd_sel), &ctxt.ctxt);
-            break;
-        }
-        break;
-    }
-
-    insn_len = x86_insn_length(state, &ctxt.ctxt);
-    x86_emulate_free_state(state);
-
-    if ( rc == X86EMUL_EXCEPTION )
-    {
-        pv_inject_event(&ctxt.ctxt.event);
-        return;
-    }
-
-    if ( rc != X86EMUL_OKAY ||
-         jump < 0 ||
-         (opnd_sel & ~3) != regs->error_code ||
-         dpl < (opnd_sel & 3) )
-    {
-        pv_inject_hw_exception(TRAP_gp_fault, regs->error_code);
-        return;
-    }
-
-    if ( !read_descriptor(sel, v, &base, &limit, &ar, 0) ||
-         !(ar & _SEGMENT_S) ||
-         !(ar & _SEGMENT_CODE) ||
-         (!jump || (ar & _SEGMENT_EC) ?
-          ((ar >> 13) & 3) > (regs->cs & 3) :
-          ((ar >> 13) & 3) != (regs->cs & 3)) )
-    {
-        pv_inject_hw_exception(TRAP_gp_fault, sel);
-        return;
-    }
-    if ( !(ar & _SEGMENT_P) )
-    {
-        pv_inject_hw_exception(TRAP_no_segment, sel);
-        return;
-    }
-    if ( off > limit )
-    {
-        pv_inject_hw_exception(TRAP_gp_fault, 0);
-        return;
-    }
-
-    if ( !jump )
-    {
-        unsigned int ss, esp, *stkp;
-        int rc;
-#define push(item) do \
-        { \
-            --stkp; \
-            esp -= 4; \
-            rc = __put_user(item, stkp); \
-            if ( rc ) \
-            { \
-                pv_inject_page_fault(PFEC_write_access, \
-                                     (unsigned long)(stkp + 1) - rc); \
-                return; \
-            } \
-        } while ( 0 )
-
-        if ( ((ar >> 13) & 3) < (regs->cs & 3) )
-        {
-            sel |= (ar >> 13) & 3;
-            /* Inner stack known only for kernel ring. */
-            if ( (sel & 3) != GUEST_KERNEL_RPL(v->domain) )
-            {
-                pv_inject_hw_exception(TRAP_gp_fault, regs->error_code);
-                return;
-            }
-            esp = v->arch.pv_vcpu.kernel_sp;
-            ss = v->arch.pv_vcpu.kernel_ss;
-            if ( (ss & 3) != (sel & 3) ||
-                 !read_descriptor(ss, v, &base, &limit, &ar, 0) ||
-                 ((ar >> 13) & 3) != (sel & 3) ||
-                 !(ar & _SEGMENT_S) ||
-                 (ar & _SEGMENT_CODE) ||
-                 !(ar & _SEGMENT_WR) )
-            {
-                pv_inject_hw_exception(TRAP_invalid_tss, ss & ~3);
-                return;
-            }
-            if ( !(ar & _SEGMENT_P) ||
-                 !check_stack_limit(ar, limit, esp, (4 + nparm) * 4) )
-            {
-                pv_inject_hw_exception(TRAP_stack_error, ss & ~3);
-                return;
-            }
-            stkp = (unsigned int *)(unsigned long)((unsigned int)base + esp);
-            if ( !compat_access_ok(stkp - 4 - nparm, (4 + nparm) * 4) )
-            {
-                pv_inject_hw_exception(TRAP_gp_fault, regs->error_code);
-                return;
-            }
-            push(regs->ss);
-            push(regs->rsp);
-            if ( nparm )
-            {
-                const unsigned int *ustkp;
-
-                if ( !read_descriptor(regs->ss, v, &base, &limit, &ar, 0) ||
-                     ((ar >> 13) & 3) != (regs->cs & 3) ||
-                     !(ar & _SEGMENT_S) ||
-                     (ar & _SEGMENT_CODE) ||
-                     !(ar & _SEGMENT_WR) ||
-                     !check_stack_limit(ar, limit, esp + nparm * 4, nparm * 4) )
-                    return pv_inject_hw_exception(TRAP_gp_fault, regs->error_code);
-                ustkp = (unsigned int *)(unsigned long)
-                        ((unsigned int)base + regs->esp + nparm * 4);
-                if ( !compat_access_ok(ustkp - nparm, nparm * 4) )
-                {
-                    pv_inject_hw_exception(TRAP_gp_fault, regs->error_code);
-                    return;
-                }
-                do
-                {
-                    unsigned int parm;
-
-                    --ustkp;
-                    rc = __get_user(parm, ustkp);
-                    if ( rc )
-                    {
-                        pv_inject_page_fault(0, (unsigned long)(ustkp + 1) - rc);
-                        return;
-                    }
-                    push(parm);
-                } while ( --nparm );
-            }
-        }
-        else
-        {
-            sel |= (regs->cs & 3);
-            esp = regs->rsp;
-            ss = regs->ss;
-            if ( !read_descriptor(ss, v, &base, &limit, &ar, 0) ||
-                 ((ar >> 13) & 3) != (sel & 3) )
-            {
-                pv_inject_hw_exception(TRAP_gp_fault, regs->error_code);
-                return;
-            }
-            if ( !check_stack_limit(ar, limit, esp, 2 * 4) )
-            {
-                pv_inject_hw_exception(TRAP_stack_error, 0);
-                return;
-            }
-            stkp = (unsigned int *)(unsigned long)((unsigned int)base + esp);
-            if ( !compat_access_ok(stkp - 2, 2 * 4) )
-            {
-                pv_inject_hw_exception(TRAP_gp_fault, regs->error_code);
-                return;
-            }
-        }
-        push(regs->cs);
-        push(regs->rip + insn_len);
-#undef push
-        regs->rsp = esp;
-        regs->ss = ss;
-    }
-    else
-        sel |= (regs->cs & 3);
-
-    regs->cs = sel;
-    instruction_done(regs, off);
-}
-
 void do_general_protection(struct cpu_user_regs *regs)
 {
     struct vcpu *v = current;
diff --git a/xen/include/asm-x86/pv/traps.h b/xen/include/asm-x86/pv/traps.h
index 32c7bac587..2d9f23dbde 100644
--- a/xen/include/asm-x86/pv/traps.h
+++ b/xen/include/asm-x86/pv/traps.h
@@ -26,12 +26,14 @@
 #include <public/xen.h>
 
 int emulate_privileged_op(struct cpu_user_regs *regs);
+void emulate_gate_op(struct cpu_user_regs *regs);
 
 #else  /* !CONFIG_PV */
 
 #include <xen/errno.h>
 
 int emulate_privileged_op(struct cpu_user_regs *regs) { return -EOPNOTSUPP; }
+void emulate_gate_op(struct cpu_user_regs *regs) {}
 
 #endif	/* CONFIG_PV */
 
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH for-next v3 03/22] x86/traps: move emulate_invalid_rdtscp
  2017-05-18 17:09 [PATCH for-next v3 00/22] x86: refactor trap handling code Wei Liu
  2017-05-18 17:09 ` [PATCH for-next v3 01/22] x86/traps: move privilege instruction emulation code Wei Liu
  2017-05-18 17:09 ` [PATCH for-next v3 02/22] x86/traps: move gate op " Wei Liu
@ 2017-05-18 17:09 ` Wei Liu
  2017-05-29 15:18   ` Jan Beulich
  2017-05-18 17:09 ` [PATCH for-next v3 04/22] x86/traps: move emulate_forced_invalid_op Wei Liu
                   ` (18 subsequent siblings)
  21 siblings, 1 reply; 65+ messages in thread
From: Wei Liu @ 2017-05-18 17:09 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich

And export it in pv/traps.h.

The stub for the !CONFIG_PV case returns 0 because 0 represents
"unsuccessful emulation" in the original code, so callers fall back to
their normal fault handling.

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/arch/x86/pv/emulate.c      | 20 ++++++++++++++++++++
 xen/arch/x86/traps.c           | 20 --------------------
 xen/include/asm-x86/pv/traps.h |  2 ++
 3 files changed, 22 insertions(+), 20 deletions(-)

diff --git a/xen/arch/x86/pv/emulate.c b/xen/arch/x86/pv/emulate.c
index 6d096b74b2..364cd0f78c 100644
--- a/xen/arch/x86/pv/emulate.c
+++ b/xen/arch/x86/pv/emulate.c
@@ -1862,6 +1862,26 @@ void emulate_gate_op(struct cpu_user_regs *regs)
     instruction_done(regs, off);
 }
 
+int emulate_invalid_rdtscp(struct cpu_user_regs *regs)
+{
+    char opcode[3];
+    unsigned long eip, rc;
+    struct vcpu *v = current;
+
+    eip = regs->rip;
+    if ( (rc = copy_from_user(opcode, (char *)eip, sizeof(opcode))) != 0 )
+    {
+        pv_inject_page_fault(0, eip + sizeof(opcode) - rc);
+        return EXCRET_fault_fixed;
+    }
+    if ( memcmp(opcode, "\xf\x1\xf9", sizeof(opcode)) )
+        return 0;
+    eip += sizeof(opcode);
+    pv_soft_rdtsc(v, regs, 1);
+    instruction_done(regs, eip);
+    return EXCRET_fault_fixed;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index b55c425071..38bc531f5b 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -978,26 +978,6 @@ void cpuid_hypervisor_leaves(const struct vcpu *v, uint32_t leaf,
     }
 }
 
-static int emulate_invalid_rdtscp(struct cpu_user_regs *regs)
-{
-    char opcode[3];
-    unsigned long eip, rc;
-    struct vcpu *v = current;
-
-    eip = regs->rip;
-    if ( (rc = copy_from_user(opcode, (char *)eip, sizeof(opcode))) != 0 )
-    {
-        pv_inject_page_fault(0, eip + sizeof(opcode) - rc);
-        return EXCRET_fault_fixed;
-    }
-    if ( memcmp(opcode, "\xf\x1\xf9", sizeof(opcode)) )
-        return 0;
-    eip += sizeof(opcode);
-    pv_soft_rdtsc(v, regs, 1);
-    instruction_done(regs, eip);
-    return EXCRET_fault_fixed;
-}
-
 static int emulate_forced_invalid_op(struct cpu_user_regs *regs)
 {
     char sig[5], instr[2];
diff --git a/xen/include/asm-x86/pv/traps.h b/xen/include/asm-x86/pv/traps.h
index 2d9f23dbde..88dc20928b 100644
--- a/xen/include/asm-x86/pv/traps.h
+++ b/xen/include/asm-x86/pv/traps.h
@@ -27,6 +27,7 @@
 
 int emulate_privileged_op(struct cpu_user_regs *regs);
 void emulate_gate_op(struct cpu_user_regs *regs);
+int emulate_invalid_rdtscp(struct cpu_user_regs *regs);
 
 #else  /* !CONFIG_PV */
 
@@ -34,6 +35,7 @@ void emulate_gate_op(struct cpu_user_regs *regs);
 
 int emulate_privileged_op(struct cpu_user_regs *regs) { return -EOPNOTSUPP; }
 void emulate_gate_op(struct cpu_user_regs *regs) {}
+int emulate_invalid_rdtscp(struct cpu_user_regs *regs) { return 0; }
 
 #endif	/* CONFIG_PV */
 
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH for-next v3 04/22] x86/traps: move emulate_forced_invalid_op
  2017-05-18 17:09 [PATCH for-next v3 00/22] x86: refactor trap handling code Wei Liu
                   ` (2 preceding siblings ...)
  2017-05-18 17:09 ` [PATCH for-next v3 03/22] x86/traps: move emulate_invalid_rdtscp Wei Liu
@ 2017-05-18 17:09 ` Wei Liu
  2017-05-29 15:19   ` Jan Beulich
  2017-05-18 17:09 ` [PATCH for-next v3 05/22] x86/pv: clean up emulate.c Wei Liu
                   ` (17 subsequent siblings)
  21 siblings, 1 reply; 65+ messages in thread
From: Wei Liu @ 2017-05-18 17:09 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich

And remove the now unused instruction_done in x86/traps.c.
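
For reference, the sequence recognised by the moved code is the Xen
forced-emulation signature: ud2 (0f 0b), the ASCII bytes "xen", then
the instruction to emulate. A guest would trigger it with something
like the following (a sketch; the helper name is made up):

    /* Guest-side sketch: force Xen to emulate CPUID. */
    static inline void xen_forced_cpuid(uint32_t *eax, uint32_t *ebx,
                                        uint32_t *ecx, uint32_t *edx)
    {
        asm volatile ( "ud2a ; .ascii \"xen\" ; cpuid"
                       : "=a" (*eax), "=b" (*ebx), "=c" (*ecx), "=d" (*edx)
                       : "a" (*eax), "c" (*ecx) );
    }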

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/arch/x86/pv/emulate.c      | 51 ++++++++++++++++++++++++++++++++++
 xen/arch/x86/traps.c           | 62 ------------------------------------------
 xen/include/asm-x86/pv/traps.h |  2 ++
 3 files changed, 53 insertions(+), 62 deletions(-)

diff --git a/xen/arch/x86/pv/emulate.c b/xen/arch/x86/pv/emulate.c
index 364cd0f78c..e261aeb0f7 100644
--- a/xen/arch/x86/pv/emulate.c
+++ b/xen/arch/x86/pv/emulate.c
@@ -1882,6 +1882,57 @@ int emulate_invalid_rdtscp(struct cpu_user_regs *regs)
     return EXCRET_fault_fixed;
 }
 
+int emulate_forced_invalid_op(struct cpu_user_regs *regs)
+{
+    char sig[5], instr[2];
+    unsigned long eip, rc;
+    struct cpuid_leaf res;
+
+    eip = regs->rip;
+
+    /* Check for forced emulation signature: ud2 ; .ascii "xen". */
+    if ( (rc = copy_from_user(sig, (char *)eip, sizeof(sig))) != 0 )
+    {
+        pv_inject_page_fault(0, eip + sizeof(sig) - rc);
+        return EXCRET_fault_fixed;
+    }
+    if ( memcmp(sig, "\xf\xbxen", sizeof(sig)) )
+        return 0;
+    eip += sizeof(sig);
+
+    /* We only emulate CPUID. */
+    if ( ( rc = copy_from_user(instr, (char *)eip, sizeof(instr))) != 0 )
+    {
+        pv_inject_page_fault(0, eip + sizeof(instr) - rc);
+        return EXCRET_fault_fixed;
+    }
+    if ( memcmp(instr, "\xf\xa2", sizeof(instr)) )
+        return 0;
+
+    /* If cpuid faulting is enabled and CPL>0 inject a #GP in place of #UD. */
+    if ( current->arch.cpuid_faulting && !guest_kernel_mode(current, regs) )
+    {
+        regs->rip = eip;
+        pv_inject_hw_exception(TRAP_gp_fault, regs->error_code);
+        return EXCRET_fault_fixed;
+    }
+
+    eip += sizeof(instr);
+
+    guest_cpuid(current, regs->eax, regs->ecx, &res);
+
+    regs->rax = res.a;
+    regs->rbx = res.b;
+    regs->rcx = res.c;
+    regs->rdx = res.d;
+
+    instruction_done(regs, eip);
+
+    trace_trap_one_addr(TRC_PV_FORCED_INVALID_OP, regs->rip);
+
+    return EXCRET_fault_fixed;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 38bc531f5b..ace346d377 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -696,17 +696,6 @@ void pv_inject_event(const struct x86_event *event)
     }
 }
 
-static void instruction_done(struct cpu_user_regs *regs, unsigned long rip)
-{
-    regs->rip = rip;
-    regs->eflags &= ~X86_EFLAGS_RF;
-    if ( regs->eflags & X86_EFLAGS_TF )
-    {
-        current->arch.debugreg[6] |= DR_STEP | DR_STATUS_RESERVED_ONE;
-        pv_inject_hw_exception(TRAP_debug, X86_EVENT_NO_EC);
-    }
-}
-
 /*
  * Called from asm to set up the MCE trapbounce info.
  * Returns 0 if no callback is set up, else 1.
@@ -978,57 +967,6 @@ void cpuid_hypervisor_leaves(const struct vcpu *v, uint32_t leaf,
     }
 }
 
-static int emulate_forced_invalid_op(struct cpu_user_regs *regs)
-{
-    char sig[5], instr[2];
-    unsigned long eip, rc;
-    struct cpuid_leaf res;
-
-    eip = regs->rip;
-
-    /* Check for forced emulation signature: ud2 ; .ascii "xen". */
-    if ( (rc = copy_from_user(sig, (char *)eip, sizeof(sig))) != 0 )
-    {
-        pv_inject_page_fault(0, eip + sizeof(sig) - rc);
-        return EXCRET_fault_fixed;
-    }
-    if ( memcmp(sig, "\xf\xbxen", sizeof(sig)) )
-        return 0;
-    eip += sizeof(sig);
-
-    /* We only emulate CPUID. */
-    if ( ( rc = copy_from_user(instr, (char *)eip, sizeof(instr))) != 0 )
-    {
-        pv_inject_page_fault(0, eip + sizeof(instr) - rc);
-        return EXCRET_fault_fixed;
-    }
-    if ( memcmp(instr, "\xf\xa2", sizeof(instr)) )
-        return 0;
-
-    /* If cpuid faulting is enabled and CPL>0 inject a #GP in place of #UD. */
-    if ( current->arch.cpuid_faulting && !guest_kernel_mode(current, regs) )
-    {
-        regs->rip = eip;
-        pv_inject_hw_exception(TRAP_gp_fault, regs->error_code);
-        return EXCRET_fault_fixed;
-    }
-
-    eip += sizeof(instr);
-
-    guest_cpuid(current, regs->eax, regs->ecx, &res);
-
-    regs->rax = res.a;
-    regs->rbx = res.b;
-    regs->rcx = res.c;
-    regs->rdx = res.d;
-
-    instruction_done(regs, eip);
-
-    trace_trap_one_addr(TRC_PV_FORCED_INVALID_OP, regs->rip);
-
-    return EXCRET_fault_fixed;
-}
-
 void do_invalid_op(struct cpu_user_regs *regs)
 {
     const struct bug_frame *bug = NULL;
diff --git a/xen/include/asm-x86/pv/traps.h b/xen/include/asm-x86/pv/traps.h
index 88dc20928b..3f1c93a430 100644
--- a/xen/include/asm-x86/pv/traps.h
+++ b/xen/include/asm-x86/pv/traps.h
@@ -28,6 +28,7 @@
 int emulate_privileged_op(struct cpu_user_regs *regs);
 void emulate_gate_op(struct cpu_user_regs *regs);
 int emulate_invalid_rdtscp(struct cpu_user_regs *regs);
+int emulate_forced_invalid_op(struct cpu_user_regs *regs);
 
 #else  /* !CONFIG_PV */
 
@@ -36,6 +37,7 @@ int emulate_invalid_rdtscp(struct cpu_user_regs *regs);
 int emulate_privileged_op(struct cpu_user_regs *regs) { return -EOPNOTSUPP; }
 void emulate_gate_op(struct cpu_user_regs *regs) {}
 int emulate_invalid_rdtscp(struct cpu_user_regs *regs) { return 0; }
+int emulate_forced_invalid_op(struct cpu_user_regs *regs) { return 0; }
 
 #endif	/* CONFIG_PV */
 
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH for-next v3 05/22] x86/pv: clean up emulate.c
  2017-05-18 17:09 [PATCH for-next v3 00/22] x86: refactor trap handling code Wei Liu
                   ` (3 preceding siblings ...)
  2017-05-18 17:09 ` [PATCH for-next v3 04/22] x86/traps: move emulate_forced_invalid_op Wei Liu
@ 2017-05-18 17:09 ` Wei Liu
  2017-05-29 15:37   ` Jan Beulich
  2017-05-18 17:09 ` [PATCH for-next v3 06/22] x86/traps: move PV hypercall handlers to pv/traps.c Wei Liu
                   ` (16 subsequent siblings)
  21 siblings, 1 reply; 65+ messages in thread
From: Wei Liu @ 2017-05-18 17:09 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich

Fix coding style issues. Replace bool_t with bool. Add spaces around
binary ops. Use unsigned integer for shifting. Eliminate TOGGLE_MODE.
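
The unsigned-shift change is more than cosmetic: left-shifting a
signed 1 into the sign bit is undefined behaviour in C. A standalone
illustration (not part of the patch):

    unsigned int i = 31;
    unsigned int bad  = 1 << i;   /* undefined: 1 is a signed int */
    unsigned int good = 1u << i;  /* well defined: 0x80000000 */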

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/pv/emulate.c | 123 ++++++++++++++++++++++++----------------------
 1 file changed, 63 insertions(+), 60 deletions(-)

diff --git a/xen/arch/x86/pv/emulate.c b/xen/arch/x86/pv/emulate.c
index e261aeb0f7..cae6c9e350 100644
--- a/xen/arch/x86/pv/emulate.c
+++ b/xen/arch/x86/pv/emulate.c
@@ -61,7 +61,7 @@ static int read_descriptor(unsigned int sel,
                            unsigned long *base,
                            unsigned long *limit,
                            unsigned int *ar,
-                           bool_t insn_fetch)
+                           bool insn_fetch)
 {
     struct desc_struct desc;
 
@@ -126,7 +126,7 @@ struct priv_op_ctxt {
 #define TSC_AUX 2
 };
 
-/* I/O emulation support. Helper routines for, and type of, the stack stub.*/
+/* I/O emulation support. Helper routines for, and type of, the stack stub. */
 void host_to_guest_gpr_switch(struct cpu_user_regs *);
 unsigned long guest_to_host_gpr_switch(unsigned long);
 
@@ -169,7 +169,7 @@ static io_emul_stub_t *io_emul_stub_setup(struct priv_op_ctxt *ctxt, u8 opcode,
 
 
 /* Perform IOPL check between the vcpu's shadowed IOPL, and the assumed cpl. */
-static bool_t iopl_ok(const struct vcpu *v, const struct cpu_user_regs *regs)
+static bool iopl_ok(const struct vcpu *v, const struct cpu_user_regs *regs)
 {
     unsigned int cpl = guest_kernel_mode(v, regs) ?
         (VM_ASSIST(v->domain, architectural_iopl) ? 0 : 1) : 3;
@@ -180,16 +180,14 @@ static bool_t iopl_ok(const struct vcpu *v, const struct cpu_user_regs *regs)
 }
 
 /* Has the guest requested sufficient permission for this I/O access? */
-static int guest_io_okay(
-    unsigned int port, unsigned int bytes,
-    struct vcpu *v, struct cpu_user_regs *regs)
+static bool guest_io_okay(unsigned int port, unsigned int bytes,
+                          struct vcpu *v, struct cpu_user_regs *regs)
 {
     /* If in user mode, switch to kernel mode just to read I/O bitmap. */
-    int user_mode = !(v->arch.flags & TF_kernel_mode);
-#define TOGGLE_MODE() if ( user_mode ) toggle_guest_mode(v)
+    const bool user_mode = !(v->arch.flags & TF_kernel_mode);
 
     if ( iopl_ok(v, regs) )
-        return 1;
+        return true;
 
     if ( v->arch.pv_vcpu.iobmp_limit > (port + bytes) )
     {
@@ -199,9 +197,11 @@ static int guest_io_okay(
          * Grab permission bytes from guest space. Inaccessible bytes are
          * read as 0xff (no access allowed).
          */
-        TOGGLE_MODE();
+        if ( user_mode )
+            toggle_guest_mode(v);
+
         switch ( __copy_from_guest_offset(x.bytes, v->arch.pv_vcpu.iobmp,
-                                          port>>3, 2) )
+                                          port >> 3, 2) )
         {
         default: x.bytes[0] = ~0;
             /* fallthrough */
@@ -209,43 +209,45 @@ static int guest_io_okay(
             /* fallthrough */
         case 0:  break;
         }
-        TOGGLE_MODE();
 
-        if ( (x.mask & (((1<<bytes)-1) << (port&7))) == 0 )
-            return 1;
+        if ( user_mode )
+            toggle_guest_mode(v);
+
+        if ( (x.mask & (((1 << bytes) - 1) << (port & 7))) == 0 )
+            return true;
     }
 
-    return 0;
+    return false;
 }
 
 /* Has the administrator granted sufficient permission for this I/O access? */
-static bool_t admin_io_okay(unsigned int port, unsigned int bytes,
-                            const struct domain *d)
+static bool admin_io_okay(unsigned int port, unsigned int bytes,
+                          const struct domain *d)
 {
     /*
      * Port 0xcf8 (CONFIG_ADDRESS) is only visible for DWORD accesses.
      * We never permit direct access to that register.
      */
     if ( (port == 0xcf8) && (bytes == 4) )
-        return 0;
+        return false;
 
     /* We also never permit direct access to the RTC/CMOS registers. */
     if ( ((port & ~1) == RTC_PORT(0)) )
-        return 0;
+        return false;
 
     return ioports_access_permitted(d, port, port + bytes - 1);
 }
 
-static bool_t pci_cfg_ok(struct domain *currd, unsigned int start,
-                         unsigned int size, uint32_t *write)
+static bool pci_cfg_ok(struct domain *currd, unsigned int start,
+                       unsigned int size, uint32_t *write)
 {
     uint32_t machine_bdf;
 
     if ( !is_hardware_domain(currd) )
-        return 0;
+        return false;
 
     if ( !CF8_ENABLED(currd->arch.pci_cf8) )
-        return 1;
+        return true;
 
     machine_bdf = CF8_BDF(currd->arch.pci_cf8);
     if ( write )
@@ -253,7 +255,7 @@ static bool_t pci_cfg_ok(struct domain *currd, unsigned int start,
         const unsigned long *ro_map = pci_get_ro_map(0);
 
         if ( ro_map && test_bit(machine_bdf, ro_map) )
-            return 0;
+            return false;
     }
     start |= CF8_ADDR_LO(currd->arch.pci_cf8);
     /* AMD extended configuration space access? */
@@ -264,7 +266,7 @@ static bool_t pci_cfg_ok(struct domain *currd, unsigned int start,
         uint64_t msr_val;
 
         if ( rdmsr_safe(MSR_AMD64_NB_CFG, msr_val) )
-            return 0;
+            return false;
         if ( msr_val & (1ULL << AMD64_NB_CFG_CF8_EXT_ENABLE_BIT) )
             start |= CF8_ADDR_HI(currd->arch.pci_cf8);
     }
@@ -341,7 +343,8 @@ uint32_t guest_io_read(unsigned int port, unsigned int bytes,
 }
 
 static unsigned int check_guest_io_breakpoint(struct vcpu *v,
-    unsigned int port, unsigned int len)
+                                              unsigned int port,
+                                              unsigned int len)
 {
     unsigned int width, i, match = 0;
     unsigned long start;
@@ -369,7 +372,7 @@ static unsigned int check_guest_io_breakpoint(struct vcpu *v,
         }
 
         if ( (start < (port + len)) && ((start + width) > port) )
-            match |= 1 << i;
+            match |= 1u << i;
     }
 
     return match;
@@ -410,7 +413,8 @@ void guest_io_write(unsigned int port, unsigned int bytes, uint32_t data,
 {
     if ( admin_io_okay(port, bytes, currd) )
     {
-        switch ( bytes ) {
+        switch ( bytes )
+        {
         case 1:
             outb((uint8_t)data, port);
             if ( pv_post_outb_hook )
@@ -808,7 +812,7 @@ static int priv_op_write_cr(unsigned int reg, unsigned long val,
         if ( (val ^ read_cr0()) & ~X86_CR0_TS )
         {
             gdprintk(XENLOG_WARNING,
-                    "Attempt to change unmodifiable CR0 flags\n");
+                     "Attempt to change unmodifiable CR0 flags\n");
             break;
         }
         do_fpu_taskswitch(!!(val & X86_CR0_TS));
@@ -921,11 +925,11 @@ static int priv_op_read_msr(unsigned int reg, uint64_t *val,
         *val = curr->arch.pv_vcpu.gs_base_user;
         return X86EMUL_OKAY;
 
-    /*
-     * In order to fully retain original behavior, defer calling
-     * pv_soft_rdtsc() until after emulation. This may want/need to be
-     * reconsidered.
-     */
+        /*
+         * In order to fully retain original behavior, defer calling
+         * pv_soft_rdtsc() until after emulation. This may want/need to be
+         * reconsidered.
+         */
     case MSR_IA32_TSC:
         poc->tsc |= TSC_BASE;
         goto normal;
@@ -1015,16 +1019,16 @@ static int priv_op_read_msr(unsigned int reg, uint64_t *val,
             *val |= MSR_MISC_FEATURES_CPUID_FAULTING;
         return X86EMUL_OKAY;
 
-    case MSR_P6_PERFCTR(0)...MSR_P6_PERFCTR(7):
-    case MSR_P6_EVNTSEL(0)...MSR_P6_EVNTSEL(3):
-    case MSR_CORE_PERF_FIXED_CTR0...MSR_CORE_PERF_FIXED_CTR2:
-    case MSR_CORE_PERF_FIXED_CTR_CTRL...MSR_CORE_PERF_GLOBAL_OVF_CTRL:
+    case MSR_P6_PERFCTR(0) ... MSR_P6_PERFCTR(7):
+    case MSR_P6_EVNTSEL(0) ... MSR_P6_EVNTSEL(3):
+    case MSR_CORE_PERF_FIXED_CTR0 ... MSR_CORE_PERF_FIXED_CTR2:
+    case MSR_CORE_PERF_FIXED_CTR_CTRL ... MSR_CORE_PERF_GLOBAL_OVF_CTRL:
         if ( boot_cpu_data.x86_vendor == X86_VENDOR_INTEL )
         {
             vpmu_msr = true;
             /* fall through */
-    case MSR_AMD_FAM15H_EVNTSEL0...MSR_AMD_FAM15H_PERFCTR5:
-    case MSR_K7_EVNTSEL0...MSR_K7_PERFCTR3:
+    case MSR_AMD_FAM15H_EVNTSEL0 ... MSR_AMD_FAM15H_PERFCTR5:
+    case MSR_K7_EVNTSEL0 ... MSR_K7_PERFCTR3:
             if ( vpmu_msr || (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) )
             {
                 if ( vpmu_do_rdmsr(reg, val) )
@@ -1220,15 +1224,15 @@ static int priv_op_write_msr(unsigned int reg, uint64_t val,
         curr->arch.cpuid_faulting = !!(val & MSR_MISC_FEATURES_CPUID_FAULTING);
         return X86EMUL_OKAY;
 
-    case MSR_P6_PERFCTR(0)...MSR_P6_PERFCTR(7):
-    case MSR_P6_EVNTSEL(0)...MSR_P6_EVNTSEL(3):
-    case MSR_CORE_PERF_FIXED_CTR0...MSR_CORE_PERF_FIXED_CTR2:
-    case MSR_CORE_PERF_FIXED_CTR_CTRL...MSR_CORE_PERF_GLOBAL_OVF_CTRL:
+    case MSR_P6_PERFCTR(0) ... MSR_P6_PERFCTR(7):
+    case MSR_P6_EVNTSEL(0) ... MSR_P6_EVNTSEL(3):
+    case MSR_CORE_PERF_FIXED_CTR0 ... MSR_CORE_PERF_FIXED_CTR2:
+    case MSR_CORE_PERF_FIXED_CTR_CTRL ... MSR_CORE_PERF_GLOBAL_OVF_CTRL:
         if ( boot_cpu_data.x86_vendor == X86_VENDOR_INTEL )
         {
             vpmu_msr = true;
-    case MSR_AMD_FAM15H_EVNTSEL0...MSR_AMD_FAM15H_PERFCTR5:
-    case MSR_K7_EVNTSEL0...MSR_K7_PERFCTR3:
+    case MSR_AMD_FAM15H_EVNTSEL0 ... MSR_AMD_FAM15H_PERFCTR5:
+    case MSR_K7_EVNTSEL0 ... MSR_K7_PERFCTR3:
             if ( vpmu_msr || (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) )
             {
                 if ( (vpmu_mode & XENPMU_MODE_ALL) &&
@@ -1484,7 +1488,6 @@ static int read_gate_descriptor(unsigned int gate_sel,
     struct desc_struct desc;
     const struct desc_struct *pdesc;
 
-
     pdesc = (const struct desc_struct *)
         (!(gate_sel & 4) ? GDT_VIRT_START(v) : LDT_VIRT_START(v))
         + (gate_sel >> 3);
@@ -1531,8 +1534,8 @@ static int read_gate_descriptor(unsigned int gate_sel,
     return 1;
 }
 
-static inline int check_stack_limit(unsigned int ar, unsigned int limit,
-                                    unsigned int esp, unsigned int decr)
+static inline bool check_stack_limit(unsigned int ar, unsigned int limit,
+                                     unsigned int esp, unsigned int decr)
 {
     return (((esp - decr) < (esp - 1)) &&
             (!(ar & _SEGMENT_EC) ? (esp - 1) <= limit : (esp - decr) > limit));
@@ -1745,17 +1748,17 @@ void emulate_gate_op(struct cpu_user_regs *regs)
     {
         unsigned int ss, esp, *stkp;
         int rc;
-#define push(item) do \
-        { \
-            --stkp; \
-            esp -= 4; \
-            rc = __put_user(item, stkp); \
-            if ( rc ) \
-            { \
-                pv_inject_page_fault(PFEC_write_access, \
-                                     (unsigned long)(stkp + 1) - rc); \
-                return; \
-            } \
+#define push(item) do                                                   \
+        {                                                               \
+            --stkp;                                                     \
+            esp -= 4;                                                   \
+            rc = __put_user(item, stkp);                                \
+            if ( rc )                                                   \
+            {                                                           \
+                pv_inject_page_fault(PFEC_write_access,                 \
+                                     (unsigned long)(stkp + 1) - rc);   \
+                return;                                                 \
+            }                                                           \
         } while ( 0 )
 
         if ( ((ar >> 13) & 3) < (regs->cs & 3) )
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH for-next v3 06/22] x86/traps: move PV hypercall handlers to pv/traps.c
  2017-05-18 17:09 [PATCH for-next v3 00/22] x86: refactor trap handling code Wei Liu
                   ` (4 preceding siblings ...)
  2017-05-18 17:09 ` [PATCH for-next v3 05/22] x86/pv: clean up emulate.c Wei Liu
@ 2017-05-18 17:09 ` Wei Liu
  2017-05-29 15:40   ` Jan Beulich
  2017-05-18 17:09 ` [PATCH for-next v3 07/22] x86/traps: move pv_inject_event " Wei Liu
                   ` (15 subsequent siblings)
  21 siblings, 1 reply; 65+ messages in thread
From: Wei Liu @ 2017-05-18 17:09 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich

The following handlers are moved (a guest-side usage sketch follows
the list):
1. do_set_trap_table
2. do_set_debugreg
3. do_get_debugreg
4. do_fpu_taskswitch
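
All four are PV hypercall entry points; a guest-side sketch of how
they are reached (assuming the usual HYPERVISOR_* wrappers, with
breakpoint_va and trap_table as illustrative variables):

    unsigned long dr6;

    HYPERVISOR_fpu_taskswitch(1);               /* set CR0.TS (lazy FPU) */
    HYPERVISOR_set_debugreg(0, breakpoint_va);  /* program %dr0 */
    dr6 = HYPERVISOR_get_debugreg(6);           /* read debug status */
    HYPERVISOR_set_trap_table(trap_table);      /* install the virtual IDT */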

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/arch/x86/pv/traps.c | 97 +++++++++++++++++++++++++++++++++++++++++++++++++
 xen/arch/x86/traps.c    | 94 -----------------------------------------------
 2 files changed, 97 insertions(+), 94 deletions(-)

diff --git a/xen/arch/x86/pv/traps.c b/xen/arch/x86/pv/traps.c
index 51125a8d86..350e7a1da4 100644
--- a/xen/arch/x86/pv/traps.c
+++ b/xen/arch/x86/pv/traps.c
@@ -19,9 +19,13 @@
  * Copyright (c) 2017 Citrix Systems Ltd.
  */
 
+#include <xen/event.h>
+#include <xen/guest_access.h>
 #include <xen/hypercall.h>
+#include <xen/sched.h>
 
 #include <asm/apic.h>
+#include <asm/debugreg.h>
 
 void do_entry_int82(struct cpu_user_regs *regs)
 {
@@ -31,6 +35,99 @@ void do_entry_int82(struct cpu_user_regs *regs)
     pv_hypercall(regs);
 }
 
+long do_fpu_taskswitch(int set)
+{
+    struct vcpu *v = current;
+
+    if ( set )
+    {
+        v->arch.pv_vcpu.ctrlreg[0] |= X86_CR0_TS;
+        stts();
+    }
+    else
+    {
+        v->arch.pv_vcpu.ctrlreg[0] &= ~X86_CR0_TS;
+        if ( v->fpu_dirtied )
+            clts();
+    }
+
+    return 0;
+}
+
+long do_set_trap_table(XEN_GUEST_HANDLE_PARAM(const_trap_info_t) traps)
+{
+    struct trap_info cur;
+    struct vcpu *curr = current;
+    struct trap_info *dst = curr->arch.pv_vcpu.trap_ctxt;
+    long rc = 0;
+
+    /* If no table is presented then clear the entire virtual IDT. */
+    if ( guest_handle_is_null(traps) )
+    {
+        memset(dst, 0, NR_VECTORS * sizeof(*dst));
+        init_int80_direct_trap(curr);
+        return 0;
+    }
+
+    for ( ; ; )
+    {
+        if ( copy_from_guest(&cur, traps, 1) )
+        {
+            rc = -EFAULT;
+            break;
+        }
+
+        if ( cur.address == 0 )
+            break;
+
+        if ( !is_canonical_address(cur.address) )
+            return -EINVAL;
+
+        fixup_guest_code_selector(curr->domain, cur.cs);
+
+        memcpy(&dst[cur.vector], &cur, sizeof(cur));
+
+        if ( cur.vector == 0x80 )
+            init_int80_direct_trap(curr);
+
+        guest_handle_add_offset(traps, 1);
+
+        if ( hypercall_preempt_check() )
+        {
+            rc = hypercall_create_continuation(
+                __HYPERVISOR_set_trap_table, "h", traps);
+            break;
+        }
+    }
+
+    return rc;
+}
+
+long do_set_debugreg(int reg, unsigned long value)
+{
+    return set_debugreg(current, reg, value);
+}
+
+unsigned long do_get_debugreg(int reg)
+{
+    struct vcpu *curr = current;
+
+    switch ( reg )
+    {
+    case 0 ... 3:
+    case 6:
+        return curr->arch.debugreg[reg];
+    case 7:
+        return (curr->arch.debugreg[7] |
+                curr->arch.debugreg[5]);
+    case 4 ... 5:
+        return ((curr->arch.pv_vcpu.ctrlreg[4] & X86_CR4_DE) ?
+                curr->arch.debugreg[reg + 2] : 0);
+    }
+
+    return -EINVAL;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index ace346d377..e5a3c9ad1a 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -1489,25 +1489,6 @@ void __init do_early_page_fault(struct cpu_user_regs *regs)
     }
 }
 
-long do_fpu_taskswitch(int set)
-{
-    struct vcpu *v = current;
-
-    if ( set )
-    {
-        v->arch.pv_vcpu.ctrlreg[0] |= X86_CR0_TS;
-        stts();
-    }
-    else
-    {
-        v->arch.pv_vcpu.ctrlreg[0] &= ~X86_CR0_TS;
-        if ( v->fpu_dirtied )
-            clts();
-    }
-
-    return 0;
-}
-
 void do_general_protection(struct cpu_user_regs *regs)
 {
     struct vcpu *v = current;
@@ -2126,56 +2107,6 @@ int send_guest_trap(struct domain *d, uint16_t vcpuid, unsigned int trap_nr)
     return -EIO;
 }
 
-
-long do_set_trap_table(XEN_GUEST_HANDLE_PARAM(const_trap_info_t) traps)
-{
-    struct trap_info cur;
-    struct vcpu *curr = current;
-    struct trap_info *dst = curr->arch.pv_vcpu.trap_ctxt;
-    long rc = 0;
-
-    /* If no table is presented then clear the entire virtual IDT. */
-    if ( guest_handle_is_null(traps) )
-    {
-        memset(dst, 0, NR_VECTORS * sizeof(*dst));
-        init_int80_direct_trap(curr);
-        return 0;
-    }
-
-    for ( ; ; )
-    {
-        if ( copy_from_guest(&cur, traps, 1) )
-        {
-            rc = -EFAULT;
-            break;
-        }
-
-        if ( cur.address == 0 )
-            break;
-
-        if ( !is_canonical_address(cur.address) )
-            return -EINVAL;
-
-        fixup_guest_code_selector(curr->domain, cur.cs);
-
-        memcpy(&dst[cur.vector], &cur, sizeof(cur));
-
-        if ( cur.vector == 0x80 )
-            init_int80_direct_trap(curr);
-
-        guest_handle_add_offset(traps, 1);
-
-        if ( hypercall_preempt_check() )
-        {
-            rc = hypercall_create_continuation(
-                __HYPERVISOR_set_trap_table, "h", traps);
-            break;
-        }
-    }
-
-    return rc;
-}
-
 void activate_debugregs(const struct vcpu *curr)
 {
     ASSERT(curr == current);
@@ -2299,31 +2230,6 @@ long set_debugreg(struct vcpu *v, unsigned int reg, unsigned long value)
     return 0;
 }
 
-long do_set_debugreg(int reg, unsigned long value)
-{
-    return set_debugreg(current, reg, value);
-}
-
-unsigned long do_get_debugreg(int reg)
-{
-    struct vcpu *curr = current;
-
-    switch ( reg )
-    {
-    case 0 ... 3:
-    case 6:
-        return curr->arch.debugreg[reg];
-    case 7:
-        return (curr->arch.debugreg[7] |
-                curr->arch.debugreg[5]);
-    case 4 ... 5:
-        return ((curr->arch.pv_vcpu.ctrlreg[4] & X86_CR4_DE) ?
-                curr->arch.debugreg[reg + 2] : 0);
-    }
-
-    return -EINVAL;
-}
-
 void asm_domain_crash_synchronous(unsigned long addr)
 {
     /*
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH for-next v3 07/22] x86/traps: move pv_inject_event to pv/traps.c
  2017-05-18 17:09 [PATCH for-next v3 00/22] x86: refactor trap handling code Wei Liu
                   ` (5 preceding siblings ...)
  2017-05-18 17:09 ` [PATCH for-next v3 06/22] x86/traps: move PV hypercall handlers to pv/traps.c Wei Liu
@ 2017-05-18 17:09 ` Wei Liu
  2017-05-29 15:42   ` Jan Beulich
  2017-05-18 17:09 ` [PATCH for-next v3 08/22] x86/traps: move set_guest_{machinecheck,nmi}_trapbounce Wei Liu
                   ` (14 subsequent siblings)
  21 siblings, 1 reply; 65+ messages in thread
From: Wei Liu @ 2017-05-18 17:09 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich

No functional change.
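
For orientation, pv_inject_event is normally reached via thin inline
wrappers that build a struct x86_event; schematically (a simplified
sketch of those helpers, not new code):

    static inline void pv_inject_hw_exception(unsigned int vector,
                                              int error_code)
    {
        const struct x86_event event = {
            .vector = vector,
            .type = X86_EVENTTYPE_HW_EXCEPTION,
            .error_code = error_code,
        };

        pv_inject_event(&event);
    }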

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/arch/x86/pv/traps.c | 73 +++++++++++++++++++++++++++++++++++++++++++++++++
 xen/arch/x86/traps.c    | 69 ----------------------------------------------
 2 files changed, 73 insertions(+), 69 deletions(-)

diff --git a/xen/arch/x86/pv/traps.c b/xen/arch/x86/pv/traps.c
index 350e7a1da4..79304704dd 100644
--- a/xen/arch/x86/pv/traps.c
+++ b/xen/arch/x86/pv/traps.c
@@ -22,10 +22,14 @@
 #include <xen/event.h>
 #include <xen/guest_access.h>
 #include <xen/hypercall.h>
+#include <xen/lib.h>
 #include <xen/sched.h>
+#include <xen/trace.h>
 
 #include <asm/apic.h>
 #include <asm/debugreg.h>
+#include <asm/shared.h>
+#include <asm/traps.h>
 
 void do_entry_int82(struct cpu_user_regs *regs)
 {
@@ -128,6 +132,75 @@ unsigned long do_get_debugreg(int reg)
     return -EINVAL;
 }
 
+void pv_inject_event(const struct x86_event *event)
+{
+    struct vcpu *v = current;
+    struct cpu_user_regs *regs = guest_cpu_user_regs();
+    struct trap_bounce *tb;
+    const struct trap_info *ti;
+    const uint8_t vector = event->vector;
+    unsigned int error_code = event->error_code;
+    bool use_error_code;
+
+    ASSERT(vector == event->vector); /* Confirm no truncation. */
+    if ( event->type == X86_EVENTTYPE_HW_EXCEPTION )
+    {
+        ASSERT(vector < 32);
+        use_error_code = TRAP_HAVE_EC & (1u << vector);
+    }
+    else
+    {
+        ASSERT(event->type == X86_EVENTTYPE_SW_INTERRUPT);
+        use_error_code = false;
+    }
+    if ( use_error_code )
+        ASSERT(error_code != X86_EVENT_NO_EC);
+    else
+        ASSERT(error_code == X86_EVENT_NO_EC);
+
+    tb = &v->arch.pv_vcpu.trap_bounce;
+    ti = &v->arch.pv_vcpu.trap_ctxt[vector];
+
+    tb->flags = TBF_EXCEPTION;
+    tb->cs    = ti->cs;
+    tb->eip   = ti->address;
+
+    if ( event->type == X86_EVENTTYPE_HW_EXCEPTION &&
+         vector == TRAP_page_fault )
+    {
+        v->arch.pv_vcpu.ctrlreg[2] = event->cr2;
+        arch_set_cr2(v, event->cr2);
+
+        /* Re-set error_code.user flag appropriately for the guest. */
+        error_code &= ~PFEC_user_mode;
+        if ( !guest_kernel_mode(v, regs) )
+            error_code |= PFEC_user_mode;
+
+        trace_pv_page_fault(event->cr2, error_code);
+    }
+    else
+        trace_pv_trap(vector, regs->rip, use_error_code, error_code);
+
+    if ( use_error_code )
+    {
+        tb->flags |= TBF_EXCEPTION_ERRCODE;
+        tb->error_code = error_code;
+    }
+
+    if ( TI_GET_IF(ti) )
+        tb->flags |= TBF_INTERRUPT;
+
+    if ( unlikely(null_trap_bounce(v, tb)) )
+    {
+        gprintk(XENLOG_WARNING,
+                "Unhandled %s fault/trap [#%d, ec=%04x]\n",
+                trapstr(vector), vector, error_code);
+
+        if ( vector == TRAP_page_fault )
+            show_page_walk(event->cr2);
+    }
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index e5a3c9ad1a..96f3ffffd6 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -627,75 +627,6 @@ void fatal_trap(const struct cpu_user_regs *regs, bool_t show_remote)
           (regs->eflags & X86_EFLAGS_IF) ? "" : ", IN INTERRUPT CONTEXT");
 }
 
-void pv_inject_event(const struct x86_event *event)
-{
-    struct vcpu *v = current;
-    struct cpu_user_regs *regs = guest_cpu_user_regs();
-    struct trap_bounce *tb;
-    const struct trap_info *ti;
-    const uint8_t vector = event->vector;
-    unsigned int error_code = event->error_code;
-    bool use_error_code;
-
-    ASSERT(vector == event->vector); /* Confirm no truncation. */
-    if ( event->type == X86_EVENTTYPE_HW_EXCEPTION )
-    {
-        ASSERT(vector < 32);
-        use_error_code = TRAP_HAVE_EC & (1u << vector);
-    }
-    else
-    {
-        ASSERT(event->type == X86_EVENTTYPE_SW_INTERRUPT);
-        use_error_code = false;
-    }
-    if ( use_error_code )
-        ASSERT(error_code != X86_EVENT_NO_EC);
-    else
-        ASSERT(error_code == X86_EVENT_NO_EC);
-
-    tb = &v->arch.pv_vcpu.trap_bounce;
-    ti = &v->arch.pv_vcpu.trap_ctxt[vector];
-
-    tb->flags = TBF_EXCEPTION;
-    tb->cs    = ti->cs;
-    tb->eip   = ti->address;
-
-    if ( event->type == X86_EVENTTYPE_HW_EXCEPTION &&
-         vector == TRAP_page_fault )
-    {
-        v->arch.pv_vcpu.ctrlreg[2] = event->cr2;
-        arch_set_cr2(v, event->cr2);
-
-        /* Re-set error_code.user flag appropriately for the guest. */
-        error_code &= ~PFEC_user_mode;
-        if ( !guest_kernel_mode(v, regs) )
-            error_code |= PFEC_user_mode;
-
-        trace_pv_page_fault(event->cr2, error_code);
-    }
-    else
-        trace_pv_trap(vector, regs->rip, use_error_code, error_code);
-
-    if ( use_error_code )
-    {
-        tb->flags |= TBF_EXCEPTION_ERRCODE;
-        tb->error_code = error_code;
-    }
-
-    if ( TI_GET_IF(ti) )
-        tb->flags |= TBF_INTERRUPT;
-
-    if ( unlikely(null_trap_bounce(v, tb)) )
-    {
-        gprintk(XENLOG_WARNING,
-                "Unhandled %s fault/trap [#%d, ec=%04x]\n",
-                trapstr(vector), vector, error_code);
-
-        if ( vector == TRAP_page_fault )
-            show_page_walk(event->cr2);
-    }
-}
-
 /*
  * Called from asm to set up the MCE trapbounce info.
  * Returns 0 if no callback is set up, else 1.
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH for-next v3 08/22] x86/traps: move set_guest_{machinecheck,nmi}_trapbounce
  2017-05-18 17:09 [PATCH for-next v3 00/22] x86: refactor trap handling code Wei Liu
                   ` (6 preceding siblings ...)
  2017-05-18 17:09 ` [PATCH for-next v3 07/22] x86/traps: move pv_inject_event " Wei Liu
@ 2017-05-18 17:09 ` Wei Liu
  2017-05-29 15:43   ` Jan Beulich
  2017-05-18 17:09 ` [PATCH for-next v3 09/22] x86/traps: move {un,}register_guest_nmi_callback Wei Liu
                   ` (13 subsequent siblings)
  21 siblings, 1 reply; 65+ messages in thread
From: Wei Liu @ 2017-05-18 17:09 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/arch/x86/pv/traps.c | 27 +++++++++++++++++++++++++++
 xen/arch/x86/traps.c    | 27 ---------------------------
 2 files changed, 27 insertions(+), 27 deletions(-)

diff --git a/xen/arch/x86/pv/traps.c b/xen/arch/x86/pv/traps.c
index 79304704dd..d7e8db5820 100644
--- a/xen/arch/x86/pv/traps.c
+++ b/xen/arch/x86/pv/traps.c
@@ -201,6 +201,33 @@ void pv_inject_event(const struct x86_event *event)
     }
 }
 
+/*
+ * Called from asm to set up the MCE trapbounce info.
+ * Returns 0 if no callback is set up, else 1.
+ */
+int set_guest_machinecheck_trapbounce(void)
+{
+    struct vcpu *v = current;
+    struct trap_bounce *tb = &v->arch.pv_vcpu.trap_bounce;
+
+    pv_inject_hw_exception(TRAP_machine_check, X86_EVENT_NO_EC);
+    tb->flags &= ~TBF_EXCEPTION; /* not needed for MCE delivery path */
+    return !null_trap_bounce(v, tb);
+}
+
+/*
+ * Called from asm to set up the NMI trapbounce info.
+ * Returns 0 if no callback is set up, else 1.
+ */
+int set_guest_nmi_trapbounce(void)
+{
+    struct vcpu *v = current;
+    struct trap_bounce *tb = &v->arch.pv_vcpu.trap_bounce;
+    pv_inject_hw_exception(TRAP_nmi, X86_EVENT_NO_EC);
+    tb->flags &= ~TBF_EXCEPTION; /* not needed for NMI delivery path */
+    return !null_trap_bounce(v, tb);
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 96f3ffffd6..8eb7b31a6c 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -627,33 +627,6 @@ void fatal_trap(const struct cpu_user_regs *regs, bool_t show_remote)
           (regs->eflags & X86_EFLAGS_IF) ? "" : ", IN INTERRUPT CONTEXT");
 }
 
-/*
- * Called from asm to set up the MCE trapbounce info.
- * Returns 0 if no callback is set up, else 1.
- */
-int set_guest_machinecheck_trapbounce(void)
-{
-    struct vcpu *v = current;
-    struct trap_bounce *tb = &v->arch.pv_vcpu.trap_bounce;
- 
-    pv_inject_hw_exception(TRAP_machine_check, X86_EVENT_NO_EC);
-    tb->flags &= ~TBF_EXCEPTION; /* not needed for MCE delivery path */
-    return !null_trap_bounce(v, tb);
-}
-
-/*
- * Called from asm to set up the NMI trapbounce info.
- * Returns 0 if no callback is set up, else 1.
- */
-int set_guest_nmi_trapbounce(void)
-{
-    struct vcpu *v = current;
-    struct trap_bounce *tb = &v->arch.pv_vcpu.trap_bounce;
-    pv_inject_hw_exception(TRAP_nmi, X86_EVENT_NO_EC);
-    tb->flags &= ~TBF_EXCEPTION; /* not needed for NMI delivery path */
-    return !null_trap_bounce(v, tb);
-}
-
 void do_reserved_trap(struct cpu_user_regs *regs)
 {
     unsigned int trapnr = regs->entry_vector;
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH for-next v3 09/22] x86/traps: move {un,}register_guest_nmi_callback
  2017-05-18 17:09 [PATCH for-next v3 00/22] x86: refactor trap handling code Wei Liu
                   ` (7 preceding siblings ...)
  2017-05-18 17:09 ` [PATCH for-next v3 08/22] x86/traps: move set_guest_{machinecheck,nmi}_trapbounce Wei Liu
  2017-05-18 17:09 ` [PATCH for-next v3 09/22] x86/traps: move {un,}register_guest_nmi_callback Wei Liu
  2017-05-18 17:09 ` [PATCH for-next v3 10/22] x86/traps: declare percpu softirq_trap Wei Liu
                   ` (12 subsequent siblings)
  21 siblings, 0 replies; 65+ messages in thread
From: Wei Liu @ 2017-05-18 17:09 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich

No functional change.
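
For context, a PV guest installs this callback via the nmi_op
hypercall, which ends up here; a guest-side sketch (struct and
constant as in public/nmi.h, nmi_entry being an illustrative handler
symbol):

    struct xennmi_callback cb = {
        .handler_address = (unsigned long)nmi_entry,
        .pad = 0,
    };

    HYPERVISOR_nmi_op(XENNMI_register_callback, &cb);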

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/arch/x86/pv/traps.c | 36 ++++++++++++++++++++++++++++++++++++
 xen/arch/x86/traps.c    | 36 ------------------------------------
 2 files changed, 36 insertions(+), 36 deletions(-)

diff --git a/xen/arch/x86/pv/traps.c b/xen/arch/x86/pv/traps.c
index d7e8db5820..ff7dd1905f 100644
--- a/xen/arch/x86/pv/traps.c
+++ b/xen/arch/x86/pv/traps.c
@@ -228,6 +228,42 @@ int set_guest_nmi_trapbounce(void)
     return !null_trap_bounce(v, tb);
 }
 
+long register_guest_nmi_callback(unsigned long address)
+{
+    struct vcpu *v = current;
+    struct domain *d = v->domain;
+    struct trap_info *t = &v->arch.pv_vcpu.trap_ctxt[TRAP_nmi];
+
+    if ( !is_canonical_address(address) )
+        return -EINVAL;
+
+    t->vector  = TRAP_nmi;
+    t->flags   = 0;
+    t->cs      = (is_pv_32bit_domain(d) ?
+                  FLAT_COMPAT_KERNEL_CS : FLAT_KERNEL_CS);
+    t->address = address;
+    TI_SET_IF(t, 1);
+
+    /*
+     * If no handler was registered we can 'lose the NMI edge'. Re-assert it
+     * now.
+     */
+    if ( (v->vcpu_id == 0) && (arch_get_nmi_reason(d) != 0) )
+        v->nmi_pending = 1;
+
+    return 0;
+}
+
+long unregister_guest_nmi_callback(void)
+{
+    struct vcpu *v = current;
+    struct trap_info *t = &v->arch.pv_vcpu.trap_ctxt[TRAP_nmi];
+
+    memset(t, 0, sizeof(*t));
+
+    return 0;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 8eb7b31a6c..e6028a4403 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -1910,42 +1910,6 @@ void __init trap_init(void)
     open_softirq(PCI_SERR_SOFTIRQ, pci_serr_softirq);
 }
 
-long register_guest_nmi_callback(unsigned long address)
-{
-    struct vcpu *v = current;
-    struct domain *d = v->domain;
-    struct trap_info *t = &v->arch.pv_vcpu.trap_ctxt[TRAP_nmi];
-
-    if ( !is_canonical_address(address) )
-        return -EINVAL;
-
-    t->vector  = TRAP_nmi;
-    t->flags   = 0;
-    t->cs      = (is_pv_32bit_domain(d) ?
-                  FLAT_COMPAT_KERNEL_CS : FLAT_KERNEL_CS);
-    t->address = address;
-    TI_SET_IF(t, 1);
-
-    /*
-     * If no handler was registered we can 'lose the NMI edge'. Re-assert it
-     * now.
-     */
-    if ( (v->vcpu_id == 0) && (arch_get_nmi_reason(d) != 0) )
-        v->nmi_pending = 1;
-
-    return 0;
-}
-
-long unregister_guest_nmi_callback(void)
-{
-    struct vcpu *v = current;
-    struct trap_info *t = &v->arch.pv_vcpu.trap_ctxt[TRAP_nmi];
-
-    memset(t, 0, sizeof(*t));
-
-    return 0;
-}
-
 int guest_has_trap_callback(struct domain *d, uint16_t vcpuid, unsigned int trap_nr)
 {
     struct vcpu *v;
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH for-next v3 10/22] x86/traps: declare percpu softirq_trap
  2017-05-18 17:09 [PATCH for-next v3 00/22] x86: refactor trap handling code Wei Liu
                   ` (8 preceding siblings ...)
  2017-05-18 17:09 ` [PATCH for-next v3 09/22] x86/traps: move {un,}register_guest_nmi_callback Wei Liu
@ 2017-05-18 17:09 ` Wei Liu
  2017-05-29 15:49   ` Jan Beulich
  2017-05-18 17:09 ` [PATCH for-next v3 11/22] x86/traps: move guest_has_trap_callback to pv/traps.c Wei Liu
                   ` (11 subsequent siblings)
  21 siblings, 1 reply; 65+ messages in thread
From: Wei Liu @ 2017-05-18 17:09 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich

It needs to be non-static once the PV-specific code is split out,
since pv/traps.c will then reference it.
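
This follows the usual per-cpu pattern: one DEFINE_PER_CPU in a single
translation unit, plus a DECLARE_PER_CPU in a shared header so other
files can reference the symbol. In outline:

    /* asm-x86/traps.h: visible to every file including the header. */
    DECLARE_PER_CPU(struct softirq_trap, softirq_trap);

    /* x86/traps.c: the one and only definition. */
    DEFINE_PER_CPU(struct softirq_trap, softirq_trap);

    /* pv/traps.c (after the split) can then do: */
    struct softirq_trap *st = &per_cpu(softirq_trap, smp_processor_id());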

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/arch/x86/traps.c        | 2 +-
 xen/include/asm-x86/traps.h | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index e6028a4403..a26b002173 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -1478,7 +1478,7 @@ void do_general_protection(struct cpu_user_regs *regs)
     panic("GENERAL PROTECTION FAULT\n[error_code=%04x]", regs->error_code);
 }
 
-static DEFINE_PER_CPU(struct softirq_trap, softirq_trap);
+DEFINE_PER_CPU(struct softirq_trap, softirq_trap);
 
 static void nmi_mce_softirq(void)
 {
diff --git a/xen/include/asm-x86/traps.h b/xen/include/asm-x86/traps.h
index f1d2513e6b..4e8760482f 100644
--- a/xen/include/asm-x86/traps.h
+++ b/xen/include/asm-x86/traps.h
@@ -24,6 +24,7 @@ struct softirq_trap {
 	struct vcpu *vcpu;	/* vcpu to inject trap */
 	int processor;		/* physical cpu to inject trap */
 };
+DECLARE_PER_CPU(struct softirq_trap, softirq_trap);
 
 struct cpu_user_regs;
 
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH for-next v3 11/22] x86/traps: move guest_has_trap_callback to pv/traps.c
  2017-05-18 17:09 [PATCH for-next v3 00/22] x86: refactor trap handling code Wei Liu
                   ` (9 preceding siblings ...)
  2017-05-18 17:09 ` [PATCH for-next v3 10/22] x86/traps: declare percpu softirq_trap Wei Liu
@ 2017-05-18 17:09 ` Wei Liu
  2017-05-29 15:54   ` Jan Beulich
  2017-05-18 17:09 ` [PATCH for-next v3 12/22] x86/traps: move send_guest_trap " Wei Liu
                   ` (10 subsequent siblings)
  21 siblings, 1 reply; 65+ messages in thread
From: Wei Liu @ 2017-05-18 17:09 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/arch/x86/pv/traps.c | 18 ++++++++++++++++++
 xen/arch/x86/traps.c    | 18 ------------------
 2 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/xen/arch/x86/pv/traps.c b/xen/arch/x86/pv/traps.c
index ff7dd1905f..6876150969 100644
--- a/xen/arch/x86/pv/traps.c
+++ b/xen/arch/x86/pv/traps.c
@@ -264,6 +264,24 @@ long unregister_guest_nmi_callback(void)
     return 0;
 }
 
+int guest_has_trap_callback(struct domain *d, uint16_t vcpuid,
+                            unsigned int trap_nr)
+{
+    struct vcpu *v;
+    struct trap_info *t;
+
+    BUG_ON(d == NULL);
+    BUG_ON(vcpuid >= d->max_vcpus);
+
+    /* Sanity check - XXX should be more fine grained. */
+    BUG_ON(trap_nr >= NR_VECTORS);
+
+    v = d->vcpu[vcpuid];
+    t = &v->arch.pv_vcpu.trap_ctxt[trap_nr];
+
+    return (t->address != 0);
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index a26b002173..2a4dc159cf 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -1910,24 +1910,6 @@ void __init trap_init(void)
     open_softirq(PCI_SERR_SOFTIRQ, pci_serr_softirq);
 }
 
-int guest_has_trap_callback(struct domain *d, uint16_t vcpuid, unsigned int trap_nr)
-{
-    struct vcpu *v;
-    struct trap_info *t;
-
-    BUG_ON(d == NULL);
-    BUG_ON(vcpuid >= d->max_vcpus);
-
-    /* Sanity check - XXX should be more fine grained. */
-    BUG_ON(trap_nr >= NR_VECTORS);
-
-    v = d->vcpu[vcpuid];
-    t = &v->arch.pv_vcpu.trap_ctxt[trap_nr];
-
-    return (t->address != 0);
-}
-
-
 int send_guest_trap(struct domain *d, uint16_t vcpuid, unsigned int trap_nr)
 {
     struct vcpu *v;
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH for-next v3 12/22] x86/traps: move send_guest_trap to pv/traps.c
  2017-05-18 17:09 [PATCH for-next v3 00/22] x86: refactor trap handling code Wei Liu
                   ` (10 preceding siblings ...)
  2017-05-18 17:09 ` [PATCH for-next v3 11/22] x86/traps: move guest_has_trap_callback to pv/traps.c Wei Liu
@ 2017-05-18 17:09 ` Wei Liu
  2017-05-29 15:55   ` Jan Beulich
  2017-05-18 17:09 ` [PATCH for-next v3 13/22] x86/traps: move toggle_guest_mode Wei Liu
                   ` (9 subsequent siblings)
  21 siblings, 1 reply; 65+ messages in thread
From: Wei Liu @ 2017-05-18 17:09 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich

Fix some coding style issues while moving.

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
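For reference, the per-CPU softirq_trap slot acts as a single-entry
mailbox between this function and nmi_mce_softirq. A reduced sketch of
the claim path, assuming the softirq consumer resets st->vcpu to NULL
once it has injected the event:

    struct softirq_trap *st = &per_cpu(softirq_trap, smp_processor_id());

    /* Atomically claim the mailbox; fail if a delivery is in flight. */
    if ( cmpxchgptr(&st->vcpu, NULL, v) )
        return -EBUSY;

    st->domain = d;
    st->processor = v->processor;
    raise_softirq(NMI_MCE_SOFTIRQ);   /* consumer clears st->vcpu later */
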
 xen/arch/x86/pv/traps.c | 50 +++++++++++++++++++++++++++++++++++++++++++++++++
 xen/arch/x86/traps.c    | 47 ----------------------------------------------
 2 files changed, 50 insertions(+), 47 deletions(-)

diff --git a/xen/arch/x86/pv/traps.c b/xen/arch/x86/pv/traps.c
index 6876150969..7c841e04cc 100644
--- a/xen/arch/x86/pv/traps.c
+++ b/xen/arch/x86/pv/traps.c
@@ -282,6 +282,56 @@ int guest_has_trap_callback(struct domain *d, uint16_t vcpuid,
     return (t->address != 0);
 }
 
+int send_guest_trap(struct domain *d, uint16_t vcpuid, unsigned int trap_nr)
+{
+    struct vcpu *v;
+    struct softirq_trap *st = &per_cpu(softirq_trap, smp_processor_id());
+
+    BUG_ON(d == NULL);
+    BUG_ON(vcpuid >= d->max_vcpus);
+    v = d->vcpu[vcpuid];
+
+    switch (trap_nr)
+    {
+    case TRAP_nmi:
+        if ( cmpxchgptr(&st->vcpu, NULL, v) )
+            return -EBUSY;
+        if ( !test_and_set_bool(v->nmi_pending) )
+        {
+               st->domain = d;
+               st->processor = v->processor;
+
+               /* not safe to wake up a vcpu here */
+               raise_softirq(NMI_MCE_SOFTIRQ);
+               return 0;
+        }
+        st->vcpu = NULL;
+        break;
+
+    case TRAP_machine_check:
+        if ( cmpxchgptr(&st->vcpu, NULL, v) )
+            return -EBUSY;
+
+        /* We are called by the machine check (exception or polling) handlers
+         * on the physical CPU that reported a machine check error. */
+
+        if ( !test_and_set_bool(v->mce_pending) )
+        {
+                st->domain = d;
+                st->processor = v->processor;
+
+                /* not safe to wake up a vcpu here */
+                raise_softirq(NMI_MCE_SOFTIRQ);
+                return 0;
+        }
+        st->vcpu = NULL;
+        break;
+    }
+
+    /* delivery failed */
+    return -EIO;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 2a4dc159cf..3cc4b5b78b 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -1910,53 +1910,6 @@ void __init trap_init(void)
     open_softirq(PCI_SERR_SOFTIRQ, pci_serr_softirq);
 }
 
-int send_guest_trap(struct domain *d, uint16_t vcpuid, unsigned int trap_nr)
-{
-    struct vcpu *v;
-    struct softirq_trap *st = &per_cpu(softirq_trap, smp_processor_id());
-
-    BUG_ON(d == NULL);
-    BUG_ON(vcpuid >= d->max_vcpus);
-    v = d->vcpu[vcpuid];
-
-    switch (trap_nr) {
-    case TRAP_nmi:
-        if ( cmpxchgptr(&st->vcpu, NULL, v) )
-            return -EBUSY;
-        if ( !test_and_set_bool(v->nmi_pending) ) {
-               st->domain = d;
-               st->processor = v->processor;
-
-               /* not safe to wake up a vcpu here */
-               raise_softirq(NMI_MCE_SOFTIRQ);
-               return 0;
-        }
-        st->vcpu = NULL;
-        break;
-
-    case TRAP_machine_check:
-        if ( cmpxchgptr(&st->vcpu, NULL, v) )
-            return -EBUSY;
-
-        /* We are called by the machine check (exception or polling) handlers
-         * on the physical CPU that reported a machine check error. */
-
-        if ( !test_and_set_bool(v->mce_pending) ) {
-                st->domain = d;
-                st->processor = v->processor;
-
-                /* not safe to wake up a vcpu here */
-                raise_softirq(NMI_MCE_SOFTIRQ);
-                return 0;
-        }
-        st->vcpu = NULL;
-        break;
-    }
-
-    /* delivery failed */
-    return -EIO;
-}
-
 void activate_debugregs(const struct vcpu *curr)
 {
     ASSERT(curr == current);
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH for-next v3 13/22] x86/traps: move toggle_guest_mode
  2017-05-18 17:09 [PATCH for-next v3 00/22] x86: refactor trap handling code Wei Liu
                   ` (11 preceding siblings ...)
  2017-05-18 17:09 ` [PATCH for-next v3 12/22] x86/traps: move send_guest_trap " Wei Liu
@ 2017-05-18 17:09 ` Wei Liu
  2017-05-29 16:05   ` Jan Beulich
  2017-05-18 17:09 ` [PATCH for-next v3 14/22] x86/traps: move do_iret to pv/traps.c Wei Liu
                   ` (8 subsequent siblings)
  21 siblings, 1 reply; 65+ messages in thread
From: Wei Liu @ 2017-05-18 17:09 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich

Move from x86_64/traps.c to pv/traps.c.

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
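For reference, a 64-bit PV guest runs both its kernel and its user
context in ring 3; which of the two is active is tracked purely in
software via TF_kernel_mode, and they differ only in GS base and in
which page table CR3 points at. A sketch of the caller pattern, as in
the do_iret path moved by the next patch:

    /* Returning from guest kernel to guest user mode (sketch): */
    if ( (iret_saved.cs & 3) == 3 )   /* target CS has RPL 3 */
        toggle_guest_mode(v);         /* swaps GS base and CR3 */
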
 xen/arch/x86/pv/traps.c     | 30 ++++++++++++++++++++++++++++++
 xen/arch/x86/x86_64/traps.c | 30 ------------------------------
 2 files changed, 30 insertions(+), 30 deletions(-)

diff --git a/xen/arch/x86/pv/traps.c b/xen/arch/x86/pv/traps.c
index 7c841e04cc..5b84a617e6 100644
--- a/xen/arch/x86/pv/traps.c
+++ b/xen/arch/x86/pv/traps.c
@@ -332,6 +332,36 @@ int send_guest_trap(struct domain *d, uint16_t vcpuid, unsigned int trap_nr)
     return -EIO;
 }
 
+void toggle_guest_mode(struct vcpu *v)
+{
+    if ( is_pv_32bit_vcpu(v) )
+        return;
+    if ( cpu_has_fsgsbase )
+    {
+        if ( v->arch.flags & TF_kernel_mode )
+            v->arch.pv_vcpu.gs_base_kernel = __rdgsbase();
+        else
+            v->arch.pv_vcpu.gs_base_user = __rdgsbase();
+    }
+    v->arch.flags ^= TF_kernel_mode;
+    asm volatile ( "swapgs" );
+    update_cr3(v);
+    /* Don't flush user global mappings from the TLB. Don't tick TLB clock. */
+    asm volatile ( "mov %0, %%cr3" : : "r" (v->arch.cr3) : "memory" );
+
+    if ( !(v->arch.flags & TF_kernel_mode) )
+        return;
+
+    if ( v->arch.pv_vcpu.need_update_runstate_area &&
+         update_runstate_area(v) )
+        v->arch.pv_vcpu.need_update_runstate_area = 0;
+
+    if ( v->arch.pv_vcpu.pending_system_time.version &&
+         update_secondary_system_time(v,
+                                      &v->arch.pv_vcpu.pending_system_time) )
+        v->arch.pv_vcpu.pending_system_time.version = 0;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/x86_64/traps.c b/xen/arch/x86/x86_64/traps.c
index 78f410517c..36b694c605 100644
--- a/xen/arch/x86/x86_64/traps.c
+++ b/xen/arch/x86/x86_64/traps.c
@@ -254,36 +254,6 @@ void do_double_fault(struct cpu_user_regs *regs)
     panic("DOUBLE FAULT -- system shutdown");
 }
 
-void toggle_guest_mode(struct vcpu *v)
-{
-    if ( is_pv_32bit_vcpu(v) )
-        return;
-    if ( cpu_has_fsgsbase )
-    {
-        if ( v->arch.flags & TF_kernel_mode )
-            v->arch.pv_vcpu.gs_base_kernel = __rdgsbase();
-        else
-            v->arch.pv_vcpu.gs_base_user = __rdgsbase();
-    }
-    v->arch.flags ^= TF_kernel_mode;
-    asm volatile ( "swapgs" );
-    update_cr3(v);
-    /* Don't flush user global mappings from the TLB. Don't tick TLB clock. */
-    asm volatile ( "mov %0, %%cr3" : : "r" (v->arch.cr3) : "memory" );
-
-    if ( !(v->arch.flags & TF_kernel_mode) )
-        return;
-
-    if ( v->arch.pv_vcpu.need_update_runstate_area &&
-         update_runstate_area(v) )
-        v->arch.pv_vcpu.need_update_runstate_area = 0;
-
-    if ( v->arch.pv_vcpu.pending_system_time.version &&
-         update_secondary_system_time(v,
-                                      &v->arch.pv_vcpu.pending_system_time) )
-        v->arch.pv_vcpu.pending_system_time.version = 0;
-}
-
 unsigned long do_iret(void)
 {
     struct cpu_user_regs *regs = guest_cpu_user_regs();
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH for-next v3 14/22] x86/traps: move do_iret to pv/traps.c
  2017-05-18 17:09 [PATCH for-next v3 00/22] x86: refactor trap handling code Wei Liu
                   ` (12 preceding siblings ...)
  2017-05-18 17:09 ` [PATCH for-next v3 13/22] x86/traps: move toggle_guest_mode Wei Liu
@ 2017-05-18 17:09 ` Wei Liu
  2017-05-29 16:07   ` Jan Beulich
  2017-05-18 17:09 ` [PATCH for-next v3 15/22] x86/traps: move init_int80_direct_trap Wei Liu
                   ` (7 subsequent siblings)
  21 siblings, 1 reply; 65+ messages in thread
From: Wei Liu @ 2017-05-18 17:09 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
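For reference, the hypercall consumes a struct iret_context which the
guest pushed on its stack immediately before invoking it; a layout
sketch (see public/arch-x86/xen-x86_64.h for the authoritative
definition):

    struct iret_context {
        /* Top of stack (%rsp at point of hypercall). */
        uint64_t rax, r11, rcx, flags, rip, cs, rflags, rsp, ss;
        /* Bottom of iret stack frame. */
    };
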
 xen/arch/x86/pv/traps.c     | 56 +++++++++++++++++++++++++++++++++++++++++++++
 xen/arch/x86/x86_64/traps.c | 56 ---------------------------------------------
 2 files changed, 56 insertions(+), 56 deletions(-)

diff --git a/xen/arch/x86/pv/traps.c b/xen/arch/x86/pv/traps.c
index 5b84a617e6..b69990c6b7 100644
--- a/xen/arch/x86/pv/traps.c
+++ b/xen/arch/x86/pv/traps.c
@@ -362,6 +362,62 @@ void toggle_guest_mode(struct vcpu *v)
         v->arch.pv_vcpu.pending_system_time.version = 0;
 }
 
+unsigned long do_iret(void)
+{
+    struct cpu_user_regs *regs = guest_cpu_user_regs();
+    struct iret_context iret_saved;
+    struct vcpu *v = current;
+
+    if ( unlikely(copy_from_user(&iret_saved, (void *)regs->rsp,
+                                 sizeof(iret_saved))) )
+    {
+        gprintk(XENLOG_ERR,
+                "Fault while reading IRET context from guest stack\n");
+        goto exit_and_crash;
+    }
+
+    /* Returning to user mode? */
+    if ( (iret_saved.cs & 3) == 3 )
+    {
+        if ( unlikely(pagetable_is_null(v->arch.guest_table_user)) )
+        {
+            gprintk(XENLOG_ERR,
+                    "Guest switching to user mode with no user page tables\n");
+            goto exit_and_crash;
+        }
+        toggle_guest_mode(v);
+    }
+
+    if ( VM_ASSIST(v->domain, architectural_iopl) )
+        v->arch.pv_vcpu.iopl = iret_saved.rflags & X86_EFLAGS_IOPL;
+
+    regs->rip    = iret_saved.rip;
+    regs->cs     = iret_saved.cs | 3; /* force guest privilege */
+    regs->rflags = ((iret_saved.rflags & ~(X86_EFLAGS_IOPL|X86_EFLAGS_VM))
+                    | X86_EFLAGS_IF);
+    regs->rsp    = iret_saved.rsp;
+    regs->ss     = iret_saved.ss | 3; /* force guest privilege */
+
+    if ( !(iret_saved.flags & VGCF_in_syscall) )
+    {
+        regs->entry_vector &= ~TRAP_syscall;
+        regs->r11 = iret_saved.r11;
+        regs->rcx = iret_saved.rcx;
+    }
+
+    /* Restore upcall mask from supplied EFLAGS.IF. */
+    vcpu_info(v, evtchn_upcall_mask) = !(iret_saved.rflags & X86_EFLAGS_IF);
+
+    async_exception_cleanup(v);
+
+    /* Saved %rax gets written back to regs->rax in entry.S. */
+    return iret_saved.rax;
+
+ exit_and_crash:
+    domain_crash(v->domain);
+    return 0;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/x86_64/traps.c b/xen/arch/x86/x86_64/traps.c
index 36b694c605..4641bc6d06 100644
--- a/xen/arch/x86/x86_64/traps.c
+++ b/xen/arch/x86/x86_64/traps.c
@@ -254,62 +254,6 @@ void do_double_fault(struct cpu_user_regs *regs)
     panic("DOUBLE FAULT -- system shutdown");
 }
 
-unsigned long do_iret(void)
-{
-    struct cpu_user_regs *regs = guest_cpu_user_regs();
-    struct iret_context iret_saved;
-    struct vcpu *v = current;
-
-    if ( unlikely(copy_from_user(&iret_saved, (void *)regs->rsp,
-                                 sizeof(iret_saved))) )
-    {
-        gprintk(XENLOG_ERR,
-                "Fault while reading IRET context from guest stack\n");
-        goto exit_and_crash;
-    }
-
-    /* Returning to user mode? */
-    if ( (iret_saved.cs & 3) == 3 )
-    {
-        if ( unlikely(pagetable_is_null(v->arch.guest_table_user)) )
-        {
-            gprintk(XENLOG_ERR,
-                    "Guest switching to user mode with no user page tables\n");
-            goto exit_and_crash;
-        }
-        toggle_guest_mode(v);
-    }
-
-    if ( VM_ASSIST(v->domain, architectural_iopl) )
-        v->arch.pv_vcpu.iopl = iret_saved.rflags & X86_EFLAGS_IOPL;
-
-    regs->rip    = iret_saved.rip;
-    regs->cs     = iret_saved.cs | 3; /* force guest privilege */
-    regs->rflags = ((iret_saved.rflags & ~(X86_EFLAGS_IOPL|X86_EFLAGS_VM))
-                    | X86_EFLAGS_IF);
-    regs->rsp    = iret_saved.rsp;
-    regs->ss     = iret_saved.ss | 3; /* force guest privilege */
-
-    if ( !(iret_saved.flags & VGCF_in_syscall) )
-    {
-        regs->entry_vector &= ~TRAP_syscall;
-        regs->r11 = iret_saved.r11;
-        regs->rcx = iret_saved.rcx;
-    }
-
-    /* Restore upcall mask from supplied EFLAGS.IF. */
-    vcpu_info(v, evtchn_upcall_mask) = !(iret_saved.rflags & X86_EFLAGS_IF);
-
-    async_exception_cleanup(v);
-
-    /* Saved %rax gets written back to regs->rax in entry.S. */
-    return iret_saved.rax;
-
- exit_and_crash:
-    domain_crash(v->domain);
-    return 0;
-}
-
 static unsigned int write_stub_trampoline(
     unsigned char *stub, unsigned long stub_va,
     unsigned long stack_bottom, unsigned long target_va)
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH for-next v3 15/22] x86/traps: move init_int80_direct_trap
  2017-05-18 17:09 [PATCH for-next v3 00/22] x86: refactor trap handling code Wei Liu
                   ` (13 preceding siblings ...)
  2017-05-18 17:09 ` [PATCH for-next v3 14/22] x86/traps: move do_iret to pv/traps.c Wei Liu
@ 2017-05-18 17:09 ` Wei Liu
  2017-05-29 16:07   ` Jan Beulich
  2017-05-18 17:09 ` [PATCH for-next v3 16/22] x86/traps: move callback_op code Wei Liu
                   ` (6 subsequent siblings)
  21 siblings, 1 reply; 65+ messages in thread
From: Wei Liu @ 2017-05-18 17:09 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
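For reference, the cached bounce frame is refreshed whenever the guest
rewrites vector 0x80 in its virtual IDT, so the int80 fast path does not
have to consult trap_ctxt at interrupt time. The (compat) set_trap_table
handler moved later in this series shows the trigger:

    if ( cur.vector == 0x80 )
        init_int80_direct_trap(current);
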
 xen/arch/x86/pv/traps.c     | 14 ++++++++++++++
 xen/arch/x86/x86_64/traps.c | 14 --------------
 2 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/xen/arch/x86/pv/traps.c b/xen/arch/x86/pv/traps.c
index b69990c6b7..cb9b3b1425 100644
--- a/xen/arch/x86/pv/traps.c
+++ b/xen/arch/x86/pv/traps.c
@@ -418,6 +418,20 @@ unsigned long do_iret(void)
     return 0;
 }
 
+void init_int80_direct_trap(struct vcpu *v)
+{
+    struct trap_info *ti = &v->arch.pv_vcpu.trap_ctxt[0x80];
+    struct trap_bounce *tb = &v->arch.pv_vcpu.int80_bounce;
+
+    tb->cs    = ti->cs;
+    tb->eip   = ti->address;
+
+    if ( null_trap_bounce(v, tb) )
+        tb->flags = 0;
+    else
+        tb->flags = TBF_EXCEPTION | (TI_GET_IF(ti) ? TBF_INTERRUPT : 0);
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/x86_64/traps.c b/xen/arch/x86/x86_64/traps.c
index 4641bc6d06..a4ea09b1ac 100644
--- a/xen/arch/x86/x86_64/traps.c
+++ b/xen/arch/x86/x86_64/traps.c
@@ -336,20 +336,6 @@ void subarch_percpu_traps_init(void)
     wrmsrl(MSR_SYSCALL_MASK, XEN_SYSCALL_MASK);
 }
 
-void init_int80_direct_trap(struct vcpu *v)
-{
-    struct trap_info *ti = &v->arch.pv_vcpu.trap_ctxt[0x80];
-    struct trap_bounce *tb = &v->arch.pv_vcpu.int80_bounce;
-
-    tb->cs    = ti->cs;
-    tb->eip   = ti->address;
-
-    if ( null_trap_bounce(v, tb) )
-        tb->flags = 0;
-    else
-        tb->flags = TBF_EXCEPTION | (TI_GET_IF(ti) ? TBF_INTERRUPT : 0);
-}
-
 static long register_guest_callback(struct callback_register *reg)
 {
     long ret = 0;
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH for-next v3 16/22] x86/traps: move callback_op code
  2017-05-18 17:09 [PATCH for-next v3 00/22] x86: refactor trap handling code Wei Liu
                   ` (14 preceding siblings ...)
  2017-05-18 17:09 ` [PATCH for-next v3 15/22] x86/traps: move init_int80_direct_trap Wei Liu
@ 2017-05-18 17:09 ` Wei Liu
  2017-05-29 16:09   ` Jan Beulich
  2017-05-18 17:09 ` [PATCH for-next v3 17/22] x86/traps: move hypercall_page_initialise_ring3_kernel Wei Liu
                   ` (5 subsequent siblings)
  21 siblings, 1 reply; 65+ messages in thread
From: Wei Liu @ 2017-05-18 17:09 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
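For reference, a sketch of the guest side of this interface, using the
names from public/callback.h; the HYPERVISOR_callback_op wrapper and the
callback symbol are whatever the guest OS defines:

    struct callback_register reg = {
        .type    = CALLBACKTYPE_event,
        .address = (unsigned long)xen_event_callback,  /* guest symbol */
        .flags   = CALLBACKF_mask_events,
    };
    ret = HYPERVISOR_callback_op(CALLBACKOP_register, &reg);
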
 xen/arch/x86/pv/traps.c     | 148 +++++++++++++++++++++++++++++++++++++++++++
 xen/arch/x86/x86_64/traps.c | 149 --------------------------------------------
 2 files changed, 148 insertions(+), 149 deletions(-)

diff --git a/xen/arch/x86/pv/traps.c b/xen/arch/x86/pv/traps.c
index cb9b3b1425..7cdec77618 100644
--- a/xen/arch/x86/pv/traps.c
+++ b/xen/arch/x86/pv/traps.c
@@ -31,6 +31,8 @@
 #include <asm/shared.h>
 #include <asm/traps.h>
 
+#include <public/callback.h>
+
 void do_entry_int82(struct cpu_user_regs *regs)
 {
     if ( unlikely(untrusted_msi) )
@@ -432,6 +434,152 @@ void init_int80_direct_trap(struct vcpu *v)
         tb->flags = TBF_EXCEPTION | (TI_GET_IF(ti) ? TBF_INTERRUPT : 0);
 }
 
+static long register_guest_callback(struct callback_register *reg)
+{
+    long ret = 0;
+    struct vcpu *v = current;
+
+    if ( !is_canonical_address(reg->address) )
+        return -EINVAL;
+
+    switch ( reg->type )
+    {
+    case CALLBACKTYPE_event:
+        v->arch.pv_vcpu.event_callback_eip    = reg->address;
+        break;
+
+    case CALLBACKTYPE_failsafe:
+        v->arch.pv_vcpu.failsafe_callback_eip = reg->address;
+        if ( reg->flags & CALLBACKF_mask_events )
+            set_bit(_VGCF_failsafe_disables_events,
+                    &v->arch.vgc_flags);
+        else
+            clear_bit(_VGCF_failsafe_disables_events,
+                      &v->arch.vgc_flags);
+        break;
+
+    case CALLBACKTYPE_syscall:
+        v->arch.pv_vcpu.syscall_callback_eip  = reg->address;
+        if ( reg->flags & CALLBACKF_mask_events )
+            set_bit(_VGCF_syscall_disables_events,
+                    &v->arch.vgc_flags);
+        else
+            clear_bit(_VGCF_syscall_disables_events,
+                      &v->arch.vgc_flags);
+        break;
+
+    case CALLBACKTYPE_syscall32:
+        v->arch.pv_vcpu.syscall32_callback_eip = reg->address;
+        v->arch.pv_vcpu.syscall32_disables_events =
+            !!(reg->flags & CALLBACKF_mask_events);
+        break;
+
+    case CALLBACKTYPE_sysenter:
+        v->arch.pv_vcpu.sysenter_callback_eip = reg->address;
+        v->arch.pv_vcpu.sysenter_disables_events =
+            !!(reg->flags & CALLBACKF_mask_events);
+        break;
+
+    case CALLBACKTYPE_nmi:
+        ret = register_guest_nmi_callback(reg->address);
+        break;
+
+    default:
+        ret = -ENOSYS;
+        break;
+    }
+
+    return ret;
+}
+
+static long unregister_guest_callback(struct callback_unregister *unreg)
+{
+    long ret;
+
+    switch ( unreg->type )
+    {
+    case CALLBACKTYPE_event:
+    case CALLBACKTYPE_failsafe:
+    case CALLBACKTYPE_syscall:
+    case CALLBACKTYPE_syscall32:
+    case CALLBACKTYPE_sysenter:
+        ret = -EINVAL;
+        break;
+
+    case CALLBACKTYPE_nmi:
+        ret = unregister_guest_nmi_callback();
+        break;
+
+    default:
+        ret = -ENOSYS;
+        break;
+    }
+
+    return ret;
+}
+
+long do_callback_op(int cmd, XEN_GUEST_HANDLE_PARAM(const_void) arg)
+{
+    long ret;
+
+    switch ( cmd )
+    {
+    case CALLBACKOP_register:
+    {
+        struct callback_register reg;
+
+        ret = -EFAULT;
+        if ( copy_from_guest(&reg, arg, 1) )
+            break;
+
+        ret = register_guest_callback(&reg);
+    }
+    break;
+
+    case CALLBACKOP_unregister:
+    {
+        struct callback_unregister unreg;
+
+        ret = -EFAULT;
+        if ( copy_from_guest(&unreg, arg, 1) )
+            break;
+
+        ret = unregister_guest_callback(&unreg);
+    }
+    break;
+
+    default:
+        ret = -ENOSYS;
+        break;
+    }
+
+    return ret;
+}
+
+long do_set_callbacks(unsigned long event_address,
+                      unsigned long failsafe_address,
+                      unsigned long syscall_address)
+{
+    struct callback_register event = {
+        .type = CALLBACKTYPE_event,
+        .address = event_address,
+    };
+    struct callback_register failsafe = {
+        .type = CALLBACKTYPE_failsafe,
+        .address = failsafe_address,
+    };
+    struct callback_register syscall = {
+        .type = CALLBACKTYPE_syscall,
+        .address = syscall_address,
+    };
+
+    register_guest_callback(&event);
+    register_guest_callback(&failsafe);
+    register_guest_callback(&syscall);
+
+    return 0;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/x86_64/traps.c b/xen/arch/x86/x86_64/traps.c
index a4ea09b1ac..db2037509e 100644
--- a/xen/arch/x86/x86_64/traps.c
+++ b/xen/arch/x86/x86_64/traps.c
@@ -23,8 +23,6 @@
 #include <asm/shared.h>
 #include <asm/hvm/hvm.h>
 #include <asm/hvm/support.h>
-#include <public/callback.h>
-
 
 static void print_xen_info(void)
 {
@@ -336,153 +334,6 @@ void subarch_percpu_traps_init(void)
     wrmsrl(MSR_SYSCALL_MASK, XEN_SYSCALL_MASK);
 }
 
-static long register_guest_callback(struct callback_register *reg)
-{
-    long ret = 0;
-    struct vcpu *v = current;
-
-    if ( !is_canonical_address(reg->address) )
-        return -EINVAL;
-
-    switch ( reg->type )
-    {
-    case CALLBACKTYPE_event:
-        v->arch.pv_vcpu.event_callback_eip    = reg->address;
-        break;
-
-    case CALLBACKTYPE_failsafe:
-        v->arch.pv_vcpu.failsafe_callback_eip = reg->address;
-        if ( reg->flags & CALLBACKF_mask_events )
-            set_bit(_VGCF_failsafe_disables_events,
-                    &v->arch.vgc_flags);
-        else
-            clear_bit(_VGCF_failsafe_disables_events,
-                      &v->arch.vgc_flags);
-        break;
-
-    case CALLBACKTYPE_syscall:
-        v->arch.pv_vcpu.syscall_callback_eip  = reg->address;
-        if ( reg->flags & CALLBACKF_mask_events )
-            set_bit(_VGCF_syscall_disables_events,
-                    &v->arch.vgc_flags);
-        else
-            clear_bit(_VGCF_syscall_disables_events,
-                      &v->arch.vgc_flags);
-        break;
-
-    case CALLBACKTYPE_syscall32:
-        v->arch.pv_vcpu.syscall32_callback_eip = reg->address;
-        v->arch.pv_vcpu.syscall32_disables_events =
-            !!(reg->flags & CALLBACKF_mask_events);
-        break;
-
-    case CALLBACKTYPE_sysenter:
-        v->arch.pv_vcpu.sysenter_callback_eip = reg->address;
-        v->arch.pv_vcpu.sysenter_disables_events =
-            !!(reg->flags & CALLBACKF_mask_events);
-        break;
-
-    case CALLBACKTYPE_nmi:
-        ret = register_guest_nmi_callback(reg->address);
-        break;
-
-    default:
-        ret = -ENOSYS;
-        break;
-    }
-
-    return ret;
-}
-
-static long unregister_guest_callback(struct callback_unregister *unreg)
-{
-    long ret;
-
-    switch ( unreg->type )
-    {
-    case CALLBACKTYPE_event:
-    case CALLBACKTYPE_failsafe:
-    case CALLBACKTYPE_syscall:
-    case CALLBACKTYPE_syscall32:
-    case CALLBACKTYPE_sysenter:
-        ret = -EINVAL;
-        break;
-
-    case CALLBACKTYPE_nmi:
-        ret = unregister_guest_nmi_callback();
-        break;
-
-    default:
-        ret = -ENOSYS;
-        break;
-    }
-
-    return ret;
-}
-
-
-long do_callback_op(int cmd, XEN_GUEST_HANDLE_PARAM(const_void) arg)
-{
-    long ret;
-
-    switch ( cmd )
-    {
-    case CALLBACKOP_register:
-    {
-        struct callback_register reg;
-
-        ret = -EFAULT;
-        if ( copy_from_guest(&reg, arg, 1) )
-            break;
-
-        ret = register_guest_callback(&reg);
-    }
-    break;
-
-    case CALLBACKOP_unregister:
-    {
-        struct callback_unregister unreg;
-
-        ret = -EFAULT;
-        if ( copy_from_guest(&unreg, arg, 1) )
-            break;
-
-        ret = unregister_guest_callback(&unreg);
-    }
-    break;
-
-    default:
-        ret = -ENOSYS;
-        break;
-    }
-
-    return ret;
-}
-
-long do_set_callbacks(unsigned long event_address,
-                      unsigned long failsafe_address,
-                      unsigned long syscall_address)
-{
-    struct callback_register event = {
-        .type = CALLBACKTYPE_event,
-        .address = event_address,
-    };
-    struct callback_register failsafe = {
-        .type = CALLBACKTYPE_failsafe,
-        .address = failsafe_address,
-    };
-    struct callback_register syscall = {
-        .type = CALLBACKTYPE_syscall,
-        .address = syscall_address,
-    };
-
-    register_guest_callback(&event);
-    register_guest_callback(&failsafe);
-    register_guest_callback(&syscall);
-
-    return 0;
-}
-
 static void hypercall_page_initialise_ring3_kernel(void *hypercall_page)
 {
     char *p;
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH for-next v3 17/22] x86/traps: move hypercall_page_initialise_ring3_kernel
  2017-05-18 17:09 [PATCH for-next v3 00/22] x86: refactor trap handling code Wei Liu
                   ` (15 preceding siblings ...)
  2017-05-18 17:09 ` [PATCH for-next v3 16/22] x86/traps: move callback_op code Wei Liu
@ 2017-05-18 17:09 ` Wei Liu
  2017-05-29 16:10   ` Jan Beulich
  2017-05-18 17:10 ` [PATCH for-next v3 18/22] x86/traps: merge x86_64/compat/traps.c into pv/traps.c Wei Liu
                   ` (4 subsequent siblings)
  21 siblings, 1 reply; 65+ messages in thread
From: Wei Liu @ 2017-05-18 17:09 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich

And export it via pv/domain.h.

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
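For reference, every hypercall occupies a fixed 32-byte slot, so a
64-bit PV guest enters hypercall <nr> at hypercall_page + nr * 32. A
guest-side sketch; note the real PV ABI passes argument 4 in %r10, so
guests need inline asm beyond three arguments:

    typedef long (*xen_hcall3_t)(unsigned long, unsigned long,
                                 unsigned long);
    /* Sketch only: args 1-3 line up with the SysV registers. */
    #define xen_hypercall3(page, nr, a1, a2, a3) \
        (((xen_hcall3_t)((char *)(page) + (nr) * 32))((a1), (a2), (a3)))
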
 xen/arch/x86/pv/traps.c         | 36 ++++++++++++++++++++++++++++++++++++
 xen/arch/x86/x86_64/traps.c     | 37 +------------------------------------
 xen/include/asm-x86/pv/domain.h |  5 +++++
 3 files changed, 42 insertions(+), 36 deletions(-)

diff --git a/xen/arch/x86/pv/traps.c b/xen/arch/x86/pv/traps.c
index 7cdec77618..4f52d3e4d3 100644
--- a/xen/arch/x86/pv/traps.c
+++ b/xen/arch/x86/pv/traps.c
@@ -580,6 +580,42 @@ long do_set_callbacks(unsigned long event_address,
     return 0;
 }
 
+void hypercall_page_initialise_ring3_kernel(void *hypercall_page)
+{
+    char *p;
+    int i;
+
+    /* Fill in all the transfer points with template machine code. */
+    for ( i = 0; i < (PAGE_SIZE / 32); i++ )
+    {
+        if ( i == __HYPERVISOR_iret )
+            continue;
+
+        p = (char *)(hypercall_page + (i * 32));
+        *(u8  *)(p+ 0) = 0x51;    /* push %rcx */
+        *(u16 *)(p+ 1) = 0x5341;  /* push %r11 */
+        *(u8  *)(p+ 3) = 0xb8;    /* mov  $<i>,%eax */
+        *(u32 *)(p+ 4) = i;
+        *(u16 *)(p+ 8) = 0x050f;  /* syscall */
+        *(u16 *)(p+10) = 0x5b41;  /* pop  %r11 */
+        *(u8  *)(p+12) = 0x59;    /* pop  %rcx */
+        *(u8  *)(p+13) = 0xc3;    /* ret */
+    }
+
+    /*
+     * HYPERVISOR_iret is special because it doesn't return and expects a
+     * special stack frame. Guests jump at this transfer point instead of
+     * calling it.
+     */
+    p = (char *)(hypercall_page + (__HYPERVISOR_iret * 32));
+    *(u8  *)(p+ 0) = 0x51;    /* push %rcx */
+    *(u16 *)(p+ 1) = 0x5341;  /* push %r11 */
+    *(u8  *)(p+ 3) = 0x50;    /* push %rax */
+    *(u8  *)(p+ 4) = 0xb8;    /* mov  $__HYPERVISOR_iret,%eax */
+    *(u32 *)(p+ 5) = __HYPERVISOR_iret;
+    *(u16 *)(p+ 9) = 0x050f;  /* syscall */
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/x86_64/traps.c b/xen/arch/x86/x86_64/traps.c
index db2037509e..7a4dd4458e 100644
--- a/xen/arch/x86/x86_64/traps.c
+++ b/xen/arch/x86/x86_64/traps.c
@@ -23,6 +23,7 @@
 #include <asm/shared.h>
 #include <asm/hvm/hvm.h>
 #include <asm/hvm/support.h>
+#include <asm/pv/domain.h>
 
 static void print_xen_info(void)
 {
@@ -334,42 +335,6 @@ void subarch_percpu_traps_init(void)
     wrmsrl(MSR_SYSCALL_MASK, XEN_SYSCALL_MASK);
 }
 
-static void hypercall_page_initialise_ring3_kernel(void *hypercall_page)
-{
-    char *p;
-    int i;
-
-    /* Fill in all the transfer points with template machine code. */
-    for ( i = 0; i < (PAGE_SIZE / 32); i++ )
-    {
-        if ( i == __HYPERVISOR_iret )
-            continue;
-
-        p = (char *)(hypercall_page + (i * 32));
-        *(u8  *)(p+ 0) = 0x51;    /* push %rcx */
-        *(u16 *)(p+ 1) = 0x5341;  /* push %r11 */
-        *(u8  *)(p+ 3) = 0xb8;    /* mov  $<i>,%eax */
-        *(u32 *)(p+ 4) = i;
-        *(u16 *)(p+ 8) = 0x050f;  /* syscall */
-        *(u16 *)(p+10) = 0x5b41;  /* pop  %r11 */
-        *(u8  *)(p+12) = 0x59;    /* pop  %rcx */
-        *(u8  *)(p+13) = 0xc3;    /* ret */
-    }
-
-    /*
-     * HYPERVISOR_iret is special because it doesn't return and expects a
-     * special stack frame. Guests jump at this transfer point instead of
-     * calling it.
-     */
-    p = (char *)(hypercall_page + (__HYPERVISOR_iret * 32));
-    *(u8  *)(p+ 0) = 0x51;    /* push %rcx */
-    *(u16 *)(p+ 1) = 0x5341;  /* push %r11 */
-    *(u8  *)(p+ 3) = 0x50;    /* push %rax */
-    *(u8  *)(p+ 4) = 0xb8;    /* mov  $__HYPERVISOR_iret,%eax */
-    *(u32 *)(p+ 5) = __HYPERVISOR_iret;
-    *(u16 *)(p+ 9) = 0x050f;  /* syscall */
-}
-
 #include "compat/traps.c"
 
 void hypercall_page_initialise(struct domain *d, void *hypercall_page)
diff --git a/xen/include/asm-x86/pv/domain.h b/xen/include/asm-x86/pv/domain.h
index acdf140fbd..dfa60b080c 100644
--- a/xen/include/asm-x86/pv/domain.h
+++ b/xen/include/asm-x86/pv/domain.h
@@ -29,6 +29,8 @@ void pv_domain_destroy(struct domain *d);
 int pv_domain_initialise(struct domain *d, unsigned int domcr_flags,
                          struct xen_arch_domainconfig *config);
 
+void hypercall_page_initialise_ring3_kernel(void *hypercall_page);
+
 #else  /* !CONFIG_PV */
 
 #include <xen/errno.h>
@@ -42,6 +44,9 @@ static inline int pv_domain_initialise(struct domain *d,
 {
     return -EOPNOTSUPP;
 }
+
+static inline void hypercall_page_initialise_ring3_kernel(void *hypercall_page) {}
+
 #endif	/* CONFIG_PV */
 
 void paravirt_ctxt_switch_from(struct vcpu *v);
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH for-next v3 18/22] x86/traps: merge x86_64/compat/traps.c into pv/traps.c
  2017-05-18 17:09 [PATCH for-next v3 00/22] x86: refactor trap handling code Wei Liu
                   ` (16 preceding siblings ...)
  2017-05-18 17:09 ` [PATCH for-next v3 17/22] x86/traps: move hypercall_page_initialise_ring3_kernel Wei Liu
@ 2017-05-18 17:10 ` Wei Liu
  2017-05-29 16:12   ` Jan Beulich
  2017-05-18 17:10 ` [PATCH for-next v3 19/22] x86: clean up pv/traps.c Wei Liu
                   ` (3 subsequent siblings)
  21 siblings, 1 reply; 65+ messages in thread
From: Wei Liu @ 2017-05-18 17:10 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich

Export hypercall_page_initialise_ring1_kernel now that the code is moved.

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
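For reference, the ring-1 stubs moved here trap into Xen with a software
interrupt rather than SYSCALL: the u16 written at offset 5 of each slot
composes "int $0x82" little-endian, the 0xcd opcode byte below the
vector:

    uint16_t insn = (HYPERCALL_VECTOR << 8) | 0xcd;  /* cd 82 = int $0x82 */
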
 xen/arch/x86/pv/traps.c            | 405 ++++++++++++++++++++++++++++++++++++
 xen/arch/x86/x86_64/compat/traps.c | 416 -------------------------------------
 xen/arch/x86/x86_64/traps.c        |   2 -
 xen/include/asm-x86/pv/domain.h    |   2 +
 4 files changed, 407 insertions(+), 418 deletions(-)
 delete mode 100644 xen/arch/x86/x86_64/compat/traps.c

diff --git a/xen/arch/x86/pv/traps.c b/xen/arch/x86/pv/traps.c
index 4f52d3e4d3..db92f6d520 100644
--- a/xen/arch/x86/pv/traps.c
+++ b/xen/arch/x86/pv/traps.c
@@ -31,6 +31,9 @@
 #include <asm/shared.h>
 #include <asm/traps.h>
 
+#include <compat/callback.h>
+#include <compat/arch-x86_32.h>
+
 #include <public/callback.h>
 
 void do_entry_int82(struct cpu_user_regs *regs)
@@ -616,6 +619,408 @@ void hypercall_page_initialise_ring3_kernel(void *hypercall_page)
     *(u16 *)(p+ 9) = 0x050f;  /* syscall */
 }
 
+/* Compat guest interfaces */
+
+void compat_show_guest_stack(struct vcpu *v, const struct cpu_user_regs *regs,
+                             int debug_stack_lines)
+{
+    unsigned int i, *stack, addr, mask = STACK_SIZE;
+
+    stack = (unsigned int *)(unsigned long)regs->esp;
+    printk("Guest stack trace from esp=%08lx:\n ", (unsigned long)stack);
+
+    if ( !__compat_access_ok(v->domain, stack, sizeof(*stack)) )
+    {
+        printk("Guest-inaccessible memory.\n");
+        return;
+    }
+
+    if ( v != current )
+    {
+        struct vcpu *vcpu;
+        unsigned long mfn;
+
+        ASSERT(guest_kernel_mode(v, regs));
+        mfn = read_cr3() >> PAGE_SHIFT;
+        for_each_vcpu( v->domain, vcpu )
+            if ( pagetable_get_pfn(vcpu->arch.guest_table) == mfn )
+                break;
+        if ( !vcpu )
+        {
+            stack = do_page_walk(v, (unsigned long)stack);
+            if ( (unsigned long)stack < PAGE_SIZE )
+            {
+                printk("Inaccessible guest memory.\n");
+                return;
+            }
+            mask = PAGE_SIZE;
+        }
+    }
+
+    for ( i = 0; i < debug_stack_lines * 8; i++ )
+    {
+        if ( (((long)stack - 1) ^ ((long)(stack + 1) - 1)) & mask )
+            break;
+        if ( __get_user(addr, stack) )
+        {
+            if ( i != 0 )
+                printk("\n    ");
+            printk("Fault while accessing guest memory.");
+            i = 1;
+            break;
+        }
+        if ( (i != 0) && ((i % 8) == 0) )
+            printk("\n ");
+        printk(" %08x", addr);
+        stack++;
+    }
+    if ( mask == PAGE_SIZE )
+    {
+        BUILD_BUG_ON(PAGE_SIZE == STACK_SIZE);
+        unmap_domain_page(stack);
+    }
+    if ( i == 0 )
+        printk("Stack empty.");
+    printk("\n");
+}
+
+unsigned int compat_iret(void)
+{
+    struct cpu_user_regs *regs = guest_cpu_user_regs();
+    struct vcpu *v = current;
+    u32 eflags;
+
+    /* Trim stack pointer to 32 bits. */
+    regs->rsp = (u32)regs->rsp;
+
+    /* Restore EAX (clobbered by hypercall). */
+    if ( unlikely(__get_user(regs->eax, (u32 *)regs->rsp)) )
+    {
+        domain_crash(v->domain);
+        return 0;
+    }
+
+    /* Restore CS and EIP. */
+    if ( unlikely(__get_user(regs->eip, (u32 *)regs->rsp + 1)) ||
+        unlikely(__get_user(regs->cs, (u32 *)regs->rsp + 2)) )
+    {
+        domain_crash(v->domain);
+        return 0;
+    }
+
+    /*
+     * Fix up and restore EFLAGS. We fix up in a local staging area
+     * to avoid firing the BUG_ON(IOPL) check in arch_get_info_guest.
+     */
+    if ( unlikely(__get_user(eflags, (u32 *)regs->rsp + 3)) )
+    {
+        domain_crash(v->domain);
+        return 0;
+    }
+
+    if ( VM_ASSIST(v->domain, architectural_iopl) )
+        v->arch.pv_vcpu.iopl = eflags & X86_EFLAGS_IOPL;
+
+    regs->eflags = (eflags & ~X86_EFLAGS_IOPL) | X86_EFLAGS_IF;
+
+    if ( unlikely(eflags & X86_EFLAGS_VM) )
+    {
+        /*
+         * Cannot return to VM86 mode: inject a GP fault instead. Note that
+         * the GP fault is reported on the first VM86 mode instruction, not on
+         * the IRET (which is why we can simply leave the stack frame as-is
+         * (except for perhaps having to copy it), which in turn seems better
+         * than teaching create_bounce_frame() to needlessly deal with vm86
+         * mode frames).
+         */
+        const struct trap_info *ti;
+        u32 x, ksp = v->arch.pv_vcpu.kernel_sp - 40;
+        unsigned int i;
+        int rc = 0;
+
+        gdprintk(XENLOG_ERR, "VM86 mode unavailable (ksp:%08X->%08X)\n",
+                 regs->esp, ksp);
+        if ( ksp < regs->esp )
+        {
+            for (i = 1; i < 10; ++i)
+            {
+                rc |= __get_user(x, (u32 *)regs->rsp + i);
+                rc |= __put_user(x, (u32 *)(unsigned long)ksp + i);
+            }
+        }
+        else if ( ksp > regs->esp )
+        {
+            for ( i = 9; i > 0; --i )
+            {
+                rc |= __get_user(x, (u32 *)regs->rsp + i);
+                rc |= __put_user(x, (u32 *)(unsigned long)ksp + i);
+            }
+        }
+        if ( rc )
+        {
+            domain_crash(v->domain);
+            return 0;
+        }
+        regs->esp = ksp;
+        regs->ss = v->arch.pv_vcpu.kernel_ss;
+
+        ti = &v->arch.pv_vcpu.trap_ctxt[TRAP_gp_fault];
+        if ( TI_GET_IF(ti) )
+            eflags &= ~X86_EFLAGS_IF;
+        regs->eflags &= ~(X86_EFLAGS_VM|X86_EFLAGS_RF|
+                          X86_EFLAGS_NT|X86_EFLAGS_TF);
+        if ( unlikely(__put_user(0, (u32 *)regs->rsp)) )
+        {
+            domain_crash(v->domain);
+            return 0;
+        }
+        regs->eip = ti->address;
+        regs->cs = ti->cs;
+    }
+    else if ( unlikely(ring_0(regs)) )
+    {
+        domain_crash(v->domain);
+        return 0;
+    }
+    else if ( ring_1(regs) )
+        regs->esp += 16;
+    /* Return to ring 2/3: restore ESP and SS. */
+    else if ( __get_user(regs->ss, (u32 *)regs->rsp + 5) ||
+              __get_user(regs->esp, (u32 *)regs->rsp + 4) )
+    {
+        domain_crash(v->domain);
+        return 0;
+    }
+
+    /* Restore upcall mask from supplied EFLAGS.IF. */
+    vcpu_info(v, evtchn_upcall_mask) = !(eflags & X86_EFLAGS_IF);
+
+    async_exception_cleanup(v);
+
+    /*
+     * The hypercall exit path will overwrite EAX with this return
+     * value.
+     */
+    return regs->eax;
+}
+
+static long compat_register_guest_callback(
+    struct compat_callback_register *reg)
+{
+    long ret = 0;
+    struct vcpu *v = current;
+
+    fixup_guest_code_selector(v->domain, reg->address.cs);
+
+    switch ( reg->type )
+    {
+    case CALLBACKTYPE_event:
+        v->arch.pv_vcpu.event_callback_cs     = reg->address.cs;
+        v->arch.pv_vcpu.event_callback_eip    = reg->address.eip;
+        break;
+
+    case CALLBACKTYPE_failsafe:
+        v->arch.pv_vcpu.failsafe_callback_cs  = reg->address.cs;
+        v->arch.pv_vcpu.failsafe_callback_eip = reg->address.eip;
+        if ( reg->flags & CALLBACKF_mask_events )
+            set_bit(_VGCF_failsafe_disables_events,
+                    &v->arch.vgc_flags);
+        else
+            clear_bit(_VGCF_failsafe_disables_events,
+                      &v->arch.vgc_flags);
+        break;
+
+    case CALLBACKTYPE_syscall32:
+        v->arch.pv_vcpu.syscall32_callback_cs     = reg->address.cs;
+        v->arch.pv_vcpu.syscall32_callback_eip    = reg->address.eip;
+        v->arch.pv_vcpu.syscall32_disables_events =
+            (reg->flags & CALLBACKF_mask_events) != 0;
+        break;
+
+    case CALLBACKTYPE_sysenter:
+        v->arch.pv_vcpu.sysenter_callback_cs     = reg->address.cs;
+        v->arch.pv_vcpu.sysenter_callback_eip    = reg->address.eip;
+        v->arch.pv_vcpu.sysenter_disables_events =
+            (reg->flags & CALLBACKF_mask_events) != 0;
+        break;
+
+    case CALLBACKTYPE_nmi:
+        ret = register_guest_nmi_callback(reg->address.eip);
+        break;
+
+    default:
+        ret = -ENOSYS;
+        break;
+    }
+
+    return ret;
+}
+
+static long compat_unregister_guest_callback(
+    struct compat_callback_unregister *unreg)
+{
+    long ret;
+
+    switch ( unreg->type )
+    {
+    case CALLBACKTYPE_event:
+    case CALLBACKTYPE_failsafe:
+    case CALLBACKTYPE_syscall32:
+    case CALLBACKTYPE_sysenter:
+        ret = -EINVAL;
+        break;
+
+    case CALLBACKTYPE_nmi:
+        ret = unregister_guest_nmi_callback();
+        break;
+
+    default:
+        ret = -ENOSYS;
+        break;
+    }
+
+    return ret;
+}
+
+long compat_callback_op(int cmd, XEN_GUEST_HANDLE(void) arg)
+{
+    long ret;
+
+    switch ( cmd )
+    {
+    case CALLBACKOP_register:
+    {
+        struct compat_callback_register reg;
+
+        ret = -EFAULT;
+        if ( copy_from_guest(&reg, arg, 1) )
+            break;
+
+        ret = compat_register_guest_callback(&reg);
+    }
+    break;
+
+    case CALLBACKOP_unregister:
+    {
+        struct compat_callback_unregister unreg;
+
+        ret = -EFAULT;
+        if ( copy_from_guest(&unreg, arg, 1) )
+            break;
+
+        ret = compat_unregister_guest_callback(&unreg);
+    }
+    break;
+
+    default:
+        ret = -EINVAL;
+        break;
+    }
+
+    return ret;
+}
+
+long compat_set_callbacks(unsigned long event_selector,
+                          unsigned long event_address,
+                          unsigned long failsafe_selector,
+                          unsigned long failsafe_address)
+{
+    struct compat_callback_register event = {
+        .type = CALLBACKTYPE_event,
+        .address = {
+            .cs = event_selector,
+            .eip = event_address
+        }
+    };
+    struct compat_callback_register failsafe = {
+        .type = CALLBACKTYPE_failsafe,
+        .address = {
+            .cs = failsafe_selector,
+            .eip = failsafe_address
+        }
+    };
+
+    compat_register_guest_callback(&event);
+    compat_register_guest_callback(&failsafe);
+
+    return 0;
+}
+
+int compat_set_trap_table(XEN_GUEST_HANDLE(trap_info_compat_t) traps)
+{
+    struct compat_trap_info cur;
+    struct trap_info *dst = current->arch.pv_vcpu.trap_ctxt;
+    long rc = 0;
+
+    /* If no table is presented then clear the entire virtual IDT. */
+    if ( guest_handle_is_null(traps) )
+    {
+        memset(dst, 0, NR_VECTORS * sizeof(*dst));
+        return 0;
+    }
+
+    for ( ; ; )
+    {
+        if ( copy_from_guest(&cur, traps, 1) )
+        {
+            rc = -EFAULT;
+            break;
+        }
+
+        if ( cur.address == 0 )
+            break;
+
+        fixup_guest_code_selector(current->domain, cur.cs);
+
+        XLAT_trap_info(dst + cur.vector, &cur);
+
+        if ( cur.vector == 0x80 )
+            init_int80_direct_trap(current);
+
+        guest_handle_add_offset(traps, 1);
+
+        if ( hypercall_preempt_check() )
+        {
+            rc = hypercall_create_continuation(
+                __HYPERVISOR_set_trap_table, "h", traps);
+            break;
+        }
+    }
+
+    return rc;
+}
+
+void hypercall_page_initialise_ring1_kernel(void *hypercall_page)
+{
+    char *p;
+    int i;
+
+    /* Fill in all the transfer points with template machine code. */
+
+    for ( i = 0; i < (PAGE_SIZE / 32); i++ )
+    {
+        if ( i == __HYPERVISOR_iret )
+            continue;
+
+        p = (char *)(hypercall_page + (i * 32));
+        *(u8  *)(p+ 0) = 0xb8;    /* mov  $<i>,%eax */
+        *(u32 *)(p+ 1) = i;
+        *(u16 *)(p+ 5) = (HYPERCALL_VECTOR << 8) | 0xcd; /* int  $xx */
+        *(u8  *)(p+ 7) = 0xc3;    /* ret */
+    }
+
+    /*
+     * HYPERVISOR_iret is special because it doesn't return and expects a
+     * special stack frame. Guests jump at this transfer point instead of
+     * calling it.
+     */
+    p = (char *)(hypercall_page + (__HYPERVISOR_iret * 32));
+    *(u8  *)(p+ 0) = 0x50;    /* push %eax */
+    *(u8  *)(p+ 1) = 0xb8;    /* mov  $__HYPERVISOR_iret,%eax */
+    *(u32 *)(p+ 2) = __HYPERVISOR_iret;
+    *(u16 *)(p+ 6) = (HYPERCALL_VECTOR << 8) | 0xcd; /* int  $xx */
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/x86_64/compat/traps.c b/xen/arch/x86/x86_64/compat/traps.c
deleted file mode 100644
index 1751ec67e8..0000000000
--- a/xen/arch/x86/x86_64/compat/traps.c
+++ /dev/null
@@ -1,416 +0,0 @@
-#include <xen/event.h>
-#include <asm/regs.h>
-#include <compat/callback.h>
-#include <compat/arch-x86_32.h>
-
-void compat_show_guest_stack(struct vcpu *v, const struct cpu_user_regs *regs,
-                             int debug_stack_lines)
-{
-    unsigned int i, *stack, addr, mask = STACK_SIZE;
-
-    stack = (unsigned int *)(unsigned long)regs->esp;
-    printk("Guest stack trace from esp=%08lx:\n ", (unsigned long)stack);
-
-    if ( !__compat_access_ok(v->domain, stack, sizeof(*stack)) )
-    {
-        printk("Guest-inaccessible memory.\n");
-        return;
-    }
-
-    if ( v != current )
-    {
-        struct vcpu *vcpu;
-        unsigned long mfn;
-
-        ASSERT(guest_kernel_mode(v, regs));
-        mfn = read_cr3() >> PAGE_SHIFT;
-        for_each_vcpu( v->domain, vcpu )
-            if ( pagetable_get_pfn(vcpu->arch.guest_table) == mfn )
-                break;
-        if ( !vcpu )
-        {
-            stack = do_page_walk(v, (unsigned long)stack);
-            if ( (unsigned long)stack < PAGE_SIZE )
-            {
-                printk("Inaccessible guest memory.\n");
-                return;
-            }
-            mask = PAGE_SIZE;
-        }
-    }
-
-    for ( i = 0; i < debug_stack_lines * 8; i++ )
-    {
-        if ( (((long)stack - 1) ^ ((long)(stack + 1) - 1)) & mask )
-            break;
-        if ( __get_user(addr, stack) )
-        {
-            if ( i != 0 )
-                printk("\n    ");
-            printk("Fault while accessing guest memory.");
-            i = 1;
-            break;
-        }
-        if ( (i != 0) && ((i % 8) == 0) )
-            printk("\n ");
-        printk(" %08x", addr);
-        stack++;
-    }
-    if ( mask == PAGE_SIZE )
-    {
-        BUILD_BUG_ON(PAGE_SIZE == STACK_SIZE);
-        unmap_domain_page(stack);
-    }
-    if ( i == 0 )
-        printk("Stack empty.");
-    printk("\n");
-}
-
-unsigned int compat_iret(void)
-{
-    struct cpu_user_regs *regs = guest_cpu_user_regs();
-    struct vcpu *v = current;
-    u32 eflags;
-
-    /* Trim stack pointer to 32 bits. */
-    regs->rsp = (u32)regs->rsp;
-
-    /* Restore EAX (clobbered by hypercall). */
-    if ( unlikely(__get_user(regs->eax, (u32 *)regs->rsp)) )
-    {
-        domain_crash(v->domain);
-        return 0;
-    }
-
-    /* Restore CS and EIP. */
-    if ( unlikely(__get_user(regs->eip, (u32 *)regs->rsp + 1)) ||
-        unlikely(__get_user(regs->cs, (u32 *)regs->rsp + 2)) )
-    {
-        domain_crash(v->domain);
-        return 0;
-    }
-
-    /*
-     * Fix up and restore EFLAGS. We fix up in a local staging area
-     * to avoid firing the BUG_ON(IOPL) check in arch_get_info_guest.
-     */
-    if ( unlikely(__get_user(eflags, (u32 *)regs->rsp + 3)) )
-    {
-        domain_crash(v->domain);
-        return 0;
-    }
-
-    if ( VM_ASSIST(v->domain, architectural_iopl) )
-        v->arch.pv_vcpu.iopl = eflags & X86_EFLAGS_IOPL;
-
-    regs->eflags = (eflags & ~X86_EFLAGS_IOPL) | X86_EFLAGS_IF;
-
-    if ( unlikely(eflags & X86_EFLAGS_VM) )
-    {
-        /*
-         * Cannot return to VM86 mode: inject a GP fault instead. Note that
-         * the GP fault is reported on the first VM86 mode instruction, not on
-         * the IRET (which is why we can simply leave the stack frame as-is
-         * (except for perhaps having to copy it), which in turn seems better
-         * than teaching create_bounce_frame() to needlessly deal with vm86
-         * mode frames).
-         */
-        const struct trap_info *ti;
-        u32 x, ksp = v->arch.pv_vcpu.kernel_sp - 40;
-        unsigned int i;
-        int rc = 0;
-
-        gdprintk(XENLOG_ERR, "VM86 mode unavailable (ksp:%08X->%08X)\n",
-                 regs->esp, ksp);
-        if ( ksp < regs->esp )
-        {
-            for (i = 1; i < 10; ++i)
-            {
-                rc |= __get_user(x, (u32 *)regs->rsp + i);
-                rc |= __put_user(x, (u32 *)(unsigned long)ksp + i);
-            }
-        }
-        else if ( ksp > regs->esp )
-        {
-            for ( i = 9; i > 0; --i )
-            {
-                rc |= __get_user(x, (u32 *)regs->rsp + i);
-                rc |= __put_user(x, (u32 *)(unsigned long)ksp + i);
-            }
-        }
-        if ( rc )
-        {
-            domain_crash(v->domain);
-            return 0;
-        }
-        regs->esp = ksp;
-        regs->ss = v->arch.pv_vcpu.kernel_ss;
-
-        ti = &v->arch.pv_vcpu.trap_ctxt[TRAP_gp_fault];
-        if ( TI_GET_IF(ti) )
-            eflags &= ~X86_EFLAGS_IF;
-        regs->eflags &= ~(X86_EFLAGS_VM|X86_EFLAGS_RF|
-                          X86_EFLAGS_NT|X86_EFLAGS_TF);
-        if ( unlikely(__put_user(0, (u32 *)regs->rsp)) )
-        {
-            domain_crash(v->domain);
-            return 0;
-        }
-        regs->eip = ti->address;
-        regs->cs = ti->cs;
-    }
-    else if ( unlikely(ring_0(regs)) )
-    {
-        domain_crash(v->domain);
-        return 0;
-    }
-    else if ( ring_1(regs) )
-        regs->esp += 16;
-    /* Return to ring 2/3: restore ESP and SS. */
-    else if ( __get_user(regs->ss, (u32 *)regs->rsp + 5) ||
-              __get_user(regs->esp, (u32 *)regs->rsp + 4) )
-    {
-        domain_crash(v->domain);
-        return 0;
-    }
-
-    /* Restore upcall mask from supplied EFLAGS.IF. */
-    vcpu_info(v, evtchn_upcall_mask) = !(eflags & X86_EFLAGS_IF);
-
-    async_exception_cleanup(v);
-
-    /*
-     * The hypercall exit path will overwrite EAX with this return
-     * value.
-     */
-    return regs->eax;
-}
-
-static long compat_register_guest_callback(
-    struct compat_callback_register *reg)
-{
-    long ret = 0;
-    struct vcpu *v = current;
-
-    fixup_guest_code_selector(v->domain, reg->address.cs);
-
-    switch ( reg->type )
-    {
-    case CALLBACKTYPE_event:
-        v->arch.pv_vcpu.event_callback_cs     = reg->address.cs;
-        v->arch.pv_vcpu.event_callback_eip    = reg->address.eip;
-        break;
-
-    case CALLBACKTYPE_failsafe:
-        v->arch.pv_vcpu.failsafe_callback_cs  = reg->address.cs;
-        v->arch.pv_vcpu.failsafe_callback_eip = reg->address.eip;
-        if ( reg->flags & CALLBACKF_mask_events )
-            set_bit(_VGCF_failsafe_disables_events,
-                    &v->arch.vgc_flags);
-        else
-            clear_bit(_VGCF_failsafe_disables_events,
-                      &v->arch.vgc_flags);
-        break;
-
-    case CALLBACKTYPE_syscall32:
-        v->arch.pv_vcpu.syscall32_callback_cs     = reg->address.cs;
-        v->arch.pv_vcpu.syscall32_callback_eip    = reg->address.eip;
-        v->arch.pv_vcpu.syscall32_disables_events =
-            (reg->flags & CALLBACKF_mask_events) != 0;
-        break;
-
-    case CALLBACKTYPE_sysenter:
-        v->arch.pv_vcpu.sysenter_callback_cs     = reg->address.cs;
-        v->arch.pv_vcpu.sysenter_callback_eip    = reg->address.eip;
-        v->arch.pv_vcpu.sysenter_disables_events =
-            (reg->flags & CALLBACKF_mask_events) != 0;
-        break;
-
-    case CALLBACKTYPE_nmi:
-        ret = register_guest_nmi_callback(reg->address.eip);
-        break;
-
-    default:
-        ret = -ENOSYS;
-        break;
-    }
-
-    return ret;
-}
-
-static long compat_unregister_guest_callback(
-    struct compat_callback_unregister *unreg)
-{
-    long ret;
-
-    switch ( unreg->type )
-    {
-    case CALLBACKTYPE_event:
-    case CALLBACKTYPE_failsafe:
-    case CALLBACKTYPE_syscall32:
-    case CALLBACKTYPE_sysenter:
-        ret = -EINVAL;
-        break;
-
-    case CALLBACKTYPE_nmi:
-        ret = unregister_guest_nmi_callback();
-        break;
-
-    default:
-        ret = -ENOSYS;
-        break;
-    }
-
-    return ret;
-}
-
-
-long compat_callback_op(int cmd, XEN_GUEST_HANDLE(void) arg)
-{
-    long ret;
-
-    switch ( cmd )
-    {
-    case CALLBACKOP_register:
-    {
-        struct compat_callback_register reg;
-
-        ret = -EFAULT;
-        if ( copy_from_guest(&reg, arg, 1) )
-            break;
-
-        ret = compat_register_guest_callback(&reg);
-    }
-    break;
-
-    case CALLBACKOP_unregister:
-    {
-        struct compat_callback_unregister unreg;
-
-        ret = -EFAULT;
-        if ( copy_from_guest(&unreg, arg, 1) )
-            break;
-
-        ret = compat_unregister_guest_callback(&unreg);
-    }
-    break;
-
-    default:
-        ret = -EINVAL;
-        break;
-    }
-
-    return ret;
-}
-
-long compat_set_callbacks(unsigned long event_selector,
-                          unsigned long event_address,
-                          unsigned long failsafe_selector,
-                          unsigned long failsafe_address)
-{
-    struct compat_callback_register event = {
-        .type = CALLBACKTYPE_event,
-        .address = {
-            .cs = event_selector,
-            .eip = event_address
-        }
-    };
-    struct compat_callback_register failsafe = {
-        .type = CALLBACKTYPE_failsafe,
-        .address = {
-            .cs = failsafe_selector,
-            .eip = failsafe_address
-        }
-    };
-
-    compat_register_guest_callback(&event);
-    compat_register_guest_callback(&failsafe);
-
-    return 0;
-}
-
-int compat_set_trap_table(XEN_GUEST_HANDLE(trap_info_compat_t) traps)
-{
-    struct compat_trap_info cur;
-    struct trap_info *dst = current->arch.pv_vcpu.trap_ctxt;
-    long rc = 0;
-
-    /* If no table is presented then clear the entire virtual IDT. */
-    if ( guest_handle_is_null(traps) )
-    {
-        memset(dst, 0, NR_VECTORS * sizeof(*dst));
-        init_int80_direct_trap(current);
-        return 0;
-    }
-
-    for ( ; ; )
-    {
-        if ( copy_from_guest(&cur, traps, 1) )
-        {
-            rc = -EFAULT;
-            break;
-        }
-
-        if ( cur.address == 0 )
-            break;
-
-        fixup_guest_code_selector(current->domain, cur.cs);
-
-        XLAT_trap_info(dst + cur.vector, &cur);
-
-        if ( cur.vector == 0x80 )
-            init_int80_direct_trap(current);
-
-        guest_handle_add_offset(traps, 1);
-
-        if ( hypercall_preempt_check() )
-        {
-            rc = hypercall_create_continuation(
-                __HYPERVISOR_set_trap_table, "h", traps);
-            break;
-        }
-    }
-
-    return rc;
-}
-
-static void hypercall_page_initialise_ring1_kernel(void *hypercall_page)
-{
-    char *p;
-    int i;
-
-    /* Fill in all the transfer points with template machine code. */
-
-    for ( i = 0; i < (PAGE_SIZE / 32); i++ )
-    {
-        if ( i == __HYPERVISOR_iret )
-            continue;
-
-        p = (char *)(hypercall_page + (i * 32));
-        *(u8  *)(p+ 0) = 0xb8;    /* mov  $<i>,%eax */
-        *(u32 *)(p+ 1) = i;
-        *(u16 *)(p+ 5) = (HYPERCALL_VECTOR << 8) | 0xcd; /* int  $xx */
-        *(u8  *)(p+ 7) = 0xc3;    /* ret */
-    }
-
-    /*
-     * HYPERVISOR_iret is special because it doesn't return and expects a
-     * special stack frame. Guests jump at this transfer point instead of
-     * calling it.
-     */
-    p = (char *)(hypercall_page + (__HYPERVISOR_iret * 32));
-    *(u8  *)(p+ 0) = 0x50;    /* push %eax */
-    *(u8  *)(p+ 1) = 0xb8;    /* mov  $__HYPERVISOR_iret,%eax */
-    *(u32 *)(p+ 2) = __HYPERVISOR_iret;
-    *(u16 *)(p+ 6) = (HYPERCALL_VECTOR << 8) | 0xcd; /* int  $xx */
-}
-
-/*
- * Local variables:
- * mode: C
- * c-file-style: "BSD"
- * c-basic-offset: 4
- * tab-width: 4
- * indent-tabs-mode: nil
- * End:
- */
diff --git a/xen/arch/x86/x86_64/traps.c b/xen/arch/x86/x86_64/traps.c
index 7a4dd4458e..deca2ca1f6 100644
--- a/xen/arch/x86/x86_64/traps.c
+++ b/xen/arch/x86/x86_64/traps.c
@@ -335,8 +335,6 @@ void subarch_percpu_traps_init(void)
     wrmsrl(MSR_SYSCALL_MASK, XEN_SYSCALL_MASK);
 }
 
-#include "compat/traps.c"
-
 void hypercall_page_initialise(struct domain *d, void *hypercall_page)
 {
     memset(hypercall_page, 0xCC, PAGE_SIZE);
diff --git a/xen/include/asm-x86/pv/domain.h b/xen/include/asm-x86/pv/domain.h
index dfa60b080c..67e370ebf3 100644
--- a/xen/include/asm-x86/pv/domain.h
+++ b/xen/include/asm-x86/pv/domain.h
@@ -30,6 +30,7 @@ int pv_domain_initialise(struct domain *d, unsigned int domcr_flags,
                          struct xen_arch_domainconfig *config);
 
 void hypercall_page_initialise_ring3_kernel(void *hypercall_page);
+void hypercall_page_initialise_ring1_kernel(void *hypercall_page);
 
 #else  /* !CONFIG_PV */
 
@@ -46,6 +47,7 @@ static inline int pv_domain_initialise(struct domain *d,
 }
 
 void hypercall_page_initialise_ring3_kernel(void *hypercall_page) {}
+void hypercall_page_initialise_ring1_kernel(void *hypercall_page) {}
 
 #endif	/* CONFIG_PV */
 
-- 
2.11.0



* [PATCH for-next v3 19/22] x86: clean up pv/traps.c
  2017-05-18 17:09 [PATCH for-next v3 00/22] x86: refactor trap handling code Wei Liu
                   ` (17 preceding siblings ...)
  2017-05-18 17:10 ` [PATCH for-next v3 18/22] x86/traps: merge x86_64/compat/traps.c into pv/traps.c Wei Liu
@ 2017-05-18 17:10 ` Wei Liu
  2017-05-29 16:18   ` Jan Beulich
  2017-05-18 17:10 ` [PATCH for-next v3 20/22] x86: guest_has_trap_callback should return bool Wei Liu
                   ` (2 subsequent siblings)
  21 siblings, 1 reply; 65+ messages in thread
From: Wei Liu @ 2017-05-18 17:10 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich

Fix coding style issues. No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
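For reference, the canonical multi-line comment shape these style fixes aim
at (a sketch per Xen's CODING_STYLE, not itself part of the diff below):

    /*
     * We are called by the machine check (exception or polling) handlers
     * on the physical CPU that reported a machine check error.
     */
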
 xen/arch/x86/pv/traps.c | 62 ++++++++++++++++++++++++++++++++++---------------
 1 file changed, 43 insertions(+), 19 deletions(-)

diff --git a/xen/arch/x86/pv/traps.c b/xen/arch/x86/pv/traps.c
index db92f6d520..ea5b543247 100644
--- a/xen/arch/x86/pv/traps.c
+++ b/xen/arch/x86/pv/traps.c
@@ -148,6 +148,7 @@ void pv_inject_event(const struct x86_event *event)
     bool use_error_code;
 
     ASSERT(vector == event->vector); /* Confirm no truncation. */
+
     if ( event->type == X86_EVENTTYPE_HW_EXCEPTION )
     {
         ASSERT(vector < 32);
@@ -158,6 +159,7 @@ void pv_inject_event(const struct x86_event *event)
         ASSERT(event->type == X86_EVENTTYPE_SW_INTERRUPT);
         use_error_code = false;
     }
+
     if ( use_error_code )
         ASSERT(error_code != X86_EVENT_NO_EC);
     else
@@ -217,6 +219,7 @@ int set_guest_machinecheck_trapbounce(void)
 
     pv_inject_hw_exception(TRAP_machine_check, X86_EVENT_NO_EC);
     tb->flags &= ~TBF_EXCEPTION; /* not needed for MCE delivery path */
+
     return !null_trap_bounce(v, tb);
 }
 
@@ -228,8 +231,10 @@ int set_guest_nmi_trapbounce(void)
 {
     struct vcpu *v = current;
     struct trap_bounce *tb = &v->arch.pv_vcpu.trap_bounce;
+
     pv_inject_hw_exception(TRAP_nmi, X86_EVENT_NO_EC);
     tb->flags &= ~TBF_EXCEPTION; /* not needed for NMI delivery path */
+
     return !null_trap_bounce(v, tb);
 }
 
@@ -301,15 +306,17 @@ int send_guest_trap(struct domain *d, uint16_t vcpuid, unsigned int trap_nr)
     case TRAP_nmi:
         if ( cmpxchgptr(&st->vcpu, NULL, v) )
             return -EBUSY;
+
         if ( !test_and_set_bool(v->nmi_pending) )
         {
-               st->domain = d;
-               st->processor = v->processor;
+            st->domain = d;
+            st->processor = v->processor;
 
-               /* not safe to wake up a vcpu here */
-               raise_softirq(NMI_MCE_SOFTIRQ);
-               return 0;
+            /* not safe to wake up a vcpu here */
+            raise_softirq(NMI_MCE_SOFTIRQ);
+            return 0;
         }
+
         st->vcpu = NULL;
         break;
 
@@ -318,17 +325,19 @@ int send_guest_trap(struct domain *d, uint16_t vcpuid, unsigned int trap_nr)
             return -EBUSY;
 
         /* We are called by the machine check (exception or polling) handlers
-         * on the physical CPU that reported a machine check error. */
+         * on the physical CPU that reported a machine check error.
+         */
 
         if ( !test_and_set_bool(v->mce_pending) )
         {
-                st->domain = d;
-                st->processor = v->processor;
+            st->domain = d;
+            st->processor = v->processor;
 
-                /* not safe to wake up a vcpu here */
-                raise_softirq(NMI_MCE_SOFTIRQ);
-                return 0;
+            /* not safe to wake up a vcpu here */
+            raise_softirq(NMI_MCE_SOFTIRQ);
+            return 0;
         }
+
         st->vcpu = NULL;
         break;
     }
@@ -341,6 +350,7 @@ void toggle_guest_mode(struct vcpu *v)
 {
     if ( is_pv_32bit_vcpu(v) )
         return;
+
     if ( cpu_has_fsgsbase )
     {
         if ( v->arch.flags & TF_kernel_mode )
@@ -348,6 +358,7 @@ void toggle_guest_mode(struct vcpu *v)
         else
             v->arch.pv_vcpu.gs_base_user = __rdgsbase();
     }
+
     v->arch.flags ^= TF_kernel_mode;
     asm volatile ( "swapgs" );
     update_cr3(v);
@@ -362,8 +373,7 @@ void toggle_guest_mode(struct vcpu *v)
         v->arch.pv_vcpu.need_update_runstate_area = 0;
 
     if ( v->arch.pv_vcpu.pending_system_time.version &&
-         update_secondary_system_time(v,
-                                      &v->arch.pv_vcpu.pending_system_time) )
+         update_secondary_system_time(v, &v->arch.pv_vcpu.pending_system_time) )
         v->arch.pv_vcpu.pending_system_time.version = 0;
 }
 
@@ -428,8 +438,8 @@ void init_int80_direct_trap(struct vcpu *v)
     struct trap_info *ti = &v->arch.pv_vcpu.trap_ctxt[0x80];
     struct trap_bounce *tb = &v->arch.pv_vcpu.int80_bounce;
 
-    tb->cs    = ti->cs;
-    tb->eip   = ti->address;
+    tb->cs  = ti->cs;
+    tb->eip = ti->address;
 
     if ( null_trap_bounce(v, tb) )
         tb->flags = 0;
@@ -448,27 +458,31 @@ static long register_guest_callback(struct callback_register *reg)
     switch ( reg->type )
     {
     case CALLBACKTYPE_event:
-        v->arch.pv_vcpu.event_callback_eip    = reg->address;
+        v->arch.pv_vcpu.event_callback_eip = reg->address;
         break;
 
     case CALLBACKTYPE_failsafe:
         v->arch.pv_vcpu.failsafe_callback_eip = reg->address;
+
         if ( reg->flags & CALLBACKF_mask_events )
             set_bit(_VGCF_failsafe_disables_events,
                     &v->arch.vgc_flags);
         else
             clear_bit(_VGCF_failsafe_disables_events,
                       &v->arch.vgc_flags);
+
         break;
 
     case CALLBACKTYPE_syscall:
         v->arch.pv_vcpu.syscall_callback_eip  = reg->address;
+
         if ( reg->flags & CALLBACKF_mask_events )
             set_bit(_VGCF_syscall_disables_events,
                     &v->arch.vgc_flags);
         else
             clear_bit(_VGCF_syscall_disables_events,
                       &v->arch.vgc_flags);
+
         break;
 
     case CALLBACKTYPE_syscall32:
@@ -674,13 +688,16 @@ void compat_show_guest_stack(struct vcpu *v, const struct cpu_user_regs *regs,
         printk(" %08x", addr);
         stack++;
     }
+
     if ( mask == PAGE_SIZE )
     {
         BUILD_BUG_ON(PAGE_SIZE == STACK_SIZE);
         unmap_domain_page(stack);
     }
+
     if ( i == 0 )
         printk("Stack empty.");
+
     printk("\n");
 }
 
@@ -702,7 +719,7 @@ unsigned int compat_iret(void)
 
     /* Restore CS and EIP. */
     if ( unlikely(__get_user(regs->eip, (u32 *)regs->rsp + 1)) ||
-        unlikely(__get_user(regs->cs, (u32 *)regs->rsp + 2)) )
+         unlikely(__get_user(regs->cs, (u32 *)regs->rsp + 2)) )
     {
         domain_crash(v->domain);
         return 0;
@@ -740,6 +757,7 @@ unsigned int compat_iret(void)
 
         gdprintk(XENLOG_ERR, "VM86 mode unavailable (ksp:%08X->%08X)\n",
                  regs->esp, ksp);
+
         if ( ksp < regs->esp )
         {
             for (i = 1; i < 10; ++i)
@@ -756,24 +774,29 @@ unsigned int compat_iret(void)
                 rc |= __put_user(x, (u32 *)(unsigned long)ksp + i);
             }
         }
+
         if ( rc )
         {
             domain_crash(v->domain);
             return 0;
         }
+
         regs->esp = ksp;
         regs->ss = v->arch.pv_vcpu.kernel_ss;
 
         ti = &v->arch.pv_vcpu.trap_ctxt[TRAP_gp_fault];
         if ( TI_GET_IF(ti) )
             eflags &= ~X86_EFLAGS_IF;
+
         regs->eflags &= ~(X86_EFLAGS_VM|X86_EFLAGS_RF|
                           X86_EFLAGS_NT|X86_EFLAGS_TF);
+
         if ( unlikely(__put_user(0, (u32 *)regs->rsp)) )
         {
             domain_crash(v->domain);
             return 0;
         }
+
         regs->eip = ti->address;
         regs->cs = ti->cs;
     }
@@ -804,8 +827,7 @@ unsigned int compat_iret(void)
     return regs->eax;
 }
 
-static long compat_register_guest_callback(
-    struct compat_callback_register *reg)
+static long compat_register_guest_callback(struct compat_callback_register *reg)
 {
     long ret = 0;
     struct vcpu *v = current;
@@ -822,12 +844,14 @@ static long compat_register_guest_callback(
     case CALLBACKTYPE_failsafe:
         v->arch.pv_vcpu.failsafe_callback_cs  = reg->address.cs;
         v->arch.pv_vcpu.failsafe_callback_eip = reg->address.eip;
+
         if ( reg->flags & CALLBACKF_mask_events )
             set_bit(_VGCF_failsafe_disables_events,
                     &v->arch.vgc_flags);
         else
             clear_bit(_VGCF_failsafe_disables_events,
                       &v->arch.vgc_flags);
+
         break;
 
     case CALLBACKTYPE_syscall32:
-- 
2.11.0



* [PATCH for-next v3 20/22] x86: guest_has_trap_callback should return bool
  2017-05-18 17:09 [PATCH for-next v3 00/22] x86: refactor trap handling code Wei Liu
                   ` (18 preceding siblings ...)
  2017-05-18 17:10 ` [PATCH for-next v3 19/22] x86: clean up pv/traps.c Wei Liu
@ 2017-05-18 17:10 ` Wei Liu
  2017-05-18 17:10 ` [PATCH for-next v3 21/22] x86: fix coding style issues in asm-x86/traps.h Wei Liu
  2017-05-18 17:10 ` [PATCH for-next v3 22/22] x86: clean up traps.c Wei Liu
  21 siblings, 0 replies; 65+ messages in thread
From: Wei Liu @ 2017-05-18 17:10 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
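A usage sketch (hypothetical caller, not part of this patch) showing how the
bool predicate composes in, e.g., an NMI delivery path:

    /* hypothetical caller, for illustration only */
    if ( guest_has_trap_callback(d, vcpuid, TRAP_nmi) )
        rc = send_guest_trap(d, vcpuid, TRAP_nmi);
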
 xen/arch/x86/pv/traps.c     | 4 ++--
 xen/include/asm-x86/traps.h | 6 +++---
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/pv/traps.c b/xen/arch/x86/pv/traps.c
index ea5b543247..c36b650f5f 100644
--- a/xen/arch/x86/pv/traps.c
+++ b/xen/arch/x86/pv/traps.c
@@ -274,8 +274,8 @@ long unregister_guest_nmi_callback(void)
     return 0;
 }
 
-int guest_has_trap_callback(struct domain *d, uint16_t vcpuid,
-                            unsigned int trap_nr)
+bool guest_has_trap_callback(struct domain *d, uint16_t vcpuid,
+                             unsigned int trap_nr)
 {
     struct vcpu *v;
     struct trap_info *t;
diff --git a/xen/include/asm-x86/traps.h b/xen/include/asm-x86/traps.h
index 4e8760482f..7f36f6c1a7 100644
--- a/xen/include/asm-x86/traps.h
+++ b/xen/include/asm-x86/traps.h
@@ -33,10 +33,10 @@ void async_exception_cleanup(struct vcpu *);
 /**
  * guest_has_trap_callback
  *
- * returns true (non-zero) if guest registered a trap handler
+ * returns true if guest registered a trap handler
  */
-extern int guest_has_trap_callback(struct domain *d, uint16_t vcpuid,
-				unsigned int trap_nr);
+extern bool guest_has_trap_callback(struct domain *d, uint16_t vcpuid,
+                                    unsigned int trap_nr);
 
 /**
  * send_guest_trap
-- 
2.11.0



* [PATCH for-next v3 21/22] x86: fix coding style issues in asm-x86/traps.h
  2017-05-18 17:09 [PATCH for-next v3 00/22] x86: refactor trap handling code Wei Liu
                   ` (19 preceding siblings ...)
  2017-05-18 17:10 ` [PATCH for-next v3 20/22] x86: guest_has_trap_callback should return bool Wei Liu
@ 2017-05-18 17:10 ` Wei Liu
  2017-05-18 17:10 ` [PATCH for-next v3 22/22] x86: clean up traps.c Wei Liu
  21 siblings, 0 replies; 65+ messages in thread
From: Wei Liu @ 2017-05-18 17:10 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich

And provide an Emacs block.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/include/asm-x86/traps.h | 20 +++++++++++++++-----
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/xen/include/asm-x86/traps.h b/xen/include/asm-x86/traps.h
index 7f36f6c1a7..0676e81d1a 100644
--- a/xen/include/asm-x86/traps.h
+++ b/xen/include/asm-x86/traps.h
@@ -20,16 +20,16 @@
 #define ASM_TRAP_H
 
 struct softirq_trap {
-	struct domain *domain;  /* domain to inject trap */
-	struct vcpu *vcpu;	/* vcpu to inject trap */
-	int processor;		/* physical cpu to inject trap */
+    struct domain *domain;  /* domain to inject trap */
+    struct vcpu *vcpu;      /* vcpu to inject trap */
+    int processor;          /* physical cpu to inject trap */
 };
 DECLARE_PER_CPU(struct softirq_trap, softirq_trap);
 
 struct cpu_user_regs;
 
 void async_exception_cleanup(struct vcpu *);
- 
+
 /**
  * guest_has_trap_callback
  *
@@ -45,7 +45,7 @@ extern bool guest_has_trap_callback(struct domain *d, uint16_t vcpuid,
  * return 0 on successful delivery
  */
 extern int send_guest_trap(struct domain *d, uint16_t vcpuid,
-				unsigned int trap_nr);
+                           unsigned int trap_nr);
 
 uint32_t guest_io_read(unsigned int port, unsigned int bytes,
                        struct domain *);
@@ -55,3 +55,13 @@ void guest_io_write(unsigned int port, unsigned int bytes, uint32_t data,
 const char *trapstr(unsigned int trapnr);
 
 #endif /* ASM_TRAP_H */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.11.0



* [PATCH for-next v3 22/22] x86: clean up traps.c
  2017-05-18 17:09 [PATCH for-next v3 00/22] x86: refactor trap handling code Wei Liu
                   ` (20 preceding siblings ...)
  2017-05-18 17:10 ` [PATCH for-next v3 21/22] x86: fix coding style issues in asm-x86/traps.h Wei Liu
@ 2017-05-18 17:10 ` Wei Liu
  2017-05-29 16:21   ` Jan Beulich
  21 siblings, 1 reply; 65+ messages in thread
From: Wei Liu @ 2017-05-18 17:10 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich

Replace bool_t with bool. Delete trailing whitespace. Fix some coding
style issues.

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/arch/x86/traps.c | 77 +++++++++++++++++++++++++++-------------------------
 1 file changed, 40 insertions(+), 37 deletions(-)

diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 3cc4b5b78b..c1e28cd926 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -1,18 +1,18 @@
 /******************************************************************************
  * arch/x86/traps.c
- * 
+ *
  * Modifications to Linux original are copyright (c) 2002-2004, K A Fraser
- * 
+ *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
  * the Free Software Foundation; either version 2 of the License, or
  * (at your option) any later version.
- * 
+ *
  * This program is distributed in the hope that it will be useful,
  * but WITHOUT ANY WARRANTY; without even the implied warranty of
  * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
  * GNU General Public License for more details.
- * 
+ *
  * You should have received a copy of the GNU General Public License
  * along with this program; If not, see <http://www.gnu.org/licenses/>.
  */
@@ -113,7 +113,7 @@ void (*ioemul_handle_quirk)(
 static int debug_stack_lines = 20;
 integer_param("debug_stack_lines", debug_stack_lines);
 
-static bool_t opt_ler;
+static bool opt_ler;
 boolean_param("ler", opt_ler);
 
 #define stack_words_per_line 4
@@ -528,7 +528,7 @@ void vcpu_show_execution_state(struct vcpu *v)
 }
 
 static cpumask_t show_state_mask;
-static bool_t opt_show_all;
+static bool opt_show_all;
 boolean_param("async-show-all", opt_show_all);
 
 static int nmi_show_execution_state(const struct cpu_user_regs *regs, int cpu)
@@ -539,8 +539,8 @@ static int nmi_show_execution_state(const struct cpu_user_regs *regs, int cpu)
     if ( opt_show_all )
         show_execution_state(regs);
     else
-        printk(XENLOG_ERR "CPU%d @ %04x:%08lx (%pS)\n", cpu, regs->cs, regs->rip,
-               guest_mode(regs) ? _p(regs->rip) : NULL);
+        printk(XENLOG_ERR "CPU%d @ %04x:%08lx (%pS)\n", cpu, regs->cs,
+               regs->rip, guest_mode(regs) ? _p(regs->rip) : NULL);
     cpumask_clear_cpu(cpu, &show_state_mask);
 
     return 1;
@@ -565,7 +565,7 @@ const char *trapstr(unsigned int trapnr)
  * are disabled). In such situations we can't do much that is safe. We try to
  * print out some tracing and then we just spin.
  */
-void fatal_trap(const struct cpu_user_regs *regs, bool_t show_remote)
+void fatal_trap(const struct cpu_user_regs *regs, bool show_remote)
 {
     static DEFINE_PER_CPU(char, depth);
     unsigned int trapnr = regs->entry_vector;
@@ -1018,8 +1018,8 @@ void do_int3(struct cpu_user_regs *regs)
     pv_inject_hw_exception(TRAP_int3, X86_EVENT_NO_EC);
 }
 
-static void reserved_bit_page_fault(
-    unsigned long addr, struct cpu_user_regs *regs)
+static void reserved_bit_page_fault(unsigned long addr,
+                                    struct cpu_user_regs *regs)
 {
     printk("%pv: reserved bit in page table (ec=%04X)\n",
            current, regs->error_code);
@@ -1027,8 +1027,8 @@ static void reserved_bit_page_fault(
     show_execution_state(regs);
 }
 
-static int handle_gdt_ldt_mapping_fault(
-    unsigned long offset, struct cpu_user_regs *regs)
+static int handle_gdt_ldt_mapping_fault(unsigned long offset,
+                                        struct cpu_user_regs *regs)
 {
     struct vcpu *curr = current;
     /* Which vcpu's area did we fault in, and is it in the ldt sub-area? */
@@ -1096,8 +1096,8 @@ enum pf_type {
     spurious_fault
 };
 
-static enum pf_type __page_fault_type(
-    unsigned long addr, const struct cpu_user_regs *regs)
+static enum pf_type __page_fault_type(unsigned long addr,
+                                      const struct cpu_user_regs *regs)
 {
     unsigned long mfn, cr3 = read_cr3();
     l4_pgentry_t l4e, *l4t;
@@ -1203,8 +1203,8 @@ leaf:
     return spurious_fault;
 }
 
-static enum pf_type spurious_page_fault(
-    unsigned long addr, const struct cpu_user_regs *regs)
+static enum pf_type spurious_page_fault(unsigned long addr,
+                                        const struct cpu_user_regs *regs)
 {
     unsigned long flags;
     enum pf_type pf_type;
@@ -1313,7 +1313,8 @@ void do_page_fault(struct cpu_user_regs *regs)
         if ( (pf_type == smep_fault) || (pf_type == smap_fault) )
         {
             console_start_sync();
-            printk("Xen SM%cP violation\n", (pf_type == smep_fault) ? 'E' : 'A');
+            printk("Xen SM%cP violation\n",
+                   (pf_type == smep_fault) ? 'E' : 'A');
             fatal_trap(regs, 0);
         }
 
@@ -1363,9 +1364,9 @@ void do_page_fault(struct cpu_user_regs *regs)
 
 /*
  * Early #PF handler to print CR2, error code, and stack.
- * 
+ *
  * We also deal with spurious faults here, even though they should never happen
- * during early boot (an issue was seen once, but was most likely a hardware 
+ * during early boot (an issue was seen once, but was most likely a hardware
  * problem).
  */
 void __init do_early_page_fault(struct cpu_user_regs *regs)
@@ -1409,7 +1410,7 @@ void do_general_protection(struct cpu_user_regs *regs)
 
     /*
      * Cunning trick to allow arbitrary "INT n" handling.
-     * 
+     *
      * We set DPL == 0 on all vectors in the IDT. This prevents any INT <n>
      * instruction from trapping to the appropriate vector, when that might not
      * be expected by Xen or the guest OS. For example, that entry might be for
@@ -1417,12 +1418,12 @@ void do_general_protection(struct cpu_user_regs *regs)
      * expect an error code on the stack (which a software trap never
      * provides), or might be a hardware interrupt handler that doesn't like
      * being called spuriously.
-     * 
+     *
      * Instead, a GPF occurs with the faulting IDT vector in the error code.
-     * Bit 1 is set to indicate that an IDT entry caused the fault. Bit 0 is 
+     * Bit 1 is set to indicate that an IDT entry caused the fault. Bit 0 is
      * clear (which got already checked above) to indicate that it's a software
      * fault, not a hardware one.
-     * 
+     *
      * NOTE: Vectors 3 and 4 are dealt with from their own handler. This is
      * okay because they can only be triggered by an explicit DPL-checked
      * instruction. The DPL specified by the guest OS for these vectors is NOT
@@ -1601,7 +1602,8 @@ static void io_check_error(const struct cpu_user_regs *regs)
     outb((inb(0x61) & 0x07) | 0x00, 0x61); /* enable IOCK */
 }
 
-static void unknown_nmi_error(const struct cpu_user_regs *regs, unsigned char reason)
+static void unknown_nmi_error(const struct cpu_user_regs *regs,
+                              unsigned char reason)
 {
     switch ( opt_nmi[0] )
     {
@@ -1621,14 +1623,14 @@ static int dummy_nmi_callback(const struct cpu_user_regs *regs, int cpu)
 {
     return 0;
 }
- 
+
 static nmi_callback_t *nmi_callback = dummy_nmi_callback;
 
 void do_nmi(const struct cpu_user_regs *regs)
 {
     unsigned int cpu = smp_processor_id();
     unsigned char reason;
-    bool_t handle_unknown = 0;
+    bool handle_unknown = false;
 
     ++nmi_count(cpu);
 
@@ -1637,7 +1639,7 @@ void do_nmi(const struct cpu_user_regs *regs)
 
     if ( (nmi_watchdog == NMI_NONE) ||
          (!nmi_watchdog_tick(regs) && watchdog_force) )
-        handle_unknown = 1;
+        handle_unknown = true;
 
     /* Only the BSP gets external NMIs from the system. */
     if ( cpu == 0 )
@@ -1757,7 +1759,8 @@ void do_debug(struct cpu_user_regs *regs)
     return;
 }
 
-static void __init noinline __set_intr_gate(unsigned int n, uint32_t dpl, void *addr)
+static void __init noinline __set_intr_gate(unsigned int n,
+                                            uint32_t dpl, void *addr)
 {
     _set_gate(&idt_table[n], SYS_DESC_irq_gate, dpl, addr);
 }
@@ -1944,28 +1947,28 @@ long set_debugreg(struct vcpu *v, unsigned int reg, unsigned long value)
 
     switch ( reg )
     {
-    case 0: 
+    case 0:
         if ( !access_ok(value, sizeof(long)) )
             return -EPERM;
-        if ( v == curr ) 
+        if ( v == curr )
             write_debugreg(0, value);
         break;
-    case 1: 
+    case 1:
         if ( !access_ok(value, sizeof(long)) )
             return -EPERM;
-        if ( v == curr ) 
+        if ( v == curr )
             write_debugreg(1, value);
         break;
-    case 2: 
+    case 2:
         if ( !access_ok(value, sizeof(long)) )
             return -EPERM;
-        if ( v == curr ) 
+        if ( v == curr )
             write_debugreg(2, value);
         break;
     case 3:
         if ( !access_ok(value, sizeof(long)) )
             return -EPERM;
-        if ( v == curr ) 
+        if ( v == curr )
             write_debugreg(3, value);
         break;
     case 6:
@@ -1975,7 +1978,7 @@ long set_debugreg(struct vcpu *v, unsigned int reg, unsigned long value)
          */
         value &= ~DR_STATUS_RESERVED_ZERO; /* reserved bits => 0 */
         value |=  DR_STATUS_RESERVED_ONE;  /* reserved bits => 1 */
-        if ( v == curr ) 
+        if ( v == curr )
             write_debugreg(6, value);
         break;
     case 7:
-- 
2.11.0



* Re: [PATCH for-next v3 01/22] x86/traps: move privilege instruction emulation code
  2017-05-18 17:09 ` [PATCH for-next v3 01/22] x86/traps: move privilege instruction emulation code Wei Liu
@ 2017-05-18 17:28   ` Wei Liu
  2017-05-29 15:14     ` Jan Beulich
  0 siblings, 1 reply; 65+ messages in thread
From: Wei Liu @ 2017-05-18 17:28 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich

I forgot to move gpr_switch.S. Here is an updated version.
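
For context, a sketch of the stub's control flow (reconstructed from the
io_emul_stub_setup() template in the patch, not new code), which is what
makes gpr_switch.S a dependency of the emulation code alone:

    movq  $host_to_guest_gpr_switch, %rcx
    callq *%rcx              /* load guest GPRs */
    <in/out instruction>     /* the emulated port access */
    ret                      /* returns into guest_to_host_gpr_switch */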

---8<---
From 58df816b937dc7a3598de01f053a6030e631057e Mon Sep 17 00:00:00 2001
From: Wei Liu <wei.liu2@citrix.com>
Date: Thu, 18 May 2017 16:18:56 +0100
Subject: [PATCH] x86/traps: move privilege instruction emulation code

Move relevant code to pv/emulate.c. Export emulate_privileged_op in
pv/traps.h.

Note that read_descriptor is duplicated in emulate.c. The duplication
will be gone once all emulation code is moved.

Also move gpr_switch.S to pv/ because the code in that file is only
used by privilege instruction emulation.

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
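A sketch of the new export in pv/traps.h (the prototype is assumed to match
the pre-existing static one in traps.c):

    int emulate_privileged_op(struct cpu_user_regs *regs);
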
 xen/arch/x86/pv/Makefile                 |    2 +
 xen/arch/x86/pv/emulate.c                | 1470 ++++++++++++++++++++++++++++++
 xen/arch/x86/{x86_64 => pv}/gpr_switch.S |    0
 xen/arch/x86/traps.c                     | 1358 +--------------------------
 xen/arch/x86/x86_64/Makefile             |    1 -
 xen/include/asm-x86/pv/traps.h           |   48 +
 6 files changed, 1522 insertions(+), 1357 deletions(-)
 create mode 100644 xen/arch/x86/pv/emulate.c
 rename xen/arch/x86/{x86_64 => pv}/gpr_switch.S (100%)
 create mode 100644 xen/include/asm-x86/pv/traps.h

diff --git a/xen/arch/x86/pv/Makefile b/xen/arch/x86/pv/Makefile
index 489a9f59cb..f272f607d4 100644
--- a/xen/arch/x86/pv/Makefile
+++ b/xen/arch/x86/pv/Makefile
@@ -3,3 +3,5 @@ obj-y += traps.o
 
 obj-bin-y += dom0_build.init.o
 obj-y += domain.o
+obj-y += emulate.o
+obj-bin-y += gpr_switch.o
diff --git a/xen/arch/x86/pv/emulate.c b/xen/arch/x86/pv/emulate.c
new file mode 100644
index 0000000000..fb0d066a3b
--- /dev/null
+++ b/xen/arch/x86/pv/emulate.c
@@ -0,0 +1,1470 @@
+/******************************************************************************
+ * arch/x86/pv/emulate.c
+ *
+ * PV emulation code
+ *
+ * Modifications to Linux original are copyright (c) 2002-2004, K A Fraser
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <xen/errno.h>
+#include <xen/event.h>
+#include <xen/guest_access.h>
+#include <xen/iocap.h>
+#include <xen/spinlock.h>
+#include <xen/trace.h>
+
+#include <asm/apic.h>
+#include <asm/debugreg.h>
+#include <asm/hpet.h>
+#include <asm/hypercall.h>
+#include <asm/mc146818rtc.h>
+#include <asm/p2m.h>
+#include <asm/pv/traps.h>
+#include <asm/shared.h>
+#include <asm/traps.h>
+#include <asm/x86_emulate.h>
+
+#include <xsm/xsm.h>
+
+#include "../x86_64/mmconfig.h"
+
+/******************
+ * Helper functions
+ */
+
+static int read_descriptor(unsigned int sel,
+                           const struct vcpu *v,
+                           unsigned long *base,
+                           unsigned long *limit,
+                           unsigned int *ar,
+                           bool_t insn_fetch)
+{
+    struct desc_struct desc;
+
+    if ( sel < 4)
+        desc.b = desc.a = 0;
+    else if ( __get_user(desc,
+                         (const struct desc_struct *)(!(sel & 4)
+                                                      ? GDT_VIRT_START(v)
+                                                      : LDT_VIRT_START(v))
+                         + (sel >> 3)) )
+        return 0;
+    if ( !insn_fetch )
+        desc.b &= ~_SEGMENT_L;
+
+    *ar = desc.b & 0x00f0ff00;
+    if ( !(desc.b & _SEGMENT_L) )
+    {
+        *base = ((desc.a >> 16) + ((desc.b & 0xff) << 16) +
+                 (desc.b & 0xff000000));
+        *limit = (desc.a & 0xffff) | (desc.b & 0x000f0000);
+        if ( desc.b & _SEGMENT_G )
+            *limit = ((*limit + 1) << 12) - 1;
+#ifndef NDEBUG
+        if ( sel > 3 )
+        {
+            unsigned int a, l;
+            unsigned char valid;
+
+            asm volatile (
+                "larl %2,%0 ; setz %1"
+                : "=r" (a), "=qm" (valid) : "rm" (sel));
+            BUG_ON(valid && ((a & 0x00f0ff00) != *ar));
+            asm volatile (
+                "lsll %2,%0 ; setz %1"
+                : "=r" (l), "=qm" (valid) : "rm" (sel));
+            BUG_ON(valid && (l != *limit));
+        }
+#endif
+    }
+    else
+    {
+        *base = 0UL;
+        *limit = ~0UL;
+    }
+
+    return 1;
+}
+
+/***********************
+ * I/O emulation support
+ */
+
+struct priv_op_ctxt {
+    struct x86_emulate_ctxt ctxt;
+    struct {
+        unsigned long base, limit;
+    } cs;
+    char *io_emul_stub;
+    unsigned int bpmatch;
+    unsigned int tsc;
+#define TSC_BASE 1
+#define TSC_AUX 2
+};
+
+/* I/O emulation support. Helper routines for, and type of, the stack stub.*/
+void host_to_guest_gpr_switch(struct cpu_user_regs *);
+unsigned long guest_to_host_gpr_switch(unsigned long);
+
+void (*pv_post_outb_hook)(unsigned int port, u8 value);
+
+typedef void io_emul_stub_t(struct cpu_user_regs *);
+
+static io_emul_stub_t *io_emul_stub_setup(struct priv_op_ctxt *ctxt, u8 opcode,
+                                          unsigned int port, unsigned int bytes)
+{
+    if ( !ctxt->io_emul_stub )
+        ctxt->io_emul_stub = map_domain_page(_mfn(this_cpu(stubs.mfn))) +
+                                             (this_cpu(stubs.addr) &
+                                              ~PAGE_MASK) +
+                                             STUB_BUF_SIZE / 2;
+
+    /* movq $host_to_guest_gpr_switch,%rcx */
+    ctxt->io_emul_stub[0] = 0x48;
+    ctxt->io_emul_stub[1] = 0xb9;
+    *(void **)&ctxt->io_emul_stub[2] = (void *)host_to_guest_gpr_switch;
+    /* callq *%rcx */
+    ctxt->io_emul_stub[10] = 0xff;
+    ctxt->io_emul_stub[11] = 0xd1;
+    /* data16 or nop */
+    ctxt->io_emul_stub[12] = (bytes != 2) ? 0x90 : 0x66;
+    /* <io-access opcode> */
+    ctxt->io_emul_stub[13] = opcode;
+    /* imm8 or nop */
+    ctxt->io_emul_stub[14] = !(opcode & 8) ? port : 0x90;
+    /* ret (jumps to guest_to_host_gpr_switch) */
+    ctxt->io_emul_stub[15] = 0xc3;
+    BUILD_BUG_ON(STUB_BUF_SIZE / 2 < 16);
+
+    if ( ioemul_handle_quirk )
+        ioemul_handle_quirk(opcode, &ctxt->io_emul_stub[12], ctxt->ctxt.regs);
+
+    /* Handy function-typed pointer to the stub. */
+    return (void *)(this_cpu(stubs.addr) + STUB_BUF_SIZE / 2);
+}
+
+
+/* Perform IOPL check between the vcpu's shadowed IOPL, and the assumed cpl. */
+static bool_t iopl_ok(const struct vcpu *v, const struct cpu_user_regs *regs)
+{
+    unsigned int cpl = guest_kernel_mode(v, regs) ?
+        (VM_ASSIST(v->domain, architectural_iopl) ? 0 : 1) : 3;
+
+    ASSERT((v->arch.pv_vcpu.iopl & ~X86_EFLAGS_IOPL) == 0);
+
+    return IOPL(cpl) <= v->arch.pv_vcpu.iopl;
+}
+
+/* Has the guest requested sufficient permission for this I/O access? */
+static int guest_io_okay(
+    unsigned int port, unsigned int bytes,
+    struct vcpu *v, struct cpu_user_regs *regs)
+{
+    /* If in user mode, switch to kernel mode just to read I/O bitmap. */
+    int user_mode = !(v->arch.flags & TF_kernel_mode);
+#define TOGGLE_MODE() if ( user_mode ) toggle_guest_mode(v)
+
+    if ( iopl_ok(v, regs) )
+        return 1;
+
+    if ( v->arch.pv_vcpu.iobmp_limit > (port + bytes) )
+    {
+        union { uint8_t bytes[2]; uint16_t mask; } x;
+
+        /*
+         * Grab permission bytes from guest space. Inaccessible bytes are
+         * read as 0xff (no access allowed).
+         */
+        TOGGLE_MODE();
+        switch ( __copy_from_guest_offset(x.bytes, v->arch.pv_vcpu.iobmp,
+                                          port>>3, 2) )
+        {
+        default: x.bytes[0] = ~0;
+            /* fallthrough */
+        case 1:  x.bytes[1] = ~0;
+            /* fallthrough */
+        case 0:  break;
+        }
+        TOGGLE_MODE();
+
+        if ( (x.mask & (((1<<bytes)-1) << (port&7))) == 0 )
+            return 1;
+    }
+
+    return 0;
+}
+
+/* Has the administrator granted sufficient permission for this I/O access? */
+static bool_t admin_io_okay(unsigned int port, unsigned int bytes,
+                            const struct domain *d)
+{
+    /*
+     * Port 0xcf8 (CONFIG_ADDRESS) is only visible for DWORD accesses.
+     * We never permit direct access to that register.
+     */
+    if ( (port == 0xcf8) && (bytes == 4) )
+        return 0;
+
+    /* We also never permit direct access to the RTC/CMOS registers. */
+    if ( ((port & ~1) == RTC_PORT(0)) )
+        return 0;
+
+    return ioports_access_permitted(d, port, port + bytes - 1);
+}
+
+static bool_t pci_cfg_ok(struct domain *currd, unsigned int start,
+                         unsigned int size, uint32_t *write)
+{
+    uint32_t machine_bdf;
+
+    if ( !is_hardware_domain(currd) )
+        return 0;
+
+    if ( !CF8_ENABLED(currd->arch.pci_cf8) )
+        return 1;
+
+    machine_bdf = CF8_BDF(currd->arch.pci_cf8);
+    if ( write )
+    {
+        const unsigned long *ro_map = pci_get_ro_map(0);
+
+        if ( ro_map && test_bit(machine_bdf, ro_map) )
+            return 0;
+    }
+    start |= CF8_ADDR_LO(currd->arch.pci_cf8);
+    /* AMD extended configuration space access? */
+    if ( CF8_ADDR_HI(currd->arch.pci_cf8) &&
+         boot_cpu_data.x86_vendor == X86_VENDOR_AMD &&
+         boot_cpu_data.x86 >= 0x10 && boot_cpu_data.x86 <= 0x17 )
+    {
+        uint64_t msr_val;
+
+        if ( rdmsr_safe(MSR_AMD64_NB_CFG, msr_val) )
+            return 0;
+        if ( msr_val & (1ULL << AMD64_NB_CFG_CF8_EXT_ENABLE_BIT) )
+            start |= CF8_ADDR_HI(currd->arch.pci_cf8);
+    }
+
+    return !write ?
+           xsm_pci_config_permission(XSM_HOOK, currd, machine_bdf,
+                                     start, start + size - 1, 0) == 0 :
+           pci_conf_write_intercept(0, machine_bdf, start, size, write) >= 0;
+}
+
+uint32_t guest_io_read(unsigned int port, unsigned int bytes,
+                       struct domain *currd)
+{
+    uint32_t data = 0;
+    unsigned int shift = 0;
+
+    if ( admin_io_okay(port, bytes, currd) )
+    {
+        switch ( bytes )
+        {
+        case 1: return inb(port);
+        case 2: return inw(port);
+        case 4: return inl(port);
+        }
+    }
+
+    while ( bytes != 0 )
+    {
+        unsigned int size = 1;
+        uint32_t sub_data = ~0;
+
+        if ( (port == 0x42) || (port == 0x43) || (port == 0x61) )
+        {
+            sub_data = pv_pit_handler(port, 0, 0);
+        }
+        else if ( port == RTC_PORT(0) )
+        {
+            sub_data = currd->arch.cmos_idx;
+        }
+        else if ( (port == RTC_PORT(1)) &&
+                  ioports_access_permitted(currd, RTC_PORT(0), RTC_PORT(1)) )
+        {
+            unsigned long flags;
+
+            spin_lock_irqsave(&rtc_lock, flags);
+            outb(currd->arch.cmos_idx & 0x7f, RTC_PORT(0));
+            sub_data = inb(RTC_PORT(1));
+            spin_unlock_irqrestore(&rtc_lock, flags);
+        }
+        else if ( (port == 0xcf8) && (bytes == 4) )
+        {
+            size = 4;
+            sub_data = currd->arch.pci_cf8;
+        }
+        else if ( (port & 0xfffc) == 0xcfc )
+        {
+            size = min(bytes, 4 - (port & 3));
+            if ( size == 3 )
+                size = 2;
+            if ( pci_cfg_ok(currd, port & 3, size, NULL) )
+                sub_data = pci_conf_read(currd->arch.pci_cf8, port & 3, size);
+        }
+
+        if ( size == 4 )
+            return sub_data;
+
+        data |= (sub_data & ((1u << (size * 8)) - 1)) << shift;
+        shift += size * 8;
+        port += size;
+        bytes -= size;
+    }
+
+    return data;
+}
+
+static unsigned int check_guest_io_breakpoint(struct vcpu *v,
+    unsigned int port, unsigned int len)
+{
+    unsigned int width, i, match = 0;
+    unsigned long start;
+
+    if ( !(v->arch.debugreg[5]) ||
+         !(v->arch.pv_vcpu.ctrlreg[4] & X86_CR4_DE) )
+        return 0;
+
+    for ( i = 0; i < 4; i++ )
+    {
+        if ( !(v->arch.debugreg[5] &
+               (3 << (i * DR_ENABLE_SIZE))) )
+            continue;
+
+        start = v->arch.debugreg[i];
+        width = 0;
+
+        switch ( (v->arch.debugreg[7] >>
+                  (DR_CONTROL_SHIFT + i * DR_CONTROL_SIZE)) & 0xc )
+        {
+        case DR_LEN_1: width = 1; break;
+        case DR_LEN_2: width = 2; break;
+        case DR_LEN_4: width = 4; break;
+        case DR_LEN_8: width = 8; break;
+        }
+
+        if ( (start < (port + len)) && ((start + width) > port) )
+            match |= 1 << i;
+    }
+
+    return match;
+}
+
+static int priv_op_read_io(unsigned int port, unsigned int bytes,
+                           unsigned long *val, struct x86_emulate_ctxt *ctxt)
+{
+    struct priv_op_ctxt *poc = container_of(ctxt, struct priv_op_ctxt, ctxt);
+    struct vcpu *curr = current;
+    struct domain *currd = current->domain;
+
+    /* INS must not come here. */
+    ASSERT((ctxt->opcode & ~9) == 0xe4);
+
+    if ( !guest_io_okay(port, bytes, curr, ctxt->regs) )
+        return X86EMUL_UNHANDLEABLE;
+
+    poc->bpmatch = check_guest_io_breakpoint(curr, port, bytes);
+
+    if ( admin_io_okay(port, bytes, currd) )
+    {
+        io_emul_stub_t *io_emul =
+            io_emul_stub_setup(poc, ctxt->opcode, port, bytes);
+
+        mark_regs_dirty(ctxt->regs);
+        io_emul(ctxt->regs);
+        return X86EMUL_DONE;
+    }
+
+    *val = guest_io_read(port, bytes, currd);
+
+    return X86EMUL_OKAY;
+}
+
+void guest_io_write(unsigned int port, unsigned int bytes, uint32_t data,
+                    struct domain *currd)
+{
+    if ( admin_io_okay(port, bytes, currd) )
+    {
+        switch ( bytes ) {
+        case 1:
+            outb((uint8_t)data, port);
+            if ( pv_post_outb_hook )
+                pv_post_outb_hook(port, (uint8_t)data);
+            break;
+        case 2:
+            outw((uint16_t)data, port);
+            break;
+        case 4:
+            outl(data, port);
+            break;
+        }
+        return;
+    }
+
+    while ( bytes != 0 )
+    {
+        unsigned int size = 1;
+
+        if ( (port == 0x42) || (port == 0x43) || (port == 0x61) )
+        {
+            pv_pit_handler(port, (uint8_t)data, 1);
+        }
+        else if ( port == RTC_PORT(0) )
+        {
+            currd->arch.cmos_idx = data;
+        }
+        else if ( (port == RTC_PORT(1)) &&
+                  ioports_access_permitted(currd, RTC_PORT(0), RTC_PORT(1)) )
+        {
+            unsigned long flags;
+
+            if ( pv_rtc_handler )
+                pv_rtc_handler(currd->arch.cmos_idx & 0x7f, data);
+            spin_lock_irqsave(&rtc_lock, flags);
+            outb(currd->arch.cmos_idx & 0x7f, RTC_PORT(0));
+            outb(data, RTC_PORT(1));
+            spin_unlock_irqrestore(&rtc_lock, flags);
+        }
+        else if ( (port == 0xcf8) && (bytes == 4) )
+        {
+            size = 4;
+            currd->arch.pci_cf8 = data;
+        }
+        else if ( (port & 0xfffc) == 0xcfc )
+        {
+            size = min(bytes, 4 - (port & 3));
+            if ( size == 3 )
+                size = 2;
+            if ( pci_cfg_ok(currd, port & 3, size, &data) )
+                pci_conf_write(currd->arch.pci_cf8, port & 3, size, data);
+        }
+
+        if ( size == 4 )
+            return;
+
+        port += size;
+        bytes -= size;
+        data >>= size * 8;
+    }
+}
+
+static int priv_op_write_io(unsigned int port, unsigned int bytes,
+                            unsigned long val, struct x86_emulate_ctxt *ctxt)
+{
+    struct priv_op_ctxt *poc = container_of(ctxt, struct priv_op_ctxt, ctxt);
+    struct vcpu *curr = current;
+    struct domain *currd = current->domain;
+
+    /* OUTS must not come here. */
+    ASSERT((ctxt->opcode & ~9) == 0xe6);
+
+    if ( !guest_io_okay(port, bytes, curr, ctxt->regs) )
+        return X86EMUL_UNHANDLEABLE;
+
+    poc->bpmatch = check_guest_io_breakpoint(curr, port, bytes);
+
+    if ( admin_io_okay(port, bytes, currd) )
+    {
+        io_emul_stub_t *io_emul =
+            io_emul_stub_setup(poc, ctxt->opcode, port, bytes);
+
+        mark_regs_dirty(ctxt->regs);
+        io_emul(ctxt->regs);
+        if ( (bytes == 1) && pv_post_outb_hook )
+            pv_post_outb_hook(port, val);
+        return X86EMUL_DONE;
+    }
+
+    guest_io_write(port, bytes, val, currd);
+
+    return X86EMUL_OKAY;
+}
+
+static int priv_op_read_segment(enum x86_segment seg,
+                                struct segment_register *reg,
+                                struct x86_emulate_ctxt *ctxt)
+{
+    /* Check if this is an attempt to access the I/O bitmap. */
+    if ( seg == x86_seg_tr )
+    {
+        switch ( ctxt->opcode )
+        {
+        case 0x6c ... 0x6f: /* ins / outs */
+        case 0xe4 ... 0xe7: /* in / out (immediate port) */
+        case 0xec ... 0xef: /* in / out (port in %dx) */
+            /* Defer the check to priv_op_{read,write}_io(). */
+            return X86EMUL_DONE;
+        }
+    }
+
+    if ( ctxt->addr_size < 64 )
+    {
+        unsigned long limit;
+        unsigned int sel, ar;
+
+        switch ( seg )
+        {
+        case x86_seg_cs: sel = ctxt->regs->cs; break;
+        case x86_seg_ds: sel = read_sreg(ds);  break;
+        case x86_seg_es: sel = read_sreg(es);  break;
+        case x86_seg_fs: sel = read_sreg(fs);  break;
+        case x86_seg_gs: sel = read_sreg(gs);  break;
+        case x86_seg_ss: sel = ctxt->regs->ss; break;
+        default: return X86EMUL_UNHANDLEABLE;
+        }
+
+        if ( !read_descriptor(sel, current, &reg->base, &limit, &ar, 0) )
+            return X86EMUL_UNHANDLEABLE;
+
+        reg->limit = limit;
+        reg->attr.bytes = ar >> 8;
+    }
+    else
+    {
+        switch ( seg )
+        {
+        default:
+            if ( !is_x86_user_segment(seg) )
+                return X86EMUL_UNHANDLEABLE;
+            reg->base = 0;
+            break;
+        case x86_seg_fs:
+            reg->base = rdfsbase();
+            break;
+        case x86_seg_gs:
+            reg->base = rdgsbase();
+            break;
+        }
+
+        reg->limit = ~0U;
+
+        reg->attr.bytes = 0;
+        reg->attr.fields.type = _SEGMENT_WR >> 8;
+        if ( seg == x86_seg_cs )
+        {
+            reg->attr.fields.type |= _SEGMENT_CODE >> 8;
+            reg->attr.fields.l = 1;
+        }
+        else
+            reg->attr.fields.db = 1;
+        reg->attr.fields.s   = 1;
+        reg->attr.fields.dpl = 3;
+        reg->attr.fields.p   = 1;
+        reg->attr.fields.g   = 1;
+    }
+
+    /*
+     * For x86_emulate.c's mode_ring0() to work, fake a DPL of zero.
+     * Also do this for consistency for non-conforming code segments.
+     */
+    if ( (seg == x86_seg_ss ||
+          (seg == x86_seg_cs &&
+           !(reg->attr.fields.type & (_SEGMENT_EC >> 8)))) &&
+         guest_kernel_mode(current, ctxt->regs) )
+        reg->attr.fields.dpl = 0;
+
+    return X86EMUL_OKAY;
+}
+
+static int pv_emul_virt_to_linear(unsigned long base, unsigned long offset,
+                                  unsigned int bytes, unsigned long limit,
+                                  enum x86_segment seg,
+                                  struct x86_emulate_ctxt *ctxt,
+                                  unsigned long *addr)
+{
+    int rc = X86EMUL_OKAY;
+
+    *addr = base + offset;
+
+    if ( ctxt->addr_size < 64 )
+    {
+        if ( limit < bytes - 1 || offset > limit - bytes + 1 )
+            rc = X86EMUL_EXCEPTION;
+        *addr = (uint32_t)*addr;
+    }
+    else if ( !__addr_ok(*addr) )
+        rc = X86EMUL_EXCEPTION;
+
+    if ( unlikely(rc == X86EMUL_EXCEPTION) )
+        x86_emul_hw_exception(seg != x86_seg_ss ? TRAP_gp_fault
+                                                : TRAP_stack_error,
+                              0, ctxt);
+
+    return rc;
+}
+
+static int priv_op_rep_ins(uint16_t port,
+                           enum x86_segment seg, unsigned long offset,
+                           unsigned int bytes_per_rep, unsigned long *reps,
+                           struct x86_emulate_ctxt *ctxt)
+{
+    struct priv_op_ctxt *poc = container_of(ctxt, struct priv_op_ctxt, ctxt);
+    struct vcpu *curr = current;
+    struct domain *currd = current->domain;
+    unsigned long goal = *reps;
+    struct segment_register sreg;
+    int rc;
+
+    ASSERT(seg == x86_seg_es);
+
+    *reps = 0;
+
+    if ( !guest_io_okay(port, bytes_per_rep, curr, ctxt->regs) )
+        return X86EMUL_UNHANDLEABLE;
+
+    rc = priv_op_read_segment(x86_seg_es, &sreg, ctxt);
+    if ( rc != X86EMUL_OKAY )
+        return rc;
+
+    if ( !sreg.attr.fields.p )
+        return X86EMUL_UNHANDLEABLE;
+    if ( !sreg.attr.fields.s ||
+         (sreg.attr.fields.type & (_SEGMENT_CODE >> 8)) ||
+         !(sreg.attr.fields.type & (_SEGMENT_WR >> 8)) )
+    {
+        x86_emul_hw_exception(TRAP_gp_fault, 0, ctxt);
+        return X86EMUL_EXCEPTION;
+    }
+
+    poc->bpmatch = check_guest_io_breakpoint(curr, port, bytes_per_rep);
+
+    while ( *reps < goal )
+    {
+        unsigned int data = guest_io_read(port, bytes_per_rep, currd);
+        unsigned long addr;
+
+        rc = pv_emul_virt_to_linear(sreg.base, offset, bytes_per_rep,
+                                    sreg.limit, x86_seg_es, ctxt, &addr);
+        if ( rc != X86EMUL_OKAY )
+            return rc;
+
+        if ( (rc = __copy_to_user((void *)addr, &data, bytes_per_rep)) != 0 )
+        {
+            x86_emul_pagefault(PFEC_write_access,
+                               addr + bytes_per_rep - rc, ctxt);
+            return X86EMUL_EXCEPTION;
+        }
+
+        ++*reps;
+
+        if ( poc->bpmatch || hypercall_preempt_check() )
+            break;
+
+        /* x86_emulate() clips the repetition count to ensure we don't wrap. */
+        if ( unlikely(ctxt->regs->eflags & X86_EFLAGS_DF) )
+            offset -= bytes_per_rep;
+        else
+            offset += bytes_per_rep;
+    }
+
+    return X86EMUL_OKAY;
+}
+
+static int priv_op_rep_outs(enum x86_segment seg, unsigned long offset,
+                            uint16_t port,
+                            unsigned int bytes_per_rep, unsigned long *reps,
+                            struct x86_emulate_ctxt *ctxt)
+{
+    struct priv_op_ctxt *poc = container_of(ctxt, struct priv_op_ctxt, ctxt);
+    struct vcpu *curr = current;
+    struct domain *currd = current->domain;
+    unsigned long goal = *reps;
+    struct segment_register sreg;
+    int rc;
+
+    *reps = 0;
+
+    if ( !guest_io_okay(port, bytes_per_rep, curr, ctxt->regs) )
+        return X86EMUL_UNHANDLEABLE;
+
+    rc = priv_op_read_segment(seg, &sreg, ctxt);
+    if ( rc != X86EMUL_OKAY )
+        return rc;
+
+    if ( !sreg.attr.fields.p )
+        return X86EMUL_UNHANDLEABLE;
+    if ( !sreg.attr.fields.s ||
+         ((sreg.attr.fields.type & (_SEGMENT_CODE >> 8)) &&
+          !(sreg.attr.fields.type & (_SEGMENT_WR >> 8))) )
+    {
+        x86_emul_hw_exception(seg != x86_seg_ss ? TRAP_gp_fault
+                                                : TRAP_stack_error,
+                              0, ctxt);
+        return X86EMUL_EXCEPTION;
+    }
+
+    poc->bpmatch = check_guest_io_breakpoint(curr, port, bytes_per_rep);
+
+    while ( *reps < goal )
+    {
+        unsigned int data = 0;
+        unsigned long addr;
+
+        rc = pv_emul_virt_to_linear(sreg.base, offset, bytes_per_rep,
+                                    sreg.limit, seg, ctxt, &addr);
+        if ( rc != X86EMUL_OKAY )
+            return rc;
+
+        if ( (rc = __copy_from_user(&data, (void *)addr, bytes_per_rep)) != 0 )
+        {
+            x86_emul_pagefault(0, addr + bytes_per_rep - rc, ctxt);
+            return X86EMUL_EXCEPTION;
+        }
+
+        guest_io_write(port, bytes_per_rep, data, currd);
+
+        ++*reps;
+
+        if ( poc->bpmatch || hypercall_preempt_check() )
+            break;
+
+        /* x86_emulate() clips the repetition count to ensure we don't wrap. */
+        if ( unlikely(ctxt->regs->eflags & X86_EFLAGS_DF) )
+            offset -= bytes_per_rep;
+        else
+            offset += bytes_per_rep;
+    }
+
+    return X86EMUL_OKAY;
+}
+
+static int priv_op_read_cr(unsigned int reg, unsigned long *val,
+                           struct x86_emulate_ctxt *ctxt)
+{
+    const struct vcpu *curr = current;
+
+    switch ( reg )
+    {
+    case 0: /* Read CR0 */
+        *val = (read_cr0() & ~X86_CR0_TS) | curr->arch.pv_vcpu.ctrlreg[0];
+        return X86EMUL_OKAY;
+
+    case 2: /* Read CR2 */
+    case 4: /* Read CR4 */
+        *val = curr->arch.pv_vcpu.ctrlreg[reg];
+        return X86EMUL_OKAY;
+
+    case 3: /* Read CR3 */
+    {
+        const struct domain *currd = curr->domain;
+        unsigned long mfn;
+
+        if ( !is_pv_32bit_domain(currd) )
+        {
+            mfn = pagetable_get_pfn(curr->arch.guest_table);
+            *val = xen_pfn_to_cr3(mfn_to_gmfn(currd, mfn));
+        }
+        else
+        {
+            l4_pgentry_t *pl4e =
+                map_domain_page(_mfn(pagetable_get_pfn(curr->arch.guest_table)));
+
+            mfn = l4e_get_pfn(*pl4e);
+            unmap_domain_page(pl4e);
+            *val = compat_pfn_to_cr3(mfn_to_gmfn(currd, mfn));
+        }
+        /* PTs should not be shared */
+        BUG_ON(page_get_owner(mfn_to_page(mfn)) == dom_cow);
+        return X86EMUL_OKAY;
+    }
+    }
+
+    return X86EMUL_UNHANDLEABLE;
+}
+
+static int priv_op_write_cr(unsigned int reg, unsigned long val,
+                            struct x86_emulate_ctxt *ctxt)
+{
+    struct vcpu *curr = current;
+
+    switch ( reg )
+    {
+    case 0: /* Write CR0 */
+        if ( (val ^ read_cr0()) & ~X86_CR0_TS )
+        {
+            gdprintk(XENLOG_WARNING,
+                    "Attempt to change unmodifiable CR0 flags\n");
+            break;
+        }
+        do_fpu_taskswitch(!!(val & X86_CR0_TS));
+        return X86EMUL_OKAY;
+
+    case 2: /* Write CR2 */
+        curr->arch.pv_vcpu.ctrlreg[2] = val;
+        arch_set_cr2(curr, val);
+        return X86EMUL_OKAY;
+
+    case 3: /* Write CR3 */
+    {
+        struct domain *currd = curr->domain;
+        unsigned long gfn;
+        struct page_info *page;
+        int rc;
+
+        gfn = !is_pv_32bit_domain(currd)
+              ? xen_cr3_to_pfn(val) : compat_cr3_to_pfn(val);
+        page = get_page_from_gfn(currd, gfn, NULL, P2M_ALLOC);
+        if ( !page )
+            break;
+        rc = new_guest_cr3(page_to_mfn(page));
+        put_page(page);
+
+        switch ( rc )
+        {
+        case 0:
+            return X86EMUL_OKAY;
+        case -ERESTART: /* retry after preemption */
+            return X86EMUL_RETRY;
+        }
+        break;
+    }
+
+    case 4: /* Write CR4 */
+        curr->arch.pv_vcpu.ctrlreg[4] = pv_guest_cr4_fixup(curr, val);
+        write_cr4(pv_guest_cr4_to_real_cr4(curr));
+        ctxt_switch_levelling(curr);
+        return X86EMUL_OKAY;
+    }
+
+    return X86EMUL_UNHANDLEABLE;
+}
+
+static int priv_op_read_dr(unsigned int reg, unsigned long *val,
+                           struct x86_emulate_ctxt *ctxt)
+{
+    unsigned long res = do_get_debugreg(reg);
+
+    if ( IS_ERR_VALUE(res) )
+        return X86EMUL_UNHANDLEABLE;
+
+    *val = res;
+
+    return X86EMUL_OKAY;
+}
+
+static int priv_op_write_dr(unsigned int reg, unsigned long val,
+                            struct x86_emulate_ctxt *ctxt)
+{
+    return do_set_debugreg(reg, val) == 0
+           ? X86EMUL_OKAY : X86EMUL_UNHANDLEABLE;
+}
+
+static inline uint64_t guest_misc_enable(uint64_t val)
+{
+    val &= ~(MSR_IA32_MISC_ENABLE_PERF_AVAIL |
+             MSR_IA32_MISC_ENABLE_MONITOR_ENABLE);
+    val |= MSR_IA32_MISC_ENABLE_BTS_UNAVAIL |
+           MSR_IA32_MISC_ENABLE_PEBS_UNAVAIL |
+           MSR_IA32_MISC_ENABLE_XTPR_DISABLE;
+    return val;
+}
+
+static inline bool is_cpufreq_controller(const struct domain *d)
+{
+    return ((cpufreq_controller == FREQCTL_dom0_kernel) &&
+            is_hardware_domain(d));
+}
+
+static int priv_op_read_msr(unsigned int reg, uint64_t *val,
+                            struct x86_emulate_ctxt *ctxt)
+{
+    struct priv_op_ctxt *poc = container_of(ctxt, struct priv_op_ctxt, ctxt);
+    const struct vcpu *curr = current;
+    const struct domain *currd = curr->domain;
+    bool vpmu_msr = false;
+
+    switch ( reg )
+    {
+        int rc;
+
+    case MSR_FS_BASE:
+        if ( is_pv_32bit_domain(currd) )
+            break;
+        *val = cpu_has_fsgsbase ? __rdfsbase() : curr->arch.pv_vcpu.fs_base;
+        return X86EMUL_OKAY;
+
+    case MSR_GS_BASE:
+        if ( is_pv_32bit_domain(currd) )
+            break;
+        *val = cpu_has_fsgsbase ? __rdgsbase()
+                                : curr->arch.pv_vcpu.gs_base_kernel;
+        return X86EMUL_OKAY;
+
+    case MSR_SHADOW_GS_BASE:
+        if ( is_pv_32bit_domain(currd) )
+            break;
+        *val = curr->arch.pv_vcpu.gs_base_user;
+        return X86EMUL_OKAY;
+
+    /*
+     * In order to fully retain original behavior, defer calling
+     * pv_soft_rdtsc() until after emulation. This may want/need to be
+     * reconsidered.
+     */
+    case MSR_IA32_TSC:
+        poc->tsc |= TSC_BASE;
+        goto normal;
+
+    case MSR_TSC_AUX:
+        poc->tsc |= TSC_AUX;
+        if ( cpu_has_rdtscp )
+            goto normal;
+        *val = 0;
+        return X86EMUL_OKAY;
+
+    case MSR_EFER:
+        *val = read_efer();
+        if ( is_pv_32bit_domain(currd) )
+            *val &= ~(EFER_LME | EFER_LMA | EFER_LMSLE);
+        return X86EMUL_OKAY;
+
+    case MSR_K7_FID_VID_CTL:
+    case MSR_K7_FID_VID_STATUS:
+    case MSR_K8_PSTATE_LIMIT:
+    case MSR_K8_PSTATE_CTRL:
+    case MSR_K8_PSTATE_STATUS:
+    case MSR_K8_PSTATE0:
+    case MSR_K8_PSTATE1:
+    case MSR_K8_PSTATE2:
+    case MSR_K8_PSTATE3:
+    case MSR_K8_PSTATE4:
+    case MSR_K8_PSTATE5:
+    case MSR_K8_PSTATE6:
+    case MSR_K8_PSTATE7:
+        if ( boot_cpu_data.x86_vendor != X86_VENDOR_AMD )
+            break;
+        if ( unlikely(is_cpufreq_controller(currd)) )
+            goto normal;
+        *val = 0;
+        return X86EMUL_OKAY;
+
+    case MSR_IA32_UCODE_REV:
+        BUILD_BUG_ON(MSR_IA32_UCODE_REV != MSR_AMD_PATCHLEVEL);
+        if ( boot_cpu_data.x86_vendor == X86_VENDOR_INTEL )
+        {
+            if ( wrmsr_safe(MSR_IA32_UCODE_REV, 0) )
+                break;
+            /* As documented in the SDM: Do a CPUID 1 here */
+            cpuid_eax(1);
+        }
+        goto normal;
+
+    case MSR_IA32_MISC_ENABLE:
+        if ( rdmsr_safe(reg, *val) )
+            break;
+        *val = guest_misc_enable(*val);
+        return X86EMUL_OKAY;
+
+    case MSR_AMD64_DR0_ADDRESS_MASK:
+        if ( !boot_cpu_has(X86_FEATURE_DBEXT) )
+            break;
+        *val = curr->arch.pv_vcpu.dr_mask[0];
+        return X86EMUL_OKAY;
+
+    case MSR_AMD64_DR1_ADDRESS_MASK ... MSR_AMD64_DR3_ADDRESS_MASK:
+        if ( !boot_cpu_has(X86_FEATURE_DBEXT) )
+            break;
+        *val = curr->arch.pv_vcpu.dr_mask[reg - MSR_AMD64_DR1_ADDRESS_MASK + 1];
+        return X86EMUL_OKAY;
+
+    case MSR_IA32_PERF_CAPABILITIES:
+        /* No extra capabilities are supported. */
+        *val = 0;
+        return X86EMUL_OKAY;
+
+    case MSR_INTEL_PLATFORM_INFO:
+        if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL ||
+             rdmsr_safe(MSR_INTEL_PLATFORM_INFO, *val) )
+            break;
+        *val = 0;
+        if ( this_cpu(cpuid_faulting_enabled) )
+            *val |= MSR_PLATFORM_INFO_CPUID_FAULTING;
+        return X86EMUL_OKAY;
+
+    case MSR_INTEL_MISC_FEATURES_ENABLES:
+        if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL ||
+             rdmsr_safe(MSR_INTEL_MISC_FEATURES_ENABLES, *val) )
+            break;
+        *val = 0;
+        if ( curr->arch.cpuid_faulting )
+            *val |= MSR_MISC_FEATURES_CPUID_FAULTING;
+        return X86EMUL_OKAY;
+
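+    /*
+     * vPMU MSRs. The AMD case labels below are nested inside the Intel
+     * if() block so that both vendors funnel into the same
+     * vpmu_do_rdmsr() call; vpmu_msr records which set of labels matched.
+     */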
+    case MSR_P6_PERFCTR(0)...MSR_P6_PERFCTR(7):
+    case MSR_P6_EVNTSEL(0)...MSR_P6_EVNTSEL(3):
+    case MSR_CORE_PERF_FIXED_CTR0...MSR_CORE_PERF_FIXED_CTR2:
+    case MSR_CORE_PERF_FIXED_CTR_CTRL...MSR_CORE_PERF_GLOBAL_OVF_CTRL:
+        if ( boot_cpu_data.x86_vendor == X86_VENDOR_INTEL )
+        {
+            vpmu_msr = true;
+            /* fall through */
+    case MSR_AMD_FAM15H_EVNTSEL0...MSR_AMD_FAM15H_PERFCTR5:
+    case MSR_K7_EVNTSEL0...MSR_K7_PERFCTR3:
+            if ( vpmu_msr || (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) )
+            {
+                if ( vpmu_do_rdmsr(reg, val) )
+                    break;
+                return X86EMUL_OKAY;
+            }
+        }
+        /* fall through */
+    default:
+        if ( rdmsr_hypervisor_regs(reg, val) )
+            return X86EMUL_OKAY;
+
+        rc = vmce_rdmsr(reg, val);
+        if ( rc < 0 )
+            break;
+        if ( rc )
+            return X86EMUL_OKAY;
+        /* fall through */
+    normal:
+        /* Everyone can read the MSR space. */
+        /* gdprintk(XENLOG_WARNING, "Domain attempted RDMSR %08x\n", reg); */
+        if ( rdmsr_safe(reg, *val) )
+            break;
+        return X86EMUL_OKAY;
+    }
+
+    return X86EMUL_UNHANDLEABLE;
+}
+
+static int priv_op_write_msr(unsigned int reg, uint64_t val,
+                             struct x86_emulate_ctxt *ctxt)
+{
+    struct vcpu *curr = current;
+    const struct domain *currd = curr->domain;
+    bool vpmu_msr = false;
+
+    switch ( reg )
+    {
+        uint64_t temp;
+        int rc;
+
+    case MSR_FS_BASE:
+        if ( is_pv_32bit_domain(currd) || !is_canonical_address(val) )
+            break;
+        wrfsbase(val);
+        curr->arch.pv_vcpu.fs_base = val;
+        return X86EMUL_OKAY;
+
+    case MSR_GS_BASE:
+        if ( is_pv_32bit_domain(currd) || !is_canonical_address(val) )
+            break;
+        wrgsbase(val);
+        curr->arch.pv_vcpu.gs_base_kernel = val;
+        return X86EMUL_OKAY;
+
+    case MSR_SHADOW_GS_BASE:
+        if ( is_pv_32bit_domain(currd) || !is_canonical_address(val) )
+            break;
+        wrmsrl(MSR_SHADOW_GS_BASE, val);
+        curr->arch.pv_vcpu.gs_base_user = val;
+        return X86EMUL_OKAY;
+
+    case MSR_K7_FID_VID_STATUS:
+    case MSR_K7_FID_VID_CTL:
+    case MSR_K8_PSTATE_LIMIT:
+    case MSR_K8_PSTATE_CTRL:
+    case MSR_K8_PSTATE_STATUS:
+    case MSR_K8_PSTATE0:
+    case MSR_K8_PSTATE1:
+    case MSR_K8_PSTATE2:
+    case MSR_K8_PSTATE3:
+    case MSR_K8_PSTATE4:
+    case MSR_K8_PSTATE5:
+    case MSR_K8_PSTATE6:
+    case MSR_K8_PSTATE7:
+    case MSR_K8_HWCR:
+        if ( boot_cpu_data.x86_vendor != X86_VENDOR_AMD )
+            break;
+        if ( likely(!is_cpufreq_controller(currd)) ||
+             wrmsr_safe(reg, val) == 0 )
+            return X86EMUL_OKAY;
+        break;
+
+    case MSR_AMD64_NB_CFG:
+        if ( boot_cpu_data.x86_vendor != X86_VENDOR_AMD ||
+             boot_cpu_data.x86 < 0x10 || boot_cpu_data.x86 > 0x17 )
+            break;
+        if ( !is_hardware_domain(currd) || !is_pinned_vcpu(curr) )
+            return X86EMUL_OKAY;
+        if ( (rdmsr_safe(MSR_AMD64_NB_CFG, temp) != 0) ||
+             ((val ^ temp) & ~(1ULL << AMD64_NB_CFG_CF8_EXT_ENABLE_BIT)) )
+            goto invalid;
+        if ( wrmsr_safe(MSR_AMD64_NB_CFG, val) == 0 )
+            return X86EMUL_OKAY;
+        break;
+
+    case MSR_FAM10H_MMIO_CONF_BASE:
+        if ( boot_cpu_data.x86_vendor != X86_VENDOR_AMD ||
+             boot_cpu_data.x86 < 0x10 || boot_cpu_data.x86 > 0x17 )
+            break;
+        if ( !is_hardware_domain(currd) || !is_pinned_vcpu(curr) )
+            return X86EMUL_OKAY;
+        if ( rdmsr_safe(MSR_FAM10H_MMIO_CONF_BASE, temp) != 0 )
+            break;
+        if ( (pci_probe & PCI_PROBE_MASK) == PCI_PROBE_MMCONF ?
+             temp != val :
+             ((temp ^ val) &
+              ~(FAM10H_MMIO_CONF_ENABLE |
+                (FAM10H_MMIO_CONF_BUSRANGE_MASK <<
+                 FAM10H_MMIO_CONF_BUSRANGE_SHIFT) |
+                ((u64)FAM10H_MMIO_CONF_BASE_MASK <<
+                 FAM10H_MMIO_CONF_BASE_SHIFT))) )
+            goto invalid;
+        if ( wrmsr_safe(MSR_FAM10H_MMIO_CONF_BASE, val) == 0 )
+            return X86EMUL_OKAY;
+        break;
+
+    case MSR_IA32_UCODE_REV:
+        if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL )
+            break;
+        if ( !is_hardware_domain(currd) || !is_pinned_vcpu(curr) )
+            return X86EMUL_OKAY;
+        if ( rdmsr_safe(reg, temp) )
+            break;
+        if ( val )
+            goto invalid;
+        return X86EMUL_OKAY;
+
+    case MSR_IA32_MISC_ENABLE:
+        if ( rdmsr_safe(reg, temp) )
+            break;
+        if ( val != guest_misc_enable(temp) )
+            goto invalid;
+        return X86EMUL_OKAY;
+
+    case MSR_IA32_MPERF:
+    case MSR_IA32_APERF:
+        if ( (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL) &&
+             (boot_cpu_data.x86_vendor != X86_VENDOR_AMD) )
+            break;
+        if ( likely(!is_cpufreq_controller(currd)) ||
+             wrmsr_safe(reg, val) == 0 )
+            return X86EMUL_OKAY;
+        break;
+
+    case MSR_IA32_PERF_CTL:
+        if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL )
+            break;
+        if ( likely(!is_cpufreq_controller(currd)) ||
+             wrmsr_safe(reg, val) == 0 )
+            return X86EMUL_OKAY;
+        break;
+
+    case MSR_IA32_THERM_CONTROL:
+    case MSR_IA32_ENERGY_PERF_BIAS:
+        if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL )
+            break;
+        if ( !is_hardware_domain(currd) || !is_pinned_vcpu(curr) ||
+             wrmsr_safe(reg, val) == 0 )
+            return X86EMUL_OKAY;
+        break;
+
+    case MSR_AMD64_DR0_ADDRESS_MASK:
+        if ( !boot_cpu_has(X86_FEATURE_DBEXT) || (val >> 32) )
+            break;
+        curr->arch.pv_vcpu.dr_mask[0] = val;
+        if ( curr->arch.debugreg[7] & DR7_ACTIVE_MASK )
+            wrmsrl(MSR_AMD64_DR0_ADDRESS_MASK, val);
+        return X86EMUL_OKAY;
+
+    case MSR_AMD64_DR1_ADDRESS_MASK ... MSR_AMD64_DR3_ADDRESS_MASK:
+        if ( !boot_cpu_has(X86_FEATURE_DBEXT) || (val >> 32) )
+            break;
+        curr->arch.pv_vcpu.dr_mask[reg - MSR_AMD64_DR1_ADDRESS_MASK + 1] = val;
+        if ( curr->arch.debugreg[7] & DR7_ACTIVE_MASK )
+            wrmsrl(reg, val);
+        return X86EMUL_OKAY;
+
+    case MSR_INTEL_PLATFORM_INFO:
+        if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL ||
+             val || rdmsr_safe(MSR_INTEL_PLATFORM_INFO, val) )
+            break;
+        return X86EMUL_OKAY;
+
+    case MSR_INTEL_MISC_FEATURES_ENABLES:
+        if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL ||
+             (val & ~MSR_MISC_FEATURES_CPUID_FAULTING) ||
+             rdmsr_safe(MSR_INTEL_MISC_FEATURES_ENABLES, temp) )
+            break;
+        if ( (val & MSR_MISC_FEATURES_CPUID_FAULTING) &&
+             !this_cpu(cpuid_faulting_enabled) )
+            break;
+        curr->arch.cpuid_faulting = !!(val & MSR_MISC_FEATURES_CPUID_FAULTING);
+        return X86EMUL_OKAY;
+
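+    /*
+     * vPMU MSRs, mirroring the layout of the read path above: the AMD
+     * case labels are nested inside the Intel if() block so that both
+     * vendors share the vpmu_do_wrmsr() call.
+     */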
+    case MSR_P6_PERFCTR(0)...MSR_P6_PERFCTR(7):
+    case MSR_P6_EVNTSEL(0)...MSR_P6_EVNTSEL(3):
+    case MSR_CORE_PERF_FIXED_CTR0...MSR_CORE_PERF_FIXED_CTR2:
+    case MSR_CORE_PERF_FIXED_CTR_CTRL...MSR_CORE_PERF_GLOBAL_OVF_CTRL:
+        if ( boot_cpu_data.x86_vendor == X86_VENDOR_INTEL )
+        {
+            vpmu_msr = true;
+    case MSR_AMD_FAM15H_EVNTSEL0...MSR_AMD_FAM15H_PERFCTR5:
+    case MSR_K7_EVNTSEL0...MSR_K7_PERFCTR3:
+            if ( vpmu_msr || (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) )
+            {
+                if ( (vpmu_mode & XENPMU_MODE_ALL) &&
+                     !is_hardware_domain(currd) )
+                    return X86EMUL_OKAY;
+
+                if ( vpmu_do_wrmsr(reg, val, 0) )
+                    break;
+                return X86EMUL_OKAY;
+            }
+        }
+        /* fall through */
+    default:
+        if ( wrmsr_hypervisor_regs(reg, val) == 1 )
+            return X86EMUL_OKAY;
+
+        rc = vmce_wrmsr(reg, val);
+        if ( rc < 0 )
+            break;
+        if ( rc )
+            return X86EMUL_OKAY;
+
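+        /*
+         * Warn (via the 'invalid' label, also reached by goto from the
+         * vendor-specific cases above) when a write would change the
+         * current value of an unrecognised MSR; the write is dropped
+         * either way.
+         */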
+        if ( (rdmsr_safe(reg, temp) != 0) || (val != temp) )
+    invalid:
+            gdprintk(XENLOG_WARNING,
+                     "Domain attempted WRMSR %08x from 0x%016"PRIx64" to 0x%016"PRIx64"\n",
+                     reg, temp, val);
+        return X86EMUL_OKAY;
+    }
+
+    return X86EMUL_UNHANDLEABLE;
+}
+
+static int priv_op_wbinvd(struct x86_emulate_ctxt *ctxt)
+{
+    /* Ignore the instruction if unprivileged. */
+    if ( !cache_flush_permitted(current->domain) )
+        /*
+         * Non-physdev domain attempted WBINVD; ignore for now since
+         * newer linux uses this in some start-of-day timing loops.
+         */
+        ;
+    else
+        wbinvd();
+
+    return X86EMUL_OKAY;
+}
+
+int pv_emul_cpuid(uint32_t leaf, uint32_t subleaf,
+                  struct cpuid_leaf *res, struct x86_emulate_ctxt *ctxt)
+{
+    guest_cpuid(current, leaf, subleaf, res);
+
+    return X86EMUL_OKAY;
+}
+
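+/*
+ * Whitelist of opcodes the privileged-op emulator is prepared to handle;
+ * everything else is rejected as X86EMUL_UNHANDLEABLE before any state
+ * is modified.
+ */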
+static int priv_op_validate(const struct x86_emulate_state *state,
+                            struct x86_emulate_ctxt *ctxt)
+{
+    switch ( ctxt->opcode )
+    {
+    case 0x6c ... 0x6f: /* ins / outs */
+    case 0xe4 ... 0xe7: /* in / out (immediate port) */
+    case 0xec ... 0xef: /* in / out (port in %dx) */
+    case X86EMUL_OPC(0x0f, 0x06): /* clts */
+    case X86EMUL_OPC(0x0f, 0x09): /* wbinvd */
+    case X86EMUL_OPC(0x0f, 0x20) ...
+         X86EMUL_OPC(0x0f, 0x23): /* mov to/from cr/dr */
+    case X86EMUL_OPC(0x0f, 0x30): /* wrmsr */
+    case X86EMUL_OPC(0x0f, 0x31): /* rdtsc */
+    case X86EMUL_OPC(0x0f, 0x32): /* rdmsr */
+    case X86EMUL_OPC(0x0f, 0xa2): /* cpuid */
+        return X86EMUL_OKAY;
+
+    case 0xfa: case 0xfb: /* cli / sti */
+        if ( !iopl_ok(current, ctxt->regs) )
+            break;
+        /*
+         * This is just too dangerous to allow, in my opinion. Consider if the
+         * caller then tries to reenable interrupts using POPF: we can't trap
+         * that and we'll end up with hard-to-debug lockups. Fast & loose will
+         * do for us. :-)
+        vcpu_info(current, evtchn_upcall_mask) = (ctxt->opcode == 0xfa);
+         */
+        return X86EMUL_DONE;
+
+    case X86EMUL_OPC(0x0f, 0x01):
+    {
+        unsigned int modrm_rm, modrm_reg;
+
+        if ( x86_insn_modrm(state, &modrm_rm, &modrm_reg) != 3 ||
+             (modrm_rm & 7) != 1 )
+            break;
+        switch ( modrm_reg & 7 )
+        {
+        case 2: /* xsetbv */
+        case 7: /* rdtscp */
+            return X86EMUL_OKAY;
+        }
+        break;
+    }
+    }
+
+    return X86EMUL_UNHANDLEABLE;
+}
+
+static int priv_op_insn_fetch(enum x86_segment seg,
+                              unsigned long offset,
+                              void *p_data,
+                              unsigned int bytes,
+                              struct x86_emulate_ctxt *ctxt)
+{
+    const struct priv_op_ctxt *poc =
+        container_of(ctxt, struct priv_op_ctxt, ctxt);
+    unsigned int rc;
+    unsigned long addr = poc->cs.base + offset;
+
+    ASSERT(seg == x86_seg_cs);
+
+    /* We don't mean to emulate any branches. */
+    if ( !bytes )
+        return X86EMUL_UNHANDLEABLE;
+
+    rc = pv_emul_virt_to_linear(poc->cs.base, offset, bytes, poc->cs.limit,
+                                x86_seg_cs, ctxt, &addr);
+    if ( rc != X86EMUL_OKAY )
+        return rc;
+
+    if ( (rc = __copy_from_user(p_data, (void *)addr, bytes)) != 0 )
+    {
+        /*
+         * TODO: This should report PFEC_insn_fetch when goc->insn_fetch &&
+         * cpu_has_nx, but we'd then need a "fetch" variant of
+         * __copy_from_user() respecting NX, SMEP, and protection keys.
+         */
+        x86_emul_pagefault(0, addr + bytes - rc, ctxt);
+        return X86EMUL_EXCEPTION;
+    }
+
+    return X86EMUL_OKAY;
+}
+
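+/* Hook table handed to x86_emulate() by emulate_privileged_op(). */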
+static const struct x86_emulate_ops priv_op_ops = {
+    .insn_fetch          = priv_op_insn_fetch,
+    .read                = x86emul_unhandleable_rw,
+    .validate            = priv_op_validate,
+    .read_io             = priv_op_read_io,
+    .write_io            = priv_op_write_io,
+    .rep_ins             = priv_op_rep_ins,
+    .rep_outs            = priv_op_rep_outs,
+    .read_segment        = priv_op_read_segment,
+    .read_cr             = priv_op_read_cr,
+    .write_cr            = priv_op_write_cr,
+    .read_dr             = priv_op_read_dr,
+    .write_dr            = priv_op_write_dr,
+    .read_msr            = priv_op_read_msr,
+    .write_msr           = priv_op_write_msr,
+    .cpuid               = pv_emul_cpuid,
+    .wbinvd              = priv_op_wbinvd,
+};
+
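+/*
+ * Returns 0 if the instruction was not emulated (so the caller can
+ * forward the original fault), or EXCRET_fault_fixed once emulation has
+ * run, including the cases where it injected an event or must be retried.
+ */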
+int emulate_privileged_op(struct cpu_user_regs *regs)
+{
+    struct vcpu *curr = current;
+    struct domain *currd = curr->domain;
+    struct priv_op_ctxt ctxt = {
+        .ctxt.regs = regs,
+        .ctxt.vendor = currd->arch.cpuid->x86_vendor,
+        .ctxt.lma = !is_pv_32bit_domain(currd),
+    };
+    int rc;
+    unsigned int eflags, ar;
+
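+    /* Bail early unless %cs names a present code segment. */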
+    if ( !read_descriptor(regs->cs, curr, &ctxt.cs.base, &ctxt.cs.limit,
+                          &ar, 1) ||
+         !(ar & _SEGMENT_S) ||
+         !(ar & _SEGMENT_P) ||
+         !(ar & _SEGMENT_CODE) )
+        return 0;
+
+    /* Mirror virtualized state into EFLAGS. */
+    ASSERT(regs->eflags & X86_EFLAGS_IF);
+    if ( vcpu_info(curr, evtchn_upcall_mask) )
+        regs->eflags &= ~X86_EFLAGS_IF;
+    else
+        regs->eflags |= X86_EFLAGS_IF;
+    ASSERT(!(regs->eflags & X86_EFLAGS_IOPL));
+    regs->eflags |= curr->arch.pv_vcpu.iopl;
+    eflags = regs->eflags;
+
+    ctxt.ctxt.addr_size = ar & _SEGMENT_L ? 64 : ar & _SEGMENT_DB ? 32 : 16;
+    /* Leave zero in ctxt.ctxt.sp_size, as it's not needed. */
+    rc = x86_emulate(&ctxt.ctxt, &priv_op_ops);
+
+    if ( ctxt.io_emul_stub )
+        unmap_domain_page(ctxt.io_emul_stub);
+
+    /*
+     * Un-mirror virtualized state from EFLAGS.
+     * Nothing we allow to be emulated can change anything other than the
+     * arithmetic bits, and the resume flag.
+     */
+    ASSERT(!((regs->eflags ^ eflags) &
+             ~(X86_EFLAGS_RF | X86_EFLAGS_ARITH_MASK)));
+    regs->eflags |= X86_EFLAGS_IF;
+    regs->eflags &= ~X86_EFLAGS_IOPL;
+
+    switch ( rc )
+    {
+    case X86EMUL_OKAY:
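+        /*
+         * Deferred RDTSC/RDTSCP handling, as flagged by
+         * priv_op_read_msr() via ctxt.tsc.
+         */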
+        if ( ctxt.tsc & TSC_BASE )
+        {
+            if ( ctxt.tsc & TSC_AUX )
+                pv_soft_rdtsc(curr, regs, 1);
+            else if ( currd->arch.vtsc )
+                pv_soft_rdtsc(curr, regs, 0);
+            else
+                msr_split(regs, rdtsc());
+        }
+
+        if ( ctxt.ctxt.retire.singlestep )
+            ctxt.bpmatch |= DR_STEP;
+        if ( ctxt.bpmatch )
+        {
+            curr->arch.debugreg[6] |= ctxt.bpmatch | DR_STATUS_RESERVED_ONE;
+            if ( !(curr->arch.pv_vcpu.trap_bounce.flags & TBF_EXCEPTION) )
+                pv_inject_hw_exception(TRAP_debug, X86_EVENT_NO_EC);
+        }
+        /* fall through */
+    case X86EMUL_RETRY:
+        return EXCRET_fault_fixed;
+
+    case X86EMUL_EXCEPTION:
+        pv_inject_event(&ctxt.ctxt.event);
+        return EXCRET_fault_fixed;
+    }
+
+    return 0;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/x86/x86_64/gpr_switch.S b/xen/arch/x86/pv/gpr_switch.S
similarity index 100%
rename from xen/arch/x86/x86_64/gpr_switch.S
rename to xen/arch/x86/pv/gpr_switch.S
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index cd8ca20398..cd43e9f44c 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -78,6 +78,8 @@
 #include <asm/cpuid.h>
 #include <xsm/xsm.h>
 
+#include <asm/pv/traps.h>
+
 /*
  * opt_nmi: one of 'ignore', 'dom0', or 'fatal'.
  *  fatal:  Xen prints diagnostic message and then hangs.
@@ -705,41 +707,6 @@ static void instruction_done(struct cpu_user_regs *regs, unsigned long rip)
     }
 }
 
-static unsigned int check_guest_io_breakpoint(struct vcpu *v,
-    unsigned int port, unsigned int len)
-{
-    unsigned int width, i, match = 0;
-    unsigned long start;
-
-    if ( !(v->arch.debugreg[5]) ||
-         !(v->arch.pv_vcpu.ctrlreg[4] & X86_CR4_DE) )
-        return 0;
-
-    for ( i = 0; i < 4; i++ )
-    {
-        if ( !(v->arch.debugreg[5] &
-               (3 << (i * DR_ENABLE_SIZE))) )
-            continue;
-
-        start = v->arch.debugreg[i];
-        width = 0;
-
-        switch ( (v->arch.debugreg[7] >>
-                  (DR_CONTROL_SHIFT + i * DR_CONTROL_SIZE)) & 0xc )
-        {
-        case DR_LEN_1: width = 1; break;
-        case DR_LEN_2: width = 2; break;
-        case DR_LEN_4: width = 4; break;
-        case DR_LEN_8: width = 8; break;
-        }
-
-        if ( (start < (port + len)) && ((start + width) > port) )
-            match |= 1 << i;
-    }
-
-    return match;
-}
-
 /*
  * Called from asm to set up the MCE trapbounce info.
  * Returns 0 if no callback is set up, else 1.
@@ -1733,1327 +1700,6 @@ static int read_gate_descriptor(unsigned int gate_sel,
     return 1;
 }
 
-static int pv_emul_virt_to_linear(unsigned long base, unsigned long offset,
-                                  unsigned int bytes, unsigned long limit,
-                                  enum x86_segment seg,
-                                  struct x86_emulate_ctxt *ctxt,
-                                  unsigned long *addr)
-{
-    int rc = X86EMUL_OKAY;
-
-    *addr = base + offset;
-
-    if ( ctxt->addr_size < 64 )
-    {
-        if ( limit < bytes - 1 || offset > limit - bytes + 1 )
-            rc = X86EMUL_EXCEPTION;
-        *addr = (uint32_t)*addr;
-    }
-    else if ( !__addr_ok(*addr) )
-        rc = X86EMUL_EXCEPTION;
-
-    if ( unlikely(rc == X86EMUL_EXCEPTION) )
-        x86_emul_hw_exception(seg != x86_seg_ss ? TRAP_gp_fault
-                                                : TRAP_stack_error,
-                              0, ctxt);
-
-    return rc;
-}
-
-struct priv_op_ctxt {
-    struct x86_emulate_ctxt ctxt;
-    struct {
-        unsigned long base, limit;
-    } cs;
-    char *io_emul_stub;
-    unsigned int bpmatch;
-    unsigned int tsc;
-#define TSC_BASE 1
-#define TSC_AUX 2
-};
-
-static int priv_op_insn_fetch(enum x86_segment seg,
-                              unsigned long offset,
-                              void *p_data,
-                              unsigned int bytes,
-                              struct x86_emulate_ctxt *ctxt)
-{
-    const struct priv_op_ctxt *poc =
-        container_of(ctxt, struct priv_op_ctxt, ctxt);
-    unsigned int rc;
-    unsigned long addr = poc->cs.base + offset;
-
-    ASSERT(seg == x86_seg_cs);
-
-    /* We don't mean to emulate any branches. */
-    if ( !bytes )
-        return X86EMUL_UNHANDLEABLE;
-
-    rc = pv_emul_virt_to_linear(poc->cs.base, offset, bytes, poc->cs.limit,
-                                x86_seg_cs, ctxt, &addr);
-    if ( rc != X86EMUL_OKAY )
-        return rc;
-
-    if ( (rc = __copy_from_user(p_data, (void *)addr, bytes)) != 0 )
-    {
-        /*
-         * TODO: This should report PFEC_insn_fetch when goc->insn_fetch &&
-         * cpu_has_nx, but we'd then need a "fetch" variant of
-         * __copy_from_user() respecting NX, SMEP, and protection keys.
-         */
-        x86_emul_pagefault(0, addr + bytes - rc, ctxt);
-        return X86EMUL_EXCEPTION;
-    }
-
-    return X86EMUL_OKAY;
-}
-
-static int priv_op_read_segment(enum x86_segment seg,
-                                struct segment_register *reg,
-                                struct x86_emulate_ctxt *ctxt)
-{
-    /* Check if this is an attempt to access the I/O bitmap. */
-    if ( seg == x86_seg_tr )
-    {
-        switch ( ctxt->opcode )
-        {
-        case 0x6c ... 0x6f: /* ins / outs */
-        case 0xe4 ... 0xe7: /* in / out (immediate port) */
-        case 0xec ... 0xef: /* in / out (port in %dx) */
-            /* Defer the check to priv_op_{read,write}_io(). */
-            return X86EMUL_DONE;
-        }
-    }
-
-    if ( ctxt->addr_size < 64 )
-    {
-        unsigned long limit;
-        unsigned int sel, ar;
-
-        switch ( seg )
-        {
-        case x86_seg_cs: sel = ctxt->regs->cs; break;
-        case x86_seg_ds: sel = read_sreg(ds);  break;
-        case x86_seg_es: sel = read_sreg(es);  break;
-        case x86_seg_fs: sel = read_sreg(fs);  break;
-        case x86_seg_gs: sel = read_sreg(gs);  break;
-        case x86_seg_ss: sel = ctxt->regs->ss; break;
-        default: return X86EMUL_UNHANDLEABLE;
-        }
-
-        if ( !read_descriptor(sel, current, &reg->base, &limit, &ar, 0) )
-            return X86EMUL_UNHANDLEABLE;
-
-        reg->limit = limit;
-        reg->attr.bytes = ar >> 8;
-    }
-    else
-    {
-        switch ( seg )
-        {
-        default:
-            if ( !is_x86_user_segment(seg) )
-                return X86EMUL_UNHANDLEABLE;
-            reg->base = 0;
-            break;
-        case x86_seg_fs:
-            reg->base = rdfsbase();
-            break;
-        case x86_seg_gs:
-            reg->base = rdgsbase();
-            break;
-        }
-
-        reg->limit = ~0U;
-
-        reg->attr.bytes = 0;
-        reg->attr.fields.type = _SEGMENT_WR >> 8;
-        if ( seg == x86_seg_cs )
-        {
-            reg->attr.fields.type |= _SEGMENT_CODE >> 8;
-            reg->attr.fields.l = 1;
-        }
-        else
-            reg->attr.fields.db = 1;
-        reg->attr.fields.s   = 1;
-        reg->attr.fields.dpl = 3;
-        reg->attr.fields.p   = 1;
-        reg->attr.fields.g   = 1;
-    }
-
-    /*
-     * For x86_emulate.c's mode_ring0() to work, fake a DPL of zero.
-     * Also do this for consistency for non-conforming code segments.
-     */
-    if ( (seg == x86_seg_ss ||
-          (seg == x86_seg_cs &&
-           !(reg->attr.fields.type & (_SEGMENT_EC >> 8)))) &&
-         guest_kernel_mode(current, ctxt->regs) )
-        reg->attr.fields.dpl = 0;
-
-    return X86EMUL_OKAY;
-}
-
-/* Perform IOPL check between the vcpu's shadowed IOPL, and the assumed cpl. */
-static bool_t iopl_ok(const struct vcpu *v, const struct cpu_user_regs *regs)
-{
-    unsigned int cpl = guest_kernel_mode(v, regs) ?
-        (VM_ASSIST(v->domain, architectural_iopl) ? 0 : 1) : 3;
-
-    ASSERT((v->arch.pv_vcpu.iopl & ~X86_EFLAGS_IOPL) == 0);
-
-    return IOPL(cpl) <= v->arch.pv_vcpu.iopl;
-}
-
-/* Has the guest requested sufficient permission for this I/O access? */
-static int guest_io_okay(
-    unsigned int port, unsigned int bytes,
-    struct vcpu *v, struct cpu_user_regs *regs)
-{
-    /* If in user mode, switch to kernel mode just to read I/O bitmap. */
-    int user_mode = !(v->arch.flags & TF_kernel_mode);
-#define TOGGLE_MODE() if ( user_mode ) toggle_guest_mode(v)
-
-    if ( iopl_ok(v, regs) )
-        return 1;
-
-    if ( v->arch.pv_vcpu.iobmp_limit > (port + bytes) )
-    {
-        union { uint8_t bytes[2]; uint16_t mask; } x;
-
-        /*
-         * Grab permission bytes from guest space. Inaccessible bytes are
-         * read as 0xff (no access allowed).
-         */
-        TOGGLE_MODE();
-        switch ( __copy_from_guest_offset(x.bytes, v->arch.pv_vcpu.iobmp,
-                                          port>>3, 2) )
-        {
-        default: x.bytes[0] = ~0;
-            /* fallthrough */
-        case 1:  x.bytes[1] = ~0;
-            /* fallthrough */
-        case 0:  break;
-        }
-        TOGGLE_MODE();
-
-        if ( (x.mask & (((1<<bytes)-1) << (port&7))) == 0 )
-            return 1;
-    }
-
-    return 0;
-}
-
-/* Has the administrator granted sufficient permission for this I/O access? */
-static bool_t admin_io_okay(unsigned int port, unsigned int bytes,
-                            const struct domain *d)
-{
-    /*
-     * Port 0xcf8 (CONFIG_ADDRESS) is only visible for DWORD accesses.
-     * We never permit direct access to that register.
-     */
-    if ( (port == 0xcf8) && (bytes == 4) )
-        return 0;
-
-    /* We also never permit direct access to the RTC/CMOS registers. */
-    if ( ((port & ~1) == RTC_PORT(0)) )
-        return 0;
-
-    return ioports_access_permitted(d, port, port + bytes - 1);
-}
-
-static bool_t pci_cfg_ok(struct domain *currd, unsigned int start,
-                         unsigned int size, uint32_t *write)
-{
-    uint32_t machine_bdf;
-
-    if ( !is_hardware_domain(currd) )
-        return 0;
-
-    if ( !CF8_ENABLED(currd->arch.pci_cf8) )
-        return 1;
-
-    machine_bdf = CF8_BDF(currd->arch.pci_cf8);
-    if ( write )
-    {
-        const unsigned long *ro_map = pci_get_ro_map(0);
-
-        if ( ro_map && test_bit(machine_bdf, ro_map) )
-            return 0;
-    }
-    start |= CF8_ADDR_LO(currd->arch.pci_cf8);
-    /* AMD extended configuration space access? */
-    if ( CF8_ADDR_HI(currd->arch.pci_cf8) &&
-         boot_cpu_data.x86_vendor == X86_VENDOR_AMD &&
-         boot_cpu_data.x86 >= 0x10 && boot_cpu_data.x86 <= 0x17 )
-    {
-        uint64_t msr_val;
-
-        if ( rdmsr_safe(MSR_AMD64_NB_CFG, msr_val) )
-            return 0;
-        if ( msr_val & (1ULL << AMD64_NB_CFG_CF8_EXT_ENABLE_BIT) )
-            start |= CF8_ADDR_HI(currd->arch.pci_cf8);
-    }
-
-    return !write ?
-           xsm_pci_config_permission(XSM_HOOK, currd, machine_bdf,
-                                     start, start + size - 1, 0) == 0 :
-           pci_conf_write_intercept(0, machine_bdf, start, size, write) >= 0;
-}
-
-uint32_t guest_io_read(unsigned int port, unsigned int bytes,
-                       struct domain *currd)
-{
-    uint32_t data = 0;
-    unsigned int shift = 0;
-
-    if ( admin_io_okay(port, bytes, currd) )
-    {
-        switch ( bytes )
-        {
-        case 1: return inb(port);
-        case 2: return inw(port);
-        case 4: return inl(port);
-        }
-    }
-
-    while ( bytes != 0 )
-    {
-        unsigned int size = 1;
-        uint32_t sub_data = ~0;
-
-        if ( (port == 0x42) || (port == 0x43) || (port == 0x61) )
-        {
-            sub_data = pv_pit_handler(port, 0, 0);
-        }
-        else if ( port == RTC_PORT(0) )
-        {
-            sub_data = currd->arch.cmos_idx;
-        }
-        else if ( (port == RTC_PORT(1)) &&
-                  ioports_access_permitted(currd, RTC_PORT(0), RTC_PORT(1)) )
-        {
-            unsigned long flags;
-
-            spin_lock_irqsave(&rtc_lock, flags);
-            outb(currd->arch.cmos_idx & 0x7f, RTC_PORT(0));
-            sub_data = inb(RTC_PORT(1));
-            spin_unlock_irqrestore(&rtc_lock, flags);
-        }
-        else if ( (port == 0xcf8) && (bytes == 4) )
-        {
-            size = 4;
-            sub_data = currd->arch.pci_cf8;
-        }
-        else if ( (port & 0xfffc) == 0xcfc )
-        {
-            size = min(bytes, 4 - (port & 3));
-            if ( size == 3 )
-                size = 2;
-            if ( pci_cfg_ok(currd, port & 3, size, NULL) )
-                sub_data = pci_conf_read(currd->arch.pci_cf8, port & 3, size);
-        }
-
-        if ( size == 4 )
-            return sub_data;
-
-        data |= (sub_data & ((1u << (size * 8)) - 1)) << shift;
-        shift += size * 8;
-        port += size;
-        bytes -= size;
-    }
-
-    return data;
-}
-
-void guest_io_write(unsigned int port, unsigned int bytes, uint32_t data,
-                    struct domain *currd)
-{
-    if ( admin_io_okay(port, bytes, currd) )
-    {
-        switch ( bytes ) {
-        case 1:
-            outb((uint8_t)data, port);
-            if ( pv_post_outb_hook )
-                pv_post_outb_hook(port, (uint8_t)data);
-            break;
-        case 2:
-            outw((uint16_t)data, port);
-            break;
-        case 4:
-            outl(data, port);
-            break;
-        }
-        return;
-    }
-
-    while ( bytes != 0 )
-    {
-        unsigned int size = 1;
-
-        if ( (port == 0x42) || (port == 0x43) || (port == 0x61) )
-        {
-            pv_pit_handler(port, (uint8_t)data, 1);
-        }
-        else if ( port == RTC_PORT(0) )
-        {
-            currd->arch.cmos_idx = data;
-        }
-        else if ( (port == RTC_PORT(1)) &&
-                  ioports_access_permitted(currd, RTC_PORT(0), RTC_PORT(1)) )
-        {
-            unsigned long flags;
-
-            if ( pv_rtc_handler )
-                pv_rtc_handler(currd->arch.cmos_idx & 0x7f, data);
-            spin_lock_irqsave(&rtc_lock, flags);
-            outb(currd->arch.cmos_idx & 0x7f, RTC_PORT(0));
-            outb(data, RTC_PORT(1));
-            spin_unlock_irqrestore(&rtc_lock, flags);
-        }
-        else if ( (port == 0xcf8) && (bytes == 4) )
-        {
-            size = 4;
-            currd->arch.pci_cf8 = data;
-        }
-        else if ( (port & 0xfffc) == 0xcfc )
-        {
-            size = min(bytes, 4 - (port & 3));
-            if ( size == 3 )
-                size = 2;
-            if ( pci_cfg_ok(currd, port & 3, size, &data) )
-                pci_conf_write(currd->arch.pci_cf8, port & 3, size, data);
-        }
-
-        if ( size == 4 )
-            return;
-
-        port += size;
-        bytes -= size;
-        data >>= size * 8;
-    }
-}
-
-/* I/O emulation support. Helper routines for, and type of, the stack stub.*/
-void host_to_guest_gpr_switch(struct cpu_user_regs *);
-unsigned long guest_to_host_gpr_switch(unsigned long);
-
-void (*pv_post_outb_hook)(unsigned int port, u8 value);
-
-typedef void io_emul_stub_t(struct cpu_user_regs *);
-
-static io_emul_stub_t *io_emul_stub_setup(struct priv_op_ctxt *ctxt, u8 opcode,
-                                          unsigned int port, unsigned int bytes)
-{
-    if ( !ctxt->io_emul_stub )
-        ctxt->io_emul_stub = map_domain_page(_mfn(this_cpu(stubs.mfn))) +
-                                             (this_cpu(stubs.addr) &
-                                              ~PAGE_MASK) +
-                                             STUB_BUF_SIZE / 2;
-
-    /* movq $host_to_guest_gpr_switch,%rcx */
-    ctxt->io_emul_stub[0] = 0x48;
-    ctxt->io_emul_stub[1] = 0xb9;
-    *(void **)&ctxt->io_emul_stub[2] = (void *)host_to_guest_gpr_switch;
-    /* callq *%rcx */
-    ctxt->io_emul_stub[10] = 0xff;
-    ctxt->io_emul_stub[11] = 0xd1;
-    /* data16 or nop */
-    ctxt->io_emul_stub[12] = (bytes != 2) ? 0x90 : 0x66;
-    /* <io-access opcode> */
-    ctxt->io_emul_stub[13] = opcode;
-    /* imm8 or nop */
-    ctxt->io_emul_stub[14] = !(opcode & 8) ? port : 0x90;
-    /* ret (jumps to guest_to_host_gpr_switch) */
-    ctxt->io_emul_stub[15] = 0xc3;
-    BUILD_BUG_ON(STUB_BUF_SIZE / 2 < 16);
-
-    if ( ioemul_handle_quirk )
-        ioemul_handle_quirk(opcode, &ctxt->io_emul_stub[12], ctxt->ctxt.regs);
-
-    /* Handy function-typed pointer to the stub. */
-    return (void *)(this_cpu(stubs.addr) + STUB_BUF_SIZE / 2);
-}
-
-static int priv_op_read_io(unsigned int port, unsigned int bytes,
-                           unsigned long *val, struct x86_emulate_ctxt *ctxt)
-{
-    struct priv_op_ctxt *poc = container_of(ctxt, struct priv_op_ctxt, ctxt);
-    struct vcpu *curr = current;
-    struct domain *currd = current->domain;
-
-    /* INS must not come here. */
-    ASSERT((ctxt->opcode & ~9) == 0xe4);
-
-    if ( !guest_io_okay(port, bytes, curr, ctxt->regs) )
-        return X86EMUL_UNHANDLEABLE;
-
-    poc->bpmatch = check_guest_io_breakpoint(curr, port, bytes);
-
-    if ( admin_io_okay(port, bytes, currd) )
-    {
-        io_emul_stub_t *io_emul =
-            io_emul_stub_setup(poc, ctxt->opcode, port, bytes);
-
-        mark_regs_dirty(ctxt->regs);
-        io_emul(ctxt->regs);
-        return X86EMUL_DONE;
-    }
-
-    *val = guest_io_read(port, bytes, currd);
-
-    return X86EMUL_OKAY;
-}
-
-static int priv_op_write_io(unsigned int port, unsigned int bytes,
-                            unsigned long val, struct x86_emulate_ctxt *ctxt)
-{
-    struct priv_op_ctxt *poc = container_of(ctxt, struct priv_op_ctxt, ctxt);
-    struct vcpu *curr = current;
-    struct domain *currd = current->domain;
-
-    /* OUTS must not come here. */
-    ASSERT((ctxt->opcode & ~9) == 0xe6);
-
-    if ( !guest_io_okay(port, bytes, curr, ctxt->regs) )
-        return X86EMUL_UNHANDLEABLE;
-
-    poc->bpmatch = check_guest_io_breakpoint(curr, port, bytes);
-
-    if ( admin_io_okay(port, bytes, currd) )
-    {
-        io_emul_stub_t *io_emul =
-            io_emul_stub_setup(poc, ctxt->opcode, port, bytes);
-
-        mark_regs_dirty(ctxt->regs);
-        io_emul(ctxt->regs);
-        if ( (bytes == 1) && pv_post_outb_hook )
-            pv_post_outb_hook(port, val);
-        return X86EMUL_DONE;
-    }
-
-    guest_io_write(port, bytes, val, currd);
-
-    return X86EMUL_OKAY;
-}
-
-static int priv_op_rep_ins(uint16_t port,
-                           enum x86_segment seg, unsigned long offset,
-                           unsigned int bytes_per_rep, unsigned long *reps,
-                           struct x86_emulate_ctxt *ctxt)
-{
-    struct priv_op_ctxt *poc = container_of(ctxt, struct priv_op_ctxt, ctxt);
-    struct vcpu *curr = current;
-    struct domain *currd = current->domain;
-    unsigned long goal = *reps;
-    struct segment_register sreg;
-    int rc;
-
-    ASSERT(seg == x86_seg_es);
-
-    *reps = 0;
-
-    if ( !guest_io_okay(port, bytes_per_rep, curr, ctxt->regs) )
-        return X86EMUL_UNHANDLEABLE;
-
-    rc = priv_op_read_segment(x86_seg_es, &sreg, ctxt);
-    if ( rc != X86EMUL_OKAY )
-        return rc;
-
-    if ( !sreg.attr.fields.p )
-        return X86EMUL_UNHANDLEABLE;
-    if ( !sreg.attr.fields.s ||
-         (sreg.attr.fields.type & (_SEGMENT_CODE >> 8)) ||
-         !(sreg.attr.fields.type & (_SEGMENT_WR >> 8)) )
-    {
-        x86_emul_hw_exception(TRAP_gp_fault, 0, ctxt);
-        return X86EMUL_EXCEPTION;
-    }
-
-    poc->bpmatch = check_guest_io_breakpoint(curr, port, bytes_per_rep);
-
-    while ( *reps < goal )
-    {
-        unsigned int data = guest_io_read(port, bytes_per_rep, currd);
-        unsigned long addr;
-
-        rc = pv_emul_virt_to_linear(sreg.base, offset, bytes_per_rep,
-                                    sreg.limit, x86_seg_es, ctxt, &addr);
-        if ( rc != X86EMUL_OKAY )
-            return rc;
-
-        if ( (rc = __copy_to_user((void *)addr, &data, bytes_per_rep)) != 0 )
-        {
-            x86_emul_pagefault(PFEC_write_access,
-                               addr + bytes_per_rep - rc, ctxt);
-            return X86EMUL_EXCEPTION;
-        }
-
-        ++*reps;
-
-        if ( poc->bpmatch || hypercall_preempt_check() )
-            break;
-
-        /* x86_emulate() clips the repetition count to ensure we don't wrap. */
-        if ( unlikely(ctxt->regs->eflags & X86_EFLAGS_DF) )
-            offset -= bytes_per_rep;
-        else
-            offset += bytes_per_rep;
-    }
-
-    return X86EMUL_OKAY;
-}
-
-static int priv_op_rep_outs(enum x86_segment seg, unsigned long offset,
-                            uint16_t port,
-                            unsigned int bytes_per_rep, unsigned long *reps,
-                            struct x86_emulate_ctxt *ctxt)
-{
-    struct priv_op_ctxt *poc = container_of(ctxt, struct priv_op_ctxt, ctxt);
-    struct vcpu *curr = current;
-    struct domain *currd = current->domain;
-    unsigned long goal = *reps;
-    struct segment_register sreg;
-    int rc;
-
-    *reps = 0;
-
-    if ( !guest_io_okay(port, bytes_per_rep, curr, ctxt->regs) )
-        return X86EMUL_UNHANDLEABLE;
-
-    rc = priv_op_read_segment(seg, &sreg, ctxt);
-    if ( rc != X86EMUL_OKAY )
-        return rc;
-
-    if ( !sreg.attr.fields.p )
-        return X86EMUL_UNHANDLEABLE;
-    if ( !sreg.attr.fields.s ||
-         ((sreg.attr.fields.type & (_SEGMENT_CODE >> 8)) &&
-          !(sreg.attr.fields.type & (_SEGMENT_WR >> 8))) )
-    {
-        x86_emul_hw_exception(seg != x86_seg_ss ? TRAP_gp_fault
-                                                : TRAP_stack_error,
-                              0, ctxt);
-        return X86EMUL_EXCEPTION;
-    }
-
-    poc->bpmatch = check_guest_io_breakpoint(curr, port, bytes_per_rep);
-
-    while ( *reps < goal )
-    {
-        unsigned int data = 0;
-        unsigned long addr;
-
-        rc = pv_emul_virt_to_linear(sreg.base, offset, bytes_per_rep,
-                                    sreg.limit, seg, ctxt, &addr);
-        if ( rc != X86EMUL_OKAY )
-            return rc;
-
-        if ( (rc = __copy_from_user(&data, (void *)addr, bytes_per_rep)) != 0 )
-        {
-            x86_emul_pagefault(0, addr + bytes_per_rep - rc, ctxt);
-            return X86EMUL_EXCEPTION;
-        }
-
-        guest_io_write(port, bytes_per_rep, data, currd);
-
-        ++*reps;
-
-        if ( poc->bpmatch || hypercall_preempt_check() )
-            break;
-
-        /* x86_emulate() clips the repetition count to ensure we don't wrap. */
-        if ( unlikely(ctxt->regs->eflags & X86_EFLAGS_DF) )
-            offset -= bytes_per_rep;
-        else
-            offset += bytes_per_rep;
-    }
-
-    return X86EMUL_OKAY;
-}
-
-static int priv_op_read_cr(unsigned int reg, unsigned long *val,
-                           struct x86_emulate_ctxt *ctxt)
-{
-    const struct vcpu *curr = current;
-
-    switch ( reg )
-    {
-    case 0: /* Read CR0 */
-        *val = (read_cr0() & ~X86_CR0_TS) | curr->arch.pv_vcpu.ctrlreg[0];
-        return X86EMUL_OKAY;
-
-    case 2: /* Read CR2 */
-    case 4: /* Read CR4 */
-        *val = curr->arch.pv_vcpu.ctrlreg[reg];
-        return X86EMUL_OKAY;
-
-    case 3: /* Read CR3 */
-    {
-        const struct domain *currd = curr->domain;
-        unsigned long mfn;
-
-        if ( !is_pv_32bit_domain(currd) )
-        {
-            mfn = pagetable_get_pfn(curr->arch.guest_table);
-            *val = xen_pfn_to_cr3(mfn_to_gmfn(currd, mfn));
-        }
-        else
-        {
-            l4_pgentry_t *pl4e =
-                map_domain_page(_mfn(pagetable_get_pfn(curr->arch.guest_table)));
-
-            mfn = l4e_get_pfn(*pl4e);
-            unmap_domain_page(pl4e);
-            *val = compat_pfn_to_cr3(mfn_to_gmfn(currd, mfn));
-        }
-        /* PTs should not be shared */
-        BUG_ON(page_get_owner(mfn_to_page(mfn)) == dom_cow);
-        return X86EMUL_OKAY;
-    }
-    }
-
-    return X86EMUL_UNHANDLEABLE;
-}
-
-static int priv_op_write_cr(unsigned int reg, unsigned long val,
-                            struct x86_emulate_ctxt *ctxt)
-{
-    struct vcpu *curr = current;
-
-    switch ( reg )
-    {
-    case 0: /* Write CR0 */
-        if ( (val ^ read_cr0()) & ~X86_CR0_TS )
-        {
-            gdprintk(XENLOG_WARNING,
-                    "Attempt to change unmodifiable CR0 flags\n");
-            break;
-        }
-        do_fpu_taskswitch(!!(val & X86_CR0_TS));
-        return X86EMUL_OKAY;
-
-    case 2: /* Write CR2 */
-        curr->arch.pv_vcpu.ctrlreg[2] = val;
-        arch_set_cr2(curr, val);
-        return X86EMUL_OKAY;
-
-    case 3: /* Write CR3 */
-    {
-        struct domain *currd = curr->domain;
-        unsigned long gfn;
-        struct page_info *page;
-        int rc;
-
-        gfn = !is_pv_32bit_domain(currd)
-              ? xen_cr3_to_pfn(val) : compat_cr3_to_pfn(val);
-        page = get_page_from_gfn(currd, gfn, NULL, P2M_ALLOC);
-        if ( !page )
-            break;
-        rc = new_guest_cr3(page_to_mfn(page));
-        put_page(page);
-
-        switch ( rc )
-        {
-        case 0:
-            return X86EMUL_OKAY;
-        case -ERESTART: /* retry after preemption */
-            return X86EMUL_RETRY;
-        }
-        break;
-    }
-
-    case 4: /* Write CR4 */
-        curr->arch.pv_vcpu.ctrlreg[4] = pv_guest_cr4_fixup(curr, val);
-        write_cr4(pv_guest_cr4_to_real_cr4(curr));
-        ctxt_switch_levelling(curr);
-        return X86EMUL_OKAY;
-    }
-
-    return X86EMUL_UNHANDLEABLE;
-}
-
-static int priv_op_read_dr(unsigned int reg, unsigned long *val,
-                           struct x86_emulate_ctxt *ctxt)
-{
-    unsigned long res = do_get_debugreg(reg);
-
-    if ( IS_ERR_VALUE(res) )
-        return X86EMUL_UNHANDLEABLE;
-
-    *val = res;
-
-    return X86EMUL_OKAY;
-}
-
-static int priv_op_write_dr(unsigned int reg, unsigned long val,
-                            struct x86_emulate_ctxt *ctxt)
-{
-    return do_set_debugreg(reg, val) == 0
-           ? X86EMUL_OKAY : X86EMUL_UNHANDLEABLE;
-}
-
-static inline uint64_t guest_misc_enable(uint64_t val)
-{
-    val &= ~(MSR_IA32_MISC_ENABLE_PERF_AVAIL |
-             MSR_IA32_MISC_ENABLE_MONITOR_ENABLE);
-    val |= MSR_IA32_MISC_ENABLE_BTS_UNAVAIL |
-           MSR_IA32_MISC_ENABLE_PEBS_UNAVAIL |
-           MSR_IA32_MISC_ENABLE_XTPR_DISABLE;
-    return val;
-}
-
-static inline bool is_cpufreq_controller(const struct domain *d)
-{
-    return ((cpufreq_controller == FREQCTL_dom0_kernel) &&
-            is_hardware_domain(d));
-}
-
-static int priv_op_read_msr(unsigned int reg, uint64_t *val,
-                            struct x86_emulate_ctxt *ctxt)
-{
-    struct priv_op_ctxt *poc = container_of(ctxt, struct priv_op_ctxt, ctxt);
-    const struct vcpu *curr = current;
-    const struct domain *currd = curr->domain;
-    bool vpmu_msr = false;
-
-    switch ( reg )
-    {
-        int rc;
-
-    case MSR_FS_BASE:
-        if ( is_pv_32bit_domain(currd) )
-            break;
-        *val = cpu_has_fsgsbase ? __rdfsbase() : curr->arch.pv_vcpu.fs_base;
-        return X86EMUL_OKAY;
-
-    case MSR_GS_BASE:
-        if ( is_pv_32bit_domain(currd) )
-            break;
-        *val = cpu_has_fsgsbase ? __rdgsbase()
-                                : curr->arch.pv_vcpu.gs_base_kernel;
-        return X86EMUL_OKAY;
-
-    case MSR_SHADOW_GS_BASE:
-        if ( is_pv_32bit_domain(currd) )
-            break;
-        *val = curr->arch.pv_vcpu.gs_base_user;
-        return X86EMUL_OKAY;
-
-    /*
-     * In order to fully retain original behavior, defer calling
-     * pv_soft_rdtsc() until after emulation. This may want/need to be
-     * reconsidered.
-     */
-    case MSR_IA32_TSC:
-        poc->tsc |= TSC_BASE;
-        goto normal;
-
-    case MSR_TSC_AUX:
-        poc->tsc |= TSC_AUX;
-        if ( cpu_has_rdtscp )
-            goto normal;
-        *val = 0;
-        return X86EMUL_OKAY;
-
-    case MSR_EFER:
-        *val = read_efer();
-        if ( is_pv_32bit_domain(currd) )
-            *val &= ~(EFER_LME | EFER_LMA | EFER_LMSLE);
-        return X86EMUL_OKAY;
-
-    case MSR_K7_FID_VID_CTL:
-    case MSR_K7_FID_VID_STATUS:
-    case MSR_K8_PSTATE_LIMIT:
-    case MSR_K8_PSTATE_CTRL:
-    case MSR_K8_PSTATE_STATUS:
-    case MSR_K8_PSTATE0:
-    case MSR_K8_PSTATE1:
-    case MSR_K8_PSTATE2:
-    case MSR_K8_PSTATE3:
-    case MSR_K8_PSTATE4:
-    case MSR_K8_PSTATE5:
-    case MSR_K8_PSTATE6:
-    case MSR_K8_PSTATE7:
-        if ( boot_cpu_data.x86_vendor != X86_VENDOR_AMD )
-            break;
-        if ( unlikely(is_cpufreq_controller(currd)) )
-            goto normal;
-        *val = 0;
-        return X86EMUL_OKAY;
-
-    case MSR_IA32_UCODE_REV:
-        BUILD_BUG_ON(MSR_IA32_UCODE_REV != MSR_AMD_PATCHLEVEL);
-        if ( boot_cpu_data.x86_vendor == X86_VENDOR_INTEL )
-        {
-            if ( wrmsr_safe(MSR_IA32_UCODE_REV, 0) )
-                break;
-            /* As documented in the SDM: Do a CPUID 1 here */
-            cpuid_eax(1);
-        }
-        goto normal;
-
-    case MSR_IA32_MISC_ENABLE:
-        if ( rdmsr_safe(reg, *val) )
-            break;
-        *val = guest_misc_enable(*val);
-        return X86EMUL_OKAY;
-
-    case MSR_AMD64_DR0_ADDRESS_MASK:
-        if ( !boot_cpu_has(X86_FEATURE_DBEXT) )
-            break;
-        *val = curr->arch.pv_vcpu.dr_mask[0];
-        return X86EMUL_OKAY;
-
-    case MSR_AMD64_DR1_ADDRESS_MASK ... MSR_AMD64_DR3_ADDRESS_MASK:
-        if ( !boot_cpu_has(X86_FEATURE_DBEXT) )
-            break;
-        *val = curr->arch.pv_vcpu.dr_mask[reg - MSR_AMD64_DR1_ADDRESS_MASK + 1];
-        return X86EMUL_OKAY;
-
-    case MSR_IA32_PERF_CAPABILITIES:
-        /* No extra capabilities are supported. */
-        *val = 0;
-        return X86EMUL_OKAY;
-
-    case MSR_INTEL_PLATFORM_INFO:
-        if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL ||
-             rdmsr_safe(MSR_INTEL_PLATFORM_INFO, *val) )
-            break;
-        *val = 0;
-        if ( this_cpu(cpuid_faulting_enabled) )
-            *val |= MSR_PLATFORM_INFO_CPUID_FAULTING;
-        return X86EMUL_OKAY;
-
-    case MSR_INTEL_MISC_FEATURES_ENABLES:
-        if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL ||
-             rdmsr_safe(MSR_INTEL_MISC_FEATURES_ENABLES, *val) )
-            break;
-        *val = 0;
-        if ( curr->arch.cpuid_faulting )
-            *val |= MSR_MISC_FEATURES_CPUID_FAULTING;
-        return X86EMUL_OKAY;
-
-    case MSR_P6_PERFCTR(0)...MSR_P6_PERFCTR(7):
-    case MSR_P6_EVNTSEL(0)...MSR_P6_EVNTSEL(3):
-    case MSR_CORE_PERF_FIXED_CTR0...MSR_CORE_PERF_FIXED_CTR2:
-    case MSR_CORE_PERF_FIXED_CTR_CTRL...MSR_CORE_PERF_GLOBAL_OVF_CTRL:
-        if ( boot_cpu_data.x86_vendor == X86_VENDOR_INTEL )
-        {
-            vpmu_msr = true;
-            /* fall through */
-    case MSR_AMD_FAM15H_EVNTSEL0...MSR_AMD_FAM15H_PERFCTR5:
-    case MSR_K7_EVNTSEL0...MSR_K7_PERFCTR3:
-            if ( vpmu_msr || (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) )
-            {
-                if ( vpmu_do_rdmsr(reg, val) )
-                    break;
-                return X86EMUL_OKAY;
-            }
-        }
-        /* fall through */
-    default:
-        if ( rdmsr_hypervisor_regs(reg, val) )
-            return X86EMUL_OKAY;
-
-        rc = vmce_rdmsr(reg, val);
-        if ( rc < 0 )
-            break;
-        if ( rc )
-            return X86EMUL_OKAY;
-        /* fall through */
-    normal:
-        /* Everyone can read the MSR space. */
-        /* gdprintk(XENLOG_WARNING, "Domain attempted RDMSR %08x\n", reg); */
-        if ( rdmsr_safe(reg, *val) )
-            break;
-        return X86EMUL_OKAY;
-    }
-
-    return X86EMUL_UNHANDLEABLE;
-}
-
-#include "x86_64/mmconfig.h"
-
-static int priv_op_write_msr(unsigned int reg, uint64_t val,
-                             struct x86_emulate_ctxt *ctxt)
-{
-    struct vcpu *curr = current;
-    const struct domain *currd = curr->domain;
-    bool vpmu_msr = false;
-
-    switch ( reg )
-    {
-        uint64_t temp;
-        int rc;
-
-    case MSR_FS_BASE:
-        if ( is_pv_32bit_domain(currd) || !is_canonical_address(val) )
-            break;
-        wrfsbase(val);
-        curr->arch.pv_vcpu.fs_base = val;
-        return X86EMUL_OKAY;
-
-    case MSR_GS_BASE:
-        if ( is_pv_32bit_domain(currd) || !is_canonical_address(val) )
-            break;
-        wrgsbase(val);
-        curr->arch.pv_vcpu.gs_base_kernel = val;
-        return X86EMUL_OKAY;
-
-    case MSR_SHADOW_GS_BASE:
-        if ( is_pv_32bit_domain(currd) || !is_canonical_address(val) )
-            break;
-        wrmsrl(MSR_SHADOW_GS_BASE, val);
-        curr->arch.pv_vcpu.gs_base_user = val;
-        return X86EMUL_OKAY;
-
-    case MSR_K7_FID_VID_STATUS:
-    case MSR_K7_FID_VID_CTL:
-    case MSR_K8_PSTATE_LIMIT:
-    case MSR_K8_PSTATE_CTRL:
-    case MSR_K8_PSTATE_STATUS:
-    case MSR_K8_PSTATE0:
-    case MSR_K8_PSTATE1:
-    case MSR_K8_PSTATE2:
-    case MSR_K8_PSTATE3:
-    case MSR_K8_PSTATE4:
-    case MSR_K8_PSTATE5:
-    case MSR_K8_PSTATE6:
-    case MSR_K8_PSTATE7:
-    case MSR_K8_HWCR:
-        if ( boot_cpu_data.x86_vendor != X86_VENDOR_AMD )
-            break;
-        if ( likely(!is_cpufreq_controller(currd)) ||
-             wrmsr_safe(reg, val) == 0 )
-            return X86EMUL_OKAY;
-        break;
-
-    case MSR_AMD64_NB_CFG:
-        if ( boot_cpu_data.x86_vendor != X86_VENDOR_AMD ||
-             boot_cpu_data.x86 < 0x10 || boot_cpu_data.x86 > 0x17 )
-            break;
-        if ( !is_hardware_domain(currd) || !is_pinned_vcpu(curr) )
-            return X86EMUL_OKAY;
-        if ( (rdmsr_safe(MSR_AMD64_NB_CFG, temp) != 0) ||
-             ((val ^ temp) & ~(1ULL << AMD64_NB_CFG_CF8_EXT_ENABLE_BIT)) )
-            goto invalid;
-        if ( wrmsr_safe(MSR_AMD64_NB_CFG, val) == 0 )
-            return X86EMUL_OKAY;
-        break;
-
-    case MSR_FAM10H_MMIO_CONF_BASE:
-        if ( boot_cpu_data.x86_vendor != X86_VENDOR_AMD ||
-             boot_cpu_data.x86 < 0x10 || boot_cpu_data.x86 > 0x17 )
-            break;
-        if ( !is_hardware_domain(currd) || !is_pinned_vcpu(curr) )
-            return X86EMUL_OKAY;
-        if ( rdmsr_safe(MSR_FAM10H_MMIO_CONF_BASE, temp) != 0 )
-            break;
-        if ( (pci_probe & PCI_PROBE_MASK) == PCI_PROBE_MMCONF ?
-             temp != val :
-             ((temp ^ val) &
-              ~(FAM10H_MMIO_CONF_ENABLE |
-                (FAM10H_MMIO_CONF_BUSRANGE_MASK <<
-                 FAM10H_MMIO_CONF_BUSRANGE_SHIFT) |
-                ((u64)FAM10H_MMIO_CONF_BASE_MASK <<
-                 FAM10H_MMIO_CONF_BASE_SHIFT))) )
-            goto invalid;
-        if ( wrmsr_safe(MSR_FAM10H_MMIO_CONF_BASE, val) == 0 )
-            return X86EMUL_OKAY;
-        break;
-
-    case MSR_IA32_UCODE_REV:
-        if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL )
-            break;
-        if ( !is_hardware_domain(currd) || !is_pinned_vcpu(curr) )
-            return X86EMUL_OKAY;
-        if ( rdmsr_safe(reg, temp) )
-            break;
-        if ( val )
-            goto invalid;
-        return X86EMUL_OKAY;
-
-    case MSR_IA32_MISC_ENABLE:
-        if ( rdmsr_safe(reg, temp) )
-            break;
-        if ( val != guest_misc_enable(temp) )
-            goto invalid;
-        return X86EMUL_OKAY;
-
-    case MSR_IA32_MPERF:
-    case MSR_IA32_APERF:
-        if ( (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL) &&
-             (boot_cpu_data.x86_vendor != X86_VENDOR_AMD) )
-            break;
-        if ( likely(!is_cpufreq_controller(currd)) ||
-             wrmsr_safe(reg, val) == 0 )
-            return X86EMUL_OKAY;
-        break;
-
-    case MSR_IA32_PERF_CTL:
-        if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL )
-            break;
-        if ( likely(!is_cpufreq_controller(currd)) ||
-             wrmsr_safe(reg, val) == 0 )
-            return X86EMUL_OKAY;
-        break;
-
-    case MSR_IA32_THERM_CONTROL:
-    case MSR_IA32_ENERGY_PERF_BIAS:
-        if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL )
-            break;
-        if ( !is_hardware_domain(currd) || !is_pinned_vcpu(curr) ||
-             wrmsr_safe(reg, val) == 0 )
-            return X86EMUL_OKAY;
-        break;
-
-    case MSR_AMD64_DR0_ADDRESS_MASK:
-        if ( !boot_cpu_has(X86_FEATURE_DBEXT) || (val >> 32) )
-            break;
-        curr->arch.pv_vcpu.dr_mask[0] = val;
-        if ( curr->arch.debugreg[7] & DR7_ACTIVE_MASK )
-            wrmsrl(MSR_AMD64_DR0_ADDRESS_MASK, val);
-        return X86EMUL_OKAY;
-
-    case MSR_AMD64_DR1_ADDRESS_MASK ... MSR_AMD64_DR3_ADDRESS_MASK:
-        if ( !boot_cpu_has(X86_FEATURE_DBEXT) || (val >> 32) )
-            break;
-        curr->arch.pv_vcpu.dr_mask[reg - MSR_AMD64_DR1_ADDRESS_MASK + 1] = val;
-        if ( curr->arch.debugreg[7] & DR7_ACTIVE_MASK )
-            wrmsrl(reg, val);
-        return X86EMUL_OKAY;
-
-    case MSR_INTEL_PLATFORM_INFO:
-        if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL ||
-             val || rdmsr_safe(MSR_INTEL_PLATFORM_INFO, val) )
-            break;
-        return X86EMUL_OKAY;
-
-    case MSR_INTEL_MISC_FEATURES_ENABLES:
-        if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL ||
-             (val & ~MSR_MISC_FEATURES_CPUID_FAULTING) ||
-             rdmsr_safe(MSR_INTEL_MISC_FEATURES_ENABLES, temp) )
-            break;
-        if ( (val & MSR_MISC_FEATURES_CPUID_FAULTING) &&
-             !this_cpu(cpuid_faulting_enabled) )
-            break;
-        curr->arch.cpuid_faulting = !!(val & MSR_MISC_FEATURES_CPUID_FAULTING);
-        return X86EMUL_OKAY;
-
-    case MSR_P6_PERFCTR(0)...MSR_P6_PERFCTR(7):
-    case MSR_P6_EVNTSEL(0)...MSR_P6_EVNTSEL(3):
-    case MSR_CORE_PERF_FIXED_CTR0...MSR_CORE_PERF_FIXED_CTR2:
-    case MSR_CORE_PERF_FIXED_CTR_CTRL...MSR_CORE_PERF_GLOBAL_OVF_CTRL:
-        if ( boot_cpu_data.x86_vendor == X86_VENDOR_INTEL )
-        {
-            vpmu_msr = true;
-    case MSR_AMD_FAM15H_EVNTSEL0...MSR_AMD_FAM15H_PERFCTR5:
-    case MSR_K7_EVNTSEL0...MSR_K7_PERFCTR3:
-            if ( vpmu_msr || (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) )
-            {
-                if ( (vpmu_mode & XENPMU_MODE_ALL) &&
-                     !is_hardware_domain(currd) )
-                    return X86EMUL_OKAY;
-
-                if ( vpmu_do_wrmsr(reg, val, 0) )
-                    break;
-                return X86EMUL_OKAY;
-            }
-        }
-        /* fall through */
-    default:
-        if ( wrmsr_hypervisor_regs(reg, val) == 1 )
-            return X86EMUL_OKAY;
-
-        rc = vmce_wrmsr(reg, val);
-        if ( rc < 0 )
-            break;
-        if ( rc )
-            return X86EMUL_OKAY;
-
-        if ( (rdmsr_safe(reg, temp) != 0) || (val != temp) )
-    invalid:
-            gdprintk(XENLOG_WARNING,
-                     "Domain attempted WRMSR %08x from 0x%016"PRIx64" to 0x%016"PRIx64"\n",
-                     reg, temp, val);
-        return X86EMUL_OKAY;
-    }
-
-    return X86EMUL_UNHANDLEABLE;
-}
-
-static int priv_op_wbinvd(struct x86_emulate_ctxt *ctxt)
-{
-    /* Ignore the instruction if unprivileged. */
-    if ( !cache_flush_permitted(current->domain) )
-        /*
-         * Non-physdev domain attempted WBINVD; ignore for now since
-         * newer linux uses this in some start-of-day timing loops.
-         */
-        ;
-    else
-        wbinvd();
-
-    return X86EMUL_OKAY;
-}
-
-int pv_emul_cpuid(uint32_t leaf, uint32_t subleaf,
-                  struct cpuid_leaf *res, struct x86_emulate_ctxt *ctxt)
-{
-    guest_cpuid(current, leaf, subleaf, res);
-
-    return X86EMUL_OKAY;
-}
-
-static int priv_op_validate(const struct x86_emulate_state *state,
-                            struct x86_emulate_ctxt *ctxt)
-{
-    switch ( ctxt->opcode )
-    {
-    case 0x6c ... 0x6f: /* ins / outs */
-    case 0xe4 ... 0xe7: /* in / out (immediate port) */
-    case 0xec ... 0xef: /* in / out (port in %dx) */
-    case X86EMUL_OPC(0x0f, 0x06): /* clts */
-    case X86EMUL_OPC(0x0f, 0x09): /* wbinvd */
-    case X86EMUL_OPC(0x0f, 0x20) ...
-         X86EMUL_OPC(0x0f, 0x23): /* mov to/from cr/dr */
-    case X86EMUL_OPC(0x0f, 0x30): /* wrmsr */
-    case X86EMUL_OPC(0x0f, 0x31): /* rdtsc */
-    case X86EMUL_OPC(0x0f, 0x32): /* rdmsr */
-    case X86EMUL_OPC(0x0f, 0xa2): /* cpuid */
-        return X86EMUL_OKAY;
-
-    case 0xfa: case 0xfb: /* cli / sti */
-        if ( !iopl_ok(current, ctxt->regs) )
-            break;
-        /*
-         * This is just too dangerous to allow, in my opinion. Consider if the
-         * caller then tries to reenable interrupts using POPF: we can't trap
-         * that and we'll end up with hard-to-debug lockups. Fast & loose will
-         * do for us. :-)
-        vcpu_info(current, evtchn_upcall_mask) = (ctxt->opcode == 0xfa);
-         */
-        return X86EMUL_DONE;
-
-    case X86EMUL_OPC(0x0f, 0x01):
-    {
-        unsigned int modrm_rm, modrm_reg;
-
-        if ( x86_insn_modrm(state, &modrm_rm, &modrm_reg) != 3 ||
-             (modrm_rm & 7) != 1 )
-            break;
-        switch ( modrm_reg & 7 )
-        {
-        case 2: /* xsetbv */
-        case 7: /* rdtscp */
-            return X86EMUL_OKAY;
-        }
-        break;
-    }
-    }
-
-    return X86EMUL_UNHANDLEABLE;
-}
-
-static const struct x86_emulate_ops priv_op_ops = {
-    .insn_fetch          = priv_op_insn_fetch,
-    .read                = x86emul_unhandleable_rw,
-    .validate            = priv_op_validate,
-    .read_io             = priv_op_read_io,
-    .write_io            = priv_op_write_io,
-    .rep_ins             = priv_op_rep_ins,
-    .rep_outs            = priv_op_rep_outs,
-    .read_segment        = priv_op_read_segment,
-    .read_cr             = priv_op_read_cr,
-    .write_cr            = priv_op_write_cr,
-    .read_dr             = priv_op_read_dr,
-    .write_dr            = priv_op_write_dr,
-    .read_msr            = priv_op_read_msr,
-    .write_msr           = priv_op_write_msr,
-    .cpuid               = pv_emul_cpuid,
-    .wbinvd              = priv_op_wbinvd,
-};
-
-static int emulate_privileged_op(struct cpu_user_regs *regs)
-{
-    struct vcpu *curr = current;
-    struct domain *currd = curr->domain;
-    struct priv_op_ctxt ctxt = {
-        .ctxt.regs = regs,
-        .ctxt.vendor = currd->arch.cpuid->x86_vendor,
-        .ctxt.lma = !is_pv_32bit_domain(currd),
-    };
-    int rc;
-    unsigned int eflags, ar;
-
-    if ( !read_descriptor(regs->cs, curr, &ctxt.cs.base, &ctxt.cs.limit,
-                          &ar, 1) ||
-         !(ar & _SEGMENT_S) ||
-         !(ar & _SEGMENT_P) ||
-         !(ar & _SEGMENT_CODE) )
-        return 0;
-
-    /* Mirror virtualized state into EFLAGS. */
-    ASSERT(regs->eflags & X86_EFLAGS_IF);
-    if ( vcpu_info(curr, evtchn_upcall_mask) )
-        regs->eflags &= ~X86_EFLAGS_IF;
-    else
-        regs->eflags |= X86_EFLAGS_IF;
-    ASSERT(!(regs->eflags & X86_EFLAGS_IOPL));
-    regs->eflags |= curr->arch.pv_vcpu.iopl;
-    eflags = regs->eflags;
-
-    ctxt.ctxt.addr_size = ar & _SEGMENT_L ? 64 : ar & _SEGMENT_DB ? 32 : 16;
-    /* Leave zero in ctxt.ctxt.sp_size, as it's not needed. */
-    rc = x86_emulate(&ctxt.ctxt, &priv_op_ops);
-
-    if ( ctxt.io_emul_stub )
-        unmap_domain_page(ctxt.io_emul_stub);
-
-    /*
-     * Un-mirror virtualized state from EFLAGS.
-     * Nothing we allow to be emulated can change anything other than the
-     * arithmetic bits, and the resume flag.
-     */
-    ASSERT(!((regs->eflags ^ eflags) &
-             ~(X86_EFLAGS_RF | X86_EFLAGS_ARITH_MASK)));
-    regs->eflags |= X86_EFLAGS_IF;
-    regs->eflags &= ~X86_EFLAGS_IOPL;
-
-    switch ( rc )
-    {
-    case X86EMUL_OKAY:
-        if ( ctxt.tsc & TSC_BASE )
-        {
-            if ( ctxt.tsc & TSC_AUX )
-                pv_soft_rdtsc(curr, regs, 1);
-            else if ( currd->arch.vtsc )
-                pv_soft_rdtsc(curr, regs, 0);
-            else
-                msr_split(regs, rdtsc());
-        }
-
-        if ( ctxt.ctxt.retire.singlestep )
-            ctxt.bpmatch |= DR_STEP;
-        if ( ctxt.bpmatch )
-        {
-            curr->arch.debugreg[6] |= ctxt.bpmatch | DR_STATUS_RESERVED_ONE;
-            if ( !(curr->arch.pv_vcpu.trap_bounce.flags & TBF_EXCEPTION) )
-                pv_inject_hw_exception(TRAP_debug, X86_EVENT_NO_EC);
-        }
-        /* fall through */
-    case X86EMUL_RETRY:
-        return EXCRET_fault_fixed;
-
-    case X86EMUL_EXCEPTION:
-        pv_inject_event(&ctxt.ctxt.event);
-        return EXCRET_fault_fixed;
-    }
-
-    return 0;
-}
-
 static inline int check_stack_limit(unsigned int ar, unsigned int limit,
                                     unsigned int esp, unsigned int decr)
 {
diff --git a/xen/arch/x86/x86_64/Makefile b/xen/arch/x86/x86_64/Makefile
index d8815e78b0..f336a6ae65 100644
--- a/xen/arch/x86/x86_64/Makefile
+++ b/xen/arch/x86/x86_64/Makefile
@@ -1,7 +1,6 @@
 subdir-y += compat
 
 obj-bin-y += entry.o
-obj-bin-y += gpr_switch.o
 obj-y += traps.o
 obj-$(CONFIG_KEXEC) += machine_kexec.o
 obj-y += pci.o
diff --git a/xen/include/asm-x86/pv/traps.h b/xen/include/asm-x86/pv/traps.h
new file mode 100644
index 0000000000..32c7bac587
--- /dev/null
+++ b/xen/include/asm-x86/pv/traps.h
@@ -0,0 +1,48 @@
+/*
+ * pv/traps.h
+ *
+ * PV guest traps interface definitions
+ *
+ * Copyright (C) 2017 Wei Liu <wei.liu2@citrix.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms and conditions of the GNU General Public
+ * License, version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef __X86_PV_TRAPS_H__
+#define __X86_PV_TRAPS_H__
+
+#ifdef CONFIG_PV
+
+#include <public/xen.h>
+
+int emulate_privileged_op(struct cpu_user_regs *regs);
+
+#else  /* !CONFIG_PV */
+
+#include <xen/errno.h>
+
+int emulate_privileged_op(struct cpu_user_regs *regs) { return -EOPNOTSUPP; }
+
+#endif	/* CONFIG_PV */
+
+#endif	/* __X86_PV_TRAPS_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.11.0



* Re: [PATCH for-next v3 01/22] x86/traps: move privilege instruction emulation code
  2017-05-18 17:28   ` Wei Liu
@ 2017-05-29 15:14     ` Jan Beulich
  2017-05-30 17:27       ` Wei Liu
  0 siblings, 1 reply; 65+ messages in thread
From: Jan Beulich @ 2017-05-29 15:14 UTC (permalink / raw)
  To: Wei Liu; +Cc: Andrew Cooper, Xen-devel

>>> On 18.05.17 at 19:28, <wei.liu2@citrix.com> wrote:
> From 58df816b937dc7a3598de01f053a6030e631057e Mon Sep 17 00:00:00 2001
> From: Wei Liu <wei.liu2@citrix.com>
> Date: Thu, 18 May 2017 16:18:56 +0100
> Subject: [PATCH] x86/traps: move privilege instruction emulation code

privileged

> Move relevant code to pv/emulate.c. Export emulate_privileged_op in
> pv/traps.h.

A name of "emulate.c" sounds like a container for all sorts of cruft.
I'd prefer if we could use the opportunity of this re-org to see about
not having overly large files. Therefore e.g. "emul-priv.c" or
"priv-emul.c" or some such?

> Note that read_descriptor is duplicated in emulate.c. The duplication
> will be gone once all emulation code is moved.

That's not very desirable; we can only hope to have things
committed together then? However, together with the naming
issue above quite likely the function will want to become non-
static anyway, so perhaps this should then be done right away
instead of cloning it.
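
I.e. something like the following declaration in a suitable (perhaps
pv-local) header, with the lone definition left in place (a sketch
only - signature inferred from the call site in the patch, final name
and placement to be settled along with the pv_ prefixing):

    int read_descriptor(unsigned int sel, const struct vcpu *v,
                        unsigned long *base, unsigned int *limit,
                        unsigned int *ar, bool insn_fetch);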

> Also move gpr_switch.S to pv/ because the code in that file is only
> used by privilege instruction emulation.
> 
> No functional change.

For this size of a change this is too weak a statement for my taste:
I don't really mean to review the 1.5k of lines you move, so I'd hope
for a statement clarifying that, perhaps other than formatting, the
code is being moved unchanged.

> +int emulate_privileged_op(struct cpu_user_regs *regs)

Does this perhaps want to gain a pv_ prefix?

> +#ifdef CONFIG_PV
> +
> +#include <public/xen.h>
> +
> +int emulate_privileged_op(struct cpu_user_regs *regs);
> +
> +#else  /* !CONFIG_PV */
> +
> +#include <xen/errno.h>
> +
> +int emulate_privileged_op(struct cpu_user_regs *regs) { return -EOPNOTSUPP; }

The function does not return -E... values.
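
I.e. presumably the stub wants to mirror the real function's return
convention, along the lines of (just a sketch):

    static inline int emulate_privileged_op(struct cpu_user_regs *regs)
    {
        return 0; /* 0 == no fixup performed */
    }

(a header-defined stub also wanting to be static inline, to avoid
multiple definitions).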

Jan



* Re: [PATCH for-next v3 02/22] x86/traps: move gate op emulation code
  2017-05-18 17:09 ` [PATCH for-next v3 02/22] x86/traps: move gate op " Wei Liu
@ 2017-05-29 15:15   ` Jan Beulich
  0 siblings, 0 replies; 65+ messages in thread
From: Jan Beulich @ 2017-05-29 15:15 UTC (permalink / raw)
  To: Wei Liu; +Cc: Andrew Cooper, Xen-devel

>>> On 18.05.17 at 19:09, <wei.liu2@citrix.com> wrote:
> ---
>  xen/arch/x86/pv/emulate.c      | 403 +++++++++++++++++++++++++++++++++++++

Along the lines of comments to patch 1, gate-op.c?

Jan



* Re: [PATCH for-next v3 03/22] x86/traps: move emulate_invalid_rdtscp
  2017-05-18 17:09 ` [PATCH for-next v3 03/22] x86/traps: move emulate_invalid_rdtscp Wei Liu
@ 2017-05-29 15:18   ` Jan Beulich
  0 siblings, 0 replies; 65+ messages in thread
From: Jan Beulich @ 2017-05-29 15:18 UTC (permalink / raw)
  To: Wei Liu; +Cc: Andrew Cooper, Xen-devel

>>> On 18.05.17 at 19:09, <wei.liu2@citrix.com> wrote:
> ---
>  xen/arch/x86/pv/emulate.c      | 20 ++++++++++++++++++++

And this one, together with emulate_forced_invalid_op(), perhaps
invalid.c?

Jan



* Re: [PATCH for-next v3 04/22] x86/traps: move emulate_forced_invalid_op
  2017-05-18 17:09 ` [PATCH for-next v3 04/22] x86/traps: move emulate_forced_invalid_op Wei Liu
@ 2017-05-29 15:19   ` Jan Beulich
  0 siblings, 0 replies; 65+ messages in thread
From: Jan Beulich @ 2017-05-29 15:19 UTC (permalink / raw)
  To: Wei Liu; +Cc: Andrew Cooper, Xen-devel

>>> On 18.05.17 at 19:09, <wei.liu2@citrix.com> wrote:
> And remove the now unused instruction_done in x86/traps.c.

Hmm, that's another candidate for moving early and making non-static
then, I guess.

Jan



* Re: [PATCH for-next v3 05/22] x86/pv: clean up emulate.c
  2017-05-18 17:09 ` [PATCH for-next v3 05/22] x86/pv: clean up emulate.c Wei Liu
@ 2017-05-29 15:37   ` Jan Beulich
  0 siblings, 0 replies; 65+ messages in thread
From: Jan Beulich @ 2017-05-29 15:37 UTC (permalink / raw)
  To: Wei Liu; +Cc: Andrew Cooper, Xen-devel

>>> On 18.05.17 at 19:09, <wei.liu2@citrix.com> wrote:
> Fix coding style issues. Replace bool_t with bool. Add spaces around
> binary ops.

For these it would probably be fine to do them while moving the code.
But you did the extra work to put this into a separate patch, so I'm
not going to object to this approach.

> @@ -209,43 +209,45 @@ static int guest_io_okay(
>              /* fallthrough */
>          case 0:  break;
>          }
> -        TOGGLE_MODE();
>  
> -        if ( (x.mask & (((1<<bytes)-1) << (port&7))) == 0 )
> -            return 1;
> +        if ( user_mode )
> +            toggle_guest_mode(v);
> +
> +        if ( (x.mask & (((1 << bytes)-1) << (port & 7))) == 0 )

You've caught the << and & here, but missed the - .
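
I.e. presumably the line wants to end up as

    if ( (x.mask & (((1 << bytes) - 1) << (port & 7))) == 0 )

once fully spaced.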

> @@ -369,7 +372,7 @@ static unsigned int check_guest_io_breakpoint(struct vcpu *v,
>          }
>  
>          if ( (start < (port + len)) && ((start + width) > port) )
> -            match |= 1 << i;
> +            match |= 1u << i;

Ah, I guess that's what "Use unsigned integer for shifting" refers to.
The wording first made me assume you were talking about the shift count...

> @@ -921,11 +925,11 @@ static int priv_op_read_msr(unsigned int reg, uint64_t *val,
>          *val = curr->arch.pv_vcpu.gs_base_user;
>          return X86EMUL_OKAY;
>  
> -    /*
> -     * In order to fully retain original behavior, defer calling
> -     * pv_soft_rdtsc() until after emulation. This may want/need to be
> -     * reconsidered.
> -     */
> +        /*
> +         * In order to fully retain original behavior, defer calling
> +         * pv_soft_rdtsc() until after emulation. This may want/need to be
> +         * reconsidered.
> +         */
>      case MSR_IA32_TSC:
>          poc->tsc |= TSC_BASE;
>          goto normal;

This comment was intentionally indented that way - the deeper
indentation would imo only be suitable if it followed the case label.
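
I.e. either leave it with the shallower indentation, or move it past
the label:

    case MSR_IA32_TSC:
        /*
         * In order to fully retain original behavior, defer calling
         * pv_soft_rdtsc() until after emulation. This may want/need to be
         * reconsidered.
         */
        poc->tsc |= TSC_BASE;
        goto normal;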

> @@ -1745,17 +1748,17 @@ void emulate_gate_op(struct cpu_user_regs *regs)
>      {
>          unsigned int ss, esp, *stkp;
>          int rc;
> -#define push(item) do \
> -        { \
> -            --stkp; \
> -            esp -= 4; \
> -            rc = __put_user(item, stkp); \
> -            if ( rc ) \
> -            { \
> -                pv_inject_page_fault(PFEC_write_access, \
> -                                     (unsigned long)(stkp + 1) - rc); \
> -                return; \
> -            } \
> +#define push(item) do                                                   \
> +        {                                                               \
> +            --stkp;                                                     \
> +            esp -= 4;                                                   \
> +            rc = __put_user(item, stkp);                                \
> +            if ( rc )                                                   \
> +            {                                                           \
> +                pv_inject_page_fault(PFEC_write_access,                 \
> +                                     (unsigned long)(stkp + 1) - rc);   \
> +                return;                                                 \
> +            }                                                           \
>          } while ( 0 )

I know it's a matter of taste, and I could imagine others having
differing opinions, but I dislike this moving out to the far right of
the backslashes. In particular it makes later adding a line longer
than all current ones ugly - either one has to touch all other lines
or one needs to accept the one new line standing out.

Jan



* Re: [PATCH for-next v3 06/22] x86/traps: move PV hypercall handlers to pv/traps.c
  2017-05-18 17:09 ` [PATCH for-next v3 06/22] x86/traps: move PV hypercall handlers to pv/traps.c Wei Liu
@ 2017-05-29 15:40   ` Jan Beulich
  2017-05-30 17:40     ` Andrew Cooper
  0 siblings, 1 reply; 65+ messages in thread
From: Jan Beulich @ 2017-05-29 15:40 UTC (permalink / raw)
  To: Wei Liu; +Cc: Andrew Cooper, Xen-devel

>>> On 18.05.17 at 19:09, <wei.liu2@citrix.com> wrote:
> The following handlers are moved:
> 1. do_set_trap_table

This one makes sense to move to pv/traps.c, but ...

> 2. do_set_debugreg
> 3. do_get_debugreg
> 4. do_fpu_taskswitch

... none of these do. I could see them go into pv/hypercall.c,
but I could also see that file dealing intentionally only with
everything hypercall related except individual handlers. Andrew,
do you have any opinion or thoughts here?

Jan



* Re: [PATCH for-next v3 07/22] x86/traps: move pv_inject_event to pv/traps.c
  2017-05-18 17:09 ` [PATCH for-next v3 07/22] x86/traps: move pv_inject_event " Wei Liu
@ 2017-05-29 15:42   ` Jan Beulich
  0 siblings, 0 replies; 65+ messages in thread
From: Jan Beulich @ 2017-05-29 15:42 UTC (permalink / raw)
  To: Wei Liu; +Cc: Andrew Cooper, Xen-devel

>>> On 18.05.17 at 19:09, <wei.liu2@citrix.com> wrote:

> @@ -128,6 +132,75 @@ unsigned long do_get_debugreg(int reg)
>      return -EINVAL;
>  }
>  
> +void pv_inject_event(const struct x86_event *event)
> +{
> +    struct vcpu *v = current;

Could you take the opportunity and name this curr?

Jan



* Re: [PATCH for-next v3 08/22] x86/traps: move set_guest_{machinecheck, nmi}_trapbounce
  2017-05-18 17:09 ` [PATCH for-next v3 08/22] x86/traps: move set_guest_{machinecheck, nmi}_trapbounce Wei Liu
@ 2017-05-29 15:43   ` Jan Beulich
  0 siblings, 0 replies; 65+ messages in thread
From: Jan Beulich @ 2017-05-29 15:43 UTC (permalink / raw)
  To: Wei Liu; +Cc: Andrew Cooper, Xen-devel

>>> On 18.05.17 at 19:09, <wei.liu2@citrix.com> wrote:
> +/*
> + * Called from asm to set up the MCE trapbounce info.
> + * Returns 0 if no callback is set up, else 1.
> + */

The comments suggest both of them want to actually return bool.

> +int set_guest_machinecheck_trapbounce(void)
> +{
> +    struct vcpu *v = current;

Again "curr" please if at all possible.

Jan



* Re: [PATCH for-next v3 10/22] x86/traps: delcare percpu softirq_trap
  2017-05-18 17:09 ` [PATCH for-next v3 10/22] x86/traps: delcare percpu softirq_trap Wei Liu
@ 2017-05-29 15:49   ` Jan Beulich
  2017-05-31 11:35     ` Wei Liu
  0 siblings, 1 reply; 65+ messages in thread
From: Jan Beulich @ 2017-05-29 15:49 UTC (permalink / raw)
  To: Wei Liu; +Cc: Andrew Cooper, Xen-devel

>>> On 18.05.17 at 19:09, <wei.liu2@citrix.com> wrote:
> It needs to be non-static when we split PV specific code out.

I don't follow here: The variable is used by send_guest_trap() to
communicate with its helper nmi_mce_softirq(). How would you
move one without the other? The main cleanup potential I see
here would be for the struct softirq_trap to be moved from the
header to the C file (as it's used in only one of them).
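
For reference, the struct in question is tiny - roughly:

    struct softirq_trap {
        struct domain *domain;  /* domain to inject trap */
        struct vcpu *vcpu;      /* vcpu to inject trap */
        int processor;          /* physical cpu to inject trap */
    };

so moving it next to its users would be a one-hunk change.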

Jan



* Re: [PATCH for-next v3 11/22] x86/traps: move guest_has_trap_callback to pv/traps.c
  2017-05-18 17:09 ` [PATCH for-next v3 11/22] x86/traps: move guest_has_trap_callback to pv/traps.c Wei Liu
@ 2017-05-29 15:54   ` Jan Beulich
  0 siblings, 0 replies; 65+ messages in thread
From: Jan Beulich @ 2017-05-29 15:54 UTC (permalink / raw)
  To: Wei Liu; +Cc: Andrew Cooper, Xen-devel

>>> On 18.05.17 at 19:09, <wei.liu2@citrix.com> wrote:
> --- a/xen/arch/x86/pv/traps.c
> +++ b/xen/arch/x86/pv/traps.c
> @@ -264,6 +264,24 @@ long unregister_guest_nmi_callback(void)
>      return 0;
>  }
>  
> +int guest_has_trap_callback(struct domain *d, uint16_t vcpuid,

bool and all pointers (including the local variables) can be const
afaict (albeit I question the value of both of the local variables,
as each is being used just once). And let's please avoid uint16_t
here when it doesn't really need to be other than unsigned int.

> +                            unsigned int trap_nr)
> +{
> +    struct vcpu *v;
> +    struct trap_info *t;
> +
> +    BUG_ON(d == NULL);
> +    BUG_ON(vcpuid >= d->max_vcpus);
> +
> +    /* Sanity check - XXX should be more fine grained. */
> +    BUG_ON(trap_nr >= NR_VECTORS);
> +
> +    v = d->vcpu[vcpuid];
> +    t = &v->arch.pv_vcpu.trap_ctxt[trap_nr];
> +
> +    return (t->address != 0);

With the return type being bool, the != 0 (and the already
pointless parentheses) could then be dropped too.
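
I.e. taken together, something like this (a sketch of the suggested
shape only, keeping the existing sanity checks):

    bool guest_has_trap_callback(const struct domain *d, unsigned int vcpuid,
                                 unsigned int trap_nr)
    {
        BUG_ON(d == NULL);
        BUG_ON(vcpuid >= d->max_vcpus);

        /* Sanity check - XXX should be more fine grained. */
        BUG_ON(trap_nr >= NR_VECTORS);

        return d->vcpu[vcpuid]->arch.pv_vcpu.trap_ctxt[trap_nr].address;
    }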

Jan



* Re: [PATCH for-next v3 12/22] x86/traps: move send_guest_trap to pv/traps.c
  2017-05-18 17:09 ` [PATCH for-next v3 12/22] x86/traps: move send_guest_trap " Wei Liu
@ 2017-05-29 15:55   ` Jan Beulich
  2017-06-05 17:08     ` Wei Liu
  0 siblings, 1 reply; 65+ messages in thread
From: Jan Beulich @ 2017-05-29 15:55 UTC (permalink / raw)
  To: Wei Liu; +Cc: Andrew Cooper, Xen-devel

>>> On 18.05.17 at 19:09, <wei.liu2@citrix.com> wrote:

As said on patch 10(?), this shouldn't be moved alone. And whether
we want to move it in the first place depends on what the PVH
plans here are.

Jan



* Re: [PATCH for-next v3 13/22] x86/traps: move toggle_guest_mode
  2017-05-18 17:09 ` [PATCH for-next v3 13/22] x86/traps: move toggle_guest_mode Wei Liu
@ 2017-05-29 16:05   ` Jan Beulich
  2017-05-30 17:47     ` Andrew Cooper
  0 siblings, 1 reply; 65+ messages in thread
From: Jan Beulich @ 2017-05-29 16:05 UTC (permalink / raw)
  To: Wei Liu; +Cc: Andrew Cooper, Xen-devel

>>> On 18.05.17 at 19:09, <wei.liu2@citrix.com> wrote:
> Move from x86_64/traps.c to pv/traps.c.

This again doesn't necessarily fit into traps.c (but it's an option).
Perhaps we want some misc.c or util.c?

Jan



* Re: [PATCH for-next v3 14/22] x86/traps: move do_iret to pv/traps.c
  2017-05-18 17:09 ` [PATCH for-next v3 14/22] x86/traps: move do_iret to pv/traps.c Wei Liu
@ 2017-05-29 16:07   ` Jan Beulich
  0 siblings, 0 replies; 65+ messages in thread
From: Jan Beulich @ 2017-05-29 16:07 UTC (permalink / raw)
  To: Wei Liu; +Cc: Andrew Cooper, Xen-devel

>>> On 18.05.17 at 19:09, <wei.liu2@citrix.com> wrote:
> No functional change.
> 
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>

Acked-by: Jan Beulich <jbeulich@suse.com>

Albeit putting it into its own file (iret.c) would also be an option. Not
sure how fine grained we want things to be.

Jan



* Re: [PATCH for-next v3 15/22] x86/traps: move init_int80_direct_trap
  2017-05-18 17:09 ` [PATCH for-next v3 15/22] x86/traps: move init_int80_direct_trap Wei Liu
@ 2017-05-29 16:07   ` Jan Beulich
  0 siblings, 0 replies; 65+ messages in thread
From: Jan Beulich @ 2017-05-29 16:07 UTC (permalink / raw)
  To: Wei Liu; +Cc: Andrew Cooper, Xen-devel

>>> On 18.05.17 at 19:09, <wei.liu2@citrix.com> wrote:
> No functional change.
> 
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>

Acked-by: Jan Beulich <jbeulich@suse.com>




* Re: [PATCH for-next v3 16/22] x86/traps: move callback_op code
  2017-05-18 17:09 ` [PATCH for-next v3 16/22] x86/traps: move callback_op code Wei Liu
@ 2017-05-29 16:09   ` Jan Beulich
  0 siblings, 0 replies; 65+ messages in thread
From: Jan Beulich @ 2017-05-29 16:09 UTC (permalink / raw)
  To: Wei Liu; +Cc: Andrew Cooper, Xen-devel

>>> On 18.05.17 at 19:09, <wei.liu2@citrix.com> wrote:
> No functional change.
> 
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> ---
>  xen/arch/x86/pv/traps.c     | 148 +++++++++++++++++++++++++++++++++++++++++++

callback.c ?

Jan



* Re: [PATCH for-next v3 17/22] x86/traps: move hypercall_page_initialise_ring3_kernel
  2017-05-18 17:09 ` [PATCH for-next v3 17/22] x86/traps: move hypercall_page_initialise_ring3_kernel Wei Liu
@ 2017-05-29 16:10   ` Jan Beulich
  0 siblings, 0 replies; 65+ messages in thread
From: Jan Beulich @ 2017-05-29 16:10 UTC (permalink / raw)
  To: Wei Liu; +Cc: Andrew Cooper, Xen-devel

>>> On 18.05.17 at 19:09, <wei.liu2@citrix.com> wrote:
> And export it via pv/domain.h.
> 
> No functional change.
> 
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> ---
>  xen/arch/x86/pv/traps.c         | 36 ++++++++++++++++++++++++++++++++++++

I'm pretty certain hypercall.c would be a better fit here.

Jan



* Re: [PATCH for-next v3 18/22] x86/traps: merge x86_64/compat/traps.c into pv/traps.c
  2017-05-18 17:10 ` [PATCH for-next v3 18/22] x86/traps: merge x86_64/compat/traps.c into pv/traps.c Wei Liu
@ 2017-05-29 16:12   ` Jan Beulich
  0 siblings, 0 replies; 65+ messages in thread
From: Jan Beulich @ 2017-05-29 16:12 UTC (permalink / raw)
  To: Wei Liu; +Cc: Andrew Cooper, Xen-devel

>>> On 18.05.17 at 19:10, <wei.liu2@citrix.com> wrote:
> Export hypercall_page_initialise_ring1_kernel as the code is moved.

This one again wants to go into hypercall.c. compat_iret() wants to go
next to do_iret() (wherever that ends up), and similarly I think other
functions would better go next to their non-compat counterparts
instead of creating a compat section in pv/traps.c.

Jan



* Re: [PATCH for-next v3 19/22] x86: clean up pv/traps.c
  2017-05-18 17:10 ` [PATCH for-next v3 19/22] x86: clean up pv/traps.c Wei Liu
@ 2017-05-29 16:18   ` Jan Beulich
  0 siblings, 0 replies; 65+ messages in thread
From: Jan Beulich @ 2017-05-29 16:18 UTC (permalink / raw)
  To: Wei Liu; +Cc: Andrew Cooper, Xen-devel

>>> On 18.05.17 at 19:10, <wei.liu2@citrix.com> wrote:
> Fix coding style issues. No functional change.

Same general comment as on the earlier cleanup patch.

> @@ -318,17 +325,19 @@ int send_guest_trap(struct domain *d, uint16_t vcpuid, unsigned int trap_nr)
>              return -EBUSY;
>  
>          /* We are called by the machine check (exception or polling) handlers
> -         * on the physical CPU that reported a machine check error. */
> +         * on the physical CPU that reported a machine check error.
> +         */

Please correct the comment as a whole, not just one of its ends.
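
I.e.:

        /*
         * We are called by the machine check (exception or polling)
         * handlers on the physical CPU that reported a machine check
         * error.
         */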

> @@ -448,27 +458,31 @@ static long register_guest_callback(struct 
> callback_register *reg)
>      switch ( reg->type )
>      {
>      case CALLBACKTYPE_event:
> -        v->arch.pv_vcpu.event_callback_eip    = reg->address;
> +        v->arch.pv_vcpu.event_callback_eip = reg->address;
>          break;
>  
>      case CALLBACKTYPE_failsafe:
>          v->arch.pv_vcpu.failsafe_callback_eip = reg->address;
> +
>          if ( reg->flags & CALLBACKF_mask_events )
>              set_bit(_VGCF_failsafe_disables_events,
>                      &v->arch.vgc_flags);
>          else
>              clear_bit(_VGCF_failsafe_disables_events,
>                        &v->arch.vgc_flags);
> +
>          break;
>  
>      case CALLBACKTYPE_syscall:
>          v->arch.pv_vcpu.syscall_callback_eip  = reg->address;
> +
>          if ( reg->flags & CALLBACKF_mask_events )
>              set_bit(_VGCF_syscall_disables_events,
>                      &v->arch.vgc_flags);
>          else
>              clear_bit(_VGCF_syscall_disables_events,
>                        &v->arch.vgc_flags);
> +
>          break;

Some of the earlier additions of blank lines were already
questionable imo, but especially the ones you add here before the
break statements go too far afaic.

> @@ -674,13 +688,16 @@ void compat_show_guest_stack(struct vcpu *v, const struct cpu_user_regs *regs,
>          printk(" %08x", addr);
>          stack++;
>      }
> +
>      if ( mask == PAGE_SIZE )
>      {
>          BUILD_BUG_ON(PAGE_SIZE == STACK_SIZE);
>          unmap_domain_page(stack);
>      }
> +
>      if ( i == 0 )
>          printk("Stack empty.");
> +
>      printk("\n");
>  }

Same for at least the last one here.

Jan



* Re: [PATCH for-next v3 22/22] x86: clean up traps.c
  2017-05-18 17:10 ` [PATCH for-next v3 22/22] x86: clean up traps.c Wei Liu
@ 2017-05-29 16:21   ` Jan Beulich
  0 siblings, 0 replies; 65+ messages in thread
From: Jan Beulich @ 2017-05-29 16:21 UTC (permalink / raw)
  To: Wei Liu; +Cc: Andrew Cooper, Xen-devel

>>> On 18.05.17 at 19:10, <wei.liu2@citrix.com> wrote:
> Replace bool_t with bool. Delete trailing white spaces. Fix some coding
> style issues.
> 
> No functional change.
> 
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>

Acked-by: Jan Beulich <jbeulich@suse.com>




* Re: [PATCH for-next v3 01/22] x86/traps: move privilege instruction emulation code
  2017-05-29 15:14     ` Jan Beulich
@ 2017-05-30 17:27       ` Wei Liu
  2017-05-30 17:30         ` Andrew Cooper
  0 siblings, 1 reply; 65+ messages in thread
From: Wei Liu @ 2017-05-30 17:27 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Wei Liu, Xen-devel

On Mon, May 29, 2017 at 09:14:07AM -0600, Jan Beulich wrote:
> >>> On 18.05.17 at 19:28, <wei.liu2@citrix.com> wrote:
> > From 58df816b937dc7a3598de01f053a6030e631057e Mon Sep 17 00:00:00 2001
> > From: Wei Liu <wei.liu2@citrix.com>
> > Date: Thu, 18 May 2017 16:18:56 +0100
> > Subject: [PATCH] x86/traps: move privilege instruction emulation code
> 
> privileged
> 
> > Move relevant code to pv/emulate.c. Export emulate_privileged_op in
> > pv/traps.h.
> 
> A name of "emulate.c" sounds like a container for all sorts of cruft.
> I'd prefer if we could use the opportunity of this re-org to see about
> not having overly large files. Therefore e.g. "emul-priv.c" or
> "priv-emul.c" or some such?
> 

I think this is a fine idea.

> > Note that read_descriptor is duplicated in emulate.c. The duplication
> > will be gone once all emulation code is moved.
> 
> That's not very desirable; we can only hope to have things
> committed together then? However, together with the naming
> issue above quite likely the function will want to become non-
> static anyway, so perhaps this should then be done right away
> instead of cloning it.
> 

Yes, I can do that.

> > Also move gpr_switch.S to pv/ because the code in that file is only
> > used by privilege instruction emulation.
> > 
> > No functional change.
> 
> For this size of a change this is too weak a statement for my taste:
> I don't really mean to review the 1.5k of lines you move, so I'd hope
> for a statement clarifying that, perhaps other than formatting, the
> code is being moved unchanged.
> 
> > +int emulate_privileged_op(struct cpu_user_regs *regs)
> 
> Does this perhaps want to gain a pv_ prefix?

OK.

> 
> > +#ifdef CONFIG_PV
> > +
> > +#include <public/xen.h>
> > +
> > +int emulate_privileged_op(struct cpu_user_regs *regs);
> > +
> > +#else  /* !CONFIG_PV */
> > +
> > +#include <xen/errno.h>
> > +
> > +int emulate_privileged_op(struct cpu_user_regs *regs) { return -EOPNOTSUPP; }
> 
> The function does not return -E... values.
> 

Right. Not sure why I missed this one. Later patches used 0 to match the
non-stub functions.

Wei.

> Jan
> 


* Re: [PATCH for-next v3 01/22] x86/traps: move privilege instruction emulation code
  2017-05-30 17:27       ` Wei Liu
@ 2017-05-30 17:30         ` Andrew Cooper
  2017-05-31  5:55           ` Jan Beulich
  0 siblings, 1 reply; 65+ messages in thread
From: Andrew Cooper @ 2017-05-30 17:30 UTC (permalink / raw)
  To: Wei Liu, Jan Beulich; +Cc: Xen-devel

On 30/05/17 18:27, Wei Liu wrote:
> On Mon, May 29, 2017 at 09:14:07AM -0600, Jan Beulich wrote:
>>>>> On 18.05.17 at 19:28, <wei.liu2@citrix.com> wrote:
>>> From 58df816b937dc7a3598de01f053a6030e631057e Mon Sep 17 00:00:00 2001
>>> From: Wei Liu <wei.liu2@citrix.com>
>>> Date: Thu, 18 May 2017 16:18:56 +0100
>>> Subject: [PATCH] x86/traps: move privilege instruction emulation code
>> privileged
>>
>>> Move relevant code to pv/emulate.c. Export emulate_privileged_op in
>>> pv/traps.h.
>> A name of "emulate.c" sounds like a container for all sorts of cruft.
>> I'd prefer if we could use the opportunity of this re-org to see about
>> not having overly large files. Therefore e.g. "emul-priv.c" or
>> "priv-emul.c" or some such?
>>
> I think this is a fine idea.

If we are doing this, I'd recommend emul-$FOO.c to avoid having
emulation files named invalid.c (which isn't obvious as to its purpose)
as suggested in patch 3.

~Andrew


* Re: [PATCH for-next v3 06/22] x86/traps: move PV hypercall handlers to pv/traps.c
  2017-05-29 15:40   ` Jan Beulich
@ 2017-05-30 17:40     ` Andrew Cooper
  2017-05-31  5:59       ` Jan Beulich
  0 siblings, 1 reply; 65+ messages in thread
From: Andrew Cooper @ 2017-05-30 17:40 UTC (permalink / raw)
  To: Jan Beulich, Wei Liu; +Cc: Xen-devel

On 29/05/17 16:40, Jan Beulich wrote:
>>>> On 18.05.17 at 19:09, <wei.liu2@citrix.com> wrote:
>> The following handlers are moved:
>> 1. do_set_trap_table
> This one makes sense to move to pv/traps.c, but ...
>
>> 2. do_set_debugreg
>> 3. do_get_debugreg
>> 4. do_fpu_taskswitch
> ... none of these do. I could see them go into pv/hypercall.c,
> but I could also see that file dealing intentionally only with
> everything hypercall related except individual handlers. Andrew,
> do you have any opinion or thoughts here?

Despite its name, traps.c deals with mostly low level exception
handling, so I am not completely convinced that do_set_trap_table()
would logically live in traps.c

I'd also prefer not to mix these into hypercall.c.  The best I can
suggest is pv/domain.c, but even that isn't great.

Sorry for being unhelpful.  I'm not sure pv/misc-hypercalls.c is a
suitable name either.

~Andrew


* Re: [PATCH for-next v3 13/22] x86/traps: move toggle_guest_mode
  2017-05-29 16:05   ` Jan Beulich
@ 2017-05-30 17:47     ` Andrew Cooper
  2017-05-31  6:00       ` Jan Beulich
  0 siblings, 1 reply; 65+ messages in thread
From: Andrew Cooper @ 2017-05-30 17:47 UTC (permalink / raw)
  To: Jan Beulich, Wei Liu; +Cc: Xen-devel

On 29/05/17 17:05, Jan Beulich wrote:
>>>> On 18.05.17 at 19:09, <wei.liu2@citrix.com> wrote:
>> Move from x86_64/traps.c to pv/traps.c.
> This again doesn't necessarily fit into traps.c (but it's an option).
> Perhaps we want some misc.c or util.c?

Or pv/domain.c, as this does logically fit with the context switching theme?

~Andrew


* Re: [PATCH for-next v3 01/22] x86/traps: move privilege instruction emulation code
  2017-05-30 17:30         ` Andrew Cooper
@ 2017-05-31  5:55           ` Jan Beulich
  2017-05-31 11:01             ` Wei Liu
  0 siblings, 1 reply; 65+ messages in thread
From: Jan Beulich @ 2017-05-31  5:55 UTC (permalink / raw)
  To: Andrew Cooper, Wei Liu; +Cc: Xen-devel

>>> On 30.05.17 at 19:30, <andrew.cooper3@citrix.com> wrote:
> On 30/05/17 18:27, Wei Liu wrote:
>> On Mon, May 29, 2017 at 09:14:07AM -0600, Jan Beulich wrote:
>>>>>> On 18.05.17 at 19:28, <wei.liu2@citrix.com> wrote:
>>>> From 58df816b937dc7a3598de01f053a6030e631057e Mon Sep 17 00:00:00 2001
>>>> From: Wei Liu <wei.liu2@citrix.com>
>>>> Date: Thu, 18 May 2017 16:18:56 +0100
>>>> Subject: [PATCH] x86/traps: move privilege instruction emulation code
>>> privileged
>>>
>>>> Move relevant code to pv/emulate.c. Export emulate_privileged_op in
>>>> pv/traps.h.
>>> A name of "emulate.c" sounds like a container for all sorts of cruft.
>>> I'd prefer if we could use the opportunity of this re-org to see about
>>> not having overly large files. Therefore e.g. "emul-priv.c" or
>>> "priv-emul.c" or some such?
>>>
>> I think this is a fine idea.
> 
> If we are doing this, I'd recommend emul-$FOO.c to avoid having
> emulation files named invalid.c (which isn't obvious as to its purpose)
> as suggested in patch 3.

Agreed - I didn't like the bare "invalid.c" very much either, but
I also wasn't entirely happy with the longish "emul-invalid.c".
Perhaps "emul-inval.c" then?

Jan



* Re: [PATCH for-next v3 06/22] x86/traps: move PV hypercall handlers to pv/traps.c
  2017-05-30 17:40     ` Andrew Cooper
@ 2017-05-31  5:59       ` Jan Beulich
  2017-05-31 11:14         ` Wei Liu
  0 siblings, 1 reply; 65+ messages in thread
From: Jan Beulich @ 2017-05-31  5:59 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel, Wei Liu

>>> On 30.05.17 at 19:40, <andrew.cooper3@citrix.com> wrote:
> On 29/05/17 16:40, Jan Beulich wrote:
>>>>> On 18.05.17 at 19:09, <wei.liu2@citrix.com> wrote:
>>> The following handlers are moved:
>>> 1. do_set_trap_table
>> This one makes sense to move to pv/traps.c, but ...
>>
>>> 2. do_set_debugreg
>>> 3. do_get_debugreg
>>> 4. do_fpu_taskswitch
>> ... none of these do. I could see them go into pv/hypercall.c,
>> but I could also see that file dealing intentionally only with
>> everything hypercall related except individual handlers. Andrew,
>> do you have any opinion or thoughts here?
> 
> Despite its name, traps.c deals with mostly low level exception
> handling, so I am not completely convinced that do_set_trap_table()
> would logically live in traps.c

I can see this being the case for traps.c, but pv/traps.c? There's
not much _low level_ exception handling that's PV-specific. But I
certainly don't mind such relatively small hypercall handlers being
lumped together into some other file, ...

> I'd also prefer not to mix these into hypercall.c.  The best I can
> suggest is pv/domain.c, but even that isn't great.
> 
> Sorry for being unhelpful.  I'm not sure pv/misc-hypercalls.c is a
> suitable name either.

... be it this name or some other one (if we can think of a better
alternative). Thinking of it: The currently present file is named
"hypercall.c" - how about "hypercalls.c"?

Jan



* Re: [PATCH for-next v3 13/22] x86/traps: move toggle_guest_mode
  2017-05-30 17:47     ` Andrew Cooper
@ 2017-05-31  6:00       ` Jan Beulich
  0 siblings, 0 replies; 65+ messages in thread
From: Jan Beulich @ 2017-05-31  6:00 UTC (permalink / raw)
  To: Andrew Cooper, Wei Liu; +Cc: Xen-devel

>>> On 30.05.17 at 19:47, <andrew.cooper3@citrix.com> wrote:
> On 29/05/17 17:05, Jan Beulich wrote:
>>>>> On 18.05.17 at 19:09, <wei.liu2@citrix.com> wrote:
>>> Move from x86_64/traps.c to pv/traps.c.
>> This again doesn't necessarily fit into traps.c (but it's an option).
>> Perhaps we want some misc.c or util.c?
> 
> Or pv/domain.c, as this does logically fit with the context switching theme?

Good point.

Jan



* Re: [PATCH for-next v3 01/22] x86/traps: move privilege instruction emulation code
  2017-05-31  5:55           ` Jan Beulich
@ 2017-05-31 11:01             ` Wei Liu
  2017-05-31 11:05               ` Andrew Cooper
  0 siblings, 1 reply; 65+ messages in thread
From: Wei Liu @ 2017-05-31 11:01 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Wei Liu, Xen-devel

On Tue, May 30, 2017 at 11:55:15PM -0600, Jan Beulich wrote:
> >>> On 30.05.17 at 19:30, <andrew.cooper3@citrix.com> wrote:
> > On 30/05/17 18:27, Wei Liu wrote:
> >> On Mon, May 29, 2017 at 09:14:07AM -0600, Jan Beulich wrote:
> >>>>>> On 18.05.17 at 19:28, <wei.liu2@citrix.com> wrote:
> >>>> From 58df816b937dc7a3598de01f053a6030e631057e Mon Sep 17 00:00:00 2001
> >>>> From: Wei Liu <wei.liu2@citrix.com>
> >>>> Date: Thu, 18 May 2017 16:18:56 +0100
> >>>> Subject: [PATCH] x86/traps: move privilege instruction emulation code
> >>> privileged
> >>>
> >>>> Move relevant code to pv/emulate.c. Export emulate_privileged_op in
> >>>> pv/traps.h.
> >>> A name of "emulate.c" sounds like a container for all sorts of cruft.
> >>> I'd prefer if we could use the opportunity of this re-org to see about
> >>> not having overly large files. Therefore e.g. "emul-priv.c" or
> >>> "priv-emul.c" or some such?
> >>>
> >> I think this is a fine idea.
> > 
> > If we are doing this, I'd recommend emul-$FOO.c to avoid having
> > emulation files named invalid.c (which isn't obvious as to its purpose)
> > as suggested in patch 3.
> 
> Agreed - I didn't like the bare "invalid.c" very much either, but
> I also wasn't entirely happy with the longish "emul-invalid.c".
> Perhaps "emul-inval.c" then?
> 

No objection from me.

To recap, I'm going to break emulation code into three files:

* emul-inval.c
* emul-gate-op.c
* emul-priv.c

The exported functions are going to be prefixed with pv_.

The shared functions used by those three files are going to be in
emul-common.c (?). They are going to be prefixed pv_emulate_ and
exported via a local header file emul-common.h (?).
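
In tree form that would presumably be:

    xen/arch/x86/pv/
        emul-common.c    (shared pv_emulate_* helpers)
        emul-common.h
        emul-gate-op.c
        emul-inval.c
        emul-priv.c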

Wei.


* Re: [PATCH for-next v3 01/22] x86/traps: move privilege instruction emulation code
  2017-05-31 11:01             ` Wei Liu
@ 2017-05-31 11:05               ` Andrew Cooper
  2017-05-31 11:36                 ` Wei Liu
  2017-05-31 11:43                 ` Jan Beulich
  0 siblings, 2 replies; 65+ messages in thread
From: Andrew Cooper @ 2017-05-31 11:05 UTC (permalink / raw)
  To: Wei Liu, Jan Beulich; +Cc: Xen-devel

On 31/05/17 12:01, Wei Liu wrote:
> On Tue, May 30, 2017 at 11:55:15PM -0600, Jan Beulich wrote:
>>>>> On 30.05.17 at 19:30, <andrew.cooper3@citrix.com> wrote:
>>> On 30/05/17 18:27, Wei Liu wrote:
>>>> On Mon, May 29, 2017 at 09:14:07AM -0600, Jan Beulich wrote:
>>>>>>>> On 18.05.17 at 19:28, <wei.liu2@citrix.com> wrote:
>>>>>> From 58df816b937dc7a3598de01f053a6030e631057e Mon Sep 17 00:00:00 2001
>>>>>> From: Wei Liu <wei.liu2@citrix.com>
>>>>>> Date: Thu, 18 May 2017 16:18:56 +0100
>>>>>> Subject: [PATCH] x86/traps: move privilege instruction emulation code
>>>>> privileged
>>>>>
>>>>>> Move relevant code to pv/emulate.c. Export emulate_privileged_op in
>>>>>> pv/traps.h.
>>>>> A name of "emulate.c" sounds like a container for all sorts of cruft.
>>>>> I'd prefer if we could use the opportunity of this re-org to see about
>>>>> not having overly large files. Therefore e.g. "emul-priv.c" or
>>>>> "priv-emul.c" or some such?
>>>>>
>>>> I think this is a fine idea.
>>> If we are doing this, I'd recommend emul-$FOO.c to avoid having
>>> emulation files named invalid.c (which isn't obvious as to its purpose)
>>> as suggested in patch 3.
>> Agreed - I didn't like the bare "invalid.c" very much either, but
>> I also wasn't entirely happy with the longish "emul-invalid.c".
>> Perhaps "emul-inval.c" then?
>>
> No objection from me.
>
> To recap, I'm going to break emulation code into three files:
>
> * emul-inval.c

emul-invalid-op.c ?  emul-inval is still an unhelpful name to describe
its purpose.

> * emul-gate-op.c
> * emul-priv.c

emul-priv-op.c, for consistency?

>
> The exported functions are going to be prefixed with pv_.
>
> The shared functions used by those three files are going to be in
> emul-common.c (?). They are going to be prefixed pv_emulate_ and
> exported via a local header file emul-common.h (?).

+1

~Andrew


* Re: [PATCH for-next v3 06/22] x86/traps: move PV hypercall handlers to pv/traps.c
  2017-05-31  5:59       ` Jan Beulich
@ 2017-05-31 11:14         ` Wei Liu
  2017-05-31 11:45           ` Jan Beulich
  0 siblings, 1 reply; 65+ messages in thread
From: Wei Liu @ 2017-05-31 11:14 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Wei Liu, Xen-devel

On Tue, May 30, 2017 at 11:59:31PM -0600, Jan Beulich wrote:
> >>> On 30.05.17 at 19:40, <andrew.cooper3@citrix.com> wrote:
> > On 29/05/17 16:40, Jan Beulich wrote:
> >>>>> On 18.05.17 at 19:09, <wei.liu2@citrix.com> wrote:
> >>> The following handlers are moved:
> >>> 1. do_set_trap_table
> >> This one makes sense to move to pv/traps.c, but ...
> >>
> >>> 2. do_set_debugreg
> >>> 3. do_get_debugreg
> >>> 4. do_fpu_taskswitch
> >> ... none of these do. I could see them go into pv/hypercall.c,
> >> but I could also see that file dealing intentionally only with
> >> everything hypercall related except individual handlers. Andrew,
> >> do you have any opinion or thoughts here?
> > 
> > Despite its name, traps.c deals with mostly low level exception
> > handling, so I am not completely convinced that do_set_trap_table()
> > would logically live in traps.c
> 
> I can see this being the case for traps.c, but pv/traps.c? There's
> not much _low level_ exception handling that's PV-specific. But I
> >> certainly don't mind such relatively small hypercall handlers being
> >> lumped together into some other file, ...
> 
> > I'd also prefer not to mix these into hypercall.c.  The best I can
> > suggest is pv/domain.c, but even that isn't great.
> > 
> > Sorry for being unhelpful.  I'm not sure pv/misc-hypercalls.c is a
> > suitable name either.
> 
> ... be it this name or some other one (if we can think of a better
> alternative). Thinking of it: The currently present file is named
> "hypercall.c" - how about "hypercalls.c"?
> 

Well I don't think moving them into hypercall(s).c is nice either.

Since you expressed the idea of using iret.c for do_iret, maybe we can
use debugreg.c and fpu_taskswitch.c ?

> Jan
> 


* Re: [PATCH for-next v3 10/22] x86/traps: delcare percpu softirq_trap
  2017-05-29 15:49   ` Jan Beulich
@ 2017-05-31 11:35     ` Wei Liu
  2017-05-31 11:46       ` Jan Beulich
  0 siblings, 1 reply; 65+ messages in thread
From: Wei Liu @ 2017-05-31 11:35 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Wei Liu, Xen-devel

On Mon, May 29, 2017 at 09:49:47AM -0600, Jan Beulich wrote:
> >>> On 18.05.17 at 19:09, <wei.liu2@citrix.com> wrote:
> > It needs to be non-static when we split PV specific code out.
> 
> I don't follow here: The variable is used by send_guest_trap() to
> communicate with its helper nmi_mce_softirq(). How would you
> move one without the other?

I chose to not move nmi_mce_softirq, which is needed by trap_init, hence
this patch.

Your suggestion will require nmi_mce_softirq to be exported via a header
(and possibly prefix with pv_). I'm fine with that, too.

> The main cleanup potential I see
> here would be for the struct softirq_trap to be moved from the
> header to the C file (as it's used in only one of them).
> 
> Jan
> 


* Re: [PATCH for-next v3 01/22] x86/traps: move privilege instruction emulation code
  2017-05-31 11:05               ` Andrew Cooper
@ 2017-05-31 11:36                 ` Wei Liu
  2017-05-31 11:43                 ` Jan Beulich
  1 sibling, 0 replies; 65+ messages in thread
From: Wei Liu @ 2017-05-31 11:36 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel, Wei Liu, Jan Beulich

On Wed, May 31, 2017 at 12:05:26PM +0100, Andrew Cooper wrote:
> On 31/05/17 12:01, Wei Liu wrote:
> > On Tue, May 30, 2017 at 11:55:15PM -0600, Jan Beulich wrote:
> >>>>> On 30.05.17 at 19:30, <andrew.cooper3@citrix.com> wrote:
> >>> On 30/05/17 18:27, Wei Liu wrote:
> >>>> On Mon, May 29, 2017 at 09:14:07AM -0600, Jan Beulich wrote:
> >>>>>>>> On 18.05.17 at 19:28, <wei.liu2@citrix.com> wrote:
> >>>>>> From 58df816b937dc7a3598de01f053a6030e631057e Mon Sep 17 00:00:00 2001
> >>>>>> From: Wei Liu <wei.liu2@citrix.com>
> >>>>>> Date: Thu, 18 May 2017 16:18:56 +0100
> >>>>>> Subject: [PATCH] x86/traps: move privilege instruction emulation code
> >>>>> privileged
> >>>>>
> >>>>>> Move relevant code to pv/emulate.c. Export emulate_privileged_op in
> >>>>>> pv/traps.h.
> >>>>> A name of "emulate.c" sounds like a container for all sorts of cruft.
> >>>>> I'd prefer if we could use the opportunity of this re-org to see about
> >>>>> not having overly large files. Therefore e.g. "emul-priv.c" or
> >>>>> "priv-emul.c" or some such?
> >>>>>
> >>>> I think this is a fine idea.
> >>> If we are doing this, I'd recommend emul-$FOO.c to avoid having
> >>> emulation files named invalid.c (which isn't obvious as to its purpose)
> >>> as suggested in patch 3.
> >> Agreed - I didn't like the bare "invalid.c" very much either, but
> >> I also wasn't entirely happy with the longish "emul-invalid.c".
> >> Perhaps "emul-inval.c" then?
> >>
> > No objection from me.
> >
> > To recap, I'm going to break emulation code into three files:
> >
> > * emul-inval.c
> 
> emul-invalid-op.c ?  emul-inval is still an unhelpful name to describe
> its purpose.
> 
> > * emul-gate-op.c
> > * emul-priv.c
> 
> emul-priv-op.c, for consistency?
> 

These names are reasonable. If I don't hear further objection, I'm
going to use them.


* Re: [PATCH for-next v3 01/22] x86/traps: move privilege instruction emulation code
  2017-05-31 11:05               ` Andrew Cooper
  2017-05-31 11:36                 ` Wei Liu
@ 2017-05-31 11:43                 ` Jan Beulich
  1 sibling, 0 replies; 65+ messages in thread
From: Jan Beulich @ 2017-05-31 11:43 UTC (permalink / raw)
  To: Andrew Cooper, Wei Liu; +Cc: Xen-devel

>>> On 31.05.17 at 13:05, <andrew.cooper3@citrix.com> wrote:
> On 31/05/17 12:01, Wei Liu wrote:
>> On Tue, May 30, 2017 at 11:55:15PM -0600, Jan Beulich wrote:
>>>>>> On 30.05.17 at 19:30, <andrew.cooper3@citrix.com> wrote:
>>>> On 30/05/17 18:27, Wei Liu wrote:
>>>>> On Mon, May 29, 2017 at 09:14:07AM -0600, Jan Beulich wrote:
>>>>>>>>> On 18.05.17 at 19:28, <wei.liu2@citrix.com> wrote:
>>>>>>> From 58df816b937dc7a3598de01f053a6030e631057e Mon Sep 17 00:00:00 2001
>>>>>>> From: Wei Liu <wei.liu2@citrix.com>
>>>>>>> Date: Thu, 18 May 2017 16:18:56 +0100
>>>>>>> Subject: [PATCH] x86/traps: move privilege instruction emulation code
>>>>>> privileged
>>>>>>
>>>>>>> Move relevant code to pv/emulate.c. Export emulate_privileged_op in
>>>>>>> pv/traps.h.
>>>>>> A name of "emulate.c" sounds like a container for all sorts of cruft.
>>>>>> I'd prefer if we could use the opportunity of this re-org to see about
>>>>>> not having overly large files. Therefore e.g. "emul-priv.c" or
>>>>>> "priv-emul.c" or some such?
>>>>>>
>>>>> I think this is a fine idea.
>>>> If we are doing this, I'd recommend emul-$FOO.c to avoid having
>>>> emulation files named invalid.c (which isn't obvious as to its purpose)
>>>> as suggested in patch 3.
>>> Agreed - I didn't like the bare "invalid.c" very much either, but
>>> I also wasn't entirely happy with the longish "emul-invalid.c".
>>> Perhaps "emul-inval.c" then?
>>>
>> No objection from me.
>>
>> To recap, I'm going to break emulation code into three files:
>>
>> * emul-inval.c
> 
> emul-invalid-op.c ?  emul-inval is still an unhelpful name to describe
> its purpose.
> 
>> * emul-gate-op.c
>> * emul-priv.c
> 
> emul-priv-op.c, for consistency?

And then above perhaps emul-inv-op.c?

>> The exported functions are going to be prefixed with pv_.
>>
>> The shared functions used by those three files are going to be in
>> emul-common.c (?). They are going to be prefixed pv_emulate_ and

pv_emul_* ?

>> exported via a local header file emul-common.h (?).

Or simply emulate.[ch]?

Jan



* Re: [PATCH for-next v3 06/22] x86/traps: move PV hypercall handlers to pv/traps.c
  2017-05-31 11:14         ` Wei Liu
@ 2017-05-31 11:45           ` Jan Beulich
  2017-06-02 11:01             ` Wei Liu
  0 siblings, 1 reply; 65+ messages in thread
From: Jan Beulich @ 2017-05-31 11:45 UTC (permalink / raw)
  To: Wei Liu; +Cc: Andrew Cooper, Xen-devel

>>> On 31.05.17 at 13:14, <wei.liu2@citrix.com> wrote:
> On Tue, May 30, 2017 at 11:59:31PM -0600, Jan Beulich wrote:
>> >>> On 30.05.17 at 19:40, <andrew.cooper3@citrix.com> wrote:
>> > On 29/05/17 16:40, Jan Beulich wrote:
>> >>>>> On 18.05.17 at 19:09, <wei.liu2@citrix.com> wrote:
>> >>> The following handlers are moved:
>> >>> 1. do_set_trap_table
>> >> This one makes sense to move to pv/traps.c, but ...
>> >>
>> >>> 2. do_set_debugreg
>> >>> 3. do_get_debugreg
>> >>> 4. do_fpu_taskswitch
>> >> ... none of these do. I could see them go into pv/hypercall.c,
>> >> but I could also see that file dealing intentionally only with
>> >> everything hypercall related except individual handlers. Andrew,
>> >> do you have any opinion or thoughts here?
>> > 
>> > Despite its name, traps.c deals with mostly low level exception
>> > handling, so I am not completely convinced that do_set_trap_table()
>> > would logically live in traps.c
>> 
>> I can see this being the case for traps.c, but pv/traps.c? There's
>> not much _low level_ exception handling that's PV-specific. But I
>> certainly don't mind such relatively small hypercall handlers being
>> lumped together into some other file, ...
>> 
>> > I'd also prefer not to mix these into hypercall.c.  The best I can
>> > suggest is pv/domain.c, but even that isn't great.
>> > 
>> > Sorry for being unhelpful.  I'm not sure pv/misc-hypercalls.c is a
>> > suitable name either.
>> 
>> ... be it this name or some other one (if we can think of a better
>> alternative). Thinking of it: The currently present file is named
>> "hypercall.c" - how about "hypercalls.c"?
>> 
> 
> Well I don't think moving them into hypercall(s).c is nice either.
> 
> Since you expressed the idea of using iret.c for do_iret, maybe we can
> use debugreg.c and fpu_taskswitch.c ?

I did consider this too, but some of these are really small, and
while I dislike overly large files, I also don't think files with just
one or two dozen actual code lines are very useful to have.

Jan



* Re: [PATCH for-next v3 10/22] x86/traps: delcare percpu softirq_trap
  2017-05-31 11:35     ` Wei Liu
@ 2017-05-31 11:46       ` Jan Beulich
  2017-05-31 11:54         ` Wei Liu
  0 siblings, 1 reply; 65+ messages in thread
From: Jan Beulich @ 2017-05-31 11:46 UTC (permalink / raw)
  To: Wei Liu; +Cc: Andrew Cooper, Xen-devel

>>> On 31.05.17 at 13:35, <wei.liu2@citrix.com> wrote:
> On Mon, May 29, 2017 at 09:49:47AM -0600, Jan Beulich wrote:
>> >>> On 18.05.17 at 19:09, <wei.liu2@citrix.com> wrote:
>> > It needs to be non-static when we split PV specific code out.
>> 
>> I don't follow here: The variable is used by send_guest_trap() to
>> communicate with its helper nmi_mce_softirq(). How would you
>> move one without the other?
> 
> I chose to not move nmi_mce_softirq, which is needed by trap_init, hence
> this patch.
> 
> Your suggestion will require nmi_mce_softirq to be exported via a header
> (and possibly prefix with pv_). I'm fine with that, too.

No, it's still a private helper. If anything, part of trap_init() will
need to be split off into e.g. pv_trap_init().
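
I.e. something like (a sketch of just the bit needing the helper):

    void __init pv_trap_init(void)
    {
        open_softirq(NMI_MCE_SOFTIRQ, nmi_mce_softirq);
    }

with trap_init() calling pv_trap_init(), so both the handler and the
variable can stay private to pv/traps.c.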

Jan



* Re: [PATCH for-next v3 10/22] x86/traps: delcare percpu softirq_trap
  2017-05-31 11:46       ` Jan Beulich
@ 2017-05-31 11:54         ` Wei Liu
  0 siblings, 0 replies; 65+ messages in thread
From: Wei Liu @ 2017-05-31 11:54 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Wei Liu, Xen-devel

On Wed, May 31, 2017 at 05:46:32AM -0600, Jan Beulich wrote:
> >>> On 31.05.17 at 13:35, <wei.liu2@citrix.com> wrote:
> > On Mon, May 29, 2017 at 09:49:47AM -0600, Jan Beulich wrote:
> >> >>> On 18.05.17 at 19:09, <wei.liu2@citrix.com> wrote:
> >> > It needs to be non-static when we split PV specific code out.
> >> 
> >> I don't follow here: The variable is used by send_guest_trap() to
> >> communicate with its helper nmi_mce_softirq(). How would you
> >> move one without the other?
> > 
> > I chose to not move nmi_mce_softirq, which is needed by trap_init, hence
> > this patch.
> > 
> > Your suggestion will require nmi_mce_softirq to be exported via a header
> > (and possibly prefixed with pv_). I'm fine with that, too.
> 
> No, it's still a private helper. If anything, part of trap_init()
> will need to be split off into e.g. pv_trap_init().
> 

NP. I will see what I can do.

> Jan
> 


* Re: [PATCH for-next v3 06/22] x86/traps: move PV hypercall handlers to pv/traps.c
  2017-05-31 11:45           ` Jan Beulich
@ 2017-06-02 11:01             ` Wei Liu
  2017-06-06  7:36               ` Jan Beulich
  0 siblings, 1 reply; 65+ messages in thread
From: Wei Liu @ 2017-06-02 11:01 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Wei Liu, Xen-devel

On Wed, May 31, 2017 at 05:45:14AM -0600, Jan Beulich wrote:
> >>> On 31.05.17 at 13:14, <wei.liu2@citrix.com> wrote:
> > On Tue, May 30, 2017 at 11:59:31PM -0600, Jan Beulich wrote:
> >> >>> On 30.05.17 at 19:40, <andrew.cooper3@citrix.com> wrote:
> >> > On 29/05/17 16:40, Jan Beulich wrote:
> >> >>>>> On 18.05.17 at 19:09, <wei.liu2@citrix.com> wrote:
> >> >>> The following handlers are moved:
> >> >>> 1. do_set_trap_table
> >> >> This one makes sense to move to pv/traps.c, but ...
> >> >>
> >> >>> 2. do_set_debugreg
> >> >>> 3. do_get_debugreg
> >> >>> 4. do_fpu_taskswitch
> >> >> ... none of these do. I could see them go into pv/hypercall.c,
> >> >> but I could also see that file dealing intentionally only with
> >> >> everything hypercall related except individual handlers. Andrew,
> >> >> do you have any opinion or thoughts here?
> >> > 
> >> > Despite its name, traps.c deals mostly with low-level exception
> >> > handling, so I am not completely convinced that do_set_trap_table()
> >> > would logically live in traps.c
> >> 
> >> I can see this being the case for traps.c, but pv/traps.c? There's
> >> not much _low level_ exception handling that's PV-specific. But I
> >> certainly don't mind such relatively small hypercall handlers being
> >> lumped together into some other file, ...
> >> 
> >> > I'd also prefer not to mix these into hypercall.c.  The best I can
> >> > suggest is pv/domain.c, but even that isn't great.
> >> > 
> >> > Sorry for being unhelpful.  I'm not sure pv/misc-hypercalls.c is a
> >> > suitable name either.
> >> 
> >> ... be it this name or some other one (if we can think of a better
> >> alternative). Thinking of it: The currently present file is named
> >> "hypercall.c" - how about "hypercalls.c"?
> >> 
> > 
> > Well I don't think moving them into hypercall(s).c is nice either.
> > 
> > Since you expressed the idea of using iret.c for do_iret, maybe we can
> > use debugreg.c and fpu_taskswitch.c ?
> 
> I did consider this too, but some of these are really small, and
> while I dislike overly large files, I also don't think files with just
> one or two dozen actual code lines are very useful to have.
> 

TBH I don't really see a problem with that approach -- we have clear_page.S,
copy_page.S and gpr_switch.S which don't get lumped together into one
file.

But if you don't like that, I will just put those small handlers
into hypercall.c and change the comment of that file to:

diff --git a/xen/arch/x86/pv/hypercall.c b/xen/arch/x86/pv/hypercall.c
index 7c5e5a629d..b0c9e01268 100644
--- a/xen/arch/x86/pv/hypercall.c
+++ b/xen/arch/x86/pv/hypercall.c
@@ -1,7 +1,7 @@
 /******************************************************************************
  * arch/x86/pv/hypercall.c
  *
- * PV hypercall dispatching routines
+ * PV hypercall dispatching routines and misc hypercalls
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by


* Re: [PATCH for-next v3 12/22] x86/traps: move send_guest_trap to pv/traps.c
  2017-05-29 15:55   ` Jan Beulich
@ 2017-06-05 17:08     ` Wei Liu
  2017-06-06  7:37       ` Jan Beulich
  0 siblings, 1 reply; 65+ messages in thread
From: Wei Liu @ 2017-06-05 17:08 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Wei Liu, Xen-devel

On Mon, May 29, 2017 at 09:55:14AM -0600, Jan Beulich wrote:
> >>> On 18.05.17 at 19:09, <wei.liu2@citrix.com> wrote:
> 
> As said on patch 10(?), this shouldn't be moved alone. And whether
> we want to move it in the first place depends on what the PVH
> plans here are.
> 

What do you want me to do with this patch? I'm inclined to just move it
because it is only used by PV at the moment.


* Re: [PATCH for-next v3 06/22] x86/traps: move PV hypercall handlers to pv/traps.c
  2017-06-02 11:01             ` Wei Liu
@ 2017-06-06  7:36               ` Jan Beulich
  2017-06-08 11:30                 ` Andrew Cooper
  0 siblings, 1 reply; 65+ messages in thread
From: Jan Beulich @ 2017-06-06  7:36 UTC (permalink / raw)
  To: Andrew Cooper, Wei Liu; +Cc: Xen-devel

>>> On 02.06.17 at 13:01, <wei.liu2@citrix.com> wrote:
> On Wed, May 31, 2017 at 05:45:14AM -0600, Jan Beulich wrote:
>> >>> On 31.05.17 at 13:14, <wei.liu2@citrix.com> wrote:
>> > On Tue, May 30, 2017 at 11:59:31PM -0600, Jan Beulich wrote:
>> >> >>> On 30.05.17 at 19:40, <andrew.cooper3@citrix.com> wrote:
>> >> > On 29/05/17 16:40, Jan Beulich wrote:
>> >> >>>>> On 18.05.17 at 19:09, <wei.liu2@citrix.com> wrote:
>> >> >>> The following handlers are moved:
>> >> >>> 1. do_set_trap_table
>> >> >> This one makes sense to move to pv/traps.c, but ...
>> >> >>
>> >> >>> 2. do_set_debugreg
>> >> >>> 3. do_get_debugreg
>> >> >>> 4. do_fpu_taskswitch
>> >> >> ... none of these do. I could see them go into pv/hypercall.c,
>> >> >> but I could also see that file dealing intentionally only with
>> >> >> everything hypercall related except individual handlers. Andrew,
>> >> >> do you have any opinion or thoughts here?
>> >> > 
>> >> > Despite its name, traps.c deals mostly with low-level exception
>> >> > handling, so I am not completely convinced that do_set_trap_table()
>> >> > would logically live in traps.c
>> >> 
>> >> I can see this being the case for traps.c, but pv/traps.c? There's
>> >> not much _low level_ exception handling that's PV-specific. But I
>> >> certainly don't mind such relatively small hypercall handlers being
>> >> lumped together into some other file, ...
>> >> 
>> >> > I'd also prefer not to mix these into hypercall.c.  The best I can
>> >> > suggest is pv/domain.c, but even that isn't great.
>> >> > 
>> >> > Sorry for being unhelpful.  I'm not sure pv/misc-hypercalls.c is a
>> >> > suitable name either.
>> >> 
>> >> ... be it this name or some other one (if we can think of a better
>> >> alternative). Thinking of it: The currently present file is named
>> >> "hypercall.c" - how about "hypercalls.c"?
>> >> 
>> > 
>> > Well I don't think moving them into hypercall(s).c is nice either.
>> > 
>> > Since you expressed the idea of using iret.c for do_iret, maybe we can
>> > use debugreg.c and fpu_taskswitch.c ?
>> 
>> I did consider this too, but some of these are really small, and
>> while I dislike overly large files, I also don't think files with just
>> one or two dozen actual code lines are very useful to have.
>> 
> 
> TBH I don't really see a problem with that approach -- we have clear_page.S,
> copy_page.S and gpr_switch.S which don't get lumped together into one
> file.

I view C files slightly differently from assembly ones in this regard.

> But if you don't like that, I will just put those small handlers
> into hypercall.c and change the comment of that file to:
> 
> diff --git a/xen/arch/x86/pv/hypercall.c b/xen/arch/x86/pv/hypercall.c
> index 7c5e5a629d..b0c9e01268 100644
> --- a/xen/arch/x86/pv/hypercall.c
> +++ b/xen/arch/x86/pv/hypercall.c
> @@ -1,7 +1,7 @@
>  
> /****************************************************************************
> **
>   * arch/x86/pv/hypercall.c
>   *
> - * PV hypercall dispatching routines
> + * PV hypercall dispatching routines and misc hypercalls

As indicated before, I don't really like the idea of specific
hypercall handlers being put here. I did suggest hypercalls.c
as a possible place for ones which don't fit elsewhere.

Andrew, do you have any specific opinion both on the specific
situation here as well as the broader question of file granularity?

Jan



* Re: [PATCH for-next v3 12/22] x86/traps: move send_guest_trap to pv/traps.c
  2017-06-05 17:08     ` Wei Liu
@ 2017-06-06  7:37       ` Jan Beulich
  0 siblings, 0 replies; 65+ messages in thread
From: Jan Beulich @ 2017-06-06  7:37 UTC (permalink / raw)
  To: Wei Liu; +Cc: Andrew Cooper, Xen-devel

>>> On 05.06.17 at 19:08, <wei.liu2@citrix.com> wrote:
> On Mon, May 29, 2017 at 09:55:14AM -0600, Jan Beulich wrote:
>> >>> On 18.05.17 at 19:09, <wei.liu2@citrix.com> wrote:
>> 
>> As said on patch 10(?), this shouldn't be moved alone. And whether
>> we want to move it in the first place depends on what the PVH
>> plans here are.
>> 
> 
> What do you want me to do with this patch? I'm inclined to just move it
> because it is only used by PV at the moment.

In the absence of any indication on the PVH plans, moving it is
likely the most sensible action at this point.
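
For concreteness, a sketch of the resulting export -- assuming the
signature send_guest_trap() has in traps.c at this point; the exact
header layout is illustrative:

    /* xen/include/asm-x86/pv/traps.h -- sketch only */
    int send_guest_trap(struct domain *d, uint16_t vcpuid,
                        unsigned int trap_nr);

Should PVH grow NMI/MCE injection support later, the declaration can
move back out of the pv/ namespace.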

Jan



* Re: [PATCH for-next v3 06/22] x86/traps: move PV hypercall handlers to pv/traps.c
  2017-06-06  7:36               ` Jan Beulich
@ 2017-06-08 11:30                 ` Andrew Cooper
  2017-06-08 14:28                   ` Wei Liu
  0 siblings, 1 reply; 65+ messages in thread
From: Andrew Cooper @ 2017-06-08 11:30 UTC (permalink / raw)
  To: Jan Beulich, Wei Liu; +Cc: Xen-devel

On 06/06/17 08:36, Jan Beulich wrote:
>>>> On 02.06.17 at 13:01, <wei.liu2@citrix.com> wrote:
>> On Wed, May 31, 2017 at 05:45:14AM -0600, Jan Beulich wrote:
>>>>>> On 31.05.17 at 13:14, <wei.liu2@citrix.com> wrote:
>>>> On Tue, May 30, 2017 at 11:59:31PM -0600, Jan Beulich wrote:
>>>>>>>> On 30.05.17 at 19:40, <andrew.cooper3@citrix.com> wrote:
>>>>>> On 29/05/17 16:40, Jan Beulich wrote:
>>>>>>>>>> On 18.05.17 at 19:09, <wei.liu2@citrix.com> wrote:
>>>>>>>> The following handlers are moved:
>>>>>>>> 1. do_set_trap_table
>>>>>>> This one makes sense to move to pv/traps.c, but ...
>>>>>>>
>>>>>>>> 2. do_set_debugreg
>>>>>>>> 3. do_get_debugreg
>>>>>>>> 4. do_fpu_taskswitch
>>>>>>> ... none of these do. I could see them go into pv/hypercall.c,
>>>>>>> but I could also see that file dealing intentionally only with
>>>>>>> everything hypercall related except individual handlers. Andrew,
>>>>>>> do you have any opinion or thoughts here?
>>>>>> Despite its name, traps.c deals mostly with low-level exception
>>>>>> handling, so I am not completely convinced that do_set_trap_table()
>>>>>> would logically live in traps.c
>>>>> I can see this being the case for traps.c, but pv/traps.c? There's
>>>>> not much _low level_ exception handling that's PV-specific. But I
>>>>> certainly don't mind such relatively small hypercall handlers being
>>>>> lumped together into some other file, ...
>>>>>
>>>>>> I'd also prefer not to mix these into hypercall.c.  The best I can
>>>>>> suggest is pv/domain.c, but even that isn't great.
>>>>>>
>>>>>> Sorry for being unhelpful.  I'm not sure pv/misc-hypercalls.c is a
>>>>>> suitable name either.
>>>>> ... be it this name or some other one (if we can think of a better
>>>>> alternative). Thinking of it: The currently present file is named
>>>>> "hypercall.c" - how about "hypercalls.c"?
>>>>>
>>>> Well I don't think moving them into hypercall(s).c is nice either.
>>>>
>>>> Since you expressed the idea of using iret.c for do_iret, maybe we can
>>>> use debugreg.c and fpu_taskswitch.c ?
>>> I did consider this too, but some of these are really small, and
>>> while I dislike overly large files, I also don't think files with just
>>> one or two dozen actual code lines are very useful to have.
>>>
>> TBH I don't really see a problem with that approach -- we have clear_page.S,
>> copy_page.S and gpr_switch.S which don't get lumped together into one
>> file.
> I view C files slightly differently from assembly ones in this regard.

Looking at those files, they really should be replaced with something
better.  Steel^W Borrowing the Linux alternatives-based clear/copy
routines would probably be a very good move.
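
As an aside, a conceptual C sketch of what such feature-based selection
achieves -- this is deliberately not the ALTERNATIVE patching machinery
referred to above, which rewrites the call site at boot and avoids any
indirect call; all names below are hypothetical:

    /* Hypothetical: choose one clear_page() implementation at boot
     * based on a CPU feature, rather than testing the feature (or,
     * as here, taking an indirect call) on every invocation. */
    static void clear_page_stosq(void *pg) { /* rep stosq loop */ }
    static void clear_page_erms(void *pg)  { /* rep stosb; needs ERMS */ }

    static void (*clear_page_fn)(void *pg) = clear_page_stosq;

    static void __init select_clear_page(void)
    {
        if ( cpu_has_erms )     /* hypothetical feature test */
            clear_page_fn = clear_page_erms;
    }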

>
>> But if you don't like that, I will just put those small handlers
>> into hypercall.c and change the comment of that file to:
>>
>> diff --git a/xen/arch/x86/pv/hypercall.c b/xen/arch/x86/pv/hypercall.c
>> index 7c5e5a629d..b0c9e01268 100644
>> --- a/xen/arch/x86/pv/hypercall.c
>> +++ b/xen/arch/x86/pv/hypercall.c
>> @@ -1,7 +1,7 @@
>>  /******************************************************************************
>>   * arch/x86/pv/hypercall.c
>>   *
>> - * PV hypercall dispatching routines
>> + * PV hypercall dispatching routines and misc hypercalls
> As indicated before, I don't really like the idea of specific
> hypercall handlers being put here. I did suggest hypercalls.c
> as a possible place for ones which don't fit elsewhere.
>
> Andrew, do you have any specific opinion both on the specific
> situation here as well as the broader question of file granularity?

I'd prefer not to have individual files for individual functions; that
is going too far IMO.  I'd also prefer not to mix misc hypercalls into
this file.

pv/misc-hypercalls.c ?

~Andrew


* Re: [PATCH for-next v3 06/22] x86/traps: move PV hypercall handlers to pv/traps.c
  2017-06-08 11:30                 ` Andrew Cooper
@ 2017-06-08 14:28                   ` Wei Liu
  0 siblings, 0 replies; 65+ messages in thread
From: Wei Liu @ 2017-06-08 14:28 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel, Wei Liu, Jan Beulich

On Thu, Jun 08, 2017 at 12:30:02PM +0100, Andrew Cooper wrote:
> I'd prefer not to have individual files for individual functions; that
> is going too far IMO.  I'd also prefer not to mix misc hypercalls into
> this file.
> 
> pv/misc-hypercalls.c ?
> 

Sure this is fine by me.

I will move the *_debugreg and fpu_switch hypercall handlers there.
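
For reference, a sketch of the agreed file -- the bodies would move
verbatim from traps.c; do_fpu_taskswitch() below is reproduced from the
then-current code by way of example and may differ in detail:

    /* xen/arch/x86/pv/misc-hypercalls.c -- sketch only */
    #include <xen/hypercall.h>
    #include <xen/sched.h>

    long do_fpu_taskswitch(int set)
    {
        struct vcpu *v = current;

        if ( set )
        {
            v->arch.pv_vcpu.ctrlreg[0] |= X86_CR0_TS;
            stts();
        }
        else
        {
            v->arch.pv_vcpu.ctrlreg[0] &= ~X86_CR0_TS;
            if ( v->fpu_dirtied )
                clts();
        }

        return 0;
    }

    /* do_set_debugreg() and do_get_debugreg() move here likewise. */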


Thread overview: 65+ messages
2017-05-18 17:09 [PATCH for-next v3 00/22] x86: refactor trap handling code Wei Liu
2017-05-18 17:09 ` [PATCH for-next v3 01/22] x86/traps: move privilege instruction emulation code Wei Liu
2017-05-18 17:28   ` Wei Liu
2017-05-29 15:14     ` Jan Beulich
2017-05-30 17:27       ` Wei Liu
2017-05-30 17:30         ` Andrew Cooper
2017-05-31  5:55           ` Jan Beulich
2017-05-31 11:01             ` Wei Liu
2017-05-31 11:05               ` Andrew Cooper
2017-05-31 11:36                 ` Wei Liu
2017-05-31 11:43                 ` Jan Beulich
2017-05-18 17:09 ` [PATCH for-next v3 02/22] x86/traps: move gate op " Wei Liu
2017-05-29 15:15   ` Jan Beulich
2017-05-18 17:09 ` [PATCH for-next v3 03/22] x86/traps: move emulate_invalid_rdtscp Wei Liu
2017-05-29 15:18   ` Jan Beulich
2017-05-18 17:09 ` [PATCH for-next v3 04/22] x86/traps: move emulate_forced_invalid_op Wei Liu
2017-05-29 15:19   ` Jan Beulich
2017-05-18 17:09 ` [PATCH for-next v3 05/22] x86/pv: clean up emulate.c Wei Liu
2017-05-29 15:37   ` Jan Beulich
2017-05-18 17:09 ` [PATCH for-next v3 06/22] x86/traps: move PV hypercall handlers to pv/traps.c Wei Liu
2017-05-29 15:40   ` Jan Beulich
2017-05-30 17:40     ` Andrew Cooper
2017-05-31  5:59       ` Jan Beulich
2017-05-31 11:14         ` Wei Liu
2017-05-31 11:45           ` Jan Beulich
2017-06-02 11:01             ` Wei Liu
2017-06-06  7:36               ` Jan Beulich
2017-06-08 11:30                 ` Andrew Cooper
2017-06-08 14:28                   ` Wei Liu
2017-05-18 17:09 ` [PATCH for-next v3 07/22] x86/traps: move pv_inject_event " Wei Liu
2017-05-29 15:42   ` Jan Beulich
2017-05-18 17:09 ` [PATCH for-next v3 08/22] x86/traps: move set_guest_{machinecheck, nmi}_trapbounce Wei Liu
2017-05-29 15:43   ` Jan Beulich
2017-05-18 17:09 ` [PATCH for-next v3 09/22] x86/traps: move {un, }register_guest_nmi_callback Wei Liu
2017-05-18 17:09 ` [PATCH for-next v3 10/22] x86/traps: declare percpu softirq_trap Wei Liu
2017-05-29 15:49   ` Jan Beulich
2017-05-31 11:35     ` Wei Liu
2017-05-31 11:46       ` Jan Beulich
2017-05-31 11:54         ` Wei Liu
2017-05-18 17:09 ` [PATCH for-next v3 11/22] x86/traps: move guest_has_trap_callback to pv/traps.c Wei Liu
2017-05-29 15:54   ` Jan Beulich
2017-05-18 17:09 ` [PATCH for-next v3 12/22] x86/traps: move send_guest_trap " Wei Liu
2017-05-29 15:55   ` Jan Beulich
2017-06-05 17:08     ` Wei Liu
2017-06-06  7:37       ` Jan Beulich
2017-05-18 17:09 ` [PATCH for-next v3 13/22] x86/traps: move toggle_guest_mode Wei Liu
2017-05-29 16:05   ` Jan Beulich
2017-05-30 17:47     ` Andrew Cooper
2017-05-31  6:00       ` Jan Beulich
2017-05-18 17:09 ` [PATCH for-next v3 14/22] x86/traps: move do_iret to pv/traps.c Wei Liu
2017-05-29 16:07   ` Jan Beulich
2017-05-18 17:09 ` [PATCH for-next v3 15/22] x86/traps: move init_int80_direct_trap Wei Liu
2017-05-29 16:07   ` Jan Beulich
2017-05-18 17:09 ` [PATCH for-next v3 16/22] x86/traps: move callback_op code Wei Liu
2017-05-29 16:09   ` Jan Beulich
2017-05-18 17:09 ` [PATCH for-next v3 17/22] x86/traps: move hypercall_page_initialise_ring3_kernel Wei Liu
2017-05-29 16:10   ` Jan Beulich
2017-05-18 17:10 ` [PATCH for-next v3 18/22] x86/traps: merge x86_64/compat/traps.c into pv/traps.c Wei Liu
2017-05-29 16:12   ` Jan Beulich
2017-05-18 17:10 ` [PATCH for-next v3 19/22] x86: clean up pv/traps.c Wei Liu
2017-05-29 16:18   ` Jan Beulich
2017-05-18 17:10 ` [PATCH for-next v3 20/22] x86: guest_has_trap_callback should return bool Wei Liu
2017-05-18 17:10 ` [PATCH for-next v3 21/22] x86: fix coding style issues in asm-x86/traps.h Wei Liu
2017-05-18 17:10 ` [PATCH for-next v3 22/22] x86: clean up traps.c Wei Liu
2017-05-29 16:21   ` Jan Beulich
