* [PATCH v4 00/17] x86/PMU: Xen PMU PV support
@ 2014-01-21 19:08 Boris Ostrovsky
  2014-01-21 19:08 ` [PATCH v4 01/17] common/symbols: Export hypervisor symbols to privileged guest Boris Ostrovsky
                   ` (16 more replies)
  0 siblings, 17 replies; 48+ messages in thread
From: Boris Ostrovsky @ 2014-01-21 19:08 UTC (permalink / raw)
  To: xen-devel
  Cc: keir, suravee.suthikulpanit, andrew.cooper3, eddie.dong,
	dietmar.hahn, JBeulich, jun.nakajima, boris.ostrovsky

Here is the fourth version of the PV (and now PVH) PMU patches.

The following patch series adds PMU support in Xen for PV(H)
guests. There is a companion patchset for Linux kernel. In addition,
another set of changes will be provided (later) for userland perf
code.

This version has the following limitations:
* For accurate profiling of dom0/Xen, dom0 VCPUs should be pinned.
* Hypervisor code is only profiled on processors that have running dom0 VCPUs
on them.
* No backtrace support.
* Will fail to load under XSM: we ran out of bits in the permissions vector and
this needs to be fixed separately.


A few notes that may help reviewing: 

* A shared data structure (xenpmu_data_t) between each PV VCPU and hypervisor
CPU is used for passing registers' values as well as PMU state at the time of
the PMU interrupt.
* PMU interrupts are taken by the hypervisor either as NMIs or regular vector
interrupts for both HVM and PV(H). The interrupts are sent as NMIs to HVM guests
and as virtual interrupts to PV(H) guests.
* A PV guest's interrupt handler does not read/write PMU MSRs directly. Instead, it
accesses xenpmu_data_t and flushes it to HW before returning.
* PMU mode is controlled at runtime via /sys/hypervisor/pmu/pmu/{pmu_mode,pmu_flags}
in addition to the 'vpmu' boot option (which is preserved for backward compatibility).
The following modes are provided:
  * disable: VPMU is off
  * enable: VPMU is on. Guests can profile themselves, dom0 profiles itself and Xen
  * priv_enable: dom0 only profiling. dom0 collects samples for everyone. Sampling
    in guests is suspended.
* The /proc/xen/xensyms file exports the hypervisor's symbols to dom0 (similar to
/proc/kallsyms); a small read example follows this list.
* The VPMU infrastructure is now used for HVM, PV and PVH and therefore has been moved
up from the hvm subtree.
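
As a quick illustration of the xensyms point above, here is a minimal dom0
userland sketch that walks the exported file. The kallsyms-like
"<address> <type> <name>" line format is an assumption based on the
/proc/kallsyms analogy, not something defined by this series:

  #include <stdio.h>

  int main(void)
  {
      unsigned long long addr;
      char type, name[128];                       /* XEN_KSYM_NAME_LEN + 1 */
      FILE *f = fopen("/proc/xen/xensyms", "r");  /* assumed line format */

      if ( !f )
          return 1;
      while ( fscanf(f, "%llx %c %127s", &addr, &type, name) == 3 )
          printf("%016llx %c %s\n", addr, type, name);
      fclose(f);
      return 0;
  }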


Changes in v4:

* Added support for PVH guests:
  o changes in pvpmu_init() to accommodate both PV and PVH guests, still in patch 10 
  o more careful use of is_hvm_domain
  o Additional patch (16)
* Moved HVM interrupt handling out of vpmu_do_interrupt() for NMI-safe handling
* Fixed dom0's VCPU selection in privileged mode
* Added a cast to cpu_user_regs_t in the register copy for 32-bit PV guests in vpmu_do_interrupt()
  (don't want to expose compat_cpu_user_regs in a public header)
* Renamed public structures by prefixing them with "xen_"
* Added an entry for xenpf_symdata in xlat.lst
* Fixed pv_cpuid check for vpmu-specific cpuid adjustments
* Various code style fixes
* Eliminated anonymous unions
* Added more verbiage to NMI patch description


Changes in v3:

* Moved PMU MSR banks out from architectural context data structures to allow
for future expansion without protocol changes
* PMU interrupts can be either NMIs or regular vector interrupts (the latter
is the default)
* Context is now marked as PMU_CACHED by the hypervisor code to avoid certain
race conditions with the guest
* Fixed races with PV guest in MSR access handlers
* More Intel VPMU cleanup
* Moved NMI-unsafe code from NMI handler
* Dropped changes to vcpu->is_running
* Added LVTPC apic handling (cached for PV guests)
* Separated privileged profiling mode into a standalone patch
* Separated NMI handling into a standalone patch


Changes in v2:

* Xen symbols are exported as a data structure (as opposed to a set of formatted
strings in v1). Even though one symbol per hypercall is returned, performance
appears to be acceptable: reading the whole file from dom0 userland takes on average
about twice as long as reading /proc/kallsyms
* More cleanup of Intel VPMU code to simplify publicly exported structures
* There are architecture-independent and x86-specific public include files (ARM
has a stub)
* General cleanup of public include files to make them more presentable (and
to make auto doc generation better)
* Setting of vcpu->is_running is now done on ARM in schedule_tail as well (making
changes to common/schedule.c architecture-independent). Note that this is not
tested since I don't have access to ARM hardware.
* PCPU ID of interrupted processor is now passed to PV guest


Boris Ostrovsky (17):
  common/symbols: Export hypervisor symbols to privileged guest
  x86/VPMU: Stop AMD counters when called from vpmu_save_force()
  x86/VPMU: Minor VPMU cleanup
  intel/VPMU: Clean up Intel VPMU code
  x86/VPMU: Handle APIC_LVTPC accesses
  intel/VPMU: MSR_CORE_PERF_GLOBAL_CTRL should be initialized to zero
  x86/VPMU: Add public xenpmu.h
  x86/VPMU: Make vpmu not HVM-specific
  x86/VPMU: Interface for setting PMU mode and flags
  x86/VPMU: Initialize PMU for PV guests
  x86/VPMU: Add support for PMU register handling on PV guests
  x86/VPMU: Handle PMU interrupts for PV guests
  x86/VPMU: Add privileged PMU mode
  x86/VPMU: Save VPMU state for PV guests during context switch
  x86/VPMU: NMI-based VPMU support
  x86/VPMU: Suport for PVH guests
  x86/VPMU: Move VPMU files up from hvm/ directory

 xen/arch/x86/Makefile                    |   1 +
 xen/arch/x86/domain.c                    |  18 +-
 xen/arch/x86/hvm/Makefile                |   1 -
 xen/arch/x86/hvm/hvm.c                   |   3 +-
 xen/arch/x86/hvm/svm/Makefile            |   1 -
 xen/arch/x86/hvm/svm/vpmu.c              | 494 ----------------
 xen/arch/x86/hvm/vlapic.c                |   5 +-
 xen/arch/x86/hvm/vmx/Makefile            |   1 -
 xen/arch/x86/hvm/vmx/vmcs.c              |  55 ++
 xen/arch/x86/hvm/vmx/vmx.c               |   4 +-
 xen/arch/x86/hvm/vmx/vpmu_core2.c        | 931 ------------------------------
 xen/arch/x86/hvm/vpmu.c                  | 266 ---------
 xen/arch/x86/oprofile/op_model_ppro.c    |   8 +-
 xen/arch/x86/platform_hypercall.c        |  18 +
 xen/arch/x86/traps.c                     |  35 +-
 xen/arch/x86/vpmu.c                      | 671 ++++++++++++++++++++++
 xen/arch/x86/vpmu_amd.c                  | 499 ++++++++++++++++
 xen/arch/x86/vpmu_intel.c                | 936 +++++++++++++++++++++++++++++++
 xen/arch/x86/x86_64/compat/entry.S       |   4 +
 xen/arch/x86/x86_64/entry.S              |   4 +
 xen/arch/x86/x86_64/platform_hypercall.c |   2 +
 xen/common/event_channel.c               |   1 +
 xen/common/symbols.c                     |  50 +-
 xen/common/vsprintf.c                    |   2 +-
 xen/include/asm-x86/domain.h             |   2 +
 xen/include/asm-x86/hvm/vcpu.h           |   3 -
 xen/include/asm-x86/hvm/vmx/vmcs.h       |   4 +-
 xen/include/asm-x86/hvm/vmx/vpmu_core2.h |  51 --
 xen/include/asm-x86/hvm/vpmu.h           | 104 ----
 xen/include/asm-x86/vpmu.h               |  98 ++++
 xen/include/public/arch-x86/xenpmu.h     |  66 +++
 xen/include/public/platform.h            |  19 +
 xen/include/public/xen.h                 |   2 +
 xen/include/public/xenpmu.h              | 102 ++++
 xen/include/xen/hypercall.h              |   4 +
 xen/include/xen/softirq.h                |   1 +
 xen/include/xen/symbols.h                |   7 +-
 xen/include/xlat.lst                     |   1 +
 38 files changed, 2601 insertions(+), 1873 deletions(-)
 delete mode 100644 xen/arch/x86/hvm/svm/vpmu.c
 delete mode 100644 xen/arch/x86/hvm/vmx/vpmu_core2.c
 delete mode 100644 xen/arch/x86/hvm/vpmu.c
 create mode 100644 xen/arch/x86/vpmu.c
 create mode 100644 xen/arch/x86/vpmu_amd.c
 create mode 100644 xen/arch/x86/vpmu_intel.c
 delete mode 100644 xen/include/asm-x86/hvm/vmx/vpmu_core2.h
 delete mode 100644 xen/include/asm-x86/hvm/vpmu.h
 create mode 100644 xen/include/asm-x86/vpmu.h
 create mode 100644 xen/include/public/arch-x86/xenpmu.h
 create mode 100644 xen/include/public/xenpmu.h

-- 
1.8.1.4


* [PATCH v4 01/17] common/symbols: Export hypervisor symbols to privileged guest
  2014-01-21 19:08 [PATCH v4 00/17] x86/PMU: Xen PMU PV support Boris Ostrovsky
@ 2014-01-21 19:08 ` Boris Ostrovsky
  2014-01-24 14:16   ` Jan Beulich
  2014-01-21 19:08 ` [PATCH v4 02/17] x86/VPMU: Stop AMD counters when called from vpmu_save_force() Boris Ostrovsky
                   ` (15 subsequent siblings)
  16 siblings, 1 reply; 48+ messages in thread
From: Boris Ostrovsky @ 2014-01-21 19:08 UTC (permalink / raw)
  To: xen-devel
  Cc: keir, suravee.suthikulpanit, andrew.cooper3, eddie.dong,
	dietmar.hahn, JBeulich, jun.nakajima, boris.ostrovsky

Export Xen's symbols as {<address><type><name>} triplets via the new
XENPF_get_symbol hypercall.
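
For reference, a rough sketch of how a privileged guest might walk the new
hypercall. This is a hypothetical caller, not part of the series: the
xenpf_symdata fields and XEN_KSYM_NAME_LEN match the patch below, but the
platform-op plumbing is an assumption.

    struct xen_platform_op op = {
        .cmd = XENPF_get_symbol,
        .interface_version = XENPF_INTERFACE_VERSION,
    };
    char name[XEN_KSYM_NAME_LEN + 1];
    uint32_t prev;

    set_xen_guest_handle(op.u.symdata.u.name, name);
    op.u.symdata.symnum = 0;
    for ( ; ; )
    {
        prev = op.u.symdata.symnum;
        if ( HYPERVISOR_platform_op(&op) != 0 )
            break;                          /* error */
        if ( op.u.symdata.symnum == prev )
            break;                          /* all symbols have been read */
        /* consume op.u.symdata.address, op.u.symdata.type and name[] here */
    }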

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
 xen/arch/x86/platform_hypercall.c        | 18 ++++++++++++
 xen/arch/x86/x86_64/platform_hypercall.c |  2 ++
 xen/common/symbols.c                     | 50 +++++++++++++++++++++++++++++++-
 xen/common/vsprintf.c                    |  2 +-
 xen/include/public/platform.h            | 19 ++++++++++++
 xen/include/xen/symbols.h                |  7 +++--
 xen/include/xlat.lst                     |  1 +
 7 files changed, 95 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/platform_hypercall.c b/xen/arch/x86/platform_hypercall.c
index 2162811..cdb6886 100644
--- a/xen/arch/x86/platform_hypercall.c
+++ b/xen/arch/x86/platform_hypercall.c
@@ -23,6 +23,7 @@
 #include <xen/cpu.h>
 #include <xen/pmstat.h>
 #include <xen/irq.h>
+#include <xen/symbols.h>
 #include <asm/current.h>
 #include <public/platform.h>
 #include <acpi/cpufreq/processor_perf.h>
@@ -601,6 +602,23 @@ ret_t do_platform_op(XEN_GUEST_HANDLE_PARAM(xen_platform_op_t) u_xenpf_op)
     }
     break;
 
+    case XENPF_get_symbol:
+    {
+        char name[XEN_KSYM_NAME_LEN + 1];
+        XEN_GUEST_HANDLE_64(char) nameh;
+
+        guest_from_compat_handle(nameh, op->u.symdata.u.name);
+
+        ret = xensyms_read(&op->u.symdata.symnum, &op->u.symdata.type,
+                           &op->u.symdata.address, name);
+
+        if ( !ret && copy_to_guest(nameh, name, XEN_KSYM_NAME_LEN + 1) )
+            ret = -EFAULT;
+        if ( !ret && __copy_field_to_guest(u_xenpf_op, op, u.symdata) )
+            ret = -EFAULT;
+    }
+    break;
+ 
     default:
         ret = -ENOSYS;
         break;
diff --git a/xen/arch/x86/x86_64/platform_hypercall.c b/xen/arch/x86/x86_64/platform_hypercall.c
index b6f380e..795837f 100644
--- a/xen/arch/x86/x86_64/platform_hypercall.c
+++ b/xen/arch/x86/x86_64/platform_hypercall.c
@@ -32,6 +32,8 @@ CHECK_pf_pcpu_version;
 CHECK_pf_enter_acpi_sleep;
 #undef xen_pf_enter_acpi_sleep
 
+#define xenpf_symdata   compat_pf_symdata
+
 #define COMPAT
 #define _XEN_GUEST_HANDLE(t) XEN_GUEST_HANDLE(t)
 #define _XEN_GUEST_HANDLE_PARAM(t) XEN_GUEST_HANDLE_PARAM(t)
diff --git a/xen/common/symbols.c b/xen/common/symbols.c
index 45941e1..98f9534 100644
--- a/xen/common/symbols.c
+++ b/xen/common/symbols.c
@@ -17,6 +17,8 @@
 #include <xen/lib.h>
 #include <xen/string.h>
 #include <xen/spinlock.h>
+#include <public/platform.h>
+#include <xen/guest_access.h>
 
 #ifdef SYMBOLS_ORIGIN
 extern const unsigned int symbols_offsets[1];
@@ -107,7 +109,7 @@ const char *symbols_lookup(unsigned long addr,
     unsigned long i, low, high, mid;
     unsigned long symbol_end = 0;
 
-    namebuf[KSYM_NAME_LEN] = 0;
+    namebuf[XEN_KSYM_NAME_LEN] = 0;
     namebuf[0] = 0;
 
     if (!is_active_kernel_text(addr))
@@ -148,3 +150,49 @@ const char *symbols_lookup(unsigned long addr,
     *offset = addr - symbols_address(low);
     return namebuf;
 }
+
+/*
+ * Get symbol type information. This is encoded as a single char at the
+ * beginning of the symbol name.
+ */
+static char symbols_get_symbol_type(unsigned int off)
+{
+    /*
+     * Get just the first code, look it up in the token table,
+     * and return the first char from this token.
+     */
+    return symbols_token_table[symbols_token_index[symbols_names[off + 1]]];
+}
+
+/*
+ * Symbols are most likely accessed sequentially so we remember position from
+ * previous read. This can help us avoid the extra call to get_symbol_offset().
+ */
+static uint64_t next_symbol, next_offset;
+static DEFINE_SPINLOCK(symbols_mutex);
+
+int xensyms_read(uint32_t *symnum, uint32_t *type, uint64_t *address, char *name)
+{
+    if ( *symnum > symbols_num_syms )
+        return -ERANGE;
+    if ( *symnum == symbols_num_syms )
+        return 0;
+
+    spin_lock(&symbols_mutex);
+
+    if ( *symnum == 0 )
+        next_offset = next_symbol = 0;
+    if ( next_symbol != *symnum )
+        /* Non-sequential access */
+        next_offset = get_symbol_offset(*symnum);
+
+    *type = symbols_get_symbol_type(next_offset);
+    next_offset = symbols_expand_symbol(next_offset, name);
+    *address = symbols_offsets[*symnum] + SYMBOLS_ORIGIN;
+
+    next_symbol = ++(*symnum);
+
+    spin_unlock(&symbols_mutex);
+
+    return 0;
+}
diff --git a/xen/common/vsprintf.c b/xen/common/vsprintf.c
index 1a6198e..c5ae187 100644
--- a/xen/common/vsprintf.c
+++ b/xen/common/vsprintf.c
@@ -275,7 +275,7 @@ static char *pointer(char *str, char *end, const char **fmt_ptr,
     case 'S': /* Symbol name unconditionally with offset and size */
     {
         unsigned long sym_size, sym_offset;
-        char namebuf[KSYM_NAME_LEN+1];
+        char namebuf[XEN_KSYM_NAME_LEN+1];
 
         /* Advance parents fmt string, as we have consumed 's' or 'S' */
         ++*fmt_ptr;
diff --git a/xen/include/public/platform.h b/xen/include/public/platform.h
index 4341f54..ba9da49 100644
--- a/xen/include/public/platform.h
+++ b/xen/include/public/platform.h
@@ -527,6 +527,24 @@ struct xenpf_core_parking {
 typedef struct xenpf_core_parking xenpf_core_parking_t;
 DEFINE_XEN_GUEST_HANDLE(xenpf_core_parking_t);
 
+#define XENPF_get_symbol   61
+#define XEN_KSYM_NAME_LEN 127
+struct xenpf_symdata {
+    /* IN variables */
+    uint32_t symnum;
+
+    /* OUT variables */
+    uint32_t type;
+    uint64_t address;
+
+    union {
+        XEN_GUEST_HANDLE(char) name;
+        uint64_t pad;
+    } u;
+};
+typedef struct xenpf_symdata xenpf_symdata_t;
+DEFINE_XEN_GUEST_HANDLE(xenpf_symdata_t);
+
 /*
  * ` enum neg_errnoval
  * ` HYPERVISOR_platform_op(const struct xen_platform_op*);
@@ -553,6 +571,7 @@ struct xen_platform_op {
         struct xenpf_cpu_hotadd        cpu_add;
         struct xenpf_mem_hotadd        mem_add;
         struct xenpf_core_parking      core_parking;
+        struct xenpf_symdata           symdata;
         uint8_t                        pad[128];
     } u;
 };
diff --git a/xen/include/xen/symbols.h b/xen/include/xen/symbols.h
index 87cd77d..adbf91d 100644
--- a/xen/include/xen/symbols.h
+++ b/xen/include/xen/symbols.h
@@ -2,8 +2,8 @@
 #define _XEN_SYMBOLS_H
 
 #include <xen/types.h>
-
-#define KSYM_NAME_LEN 127
+#include <public/xen.h>
+#include <public/platform.h>
 
 /* Lookup an address. */
 const char *symbols_lookup(unsigned long addr,
@@ -11,4 +11,7 @@ const char *symbols_lookup(unsigned long addr,
                            unsigned long *offset,
                            char *namebuf);
 
+extern int xensyms_read(uint32_t *symnum, uint32_t *type,
+                        uint64_t *address, char *name);
+
 #endif /*_XEN_SYMBOLS_H*/
diff --git a/xen/include/xlat.lst b/xen/include/xlat.lst
index f00cef3..cf89583 100644
--- a/xen/include/xlat.lst
+++ b/xen/include/xlat.lst
@@ -84,6 +84,7 @@
 ?	processor_px			platform.h
 !	psd_package			platform.h
 ?	xenpf_enter_acpi_sleep		platform.h
+!	xenpf_symdata			platform.h
 ?	xenpf_pcpuinfo			platform.h
 ?	xenpf_pcpu_version		platform.h
 !	sched_poll			sched.h
-- 
1.8.1.4


* [PATCH v4 02/17] x86/VPMU: Stop AMD counters when called from vpmu_save_force()
  2014-01-21 19:08 [PATCH v4 00/17] x86/PMU: Xen PMU PV support Boris Ostrovsky
  2014-01-21 19:08 ` [PATCH v4 01/17] common/symbols: Export hypervisor symbols to privileged guest Boris Ostrovsky
@ 2014-01-21 19:08 ` Boris Ostrovsky
  2014-01-21 19:08 ` [PATCH v4 03/17] x86/VPMU: Minor VPMU cleanup Boris Ostrovsky
                   ` (14 subsequent siblings)
  16 siblings, 0 replies; 48+ messages in thread
From: Boris Ostrovsky @ 2014-01-21 19:08 UTC (permalink / raw)
  To: xen-devel
  Cc: keir, suravee.suthikulpanit, andrew.cooper3, eddie.dong,
	dietmar.hahn, JBeulich, jun.nakajima, boris.ostrovsky

Change amd_vpmu_save() algorithm to accommodate cases when we need
to stop counters from vpmu_save_force() (needed by subsequent PMU
patches).

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 xen/arch/x86/hvm/svm/vpmu.c | 14 ++++----------
 xen/arch/x86/hvm/vpmu.c     | 12 ++++++------
 2 files changed, 10 insertions(+), 16 deletions(-)

diff --git a/xen/arch/x86/hvm/svm/vpmu.c b/xen/arch/x86/hvm/svm/vpmu.c
index 66a3815..bec40d8 100644
--- a/xen/arch/x86/hvm/svm/vpmu.c
+++ b/xen/arch/x86/hvm/svm/vpmu.c
@@ -223,22 +223,16 @@ static int amd_vpmu_save(struct vcpu *v)
     struct amd_vpmu_context *ctx = vpmu->context;
     unsigned int i;
 
-    /*
-     * Stop the counters. If we came here via vpmu_save_force (i.e.
-     * when VPMU_CONTEXT_SAVE is set) counters are already stopped.
-     */
-    if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_SAVE) )
+    if ( !vpmu_is_set(vpmu, VPMU_FROZEN) )
     {
-        vpmu_set(vpmu, VPMU_FROZEN);
-
         for ( i = 0; i < num_counters; i++ )
             wrmsrl(ctrls[i], 0);
 
-        return 0;
+        vpmu_set(vpmu, VPMU_FROZEN);
     }
 
-    if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) )
-        return 0;
+    if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_SAVE) )
+            return 0;
 
     context_save(v);
 
diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c
index 21fbaba..a4e3664 100644
--- a/xen/arch/x86/hvm/vpmu.c
+++ b/xen/arch/x86/hvm/vpmu.c
@@ -127,13 +127,19 @@ static void vpmu_save_force(void *arg)
     struct vcpu *v = (struct vcpu *)arg;
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
 
+    if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
+        return;
+
     if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) )
         return;
 
+    vpmu_set(vpmu, VPMU_CONTEXT_SAVE);
+
     if ( vpmu->arch_vpmu_ops )
         (void)vpmu->arch_vpmu_ops->arch_vpmu_save(v);
 
     vpmu_reset(vpmu, VPMU_CONTEXT_SAVE);
+    vpmu_reset(vpmu, VPMU_CONTEXT_LOADED);
 
     per_cpu(last_vcpu, smp_processor_id()) = NULL;
 }
@@ -177,12 +183,8 @@ void vpmu_load(struct vcpu *v)
          * before saving the context.
          */
         if ( vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) )
-        {
-            vpmu_set(vpmu, VPMU_CONTEXT_SAVE);
             on_selected_cpus(cpumask_of(vpmu->last_pcpu),
                              vpmu_save_force, (void *)v, 1);
-            vpmu_reset(vpmu, VPMU_CONTEXT_LOADED);
-        }
     } 
 
     /* Prevent forced context save from remote CPU */
@@ -195,9 +197,7 @@ void vpmu_load(struct vcpu *v)
         vpmu = vcpu_vpmu(prev);
 
         /* Someone ran here before us */
-        vpmu_set(vpmu, VPMU_CONTEXT_SAVE);
         vpmu_save_force(prev);
-        vpmu_reset(vpmu, VPMU_CONTEXT_LOADED);
 
         vpmu = vcpu_vpmu(v);
     }
-- 
1.8.1.4


* [PATCH v4 03/17] x86/VPMU: Minor VPMU cleanup
  2014-01-21 19:08 [PATCH v4 00/17] x86/PMU: Xen PMU PV support Boris Ostrovsky
  2014-01-21 19:08 ` [PATCH v4 01/17] common/symbols: Export hypervisor symbols to privileged guest Boris Ostrovsky
  2014-01-21 19:08 ` [PATCH v4 02/17] x86/VPMU: Stop AMD counters when called from vpmu_save_force() Boris Ostrovsky
@ 2014-01-21 19:08 ` Boris Ostrovsky
  2014-01-24 14:28   ` Jan Beulich
  2014-01-21 19:08 ` [PATCH v4 04/17] intel/VPMU: Clean up Intel VPMU code Boris Ostrovsky
                   ` (13 subsequent siblings)
  16 siblings, 1 reply; 48+ messages in thread
From: Boris Ostrovsky @ 2014-01-21 19:08 UTC (permalink / raw)
  To: xen-devel
  Cc: keir, suravee.suthikulpanit, andrew.cooper3, eddie.dong,
	dietmar.hahn, JBeulich, jun.nakajima, boris.ostrovsky

Update macros that modify VPMU flags to allow changing multiple bits at once.

Make sure that we only touch MSR bitmap on HVM guests (both VMX and SVM). This
is needed by subsequent PMU patches.
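
For clarity, a stand-alone illustration of the any-bit vs. all-bits check
introduced here; the flag values are made up for the example, only the macro
logic mirrors the patch:

  #include <stdio.h>

  #define CTXT_ALLOCATED 0x1                        /* example values only */
  #define CTXT_LOADED    0x2
  #define is_set(f, x)      ((f) & (x))             /* any of the bits set */
  #define is_set_all(f, x)  (((f) & (x)) == (x))    /* all of the bits set */

  int main(void)
  {
      unsigned int flags = CTXT_ALLOCATED;
      unsigned int both  = CTXT_ALLOCATED | CTXT_LOADED;

      /* prints "any: 1, all: 0" */
      printf("any: %d, all: %d\n",
             !!is_set(flags, both), !!is_set_all(flags, both));
      return 0;
  }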

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 xen/arch/x86/hvm/svm/vpmu.c       | 14 +++++++++-----
 xen/arch/x86/hvm/vmx/vpmu_core2.c |  9 +++------
 xen/arch/x86/hvm/vpmu.c           | 11 +++--------
 xen/include/asm-x86/hvm/vpmu.h    |  9 +++++----
 4 files changed, 20 insertions(+), 23 deletions(-)

diff --git a/xen/arch/x86/hvm/svm/vpmu.c b/xen/arch/x86/hvm/svm/vpmu.c
index bec40d8..84b8a36 100644
--- a/xen/arch/x86/hvm/svm/vpmu.c
+++ b/xen/arch/x86/hvm/svm/vpmu.c
@@ -236,7 +236,8 @@ static int amd_vpmu_save(struct vcpu *v)
 
     context_save(v);
 
-    if ( !vpmu_is_set(vpmu, VPMU_RUNNING) && ctx->msr_bitmap_set )
+    if ( !is_pv_domain(v->domain) && 
+        !vpmu_is_set(vpmu, VPMU_RUNNING) && ctx->msr_bitmap_set )
         amd_vpmu_unset_msr_bitmap(v);
 
     return 1;
@@ -276,7 +277,7 @@ static int amd_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
 
     /* For all counters, enable guest only mode for HVM guest */
-    if ( (get_pmu_reg_type(msr) == MSR_TYPE_CTRL) &&
+    if ( !is_pv_domain(v->domain) && (get_pmu_reg_type(msr) == MSR_TYPE_CTRL) &&
         !(is_guest_mode(msr_content)) )
     {
         set_guest_mode(msr_content);
@@ -292,7 +293,8 @@ static int amd_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
         apic_write(APIC_LVTPC, PMU_APIC_VECTOR);
         vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR;
 
-        if ( !((struct amd_vpmu_context *)vpmu->context)->msr_bitmap_set )
+        if ( !is_pv_domain(v->domain) &&
+             !((struct amd_vpmu_context *)vpmu->context)->msr_bitmap_set )
             amd_vpmu_set_msr_bitmap(v);
     }
 
@@ -303,7 +305,8 @@ static int amd_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
         apic_write(APIC_LVTPC, PMU_APIC_VECTOR | APIC_LVT_MASKED);
         vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR | APIC_LVT_MASKED;
         vpmu_reset(vpmu, VPMU_RUNNING);
-        if ( ((struct amd_vpmu_context *)vpmu->context)->msr_bitmap_set )
+        if ( !is_pv_domain(v->domain) &&
+             ((struct amd_vpmu_context *)vpmu->context)->msr_bitmap_set )
             amd_vpmu_unset_msr_bitmap(v);
         release_pmu_ownship(PMU_OWNER_HVM);
     }
@@ -395,7 +398,8 @@ static void amd_vpmu_destroy(struct vcpu *v)
     if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
         return;
 
-    if ( ((struct amd_vpmu_context *)vpmu->context)->msr_bitmap_set )
+    if ( !is_pv_domain(v->domain) &&
+         ((struct amd_vpmu_context *)vpmu->context)->msr_bitmap_set )
         amd_vpmu_unset_msr_bitmap(v);
 
     xfree(vpmu->context);
diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c
index ee26362..5368670 100644
--- a/xen/arch/x86/hvm/vmx/vpmu_core2.c
+++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c
@@ -326,10 +326,7 @@ static int core2_vpmu_save(struct vcpu *v)
 {
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
 
-    if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_SAVE) )
-        return 0;
-
-    if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) ) 
+    if ( !vpmu_is_set_all(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED) )
         return 0;
 
     __core2_vpmu_save(v);
@@ -446,7 +443,7 @@ static int core2_vpmu_msr_common_check(u32 msr_index, int *type, int *index)
     {
         __core2_vpmu_load(current);
         vpmu_set(vpmu, VPMU_CONTEXT_LOADED);
-        if ( cpu_has_vmx_msr_bitmap )
+        if ( cpu_has_vmx_msr_bitmap && !is_pv_domain(current->domain) )
             core2_vpmu_set_msr_bitmap(current->arch.hvm_vmx.msr_bitmap);
     }
     return 1;
@@ -813,7 +810,7 @@ static void core2_vpmu_destroy(struct vcpu *v)
         return;
     xfree(core2_vpmu_cxt->pmu_enable);
     xfree(vpmu->context);
-    if ( cpu_has_vmx_msr_bitmap )
+    if ( cpu_has_vmx_msr_bitmap && !is_pv_domain(v->domain) )
         core2_vpmu_unset_msr_bitmap(v->arch.hvm_vmx.msr_bitmap);
     release_pmu_ownship(PMU_OWNER_HVM);
     vpmu_reset(vpmu, VPMU_CONTEXT_ALLOCATED);
diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c
index a4e3664..d6a9ff6 100644
--- a/xen/arch/x86/hvm/vpmu.c
+++ b/xen/arch/x86/hvm/vpmu.c
@@ -127,10 +127,7 @@ static void vpmu_save_force(void *arg)
     struct vcpu *v = (struct vcpu *)arg;
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
 
-    if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
-        return;
-
-    if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) )
+    if ( !vpmu_is_set_all(vpmu, VPMU_CONTEXT_ALLOCATED | VPMU_CONTEXT_LOADED) )
         return;
 
     vpmu_set(vpmu, VPMU_CONTEXT_SAVE);
@@ -138,8 +135,7 @@ static void vpmu_save_force(void *arg)
     if ( vpmu->arch_vpmu_ops )
         (void)vpmu->arch_vpmu_ops->arch_vpmu_save(v);
 
-    vpmu_reset(vpmu, VPMU_CONTEXT_SAVE);
-    vpmu_reset(vpmu, VPMU_CONTEXT_LOADED);
+    vpmu_reset(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED);
 
     per_cpu(last_vcpu, smp_processor_id()) = NULL;
 }
@@ -149,8 +145,7 @@ void vpmu_save(struct vcpu *v)
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
     int pcpu = smp_processor_id();
 
-    if ( !(vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) &&
-           vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED)) )
+    if ( !vpmu_is_set_all(vpmu, VPMU_CONTEXT_ALLOCATED | VPMU_CONTEXT_LOADED) )
        return;
 
     vpmu->last_pcpu = pcpu;
diff --git a/xen/include/asm-x86/hvm/vpmu.h b/xen/include/asm-x86/hvm/vpmu.h
index 40f63fb..2a713be 100644
--- a/xen/include/asm-x86/hvm/vpmu.h
+++ b/xen/include/asm-x86/hvm/vpmu.h
@@ -81,10 +81,11 @@ struct vpmu_struct {
 #define VPMU_CPU_HAS_BTS                    0x200 /* Has Branch Trace Store */
 
 
-#define vpmu_set(_vpmu, _x)    ((_vpmu)->flags |= (_x))
-#define vpmu_reset(_vpmu, _x)  ((_vpmu)->flags &= ~(_x))
-#define vpmu_is_set(_vpmu, _x) ((_vpmu)->flags & (_x))
-#define vpmu_clear(_vpmu)      ((_vpmu)->flags = 0)
+#define vpmu_set(_vpmu, _x)         ((_vpmu)->flags |= (_x))
+#define vpmu_reset(_vpmu, _x)       ((_vpmu)->flags &= ~(_x))
+#define vpmu_is_set(_vpmu, _x)      ((_vpmu)->flags & (_x))
+#define vpmu_is_set_all(_vpmu, _x)  (((_vpmu)->flags & (_x)) == (_x))
+#define vpmu_clear(_vpmu)           ((_vpmu)->flags = 0)
 
 int vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content);
 int vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content);
-- 
1.8.1.4


* [PATCH v4 04/17] intel/VPMU: Clean up Intel VPMU code
  2014-01-21 19:08 [PATCH v4 00/17] x86/PMU: Xen PMU PV support Boris Ostrovsky
                   ` (2 preceding siblings ...)
  2014-01-21 19:08 ` [PATCH v4 03/17] x86/VPMU: Minor VPMU cleanup Boris Ostrovsky
@ 2014-01-21 19:08 ` Boris Ostrovsky
  2014-01-21 19:08 ` [PATCH v4 05/17] x86/VPMU: Handle APIC_LVTPC accesses Boris Ostrovsky
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 48+ messages in thread
From: Boris Ostrovsky @ 2014-01-21 19:08 UTC (permalink / raw)
  To: xen-devel
  Cc: keir, suravee.suthikulpanit, andrew.cooper3, eddie.dong,
	dietmar.hahn, JBeulich, jun.nakajima, boris.ostrovsky

Remove struct pmumsr and core2_pmu_enable. Replace static MSR structures with
fields in core2_vpmu_context.

Call core2_get_pmc_count() once, during initialization.

Properly clean up when core2_vpmu_alloc_resource() fails and add routines
to remove MSRs from VMCS.


Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
 xen/arch/x86/hvm/vmx/vmcs.c              |  55 ++++++
 xen/arch/x86/hvm/vmx/vpmu_core2.c        | 310 ++++++++++++++-----------------
 xen/include/asm-x86/hvm/vmx/vmcs.h       |   2 +
 xen/include/asm-x86/hvm/vmx/vpmu_core2.h |  19 --
 4 files changed, 199 insertions(+), 187 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index 44f33cb..5f86b17 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -1205,6 +1205,34 @@ int vmx_add_guest_msr(u32 msr)
     return 0;
 }
 
+void vmx_rm_guest_msr(u32 msr)
+{
+    struct vcpu *curr = current;
+    unsigned int idx, msr_count = curr->arch.hvm_vmx.msr_count;
+    struct vmx_msr_entry *msr_area = curr->arch.hvm_vmx.msr_area;
+
+    if ( msr_area == NULL )
+        return;
+
+    for ( idx = 0; idx < msr_count; idx++ )
+        if ( msr_area[idx].index == msr )
+            break;
+
+    if ( idx == msr_count )
+        return;
+
+    for ( ; idx < msr_count - 1; idx++ )
+    {
+        msr_area[idx].index = msr_area[idx + 1].index;
+        msr_area[idx].data = msr_area[idx + 1].data;
+    }
+    msr_area[msr_count - 1].index = 0;
+
+    curr->arch.hvm_vmx.msr_count = --msr_count;
+    __vmwrite(VM_EXIT_MSR_STORE_COUNT, msr_count);
+    __vmwrite(VM_ENTRY_MSR_LOAD_COUNT, msr_count);
+}
+
 int vmx_add_host_load_msr(u32 msr)
 {
     struct vcpu *curr = current;
@@ -1235,6 +1263,33 @@ int vmx_add_host_load_msr(u32 msr)
     return 0;
 }
 
+void vmx_rm_host_load_msr(u32 msr)
+{
+    struct vcpu *curr = current;
+    unsigned int idx,  msr_count = curr->arch.hvm_vmx.host_msr_count;
+    struct vmx_msr_entry *msr_area = curr->arch.hvm_vmx.host_msr_area;
+
+    if ( msr_area == NULL )
+        return;
+
+    for ( idx = 0; idx < msr_count; idx++ )
+        if ( msr_area[idx].index == msr )
+            break;
+
+    if ( idx == msr_count )
+        return;
+
+    for ( ; idx < msr_count - 1; idx++ )
+    {
+        msr_area[idx].index = msr_area[idx + 1].index;
+        msr_area[idx].data = msr_area[idx + 1].data;
+    }
+    msr_area[msr_count - 1].index = 0;
+
+    curr->arch.hvm_vmx.host_msr_count = --msr_count;
+    __vmwrite(VM_EXIT_MSR_LOAD_COUNT, msr_count);
+}
+
 void vmx_set_eoi_exit_bitmap(struct vcpu *v, u8 vector)
 {
     if ( !test_and_set_bit(vector, v->arch.hvm_vmx.eoi_exit_bitmap) )
diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c
index 5368670..8d920c0 100644
--- a/xen/arch/x86/hvm/vmx/vpmu_core2.c
+++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c
@@ -69,6 +69,26 @@
 static bool_t __read_mostly full_width_write;
 
 /*
+ * MSR_CORE_PERF_FIXED_CTR_CTRL contains the configuration of all fixed
+ * counters. 4 bits for every counter.
+ */
+#define FIXED_CTR_CTRL_BITS 4
+#define FIXED_CTR_CTRL_MASK ((1 << FIXED_CTR_CTRL_BITS) - 1)
+
+#define VPMU_CORE2_MAX_FIXED_PMCS     4
+struct core2_vpmu_context {
+    u64 fixed_ctrl;
+    u64 ds_area;
+    u64 pebs_enable;
+    u64 global_ovf_status;
+    u64 fix_counters[VPMU_CORE2_MAX_FIXED_PMCS];
+    struct arch_msr_pair arch_msr_pair[1];
+};
+
+/* Number of general-purpose and fixed performance counters */
+static unsigned int __read_mostly arch_pmc_cnt, fixed_pmc_cnt;
+
+/*
  * QUIRK to workaround an issue on various family 6 cpus.
  * The issue leads to endless PMC interrupt loops on the processor.
  * If the interrupt handler is running and a pmc reaches the value 0, this
@@ -88,11 +108,8 @@ static void check_pmc_quirk(void)
         is_pmc_quirk = 0;    
 }
 
-static int core2_get_pmc_count(void);
 static void handle_pmc_quirk(u64 msr_content)
 {
-    int num_gen_pmc = core2_get_pmc_count();
-    int num_fix_pmc  = 3;
     int i;
     u64 val;
 
@@ -100,7 +117,7 @@ static void handle_pmc_quirk(u64 msr_content)
         return;
 
     val = msr_content;
-    for ( i = 0; i < num_gen_pmc; i++ )
+    for ( i = 0; i < arch_pmc_cnt; i++ )
     {
         if ( val & 0x1 )
         {
@@ -112,7 +129,7 @@ static void handle_pmc_quirk(u64 msr_content)
         val >>= 1;
     }
     val = msr_content >> 32;
-    for ( i = 0; i < num_fix_pmc; i++ )
+    for ( i = 0; i < fixed_pmc_cnt; i++ )
     {
         if ( val & 0x1 )
         {
@@ -125,75 +142,42 @@ static void handle_pmc_quirk(u64 msr_content)
     }
 }
 
-static const u32 core2_fix_counters_msr[] = {
-    MSR_CORE_PERF_FIXED_CTR0,
-    MSR_CORE_PERF_FIXED_CTR1,
-    MSR_CORE_PERF_FIXED_CTR2
-};
-
 /*
- * MSR_CORE_PERF_FIXED_CTR_CTRL contains the configuration of all fixed
- * counters. 4 bits for every counter.
+ * Read the number of general counters via CPUID.EAX[0xa].EAX[8..15]
  */
-#define FIXED_CTR_CTRL_BITS 4
-#define FIXED_CTR_CTRL_MASK ((1 << FIXED_CTR_CTRL_BITS) - 1)
-
-/* The index into the core2_ctrls_msr[] of this MSR used in core2_vpmu_dump() */
-#define MSR_CORE_PERF_FIXED_CTR_CTRL_IDX 0
-
-/* Core 2 Non-architectual Performance Control MSRs. */
-static const u32 core2_ctrls_msr[] = {
-    MSR_CORE_PERF_FIXED_CTR_CTRL,
-    MSR_IA32_PEBS_ENABLE,
-    MSR_IA32_DS_AREA
-};
-
-struct pmumsr {
-    unsigned int num;
-    const u32 *msr;
-};
-
-static const struct pmumsr core2_fix_counters = {
-    VPMU_CORE2_NUM_FIXED,
-    core2_fix_counters_msr
-};
+static int core2_get_arch_pmc_count(void)
+{
+    u32 eax;
 
-static const struct pmumsr core2_ctrls = {
-    VPMU_CORE2_NUM_CTRLS,
-    core2_ctrls_msr
-};
-static int arch_pmc_cnt;
+    eax = cpuid_eax(0xa);
+    return ( (eax & PMU_GENERAL_NR_MASK) >> PMU_GENERAL_NR_SHIFT );
+}
 
 /*
- * Read the number of general counters via CPUID.EAX[0xa].EAX[8..15]
+ * Read the number of fixed counters via CPUID.EDX[0xa].EDX[0..4]
  */
-static int core2_get_pmc_count(void)
+static int core2_get_fixed_pmc_count(void)
 {
-    u32 eax, ebx, ecx, edx;
-
-    if ( arch_pmc_cnt == 0 )
-    {
-        cpuid(0xa, &eax, &ebx, &ecx, &edx);
-        arch_pmc_cnt = (eax & PMU_GENERAL_NR_MASK) >> PMU_GENERAL_NR_SHIFT;
-    }
+    u32 eax;
 
-    return arch_pmc_cnt;
+    eax = cpuid_eax(0xa);
+    return ( (eax & PMU_FIXED_NR_MASK) >> PMU_FIXED_NR_SHIFT );
 }
 
 static u64 core2_calc_intial_glb_ctrl_msr(void)
 {
-    int arch_pmc_bits = (1 << core2_get_pmc_count()) - 1;
-    u64 fix_pmc_bits  = (1 << 3) - 1;
-    return ((fix_pmc_bits << 32) | arch_pmc_bits);
+    int arch_pmc_bits = (1 << arch_pmc_cnt) - 1;
+    u64 fix_pmc_bits  = (1 << fixed_pmc_cnt) - 1;
+    return ( (fix_pmc_bits << 32) | arch_pmc_bits );
 }
 
 /* edx bits 5-12: Bit width of fixed-function performance counters  */
 static int core2_get_bitwidth_fix_count(void)
 {
-    u32 eax, ebx, ecx, edx;
+    u32 edx;
 
-    cpuid(0xa, &eax, &ebx, &ecx, &edx);
-    return ((edx & PMU_FIXED_WIDTH_MASK) >> PMU_FIXED_WIDTH_SHIFT);
+    edx = cpuid_edx(0xa);
+    return ( (edx & PMU_FIXED_WIDTH_MASK) >> PMU_FIXED_WIDTH_SHIFT );
 }
 
 static int is_core2_vpmu_msr(u32 msr_index, int *type, int *index)
@@ -201,9 +185,9 @@ static int is_core2_vpmu_msr(u32 msr_index, int *type, int *index)
     int i;
     u32 msr_index_pmc;
 
-    for ( i = 0; i < core2_fix_counters.num; i++ )
+    for ( i = 0; i < fixed_pmc_cnt; i++ )
     {
-        if ( core2_fix_counters.msr[i] == msr_index )
+        if ( msr_index == MSR_CORE_PERF_FIXED_CTR0 + i )
         {
             *type = MSR_TYPE_COUNTER;
             *index = i;
@@ -211,14 +195,12 @@ static int is_core2_vpmu_msr(u32 msr_index, int *type, int *index)
         }
     }
 
-    for ( i = 0; i < core2_ctrls.num; i++ )
+    if ( (msr_index == MSR_CORE_PERF_FIXED_CTR_CTRL ) ||
+        (msr_index == MSR_IA32_DS_AREA) ||
+        (msr_index == MSR_IA32_PEBS_ENABLE) )
     {
-        if ( core2_ctrls.msr[i] == msr_index )
-        {
-            *type = MSR_TYPE_CTRL;
-            *index = i;
-            return 1;
-        }
+        *type = MSR_TYPE_CTRL;
+        return 1;
     }
 
     if ( (msr_index == MSR_CORE_PERF_GLOBAL_CTRL) ||
@@ -231,7 +213,7 @@ static int is_core2_vpmu_msr(u32 msr_index, int *type, int *index)
 
     msr_index_pmc = msr_index & MSR_PMC_ALIAS_MASK;
     if ( (msr_index_pmc >= MSR_IA32_PERFCTR0) &&
-         (msr_index_pmc < (MSR_IA32_PERFCTR0 + core2_get_pmc_count())) )
+         (msr_index_pmc < (MSR_IA32_PERFCTR0 + arch_pmc_cnt)) )
     {
         *type = MSR_TYPE_ARCH_COUNTER;
         *index = msr_index_pmc - MSR_IA32_PERFCTR0;
@@ -239,7 +221,7 @@ static int is_core2_vpmu_msr(u32 msr_index, int *type, int *index)
     }
 
     if ( (msr_index >= MSR_P6_EVNTSEL0) &&
-         (msr_index < (MSR_P6_EVNTSEL0 + core2_get_pmc_count())) )
+         (msr_index < (MSR_P6_EVNTSEL0 + arch_pmc_cnt)) )
     {
         *type = MSR_TYPE_ARCH_CTRL;
         *index = msr_index - MSR_P6_EVNTSEL0;
@@ -254,13 +236,13 @@ static void core2_vpmu_set_msr_bitmap(unsigned long *msr_bitmap)
     int i;
 
     /* Allow Read/Write PMU Counters MSR Directly. */
-    for ( i = 0; i < core2_fix_counters.num; i++ )
+    for ( i = 0; i < fixed_pmc_cnt; i++ )
     {
-        clear_bit(msraddr_to_bitpos(core2_fix_counters.msr[i]), msr_bitmap);
-        clear_bit(msraddr_to_bitpos(core2_fix_counters.msr[i]),
+        clear_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR0 + i), msr_bitmap);
+        clear_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR0 + i),
                   msr_bitmap + 0x800/BYTES_PER_LONG);
     }
-    for ( i = 0; i < core2_get_pmc_count(); i++ )
+    for ( i = 0; i < arch_pmc_cnt; i++ )
     {
         clear_bit(msraddr_to_bitpos(MSR_IA32_PERFCTR0+i), msr_bitmap);
         clear_bit(msraddr_to_bitpos(MSR_IA32_PERFCTR0+i),
@@ -275,26 +257,28 @@ static void core2_vpmu_set_msr_bitmap(unsigned long *msr_bitmap)
     }
 
     /* Allow Read PMU Non-global Controls Directly. */
-    for ( i = 0; i < core2_ctrls.num; i++ )
-        clear_bit(msraddr_to_bitpos(core2_ctrls.msr[i]), msr_bitmap);
-    for ( i = 0; i < core2_get_pmc_count(); i++ )
-        clear_bit(msraddr_to_bitpos(MSR_P6_EVNTSEL0+i), msr_bitmap);
+    for ( i = 0; i < arch_pmc_cnt; i++ )
+         clear_bit(msraddr_to_bitpos(MSR_P6_EVNTSEL0 + i), msr_bitmap);
+
+    clear_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR_CTRL), msr_bitmap);
+    clear_bit(msraddr_to_bitpos(MSR_IA32_PEBS_ENABLE), msr_bitmap);
+    clear_bit(msraddr_to_bitpos(MSR_IA32_DS_AREA), msr_bitmap);
 }
 
 static void core2_vpmu_unset_msr_bitmap(unsigned long *msr_bitmap)
 {
     int i;
 
-    for ( i = 0; i < core2_fix_counters.num; i++ )
+    for ( i = 0; i < fixed_pmc_cnt; i++ )
     {
-        set_bit(msraddr_to_bitpos(core2_fix_counters.msr[i]), msr_bitmap);
-        set_bit(msraddr_to_bitpos(core2_fix_counters.msr[i]),
+        set_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR0 + i), msr_bitmap);
+        set_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR0 + i),
                 msr_bitmap + 0x800/BYTES_PER_LONG);
     }
-    for ( i = 0; i < core2_get_pmc_count(); i++ )
+    for ( i = 0; i < arch_pmc_cnt; i++ )
     {
-        set_bit(msraddr_to_bitpos(MSR_IA32_PERFCTR0+i), msr_bitmap);
-        set_bit(msraddr_to_bitpos(MSR_IA32_PERFCTR0+i),
+        set_bit(msraddr_to_bitpos(MSR_IA32_PERFCTR0 + i), msr_bitmap);
+        set_bit(msraddr_to_bitpos(MSR_IA32_PERFCTR0 + i),
                 msr_bitmap + 0x800/BYTES_PER_LONG);
 
         if ( full_width_write )
@@ -305,10 +289,12 @@ static void core2_vpmu_unset_msr_bitmap(unsigned long *msr_bitmap)
         }
     }
 
-    for ( i = 0; i < core2_ctrls.num; i++ )
-        set_bit(msraddr_to_bitpos(core2_ctrls.msr[i]), msr_bitmap);
-    for ( i = 0; i < core2_get_pmc_count(); i++ )
-        set_bit(msraddr_to_bitpos(MSR_P6_EVNTSEL0+i), msr_bitmap);
+    for ( i = 0; i < arch_pmc_cnt; i++ )
+        set_bit(msraddr_to_bitpos(MSR_P6_EVNTSEL0 + i), msr_bitmap);
+
+    set_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR_CTRL), msr_bitmap);
+    set_bit(msraddr_to_bitpos(MSR_IA32_PEBS_ENABLE), msr_bitmap);
+    set_bit(msraddr_to_bitpos(MSR_IA32_DS_AREA), msr_bitmap);
 }
 
 static inline void __core2_vpmu_save(struct vcpu *v)
@@ -316,10 +302,10 @@ static inline void __core2_vpmu_save(struct vcpu *v)
     int i;
     struct core2_vpmu_context *core2_vpmu_cxt = vcpu_vpmu(v)->context;
 
-    for ( i = 0; i < core2_fix_counters.num; i++ )
-        rdmsrl(core2_fix_counters.msr[i], core2_vpmu_cxt->fix_counters[i]);
-    for ( i = 0; i < core2_get_pmc_count(); i++ )
-        rdmsrl(MSR_IA32_PERFCTR0+i, core2_vpmu_cxt->arch_msr_pair[i].counter);
+    for ( i = 0; i < fixed_pmc_cnt; i++ )
+        rdmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, core2_vpmu_cxt->fix_counters[i]);
+    for ( i = 0; i < arch_pmc_cnt; i++ )
+        rdmsrl(MSR_IA32_PERFCTR0 + i, core2_vpmu_cxt->arch_msr_pair[i].counter);
 }
 
 static int core2_vpmu_save(struct vcpu *v)
@@ -343,20 +329,22 @@ static inline void __core2_vpmu_load(struct vcpu *v)
     unsigned int i, pmc_start;
     struct core2_vpmu_context *core2_vpmu_cxt = vcpu_vpmu(v)->context;
 
-    for ( i = 0; i < core2_fix_counters.num; i++ )
-        wrmsrl(core2_fix_counters.msr[i], core2_vpmu_cxt->fix_counters[i]);
+    for ( i = 0; i < fixed_pmc_cnt; i++ )
+        wrmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, core2_vpmu_cxt->fix_counters[i]);
 
     if ( full_width_write )
         pmc_start = MSR_IA32_A_PERFCTR0;
     else
         pmc_start = MSR_IA32_PERFCTR0;
-    for ( i = 0; i < core2_get_pmc_count(); i++ )
+    for ( i = 0; i < arch_pmc_cnt; i++ )
+    {
         wrmsrl(pmc_start + i, core2_vpmu_cxt->arch_msr_pair[i].counter);
+        wrmsrl(MSR_P6_EVNTSEL0 + i, core2_vpmu_cxt->arch_msr_pair[i].control);
+    }
 
-    for ( i = 0; i < core2_ctrls.num; i++ )
-        wrmsrl(core2_ctrls.msr[i], core2_vpmu_cxt->ctrls[i]);
-    for ( i = 0; i < core2_get_pmc_count(); i++ )
-        wrmsrl(MSR_P6_EVNTSEL0+i, core2_vpmu_cxt->arch_msr_pair[i].control);
+    wrmsrl(MSR_CORE_PERF_FIXED_CTR_CTRL, core2_vpmu_cxt->fixed_ctrl);
+    wrmsrl(MSR_IA32_DS_AREA, core2_vpmu_cxt->ds_area);
+    wrmsrl(MSR_IA32_PEBS_ENABLE, core2_vpmu_cxt->pebs_enable);
 }
 
 static void core2_vpmu_load(struct vcpu *v)
@@ -373,56 +361,39 @@ static int core2_vpmu_alloc_resource(struct vcpu *v)
 {
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
     struct core2_vpmu_context *core2_vpmu_cxt;
-    struct core2_pmu_enable *pmu_enable;
 
     if ( !acquire_pmu_ownership(PMU_OWNER_HVM) )
         return 0;
 
     wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0);
     if ( vmx_add_host_load_msr(MSR_CORE_PERF_GLOBAL_CTRL) )
-        return 0;
+        goto out_err;
 
     if ( vmx_add_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL) )
-        return 0;
+        goto out_err;
     vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL,
                  core2_calc_intial_glb_ctrl_msr());
 
-    pmu_enable = xzalloc_bytes(sizeof(struct core2_pmu_enable) +
-                               core2_get_pmc_count() - 1);
-    if ( !pmu_enable )
-        goto out1;
-
     core2_vpmu_cxt = xzalloc_bytes(sizeof(struct core2_vpmu_context) +
-                    (core2_get_pmc_count()-1)*sizeof(struct arch_msr_pair));
+                    (arch_pmc_cnt-1)*sizeof(struct arch_msr_pair));
     if ( !core2_vpmu_cxt )
-        goto out2;
-    core2_vpmu_cxt->pmu_enable = pmu_enable;
+        goto out_err;
+
     vpmu->context = (void *)core2_vpmu_cxt;
 
+    vpmu_set(vpmu, VPMU_CONTEXT_ALLOCATED);
+
     return 1;
- out2:
-    xfree(pmu_enable);
- out1:
-    gdprintk(XENLOG_WARNING, "Insufficient memory for PMU, PMU feature is "
-             "unavailable on domain %d vcpu %d.\n",
-             v->vcpu_id, v->domain->domain_id);
-    return 0;
-}
 
-static void core2_vpmu_save_msr_context(struct vcpu *v, int type,
-                                       int index, u64 msr_data)
-{
-    struct core2_vpmu_context *core2_vpmu_cxt = vcpu_vpmu(v)->context;
+out_err:
+    vmx_rm_host_load_msr(MSR_CORE_PERF_GLOBAL_CTRL);
+    vmx_rm_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL);
+    release_pmu_ownship(PMU_OWNER_HVM);
 
-    switch ( type )
-    {
-    case MSR_TYPE_CTRL:
-        core2_vpmu_cxt->ctrls[index] = msr_data;
-        break;
-    case MSR_TYPE_ARCH_CTRL:
-        core2_vpmu_cxt->arch_msr_pair[index].control = msr_data;
-        break;
-    }
+    printk("Failed to allocate VPMU resources for domain %u vcpu %u\n",
+           v->vcpu_id, v->domain->domain_id);
+
+    return 0;
 }
 
 static int core2_vpmu_msr_common_check(u32 msr_index, int *type, int *index)
@@ -433,10 +404,8 @@ static int core2_vpmu_msr_common_check(u32 msr_index, int *type, int *index)
         return 0;
 
     if ( unlikely(!vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED)) &&
-	 (vpmu->context != NULL ||
-	  !core2_vpmu_alloc_resource(current)) )
+         !core2_vpmu_alloc_resource(current) )
         return 0;
-    vpmu_set(vpmu, VPMU_CONTEXT_ALLOCATED);
 
     /* Do the lazy load staff. */
     if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) )
@@ -452,7 +421,7 @@ static int core2_vpmu_msr_common_check(u32 msr_index, int *type, int *index)
 static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
 {
     u64 global_ctrl, non_global_ctrl;
-    char pmu_enable = 0;
+    unsigned pmu_enable = 0;
     int i, tmp;
     int type = -1, index = -1;
     struct vcpu *v = current;
@@ -497,6 +466,7 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
         if ( msr_content & 1 )
             gdprintk(XENLOG_WARNING, "Guest is trying to enable PEBS, "
                      "which is not supported.\n");
+        core2_vpmu_cxt->pebs_enable = msr_content;
         return 1;
     case MSR_IA32_DS_AREA:
         if ( vpmu_is_set(vpmu, VPMU_CPU_HAS_DS) )
@@ -509,27 +479,25 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
                 hvm_inject_hw_exception(TRAP_gp_fault, 0);
                 return 1;
             }
-            core2_vpmu_cxt->pmu_enable->ds_area_enable = msr_content ? 1 : 0;
+            core2_vpmu_cxt->ds_area = msr_content;
             break;
         }
         gdprintk(XENLOG_WARNING, "Guest setting of DTS is ignored.\n");
         return 1;
     case MSR_CORE_PERF_GLOBAL_CTRL:
         global_ctrl = msr_content;
-        for ( i = 0; i < core2_get_pmc_count(); i++ )
+        for ( i = 0; i < arch_pmc_cnt; i++ )
         {
             rdmsrl(MSR_P6_EVNTSEL0+i, non_global_ctrl);
-            core2_vpmu_cxt->pmu_enable->arch_pmc_enable[i] =
-                    global_ctrl & (non_global_ctrl >> 22) & 1;
+            pmu_enable += global_ctrl & (non_global_ctrl >> 22) & 1;
             global_ctrl >>= 1;
         }
 
         rdmsrl(MSR_CORE_PERF_FIXED_CTR_CTRL, non_global_ctrl);
         global_ctrl = msr_content >> 32;
-        for ( i = 0; i < core2_fix_counters.num; i++ )
+        for ( i = 0; i < fixed_pmc_cnt; i++ )
         {
-            core2_vpmu_cxt->pmu_enable->fixed_ctr_enable[i] =
-                (global_ctrl & 1) & ((non_global_ctrl & 0x3)? 1: 0);
+            pmu_enable += (global_ctrl & 1) & ((non_global_ctrl & 0x3)? 1 : 0);
             non_global_ctrl >>= FIXED_CTR_CTRL_BITS;
             global_ctrl >>= 1;
         }
@@ -538,27 +506,27 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
         non_global_ctrl = msr_content;
         vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, &global_ctrl);
         global_ctrl >>= 32;
-        for ( i = 0; i < core2_fix_counters.num; i++ )
+        for ( i = 0; i < fixed_pmc_cnt; i++ )
         {
-            core2_vpmu_cxt->pmu_enable->fixed_ctr_enable[i] =
-                (global_ctrl & 1) & ((non_global_ctrl & 0x3)? 1: 0);
+            pmu_enable += (global_ctrl & 1) & ((non_global_ctrl & 0x3)? 1 : 0);
             non_global_ctrl >>= 4;
             global_ctrl >>= 1;
         }
+        core2_vpmu_cxt->fixed_ctrl = msr_content;
         break;
     default:
         tmp = msr - MSR_P6_EVNTSEL0;
-        vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, &global_ctrl);
-        if ( tmp >= 0 && tmp < core2_get_pmc_count() )
-            core2_vpmu_cxt->pmu_enable->arch_pmc_enable[tmp] =
-                (global_ctrl >> tmp) & (msr_content >> 22) & 1;
+        if ( tmp >= 0 && tmp < arch_pmc_cnt )
+        {
+            vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, &global_ctrl);
+            core2_vpmu_cxt->arch_msr_pair[tmp].control = msr_content;
+            for ( i = 0; i < arch_pmc_cnt && !pmu_enable; i++ )
+                pmu_enable += (global_ctrl >> i) &
+                    (core2_vpmu_cxt->arch_msr_pair[i].control >> 22) & 1;
+        }
     }
 
-    for ( i = 0; i < core2_fix_counters.num; i++ )
-        pmu_enable |= core2_vpmu_cxt->pmu_enable->fixed_ctr_enable[i];
-    for ( i = 0; i < core2_get_pmc_count(); i++ )
-        pmu_enable |= core2_vpmu_cxt->pmu_enable->arch_pmc_enable[i];
-    pmu_enable |= core2_vpmu_cxt->pmu_enable->ds_area_enable;
+    pmu_enable += (core2_vpmu_cxt->ds_area != 0);
     if ( pmu_enable )
         vpmu_set(vpmu, VPMU_RUNNING);
     else
@@ -577,7 +545,6 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
         vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR | APIC_LVT_MASKED;
     }
 
-    core2_vpmu_save_msr_context(v, type, index, msr_content);
     if ( type != MSR_TYPE_GLOBAL )
     {
         u64 mask;
@@ -593,7 +560,7 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
             if  ( msr == MSR_IA32_DS_AREA )
                 break;
             /* 4 bits per counter, currently 3 fixed counters implemented. */
-            mask = ~((1ull << (VPMU_CORE2_NUM_FIXED * FIXED_CTR_CTRL_BITS)) - 1);
+            mask = ~((1ull << (fixed_pmc_cnt * FIXED_CTR_CTRL_BITS)) - 1);
             if (msr_content & mask)
                 inject_gp = 1;
             break;
@@ -678,7 +645,7 @@ static void core2_vpmu_do_cpuid(unsigned int input,
 static void core2_vpmu_dump(const struct vcpu *v)
 {
     const struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    int i, num;
+    int i;
     const struct core2_vpmu_context *core2_vpmu_cxt = NULL;
     u64 val;
 
@@ -696,27 +663,25 @@ static void core2_vpmu_dump(const struct vcpu *v)
 
     printk("    vPMU running\n");
     core2_vpmu_cxt = vpmu->context;
-    num = core2_get_pmc_count();
+
     /* Print the contents of the counter and its configuration msr. */
-    for ( i = 0; i < num; i++ )
+    for ( i = 0; i < arch_pmc_cnt; i++ )
     {
         const struct arch_msr_pair *msr_pair = core2_vpmu_cxt->arch_msr_pair;
 
-        if ( core2_vpmu_cxt->pmu_enable->arch_pmc_enable[i] )
-            printk("      general_%d: 0x%016lx ctrl: 0x%016lx\n",
-                   i, msr_pair[i].counter, msr_pair[i].control);
+        printk("      general_%d: 0x%016lx ctrl: 0x%016lx\n",
+               i, msr_pair[i].counter, msr_pair[i].control);
     }
     /*
      * The configuration of the fixed counter is 4 bits each in the
      * MSR_CORE_PERF_FIXED_CTR_CTRL.
      */
-    val = core2_vpmu_cxt->ctrls[MSR_CORE_PERF_FIXED_CTR_CTRL_IDX];
-    for ( i = 0; i < core2_fix_counters.num; i++ )
+    val = core2_vpmu_cxt->fixed_ctrl;
+    for ( i = 0; i < fixed_pmc_cnt; i++ )
     {
-        if ( core2_vpmu_cxt->pmu_enable->fixed_ctr_enable[i] )
-            printk("      fixed_%d:   0x%016lx ctrl: %#lx\n",
-                   i, core2_vpmu_cxt->fix_counters[i],
-                   val & FIXED_CTR_CTRL_MASK);
+        printk("      fixed_%d:   0x%016lx ctrl: %#lx\n",
+               i, core2_vpmu_cxt->fix_counters[i],
+               val & FIXED_CTR_CTRL_MASK);
         val >>= FIXED_CTR_CTRL_BITS;
     }
 }
@@ -734,7 +699,7 @@ static int core2_vpmu_do_interrupt(struct cpu_user_regs *regs)
         if ( is_pmc_quirk )
             handle_pmc_quirk(msr_content);
         core2_vpmu_cxt->global_ovf_status |= msr_content;
-        msr_content = 0xC000000700000000 | ((1 << core2_get_pmc_count()) - 1);
+        msr_content = 0xC000000700000000 | ((1 << arch_pmc_cnt) - 1);
         wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, msr_content);
     }
     else
@@ -797,18 +762,27 @@ static int core2_vpmu_initialise(struct vcpu *v, unsigned int vpmu_flags)
         }
     }
 func_out:
+
+    arch_pmc_cnt = core2_get_arch_pmc_count();
+    fixed_pmc_cnt = core2_get_fixed_pmc_count();
+    if ( fixed_pmc_cnt > VPMU_CORE2_MAX_FIXED_PMCS )
+    {
+        fixed_pmc_cnt = VPMU_CORE2_MAX_FIXED_PMCS;
+        printk(XENLOG_G_WARNING "Limiting number of fixed counters to %d\n",
+               fixed_pmc_cnt);
+    }
     check_pmc_quirk();
+
     return 0;
 }
 
 static void core2_vpmu_destroy(struct vcpu *v)
 {
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    struct core2_vpmu_context *core2_vpmu_cxt = vpmu->context;
 
     if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
         return;
-    xfree(core2_vpmu_cxt->pmu_enable);
+
     xfree(vpmu->context);
     if ( cpu_has_vmx_msr_bitmap && !is_pv_domain(v->domain) )
         core2_vpmu_unset_msr_bitmap(v->arch.hvm_vmx.msr_bitmap);
diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index ebaba5c..ed81cfb 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -473,7 +473,9 @@ void vmx_enable_intercept_for_msr(struct vcpu *v, u32 msr, int type);
 int vmx_read_guest_msr(u32 msr, u64 *val);
 int vmx_write_guest_msr(u32 msr, u64 val);
 int vmx_add_guest_msr(u32 msr);
+void vmx_rm_guest_msr(u32 msr);
 int vmx_add_host_load_msr(u32 msr);
+void vmx_rm_host_load_msr(u32 msr);
 void vmx_vmcs_switch(struct vmcs_struct *from, struct vmcs_struct *to);
 void vmx_set_eoi_exit_bitmap(struct vcpu *v, u8 vector);
 void vmx_clear_eoi_exit_bitmap(struct vcpu *v, u8 vector);
diff --git a/xen/include/asm-x86/hvm/vmx/vpmu_core2.h b/xen/include/asm-x86/hvm/vmx/vpmu_core2.h
index 60b05fd..410372d 100644
--- a/xen/include/asm-x86/hvm/vmx/vpmu_core2.h
+++ b/xen/include/asm-x86/hvm/vmx/vpmu_core2.h
@@ -23,29 +23,10 @@
 #ifndef __ASM_X86_HVM_VPMU_CORE_H_
 #define __ASM_X86_HVM_VPMU_CORE_H_
 
-/* Currently only 3 fixed counters are supported. */
-#define VPMU_CORE2_NUM_FIXED 3
-/* Currently only 3 Non-architectual Performance Control MSRs */
-#define VPMU_CORE2_NUM_CTRLS 3
-
 struct arch_msr_pair {
     u64 counter;
     u64 control;
 };
 
-struct core2_pmu_enable {
-    char ds_area_enable;
-    char fixed_ctr_enable[VPMU_CORE2_NUM_FIXED];
-    char arch_pmc_enable[1];
-};
-
-struct core2_vpmu_context {
-    struct core2_pmu_enable *pmu_enable;
-    u64 fix_counters[VPMU_CORE2_NUM_FIXED];
-    u64 ctrls[VPMU_CORE2_NUM_CTRLS];
-    u64 global_ovf_status;
-    struct arch_msr_pair arch_msr_pair[1];
-};
-
 #endif /* __ASM_X86_HVM_VPMU_CORE_H_ */
 
-- 
1.8.1.4


* [PATCH v4 05/17] x86/VPMU: Handle APIC_LVTPC accesses
  2014-01-21 19:08 [PATCH v4 00/17] x86/PMU: Xen PMU PV support Boris Ostrovsky
                   ` (3 preceding siblings ...)
  2014-01-21 19:08 ` [PATCH v4 04/17] intel/VPMU: Clean up Intel VPMU code Boris Ostrovsky
@ 2014-01-21 19:08 ` Boris Ostrovsky
  2014-01-21 19:08 ` [PATCH v4 06/17] intel/VPMU: MSR_CORE_PERF_GLOBAL_CTRL should be initialized to zero Boris Ostrovsky
                   ` (11 subsequent siblings)
  16 siblings, 0 replies; 48+ messages in thread
From: Boris Ostrovsky @ 2014-01-21 19:08 UTC (permalink / raw)
  To: xen-devel
  Cc: keir, suravee.suthikulpanit, andrew.cooper3, eddie.dong,
	dietmar.hahn, JBeulich, jun.nakajima, boris.ostrovsky

Update the APIC_LVTPC vector when an HVM guest writes to it.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
 xen/arch/x86/hvm/svm/vpmu.c       |  4 ----
 xen/arch/x86/hvm/vlapic.c         |  5 ++++-
 xen/arch/x86/hvm/vmx/vpmu_core2.c | 17 -----------------
 xen/arch/x86/hvm/vpmu.c           | 14 +++++++++++---
 xen/include/asm-x86/hvm/vpmu.h    |  1 +
 5 files changed, 16 insertions(+), 25 deletions(-)

diff --git a/xen/arch/x86/hvm/svm/vpmu.c b/xen/arch/x86/hvm/svm/vpmu.c
index 84b8a36..f6c542b 100644
--- a/xen/arch/x86/hvm/svm/vpmu.c
+++ b/xen/arch/x86/hvm/svm/vpmu.c
@@ -290,8 +290,6 @@ static int amd_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
         if ( !acquire_pmu_ownership(PMU_OWNER_HVM) )
             return 1;
         vpmu_set(vpmu, VPMU_RUNNING);
-        apic_write(APIC_LVTPC, PMU_APIC_VECTOR);
-        vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR;
 
         if ( !is_pv_domain(v->domain) &&
              !((struct amd_vpmu_context *)vpmu->context)->msr_bitmap_set )
@@ -302,8 +300,6 @@ static int amd_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
     if ( (get_pmu_reg_type(msr) == MSR_TYPE_CTRL) &&
         (is_pmu_enabled(msr_content) == 0) && vpmu_is_set(vpmu, VPMU_RUNNING) )
     {
-        apic_write(APIC_LVTPC, PMU_APIC_VECTOR | APIC_LVT_MASKED);
-        vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR | APIC_LVT_MASKED;
         vpmu_reset(vpmu, VPMU_RUNNING);
         if ( !is_pv_domain(v->domain) &&
              ((struct amd_vpmu_context *)vpmu->context)->msr_bitmap_set )
diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
index bc06010..d954f4f 100644
--- a/xen/arch/x86/hvm/vlapic.c
+++ b/xen/arch/x86/hvm/vlapic.c
@@ -38,6 +38,7 @@
 #include <asm/hvm/support.h>
 #include <asm/hvm/vmx/vmx.h>
 #include <asm/hvm/nestedhvm.h>
+#include <asm/hvm/vpmu.h>
 #include <public/hvm/ioreq.h>
 #include <public/hvm/params.h>
 
@@ -732,8 +733,10 @@ static int vlapic_reg_write(struct vcpu *v,
             vlapic_adjust_i8259_target(v->domain);
             pt_may_unmask_irq(v->domain, NULL);
         }
-        if ( (offset == APIC_LVTT) && !(val & APIC_LVT_MASKED) )
+        else if ( (offset == APIC_LVTT) && !(val & APIC_LVT_MASKED) )
             pt_may_unmask_irq(NULL, &vlapic->pt);
+        else if ( offset == APIC_LVTPC )
+            vpmu_lvtpc_update(val);
         break;
 
     case APIC_TMICT:
diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c
index 8d920c0..a966b91 100644
--- a/xen/arch/x86/hvm/vmx/vpmu_core2.c
+++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c
@@ -532,19 +532,6 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
     else
         vpmu_reset(vpmu, VPMU_RUNNING);
 
-    /* Setup LVTPC in local apic */
-    if ( vpmu_is_set(vpmu, VPMU_RUNNING) &&
-         is_vlapic_lvtpc_enabled(vcpu_vlapic(v)) )
-    {
-        apic_write_around(APIC_LVTPC, PMU_APIC_VECTOR);
-        vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR;
-    }
-    else
-    {
-        apic_write_around(APIC_LVTPC, PMU_APIC_VECTOR | APIC_LVT_MASKED);
-        vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR | APIC_LVT_MASKED;
-    }
-
     if ( type != MSR_TYPE_GLOBAL )
     {
         u64 mask;
@@ -710,10 +697,6 @@ static int core2_vpmu_do_interrupt(struct cpu_user_regs *regs)
             return 0;
     }
 
-    /* HW sets the MASK bit when performance counter interrupt occurs*/
-    vpmu->hw_lapic_lvtpc = apic_read(APIC_LVTPC) & ~APIC_LVT_MASKED;
-    apic_write_around(APIC_LVTPC, vpmu->hw_lapic_lvtpc);
-
     return 1;
 }
 
diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c
index d6a9ff6..0770bcf 100644
--- a/xen/arch/x86/hvm/vpmu.c
+++ b/xen/arch/x86/hvm/vpmu.c
@@ -64,6 +64,14 @@ static void __init parse_vpmu_param(char *s)
     }
 }
 
+void vpmu_lvtpc_update(uint32_t val)
+{
+    struct vpmu_struct *vpmu = vcpu_vpmu(current);
+
+    vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR | (val & APIC_LVT_MASKED);
+    apic_write(APIC_LVTPC, vpmu->hw_lapic_lvtpc);
+}
+
 int vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
 {
     struct vpmu_struct *vpmu = vcpu_vpmu(current);
@@ -227,18 +235,18 @@ void vpmu_initialise(struct vcpu *v)
     case X86_VENDOR_AMD:
         if ( svm_vpmu_initialise(v, opt_vpmu_enabled) != 0 )
             opt_vpmu_enabled = 0;
-        break;
+        return;
 
     case X86_VENDOR_INTEL:
         if ( vmx_vpmu_initialise(v, opt_vpmu_enabled) != 0 )
             opt_vpmu_enabled = 0;
-        break;
+        return;
 
     default:
         printk("VPMU: Initialization failed. "
                "Unknown CPU vendor %d\n", vendor);
         opt_vpmu_enabled = 0;
-        break;
+        return;
     }
 }
 
diff --git a/xen/include/asm-x86/hvm/vpmu.h b/xen/include/asm-x86/hvm/vpmu.h
index 2a713be..7ee0f01 100644
--- a/xen/include/asm-x86/hvm/vpmu.h
+++ b/xen/include/asm-x86/hvm/vpmu.h
@@ -87,6 +87,7 @@ struct vpmu_struct {
 #define vpmu_is_set_all(_vpmu, _x)  (((_vpmu)->flags & (_x)) == (_x))
 #define vpmu_clear(_vpmu)           ((_vpmu)->flags = 0)
 
+void vpmu_lvtpc_update(uint32_t val);
 int vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content);
 int vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content);
 int vpmu_do_interrupt(struct cpu_user_regs *regs);
-- 
1.8.1.4


* [PATCH v4 06/17] intel/VPMU: MSR_CORE_PERF_GLOBAL_CTRL should be initialized to zero
  2014-01-21 19:08 [PATCH v4 00/17] x86/PMU: Xen PMU PV support Boris Ostrovsky
                   ` (4 preceding siblings ...)
  2014-01-21 19:08 ` [PATCH v4 05/17] x86/VPMU: Handle APIC_LVTPC accesses Boris Ostrovsky
@ 2014-01-21 19:08 ` Boris Ostrovsky
  2014-01-21 19:08 ` [PATCH v4 07/17] x86/VPMU: Add public xenpmu.h Boris Ostrovsky
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 48+ messages in thread
From: Boris Ostrovsky @ 2014-01-21 19:08 UTC (permalink / raw)
  To: xen-devel
  Cc: keir, suravee.suthikulpanit, andrew.cooper3, eddie.dong,
	dietmar.hahn, JBeulich, jun.nakajima, boris.ostrovsky

The MSR_CORE_PERF_GLOBAL_CTRL register should be set to zero initially. It is
up to the guest to set it so that counters are enabled.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
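Illustrative only, not part of the commit: with the register now starting at
zero, it is the guest that turns counters on. A rough guest-side sketch, using
the same enable-bit layout the removed core2_calc_intial_glb_ctrl_msr() used
to precompute (nr_gp/nr_fixed are placeholders for CPUID-derived counts):

    unsigned int nr_gp = 4, nr_fixed = 3;
    uint64_t gp_bits    = (1ULL << nr_gp) - 1;            /* bits 0..nr_gp-1 */
    uint64_t fixed_bits = ((1ULL << nr_fixed) - 1) << 32; /* bits 32..       */

    /* Enable all general-purpose and fixed counters. */
    wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, fixed_bits | gp_bits);
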
 xen/arch/x86/hvm/vmx/vpmu_core2.c | 10 +---------
 1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c
index a966b91..217c1f7 100644
--- a/xen/arch/x86/hvm/vmx/vpmu_core2.c
+++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c
@@ -164,13 +164,6 @@ static int core2_get_fixed_pmc_count(void)
     return ( (eax & PMU_FIXED_NR_MASK) >> PMU_FIXED_NR_SHIFT );
 }
 
-static u64 core2_calc_intial_glb_ctrl_msr(void)
-{
-    int arch_pmc_bits = (1 << arch_pmc_cnt) - 1;
-    u64 fix_pmc_bits  = (1 << fixed_pmc_cnt) - 1;
-    return ( (fix_pmc_bits << 32) | arch_pmc_bits );
-}
-
 /* edx bits 5-12: Bit width of fixed-function performance counters  */
 static int core2_get_bitwidth_fix_count(void)
 {
@@ -371,8 +364,7 @@ static int core2_vpmu_alloc_resource(struct vcpu *v)
 
     if ( vmx_add_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL) )
         goto out_err;
-    vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL,
-                 core2_calc_intial_glb_ctrl_msr());
+    vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, 0);
 
     core2_vpmu_cxt = xzalloc_bytes(sizeof(struct core2_vpmu_context) +
                     (arch_pmc_cnt-1)*sizeof(struct arch_msr_pair));
-- 
1.8.1.4


* [PATCH v4 07/17] x86/VPMU: Add public xenpmu.h
  2014-01-21 19:08 [PATCH v4 00/17] x86/PMU: Xen PMU PV support Boris Ostrovsky
                   ` (5 preceding siblings ...)
  2014-01-21 19:08 ` [PATCH v4 06/17] intel/VPMU: MSR_CORE_PERF_GLOBAL_CTRL should be initialized to zero Boris Ostrovsky
@ 2014-01-21 19:08 ` Boris Ostrovsky
  2014-01-24 14:54   ` Jan Beulich
  2014-01-21 19:08 ` [PATCH v4 08/17] x86/VPMU: Make vpmu not HVM-specific Boris Ostrovsky
                   ` (9 subsequent siblings)
  16 siblings, 1 reply; 48+ messages in thread
From: Boris Ostrovsky @ 2014-01-21 19:08 UTC (permalink / raw)
  To: xen-devel
  Cc: keir, suravee.suthikulpanit, andrew.cooper3, eddie.dong,
	dietmar.hahn, JBeulich, jun.nakajima, boris.ostrovsky

Add the xenpmu.h header file and move to it various macros and structures that
will be shared between the hypervisor and PV guests.

Move MSR banks out of the architectural PMU structures to allow for larger
sizes in the future. The banks are allocated immediately after the context
structure, and the PMU structures store offsets to them.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
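Not part of the commit: a sketch of how the offset-based layout is meant to be
used, based on the vpmu_reg_pointer() macro and the AMD allocation added below
(error handling omitted; "num" stands in for the per-bank register count):

    unsigned int num = AMD_MAX_COUNTERS;
    struct xen_pmu_amd_ctxt *ctxt =
        xzalloc_bytes(sizeof(*ctxt) + 2 * num * sizeof(uint64_t));
    uint64_t *counter_regs, *ctrl_regs;

    /* The header stores byte offsets to the banks that follow it. */
    ctxt->counters = sizeof(*ctxt);
    ctxt->ctrls    = ctxt->counters + num * sizeof(uint64_t);

    /* vpmu_reg_pointer() turns a stored offset back into a pointer, so
     * save/load code can index the banks directly. */
    counter_regs = vpmu_reg_pointer(ctxt, counters);
    ctrl_regs    = vpmu_reg_pointer(ctxt, ctrls);

Because the banks sit behind offsets rather than fixed-size arrays, their
sizes can grow later without changing the structure layout shared with guests.
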
 xen/arch/x86/hvm/svm/vpmu.c              | 71 ++++++++++++++------------
 xen/arch/x86/hvm/vmx/vpmu_core2.c        | 87 +++++++++++++++++---------------
 xen/arch/x86/hvm/vpmu.c                  |  1 +
 xen/arch/x86/oprofile/op_model_ppro.c    |  6 ++-
 xen/include/asm-x86/hvm/vmx/vpmu_core2.h | 32 ------------
 xen/include/asm-x86/hvm/vpmu.h           | 13 ++---
 xen/include/public/arch-x86/xenpmu.h     | 66 ++++++++++++++++++++++++
 xen/include/public/xenpmu.h              | 38 ++++++++++++++
 8 files changed, 199 insertions(+), 115 deletions(-)
 delete mode 100644 xen/include/asm-x86/hvm/vmx/vpmu_core2.h
 create mode 100644 xen/include/public/arch-x86/xenpmu.h
 create mode 100644 xen/include/public/xenpmu.h

diff --git a/xen/arch/x86/hvm/svm/vpmu.c b/xen/arch/x86/hvm/svm/vpmu.c
index f6c542b..bf7f1f6 100644
--- a/xen/arch/x86/hvm/svm/vpmu.c
+++ b/xen/arch/x86/hvm/svm/vpmu.c
@@ -30,10 +30,7 @@
 #include <asm/apic.h>
 #include <asm/hvm/vlapic.h>
 #include <asm/hvm/vpmu.h>
-
-#define F10H_NUM_COUNTERS 4
-#define F15H_NUM_COUNTERS 6
-#define MAX_NUM_COUNTERS F15H_NUM_COUNTERS
+#include <public/xenpmu.h>
 
 #define MSR_F10H_EVNTSEL_GO_SHIFT   40
 #define MSR_F10H_EVNTSEL_EN_SHIFT   22
@@ -49,6 +46,10 @@ static const u32 __read_mostly *counters;
 static const u32 __read_mostly *ctrls;
 static bool_t __read_mostly k7_counters_mirrored;
 
+#define F10H_NUM_COUNTERS   4
+#define F15H_NUM_COUNTERS   6
+#define AMD_MAX_COUNTERS    6
+
 /* PMU Counter MSRs. */
 static const u32 AMD_F10H_COUNTERS[] = {
     MSR_K7_PERFCTR0,
@@ -83,13 +84,6 @@ static const u32 AMD_F15H_CTRLS[] = {
     MSR_AMD_FAM15H_EVNTSEL5
 };
 
-/* storage for context switching */
-struct amd_vpmu_context {
-    u64 counters[MAX_NUM_COUNTERS];
-    u64 ctrls[MAX_NUM_COUNTERS];
-    bool_t msr_bitmap_set;
-};
-
 static inline int get_pmu_reg_type(u32 addr)
 {
     if ( (addr >= MSR_K7_EVNTSEL0) && (addr <= MSR_K7_EVNTSEL3) )
@@ -142,7 +136,7 @@ static void amd_vpmu_set_msr_bitmap(struct vcpu *v)
 {
     unsigned int i;
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    struct amd_vpmu_context *ctxt = vpmu->context;
+    struct xen_pmu_amd_ctxt *ctxt = vpmu->context;
 
     for ( i = 0; i < num_counters; i++ )
     {
@@ -157,7 +151,7 @@ static void amd_vpmu_unset_msr_bitmap(struct vcpu *v)
 {
     unsigned int i;
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    struct amd_vpmu_context *ctxt = vpmu->context;
+    struct xen_pmu_amd_ctxt *ctxt = vpmu->context;
 
     for ( i = 0; i < num_counters; i++ )
     {
@@ -177,28 +171,31 @@ static inline void context_load(struct vcpu *v)
 {
     unsigned int i;
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    struct amd_vpmu_context *ctxt = vpmu->context;
+    struct xen_pmu_amd_ctxt *ctxt = vpmu->context;
+    uint64_t *counter_regs = vpmu_reg_pointer(ctxt, counters);
+    uint64_t *ctrl_regs = vpmu_reg_pointer(ctxt, ctrls);
 
     for ( i = 0; i < num_counters; i++ )
     {
-        wrmsrl(counters[i], ctxt->counters[i]);
-        wrmsrl(ctrls[i], ctxt->ctrls[i]);
+        wrmsrl(counters[i], counter_regs[i]);
+        wrmsrl(ctrls[i], ctrl_regs[i]);
     }
 }
 
 static void amd_vpmu_load(struct vcpu *v)
 {
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    struct amd_vpmu_context *ctxt = vpmu->context;
+    struct xen_pmu_amd_ctxt *ctxt = vpmu->context;
 
     vpmu_reset(vpmu, VPMU_FROZEN);
 
     if ( vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) )
     {
         unsigned int i;
+	uint64_t *ctrl_regs = vpmu_reg_pointer(ctxt, ctrls);
 
         for ( i = 0; i < num_counters; i++ )
-            wrmsrl(ctrls[i], ctxt->ctrls[i]);
+            wrmsrl(ctrls[i], ctrl_regs[i]);
 
         return;
     }
@@ -210,17 +207,18 @@ static inline void context_save(struct vcpu *v)
 {
     unsigned int i;
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    struct amd_vpmu_context *ctxt = vpmu->context;
+    struct xen_pmu_amd_ctxt *ctxt = vpmu->context;
+    uint64_t *counter_regs = vpmu_reg_pointer(ctxt, counters);
 
     /* No need to save controls -- they are saved in amd_vpmu_do_wrmsr */
     for ( i = 0; i < num_counters; i++ )
-        rdmsrl(counters[i], ctxt->counters[i]);
+        rdmsrl(counters[i], counter_regs[i]);
 }
 
 static int amd_vpmu_save(struct vcpu *v)
 {
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    struct amd_vpmu_context *ctx = vpmu->context;
+    struct xen_pmu_amd_ctxt *ctx = vpmu->context;
     unsigned int i;
 
     if ( !vpmu_is_set(vpmu, VPMU_FROZEN) )
@@ -248,7 +246,9 @@ static void context_update(unsigned int msr, u64 msr_content)
     unsigned int i;
     struct vcpu *v = current;
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    struct amd_vpmu_context *ctxt = vpmu->context;
+    struct xen_pmu_amd_ctxt *ctxt = vpmu->context;
+    uint64_t *counter_regs = vpmu_reg_pointer(ctxt, counters);
+    uint64_t *ctrl_regs = vpmu_reg_pointer(ctxt, ctrls);
 
     if ( k7_counters_mirrored &&
         ((msr >= MSR_K7_EVNTSEL0) && (msr <= MSR_K7_PERFCTR3)) )
@@ -260,12 +260,12 @@ static void context_update(unsigned int msr, u64 msr_content)
     {
        if ( msr == ctrls[i] )
        {
-           ctxt->ctrls[i] = msr_content;
+           ctrl_regs[i] = msr_content;
            return;
        }
         else if (msr == counters[i] )
         {
-            ctxt->counters[i] = msr_content;
+            counter_regs[i] = msr_content;
             return;
         }
     }
@@ -292,7 +292,7 @@ static int amd_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
         vpmu_set(vpmu, VPMU_RUNNING);
 
         if ( !is_pv_domain(v->domain) &&
-             !((struct amd_vpmu_context *)vpmu->context)->msr_bitmap_set )
+             !((struct xen_pmu_amd_ctxt *)vpmu->context)->msr_bitmap_set )
             amd_vpmu_set_msr_bitmap(v);
     }
 
@@ -302,7 +302,7 @@ static int amd_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
     {
         vpmu_reset(vpmu, VPMU_RUNNING);
         if ( !is_pv_domain(v->domain) &&
-             ((struct amd_vpmu_context *)vpmu->context)->msr_bitmap_set )
+             ((struct xen_pmu_amd_ctxt *)vpmu->context)->msr_bitmap_set )
             amd_vpmu_unset_msr_bitmap(v);
         release_pmu_ownship(PMU_OWNER_HVM);
     }
@@ -343,7 +343,7 @@ static int amd_vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
 
 static int amd_vpmu_initialise(struct vcpu *v)
 {
-    struct amd_vpmu_context *ctxt;
+    struct xen_pmu_amd_ctxt *ctxt;
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
     uint8_t family = current_cpu_data.x86;
 
@@ -373,7 +373,9 @@ static int amd_vpmu_initialise(struct vcpu *v)
 	 }
     }
 
-    ctxt = xzalloc(struct amd_vpmu_context);
+    ctxt = xzalloc_bytes(sizeof(struct xen_pmu_amd_ctxt) + 
+			 sizeof(uint64_t) * AMD_MAX_COUNTERS + 
+			 sizeof(uint64_t) * AMD_MAX_COUNTERS);
     if ( !ctxt )
     {
         gdprintk(XENLOG_WARNING, "Insufficient memory for PMU, "
@@ -382,6 +384,9 @@ static int amd_vpmu_initialise(struct vcpu *v)
         return -ENOMEM;
     }
 
+    ctxt->counters = sizeof(struct xen_pmu_amd_ctxt);
+    ctxt->ctrls = ctxt->counters + sizeof(uint64_t) * AMD_MAX_COUNTERS;
+
     vpmu->context = ctxt;
     vpmu_set(vpmu, VPMU_CONTEXT_ALLOCATED);
     return 0;
@@ -395,7 +400,7 @@ static void amd_vpmu_destroy(struct vcpu *v)
         return;
 
     if ( !is_pv_domain(v->domain) &&
-         ((struct amd_vpmu_context *)vpmu->context)->msr_bitmap_set )
+         ((struct xen_pmu_amd_ctxt *)vpmu->context)->msr_bitmap_set )
         amd_vpmu_unset_msr_bitmap(v);
 
     xfree(vpmu->context);
@@ -412,7 +417,9 @@ static void amd_vpmu_destroy(struct vcpu *v)
 static void amd_vpmu_dump(const struct vcpu *v)
 {
     const struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    const struct amd_vpmu_context *ctxt = vpmu->context;
+    const struct xen_pmu_amd_ctxt *ctxt = vpmu->context;
+    uint64_t *counter_regs = vpmu_reg_pointer(ctxt, counters);
+    uint64_t *ctrl_regs = vpmu_reg_pointer(ctxt, ctrls);
     unsigned int i;
 
     printk("    VPMU state: 0x%x ", vpmu->flags);
@@ -442,8 +449,8 @@ static void amd_vpmu_dump(const struct vcpu *v)
         rdmsrl(ctrls[i], ctrl);
         rdmsrl(counters[i], cntr);
         printk("      %#x: %#lx (%#lx in HW)    %#x: %#lx (%#lx in HW)\n",
-               ctrls[i], ctxt->ctrls[i], ctrl,
-               counters[i], ctxt->counters[i], cntr);
+               ctrls[i], ctrl_regs[i], ctrl,
+               counters[i], counter_regs[i], cntr);
     }
 }
 
diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c
index 217c1f7..3c3bedc 100644
--- a/xen/arch/x86/hvm/vmx/vpmu_core2.c
+++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c
@@ -35,8 +35,8 @@
 #include <asm/hvm/vmx/vmcs.h>
 #include <public/sched.h>
 #include <public/hvm/save.h>
+#include <public/xenpmu.h>
 #include <asm/hvm/vpmu.h>
-#include <asm/hvm/vmx/vpmu_core2.h>
 
 /*
  * See Intel SDM Vol 2a Instruction Set Reference chapter 3 for CPUID
@@ -68,6 +68,10 @@
 #define MSR_PMC_ALIAS_MASK       (~(MSR_IA32_PERFCTR0 ^ MSR_IA32_A_PERFCTR0))
 static bool_t __read_mostly full_width_write;
 
+/* Intel-specific VPMU features */
+#define VPMU_CPU_HAS_DS                     0x100 /* Has Debug Store */
+#define VPMU_CPU_HAS_BTS                    0x200 /* Has Branch Trace Store */
+
 /*
  * MSR_CORE_PERF_FIXED_CTR_CTRL contains the configuration of all fixed
  * counters. 4 bits for every counter.
@@ -75,16 +79,6 @@ static bool_t __read_mostly full_width_write;
 #define FIXED_CTR_CTRL_BITS 4
 #define FIXED_CTR_CTRL_MASK ((1 << FIXED_CTR_CTRL_BITS) - 1)
 
-#define VPMU_CORE2_MAX_FIXED_PMCS     4
-struct core2_vpmu_context {
-    u64 fixed_ctrl;
-    u64 ds_area;
-    u64 pebs_enable;
-    u64 global_ovf_status;
-    u64 fix_counters[VPMU_CORE2_MAX_FIXED_PMCS];
-    struct arch_msr_pair arch_msr_pair[1];
-};
-
 /* Number of general-purpose and fixed performance counters */
 static unsigned int __read_mostly arch_pmc_cnt, fixed_pmc_cnt;
 
@@ -224,6 +218,7 @@ static int is_core2_vpmu_msr(u32 msr_index, int *type, int *index)
     return 0;
 }
 
+#define msraddr_to_bitpos(x) (((x)&0xffff) + ((x)>>31)*0x2000)
 static void core2_vpmu_set_msr_bitmap(unsigned long *msr_bitmap)
 {
     int i;
@@ -293,12 +288,15 @@ static void core2_vpmu_unset_msr_bitmap(unsigned long *msr_bitmap)
 static inline void __core2_vpmu_save(struct vcpu *v)
 {
     int i;
-    struct core2_vpmu_context *core2_vpmu_cxt = vcpu_vpmu(v)->context;
+    struct xen_pmu_intel_ctxt *core2_vpmu_cxt = vcpu_vpmu(v)->context;
+    uint64_t *fixed_counters = vpmu_reg_pointer(core2_vpmu_cxt, fixed_counters);
+    struct xen_pmu_cntr_pair *xen_pmu_cntr_pair =
+        vpmu_reg_pointer(core2_vpmu_cxt, arch_counters);
 
     for ( i = 0; i < fixed_pmc_cnt; i++ )
-        rdmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, core2_vpmu_cxt->fix_counters[i]);
+        rdmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, fixed_counters[i]);
     for ( i = 0; i < arch_pmc_cnt; i++ )
-        rdmsrl(MSR_IA32_PERFCTR0 + i, core2_vpmu_cxt->arch_msr_pair[i].counter);
+        rdmsrl(MSR_IA32_PERFCTR0 + i, xen_pmu_cntr_pair[i].counter);
 }
 
 static int core2_vpmu_save(struct vcpu *v)
@@ -320,10 +318,13 @@ static int core2_vpmu_save(struct vcpu *v)
 static inline void __core2_vpmu_load(struct vcpu *v)
 {
     unsigned int i, pmc_start;
-    struct core2_vpmu_context *core2_vpmu_cxt = vcpu_vpmu(v)->context;
+    struct xen_pmu_intel_ctxt *core2_vpmu_cxt = vcpu_vpmu(v)->context;
+    uint64_t *fixed_counters = vpmu_reg_pointer(core2_vpmu_cxt, fixed_counters);
+    struct xen_pmu_cntr_pair *xen_pmu_cntr_pair =
+        vpmu_reg_pointer(core2_vpmu_cxt, arch_counters);
 
     for ( i = 0; i < fixed_pmc_cnt; i++ )
-        wrmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, core2_vpmu_cxt->fix_counters[i]);
+        wrmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, fixed_counters[i]);
 
     if ( full_width_write )
         pmc_start = MSR_IA32_A_PERFCTR0;
@@ -331,8 +332,8 @@ static inline void __core2_vpmu_load(struct vcpu *v)
         pmc_start = MSR_IA32_PERFCTR0;
     for ( i = 0; i < arch_pmc_cnt; i++ )
     {
-        wrmsrl(pmc_start + i, core2_vpmu_cxt->arch_msr_pair[i].counter);
-        wrmsrl(MSR_P6_EVNTSEL0 + i, core2_vpmu_cxt->arch_msr_pair[i].control);
+        wrmsrl(pmc_start + i, xen_pmu_cntr_pair[i].counter);
+        wrmsrl(MSR_P6_EVNTSEL0 + i, xen_pmu_cntr_pair[i].control);
     }
 
     wrmsrl(MSR_CORE_PERF_FIXED_CTR_CTRL, core2_vpmu_cxt->fixed_ctrl);
@@ -353,7 +354,7 @@ static void core2_vpmu_load(struct vcpu *v)
 static int core2_vpmu_alloc_resource(struct vcpu *v)
 {
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    struct core2_vpmu_context *core2_vpmu_cxt;
+    struct xen_pmu_intel_ctxt *core2_vpmu_cxt;
 
     if ( !acquire_pmu_ownership(PMU_OWNER_HVM) )
         return 0;
@@ -366,11 +367,16 @@ static int core2_vpmu_alloc_resource(struct vcpu *v)
         goto out_err;
     vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, 0);
 
-    core2_vpmu_cxt = xzalloc_bytes(sizeof(struct core2_vpmu_context) +
-                    (arch_pmc_cnt-1)*sizeof(struct arch_msr_pair));
+    core2_vpmu_cxt = xzalloc_bytes(sizeof(struct xen_pmu_intel_ctxt) +
+				   sizeof(uint64_t) * fixed_pmc_cnt +
+				   sizeof(struct xen_pmu_cntr_pair) * arch_pmc_cnt);
     if ( !core2_vpmu_cxt )
         goto out_err;
 
+    core2_vpmu_cxt->fixed_counters = sizeof(struct xen_pmu_intel_ctxt);
+    core2_vpmu_cxt->arch_counters = core2_vpmu_cxt->fixed_counters +
+      sizeof(uint64_t) * fixed_pmc_cnt;
+
     vpmu->context = (void *)core2_vpmu_cxt;
 
     vpmu_set(vpmu, VPMU_CONTEXT_ALLOCATED);
@@ -418,7 +424,7 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
     int type = -1, index = -1;
     struct vcpu *v = current;
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    struct core2_vpmu_context *core2_vpmu_cxt = NULL;
+    struct xen_pmu_intel_ctxt *core2_vpmu_cxt = NULL;
 
     if ( !core2_vpmu_msr_common_check(msr, &type, &index) )
     {
@@ -447,7 +453,7 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
     switch ( msr )
     {
     case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
-        core2_vpmu_cxt->global_ovf_status &= ~msr_content;
+        core2_vpmu_cxt->global_status &= ~msr_content;
         return 1;
     case MSR_CORE_PERF_GLOBAL_STATUS:
         gdprintk(XENLOG_INFO, "Can not write readonly MSR: "
@@ -510,11 +516,14 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
         tmp = msr - MSR_P6_EVNTSEL0;
         if ( tmp >= 0 && tmp < arch_pmc_cnt )
         {
+            struct xen_pmu_cntr_pair *xen_pmu_cntr_pair =
+                vpmu_reg_pointer(core2_vpmu_cxt, arch_counters);
+
             vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, &global_ctrl);
-            core2_vpmu_cxt->arch_msr_pair[tmp].control = msr_content;
+            xen_pmu_cntr_pair[tmp].control = msr_content;
             for ( i = 0; i < arch_pmc_cnt && !pmu_enable; i++ )
                 pmu_enable += (global_ctrl >> i) &
-                    (core2_vpmu_cxt->arch_msr_pair[i].control >> 22) & 1;
+                    (xen_pmu_cntr_pair[i].control >> 22) & 1;
         }
     }
 
@@ -565,7 +574,7 @@ static int core2_vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
     int type = -1, index = -1;
     struct vcpu *v = current;
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    struct core2_vpmu_context *core2_vpmu_cxt = NULL;
+    struct xen_pmu_intel_ctxt *core2_vpmu_cxt = NULL;
 
     if ( core2_vpmu_msr_common_check(msr, &type, &index) )
     {
@@ -576,7 +585,7 @@ static int core2_vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
             *msr_content = 0;
             break;
         case MSR_CORE_PERF_GLOBAL_STATUS:
-            *msr_content = core2_vpmu_cxt->global_ovf_status;
+            *msr_content = core2_vpmu_cxt->global_status;
             break;
         case MSR_CORE_PERF_GLOBAL_CTRL:
             vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, msr_content);
@@ -625,8 +634,11 @@ static void core2_vpmu_dump(const struct vcpu *v)
 {
     const struct vpmu_struct *vpmu = vcpu_vpmu(v);
     int i;
-    const struct core2_vpmu_context *core2_vpmu_cxt = NULL;
+    const struct xen_pmu_intel_ctxt *core2_vpmu_cxt = NULL;
     u64 val;
+    uint64_t *fixed_counters = vpmu_reg_pointer(core2_vpmu_cxt, fixed_counters);
+    struct xen_pmu_cntr_pair *xen_pmu_cntr_pair =
+        vpmu_reg_pointer(core2_vpmu_cxt, arch_counters);
 
     if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
          return;
@@ -645,12 +657,9 @@ static void core2_vpmu_dump(const struct vcpu *v)
 
     /* Print the contents of the counter and its configuration msr. */
     for ( i = 0; i < arch_pmc_cnt; i++ )
-    {
-        const struct arch_msr_pair *msr_pair = core2_vpmu_cxt->arch_msr_pair;
-
         printk("      general_%d: 0x%016lx ctrl: 0x%016lx\n",
-               i, msr_pair[i].counter, msr_pair[i].control);
-    }
+            i, xen_pmu_cntr_pair[i].counter, xen_pmu_cntr_pair[i].control);
+
     /*
      * The configuration of the fixed counter is 4 bits each in the
      * MSR_CORE_PERF_FIXED_CTR_CTRL.
@@ -659,7 +668,7 @@ static void core2_vpmu_dump(const struct vcpu *v)
     for ( i = 0; i < fixed_pmc_cnt; i++ )
     {
         printk("      fixed_%d:   0x%016lx ctrl: %#lx\n",
-               i, core2_vpmu_cxt->fix_counters[i],
+               i, fixed_counters[i],
                val & FIXED_CTR_CTRL_MASK);
         val >>= FIXED_CTR_CTRL_BITS;
     }
@@ -670,14 +679,14 @@ static int core2_vpmu_do_interrupt(struct cpu_user_regs *regs)
     struct vcpu *v = current;
     u64 msr_content;
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    struct core2_vpmu_context *core2_vpmu_cxt = vpmu->context;
+    struct xen_pmu_intel_ctxt *core2_vpmu_cxt = vpmu->context;
 
     rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, msr_content);
     if ( msr_content )
     {
         if ( is_pmc_quirk )
             handle_pmc_quirk(msr_content);
-        core2_vpmu_cxt->global_ovf_status |= msr_content;
+        core2_vpmu_cxt->global_status |= msr_content;
         msr_content = 0xC000000700000000 | ((1 << arch_pmc_cnt) - 1);
         wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, msr_content);
     }
@@ -740,12 +749,6 @@ func_out:
 
     arch_pmc_cnt = core2_get_arch_pmc_count();
     fixed_pmc_cnt = core2_get_fixed_pmc_count();
-    if ( fixed_pmc_cnt > VPMU_CORE2_MAX_FIXED_PMCS )
-    {
-        fixed_pmc_cnt = VPMU_CORE2_MAX_FIXED_PMCS;
-        printk(XENLOG_G_WARNING "Limiting number of fixed counters to %d\n",
-               fixed_pmc_cnt);
-    }
     check_pmc_quirk();
 
     return 0;
diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c
index 0770bcf..8c263a5 100644
--- a/xen/arch/x86/hvm/vpmu.c
+++ b/xen/arch/x86/hvm/vpmu.c
@@ -31,6 +31,7 @@
 #include <asm/hvm/svm/svm.h>
 #include <asm/hvm/svm/vmcb.h>
 #include <asm/apic.h>
+#include <public/xenpmu.h>
 
 /*
  * "vpmu" :     vpmu generally enabled
diff --git a/xen/arch/x86/oprofile/op_model_ppro.c b/xen/arch/x86/oprofile/op_model_ppro.c
index 3225937..5aae2e7 100644
--- a/xen/arch/x86/oprofile/op_model_ppro.c
+++ b/xen/arch/x86/oprofile/op_model_ppro.c
@@ -20,11 +20,15 @@
 #include <asm/regs.h>
 #include <asm/current.h>
 #include <asm/hvm/vpmu.h>
-#include <asm/hvm/vmx/vpmu_core2.h>
 
 #include "op_x86_model.h"
 #include "op_counter.h"
 
+struct arch_msr_pair {
+    u64 counter;
+    u64 control;
+};
+
 /*
  * Intel "Architectural Performance Monitoring" CPUID
  * detection/enumeration details:
diff --git a/xen/include/asm-x86/hvm/vmx/vpmu_core2.h b/xen/include/asm-x86/hvm/vmx/vpmu_core2.h
deleted file mode 100644
index 410372d..0000000
--- a/xen/include/asm-x86/hvm/vmx/vpmu_core2.h
+++ /dev/null
@@ -1,32 +0,0 @@
-
-/*
- * vpmu_core2.h: CORE 2 specific PMU virtualization for HVM domain.
- *
- * Copyright (c) 2007, Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along with
- * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
- * Place - Suite 330, Boston, MA 02111-1307 USA.
- *
- * Author: Haitao Shan <haitao.shan@intel.com>
- */
-
-#ifndef __ASM_X86_HVM_VPMU_CORE_H_
-#define __ASM_X86_HVM_VPMU_CORE_H_
-
-struct arch_msr_pair {
-    u64 counter;
-    u64 control;
-};
-
-#endif /* __ASM_X86_HVM_VPMU_CORE_H_ */
-
diff --git a/xen/include/asm-x86/hvm/vpmu.h b/xen/include/asm-x86/hvm/vpmu.h
index 7ee0f01..9992887 100644
--- a/xen/include/asm-x86/hvm/vpmu.h
+++ b/xen/include/asm-x86/hvm/vpmu.h
@@ -22,6 +22,8 @@
 #ifndef __ASM_X86_HVM_VPMU_H_
 #define __ASM_X86_HVM_VPMU_H_
 
+#include <public/xenpmu.h>
+
 /*
  * Flag bits given as a string on the hypervisor boot parameter 'vpmu'.
  * See arch/x86/hvm/vpmu.c.
@@ -29,12 +31,9 @@
 #define VPMU_BOOT_ENABLED 0x1    /* vpmu generally enabled. */
 #define VPMU_BOOT_BTS     0x2    /* Intel BTS feature wanted. */
 
-
-#define msraddr_to_bitpos(x) (((x)&0xffff) + ((x)>>31)*0x2000)
 #define vcpu_vpmu(vcpu)   (&((vcpu)->arch.hvm_vcpu.vpmu))
 #define vpmu_vcpu(vpmu)   (container_of((vpmu), struct vcpu, \
                                           arch.hvm_vcpu.vpmu))
-#define vpmu_domain(vpmu) (vpmu_vcpu(vpmu)->domain)
 
 #define MSR_TYPE_COUNTER            0
 #define MSR_TYPE_CTRL               1
@@ -42,6 +41,9 @@
 #define MSR_TYPE_ARCH_COUNTER       3
 #define MSR_TYPE_ARCH_CTRL          4
 
+/* Start of PMU register bank */
+#define vpmu_reg_pointer(ctxt, offset) ((void *)((uintptr_t)ctxt + \
+                                                 (uintptr_t)ctxt->offset))
 
 /* Arch specific operations shared by all vpmus */
 struct arch_vpmu_ops {
@@ -76,11 +78,6 @@ struct vpmu_struct {
 #define VPMU_FROZEN                         0x10  /* Stop counters while VCPU is not running */
 #define VPMU_PASSIVE_DOMAIN_ALLOCATED       0x20
 
-/* VPMU features */
-#define VPMU_CPU_HAS_DS                     0x100 /* Has Debug Store */
-#define VPMU_CPU_HAS_BTS                    0x200 /* Has Branch Trace Store */
-
-
 #define vpmu_set(_vpmu, _x)         ((_vpmu)->flags |= (_x))
 #define vpmu_reset(_vpmu, _x)       ((_vpmu)->flags &= ~(_x))
 #define vpmu_is_set(_vpmu, _x)      ((_vpmu)->flags & (_x))
diff --git a/xen/include/public/arch-x86/xenpmu.h b/xen/include/public/arch-x86/xenpmu.h
new file mode 100644
index 0000000..7778a45
--- /dev/null
+++ b/xen/include/public/arch-x86/xenpmu.h
@@ -0,0 +1,66 @@
+#ifndef __XEN_PUBLIC_ARCH_X86_PMU_H__
+#define __XEN_PUBLIC_ARCH_X86_PMU_H__
+
+/* x86-specific PMU definitions */
+
+#include "xen.h"
+
+
+/* AMD PMU registers and structures */
+struct xen_pmu_amd_ctxt {
+    uint64_t counters;       /* Offset to counter MSRs */
+    uint64_t ctrls;          /* Offset to control MSRs */
+    uint64_t msr_bitmap_set; /* Used by HVM only */
+};
+
+/* Intel PMU registers and structures */
+struct xen_pmu_cntr_pair {
+    uint64_t counter;
+    uint64_t control;
+};
+
+struct xen_pmu_intel_ctxt {
+    uint64_t global_ctrl;
+    uint64_t global_ovf_ctrl;
+    uint64_t global_status;
+    uint64_t fixed_ctrl;
+    uint64_t ds_area;
+    uint64_t pebs_enable;
+    uint64_t debugctl;
+    uint64_t fixed_counters;  /* Offset to fixed counter MSRs */
+    uint64_t arch_counters;   /* Offset to architectural counter MSRs */
+};
+
+#define XENPMU_MAX_CTXT_SZ        (sizeof(struct xen_pmu_amd_ctxt) > \
+                                    sizeof(struct xen_pmu_intel_ctxt) ? \
+                                     sizeof(struct xen_pmu_amd_ctxt) : \
+                                     sizeof(struct xen_pmu_intel_ctxt))
+#define XENPMU_CTXT_PAD_SZ        (((XENPMU_MAX_CTXT_SZ + 64) & ~63) + 128)
+struct xen_arch_pmu {
+    union {
+        struct cpu_user_regs regs;
+        uint8_t pad1[256];
+    } r;
+    union {
+        uint32_t lapic_lvtpc;
+        uint64_t pad2;
+    } l;
+    union {
+        struct xen_pmu_amd_ctxt amd;
+        struct xen_pmu_intel_ctxt intel;
+        uint8_t pad3[XENPMU_CTXT_PAD_SZ];
+    } c;
+};
+typedef struct xen_arch_pmu xen_arch_pmu_t;
+
+#endif /* __XEN_PUBLIC_ARCH_X86_PMU_H__ */
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
+
diff --git a/xen/include/public/xenpmu.h b/xen/include/public/xenpmu.h
new file mode 100644
index 0000000..4757db9
--- /dev/null
+++ b/xen/include/public/xenpmu.h
@@ -0,0 +1,38 @@
+#ifndef __XEN_PUBLIC_XENPMU_H__
+#define __XEN_PUBLIC_XENPMU_H__
+
+#include "xen.h"
+#if defined(__i386__) || defined(__x86_64__)
+#include "arch-x86/xenpmu.h"
+#elif defined (__arm__) || defined (__aarch64__)
+#include "arch-arm.h"
+#else
+#error "Unsupported architecture"
+#endif
+
+#define XENPMU_VER_MAJ    0
+#define XENPMU_VER_MIN    0
+
+
+/* Shared between hypervisor and PV domain */
+struct xen_pmu_data {
+    uint32_t domain_id;
+    uint32_t vcpu_id;
+    uint32_t pcpu_id;
+    uint32_t pmu_flags;
+
+    xen_arch_pmu_t pmu;
+};
+typedef struct xen_pmu_data xen_pmu_data_t;
+
+#endif /* __XEN_PUBLIC_XENPMU_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
1.8.1.4


* [PATCH v4 08/17] x86/VPMU: Make vpmu not HVM-specific
  2014-01-21 19:08 [PATCH v4 00/17] x86/PMU: Xen PMU PV support Boris Ostrovsky
                   ` (6 preceding siblings ...)
  2014-01-21 19:08 ` [PATCH v4 07/17] x86/VPMU: Add public xenpmu.h Boris Ostrovsky
@ 2014-01-21 19:08 ` Boris Ostrovsky
  2014-01-24 14:59   ` Jan Beulich
  2014-01-21 19:08 ` [PATCH v4 09/17] x86/VPMU: Interface for setting PMU mode and flags Boris Ostrovsky
                   ` (8 subsequent siblings)
  16 siblings, 1 reply; 48+ messages in thread
From: Boris Ostrovsky @ 2014-01-21 19:08 UTC (permalink / raw)
  To: xen-devel
  Cc: keir, suravee.suthikulpanit, andrew.cooper3, eddie.dong,
	dietmar.hahn, JBeulich, jun.nakajima, boris.ostrovsky

The vpmu structure will be used for both HVM and PV guests. Move it from
hvm_vcpu to arch_vcpu.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
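Not part of the commit, just the accessor change in practice: anything that
previously reached the VPMU state through the HVM-only container now goes
through the generic per-vcpu field, so PV vcpus get a vpmu as well.

    /* What vcpu_vpmu(v) now expands to, for any vcpu type: */
    static inline struct vpmu_struct *example_vcpu_vpmu(struct vcpu *v)
    {
        return &v->arch.vpmu;    /* was &v->arch.hvm_vcpu.vpmu */
    }
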
 xen/include/asm-x86/domain.h   | 2 ++
 xen/include/asm-x86/hvm/vcpu.h | 3 ---
 xen/include/asm-x86/hvm/vpmu.h | 5 ++---
 3 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index 9d39061..f352a84 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -396,6 +396,8 @@ struct arch_vcpu
     void (*ctxt_switch_from) (struct vcpu *);
     void (*ctxt_switch_to) (struct vcpu *);
 
+    struct vpmu_struct vpmu;
+
     /* Virtual Machine Extensions */
     union {
         struct pv_vcpu pv_vcpu;
diff --git a/xen/include/asm-x86/hvm/vcpu.h b/xen/include/asm-x86/hvm/vcpu.h
index 122ab0d..9beeaa9 100644
--- a/xen/include/asm-x86/hvm/vcpu.h
+++ b/xen/include/asm-x86/hvm/vcpu.h
@@ -152,9 +152,6 @@ struct hvm_vcpu {
     u32                 msr_tsc_aux;
     u64                 msr_tsc_adjust;
 
-    /* VPMU */
-    struct vpmu_struct  vpmu;
-
     union {
         struct arch_vmx_struct vmx;
         struct arch_svm_struct svm;
diff --git a/xen/include/asm-x86/hvm/vpmu.h b/xen/include/asm-x86/hvm/vpmu.h
index 9992887..8646fd6 100644
--- a/xen/include/asm-x86/hvm/vpmu.h
+++ b/xen/include/asm-x86/hvm/vpmu.h
@@ -31,9 +31,8 @@
 #define VPMU_BOOT_ENABLED 0x1    /* vpmu generally enabled. */
 #define VPMU_BOOT_BTS     0x2    /* Intel BTS feature wanted. */
 
-#define vcpu_vpmu(vcpu)   (&((vcpu)->arch.hvm_vcpu.vpmu))
-#define vpmu_vcpu(vpmu)   (container_of((vpmu), struct vcpu, \
-                                          arch.hvm_vcpu.vpmu))
+#define vcpu_vpmu(vcpu)   (&((vcpu)->arch.vpmu))
+#define vpmu_vcpu(vpmu)   (container_of((vpmu), struct vcpu, arch.vpmu))
 
 #define MSR_TYPE_COUNTER            0
 #define MSR_TYPE_CTRL               1
-- 
1.8.1.4


* [PATCH v4 09/17] x86/VPMU: Interface for setting PMU mode and flags
  2014-01-21 19:08 [PATCH v4 00/17] x86/PMU: Xen PMU PV support Boris Ostrovsky
                   ` (7 preceding siblings ...)
  2014-01-21 19:08 ` [PATCH v4 08/17] x86/VPMU: Make vpmu not HVM-specific Boris Ostrovsky
@ 2014-01-21 19:08 ` Boris Ostrovsky
  2014-01-24 15:10   ` Jan Beulich
  2014-01-21 19:08 ` [PATCH v4 10/17] x86/VPMU: Initialize PMU for PV guests Boris Ostrovsky
                   ` (7 subsequent siblings)
  16 siblings, 1 reply; 48+ messages in thread
From: Boris Ostrovsky @ 2014-01-21 19:08 UTC (permalink / raw)
  To: xen-devel
  Cc: keir, suravee.suthikulpanit, andrew.cooper3, eddie.dong,
	dietmar.hahn, JBeulich, jun.nakajima, boris.ostrovsky

Add a runtime interface for setting PMU mode and flags. Three main modes are
provided:
* PMU off
* PMU on: guests can access PMU MSRs and receive PMU interrupts. dom0
  profiles itself and the hypervisor.
* dom0-only PMU: dom0 collects samples for both itself and guests.

For feature flags, only Intel's BTS is currently supported.

Mode and flags are set via the HYPERVISOR_xenpmu_op hypercall.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
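Illustrative only, not part of the commit: how dom0's toolstack might drive
the new interface. HYPERVISOR_xenpmu_op stands for whatever guest-side
hypercall wrapper is used; it is not defined by this patch.

    static int set_pmu_mode(uint64_t mode)
    {
        xen_pmu_params_t p = { .d.val = mode };
        int rc = HYPERVISOR_xenpmu_op(XENPMU_mode_set, &p); /* -EPERM unless dom0 */

        if ( rc )
            return rc;

        p.d.val = 0;
        rc = HYPERVISOR_xenpmu_op(XENPMU_mode_get, &p);
        if ( !rc )
            printk("PMU mode %lu, interface v%d.%d\n",
                   (unsigned long)p.d.val, p.v.version.maj, p.v.version.min);
        return rc;
    }

Feature flags work the same way through XENPMU_feature_set/XENPMU_feature_get,
with XENPMU_FEATURE_INTEL_BTS as the only flag currently accepted.
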
 xen/arch/x86/hvm/svm/vpmu.c        |  2 +-
 xen/arch/x86/hvm/vmx/vpmu_core2.c  |  4 +-
 xen/arch/x86/hvm/vpmu.c            | 77 ++++++++++++++++++++++++++++++++++----
 xen/arch/x86/x86_64/compat/entry.S |  4 ++
 xen/arch/x86/x86_64/entry.S        |  4 ++
 xen/include/asm-x86/hvm/vpmu.h     |  9 +----
 xen/include/public/xen.h           |  1 +
 xen/include/public/xenpmu.h        | 51 +++++++++++++++++++++++++
 xen/include/xen/hypercall.h        |  4 ++
 9 files changed, 138 insertions(+), 18 deletions(-)

diff --git a/xen/arch/x86/hvm/svm/vpmu.c b/xen/arch/x86/hvm/svm/vpmu.c
index bf7f1f6..3dd6911 100644
--- a/xen/arch/x86/hvm/svm/vpmu.c
+++ b/xen/arch/x86/hvm/svm/vpmu.c
@@ -471,7 +471,7 @@ int svm_vpmu_initialise(struct vcpu *v, unsigned int vpmu_flags)
     int ret = 0;
 
     /* vpmu enabled? */
-    if ( !vpmu_flags )
+    if ( vpmu_flags == XENPMU_MODE_OFF )
         return 0;
 
     switch ( family )
diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c
index 3c3bedc..9e0e743 100644
--- a/xen/arch/x86/hvm/vmx/vpmu_core2.c
+++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c
@@ -707,7 +707,7 @@ static int core2_vpmu_initialise(struct vcpu *v, unsigned int vpmu_flags)
     u64 msr_content;
     struct cpuinfo_x86 *c = &current_cpu_data;
 
-    if ( !(vpmu_flags & VPMU_BOOT_BTS) )
+    if ( !(vpmu_flags & (XENPMU_FEATURE_INTEL_BTS << XENPMU_FEATURE_SHIFT)) )
         goto func_out;
     /* Check the 'Debug Store' feature in the CPUID.EAX[1]:EDX[21] */
     if ( cpu_has(c, X86_FEATURE_DS) )
@@ -826,7 +826,7 @@ int vmx_vpmu_initialise(struct vcpu *v, unsigned int vpmu_flags)
     int ret = 0;
 
     vpmu->arch_vpmu_ops = &core2_no_vpmu_ops;
-    if ( !vpmu_flags )
+    if ( vpmu_flags == XENPMU_MODE_OFF )
         return 0;
 
     if ( family == 6 )
diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c
index 8c263a5..309f858 100644
--- a/xen/arch/x86/hvm/vpmu.c
+++ b/xen/arch/x86/hvm/vpmu.c
@@ -21,6 +21,7 @@
 #include <xen/config.h>
 #include <xen/sched.h>
 #include <xen/xenoprof.h>
+#include <xen/guest_access.h>
 #include <asm/regs.h>
 #include <asm/types.h>
 #include <asm/msr.h>
@@ -38,7 +39,7 @@
  * "vpmu=off" : vpmu generally disabled
  * "vpmu=bts" : vpmu enabled and Intel BTS feature switched on.
  */
-static unsigned int __read_mostly opt_vpmu_enabled;
+uint32_t __read_mostly vpmu_mode = XENPMU_MODE_OFF;
 static void parse_vpmu_param(char *s);
 custom_param("vpmu", parse_vpmu_param);
 
@@ -52,7 +53,7 @@ static void __init parse_vpmu_param(char *s)
         break;
     default:
         if ( !strcmp(s, "bts") )
-            opt_vpmu_enabled |= VPMU_BOOT_BTS;
+            vpmu_mode |= XENPMU_FEATURE_INTEL_BTS << XENPMU_FEATURE_SHIFT;
         else if ( *s )
         {
             printk("VPMU: unknown flag: %s - vpmu disabled!\n", s);
@@ -60,7 +61,7 @@ static void __init parse_vpmu_param(char *s)
         }
         /* fall through */
     case 1:
-        opt_vpmu_enabled |= VPMU_BOOT_ENABLED;
+        vpmu_mode |= XENPMU_MODE_ON;
         break;
     }
 }
@@ -234,19 +235,19 @@ void vpmu_initialise(struct vcpu *v)
     switch ( vendor )
     {
     case X86_VENDOR_AMD:
-        if ( svm_vpmu_initialise(v, opt_vpmu_enabled) != 0 )
-            opt_vpmu_enabled = 0;
+        if ( svm_vpmu_initialise(v, vpmu_mode) != 0 )
+            vpmu_mode = XENPMU_MODE_OFF;
         return;
 
     case X86_VENDOR_INTEL:
-        if ( vmx_vpmu_initialise(v, opt_vpmu_enabled) != 0 )
-            opt_vpmu_enabled = 0;
+        if ( vmx_vpmu_initialise(v, vpmu_mode) != 0 )
+            vpmu_mode = XENPMU_MODE_OFF;
         return;
 
     default:
         printk("VPMU: Initialization failed. "
                "Unknown CPU vendor %d\n", vendor);
-        opt_vpmu_enabled = 0;
+        vpmu_mode = XENPMU_MODE_OFF;
         return;
     }
 }
@@ -268,3 +269,63 @@ void vpmu_dump(struct vcpu *v)
         vpmu->arch_vpmu_ops->arch_vpmu_dump(v);
 }
 
+long do_xenpmu_op(int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg)
+{
+    int ret = -EINVAL;
+    xen_pmu_params_t pmu_params;
+    uint32_t mode;
+
+    switch ( op )
+    {
+    case XENPMU_mode_set:
+        if ( !is_control_domain(current->domain) )
+            return -EPERM;
+
+        if ( copy_from_guest(&pmu_params, arg, 1) )
+            return -EFAULT;
+
+        mode = (uint32_t)pmu_params.d.val & XENPMU_MODE_MASK;
+        if ( mode & ~XENPMU_MODE_ON )
+            return -EINVAL;
+
+        vpmu_mode &= ~XENPMU_MODE_MASK;
+        vpmu_mode |= mode;
+
+        ret = 0;
+        break;
+
+    case XENPMU_mode_get:
+        pmu_params.d.val = vpmu_mode & XENPMU_MODE_MASK;
+        pmu_params.v.version.maj = XENPMU_VER_MAJ;
+        pmu_params.v.version.min = XENPMU_VER_MIN;
+        if ( copy_to_guest(arg, &pmu_params, 1) )
+            return -EFAULT;
+        ret = 0;
+        break;
+
+    case XENPMU_feature_set:
+        if ( !is_control_domain(current->domain) )
+            return -EPERM;
+
+        if ( copy_from_guest(&pmu_params, arg, 1) )
+            return -EFAULT;
+
+        if ( (uint32_t)pmu_params.d.val & ~XENPMU_FEATURE_INTEL_BTS )
+            return -EINVAL;
+
+        vpmu_mode &= ~XENPMU_FEATURE_MASK;
+        vpmu_mode |= (uint32_t)pmu_params.d.val << XENPMU_FEATURE_SHIFT;
+
+        ret = 0;
+        break;
+
+    case XENPMU_feature_get:
+        pmu_params.d.val = vpmu_mode & XENPMU_FEATURE_MASK;
+        if ( copy_to_guest(arg, &pmu_params, 1) )
+            return -EFAULT;
+        ret = 0;
+        break;
+     }
+
+    return ret;
+}
diff --git a/xen/arch/x86/x86_64/compat/entry.S b/xen/arch/x86/x86_64/compat/entry.S
index 594b0b9..07c736d 100644
--- a/xen/arch/x86/x86_64/compat/entry.S
+++ b/xen/arch/x86/x86_64/compat/entry.S
@@ -416,6 +416,8 @@ ENTRY(compat_hypercall_table)
         .quad do_domctl
         .quad compat_kexec_op
         .quad do_tmem_op
+        .quad do_ni_hypercall           /* reserved for XenClient */
+        .quad do_xenpmu_op              /* 40 */
         .rept __HYPERVISOR_arch_0-((.-compat_hypercall_table)/8)
         .quad compat_ni_hypercall
         .endr
@@ -464,6 +466,8 @@ ENTRY(compat_hypercall_args_table)
         .byte 1 /* do_domctl                */
         .byte 2 /* compat_kexec_op          */
         .byte 1 /* do_tmem_op               */
+        .byte 0 /* reserved for XenClient   */
+        .byte 2 /* do_xenpmu_op             */  /* 40 */
         .rept __HYPERVISOR_arch_0-(.-compat_hypercall_args_table)
         .byte 0 /* compat_ni_hypercall      */
         .endr
diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
index 3ea4683..c36ffce 100644
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -757,6 +757,8 @@ ENTRY(hypercall_table)
         .quad do_domctl
         .quad do_kexec_op
         .quad do_tmem_op
+        .quad do_ni_hypercall       /* reserved for XenClient */
+        .quad do_xenpmu_op          /* 40 */
         .rept __HYPERVISOR_arch_0-((.-hypercall_table)/8)
         .quad do_ni_hypercall
         .endr
@@ -805,6 +807,8 @@ ENTRY(hypercall_args_table)
         .byte 1 /* do_domctl            */
         .byte 2 /* do_kexec             */
         .byte 1 /* do_tmem_op           */
+        .byte 0 /* reserved for XenClient */
+        .byte 2 /* do_xenpmu_op         */  /* 40 */
         .rept __HYPERVISOR_arch_0-(.-hypercall_args_table)
         .byte 0 /* do_ni_hypercall      */
         .endr
diff --git a/xen/include/asm-x86/hvm/vpmu.h b/xen/include/asm-x86/hvm/vpmu.h
index 8646fd6..8c5c772 100644
--- a/xen/include/asm-x86/hvm/vpmu.h
+++ b/xen/include/asm-x86/hvm/vpmu.h
@@ -24,13 +24,6 @@
 
 #include <public/xenpmu.h>
 
-/*
- * Flag bits given as a string on the hypervisor boot parameter 'vpmu'.
- * See arch/x86/hvm/vpmu.c.
- */
-#define VPMU_BOOT_ENABLED 0x1    /* vpmu generally enabled. */
-#define VPMU_BOOT_BTS     0x2    /* Intel BTS feature wanted. */
-
 #define vcpu_vpmu(vcpu)   (&((vcpu)->arch.vpmu))
 #define vpmu_vcpu(vpmu)   (container_of((vpmu), struct vcpu, arch.vpmu))
 
@@ -98,5 +91,7 @@ void vpmu_dump(struct vcpu *v);
 extern int acquire_pmu_ownership(int pmu_ownership);
 extern void release_pmu_ownership(int pmu_ownership);
 
+extern uint32_t vpmu_mode;
+
 #endif /* __ASM_X86_HVM_VPMU_H_*/
 
diff --git a/xen/include/public/xen.h b/xen/include/public/xen.h
index 8c5697e..a00ab21 100644
--- a/xen/include/public/xen.h
+++ b/xen/include/public/xen.h
@@ -101,6 +101,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_ulong_t);
 #define __HYPERVISOR_kexec_op             37
 #define __HYPERVISOR_tmem_op              38
 #define __HYPERVISOR_xc_reserved_op       39 /* reserved for XenClient */
+#define __HYPERVISOR_xenpmu_op            40
 
 /* Architecture-specific hypercall definitions. */
 #define __HYPERVISOR_arch_0               48
diff --git a/xen/include/public/xenpmu.h b/xen/include/public/xenpmu.h
index 4757db9..fac29a6 100644
--- a/xen/include/public/xenpmu.h
+++ b/xen/include/public/xenpmu.h
@@ -13,6 +13,57 @@
 #define XENPMU_VER_MAJ    0
 #define XENPMU_VER_MIN    0
 
+/*
+ * ` enum neg_errnoval
+ * ` HYPERVISOR_xenpmu_op(enum xenpmu_op cmd, struct xenpmu_params *args);
+ *
+ * @cmd  == XENPMU_* (PMU operation)
+ * @args == struct xenpmu_params
+ */
+/* ` enum xenpmu_op { */
+#define XENPMU_mode_get        0 /* Also used for getting PMU version */
+#define XENPMU_mode_set        1
+#define XENPMU_feature_get     2
+#define XENPMU_feature_set     3
+/* ` } */
+
+/* Parameters structure for HYPERVISOR_xenpmu_op call */
+struct xen_pmu_params {
+    /* IN/OUT parameters */
+    union {
+        struct version {
+            uint8_t maj;
+            uint8_t min;
+        } version;
+        uint64_t pad;
+    } v;
+    union {
+        uint64_t val;
+        XEN_GUEST_HANDLE(void) valp;
+    } d;
+
+    /* IN parameters */
+    uint64_t vcpu;
+};
+typedef struct xen_pmu_params xen_pmu_params_t;
+DEFINE_XEN_GUEST_HANDLE(xen_pmu_params_t);
+
+/* PMU modes:
+ * - XENPMU_MODE_OFF:   No PMU virtualization
+ * - XENPMU_MODE_ON:    Guests can profile themselves, dom0 profiles
+ *                      itself and Xen
+ */
+#define XENPMU_FEATURE_SHIFT      16
+#define XENPMU_MODE_MASK          ((1U << XENPMU_FEATURE_SHIFT) - 1)
+#define XENPMU_MODE_OFF           0
+#define XENPMU_MODE_ON            (1<<0)
+
+/*
+ * PMU features:
+ * - XENPMU_FEATURE_INTEL_BTS: Intel BTS support (ignored on AMD)
+ */
+#define XENPMU_FEATURE_MASK       ((uint32_t)(~XENPMU_MODE_MASK))
+#define XENPMU_FEATURE_INTEL_BTS  1
 
 /* Shared between hypervisor and PV domain */
 struct xen_pmu_data {
diff --git a/xen/include/xen/hypercall.h b/xen/include/xen/hypercall.h
index a9e5229..acf50e8 100644
--- a/xen/include/xen/hypercall.h
+++ b/xen/include/xen/hypercall.h
@@ -14,6 +14,7 @@
 #include <public/event_channel.h>
 #include <public/tmem.h>
 #include <public/version.h>
+#include <public/xenpmu.h>
 #include <asm/hypercall.h>
 #include <xsm/xsm.h>
 
@@ -139,6 +140,9 @@ do_tmem_op(
 extern long
 do_xenoprof_op(int op, XEN_GUEST_HANDLE_PARAM(void) arg);
 
+extern long
+do_xenpmu_op(int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg);
+
 #ifdef CONFIG_COMPAT
 
 extern int
-- 
1.8.1.4


* [PATCH v4 10/17] x86/VPMU: Initialize PMU for PV guests
  2014-01-21 19:08 [PATCH v4 00/17] x86/PMU: Xen PMU PV support Boris Ostrovsky
                   ` (8 preceding siblings ...)
  2014-01-21 19:08 ` [PATCH v4 09/17] x86/VPMU: Interface for setting PMU mode and flags Boris Ostrovsky
@ 2014-01-21 19:08 ` Boris Ostrovsky
  2014-01-31 16:58   ` Jan Beulich
  2014-01-21 19:08 ` [PATCH v4 11/17] x86/VPMU: Add support for PMU register handling on " Boris Ostrovsky
                   ` (6 subsequent siblings)
  16 siblings, 1 reply; 48+ messages in thread
From: Boris Ostrovsky @ 2014-01-21 19:08 UTC (permalink / raw)
  To: xen-devel
  Cc: keir, suravee.suthikulpanit, andrew.cooper3, eddie.dong,
	dietmar.hahn, JBeulich, jun.nakajima, boris.ostrovsky

Add code for initializing and tearing down the PMU for PV guests.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
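A guest-kernel-side sketch, not part of this patch; the function and helper
names below are placeholders. The hypervisor side only expects the gfn of a
per-VCPU xen_pmu_data page in d.val and the VCPU number in vcpu, so
registration amounts to:

    static DEFINE_PER_CPU(struct xen_pmu_data *, xenpmu_shared);

    static int pv_pmu_register_vcpu(unsigned int cpu)
    {
        struct xen_pmu_params p;
        struct xen_pmu_data *pd = (void *)get_zeroed_page(GFP_KERNEL);

        if (!pd)
            return -ENOMEM;

        memset(&p, 0, sizeof(p));
        p.vcpu  = cpu;
        p.d.val = virt_to_gfn(pd);        /* page shared with Xen */

        if (HYPERVISOR_xenpmu_op(XENPMU_init, &p)) {
            free_page((unsigned long)pd);
            return -EINVAL;
        }

        per_cpu(xenpmu_shared, cpu) = pd;
        return 0;
    }

Teardown mirrors this: issue XENPMU_finish for the VCPU and only then free the
page, since pvpmu_finish() unmaps it and releases the hypervisor's reference.
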
 xen/arch/x86/hvm/svm/vpmu.c       | 38 ++++++++++---------
 xen/arch/x86/hvm/vmx/vpmu_core2.c | 50 ++++++++++++++++---------
 xen/arch/x86/hvm/vpmu.c           | 77 ++++++++++++++++++++++++++++++++++++++-
 xen/common/event_channel.c        |  1 +
 xen/include/asm-x86/hvm/vpmu.h    |  1 +
 xen/include/public/xen.h          |  1 +
 xen/include/public/xenpmu.h       |  2 +
 xen/include/xen/softirq.h         |  1 +
 8 files changed, 136 insertions(+), 35 deletions(-)

diff --git a/xen/arch/x86/hvm/svm/vpmu.c b/xen/arch/x86/hvm/svm/vpmu.c
index 3dd6911..e2bff67 100644
--- a/xen/arch/x86/hvm/svm/vpmu.c
+++ b/xen/arch/x86/hvm/svm/vpmu.c
@@ -373,16 +373,21 @@ static int amd_vpmu_initialise(struct vcpu *v)
 	 }
     }
 
-    ctxt = xzalloc_bytes(sizeof(struct xen_pmu_amd_ctxt) + 
-			 sizeof(uint64_t) * AMD_MAX_COUNTERS + 
-			 sizeof(uint64_t) * AMD_MAX_COUNTERS);
-    if ( !ctxt )
+    if ( !is_pv_domain(v->domain) )
     {
-        gdprintk(XENLOG_WARNING, "Insufficient memory for PMU, "
-            " PMU feature is unavailable on domain %d vcpu %d.\n",
-            v->vcpu_id, v->domain->domain_id);
-        return -ENOMEM;
+        ctxt = xzalloc_bytes(sizeof(struct xen_pmu_amd_ctxt) + 
+                             sizeof(uint64_t) * AMD_MAX_COUNTERS + 
+                             sizeof(uint64_t) * AMD_MAX_COUNTERS);
+        if ( !ctxt )
+        {
+            gdprintk(XENLOG_WARNING, "Insufficient memory for PMU, "
+                     " PMU feature is unavailable on domain %d vcpu %d.\n",
+                     v->vcpu_id, v->domain->domain_id);
+            return -ENOMEM;
+        }
     }
+    else
+        ctxt = &v->arch.vpmu.xenpmu_data->pmu.c.amd;
 
     ctxt->counters = sizeof(struct xen_pmu_amd_ctxt);
     ctxt->ctrls = ctxt->counters + sizeof(uint64_t) * AMD_MAX_COUNTERS;
@@ -399,18 +404,17 @@ static void amd_vpmu_destroy(struct vcpu *v)
     if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
         return;
 
-    if ( !is_pv_domain(v->domain) &&
-         ((struct xen_pmu_amd_ctxt *)vpmu->context)->msr_bitmap_set )
-        amd_vpmu_unset_msr_bitmap(v);
-
-    xfree(vpmu->context);
-    vpmu_reset(vpmu, VPMU_CONTEXT_ALLOCATED);
-
-    if ( vpmu_is_set(vpmu, VPMU_RUNNING) )
+    if ( !is_pv_domain(v->domain) )
     {
-        vpmu_reset(vpmu, VPMU_RUNNING);
+        if ( ((struct xen_pmu_amd_ctxt *)vpmu->context)->msr_bitmap_set )
+            amd_vpmu_unset_msr_bitmap(v);
+
+        xfree(vpmu->context);
         release_pmu_ownship(PMU_OWNER_HVM);
     }
+
+    vpmu->context = NULL;
+    vpmu_clear(vpmu);
 }
 
 /* VPMU part of the 'q' keyhandler */
diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c
index 9e0e743..1254c04 100644
--- a/xen/arch/x86/hvm/vmx/vpmu_core2.c
+++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c
@@ -356,22 +356,30 @@ static int core2_vpmu_alloc_resource(struct vcpu *v)
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
     struct xen_pmu_intel_ctxt *core2_vpmu_cxt;
 
-    if ( !acquire_pmu_ownership(PMU_OWNER_HVM) )
-        return 0;
+    if ( !is_pv_domain(v->domain) )
+    {
+        if ( !acquire_pmu_ownership(PMU_OWNER_HVM) )
+            return 0;
 
-    wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0);
-    if ( vmx_add_host_load_msr(MSR_CORE_PERF_GLOBAL_CTRL) )
-        goto out_err;
+        wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0);
+        if ( vmx_add_host_load_msr(MSR_CORE_PERF_GLOBAL_CTRL) )
+            goto out_err;
 
-    if ( vmx_add_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL) )
-        goto out_err;
-    vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, 0);
+        if ( vmx_add_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL) )
+            goto out_err;
+        vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, 0);
 
-    core2_vpmu_cxt = xzalloc_bytes(sizeof(struct xen_pmu_intel_ctxt) +
-				   sizeof(uint64_t) * fixed_pmc_cnt +
-				   sizeof(struct xen_pmu_cntr_pair) * arch_pmc_cnt);
-    if ( !core2_vpmu_cxt )
-        goto out_err;
+        core2_vpmu_cxt = xzalloc_bytes(sizeof(struct xen_pmu_intel_ctxt) +
+                                       sizeof(uint64_t) * fixed_pmc_cnt +
+                                       sizeof(struct xen_pmu_cntr_pair) * arch_pmc_cnt);
+        if ( !core2_vpmu_cxt )
+            goto out_err;
+    }
+    else
+    {
+        core2_vpmu_cxt = &v->arch.vpmu.xenpmu_data->pmu.c.intel;
+        vpmu_set(vpmu, VPMU_CONTEXT_ALLOCATED);
+    }
 
     core2_vpmu_cxt->fixed_counters = sizeof(struct xen_pmu_intel_ctxt);
     core2_vpmu_cxt->arch_counters = core2_vpmu_cxt->fixed_counters +
@@ -751,6 +759,10 @@ func_out:
     fixed_pmc_cnt = core2_get_fixed_pmc_count();
     check_pmc_quirk();
 
+    /* PV domains can allocate resources immediately */
+    if ( is_pv_domain(v->domain) && !core2_vpmu_alloc_resource(v) )
+            return 1;
+
     return 0;
 }
 
@@ -761,11 +773,15 @@ static void core2_vpmu_destroy(struct vcpu *v)
     if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
         return;
 
-    xfree(vpmu->context);
-    if ( cpu_has_vmx_msr_bitmap && !is_pv_domain(v->domain) )
-        core2_vpmu_unset_msr_bitmap(v->arch.hvm_vmx.msr_bitmap);
+    if ( !is_pv_domain(v->domain) )
+    {
+        xfree(vpmu->context);
+        if ( cpu_has_vmx_msr_bitmap )
+            core2_vpmu_unset_msr_bitmap(v->arch.hvm_vmx.msr_bitmap);
+    }
+
     release_pmu_ownship(PMU_OWNER_HVM);
-    vpmu_reset(vpmu, VPMU_CONTEXT_ALLOCATED);
+    vpmu_clear(vpmu);
 }
 
 struct arch_vpmu_ops core2_vpmu_ops = {
diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c
index 309f858..23b3040 100644
--- a/xen/arch/x86/hvm/vpmu.c
+++ b/xen/arch/x86/hvm/vpmu.c
@@ -21,10 +21,14 @@
 #include <xen/config.h>
 #include <xen/sched.h>
 #include <xen/xenoprof.h>
+#include <xen/event.h>
+#include <xen/softirq.h>
+#include <xen/hypercall.h>
 #include <xen/guest_access.h>
 #include <asm/regs.h>
 #include <asm/types.h>
 #include <asm/msr.h>
+#include <asm/p2m.h>
 #include <asm/hvm/support.h>
 #include <asm/hvm/vmx/vmx.h>
 #include <asm/hvm/vmx/vmcs.h>
@@ -257,7 +261,13 @@ void vpmu_destroy(struct vcpu *v)
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
 
     if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->arch_vpmu_destroy )
+    {
+        /* Unload VPMU first. This will stop counters */
+        on_selected_cpus(cpumask_of(vcpu_vpmu(v)->last_pcpu),
+                         vpmu_save_force, (void *)v, 1);
+
         vpmu->arch_vpmu_ops->arch_vpmu_destroy(v);
+    }
 }
 
 /* Dump some vpmu informations on console. Used in keyhandler dump_domains(). */
@@ -269,6 +279,59 @@ void vpmu_dump(struct vcpu *v)
         vpmu->arch_vpmu_ops->arch_vpmu_dump(v);
 }
 
+static int pvpmu_init(struct domain *d, xen_pmu_params_t *params)
+{
+    struct vcpu *v;
+    struct page_info *page;
+    uint64_t gmfn = params->d.val;
+
+    if ( params->vcpu < 0 || params->vcpu >= d->max_vcpus )
+        return -EINVAL;
+
+    page = get_page_from_gfn(d, gmfn, NULL, P2M_ALLOC);
+    if ( !page )
+        return -EINVAL;
+
+    v = d->vcpu[params->vcpu];
+    v->arch.vpmu.xenpmu_data = __map_domain_page_global(page);
+    if ( !v->arch.vpmu.xenpmu_data )
+    {
+        put_page(page);
+        return -EINVAL;
+    }
+
+    vpmu_initialise(v);
+
+    return 0;
+}
+
+static void pvpmu_finish(struct domain *d, xen_pmu_params_t *params)
+{
+    struct vcpu *v;
+    uint64_t mfn;
+
+    if ( params->vcpu < 0 || params->vcpu >= d->max_vcpus )
+        return;
+
+    v = d->vcpu[params->vcpu];
+    if (v != current)
+        vcpu_pause(v);
+
+    if ( v->arch.vpmu.xenpmu_data )
+    {
+        mfn = domain_page_map_to_mfn(v->arch.vpmu.xenpmu_data);
+        if ( mfn_valid(mfn) )
+        {
+            unmap_domain_page_global(v->arch.vpmu.xenpmu_data);
+            put_page(mfn_to_page(mfn));
+        }
+    }
+    vpmu_destroy(v);
+
+    if (v != current)
+        vcpu_unpause(v);
+}
+
 long do_xenpmu_op(int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg)
 {
     int ret = -EINVAL;
@@ -325,7 +388,19 @@ long do_xenpmu_op(int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg)
             return -EFAULT;
         ret = 0;
         break;
-     }
+
+    case XENPMU_init:
+        if ( copy_from_guest(&pmu_params, arg, 1) )
+            return -EFAULT;
+        ret = pvpmu_init(current->domain, &pmu_params);
+        break;
+
+    case XENPMU_finish:
+        if ( copy_from_guest(&pmu_params, arg, 1) )
+            return -EFAULT;
+        pvpmu_finish(current->domain, &pmu_params);
+        break;
+    }
 
     return ret;
 }
diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c
index 34efd24..daf381c 100644
--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -108,6 +108,7 @@ static int virq_is_global(uint32_t virq)
     case VIRQ_TIMER:
     case VIRQ_DEBUG:
     case VIRQ_XENOPROF:
+    case VIRQ_XENPMU:
         rc = 0;
         break;
     case VIRQ_ARCH_0 ... VIRQ_ARCH_7:
diff --git a/xen/include/asm-x86/hvm/vpmu.h b/xen/include/asm-x86/hvm/vpmu.h
index 8c5c772..29bb977 100644
--- a/xen/include/asm-x86/hvm/vpmu.h
+++ b/xen/include/asm-x86/hvm/vpmu.h
@@ -60,6 +60,7 @@ struct vpmu_struct {
     u32 hw_lapic_lvtpc;
     void *context;
     struct arch_vpmu_ops *arch_vpmu_ops;
+    xen_pmu_data_t *xenpmu_data;
 };
 
 /* VPMU states */
diff --git a/xen/include/public/xen.h b/xen/include/public/xen.h
index a00ab21..2eb5fd7 100644
--- a/xen/include/public/xen.h
+++ b/xen/include/public/xen.h
@@ -161,6 +161,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_ulong_t);
 #define VIRQ_MEM_EVENT  10 /* G. (DOM0) A memory event has occured           */
 #define VIRQ_XC_RESERVED 11 /* G. Reserved for XenClient                     */
 #define VIRQ_ENOMEM     12 /* G. (DOM0) Low on heap memory       */
+#define VIRQ_XENPMU     13 /* V.  PMC interrupt                              */
 
 /* Architecture-specific VIRQ definitions. */
 #define VIRQ_ARCH_0    16
diff --git a/xen/include/public/xenpmu.h b/xen/include/public/xenpmu.h
index fac29a6..9424313 100644
--- a/xen/include/public/xenpmu.h
+++ b/xen/include/public/xenpmu.h
@@ -25,6 +25,8 @@
 #define XENPMU_mode_set        1
 #define XENPMU_feature_get     2
 #define XENPMU_feature_set     3
+#define XENPMU_init            4
+#define XENPMU_finish          5
 /* ` } */
 
 /* Parameters structure for HYPERVISOR_xenpmu_op call */
diff --git a/xen/include/xen/softirq.h b/xen/include/xen/softirq.h
index 0c0d481..5829fa4 100644
--- a/xen/include/xen/softirq.h
+++ b/xen/include/xen/softirq.h
@@ -8,6 +8,7 @@ enum {
     NEW_TLBFLUSH_CLOCK_PERIOD_SOFTIRQ,
     RCU_SOFTIRQ,
     TASKLET_SOFTIRQ,
+    PMU_SOFTIRQ,
     NR_COMMON_SOFTIRQS
 };
 
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v4 11/17] x86/VPMU: Add support for PMU register handling on PV guests
  2014-01-21 19:08 [PATCH v4 00/17] x86/PMU: Xen PMU PV support Boris Ostrovsky
                   ` (9 preceding siblings ...)
  2014-01-21 19:08 ` [PATCH v4 10/17] x86/VPMU: Initialize PMU for PV guests Boris Ostrovsky
@ 2014-01-21 19:08 ` Boris Ostrovsky
  2014-02-04 11:14   ` Jan Beulich
  2014-01-21 19:08 ` [PATCH v4 12/17] x86/VPMU: Handle PMU interrupts for " Boris Ostrovsky
                   ` (5 subsequent siblings)
  16 siblings, 1 reply; 48+ messages in thread
From: Boris Ostrovsky @ 2014-01-21 19:08 UTC (permalink / raw)
  To: xen-devel
  Cc: keir, suravee.suthikulpanit, andrew.cooper3, eddie.dong,
	dietmar.hahn, JBeulich, jun.nakajima, boris.ostrovsky

Intercept accesses to PMU MSRs and process them in the VPMU module.

Dump VPMU state for all domains (HVM and PV) when requested.
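
As an illustration (a minimal sketch, not part of the patch; the helper name
pv_handle_pmu_wrmsr is hypothetical), a PV guest's WRMSR to one of the
intercepted counter MSRs is routed to the VPMU roughly like this:

/* Simplified routing of a trapped PV WRMSR to the VPMU (sketch only). */
static int pv_handle_pmu_wrmsr(struct cpu_user_regs *regs)
{
    uint64_t val = ((uint64_t)regs->edx << 32) | (uint32_t)regs->eax;

    switch ( regs->ecx )
    {
    case MSR_P6_EVNTSEL0 ... MSR_P6_EVNTSEL1:
    case MSR_CORE_PERF_FIXED_CTR_CTRL ... MSR_CORE_PERF_GLOBAL_OVF_CTRL:
        /* A non-zero return means the VPMU accepted and handled the write */
        return vpmu_do_wrmsr(regs->ecx, val);
    default:
        return 0; /* not one of the PMU MSRs handled here */
    }
}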

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
 xen/arch/x86/domain.c             |  3 +-
 xen/arch/x86/hvm/vmx/vpmu_core2.c | 60 ++++++++++++++++++++++++++++++++-------
 xen/arch/x86/hvm/vpmu.c           |  8 ++++++
 xen/arch/x86/traps.c              | 31 +++++++++++++++++++-
 xen/include/public/xenpmu.h       |  1 +
 5 files changed, 90 insertions(+), 13 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index da8e522..25572d5 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1972,8 +1972,7 @@ void arch_dump_vcpu_info(struct vcpu *v)
 {
     paging_dump_vcpu_info(v);
 
-    if ( is_hvm_vcpu(v) )
-        vpmu_dump(v);
+    vpmu_dump(v);
 }
 
 void domain_cpuid(
diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c
index 1254c04..5213c11 100644
--- a/xen/arch/x86/hvm/vmx/vpmu_core2.c
+++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c
@@ -27,6 +27,7 @@
 #include <asm/regs.h>
 #include <asm/types.h>
 #include <asm/apic.h>
+#include <asm/traps.h>
 #include <asm/msr.h>
 #include <asm/msr-index.h>
 #include <asm/hvm/support.h>
@@ -297,6 +298,9 @@ static inline void __core2_vpmu_save(struct vcpu *v)
         rdmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, fixed_counters[i]);
     for ( i = 0; i < arch_pmc_cnt; i++ )
         rdmsrl(MSR_IA32_PERFCTR0 + i, xen_pmu_cntr_pair[i].counter);
+
+    if ( is_pv_domain(v->domain) )
+        rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, core2_vpmu_cxt->global_status);
 }
 
 static int core2_vpmu_save(struct vcpu *v)
@@ -306,10 +310,14 @@ static int core2_vpmu_save(struct vcpu *v)
     if ( !vpmu_is_set_all(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED) )
         return 0;
 
+    if ( is_pv_domain(v->domain) )
+        wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0);
+
     __core2_vpmu_save(v);
 
     /* Unset PMU MSR bitmap to trap lazy load. */
-    if ( !vpmu_is_set(vpmu, VPMU_RUNNING) && cpu_has_vmx_msr_bitmap )
+    if ( !vpmu_is_set(vpmu, VPMU_RUNNING) && cpu_has_vmx_msr_bitmap
+        && !is_pv_domain(v->domain) )
         core2_vpmu_unset_msr_bitmap(v->arch.hvm_vmx.msr_bitmap);
 
     return 1;
@@ -339,6 +347,13 @@ static inline void __core2_vpmu_load(struct vcpu *v)
     wrmsrl(MSR_CORE_PERF_FIXED_CTR_CTRL, core2_vpmu_cxt->fixed_ctrl);
     wrmsrl(MSR_IA32_DS_AREA, core2_vpmu_cxt->ds_area);
     wrmsrl(MSR_IA32_PEBS_ENABLE, core2_vpmu_cxt->pebs_enable);
+
+    if ( is_pv_domain(v->domain) )
+    {
+        wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, core2_vpmu_cxt->global_ovf_ctrl);
+        core2_vpmu_cxt->global_ovf_ctrl = 0;
+        wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, core2_vpmu_cxt->global_ctrl);
+    }
 }
 
 static void core2_vpmu_load(struct vcpu *v)
@@ -424,6 +439,14 @@ static int core2_vpmu_msr_common_check(u32 msr_index, int *type, int *index)
     return 1;
 }
 
+static void inject_trap(struct vcpu *v, unsigned int trapno)
+{
+    if ( !is_pv_domain(v->domain) )
+        hvm_inject_hw_exception(trapno, 0);
+    else
+        send_guest_trap(v->domain, v->vcpu_id, trapno);
+}
+
 static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
 {
     u64 global_ctrl, non_global_ctrl;
@@ -450,7 +473,7 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
                 if ( vpmu_is_set(vpmu, VPMU_CPU_HAS_BTS) )
                     return 1;
                 gdprintk(XENLOG_WARNING, "Debug Store is not supported on this cpu\n");
-                hvm_inject_hw_exception(TRAP_gp_fault, 0);
+                inject_trap(v, TRAP_gp_fault);
                 return 0;
             }
         }
@@ -462,11 +485,12 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
     {
     case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
         core2_vpmu_cxt->global_status &= ~msr_content;
+        wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, msr_content);
         return 1;
     case MSR_CORE_PERF_GLOBAL_STATUS:
         gdprintk(XENLOG_INFO, "Can not write readonly MSR: "
                  "MSR_PERF_GLOBAL_STATUS(0x38E)!\n");
-        hvm_inject_hw_exception(TRAP_gp_fault, 0);
+        inject_trap(v, TRAP_gp_fault);
         return 1;
     case MSR_IA32_PEBS_ENABLE:
         if ( msr_content & 1 )
@@ -482,7 +506,7 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
                 gdprintk(XENLOG_WARNING,
                          "Illegal address for IA32_DS_AREA: %#" PRIx64 "x\n",
                          msr_content);
-                hvm_inject_hw_exception(TRAP_gp_fault, 0);
+                inject_trap(v, TRAP_gp_fault);
                 return 1;
             }
             core2_vpmu_cxt->ds_area = msr_content;
@@ -507,10 +531,14 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
             non_global_ctrl >>= FIXED_CTR_CTRL_BITS;
             global_ctrl >>= 1;
         }
+        core2_vpmu_cxt->global_ctrl = msr_content;
         break;
     case MSR_CORE_PERF_FIXED_CTR_CTRL:
         non_global_ctrl = msr_content;
-        vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, &global_ctrl);
+        if ( !is_pv_domain(v->domain) )
+            vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, &global_ctrl);
+        else
+            rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, global_ctrl);
         global_ctrl >>= 32;
         for ( i = 0; i < fixed_pmc_cnt; i++ )
         {
@@ -527,7 +555,10 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
             struct xen_pmu_cntr_pair *xen_pmu_cntr_pair =
                 vpmu_reg_pointer(core2_vpmu_cxt, arch_counters);
 
-            vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, &global_ctrl);
+            if ( !is_pv_domain(v->domain) )
+                vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, &global_ctrl);
+            else
+                rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, global_ctrl);
             xen_pmu_cntr_pair[tmp].control = msr_content;
             for ( i = 0; i < arch_pmc_cnt && !pmu_enable; i++ )
                 pmu_enable += (global_ctrl >> i) &
@@ -566,13 +597,19 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
                 inject_gp = 1;
             break;
         }
-        if (inject_gp)
-            hvm_inject_hw_exception(TRAP_gp_fault, 0);
+
+        if (inject_gp) 
+            inject_trap(v, TRAP_gp_fault);
         else
             wrmsrl(msr, msr_content);
     }
     else
-        vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, msr_content);
+    {
+       if ( !is_pv_domain(v->domain) )
+           vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, msr_content);
+       else
+           wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, msr_content);
+    }
 
     return 1;
 }
@@ -596,7 +633,10 @@ static int core2_vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
             *msr_content = core2_vpmu_cxt->global_status;
             break;
         case MSR_CORE_PERF_GLOBAL_CTRL:
-            vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, msr_content);
+            if ( !is_pv_domain(v->domain) )
+                vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, msr_content);
+            else
+                rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, *msr_content);
             break;
         default:
             rdmsrl(msr, *msr_content);
diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c
index 23b3040..d32325c 100644
--- a/xen/arch/x86/hvm/vpmu.c
+++ b/xen/arch/x86/hvm/vpmu.c
@@ -400,6 +400,14 @@ long do_xenpmu_op(int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg)
             return -EFAULT;
         pvpmu_finish(current->domain, &pmu_params);
         break;
+
+    case XENPMU_lvtpc_set:
+        if ( copy_from_guest(&pmu_params, arg, 1) )
+            return -EFAULT;
+
+        vpmu_lvtpc_update((uint32_t)pmu_params.d.val);
+        ret = 0;
+        break;
     }
 
     return ret;
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 3f7a3c7..7ff8401 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -72,6 +72,7 @@
 #include <asm/apic.h>
 #include <asm/mc146818rtc.h>
 #include <asm/hpet.h>
+#include <asm/hvm/vpmu.h>
 #include <public/arch-x86/cpuid.h>
 #include <xsm/xsm.h>
 
@@ -865,8 +866,10 @@ void pv_cpuid(struct cpu_user_regs *regs)
         __clear_bit(X86_FEATURE_TOPOEXT % 32, &c);
         break;
 
+    case 0x0000000a: /* Architectural Performance Monitor Features (Intel) */
+        break; 
+
     case 0x00000005: /* MONITOR/MWAIT */
-    case 0x0000000a: /* Architectural Performance Monitor Features */
     case 0x0000000b: /* Extended Topology Enumeration */
     case 0x8000000a: /* SVM revision and features */
     case 0x8000001b: /* Instruction Based Sampling */
@@ -882,6 +885,8 @@ void pv_cpuid(struct cpu_user_regs *regs)
     }
 
  out:
+    vpmu_do_cpuid(regs->eax, &a, &b, &c, &d);
+
     regs->eax = a;
     regs->ebx = b;
     regs->ecx = c;
@@ -2499,6 +2504,14 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
             if ( wrmsr_safe(regs->ecx, msr_content) != 0 )
                 goto fail;
             break;
+        case MSR_P6_PERFCTR0...MSR_P6_PERFCTR1:
+        case MSR_P6_EVNTSEL0...MSR_P6_EVNTSEL1:
+        case MSR_CORE_PERF_FIXED_CTR0...MSR_CORE_PERF_FIXED_CTR2:
+        case MSR_CORE_PERF_FIXED_CTR_CTRL...MSR_CORE_PERF_GLOBAL_OVF_CTRL:
+        case MSR_AMD_FAM15H_EVNTSEL0...MSR_AMD_FAM15H_PERFCTR5:
+            if ( !vpmu_do_wrmsr(regs->ecx, msr_content) )
+                goto invalid;
+            break;
         default:
             if ( wrmsr_hypervisor_regs(regs->ecx, msr_content) == 1 )
                 break;
@@ -2587,6 +2600,22 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
             regs->eax = (uint32_t)msr_content;
             regs->edx = (uint32_t)(msr_content >> 32);
             break;
+        case MSR_IA32_PERF_CAPABILITIES:
+            /* No extra capabilities are supported */
+            regs->eax = regs->edx = 0;
+            break;
+        case MSR_P6_PERFCTR0...MSR_P6_PERFCTR1:
+        case MSR_P6_EVNTSEL0...MSR_P6_EVNTSEL1:
+        case MSR_CORE_PERF_FIXED_CTR0...MSR_CORE_PERF_FIXED_CTR2:
+        case MSR_CORE_PERF_FIXED_CTR_CTRL...MSR_CORE_PERF_GLOBAL_OVF_CTRL:
+        case MSR_AMD_FAM15H_EVNTSEL0...MSR_AMD_FAM15H_PERFCTR5:
+            if ( vpmu_do_rdmsr(regs->ecx, &msr_content) ) 
+            {
+                regs->eax = (uint32_t)msr_content;
+                regs->edx = (uint32_t)(msr_content >> 32);
+                break;
+            }
+            goto rdmsr_normal;
         default:
             if ( rdmsr_hypervisor_regs(regs->ecx, &val) )
             {
diff --git a/xen/include/public/xenpmu.h b/xen/include/public/xenpmu.h
index 9424313..c22cd18 100644
--- a/xen/include/public/xenpmu.h
+++ b/xen/include/public/xenpmu.h
@@ -27,6 +27,7 @@
 #define XENPMU_feature_set     3
 #define XENPMU_init            4
 #define XENPMU_finish          5
+#define XENPMU_lvtpc_set       6
 /* ` } */
 
 /* Parameters structure for HYPERVISOR_xenpmu_op call */
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v4 12/17] x86/VPMU: Handle PMU interrupts for PV guests
  2014-01-21 19:08 [PATCH v4 00/17] x86/PMU: Xen PMU PV support Boris Ostrovsky
                   ` (10 preceding siblings ...)
  2014-01-21 19:08 ` [PATCH v4 11/17] x86/VPMU: Add support for PMU register handling on " Boris Ostrovsky
@ 2014-01-21 19:08 ` Boris Ostrovsky
  2014-02-04 11:22   ` Jan Beulich
  2014-01-21 19:08 ` [PATCH v4 13/17] x86/VPMU: Add privileged PMU mode Boris Ostrovsky
                   ` (4 subsequent siblings)
  16 siblings, 1 reply; 48+ messages in thread
From: Boris Ostrovsky @ 2014-01-21 19:08 UTC (permalink / raw)
  To: xen-devel
  Cc: keir, suravee.suthikulpanit, andrew.cooper3, eddie.dong,
	dietmar.hahn, JBeulich, jun.nakajima, boris.ostrovsky

Add support for handling PMU interrupts for PV guests.

The VPMU for the interrupted VCPU is unloaded until the guest issues the
XENPMU_flush hypercall. This allows the guest to access the PMU MSR values
that are stored in the VPMU context shared between the hypervisor and the
domain, thus avoiding traps to the hypervisor.
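
To make the flow concrete, here is a guest-side sketch (not part of this
patch; the wrapper HYPERVISOR_xenpmu_op, the pointer this_vcpu_xenpmu_data and
process_sample() are assumed names) of a PV guest's VIRQ_XENPMU handler
consuming the cached state and then flushing it back to hardware:

/* Sketch of a PV guest's PMU interrupt (VIRQ_XENPMU) handler. */
static void xenpmu_virq_handler(void)
{
    /* Page shared with Xen, registered earlier via XENPMU_init */
    struct xen_pmu_data *pd = this_vcpu_xenpmu_data;

    /*
     * While PMU_CACHED is set the PMU MSR state lives in pd and can be
     * read and modified without trapping to the hypervisor.
     */
    process_sample(&pd->pmu.r.regs, pd->domain_id, pd->vcpu_id);

    /*
     * Ask Xen to write the (possibly updated) MSR values back to hardware
     * and clear PMU_CACHED -- see the XENPMU_flush case in do_xenpmu_op().
     */
    HYPERVISOR_xenpmu_op(XENPMU_flush, NULL);
}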

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
 xen/arch/x86/hvm/vpmu.c     | 116 +++++++++++++++++++++++++++++++++++++++++---
 xen/include/public/xenpmu.h |   7 +++
 2 files changed, 116 insertions(+), 7 deletions(-)

diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c
index d32325c..aead6af 100644
--- a/xen/arch/x86/hvm/vpmu.c
+++ b/xen/arch/x86/hvm/vpmu.c
@@ -75,7 +75,12 @@ void vpmu_lvtpc_update(uint32_t val)
     struct vpmu_struct *vpmu = vcpu_vpmu(current);
 
     vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR | (val & APIC_LVT_MASKED);
-    apic_write(APIC_LVTPC, vpmu->hw_lapic_lvtpc);
+
+    /* Postpone APIC updates for PV guests if PMU interrupt is pending */
+    if ( !is_pv_domain(current->domain) ||
+         !(current->arch.vpmu.xenpmu_data &&
+           current->arch.vpmu.xenpmu_data->pmu_flags & PMU_CACHED) )
+        apic_write(APIC_LVTPC, vpmu->hw_lapic_lvtpc);
 }
 
 int vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
@@ -83,7 +88,23 @@ int vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
     struct vpmu_struct *vpmu = vcpu_vpmu(current);
 
     if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->do_wrmsr )
-        return vpmu->arch_vpmu_ops->do_wrmsr(msr, msr_content);
+    {
+        int ret = vpmu->arch_vpmu_ops->do_wrmsr(msr, msr_content);
+
+        /*
+         * We may have received a PMU interrupt during WRMSR handling
+         * and since do_wrmsr may load VPMU context we should save
+         * (and unload) it again.
+         */
+        if ( !is_hvm_domain(current->domain) &&
+            (current->arch.vpmu.xenpmu_data->pmu_flags & PMU_CACHED) )
+        {
+            vpmu_set(vpmu, VPMU_CONTEXT_SAVE);
+            vpmu->arch_vpmu_ops->arch_vpmu_save(current);
+            vpmu_reset(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED);
+        }
+        return ret;
+    }
     return 0;
 }
 
@@ -92,16 +113,86 @@ int vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
     struct vpmu_struct *vpmu = vcpu_vpmu(current);
 
     if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->do_rdmsr )
-        return vpmu->arch_vpmu_ops->do_rdmsr(msr, msr_content);
+    {
+        int ret = vpmu->arch_vpmu_ops->do_rdmsr(msr, msr_content);
+
+        if ( !is_hvm_domain(current->domain) &&
+            (current->arch.vpmu.xenpmu_data->pmu_flags & PMU_CACHED) )
+        {
+            vpmu_set(vpmu, VPMU_CONTEXT_SAVE);
+            vpmu->arch_vpmu_ops->arch_vpmu_save(current);
+            vpmu_reset(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED);
+        }
+        return ret;
+    }
     return 0;
 }
 
 int vpmu_do_interrupt(struct cpu_user_regs *regs)
 {
     struct vcpu *v = current;
-    struct vpmu_struct *vpmu = vcpu_vpmu(v);
+    struct vpmu_struct *vpmu;
 
-    if ( vpmu->arch_vpmu_ops )
+    /* dom0 will handle this interrupt */
+    if ( v->domain->domain_id >= DOMID_FIRST_RESERVED )
+        v = dom0->vcpu[smp_processor_id() % dom0->max_vcpus];
+
+    vpmu = vcpu_vpmu(v);
+    if ( !is_hvm_domain(v->domain) )
+    {
+        /* PV guest or dom0 is doing system profiling */
+        const struct cpu_user_regs *gregs;
+        int err;
+
+        if ( v->arch.vpmu.xenpmu_data->pmu_flags & PMU_CACHED )
+            return 1;
+
+        /* PV guest will be reading PMU MSRs from xenpmu_data */
+        vpmu_set(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED);
+        err = vpmu->arch_vpmu_ops->arch_vpmu_save(v);
+        vpmu_reset(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED);
+
+        /* Store appropriate registers in xenpmu_data */
+        if ( is_pv_32bit_domain(current->domain) )
+        {
+            /*
+             * 32-bit dom0 cannot process Xen's addresses (which are 64 bit)
+             * and therefore we treat it the same way as a non-priviledged
+             * PV 32-bit domain.
+             */
+            struct compat_cpu_user_regs *cmp;
+
+            gregs = guest_cpu_user_regs();
+
+            cmp = (struct compat_cpu_user_regs *)
+                    &v->arch.vpmu.xenpmu_data->pmu.r.regs;
+            XLAT_cpu_user_regs(cmp, gregs);
+        }
+        else if ( !is_control_domain(current->domain) &&
+                 !is_idle_vcpu(current) )
+        {
+            /* PV guest */
+            gregs = guest_cpu_user_regs();
+            memcpy(&v->arch.vpmu.xenpmu_data->pmu.r.regs,
+                   gregs, sizeof(struct cpu_user_regs));
+        }
+        else
+            memcpy(&v->arch.vpmu.xenpmu_data->pmu.r.regs,
+                   regs, sizeof(struct cpu_user_regs));
+
+        v->arch.vpmu.xenpmu_data->domain_id = current->domain->domain_id;
+        v->arch.vpmu.xenpmu_data->vcpu_id = current->vcpu_id;
+        v->arch.vpmu.xenpmu_data->pcpu_id = smp_processor_id();
+
+        v->arch.vpmu.xenpmu_data->pmu_flags |= PMU_CACHED;
+        apic_write(APIC_LVTPC, vpmu->hw_lapic_lvtpc | APIC_LVT_MASKED);
+        vpmu->hw_lapic_lvtpc |= APIC_LVT_MASKED;
+
+        send_guest_vcpu_virq(v, VIRQ_XENPMU);
+
+        return 1;
+    }
+    else if ( vpmu->arch_vpmu_ops )
     {
         struct vlapic *vlapic = vcpu_vlapic(v);
         u32 vlapic_lvtpc;
@@ -213,8 +304,13 @@ void vpmu_load(struct vcpu *v)
 
     local_irq_enable();
 
-    /* Only when PMU is counting, we load PMU context immediately. */
-    if ( !vpmu_is_set(vpmu, VPMU_RUNNING) )
+    /* 
+     * Only when PMU is counting and is not cached (for PV guests) do
+     * we load PMU context immediately.
+     */
+    if ( !vpmu_is_set(vpmu, VPMU_RUNNING) ||
+         (is_pv_domain(v->domain) &&
+          vpmu->xenpmu_data->pmu_flags & PMU_CACHED) )
         return;
 
     if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->arch_vpmu_load )
@@ -408,6 +504,12 @@ long do_xenpmu_op(int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg)
         vpmu_lvtpc_update((uint32_t)pmu_params.d.val);
         ret = 0;
         break;
+    case XENPMU_flush:
+        current->arch.vpmu.xenpmu_data->pmu_flags &= ~PMU_CACHED;
+        vpmu_load(current);
+        vpmu_lvtpc_update(current->arch.vpmu.xenpmu_data->pmu.l.lapic_lvtpc);
+        ret = 0;
+        break;
     }
 
     return ret;
diff --git a/xen/include/public/xenpmu.h b/xen/include/public/xenpmu.h
index c22cd18..df85209 100644
--- a/xen/include/public/xenpmu.h
+++ b/xen/include/public/xenpmu.h
@@ -28,6 +28,7 @@
 #define XENPMU_init            4
 #define XENPMU_finish          5
 #define XENPMU_lvtpc_set       6
+#define XENPMU_flush           7 /* Write cached MSR values to HW     */
 /* ` } */
 
 /* Parameters structure for HYPERVISOR_xenpmu_op call */
@@ -68,6 +69,12 @@ DEFINE_XEN_GUEST_HANDLE(xen_pmu_params_t);
 #define XENPMU_FEATURE_MASK       ((uint32_t)(~XENPMU_MODE_MASK))
 #define XENPMU_FEATURE_INTEL_BTS  1
 
+/*
+ * PMU MSRs are cached in the context so the PV guest doesn't need to trap to
+ * the hypervisor
+ */
+#define PMU_CACHED 1
+
 /* Shared between hypervisor and PV domain */
 struct xen_pmu_data {
     uint32_t domain_id;
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v4 13/17] x86/VPMU: Add privileged PMU mode
  2014-01-21 19:08 [PATCH v4 00/17] x86/PMU: Xen PMU PV support Boris Ostrovsky
                   ` (11 preceding siblings ...)
  2014-01-21 19:08 ` [PATCH v4 12/17] x86/VPMU: Handle PMU interrupts for " Boris Ostrovsky
@ 2014-01-21 19:08 ` Boris Ostrovsky
  2014-02-04 11:31   ` Jan Beulich
  2014-01-21 19:08 ` [PATCH v4 14/17] x86/VPMU: Save VPMU state for PV guests during context switch Boris Ostrovsky
                   ` (3 subsequent siblings)
  16 siblings, 1 reply; 48+ messages in thread
From: Boris Ostrovsky @ 2014-01-21 19:08 UTC (permalink / raw)
  To: xen-devel
  Cc: keir, suravee.suthikulpanit, andrew.cooper3, eddie.dong,
	dietmar.hahn, JBeulich, jun.nakajima, boris.ostrovsky

Add support for a privileged PMU mode which allows the privileged domain
(dom0) to profile both itself (and the hypervisor) and the guests. While this
mode is on, profiling in guests is disabled.
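
For example (a dom0-side sketch, not part of the patch; HYPERVISOR_xenpmu_op
is the assumed guest wrapper around do_xenpmu_op()), switching to the
privileged mode amounts to:

int rc;
xen_pmu_params_t p = { .d.val = XENPMU_MODE_PRIV };

/* XENPMU_mode_set rejects XENPMU_MODE_ON | XENPMU_MODE_PRIV combinations */
rc = HYPERVISOR_xenpmu_op(XENPMU_mode_set, &p);
if ( rc )
    printk("Failed to enable privileged PMU mode: %d\n", rc);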

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
 xen/arch/x86/hvm/vpmu.c     | 88 ++++++++++++++++++++++++++++++++-------------
 xen/arch/x86/traps.c        |  6 +++-
 xen/include/public/xenpmu.h |  3 ++
 3 files changed, 72 insertions(+), 25 deletions(-)

diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c
index aead6af..214300d 100644
--- a/xen/arch/x86/hvm/vpmu.c
+++ b/xen/arch/x86/hvm/vpmu.c
@@ -87,6 +87,9 @@ int vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
 {
     struct vpmu_struct *vpmu = vcpu_vpmu(current);
 
+    if ( (vpmu_mode & XENPMU_MODE_PRIV) && !is_control_domain(current->domain) )
+        return 0;
+
     if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->do_wrmsr )
     {
         int ret = vpmu->arch_vpmu_ops->do_wrmsr(msr, msr_content);
@@ -112,6 +115,9 @@ int vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
 {
     struct vpmu_struct *vpmu = vcpu_vpmu(current);
 
+    if ( (vpmu_mode & XENPMU_MODE_PRIV) && !is_control_domain(current->domain) )
+        return 0;
+
     if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->do_rdmsr )
     {
         int ret = vpmu->arch_vpmu_ops->do_rdmsr(msr, msr_content);
@@ -134,14 +140,18 @@ int vpmu_do_interrupt(struct cpu_user_regs *regs)
     struct vpmu_struct *vpmu;
 
     /* dom0 will handle this interrupt */
-    if ( v->domain->domain_id >= DOMID_FIRST_RESERVED )
+    if ( (vpmu_mode & XENPMU_MODE_PRIV) ||
+         (v->domain->domain_id >= DOMID_FIRST_RESERVED) )
         v = dom0->vcpu[smp_processor_id() % dom0->max_vcpus];
 
     vpmu = vcpu_vpmu(v);
-    if ( !is_hvm_domain(v->domain) )
+    if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
+        return 0;
+
+    if ( !is_hvm_domain(v->domain) || (vpmu_mode & XENPMU_MODE_PRIV) )
     {
         /* PV guest or dom0 is doing system profiling */
-        const struct cpu_user_regs *gregs;
+        struct cpu_user_regs *gregs;
         int err;
 
         if ( v->arch.vpmu.xenpmu_data->pmu_flags & PMU_CACHED )
@@ -152,33 +162,62 @@ int vpmu_do_interrupt(struct cpu_user_regs *regs)
         err = vpmu->arch_vpmu_ops->arch_vpmu_save(v);
         vpmu_reset(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED);
 
-        /* Store appropriate registers in xenpmu_data */
-        if ( is_pv_32bit_domain(current->domain) )
+        if ( !is_hvm_domain(current->domain) )
         {
-            /*
-             * 32-bit dom0 cannot process Xen's addresses (which are 64 bit)
-             * and therefore we treat it the same way as a non-priviledged
-             * PV 32-bit domain.
-             */
-            struct compat_cpu_user_regs *cmp;
-
-            gregs = guest_cpu_user_regs();
-
-            cmp = (struct compat_cpu_user_regs *)
-                    &v->arch.vpmu.xenpmu_data->pmu.r.regs;
-            XLAT_cpu_user_regs(cmp, gregs);
+            uint16_t cs = (current->arch.flags & TF_kernel_mode) ? 0 : 0x3;
+
+            /* Store appropriate registers in xenpmu_data */
+            if ( is_pv_32bit_domain(current->domain) )
+            {
+                gregs = guest_cpu_user_regs();
+
+                if ( (vpmu_mode & XENPMU_MODE_PRIV) &&
+                     !is_pv_32bit_domain(v->domain) )
+                    memcpy(&v->arch.vpmu.xenpmu_data->pmu.r.regs,
+                           gregs, sizeof(struct cpu_user_regs));
+                else 
+                {
+                    /*
+                     * 32-bit dom0 cannot process Xen's addresses (which are
+                     * 64 bit) and therefore we treat it the same way as a
+                     * non-priviledged PV 32-bit domain.
+                     */
+
+                    struct compat_cpu_user_regs *cmp;
+
+                    cmp = (struct compat_cpu_user_regs *)
+                        &v->arch.vpmu.xenpmu_data->pmu.r.regs;
+                    XLAT_cpu_user_regs(cmp, gregs);
+                }
+            }
+            else if ( !is_control_domain(current->domain) &&
+                      !is_idle_vcpu(current) )
+            {
+                /* PV guest */
+                gregs = guest_cpu_user_regs();
+                memcpy(&v->arch.vpmu.xenpmu_data->pmu.r.regs,
+                       gregs, sizeof(struct cpu_user_regs));
+            }
+            else
+                memcpy(&v->arch.vpmu.xenpmu_data->pmu.r.regs,
+                       regs, sizeof(struct cpu_user_regs));
+
+            gregs = &v->arch.vpmu.xenpmu_data->pmu.r.regs;
+            gregs->cs = cs;
         }
-        else if ( !is_control_domain(current->domain) &&
-                 !is_idle_vcpu(current) )
+        else
         {
-            /* PV guest */
+            /* HVM guest */
+            struct segment_register cs;
+
             gregs = guest_cpu_user_regs();
             memcpy(&v->arch.vpmu.xenpmu_data->pmu.r.regs,
                    gregs, sizeof(struct cpu_user_regs));
+
+            hvm_get_segment_register(current, x86_seg_cs, &cs);
+            gregs = &v->arch.vpmu.xenpmu_data->pmu.r.regs;
+            gregs->cs = cs.attr.fields.dpl;
         }
-        else
-            memcpy(&v->arch.vpmu.xenpmu_data->pmu.r.regs,
-                   regs, sizeof(struct cpu_user_regs));
 
         v->arch.vpmu.xenpmu_data->domain_id = current->domain->domain_id;
         v->arch.vpmu.xenpmu_data->vcpu_id = current->vcpu_id;
@@ -444,7 +483,8 @@ long do_xenpmu_op(int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg)
             return -EFAULT;
 
         mode = (uint32_t)pmu_params.d.val & XENPMU_MODE_MASK;
-        if ( mode & ~XENPMU_MODE_ON )
+        if ( (mode & ~(XENPMU_MODE_ON | XENPMU_MODE_PRIV)) ||
+             ((mode & XENPMU_MODE_ON) && (mode & XENPMU_MODE_PRIV)) )
             return -EINVAL;
 
         vpmu_mode &= ~XENPMU_MODE_MASK;
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 7ff8401..1854230 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -2510,7 +2510,11 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
         case MSR_CORE_PERF_FIXED_CTR_CTRL...MSR_CORE_PERF_GLOBAL_OVF_CTRL:
         case MSR_AMD_FAM15H_EVNTSEL0...MSR_AMD_FAM15H_PERFCTR5:
             if ( !vpmu_do_wrmsr(regs->ecx, msr_content) )
-                goto invalid;
+            {
+                if ( (vpmu_mode & XENPMU_MODE_PRIV) &&
+                      is_control_domain(v->domain) )
+                    goto invalid;
+            }
             break;
         default:
             if ( wrmsr_hypervisor_regs(regs->ecx, msr_content) == 1 )
diff --git a/xen/include/public/xenpmu.h b/xen/include/public/xenpmu.h
index df85209..f715f30 100644
--- a/xen/include/public/xenpmu.h
+++ b/xen/include/public/xenpmu.h
@@ -56,11 +56,14 @@ DEFINE_XEN_GUEST_HANDLE(xen_pmu_params_t);
  * - XENPMU_MODE_OFF:   No PMU virtualization
  * - XENPMU_MODE_ON:    Guests can profile themselves, dom0 profiles
  *                      itself and Xen
+ * - XENPMU_MODE_PRIV:  Only dom0 has access to VPMU and it profiles
+ *                      everyone: itself, the hypervisor and the guests.
  */
 #define XENPMU_FEATURE_SHIFT      16
 #define XENPMU_MODE_MASK          ((1U << XENPMU_FEATURE_SHIFT) - 1)
 #define XENPMU_MODE_OFF           0
 #define XENPMU_MODE_ON            (1<<0)
+#define XENPMU_MODE_PRIV          (1<<1)
 
 /*
  * PMU features:
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v4 14/17] x86/VPMU: Save VPMU state for PV guests during context switch
  2014-01-21 19:08 [PATCH v4 00/17] x86/PMU: Xen PMU PV support Boris Ostrovsky
                   ` (12 preceding siblings ...)
  2014-01-21 19:08 ` [PATCH v4 13/17] x86/VPMU: Add privileged PMU mode Boris Ostrovsky
@ 2014-01-21 19:08 ` Boris Ostrovsky
  2014-02-04 11:38   ` Jan Beulich
  2014-01-21 19:09 ` [PATCH v4 15/17] x86/VPMU: NMI-based VPMU support Boris Ostrovsky
                   ` (2 subsequent siblings)
  16 siblings, 1 reply; 48+ messages in thread
From: Boris Ostrovsky @ 2014-01-21 19:08 UTC (permalink / raw)
  To: xen-devel
  Cc: keir, suravee.suthikulpanit, andrew.cooper3, eddie.dong,
	dietmar.hahn, JBeulich, jun.nakajima, boris.ostrovsky

Save VPMU state during context switch for both HVM and PV guests. The only
case where the save is skipped is when we are in PMU privileged mode (i.e.
dom0 is doing all profiling) and the domain being switched out is the control
domain itself, since its counters must keep running. A non-control domain is
still saved even in privileged mode because we may have just turned the
privileged PMU mode on and thus need to save the last domain's state.
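
For reference, the skip condition can be read as the following predicate
(a restatement of the hunk below; the helper name is hypothetical):

/* True when prev's VPMU state must be saved when switching away from it. */
static bool_t vpmu_needs_save(const struct vcpu *prev)
{
    /*
     * In privileged mode dom0's counters keep running across all domains,
     * so only the control domain itself is exempt from the save.
     */
    return !(vpmu_mode & XENPMU_MODE_PRIV) ||
           !is_control_domain(prev->domain);
}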

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
 xen/arch/x86/domain.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 25572d5..124c0e7 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1444,17 +1444,16 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
     }
 
     if (prev != next)
-        update_runstate_area(prev);
-
-    if ( is_hvm_vcpu(prev) )
     {
-        if (prev != next)
+        update_runstate_area(prev);
+        if ( !(vpmu_mode & XENPMU_MODE_PRIV) ||
+             !is_control_domain(prev->domain) )
             vpmu_save(prev);
-
-        if ( !list_empty(&prev->arch.hvm_vcpu.tm_list) )
-            pt_save_timer(prev);
     }
 
+    if ( is_hvm_vcpu(prev) && !list_empty(&prev->arch.hvm_vcpu.tm_list) )
+        pt_save_timer(prev);
+
     local_irq_disable();
 
     set_current(next);
@@ -1491,7 +1490,7 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
                            (next->domain->domain_id != 0));
     }
 
-    if (is_hvm_vcpu(next) && (prev != next) )
+    if ( (prev != next) && !(vpmu_mode & XENPMU_MODE_PRIV) )
         /* Must be done with interrupts enabled */
         vpmu_load(next);
 
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v4 15/17] x86/VPMU: NMI-based VPMU support
  2014-01-21 19:08 [PATCH v4 00/17] x86/PMU: Xen PMU PV support Boris Ostrovsky
                   ` (13 preceding siblings ...)
  2014-01-21 19:08 ` [PATCH v4 14/17] x86/VPMU: Save VPMU state for PV guests during context switch Boris Ostrovsky
@ 2014-01-21 19:09 ` Boris Ostrovsky
  2014-02-04 11:48   ` Jan Beulich
  2014-01-21 19:09 ` [PATCH v4 16/17] x86/VPMU: Support for PVH guests Boris Ostrovsky
  2014-01-21 19:09 ` [PATCH v4 17/17] x86/VPMU: Move VPMU files up from hvm/ directory Boris Ostrovsky
  16 siblings, 1 reply; 48+ messages in thread
From: Boris Ostrovsky @ 2014-01-21 19:09 UTC (permalink / raw)
  To: xen-devel
  Cc: keir, suravee.suthikulpanit, andrew.cooper3, eddie.dong,
	dietmar.hahn, JBeulich, jun.nakajima, boris.ostrovsky

Add support for using NMIs as PMU interrupts.

Most of the processing is still performed by vpmu_do_interrupt(). However,
since certain operations are not NMI-safe, we defer them to a softirq that
vpmu_do_interrupt() will schedule (see the sketch below):
* For PV guests these are send_guest_vcpu_virq() and hvm_get_segment_register().
* For HVM guests it is the VLAPIC accesses.
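
A condensed sketch of the deferral pattern (the actual split of work is in the
diff below -- pmu_nmi_interrupt() wraps vpmu_do_interrupt(), which also handles
dom0 routing and register copying):

static DEFINE_PER_CPU(struct vcpu *, sampled_vcpu);

/* NMI handler: record the bare minimum and defer -- nothing NMI-unsafe here */
int pmu_nmi_interrupt(struct cpu_user_regs *regs, int cpu)
{
    this_cpu(sampled_vcpu) = current;
    raise_softirq(PMU_SOFTIRQ);
    return 1;
}

/* Softirq handler: normal context, safe to touch the VLAPIC or send a VIRQ */
static void pmu_softnmi(void)
{
    struct vcpu *sampled = this_cpu(sampled_vcpu);

    if ( is_hvm_domain(sampled->domain) )
        vpmu_send_nmi(sampled);                    /* VLAPIC access */
    else
        send_guest_vcpu_virq(sampled, VIRQ_XENPMU);
}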

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
 xen/arch/x86/hvm/vpmu.c | 169 ++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 135 insertions(+), 34 deletions(-)

diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c
index 214300d..e76b538 100644
--- a/xen/arch/x86/hvm/vpmu.c
+++ b/xen/arch/x86/hvm/vpmu.c
@@ -36,6 +36,7 @@
 #include <asm/hvm/svm/svm.h>
 #include <asm/hvm/svm/vmcb.h>
 #include <asm/apic.h>
+#include <asm/nmi.h>
 #include <public/xenpmu.h>
 
 /*
@@ -48,33 +49,57 @@ static void parse_vpmu_param(char *s);
 custom_param("vpmu", parse_vpmu_param);
 
 static DEFINE_PER_CPU(struct vcpu *, last_vcpu);
+static DEFINE_PER_CPU(struct vcpu *, sampled_vcpu);
+
+uint32_t vpmu_apic_vector = PMU_APIC_VECTOR;
 
 static void __init parse_vpmu_param(char *s)
 {
-    switch ( parse_bool(s) )
-    {
-    case 0:
-        break;
-    default:
-        if ( !strcmp(s, "bts") )
-            vpmu_mode |= XENPMU_FEATURE_INTEL_BTS << XENPMU_FEATURE_SHIFT;
-        else if ( *s )
+    char *ss;
+
+    vpmu_mode = XENPMU_MODE_ON;
+    if (*s == '\0')
+        return;
+
+    do {
+        ss = strchr(s, ',');
+        if ( ss )
+            *ss = '\0';
+
+        switch  (parse_bool(s) )
         {
-            printk("VPMU: unknown flag: %s - vpmu disabled!\n", s);
+        case 0:
+            vpmu_mode = XENPMU_MODE_OFF;
+            return;
+        case -1:
+            if ( !strcmp(s, "nmi") )
+                vpmu_apic_vector = APIC_DM_NMI;
+            else if ( !strcmp(s, "bts") )
+                vpmu_mode |= XENPMU_FEATURE_INTEL_BTS << XENPMU_FEATURE_SHIFT;
+            else if ( !strcmp(s, "priv") )
+            {
+                vpmu_mode &= ~XENPMU_MODE_ON;
+                vpmu_mode |= XENPMU_MODE_PRIV;
+            }
+            else
+            {
+                printk("VPMU: unknown flag: %s - vpmu disabled!\n", s);
+                vpmu_mode = XENPMU_MODE_OFF;
+                return;
+            }
+        default:
             break;
         }
-        /* fall through */
-    case 1:
-        vpmu_mode |= XENPMU_MODE_ON;
-        break;
-    }
+
+        s = ss + 1;
+    } while ( ss );
 }
 
 void vpmu_lvtpc_update(uint32_t val)
 {
     struct vpmu_struct *vpmu = vcpu_vpmu(current);
 
-    vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR | (val & APIC_LVT_MASKED);
+    vpmu->hw_lapic_lvtpc = vpmu_apic_vector | (val & APIC_LVT_MASKED);
 
     /* Postpone APIC updates for PV guests if PMU interrupt is pending */
     if ( !is_pv_domain(current->domain) ||
@@ -83,6 +108,24 @@ void vpmu_lvtpc_update(uint32_t val)
         apic_write(APIC_LVTPC, vpmu->hw_lapic_lvtpc);
 }
 
+static void vpmu_send_nmi(struct vcpu *v)
+{
+    struct vlapic *vlapic = vcpu_vlapic(v);
+    u32 vlapic_lvtpc;
+    unsigned char int_vec;
+
+    if ( !is_vlapic_lvtpc_enabled(vlapic) )
+        return;
+
+    vlapic_lvtpc = vlapic_get_reg(vlapic, APIC_LVTPC);
+    int_vec = vlapic_lvtpc & APIC_VECTOR_MASK;
+
+    if ( GET_APIC_DELIVERY_MODE(vlapic_lvtpc) == APIC_MODE_FIXED )
+        vlapic_set_irq(vcpu_vlapic(v), int_vec, 0);
+    else
+        v->nmi_pending = 1;
+}
+
 int vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
 {
     struct vpmu_struct *vpmu = vcpu_vpmu(current);
@@ -134,6 +177,7 @@ int vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
     return 0;
 }
 
+/* This routine may be called in NMI context */
 int vpmu_do_interrupt(struct cpu_user_regs *regs)
 {
     struct vcpu *v = current;
@@ -214,9 +258,13 @@ int vpmu_do_interrupt(struct cpu_user_regs *regs)
             memcpy(&v->arch.vpmu.xenpmu_data->pmu.r.regs,
                    gregs, sizeof(struct cpu_user_regs));
 
-            hvm_get_segment_register(current, x86_seg_cs, &cs);
-            gregs = &v->arch.vpmu.xenpmu_data->pmu.r.regs;
-            gregs->cs = cs.attr.fields.dpl;
+            /* This is unsafe in NMI context, we'll do it in softint handler */
+            if ( !(vpmu_apic_vector & APIC_DM_NMI ) )
+            {
+                hvm_get_segment_register(current, x86_seg_cs, &cs);
+                gregs = &v->arch.vpmu.xenpmu_data->pmu.r.regs;
+                gregs->cs = cs.attr.fields.dpl;
+            }
         }
 
         v->arch.vpmu.xenpmu_data->domain_id = current->domain->domain_id;
@@ -227,29 +275,29 @@ int vpmu_do_interrupt(struct cpu_user_regs *regs)
         apic_write(APIC_LVTPC, vpmu->hw_lapic_lvtpc | APIC_LVT_MASKED);
         vpmu->hw_lapic_lvtpc |= APIC_LVT_MASKED;
 
-        send_guest_vcpu_virq(v, VIRQ_XENPMU);
+        if ( vpmu_apic_vector & APIC_DM_NMI )
+        {
+            per_cpu(sampled_vcpu, smp_processor_id()) = current;
+            raise_softirq(PMU_SOFTIRQ);
+        }
+        else
+            send_guest_vcpu_virq(v, VIRQ_XENPMU);
 
         return 1;
     }
     else if ( vpmu->arch_vpmu_ops )
     {
-        struct vlapic *vlapic = vcpu_vlapic(v);
-        u32 vlapic_lvtpc;
-        unsigned char int_vec;
-
         if ( !vpmu->arch_vpmu_ops->do_interrupt(regs) )
             return 0;
 
-        if ( !is_vlapic_lvtpc_enabled(vlapic) )
-            return 1;
-
-        vlapic_lvtpc = vlapic_get_reg(vlapic, APIC_LVTPC);
-        int_vec = vlapic_lvtpc & APIC_VECTOR_MASK;
-
-        if ( GET_APIC_DELIVERY_MODE(vlapic_lvtpc) == APIC_MODE_FIXED )
-            vlapic_set_irq(vcpu_vlapic(v), int_vec, 0);
+        if ( vpmu_apic_vector & APIC_DM_NMI )
+        {
+            per_cpu(sampled_vcpu, smp_processor_id()) = current;
+            raise_softirq(PMU_SOFTIRQ);
+        }
         else
-            v->nmi_pending = 1;
+            vpmu_send_nmi(v);
+
         return 1;
     }
 
@@ -299,7 +347,7 @@ void vpmu_save(struct vcpu *v)
         if ( vpmu->arch_vpmu_ops->arch_vpmu_save(v) )
             vpmu_reset(vpmu, VPMU_CONTEXT_LOADED);
 
-    apic_write(APIC_LVTPC, PMU_APIC_VECTOR | APIC_LVT_MASKED);
+    apic_write(APIC_LVTPC, vpmu_apic_vector | APIC_LVT_MASKED);
 }
 
 void vpmu_load(struct vcpu *v)
@@ -414,12 +462,50 @@ void vpmu_dump(struct vcpu *v)
         vpmu->arch_vpmu_ops->arch_vpmu_dump(v);
 }
 
+/* Process the softirq set by PMU NMI handler */
+static void pmu_softnmi(void)
+{
+    struct cpu_user_regs *regs;
+    struct vcpu *v, *sampled = per_cpu(sampled_vcpu, smp_processor_id());
+
+    if ( vpmu_mode & XENPMU_MODE_PRIV ||
+         sampled->domain->domain_id >= DOMID_FIRST_RESERVED )
+        v = dom0->vcpu[smp_processor_id() % dom0->max_vcpus];
+    else
+    {
+        if ( is_hvm_domain(sampled->domain) )
+        {
+            vpmu_send_nmi(sampled);
+            return;
+        }
+        v = sampled;
+    }
+
+    regs = &v->arch.vpmu.xenpmu_data->pmu.r.regs;
+    if ( !is_pv_domain(sampled->domain) )
+    {
+        struct segment_register cs;
+
+        hvm_get_segment_register(sampled, x86_seg_cs, &cs);
+        regs->cs = cs.attr.fields.dpl;
+    }
+
+    send_guest_vcpu_virq(v, VIRQ_XENPMU);
+}
+
+int pmu_nmi_interrupt(struct cpu_user_regs *regs, int cpu)
+{
+    return vpmu_do_interrupt(regs);
+}
+
+
 static int pvpmu_init(struct domain *d, xen_pmu_params_t *params)
 {
     struct vcpu *v;
     struct page_info *page;
     uint64_t gmfn = params->d.val;
-
+    static int pvpmu_initted = 0;
+ 
     if ( params->vcpu < 0 || params->vcpu >= d->max_vcpus )
         return -EINVAL;
 
@@ -435,6 +521,21 @@ static int pvpmu_init(struct domain *d, xen_pmu_params_t *params)
         return -EINVAL;
     }
 
+    if ( !pvpmu_initted )
+    {
+        if (reserve_lapic_nmi() == 0)
+            set_nmi_callback(pmu_nmi_interrupt);
+        else
+        {
+            printk("Failed to reserve PMU NMI\n");
+            put_page(page);
+            return -EBUSY;
+        }
+        open_softirq(PMU_SOFTIRQ, pmu_softnmi);
+
+        pvpmu_initted = 1;
+    }
+
     vpmu_initialise(v);
 
     return 0;
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v4 16/17] x86/VPMU: Support for PVH guests
  2014-01-21 19:08 [PATCH v4 00/17] x86/PMU: Xen PMU PV support Boris Ostrovsky
                   ` (14 preceding siblings ...)
  2014-01-21 19:09 ` [PATCH v4 15/17] x86/VPMU: NMI-based VPMU support Boris Ostrovsky
@ 2014-01-21 19:09 ` Boris Ostrovsky
  2014-02-04 11:51   ` Jan Beulich
  2014-01-21 19:09 ` [PATCH v4 17/17] x86/VPMU: Move VPMU files up from hvm/ directory Boris Ostrovsky
  16 siblings, 1 reply; 48+ messages in thread
From: Boris Ostrovsky @ 2014-01-21 19:09 UTC (permalink / raw)
  To: xen-devel
  Cc: keir, suravee.suthikulpanit, andrew.cooper3, eddie.dong,
	dietmar.hahn, JBeulich, jun.nakajima, boris.ostrovsky

Add support for PVH guests. Most operations are performed as in an HVM guest.
However, interrupt management is done in a PV-like manner.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
 xen/arch/x86/hvm/hvm.c     |  3 ++-
 xen/arch/x86/hvm/vmx/vmx.c |  4 +++-
 xen/arch/x86/hvm/vpmu.c    | 22 ++++++++++++++++++----
 3 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 69f7e74..1e50c35 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -3451,7 +3451,8 @@ static hvm_hypercall_t *const pvh_hypercall64_table[NR_hypercalls] = {
     [ __HYPERVISOR_physdev_op ]      = (hvm_hypercall_t *)hvm_physdev_op,
     HYPERCALL(hvm_op),
     HYPERCALL(sysctl),
-    HYPERCALL(domctl)
+    HYPERCALL(domctl),
+    HYPERCALL(xenpmu_op)
 };
 
 int hvm_do_hypercall(struct cpu_user_regs *regs)
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index dfff628..59b8ef1 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -112,7 +112,9 @@ static int vmx_vcpu_initialise(struct vcpu *v)
         return rc;
     }
 
-    vpmu_initialise(v);
+    /* PVH will initialize VPMU using PV path */
+    if ( is_hvm_domain(v->domain) )
+        vpmu_initialise(v);
 
     vmx_install_vlapic_mapping(v);
 
diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c
index e76b538..f736de0 100644
--- a/xen/arch/x86/hvm/vpmu.c
+++ b/xen/arch/x86/hvm/vpmu.c
@@ -37,6 +37,7 @@
 #include <asm/hvm/svm/vmcb.h>
 #include <asm/apic.h>
 #include <asm/nmi.h>
+#include <asm/p2m.h>
 #include <public/xenpmu.h>
 
 /*
@@ -194,13 +195,17 @@ int vpmu_do_interrupt(struct cpu_user_regs *regs)
 
     if ( !is_hvm_domain(v->domain) || (vpmu_mode & XENPMU_MODE_PRIV) )
     {
-        /* PV guest or dom0 is doing system profiling */
+        /* PV(H) guest or dom0 is doing system profiling */
         struct cpu_user_regs *gregs;
         int err;
 
         if ( v->arch.vpmu.xenpmu_data->pmu_flags & PMU_CACHED )
             return 1;
 
+        if ( is_pvh_domain(current->domain) && !(vpmu_mode & XENPMU_MODE_PRIV) )
+            if ( !vpmu->arch_vpmu_ops->do_interrupt(regs) )
+                return 0;
+
         /* PV guest will be reading PMU MSRs from xenpmu_data */
         vpmu_set(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED);
         err = vpmu->arch_vpmu_ops->arch_vpmu_save(v);
@@ -237,7 +242,7 @@ int vpmu_do_interrupt(struct cpu_user_regs *regs)
             else if ( !is_control_domain(current->domain) &&
                       !is_idle_vcpu(current) )
             {
-                /* PV guest */
+                /* PV(H) guest */
                 gregs = guest_cpu_user_regs();
                 memcpy(&v->arch.vpmu.xenpmu_data->pmu.r.regs,
                        gregs, sizeof(struct cpu_user_regs));
@@ -247,7 +252,15 @@ int vpmu_do_interrupt(struct cpu_user_regs *regs)
                        regs, sizeof(struct cpu_user_regs));
 
             gregs = &v->arch.vpmu.xenpmu_data->pmu.r.regs;
-            gregs->cs = cs;
+            if ( !is_pvh_domain(current->domain) )
+                gregs->cs = cs;
+            else if ( !(vpmu_apic_vector & APIC_DM_NMI) )
+            {
+                struct segment_register seg_cs;
+
+                hvm_get_segment_register(current, x86_seg_cs, &seg_cs);
+                gregs->cs = seg_cs.attr.fields.dpl;
+            }
         }
         else
         {
@@ -271,7 +284,8 @@ int vpmu_do_interrupt(struct cpu_user_regs *regs)
         v->arch.vpmu.xenpmu_data->vcpu_id = current->vcpu_id;
         v->arch.vpmu.xenpmu_data->pcpu_id = smp_processor_id();
 
-        v->arch.vpmu.xenpmu_data->pmu_flags |= PMU_CACHED;
+        if ( !is_pvh_domain(current->domain) || (vpmu_mode & XENPMU_MODE_PRIV) )
+            v->arch.vpmu.xenpmu_data->pmu_flags |= PMU_CACHED;
         apic_write(APIC_LVTPC, vpmu->hw_lapic_lvtpc | APIC_LVT_MASKED);
         vpmu->hw_lapic_lvtpc |= APIC_LVT_MASKED;
 
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v4 17/17] x86/VPMU: Move VPMU files up from hvm/ directory
  2014-01-21 19:08 [PATCH v4 00/17] x86/PMU: Xen PMU PV support Boris Ostrovsky
                   ` (15 preceding siblings ...)
  2014-01-21 19:09 ` [PATCH v4 16/17] x86/VPMU: Support for PVH guests Boris Ostrovsky
@ 2014-01-21 19:09 ` Boris Ostrovsky
  16 siblings, 0 replies; 48+ messages in thread
From: Boris Ostrovsky @ 2014-01-21 19:09 UTC (permalink / raw)
  To: xen-devel
  Cc: keir, suravee.suthikulpanit, andrew.cooper3, eddie.dong,
	dietmar.hahn, JBeulich, jun.nakajima, boris.ostrovsky

Since the PMU is no longer HVM-specific, we can move the VPMU-related files up
from the arch/x86/hvm/ directory.

Specifically:
    arch/x86/hvm/vpmu.c -> arch/x86/vpmu.c
    arch/x86/hvm/svm/vpmu.c -> arch/x86/vpmu_amd.c
    arch/x86/hvm/vmx/vpmu_core2.c -> arch/x86/vpmu_intel.c
    include/asm-x86/hvm/vpmu.h -> include/asm-x86/vpmu.h

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
 xen/arch/x86/Makefile                 |   1 +
 xen/arch/x86/hvm/Makefile             |   1 -
 xen/arch/x86/hvm/svm/Makefile         |   1 -
 xen/arch/x86/hvm/svm/vpmu.c           | 499 ------------------
 xen/arch/x86/hvm/vlapic.c             |   2 +-
 xen/arch/x86/hvm/vmx/Makefile         |   1 -
 xen/arch/x86/hvm/vmx/vpmu_core2.c     | 936 ----------------------------------
 xen/arch/x86/hvm/vpmu.c               | 671 ------------------------
 xen/arch/x86/oprofile/op_model_ppro.c |   2 +-
 xen/arch/x86/traps.c                  |   2 +-
 xen/arch/x86/vpmu.c                   | 671 ++++++++++++++++++++++++
 xen/arch/x86/vpmu_amd.c               | 499 ++++++++++++++++++
 xen/arch/x86/vpmu_intel.c             | 936 ++++++++++++++++++++++++++++++++++
 xen/include/asm-x86/hvm/vmx/vmcs.h    |   2 +-
 xen/include/asm-x86/hvm/vpmu.h        |  98 ----
 xen/include/asm-x86/vpmu.h            |  98 ++++
 16 files changed, 2209 insertions(+), 2211 deletions(-)
 delete mode 100644 xen/arch/x86/hvm/svm/vpmu.c
 delete mode 100644 xen/arch/x86/hvm/vmx/vpmu_core2.c
 delete mode 100644 xen/arch/x86/hvm/vpmu.c
 create mode 100644 xen/arch/x86/vpmu.c
 create mode 100644 xen/arch/x86/vpmu_amd.c
 create mode 100644 xen/arch/x86/vpmu_intel.c
 delete mode 100644 xen/include/asm-x86/hvm/vpmu.h
 create mode 100644 xen/include/asm-x86/vpmu.h

diff --git a/xen/arch/x86/Makefile b/xen/arch/x86/Makefile
index d502bdf..cf85dda 100644
--- a/xen/arch/x86/Makefile
+++ b/xen/arch/x86/Makefile
@@ -58,6 +58,7 @@ obj-y += crash.o
 obj-y += tboot.o
 obj-y += hpet.o
 obj-y += xstate.o
+obj-y += vpmu.o vpmu_amd.o vpmu_intel.o
 
 obj-$(crash_debug) += gdbstub.o
 
diff --git a/xen/arch/x86/hvm/Makefile b/xen/arch/x86/hvm/Makefile
index eea5555..742b83b 100644
--- a/xen/arch/x86/hvm/Makefile
+++ b/xen/arch/x86/hvm/Makefile
@@ -22,4 +22,3 @@ obj-y += vlapic.o
 obj-y += vmsi.o
 obj-y += vpic.o
 obj-y += vpt.o
-obj-y += vpmu.o
\ No newline at end of file
diff --git a/xen/arch/x86/hvm/svm/Makefile b/xen/arch/x86/hvm/svm/Makefile
index a10a55e..760d295 100644
--- a/xen/arch/x86/hvm/svm/Makefile
+++ b/xen/arch/x86/hvm/svm/Makefile
@@ -6,4 +6,3 @@ obj-y += nestedsvm.o
 obj-y += svm.o
 obj-y += svmdebug.o
 obj-y += vmcb.o
-obj-y += vpmu.o
diff --git a/xen/arch/x86/hvm/svm/vpmu.c b/xen/arch/x86/hvm/svm/vpmu.c
deleted file mode 100644
index e2bff67..0000000
--- a/xen/arch/x86/hvm/svm/vpmu.c
+++ /dev/null
@@ -1,499 +0,0 @@
-/*
- * vpmu.c: PMU virtualization for HVM domain.
- *
- * Copyright (c) 2010, Advanced Micro Devices, Inc.
- * Parts of this code are Copyright (c) 2007, Intel Corporation
- *
- * Author: Wei Wang <wei.wang2@amd.com>
- * Tested by: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along with
- * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
- * Place - Suite 330, Boston, MA 02111-1307 USA.
- *
- */
-
-#include <xen/config.h>
-#include <xen/xenoprof.h>
-#include <xen/hvm/save.h>
-#include <xen/sched.h>
-#include <xen/irq.h>
-#include <asm/apic.h>
-#include <asm/hvm/vlapic.h>
-#include <asm/hvm/vpmu.h>
-#include <public/xenpmu.h>
-
-#define MSR_F10H_EVNTSEL_GO_SHIFT   40
-#define MSR_F10H_EVNTSEL_EN_SHIFT   22
-#define MSR_F10H_COUNTER_LENGTH     48
-
-#define is_guest_mode(msr) ((msr) & (1ULL << MSR_F10H_EVNTSEL_GO_SHIFT))
-#define is_pmu_enabled(msr) ((msr) & (1ULL << MSR_F10H_EVNTSEL_EN_SHIFT))
-#define set_guest_mode(msr) (msr |= (1ULL << MSR_F10H_EVNTSEL_GO_SHIFT))
-#define is_overflowed(msr) (!((msr) & (1ULL << (MSR_F10H_COUNTER_LENGTH-1))))
-
-static unsigned int __read_mostly num_counters;
-static const u32 __read_mostly *counters;
-static const u32 __read_mostly *ctrls;
-static bool_t __read_mostly k7_counters_mirrored;
-
-#define F10H_NUM_COUNTERS   4
-#define F15H_NUM_COUNTERS   6
-#define AMD_MAX_COUNTERS    6
-
-/* PMU Counter MSRs. */
-static const u32 AMD_F10H_COUNTERS[] = {
-    MSR_K7_PERFCTR0,
-    MSR_K7_PERFCTR1,
-    MSR_K7_PERFCTR2,
-    MSR_K7_PERFCTR3
-};
-
-/* PMU Control MSRs. */
-static const u32 AMD_F10H_CTRLS[] = {
-    MSR_K7_EVNTSEL0,
-    MSR_K7_EVNTSEL1,
-    MSR_K7_EVNTSEL2,
-    MSR_K7_EVNTSEL3
-};
-
-static const u32 AMD_F15H_COUNTERS[] = {
-    MSR_AMD_FAM15H_PERFCTR0,
-    MSR_AMD_FAM15H_PERFCTR1,
-    MSR_AMD_FAM15H_PERFCTR2,
-    MSR_AMD_FAM15H_PERFCTR3,
-    MSR_AMD_FAM15H_PERFCTR4,
-    MSR_AMD_FAM15H_PERFCTR5
-};
-
-static const u32 AMD_F15H_CTRLS[] = {
-    MSR_AMD_FAM15H_EVNTSEL0,
-    MSR_AMD_FAM15H_EVNTSEL1,
-    MSR_AMD_FAM15H_EVNTSEL2,
-    MSR_AMD_FAM15H_EVNTSEL3,
-    MSR_AMD_FAM15H_EVNTSEL4,
-    MSR_AMD_FAM15H_EVNTSEL5
-};
-
-static inline int get_pmu_reg_type(u32 addr)
-{
-    if ( (addr >= MSR_K7_EVNTSEL0) && (addr <= MSR_K7_EVNTSEL3) )
-        return MSR_TYPE_CTRL;
-
-    if ( (addr >= MSR_K7_PERFCTR0) && (addr <= MSR_K7_PERFCTR3) )
-        return MSR_TYPE_COUNTER;
-
-    if ( (addr >= MSR_AMD_FAM15H_EVNTSEL0) &&
-         (addr <= MSR_AMD_FAM15H_PERFCTR5 ) )
-    {
-        if (addr & 1)
-            return MSR_TYPE_COUNTER;
-        else
-            return MSR_TYPE_CTRL;
-    }
-
-    /* unsupported registers */
-    return -1;
-}
-
-static inline u32 get_fam15h_addr(u32 addr)
-{
-    switch ( addr )
-    {
-    case MSR_K7_PERFCTR0:
-        return MSR_AMD_FAM15H_PERFCTR0;
-    case MSR_K7_PERFCTR1:
-        return MSR_AMD_FAM15H_PERFCTR1;
-    case MSR_K7_PERFCTR2:
-        return MSR_AMD_FAM15H_PERFCTR2;
-    case MSR_K7_PERFCTR3:
-        return MSR_AMD_FAM15H_PERFCTR3;
-    case MSR_K7_EVNTSEL0:
-        return MSR_AMD_FAM15H_EVNTSEL0;
-    case MSR_K7_EVNTSEL1:
-        return MSR_AMD_FAM15H_EVNTSEL1;
-    case MSR_K7_EVNTSEL2:
-        return MSR_AMD_FAM15H_EVNTSEL2;
-    case MSR_K7_EVNTSEL3:
-        return MSR_AMD_FAM15H_EVNTSEL3;
-    default:
-        break;
-    }
-
-    return addr;
-}
-
-static void amd_vpmu_set_msr_bitmap(struct vcpu *v)
-{
-    unsigned int i;
-    struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    struct xen_pmu_amd_ctxt *ctxt = vpmu->context;
-
-    for ( i = 0; i < num_counters; i++ )
-    {
-        svm_intercept_msr(v, counters[i], MSR_INTERCEPT_NONE);
-        svm_intercept_msr(v, ctrls[i], MSR_INTERCEPT_WRITE);
-    }
-
-    ctxt->msr_bitmap_set = 1;
-}
-
-static void amd_vpmu_unset_msr_bitmap(struct vcpu *v)
-{
-    unsigned int i;
-    struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    struct xen_pmu_amd_ctxt *ctxt = vpmu->context;
-
-    for ( i = 0; i < num_counters; i++ )
-    {
-        svm_intercept_msr(v, counters[i], MSR_INTERCEPT_RW);
-        svm_intercept_msr(v, ctrls[i], MSR_INTERCEPT_RW);
-    }
-
-    ctxt->msr_bitmap_set = 0;
-}
-
-static int amd_vpmu_do_interrupt(struct cpu_user_regs *regs)
-{
-    return 1;
-}
-
-static inline void context_load(struct vcpu *v)
-{
-    unsigned int i;
-    struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    struct xen_pmu_amd_ctxt *ctxt = vpmu->context;
-    uint64_t *counter_regs = vpmu_reg_pointer(ctxt, counters);
-    uint64_t *ctrl_regs = vpmu_reg_pointer(ctxt, ctrls);
-
-    for ( i = 0; i < num_counters; i++ )
-    {
-        wrmsrl(counters[i], counter_regs[i]);
-        wrmsrl(ctrls[i], ctrl_regs[i]);
-    }
-}
-
-static void amd_vpmu_load(struct vcpu *v)
-{
-    struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    struct xen_pmu_amd_ctxt *ctxt = vpmu->context;
-
-    vpmu_reset(vpmu, VPMU_FROZEN);
-
-    if ( vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) )
-    {
-        unsigned int i;
-	uint64_t *ctrl_regs = vpmu_reg_pointer(ctxt, ctrls);
-
-        for ( i = 0; i < num_counters; i++ )
-            wrmsrl(ctrls[i], ctrl_regs[i]);
-
-        return;
-    }
-
-    context_load(v);
-}
-
-static inline void context_save(struct vcpu *v)
-{
-    unsigned int i;
-    struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    struct xen_pmu_amd_ctxt *ctxt = vpmu->context;
-    uint64_t *counter_regs = vpmu_reg_pointer(ctxt, counters);
-
-    /* No need to save controls -- they are saved in amd_vpmu_do_wrmsr */
-    for ( i = 0; i < num_counters; i++ )
-        rdmsrl(counters[i], counter_regs[i]);
-}
-
-static int amd_vpmu_save(struct vcpu *v)
-{
-    struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    struct xen_pmu_amd_ctxt *ctx = vpmu->context;
-    unsigned int i;
-
-    if ( !vpmu_is_set(vpmu, VPMU_FROZEN) )
-    {
-        for ( i = 0; i < num_counters; i++ )
-            wrmsrl(ctrls[i], 0);
-
-        vpmu_set(vpmu, VPMU_FROZEN);
-    }
-
-    if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_SAVE) )
-            return 0;
-
-    context_save(v);
-
-    if ( !is_pv_domain(v->domain) && 
-        !vpmu_is_set(vpmu, VPMU_RUNNING) && ctx->msr_bitmap_set )
-        amd_vpmu_unset_msr_bitmap(v);
-
-    return 1;
-}
-
-static void context_update(unsigned int msr, u64 msr_content)
-{
-    unsigned int i;
-    struct vcpu *v = current;
-    struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    struct xen_pmu_amd_ctxt *ctxt = vpmu->context;
-    uint64_t *counter_regs = vpmu_reg_pointer(ctxt, counters);
-    uint64_t *ctrl_regs = vpmu_reg_pointer(ctxt, ctrls);
-
-    if ( k7_counters_mirrored &&
-        ((msr >= MSR_K7_EVNTSEL0) && (msr <= MSR_K7_PERFCTR3)) )
-    {
-        msr = get_fam15h_addr(msr);
-    }
-
-    for ( i = 0; i < num_counters; i++ )
-    {
-        if ( msr == ctrls[i] )
-        {
-            ctrl_regs[i] = msr_content;
-            return;
-        }
-        else if ( msr == counters[i] )
-        {
-            counter_regs[i] = msr_content;
-            return;
-        }
-    }
-}
-
-static int amd_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
-{
-    struct vcpu *v = current;
-    struct vpmu_struct *vpmu = vcpu_vpmu(v);
-
-    /* For all counters, enable guest only mode for HVM guest */
-    if ( !is_pv_domain(v->domain) && (get_pmu_reg_type(msr) == MSR_TYPE_CTRL) &&
-        !(is_guest_mode(msr_content)) )
-    {
-        set_guest_mode(msr_content);
-    }
-
-    /* check if the first counter is enabled */
-    if ( (get_pmu_reg_type(msr) == MSR_TYPE_CTRL) &&
-        is_pmu_enabled(msr_content) && !vpmu_is_set(vpmu, VPMU_RUNNING) )
-    {
-        if ( !acquire_pmu_ownership(PMU_OWNER_HVM) )
-            return 1;
-        vpmu_set(vpmu, VPMU_RUNNING);
-
-        if ( !is_pv_domain(v->domain) &&
-             !((struct xen_pmu_amd_ctxt *)vpmu->context)->msr_bitmap_set )
-            amd_vpmu_set_msr_bitmap(v);
-    }
-
-    /* stop saving & restoring if the guest stops the first counter */
-    if ( (get_pmu_reg_type(msr) == MSR_TYPE_CTRL) &&
-        (is_pmu_enabled(msr_content) == 0) && vpmu_is_set(vpmu, VPMU_RUNNING) )
-    {
-        vpmu_reset(vpmu, VPMU_RUNNING);
-        if ( !is_pv_domain(v->domain) &&
-             ((struct xen_pmu_amd_ctxt *)vpmu->context)->msr_bitmap_set )
-            amd_vpmu_unset_msr_bitmap(v);
-        release_pmu_ownship(PMU_OWNER_HVM);
-    }
-
-    if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED)
-        || vpmu_is_set(vpmu, VPMU_FROZEN) )
-    {
-        context_load(v);
-        vpmu_set(vpmu, VPMU_CONTEXT_LOADED);
-        vpmu_reset(vpmu, VPMU_FROZEN);
-    }
-
-    /* Update vpmu context immediately */
-    context_update(msr, msr_content);
-
-    /* Write to hw counters */
-    wrmsrl(msr, msr_content);
-    return 1;
-}
-
-static int amd_vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
-{
-    struct vcpu *v = current;
-    struct vpmu_struct *vpmu = vcpu_vpmu(v);
-
-    if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED)
-        || vpmu_is_set(vpmu, VPMU_FROZEN) )
-    {
-        context_load(v);
-        vpmu_set(vpmu, VPMU_CONTEXT_LOADED);
-        vpmu_reset(vpmu, VPMU_FROZEN);
-    }
-
-    rdmsrl(msr, *msr_content);
-
-    return 1;
-}
-
-static int amd_vpmu_initialise(struct vcpu *v)
-{
-    struct xen_pmu_amd_ctxt *ctxt;
-    struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    uint8_t family = current_cpu_data.x86;
-
-    if ( vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
-        return 0;
-
-    if ( counters == NULL )
-    {
-        switch ( family )
-        {
-        case 0x15:
-            num_counters = F15H_NUM_COUNTERS;
-            counters = AMD_F15H_COUNTERS;
-            ctrls = AMD_F15H_CTRLS;
-            k7_counters_mirrored = 1;
-            break;
-        case 0x10:
-        case 0x12:
-        case 0x14:
-        case 0x16:
-        default:
-            num_counters = F10H_NUM_COUNTERS;
-            counters = AMD_F10H_COUNTERS;
-            ctrls = AMD_F10H_CTRLS;
-            k7_counters_mirrored = 0;
-            break;
-        }
-    }
-
-    if ( !is_pv_domain(v->domain) )
-    {
-        ctxt = xzalloc_bytes(sizeof(struct xen_pmu_amd_ctxt) + 
-                             sizeof(uint64_t) * AMD_MAX_COUNTERS + 
-                             sizeof(uint64_t) * AMD_MAX_COUNTERS);
-        if ( !ctxt )
-        {
-            gdprintk(XENLOG_WARNING, "Insufficient memory for PMU, "
-                     "PMU feature is unavailable on domain %d vcpu %d.\n",
-                     v->domain->domain_id, v->vcpu_id);
-            return -ENOMEM;
-        }
-    }
-    else
-        ctxt = &v->arch.vpmu.xenpmu_data->pmu.c.amd;
-
-    ctxt->counters = sizeof(struct xen_pmu_amd_ctxt);
-    ctxt->ctrls = ctxt->counters + sizeof(uint64_t) * AMD_MAX_COUNTERS;
-
-    vpmu->context = ctxt;
-    vpmu_set(vpmu, VPMU_CONTEXT_ALLOCATED);
-    return 0;
-}
-
-static void amd_vpmu_destroy(struct vcpu *v)
-{
-    struct vpmu_struct *vpmu = vcpu_vpmu(v);
-
-    if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
-        return;
-
-    if ( !is_pv_domain(v->domain) )
-    {
-        if ( ((struct xen_pmu_amd_ctxt *)vpmu->context)->msr_bitmap_set )
-            amd_vpmu_unset_msr_bitmap(v);
-
-        xfree(vpmu->context);
-        release_pmu_ownship(PMU_OWNER_HVM);
-    }
-
-    vpmu->context = NULL;
-    vpmu_clear(vpmu);
-}
-
-/* VPMU part of the 'q' keyhandler */
-static void amd_vpmu_dump(const struct vcpu *v)
-{
-    const struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    const struct xen_pmu_amd_ctxt *ctxt = vpmu->context;
-    uint64_t *counter_regs = vpmu_reg_pointer(ctxt, counters);
-    uint64_t *ctrl_regs = vpmu_reg_pointer(ctxt, ctrls);
-    unsigned int i;
-
-    printk("    VPMU state: 0x%x ", vpmu->flags);
-    if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
-    {
-         printk("\n");
-         return;
-    }
-
-    printk("(");
-    if ( vpmu_is_set(vpmu, VPMU_PASSIVE_DOMAIN_ALLOCATED) )
-        printk("PASSIVE_DOMAIN_ALLOCATED, ");
-    if ( vpmu_is_set(vpmu, VPMU_FROZEN) )
-        printk("FROZEN, ");
-    if ( vpmu_is_set(vpmu, VPMU_CONTEXT_SAVE) )
-        printk("SAVE, ");
-    if ( vpmu_is_set(vpmu, VPMU_RUNNING) )
-        printk("RUNNING, ");
-    if ( vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) )
-        printk("LOADED, ");
-    printk("ALLOCATED)\n");
-
-    for ( i = 0; i < num_counters; i++ )
-    {
-        uint64_t ctrl, cntr;
-
-        rdmsrl(ctrls[i], ctrl);
-        rdmsrl(counters[i], cntr);
-        printk("      %#x: %#lx (%#lx in HW)    %#x: %#lx (%#lx in HW)\n",
-               ctrls[i], ctrl_regs[i], ctrl,
-               counters[i], counter_regs[i], cntr);
-    }
-}
-
-struct arch_vpmu_ops amd_vpmu_ops = {
-    .do_wrmsr = amd_vpmu_do_wrmsr,
-    .do_rdmsr = amd_vpmu_do_rdmsr,
-    .do_interrupt = amd_vpmu_do_interrupt,
-    .arch_vpmu_destroy = amd_vpmu_destroy,
-    .arch_vpmu_save = amd_vpmu_save,
-    .arch_vpmu_load = amd_vpmu_load,
-    .arch_vpmu_dump = amd_vpmu_dump
-};
-
-int svm_vpmu_initialise(struct vcpu *v, unsigned int vpmu_flags)
-{
-    struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    uint8_t family = current_cpu_data.x86;
-    int ret = 0;
-
-    /* vpmu enabled? */
-    if ( vpmu_flags == XENPMU_MODE_OFF )
-        return 0;
-
-    switch ( family )
-    {
-    case 0x10:
-    case 0x12:
-    case 0x14:
-    case 0x15:
-    case 0x16:
-        ret = amd_vpmu_initialise(v);
-        if ( !ret )
-            vpmu->arch_vpmu_ops = &amd_vpmu_ops;
-        return ret;
-    }
-
-    printk("VPMU: Initialization failed. "
-           "AMD processor family %d has not "
-           "been supported\n", family);
-    return -EINVAL;
-}
-
diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
index d954f4f..d49ed3a 100644
--- a/xen/arch/x86/hvm/vlapic.c
+++ b/xen/arch/x86/hvm/vlapic.c
@@ -38,7 +38,7 @@
 #include <asm/hvm/support.h>
 #include <asm/hvm/vmx/vmx.h>
 #include <asm/hvm/nestedhvm.h>
-#include <asm/hvm/vpmu.h>
+#include <asm/vpmu.h>
 #include <public/hvm/ioreq.h>
 #include <public/hvm/params.h>
 
diff --git a/xen/arch/x86/hvm/vmx/Makefile b/xen/arch/x86/hvm/vmx/Makefile
index 373b3d9..04a29ce 100644
--- a/xen/arch/x86/hvm/vmx/Makefile
+++ b/xen/arch/x86/hvm/vmx/Makefile
@@ -3,5 +3,4 @@ obj-y += intr.o
 obj-y += realmode.o
 obj-y += vmcs.o
 obj-y += vmx.o
-obj-y += vpmu_core2.o
 obj-y += vvmx.o
diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c
deleted file mode 100644
index 5213c11..0000000
--- a/xen/arch/x86/hvm/vmx/vpmu_core2.c
+++ /dev/null
@@ -1,936 +0,0 @@
-/*
- * vpmu_core2.c: CORE 2 specific PMU virtualization for HVM domain.
- *
- * Copyright (c) 2007, Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along with
- * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
- * Place - Suite 330, Boston, MA 02111-1307 USA.
- *
- * Author: Haitao Shan <haitao.shan@intel.com>
- */
-
-#include <xen/config.h>
-#include <xen/sched.h>
-#include <xen/xenoprof.h>
-#include <xen/irq.h>
-#include <asm/system.h>
-#include <asm/regs.h>
-#include <asm/types.h>
-#include <asm/apic.h>
-#include <asm/traps.h>
-#include <asm/msr.h>
-#include <asm/msr-index.h>
-#include <asm/hvm/support.h>
-#include <asm/hvm/vlapic.h>
-#include <asm/hvm/vmx/vmx.h>
-#include <asm/hvm/vmx/vmcs.h>
-#include <public/sched.h>
-#include <public/hvm/save.h>
-#include <public/xenpmu.h>
-#include <asm/hvm/vpmu.h>
-
-/*
- * See Intel SDM Vol 2a Instruction Set Reference chapter 3 for CPUID
- * instruction.
- * cpuid 0xa - Architectural Performance Monitoring Leaf
- * Register eax
- */
-#define PMU_VERSION_SHIFT        0  /* Version ID */
-#define PMU_VERSION_BITS         8  /* 8 bits 0..7 */
-#define PMU_VERSION_MASK         (((1 << PMU_VERSION_BITS) - 1) << PMU_VERSION_SHIFT)
-
-#define PMU_GENERAL_NR_SHIFT     8  /* Number of general pmu registers */
-#define PMU_GENERAL_NR_BITS      8  /* 8 bits 8..15 */
-#define PMU_GENERAL_NR_MASK      (((1 << PMU_GENERAL_NR_BITS) - 1) << PMU_GENERAL_NR_SHIFT)
-
-#define PMU_GENERAL_WIDTH_SHIFT 16  /* Width of general pmu registers */
-#define PMU_GENERAL_WIDTH_BITS   8  /* 8 bits 16..23 */
-#define PMU_GENERAL_WIDTH_MASK  (((1 << PMU_GENERAL_WIDTH_BITS) - 1) << PMU_GENERAL_WIDTH_SHIFT)
-/* Register edx */
-#define PMU_FIXED_NR_SHIFT       0  /* Number of fixed pmu registers */
-#define PMU_FIXED_NR_BITS        5  /* 5 bits 0..4 */
-#define PMU_FIXED_NR_MASK        (((1 << PMU_FIXED_NR_BITS) -1) << PMU_FIXED_NR_SHIFT)
-
-#define PMU_FIXED_WIDTH_SHIFT    5  /* Width of fixed pmu registers */
-#define PMU_FIXED_WIDTH_BITS     8  /* 8 bits 5..12 */
-#define PMU_FIXED_WIDTH_MASK     (((1 << PMU_FIXED_WIDTH_BITS) -1) << PMU_FIXED_WIDTH_SHIFT)
-
-/* Alias registers (0x4c1) for full-width writes to PMCs */
-#define MSR_PMC_ALIAS_MASK       (~(MSR_IA32_PERFCTR0 ^ MSR_IA32_A_PERFCTR0))
-static bool_t __read_mostly full_width_write;
-
-/* Intel-specific VPMU features */
-#define VPMU_CPU_HAS_DS                     0x100 /* Has Debug Store */
-#define VPMU_CPU_HAS_BTS                    0x200 /* Has Branch Trace Store */
-
-/*
- * MSR_CORE_PERF_FIXED_CTR_CTRL contains the configuration of all fixed
- * counters. 4 bits for every counter.
- */
-#define FIXED_CTR_CTRL_BITS 4
-#define FIXED_CTR_CTRL_MASK ((1 << FIXED_CTR_CTRL_BITS) - 1)
-
-/* Number of general-purpose and fixed performance counters */
-static unsigned int __read_mostly arch_pmc_cnt, fixed_pmc_cnt;
-
-/*
- * QUIRK to work around an issue on various family 6 CPUs.
- * The issue leads to endless PMC interrupt loops on the processor.
- * If the interrupt handler is running and a PMC reaches the value 0, that
- * value persists and a new interrupt is triggered immediately after the
- * handler finishes.
- * The workaround is to read all flagged counters and, if a counter reads 0,
- * write 1 (or any other non-zero value) into it.
- * No erratum has been published and the real cause of this behaviour is
- * unknown.
- */
-bool_t __read_mostly is_pmc_quirk;
-
-static void check_pmc_quirk(void)
-{
-    if ( current_cpu_data.x86 == 6 )
-        is_pmc_quirk = 1;
-    else
-        is_pmc_quirk = 0;    
-}
-
-static void handle_pmc_quirk(u64 msr_content)
-{
-    int i;
-    u64 val;
-
-    if ( !is_pmc_quirk )
-        return;
-
-    val = msr_content;
-    for ( i = 0; i < arch_pmc_cnt; i++ )
-    {
-        if ( val & 0x1 )
-        {
-            u64 cnt;
-            rdmsrl(MSR_P6_PERFCTR0 + i, cnt);
-            if ( cnt == 0 )
-                wrmsrl(MSR_P6_PERFCTR0 + i, 1);
-        }
-        val >>= 1;
-    }
-    val = msr_content >> 32;
-    for ( i = 0; i < fixed_pmc_cnt; i++ )
-    {
-        if ( val & 0x1 )
-        {
-            u64 cnt;
-            rdmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, cnt);
-            if ( cnt == 0 )
-                wrmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, 1);
-        }
-        val >>= 1;
-    }
-}
-
-/*
- * Read the number of general counters via CPUID.EAX[0xa].EAX[8..15]
- */
-static int core2_get_arch_pmc_count(void)
-{
-    u32 eax;
-
-    eax = cpuid_eax(0xa);
-    return ( (eax & PMU_GENERAL_NR_MASK) >> PMU_GENERAL_NR_SHIFT );
-}
-
-/*
- * Read the number of fixed counters via CPUID.EDX[0xa].EDX[0..4]
- */
-static int core2_get_fixed_pmc_count(void)
-{
-    u32 edx;
-
-    edx = cpuid_edx(0xa);
-    return ( (edx & PMU_FIXED_NR_MASK) >> PMU_FIXED_NR_SHIFT );
-}
-
-/* edx bits 5-12: Bit width of fixed-function performance counters  */
-static int core2_get_bitwidth_fix_count(void)
-{
-    u32 edx;
-
-    edx = cpuid_edx(0xa);
-    return ( (edx & PMU_FIXED_WIDTH_MASK) >> PMU_FIXED_WIDTH_SHIFT );
-}
-
-static int is_core2_vpmu_msr(u32 msr_index, int *type, int *index)
-{
-    int i;
-    u32 msr_index_pmc;
-
-    for ( i = 0; i < fixed_pmc_cnt; i++ )
-    {
-        if ( msr_index == MSR_CORE_PERF_FIXED_CTR0 + i )
-        {
-            *type = MSR_TYPE_COUNTER;
-            *index = i;
-            return 1;
-        }
-    }
-
-    if ( (msr_index == MSR_CORE_PERF_FIXED_CTR_CTRL ) ||
-        (msr_index == MSR_IA32_DS_AREA) ||
-        (msr_index == MSR_IA32_PEBS_ENABLE) )
-    {
-        *type = MSR_TYPE_CTRL;
-        return 1;
-    }
-
-    if ( (msr_index == MSR_CORE_PERF_GLOBAL_CTRL) ||
-         (msr_index == MSR_CORE_PERF_GLOBAL_STATUS) ||
-         (msr_index == MSR_CORE_PERF_GLOBAL_OVF_CTRL) )
-    {
-        *type = MSR_TYPE_GLOBAL;
-        return 1;
-    }
-
-    msr_index_pmc = msr_index & MSR_PMC_ALIAS_MASK;
-    if ( (msr_index_pmc >= MSR_IA32_PERFCTR0) &&
-         (msr_index_pmc < (MSR_IA32_PERFCTR0 + arch_pmc_cnt)) )
-    {
-        *type = MSR_TYPE_ARCH_COUNTER;
-        *index = msr_index_pmc - MSR_IA32_PERFCTR0;
-        return 1;
-    }
-
-    if ( (msr_index >= MSR_P6_EVNTSEL0) &&
-         (msr_index < (MSR_P6_EVNTSEL0 + arch_pmc_cnt)) )
-    {
-        *type = MSR_TYPE_ARCH_CTRL;
-        *index = msr_index - MSR_P6_EVNTSEL0;
-        return 1;
-    }
-
-    return 0;
-}
-
-#define msraddr_to_bitpos(x) (((x)&0xffff) + ((x)>>31)*0x2000)
-static void core2_vpmu_set_msr_bitmap(unsigned long *msr_bitmap)
-{
-    int i;
-
-    /* Allow Read/Write PMU Counters MSR Directly. */
-    for ( i = 0; i < fixed_pmc_cnt; i++ )
-    {
-        clear_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR0 + i), msr_bitmap);
-        clear_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR0 + i),
-                  msr_bitmap + 0x800/BYTES_PER_LONG);
-    }
-    for ( i = 0; i < arch_pmc_cnt; i++ )
-    {
-        clear_bit(msraddr_to_bitpos(MSR_IA32_PERFCTR0+i), msr_bitmap);
-        clear_bit(msraddr_to_bitpos(MSR_IA32_PERFCTR0+i),
-                  msr_bitmap + 0x800/BYTES_PER_LONG);
-
-        if ( full_width_write )
-        {
-            clear_bit(msraddr_to_bitpos(MSR_IA32_A_PERFCTR0 + i), msr_bitmap);
-            clear_bit(msraddr_to_bitpos(MSR_IA32_A_PERFCTR0 + i),
-                      msr_bitmap + 0x800/BYTES_PER_LONG);
-        }
-    }
-
-    /* Allow Read PMU Non-global Controls Directly. */
-    for ( i = 0; i < arch_pmc_cnt; i++ )
-         clear_bit(msraddr_to_bitpos(MSR_P6_EVNTSEL0 + i), msr_bitmap);
-
-    clear_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR_CTRL), msr_bitmap);
-    clear_bit(msraddr_to_bitpos(MSR_IA32_PEBS_ENABLE), msr_bitmap);
-    clear_bit(msraddr_to_bitpos(MSR_IA32_DS_AREA), msr_bitmap);
-}
-
-static void core2_vpmu_unset_msr_bitmap(unsigned long *msr_bitmap)
-{
-    int i;
-
-    for ( i = 0; i < fixed_pmc_cnt; i++ )
-    {
-        set_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR0 + i), msr_bitmap);
-        set_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR0 + i),
-                msr_bitmap + 0x800/BYTES_PER_LONG);
-    }
-    for ( i = 0; i < arch_pmc_cnt; i++ )
-    {
-        set_bit(msraddr_to_bitpos(MSR_IA32_PERFCTR0 + i), msr_bitmap);
-        set_bit(msraddr_to_bitpos(MSR_IA32_PERFCTR0 + i),
-                msr_bitmap + 0x800/BYTES_PER_LONG);
-
-        if ( full_width_write )
-        {
-            set_bit(msraddr_to_bitpos(MSR_IA32_A_PERFCTR0 + i), msr_bitmap);
-            set_bit(msraddr_to_bitpos(MSR_IA32_A_PERFCTR0 + i),
-                      msr_bitmap + 0x800/BYTES_PER_LONG);
-        }
-    }
-
-    for ( i = 0; i < arch_pmc_cnt; i++ )
-        set_bit(msraddr_to_bitpos(MSR_P6_EVNTSEL0 + i), msr_bitmap);
-
-    set_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR_CTRL), msr_bitmap);
-    set_bit(msraddr_to_bitpos(MSR_IA32_PEBS_ENABLE), msr_bitmap);
-    set_bit(msraddr_to_bitpos(MSR_IA32_DS_AREA), msr_bitmap);
-}
-
-static inline void __core2_vpmu_save(struct vcpu *v)
-{
-    int i;
-    struct xen_pmu_intel_ctxt *core2_vpmu_cxt = vcpu_vpmu(v)->context;
-    uint64_t *fixed_counters = vpmu_reg_pointer(core2_vpmu_cxt, fixed_counters);
-    struct xen_pmu_cntr_pair *xen_pmu_cntr_pair =
-        vpmu_reg_pointer(core2_vpmu_cxt, arch_counters);
-
-    for ( i = 0; i < fixed_pmc_cnt; i++ )
-        rdmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, fixed_counters[i]);
-    for ( i = 0; i < arch_pmc_cnt; i++ )
-        rdmsrl(MSR_IA32_PERFCTR0 + i, xen_pmu_cntr_pair[i].counter);
-
-    if ( is_pv_domain(v->domain) )
-        rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, core2_vpmu_cxt->global_status);
-}
-
-static int core2_vpmu_save(struct vcpu *v)
-{
-    struct vpmu_struct *vpmu = vcpu_vpmu(v);
-
-    if ( !vpmu_is_set_all(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED) )
-        return 0;
-
-    if ( is_pv_domain(v->domain) )
-        wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0);
-
-    __core2_vpmu_save(v);
-
-    /* Unset PMU MSR bitmap to trap lazy load. */
-    if ( !vpmu_is_set(vpmu, VPMU_RUNNING) && cpu_has_vmx_msr_bitmap
-        && !is_pv_domain(v->domain) )
-        core2_vpmu_unset_msr_bitmap(v->arch.hvm_vmx.msr_bitmap);
-
-    return 1;
-}
-
-static inline void __core2_vpmu_load(struct vcpu *v)
-{
-    unsigned int i, pmc_start;
-    struct xen_pmu_intel_ctxt *core2_vpmu_cxt = vcpu_vpmu(v)->context;
-    uint64_t *fixed_counters = vpmu_reg_pointer(core2_vpmu_cxt, fixed_counters);
-    struct xen_pmu_cntr_pair *xen_pmu_cntr_pair =
-        vpmu_reg_pointer(core2_vpmu_cxt, arch_counters);
-
-    for ( i = 0; i < fixed_pmc_cnt; i++ )
-        wrmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, fixed_counters[i]);
-
-    if ( full_width_write )
-        pmc_start = MSR_IA32_A_PERFCTR0;
-    else
-        pmc_start = MSR_IA32_PERFCTR0;
-    for ( i = 0; i < arch_pmc_cnt; i++ )
-    {
-        wrmsrl(pmc_start + i, xen_pmu_cntr_pair[i].counter);
-        wrmsrl(MSR_P6_EVNTSEL0 + i, xen_pmu_cntr_pair[i].control);
-    }
-
-    wrmsrl(MSR_CORE_PERF_FIXED_CTR_CTRL, core2_vpmu_cxt->fixed_ctrl);
-    wrmsrl(MSR_IA32_DS_AREA, core2_vpmu_cxt->ds_area);
-    wrmsrl(MSR_IA32_PEBS_ENABLE, core2_vpmu_cxt->pebs_enable);
-
-    if ( is_pv_domain(v->domain) )
-    {
-        wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, core2_vpmu_cxt->global_ovf_ctrl);
-        core2_vpmu_cxt->global_ovf_ctrl = 0;
-        wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, core2_vpmu_cxt->global_ctrl);
-    }
-}
-
-static void core2_vpmu_load(struct vcpu *v)
-{
-    struct vpmu_struct *vpmu = vcpu_vpmu(v);
-
-    if ( vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) )
-        return;
-
-    __core2_vpmu_load(v);
-}
-
-static int core2_vpmu_alloc_resource(struct vcpu *v)
-{
-    struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    struct xen_pmu_intel_ctxt *core2_vpmu_cxt;
-
-    if ( !is_pv_domain(v->domain) )
-    {
-        if ( !acquire_pmu_ownership(PMU_OWNER_HVM) )
-            return 0;
-
-        wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0);
-        if ( vmx_add_host_load_msr(MSR_CORE_PERF_GLOBAL_CTRL) )
-            goto out_err;
-
-        if ( vmx_add_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL) )
-            goto out_err;
-        vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, 0);
-
-        core2_vpmu_cxt = xzalloc_bytes(sizeof(struct xen_pmu_intel_ctxt) +
-                                       sizeof(uint64_t) * fixed_pmc_cnt +
-                                       sizeof(struct xen_pmu_cntr_pair) * arch_pmc_cnt);
-        if ( !core2_vpmu_cxt )
-            goto out_err;
-    }
-    else
-    {
-        core2_vpmu_cxt = &v->arch.vpmu.xenpmu_data->pmu.c.intel;
-        vpmu_set(vpmu, VPMU_CONTEXT_ALLOCATED);
-    }
-
-    core2_vpmu_cxt->fixed_counters = sizeof(struct xen_pmu_intel_ctxt);
-    core2_vpmu_cxt->arch_counters = core2_vpmu_cxt->fixed_counters +
-      sizeof(uint64_t) * fixed_pmc_cnt;
-
-    vpmu->context = (void *)core2_vpmu_cxt;
-
-    vpmu_set(vpmu, VPMU_CONTEXT_ALLOCATED);
-
-    return 1;
-
-out_err:
-    vmx_rm_host_load_msr(MSR_CORE_PERF_GLOBAL_CTRL);
-    vmx_rm_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL);
-    release_pmu_ownship(PMU_OWNER_HVM);
-
-    printk("Failed to allocate VPMU resources for domain %u vcpu %u\n",
-           v->vcpu_id, v->domain->domain_id);
-
-    return 0;
-}
-
-static int core2_vpmu_msr_common_check(u32 msr_index, int *type, int *index)
-{
-    struct vpmu_struct *vpmu = vcpu_vpmu(current);
-
-    if ( !is_core2_vpmu_msr(msr_index, type, index) )
-        return 0;
-
-    if ( unlikely(!vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED)) &&
-         !core2_vpmu_alloc_resource(current) )
-        return 0;
-
-    /* Do the lazy load work. */
-    if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) )
-    {
-        __core2_vpmu_load(current);
-        vpmu_set(vpmu, VPMU_CONTEXT_LOADED);
-        if ( cpu_has_vmx_msr_bitmap && !is_pv_domain(current->domain) )
-            core2_vpmu_set_msr_bitmap(current->arch.hvm_vmx.msr_bitmap);
-    }
-    return 1;
-}
-
-static void inject_trap(struct vcpu *v, unsigned int trapno)
-{
-    if ( !is_pv_domain(v->domain) )
-        hvm_inject_hw_exception(trapno, 0);
-    else
-        send_guest_trap(v->domain, v->vcpu_id, trapno);
-}
-
-static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
-{
-    u64 global_ctrl, non_global_ctrl;
-    unsigned pmu_enable = 0;
-    int i, tmp;
-    int type = -1, index = -1;
-    struct vcpu *v = current;
-    struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    struct xen_pmu_intel_ctxt *core2_vpmu_cxt = NULL;
-
-    if ( !core2_vpmu_msr_common_check(msr, &type, &index) )
-    {
-        /* Special handling for BTS */
-        if ( msr == MSR_IA32_DEBUGCTLMSR )
-        {
-            uint64_t supported = IA32_DEBUGCTLMSR_TR | IA32_DEBUGCTLMSR_BTS |
-                                 IA32_DEBUGCTLMSR_BTINT;
-
-            if ( cpu_has(&current_cpu_data, X86_FEATURE_DSCPL) )
-                supported |= IA32_DEBUGCTLMSR_BTS_OFF_OS |
-                             IA32_DEBUGCTLMSR_BTS_OFF_USR;
-            if ( msr_content & supported )
-            {
-                if ( vpmu_is_set(vpmu, VPMU_CPU_HAS_BTS) )
-                    return 1;
-                gdprintk(XENLOG_WARNING, "Debug Store is not supported on this cpu\n");
-                inject_trap(v, TRAP_gp_fault);
-                return 0;
-            }
-        }
-        return 0;
-    }
-
-    core2_vpmu_cxt = vpmu->context;
-    switch ( msr )
-    {
-    case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
-        core2_vpmu_cxt->global_status &= ~msr_content;
-        wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, msr_content);
-        return 1;
-    case MSR_CORE_PERF_GLOBAL_STATUS:
-        gdprintk(XENLOG_INFO, "Cannot write to read-only MSR "
-                 "MSR_PERF_GLOBAL_STATUS(0x38E)!\n");
-        inject_trap(v, TRAP_gp_fault);
-        return 1;
-    case MSR_IA32_PEBS_ENABLE:
-        if ( msr_content & 1 )
-            gdprintk(XENLOG_WARNING, "Guest is trying to enable PEBS, "
-                     "which is not supported.\n");
-        core2_vpmu_cxt->pebs_enable = msr_content;
-        return 1;
-    case MSR_IA32_DS_AREA:
-        if ( vpmu_is_set(vpmu, VPMU_CPU_HAS_DS) )
-        {
-            if ( !is_canonical_address(msr_content) )
-            {
-                gdprintk(XENLOG_WARNING,
-                         "Illegal address for IA32_DS_AREA: %#" PRIx64 "\n",
-                         msr_content);
-                inject_trap(v, TRAP_gp_fault);
-                return 1;
-            }
-            core2_vpmu_cxt->ds_area = msr_content;
-            break;
-        }
-        gdprintk(XENLOG_WARNING, "Guest setting of DTS is ignored.\n");
-        return 1;
-    case MSR_CORE_PERF_GLOBAL_CTRL:
-        global_ctrl = msr_content;
-        for ( i = 0; i < arch_pmc_cnt; i++ )
-        {
-            rdmsrl(MSR_P6_EVNTSEL0+i, non_global_ctrl);
-            pmu_enable += global_ctrl & (non_global_ctrl >> 22) & 1;
-            global_ctrl >>= 1;
-        }
-
-        rdmsrl(MSR_CORE_PERF_FIXED_CTR_CTRL, non_global_ctrl);
-        global_ctrl = msr_content >> 32;
-        for ( i = 0; i < fixed_pmc_cnt; i++ )
-        {
-            pmu_enable += (global_ctrl & 1) & ((non_global_ctrl & 0x3)? 1 : 0);
-            non_global_ctrl >>= FIXED_CTR_CTRL_BITS;
-            global_ctrl >>= 1;
-        }
-        core2_vpmu_cxt->global_ctrl = msr_content;
-        break;
-    case MSR_CORE_PERF_FIXED_CTR_CTRL:
-        non_global_ctrl = msr_content;
-        if ( !is_pv_domain(v->domain) )
-            vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, &global_ctrl);
-        else
-            rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, global_ctrl);
-        global_ctrl >>= 32;
-        for ( i = 0; i < fixed_pmc_cnt; i++ )
-        {
-            pmu_enable += (global_ctrl & 1) & ((non_global_ctrl & 0x3)? 1 : 0);
-            non_global_ctrl >>= 4;
-            global_ctrl >>= 1;
-        }
-        core2_vpmu_cxt->fixed_ctrl = msr_content;
-        break;
-    default:
-        tmp = msr - MSR_P6_EVNTSEL0;
-        if ( tmp >= 0 && tmp < arch_pmc_cnt )
-        {
-            struct xen_pmu_cntr_pair *xen_pmu_cntr_pair =
-                vpmu_reg_pointer(core2_vpmu_cxt, arch_counters);
-
-            if ( !is_pv_domain(v->domain) )
-                vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, &global_ctrl);
-            else
-                rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, global_ctrl);
-            xen_pmu_cntr_pair[tmp].control = msr_content;
-            for ( i = 0; i < arch_pmc_cnt && !pmu_enable; i++ )
-                pmu_enable += (global_ctrl >> i) &
-                    (xen_pmu_cntr_pair[i].control >> 22) & 1;
-        }
-    }
-
-    pmu_enable += (core2_vpmu_cxt->ds_area != 0);
-    if ( pmu_enable )
-        vpmu_set(vpmu, VPMU_RUNNING);
-    else
-        vpmu_reset(vpmu, VPMU_RUNNING);
-
-    if ( type != MSR_TYPE_GLOBAL )
-    {
-        u64 mask;
-        int inject_gp = 0;
-        switch ( type )
-        {
-        case MSR_TYPE_ARCH_CTRL:      /* MSR_P6_EVNTSEL[0,...] */
-            mask = ~((1ull << 32) - 1);
-            if (msr_content & mask)
-                inject_gp = 1;
-            break;
-        case MSR_TYPE_CTRL:           /* IA32_FIXED_CTR_CTRL */
-            if  ( msr == MSR_IA32_DS_AREA )
-                break;
-            /* 4 bits per counter, currently 3 fixed counters implemented. */
-            mask = ~((1ull << (fixed_pmc_cnt * FIXED_CTR_CTRL_BITS)) - 1);
-            if (msr_content & mask)
-                inject_gp = 1;
-            break;
-        case MSR_TYPE_COUNTER:        /* IA32_FIXED_CTR[0-2] */
-            mask = ~((1ull << core2_get_bitwidth_fix_count()) - 1);
-            if (msr_content & mask)
-                inject_gp = 1;
-            break;
-        }
-
-        if (inject_gp) 
-            inject_trap(v, TRAP_gp_fault);
-        else
-            wrmsrl(msr, msr_content);
-    }
-    else
-    {
-       if ( !is_pv_domain(v->domain) )
-           vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, msr_content);
-       else
-           wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, msr_content);
-    }
-
-    return 1;
-}
-
-static int core2_vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
-{
-    int type = -1, index = -1;
-    struct vcpu *v = current;
-    struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    struct xen_pmu_intel_ctxt *core2_vpmu_cxt = NULL;
-
-    if ( core2_vpmu_msr_common_check(msr, &type, &index) )
-    {
-        core2_vpmu_cxt = vpmu->context;
-        switch ( msr )
-        {
-        case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
-            *msr_content = 0;
-            break;
-        case MSR_CORE_PERF_GLOBAL_STATUS:
-            *msr_content = core2_vpmu_cxt->global_status;
-            break;
-        case MSR_CORE_PERF_GLOBAL_CTRL:
-            if ( !is_pv_domain(v->domain) )
-                vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, msr_content);
-            else
-                rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, *msr_content);
-            break;
-        default:
-            rdmsrl(msr, *msr_content);
-        }
-    }
-    else
-    {
-        /* Extension for BTS */
-        if ( msr == MSR_IA32_MISC_ENABLE )
-        {
-            if ( vpmu_is_set(vpmu, VPMU_CPU_HAS_BTS) )
-                *msr_content &= ~MSR_IA32_MISC_ENABLE_BTS_UNAVAIL;
-        }
-        else
-            return 0;
-    }
-
-    return 1;
-}
-
-static void core2_vpmu_do_cpuid(unsigned int input,
-                                unsigned int *eax, unsigned int *ebx,
-                                unsigned int *ecx, unsigned int *edx)
-{
-    if (input == 0x1)
-    {
-        struct vpmu_struct *vpmu = vcpu_vpmu(current);
-
-        if ( vpmu_is_set(vpmu, VPMU_CPU_HAS_DS) )
-        {
-            /* Switch on the 'Debug Store' feature in CPUID.EAX[1]:EDX[21] */
-            *edx |= cpufeat_mask(X86_FEATURE_DS);
-            if ( cpu_has(&current_cpu_data, X86_FEATURE_DTES64) )
-                *ecx |= cpufeat_mask(X86_FEATURE_DTES64);
-            if ( cpu_has(&current_cpu_data, X86_FEATURE_DSCPL) )
-                *ecx |= cpufeat_mask(X86_FEATURE_DSCPL);
-        }
-    }
-}
-
-/* Dump vpmu info on console, called in the context of keyhandler 'q'. */
-static void core2_vpmu_dump(const struct vcpu *v)
-{
-    const struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    int i;
-    const struct xen_pmu_intel_ctxt *core2_vpmu_cxt = NULL;
-    u64 val;
-    uint64_t *fixed_counters = vpmu_reg_pointer(core2_vpmu_cxt, fixed_counters);
-    struct xen_pmu_cntr_pair *xen_pmu_cntr_pair =
-        vpmu_reg_pointer(core2_vpmu_cxt, arch_counters);
-
-    if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
-         return;
-
-    if ( !vpmu_is_set(vpmu, VPMU_RUNNING) )
-    {
-        if ( vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) )
-            printk("    vPMU loaded\n");
-        else
-            printk("    vPMU allocated\n");
-        return;
-    }
-
-    printk("    vPMU running\n");
-    core2_vpmu_cxt = vpmu->context;
-
-    /* Print the contents of the counter and its configuration msr. */
-    for ( i = 0; i < arch_pmc_cnt; i++ )
-        printk("      general_%d: 0x%016lx ctrl: 0x%016lx\n",
-            i, xen_pmu_cntr_pair[i].counter, xen_pmu_cntr_pair[i].control);
-
-    /*
-     * The configuration of the fixed counter is 4 bits each in the
-     * MSR_CORE_PERF_FIXED_CTR_CTRL.
-     */
-    val = core2_vpmu_cxt->fixed_ctrl;
-    for ( i = 0; i < fixed_pmc_cnt; i++ )
-    {
-        printk("      fixed_%d:   0x%016lx ctrl: %#lx\n",
-               i, fixed_counters[i],
-               val & FIXED_CTR_CTRL_MASK);
-        val >>= FIXED_CTR_CTRL_BITS;
-    }
-}
-
-static int core2_vpmu_do_interrupt(struct cpu_user_regs *regs)
-{
-    struct vcpu *v = current;
-    u64 msr_content;
-    struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    struct xen_pmu_intel_ctxt *core2_vpmu_cxt = vpmu->context;
-
-    rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, msr_content);
-    if ( msr_content )
-    {
-        if ( is_pmc_quirk )
-            handle_pmc_quirk(msr_content);
-        core2_vpmu_cxt->global_status |= msr_content;
-        msr_content = 0xC000000700000000 | ((1 << arch_pmc_cnt) - 1);
-        wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, msr_content);
-    }
-    else
-    {
-        /* No PMC overflow but perhaps a Trace Message interrupt. */
-        __vmread(GUEST_IA32_DEBUGCTL, &msr_content);
-        if ( !(msr_content & IA32_DEBUGCTLMSR_TR) )
-            return 0;
-    }
-
-    return 1;
-}
-
-static int core2_vpmu_initialise(struct vcpu *v, unsigned int vpmu_flags)
-{
-    struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    u64 msr_content;
-    struct cpuinfo_x86 *c = &current_cpu_data;
-
-    if ( !(vpmu_flags & (XENPMU_FEATURE_INTEL_BTS << XENPMU_FEATURE_SHIFT)) )
-        goto func_out;
-    /* Check the 'Debug Store' feature in the CPUID.EAX[1]:EDX[21] */
-    if ( cpu_has(c, X86_FEATURE_DS) )
-    {
-        if ( !cpu_has(c, X86_FEATURE_DTES64) )
-        {
-            printk(XENLOG_G_WARNING "CPU doesn't support 64-bit DS Area"
-                   " - Debug Store disabled for d%d:v%d\n",
-                   v->domain->domain_id, v->vcpu_id);
-            goto func_out;
-        }
-        vpmu_set(vpmu, VPMU_CPU_HAS_DS);
-        rdmsrl(MSR_IA32_MISC_ENABLE, msr_content);
-        if ( msr_content & MSR_IA32_MISC_ENABLE_BTS_UNAVAIL )
-        {
-            /* If BTS_UNAVAIL is set reset the DS feature. */
-            vpmu_reset(vpmu, VPMU_CPU_HAS_DS);
-            printk(XENLOG_G_WARNING "CPU has set BTS_UNAVAIL"
-                   " - Debug Store disabled for d%d:v%d\n",
-                   v->domain->domain_id, v->vcpu_id);
-        }
-        else
-        {
-            vpmu_set(vpmu, VPMU_CPU_HAS_BTS);
-            if ( !cpu_has(c, X86_FEATURE_DSCPL) )
-                printk(XENLOG_G_INFO
-                       "vpmu: CPU doesn't support CPL-Qualified BTS\n");
-            printk("******************************************************\n");
-            printk("** WARNING: Emulation of BTS Feature is switched on **\n");
-            printk("** Using this processor feature in a virtualized    **\n");
-            printk("** environment is not 100%% safe.                    **\n");
-            printk("** Setting the DS buffer address with wrong values  **\n");
-            printk("** may lead to hypervisor hangs or crashes.         **\n");
-            printk("** It is NOT recommended for production use!        **\n");
-            printk("******************************************************\n");
-        }
-    }
-func_out:
-
-    arch_pmc_cnt = core2_get_arch_pmc_count();
-    fixed_pmc_cnt = core2_get_fixed_pmc_count();
-    check_pmc_quirk();
-
-    /* PV domains can allocate resources immediately */
-    if ( is_pv_domain(v->domain) && !core2_vpmu_alloc_resource(v) )
-            return 1;
-
-    return 0;
-}
-
-static void core2_vpmu_destroy(struct vcpu *v)
-{
-    struct vpmu_struct *vpmu = vcpu_vpmu(v);
-
-    if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
-        return;
-
-    if ( !is_pv_domain(v->domain) )
-    {
-        xfree(vpmu->context);
-        if ( cpu_has_vmx_msr_bitmap )
-            core2_vpmu_unset_msr_bitmap(v->arch.hvm_vmx.msr_bitmap);
-    }
-
-    release_pmu_ownship(PMU_OWNER_HVM);
-    vpmu_clear(vpmu);
-}
-
-struct arch_vpmu_ops core2_vpmu_ops = {
-    .do_wrmsr = core2_vpmu_do_wrmsr,
-    .do_rdmsr = core2_vpmu_do_rdmsr,
-    .do_interrupt = core2_vpmu_do_interrupt,
-    .do_cpuid = core2_vpmu_do_cpuid,
-    .arch_vpmu_destroy = core2_vpmu_destroy,
-    .arch_vpmu_save = core2_vpmu_save,
-    .arch_vpmu_load = core2_vpmu_load,
-    .arch_vpmu_dump = core2_vpmu_dump
-};
-
-static void core2_no_vpmu_do_cpuid(unsigned int input,
-                                unsigned int *eax, unsigned int *ebx,
-                                unsigned int *ecx, unsigned int *edx)
-{
-    /*
-     * As the vpmu is not enabled in this case, reset the relevant bits in
-     * the architectural performance monitoring leaf.
-     */
-    if ( input == 0xa )
-    {
-        *eax &= ~PMU_VERSION_MASK;
-        *eax &= ~PMU_GENERAL_NR_MASK;
-        *eax &= ~PMU_GENERAL_WIDTH_MASK;
-
-        *edx &= ~PMU_FIXED_NR_MASK;
-        *edx &= ~PMU_FIXED_WIDTH_MASK;
-    }
-}
-
-/*
- * If it's a vpmu MSR, report its content as 0.
- */
-static int core2_no_vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
-{
-    int type = -1, index = -1;
-    if ( !is_core2_vpmu_msr(msr, &type, &index) )
-        return 0;
-    *msr_content = 0;
-    return 1;
-}
-
-/*
- * These functions are used in case vpmu is not enabled.
- */
-struct arch_vpmu_ops core2_no_vpmu_ops = {
-    .do_rdmsr = core2_no_vpmu_do_rdmsr,
-    .do_cpuid = core2_no_vpmu_do_cpuid,
-};
-
-int vmx_vpmu_initialise(struct vcpu *v, unsigned int vpmu_flags)
-{
-    struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    uint8_t family = current_cpu_data.x86;
-    uint8_t cpu_model = current_cpu_data.x86_model;
-    int ret = 0;
-
-    vpmu->arch_vpmu_ops = &core2_no_vpmu_ops;
-    if ( vpmu_flags == XENPMU_MODE_OFF )
-        return 0;
-
-    if ( family == 6 )
-    {
-        u64 caps;
-
-        rdmsrl(MSR_IA32_PERF_CAPABILITIES, caps);
-        full_width_write = (caps >> 13) & 1;
-
-        switch ( cpu_model )
-        {
-        /* Core2: */
-        case 0x0f: /* original 65 nm celeron/pentium/core2/xeon, "Merom"/"Conroe" */
-        case 0x16: /* single-core 65 nm celeron/core2solo "Merom-L"/"Conroe-L" */
-        case 0x17: /* 45 nm celeron/core2/xeon "Penryn"/"Wolfdale" */
-        case 0x1d: /* six-core 45 nm xeon "Dunnington" */
-
-        case 0x2a: /* SandyBridge */
-        case 0x2d: /* SandyBridge, "Romley-EP" */
-
-        /* Nehalem: */
-        case 0x1a: /* 45 nm nehalem, "Bloomfield" */
-        case 0x1e: /* 45 nm nehalem, "Lynnfield", "Clarksfield", "Jasper Forest" */
-        case 0x2e: /* 45 nm nehalem-ex, "Beckton" */
-
-        /* Westmere: */
-        case 0x25: /* 32 nm nehalem, "Clarkdale", "Arrandale" */
-        case 0x2c: /* 32 nm nehalem, "Gulftown", "Westmere-EP" */
-        case 0x27: /* 32 nm Westmere-EX */
-
-        case 0x3a: /* IvyBridge */
-        case 0x3e: /* IvyBridge EP */
-
-        /* Haswell: */
-        case 0x3c:
-        case 0x3f:
-        case 0x45:
-        case 0x46:
-            ret = core2_vpmu_initialise(v, vpmu_flags);
-            if ( !ret )
-                vpmu->arch_vpmu_ops = &core2_vpmu_ops;
-            return ret;
-        }
-    }
-
-    printk("VPMU: Initialization failed. "
-           "Intel processor family %d model %d has not "
-           "been supported\n", family, cpu_model);
-    return -EINVAL;
-}
-
diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c
deleted file mode 100644
index f736de0..0000000
--- a/xen/arch/x86/hvm/vpmu.c
+++ /dev/null
@@ -1,671 +0,0 @@
-/*
- * vpmu.c: PMU virtualization for HVM domain.
- *
- * Copyright (c) 2007, Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along with
- * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
- * Place - Suite 330, Boston, MA 02111-1307 USA.
- *
- * Author: Haitao Shan <haitao.shan@intel.com>
- */
-#include <xen/config.h>
-#include <xen/sched.h>
-#include <xen/xenoprof.h>
-#include <xen/event.h>
-#include <xen/softirq.h>
-#include <xen/hypercall.h>
-#include <xen/guest_access.h>
-#include <asm/regs.h>
-#include <asm/types.h>
-#include <asm/msr.h>
-#include <asm/p2m.h>
-#include <asm/hvm/support.h>
-#include <asm/hvm/vmx/vmx.h>
-#include <asm/hvm/vmx/vmcs.h>
-#include <asm/hvm/vpmu.h>
-#include <asm/hvm/svm/svm.h>
-#include <asm/hvm/svm/vmcb.h>
-#include <asm/apic.h>
-#include <asm/nmi.h>
-#include <asm/p2m.h>
-#include <public/xenpmu.h>
-
-/*
- * "vpmu" :     vpmu generally enabled
- * "vpmu=off" : vpmu generally disabled
- * "vpmu=bts" : vpmu enabled and Intel BTS feature switched on.
- */
-uint32_t __read_mostly vpmu_mode = XENPMU_MODE_OFF;
-static void parse_vpmu_param(char *s);
-custom_param("vpmu", parse_vpmu_param);
-
-static DEFINE_PER_CPU(struct vcpu *, last_vcpu);
-static DEFINE_PER_CPU(struct vcpu *, sampled_vcpu);
-
-uint32_t vpmu_apic_vector = PMU_APIC_VECTOR;
-
-static void __init parse_vpmu_param(char *s)
-{
-    char *ss;
-
-    vpmu_mode = XENPMU_MODE_ON;
-    if (*s == '\0')
-        return;
-
-    do {
-        ss = strchr(s, ',');
-        if ( ss )
-            *ss = '\0';
-
-        switch  (parse_bool(s) )
-        {
-        case 0:
-            vpmu_mode = XENPMU_MODE_OFF;
-            return;
-        case -1:
-            if ( !strcmp(s, "nmi") )
-                vpmu_apic_vector = APIC_DM_NMI;
-            else if ( !strcmp(s, "bts") )
-                vpmu_mode |= XENPMU_FEATURE_INTEL_BTS << XENPMU_FEATURE_SHIFT;
-            else if ( !strcmp(s, "priv") )
-            {
-                vpmu_mode &= ~XENPMU_MODE_ON;
-                vpmu_mode |= XENPMU_MODE_PRIV;
-            }
-            else
-            {
-                printk("VPMU: unknown flag: %s - vpmu disabled!\n", s);
-                vpmu_mode = XENPMU_MODE_OFF;
-                return;
-            }
-        default:
-            break;
-        }
-
-        s = ss + 1;
-    } while ( ss );
-}
-
-void vpmu_lvtpc_update(uint32_t val)
-{
-    struct vpmu_struct *vpmu = vcpu_vpmu(current);
-
-    vpmu->hw_lapic_lvtpc = vpmu_apic_vector | (val & APIC_LVT_MASKED);
-
-    /* Postpone APIC updates for PV guests if PMU interrupt is pending */
-    if ( !is_pv_domain(current->domain) ||
-         !(current->arch.vpmu.xenpmu_data &&
-           current->arch.vpmu.xenpmu_data->pmu_flags & PMU_CACHED) )
-        apic_write(APIC_LVTPC, vpmu->hw_lapic_lvtpc);
-}
-
-static void vpmu_send_nmi(struct vcpu *v)
-{
-    struct vlapic *vlapic = vcpu_vlapic(v);
-    u32 vlapic_lvtpc;
-    unsigned char int_vec;
-
-    if ( !is_vlapic_lvtpc_enabled(vlapic) )
-        return;
-
-    vlapic_lvtpc = vlapic_get_reg(vlapic, APIC_LVTPC);
-    int_vec = vlapic_lvtpc & APIC_VECTOR_MASK;
-
-    if ( GET_APIC_DELIVERY_MODE(vlapic_lvtpc) == APIC_MODE_FIXED )
-        vlapic_set_irq(vcpu_vlapic(v), int_vec, 0);
-    else
-        v->nmi_pending = 1;
-}
-
-int vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
-{
-    struct vpmu_struct *vpmu = vcpu_vpmu(current);
-
-    if ( (vpmu_mode & XENPMU_MODE_PRIV) && !is_control_domain(current->domain) )
-        return 0;
-
-    if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->do_wrmsr )
-    {
-        int ret = vpmu->arch_vpmu_ops->do_wrmsr(msr, msr_content);
-
-        /*
-         * We may have received a PMU interrupt during WRMSR handling
-         * and since do_wrmsr may load VPMU context we should save
-         * (and unload) it again.
-         */
-        if ( !is_hvm_domain(current->domain) &&
-            (current->arch.vpmu.xenpmu_data->pmu_flags & PMU_CACHED) )
-        {
-            vpmu_set(vpmu, VPMU_CONTEXT_SAVE);
-            vpmu->arch_vpmu_ops->arch_vpmu_save(current);
-            vpmu_reset(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED);
-        }
-        return ret;
-    }
-    return 0;
-}
-
-int vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
-{
-    struct vpmu_struct *vpmu = vcpu_vpmu(current);
-
-    if ( (vpmu_mode & XENPMU_MODE_PRIV) && !is_control_domain(current->domain) )
-        return 0;
-
-    if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->do_rdmsr )
-    {
-        int ret = vpmu->arch_vpmu_ops->do_rdmsr(msr, msr_content);
-
-        if ( !is_hvm_domain(current->domain) &&
-            (current->arch.vpmu.xenpmu_data->pmu_flags & PMU_CACHED) )
-        {
-            vpmu_set(vpmu, VPMU_CONTEXT_SAVE);
-            vpmu->arch_vpmu_ops->arch_vpmu_save(current);
-            vpmu_reset(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED);
-        }
-        return ret;
-    }
-    return 0;
-}
-
-/* This routine may be called in NMI context */
-int vpmu_do_interrupt(struct cpu_user_regs *regs)
-{
-    struct vcpu *v = current;
-    struct vpmu_struct *vpmu;
-
-    /* dom0 will handle this interrupt */
-    if ( (vpmu_mode & XENPMU_MODE_PRIV) ||
-         (v->domain->domain_id >= DOMID_FIRST_RESERVED) )
-        v = dom0->vcpu[smp_processor_id() % dom0->max_vcpus];
-
-    vpmu = vcpu_vpmu(v);
-    if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
-        return 0;
-
-    if ( !is_hvm_domain(v->domain) || (vpmu_mode & XENPMU_MODE_PRIV) )
-    {
-        /* PV(H) guest or dom0 is doing system profiling */
-        struct cpu_user_regs *gregs;
-        int err;
-
-        if ( v->arch.vpmu.xenpmu_data->pmu_flags & PMU_CACHED )
-            return 1;
-
-        if ( is_pvh_domain(current->domain) && !(vpmu_mode & XENPMU_MODE_PRIV) )
-            if ( !vpmu->arch_vpmu_ops->do_interrupt(regs) )
-                return 0;
-
-        /* PV guest will be reading PMU MSRs from xenpmu_data */
-        vpmu_set(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED);
-        err = vpmu->arch_vpmu_ops->arch_vpmu_save(v);
-        vpmu_reset(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED);
-
-        if ( !is_hvm_domain(current->domain) )
-        {
-            uint16_t cs = (current->arch.flags & TF_kernel_mode) ? 0 : 0x3;
-
-            /* Store appropriate registers in xenpmu_data */
-            if ( is_pv_32bit_domain(current->domain) )
-            {
-                gregs = guest_cpu_user_regs();
-
-                if ( (vpmu_mode & XENPMU_MODE_PRIV) &&
-                     !is_pv_32bit_domain(v->domain) )
-                    memcpy(&v->arch.vpmu.xenpmu_data->pmu.r.regs,
-                           gregs, sizeof(struct cpu_user_regs));
-                else 
-                {
-                    /*
-                     * 32-bit dom0 cannot process Xen's addresses (which are
-                     * 64 bit) and therefore we treat it the same way as a
-                     * non-privileged PV 32-bit domain.
-                     */
-
-                    struct compat_cpu_user_regs *cmp;
-
-                    cmp = (struct compat_cpu_user_regs *)
-                        &v->arch.vpmu.xenpmu_data->pmu.r.regs;
-                    XLAT_cpu_user_regs(cmp, gregs);
-                }
-            }
-            else if ( !is_control_domain(current->domain) &&
-                      !is_idle_vcpu(current) )
-            {
-                /* PV(H) guest */
-                gregs = guest_cpu_user_regs();
-                memcpy(&v->arch.vpmu.xenpmu_data->pmu.r.regs,
-                       gregs, sizeof(struct cpu_user_regs));
-            }
-            else
-                memcpy(&v->arch.vpmu.xenpmu_data->pmu.r.regs,
-                       regs, sizeof(struct cpu_user_regs));
-
-            gregs = &v->arch.vpmu.xenpmu_data->pmu.r.regs;
-            if ( !is_pvh_domain(current->domain) )
-                gregs->cs = cs;
-            else if ( !(vpmu_apic_vector & APIC_DM_NMI) )
-            {
-                struct segment_register seg_cs;
-
-                hvm_get_segment_register(current, x86_seg_cs, &seg_cs);
-                gregs->cs = seg_cs.attr.fields.dpl;
-            }
-        }
-        else
-        {
-            /* HVM guest */
-            struct segment_register cs;
-
-            gregs = guest_cpu_user_regs();
-            memcpy(&v->arch.vpmu.xenpmu_data->pmu.r.regs,
-                   gregs, sizeof(struct cpu_user_regs));
-
-            /* This is unsafe in NMI context, we'll do it in softint handler */
-            if ( !(vpmu_apic_vector & APIC_DM_NMI ) )
-            {
-                hvm_get_segment_register(current, x86_seg_cs, &cs);
-                gregs = &v->arch.vpmu.xenpmu_data->pmu.r.regs;
-                gregs->cs = cs.attr.fields.dpl;
-            }
-        }
-
-        v->arch.vpmu.xenpmu_data->domain_id = current->domain->domain_id;
-        v->arch.vpmu.xenpmu_data->vcpu_id = current->vcpu_id;
-        v->arch.vpmu.xenpmu_data->pcpu_id = smp_processor_id();
-
-        if ( !is_pvh_domain(current->domain) || (vpmu_mode & XENPMU_MODE_PRIV) )
-            v->arch.vpmu.xenpmu_data->pmu_flags |= PMU_CACHED;
-        apic_write(APIC_LVTPC, vpmu->hw_lapic_lvtpc | APIC_LVT_MASKED);
-        vpmu->hw_lapic_lvtpc |= APIC_LVT_MASKED;
-
-        if ( vpmu_apic_vector & APIC_DM_NMI )
-        {
-            per_cpu(sampled_vcpu, smp_processor_id()) = current;
-            raise_softirq(PMU_SOFTIRQ);
-        }
-        else
-            send_guest_vcpu_virq(v, VIRQ_XENPMU);
-
-        return 1;
-    }
-    else if ( vpmu->arch_vpmu_ops )
-    {
-        if ( !vpmu->arch_vpmu_ops->do_interrupt(regs) )
-            return 0;
-
-        if ( vpmu_apic_vector & APIC_DM_NMI )
-        {
-            per_cpu(sampled_vcpu, smp_processor_id()) = current;
-            raise_softirq(PMU_SOFTIRQ);
-        }
-        else
-            vpmu_send_nmi(v);
-
-        return 1;
-    }
-
-    return 0;
-}
-
-void vpmu_do_cpuid(unsigned int input,
-                   unsigned int *eax, unsigned int *ebx,
-                   unsigned int *ecx, unsigned int *edx)
-{
-    struct vpmu_struct *vpmu = vcpu_vpmu(current);
-
-    if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->do_cpuid )
-        vpmu->arch_vpmu_ops->do_cpuid(input, eax, ebx, ecx, edx);
-}
-
-static void vpmu_save_force(void *arg)
-{
-    struct vcpu *v = (struct vcpu *)arg;
-    struct vpmu_struct *vpmu = vcpu_vpmu(v);
-
-    if ( !vpmu_is_set_all(vpmu, VPMU_CONTEXT_ALLOCATED | VPMU_CONTEXT_LOADED) )
-        return;
-
-    vpmu_set(vpmu, VPMU_CONTEXT_SAVE);
-
-    if ( vpmu->arch_vpmu_ops )
-        (void)vpmu->arch_vpmu_ops->arch_vpmu_save(v);
-
-    vpmu_reset(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED);
-
-    per_cpu(last_vcpu, smp_processor_id()) = NULL;
-}
-
-void vpmu_save(struct vcpu *v)
-{
-    struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    int pcpu = smp_processor_id();
-
-    if ( !vpmu_is_set_all(vpmu, VPMU_CONTEXT_ALLOCATED | VPMU_CONTEXT_LOADED) )
-       return;
-
-    vpmu->last_pcpu = pcpu;
-    per_cpu(last_vcpu, pcpu) = v;
-
-    if ( vpmu->arch_vpmu_ops )
-        if ( vpmu->arch_vpmu_ops->arch_vpmu_save(v) )
-            vpmu_reset(vpmu, VPMU_CONTEXT_LOADED);
-
-    apic_write(APIC_LVTPC, vpmu_apic_vector | APIC_LVT_MASKED);
-}
-
-void vpmu_load(struct vcpu *v)
-{
-    struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    int pcpu = smp_processor_id();
-    struct vcpu *prev = NULL;
-
-    if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
-        return;
-
-    /* First time this VCPU is running here */
-    if ( vpmu->last_pcpu != pcpu )
-    {
-        /*
-         * Get the context from the last pcpu that we ran on. Note that if
-         * another VCPU is running there it must have saved this VCPU's
-         * context before starting to run (see below).
-         * There should be no race since the remote pcpu will disable
-         * interrupts before saving the context.
-         */
-        if ( vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) )
-            on_selected_cpus(cpumask_of(vpmu->last_pcpu),
-                             vpmu_save_force, (void *)v, 1);
-    } 
-
-    /* Prevent forced context save from remote CPU */
-    local_irq_disable();
-
-    prev = per_cpu(last_vcpu, pcpu);
-
-    if ( prev != v && prev )
-    {
-        vpmu = vcpu_vpmu(prev);
-
-        /* Someone ran here before us */
-        vpmu_save_force(prev);
-
-        vpmu = vcpu_vpmu(v);
-    }
-
-    local_irq_enable();
-
-    /* 
-     * Only when PMU is counting and is not cached (for PV guests) do
-     * we load PMU context immediately.
-     */
-    if ( !vpmu_is_set(vpmu, VPMU_RUNNING) ||
-         (is_pv_domain(v->domain) &&
-          vpmu->xenpmu_data->pmu_flags & PMU_CACHED) )
-        return;
-
-    if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->arch_vpmu_load )
-    {
-        apic_write_around(APIC_LVTPC, vpmu->hw_lapic_lvtpc);
-        vpmu->arch_vpmu_ops->arch_vpmu_load(v);
-    }
-
-    vpmu_set(vpmu, VPMU_CONTEXT_LOADED);
-}
-
-void vpmu_initialise(struct vcpu *v)
-{
-    struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    uint8_t vendor = current_cpu_data.x86_vendor;
-
-    if ( vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
-        vpmu_destroy(v);
-    vpmu_clear(vpmu);
-    vpmu->context = NULL;
-
-    switch ( vendor )
-    {
-    case X86_VENDOR_AMD:
-        if ( svm_vpmu_initialise(v, vpmu_mode) != 0 )
-            vpmu_mode = XENPMU_MODE_OFF;
-        return;
-
-    case X86_VENDOR_INTEL:
-        if ( vmx_vpmu_initialise(v, vpmu_mode) != 0 )
-            vpmu_mode = XENPMU_MODE_OFF;
-        return;
-
-    default:
-        printk("VPMU: Initialization failed. "
-               "Unknown CPU vendor %d\n", vendor);
-        vpmu_mode = XENPMU_MODE_OFF;
-        return;
-    }
-}
-
-void vpmu_destroy(struct vcpu *v)
-{
-    struct vpmu_struct *vpmu = vcpu_vpmu(v);
-
-    if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->arch_vpmu_destroy )
-    {
-        /* Unload VPMU first. This will stop counters */
-        on_selected_cpus(cpumask_of(vcpu_vpmu(v)->last_pcpu),
-                         vpmu_save_force, (void *)v, 1);
-
-        vpmu->arch_vpmu_ops->arch_vpmu_destroy(v);
-    }
-}
-
-/* Dump some vpmu information on the console. Used in keyhandler dump_domains(). */
-void vpmu_dump(struct vcpu *v)
-{
-    struct vpmu_struct *vpmu = vcpu_vpmu(v);
-
-    if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->arch_vpmu_dump )
-        vpmu->arch_vpmu_ops->arch_vpmu_dump(v);
-}
-
-/* Process the softirq set by PMU NMI handler */
-static void pmu_softnmi(void)
-{
-    struct cpu_user_regs *regs;
-    struct vcpu *v, *sampled = per_cpu(sampled_vcpu, smp_processor_id());
-
-    if ( vpmu_mode & XENPMU_MODE_PRIV ||
-         sampled->domain->domain_id >= DOMID_FIRST_RESERVED )
-        v = dom0->vcpu[smp_processor_id() % dom0->max_vcpus];
-    else
-    {
-        if ( is_hvm_domain(sampled->domain) )
-        {
-            vpmu_send_nmi(sampled);
-            return;
-        }
-        v = sampled;
-    }
-
-    regs = &v->arch.vpmu.xenpmu_data->pmu.r.regs;
-    if ( !is_pv_domain(sampled->domain) )
-    {
-        struct segment_register cs;
-
-        hvm_get_segment_register(sampled, x86_seg_cs, &cs);
-        regs->cs = cs.attr.fields.dpl;
-    }
-
-    send_guest_vcpu_virq(v, VIRQ_XENPMU);
-}
-
-int pmu_nmi_interrupt(struct cpu_user_regs *regs, int cpu)
-{
-    return vpmu_do_interrupt(regs);
-}
-
-
-static int pvpmu_init(struct domain *d, xen_pmu_params_t *params)
-{
-    struct vcpu *v;
-    struct page_info *page;
-    uint64_t gmfn = params->d.val;
-    static int pvpmu_initted = 0;
- 
-    if ( params->vcpu < 0 || params->vcpu >= d->max_vcpus )
-        return -EINVAL;
-
-    page = get_page_from_gfn(d, gmfn, NULL, P2M_ALLOC);
-    if ( !page )
-        return -EINVAL;
-
-    v = d->vcpu[params->vcpu];
-    v->arch.vpmu.xenpmu_data = __map_domain_page_global(page);
-    if ( !v->arch.vpmu.xenpmu_data )
-    {
-        put_page(page);
-        return -EINVAL;
-    }
-
-    if ( !pvpmu_initted )
-    {
-        if (reserve_lapic_nmi() == 0)
-            set_nmi_callback(pmu_nmi_interrupt);
-        else
-        {
-            printk("Failed to reserve PMU NMI\n");
-            put_page(page);
-            return -EBUSY;
-        }
-        open_softirq(PMU_SOFTIRQ, pmu_softnmi);
-
-        pvpmu_initted = 1;
-    }
-
-    vpmu_initialise(v);
-
-    return 0;
-}
-
-static void pvpmu_finish(struct domain *d, xen_pmu_params_t *params)
-{
-    struct vcpu *v;
-    uint64_t mfn;
-
-    if ( params->vcpu < 0 || params->vcpu >= d->max_vcpus )
-        return;
-
-    v = d->vcpu[params->vcpu];
-    if (v != current)
-        vcpu_pause(v);
-
-    if ( v->arch.vpmu.xenpmu_data )
-    {
-        mfn = domain_page_map_to_mfn(v->arch.vpmu.xenpmu_data);
-        if ( mfn_valid(mfn) )
-        {
-            unmap_domain_page_global(v->arch.vpmu.xenpmu_data);
-            put_page(mfn_to_page(mfn));
-        }
-    }
-    vpmu_destroy(v);
-
-    if (v != current)
-        vcpu_unpause(v);
-}
-
-long do_xenpmu_op(int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg)
-{
-    int ret = -EINVAL;
-    xen_pmu_params_t pmu_params;
-    uint32_t mode;
-
-    switch ( op )
-    {
-    case XENPMU_mode_set:
-        if ( !is_control_domain(current->domain) )
-            return -EPERM;
-
-        if ( copy_from_guest(&pmu_params, arg, 1) )
-            return -EFAULT;
-
-        mode = (uint32_t)pmu_params.d.val & XENPMU_MODE_MASK;
-        if ( (mode & ~(XENPMU_MODE_ON | XENPMU_MODE_PRIV)) ||
-             ((mode & XENPMU_MODE_ON) && (mode & XENPMU_MODE_PRIV)) )
-            return -EINVAL;
-
-        vpmu_mode &= ~XENPMU_MODE_MASK;
-        vpmu_mode |= mode;
-
-        ret = 0;
-        break;
-
-    case XENPMU_mode_get:
-        pmu_params.d.val = vpmu_mode & XENPMU_MODE_MASK;
-        pmu_params.v.version.maj = XENPMU_VER_MAJ;
-        pmu_params.v.version.min = XENPMU_VER_MIN;
-        if ( copy_to_guest(arg, &pmu_params, 1) )
-            return -EFAULT;
-        ret = 0;
-        break;
-
-    case XENPMU_feature_set:
-        if ( !is_control_domain(current->domain) )
-            return -EPERM;
-
-        if ( copy_from_guest(&pmu_params, arg, 1) )
-            return -EFAULT;
-
-        if ( (uint32_t)pmu_params.d.val & ~XENPMU_FEATURE_INTEL_BTS )
-            return -EINVAL;
-
-        vpmu_mode &= ~XENPMU_FEATURE_MASK;
-        vpmu_mode |= (uint32_t)pmu_params.d.val << XENPMU_FEATURE_SHIFT;
-
-        ret = 0;
-        break;
-
-    case XENPMU_feature_get:
-        pmu_params.d.val = vpmu_mode & XENPMU_FEATURE_MASK;
-        if ( copy_to_guest(arg, &pmu_params, 1) )
-            return -EFAULT;
-        ret = 0;
-        break;
-
-    case XENPMU_init:
-        if ( copy_from_guest(&pmu_params, arg, 1) )
-            return -EFAULT;
-        ret = pvpmu_init(current->domain, &pmu_params);
-        break;
-
-    case XENPMU_finish:
-        if ( copy_from_guest(&pmu_params, arg, 1) )
-            return -EFAULT;
-        pvpmu_finish(current->domain, &pmu_params);
-        break;
-
-    case XENPMU_lvtpc_set:
-        if ( copy_from_guest(&pmu_params, arg, 1) )
-            return -EFAULT;
-
-        vpmu_lvtpc_update((uint32_t)pmu_params.d.val);
-        ret = 0;
-        break;
-    case XENPMU_flush:
-        current->arch.vpmu.xenpmu_data->pmu_flags &= ~PMU_CACHED;
-        vpmu_load(current);
-        vpmu_lvtpc_update(current->arch.vpmu.xenpmu_data->pmu.l.lapic_lvtpc);
-        ret = 0;
-        break;
-    }
-
-    return ret;
-}
diff --git a/xen/arch/x86/oprofile/op_model_ppro.c b/xen/arch/x86/oprofile/op_model_ppro.c
index 5aae2e7..bf5d9a5 100644
--- a/xen/arch/x86/oprofile/op_model_ppro.c
+++ b/xen/arch/x86/oprofile/op_model_ppro.c
@@ -19,7 +19,7 @@
 #include <asm/processor.h>
 #include <asm/regs.h>
 #include <asm/current.h>
-#include <asm/hvm/vpmu.h>
+#include <asm/vpmu.h>
 
 #include "op_x86_model.h"
 #include "op_counter.h"
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 1854230..11f6821 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -72,7 +72,7 @@
 #include <asm/apic.h>
 #include <asm/mc146818rtc.h>
 #include <asm/hpet.h>
-#include <asm/hvm/vpmu.h>
+#include <asm/vpmu.h>
 #include <public/arch-x86/cpuid.h>
 #include <xsm/xsm.h>
 
diff --git a/xen/arch/x86/vpmu.c b/xen/arch/x86/vpmu.c
new file mode 100644
index 0000000..f736de0
--- /dev/null
+++ b/xen/arch/x86/vpmu.c
@@ -0,0 +1,671 @@
+/*
+ * vpmu.c: PMU virtualization for HVM and PV(H) domains.
+ *
+ * Copyright (c) 2007, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
+ * Place - Suite 330, Boston, MA 02111-1307 USA.
+ *
+ * Author: Haitao Shan <haitao.shan@intel.com>
+ */
+#include <xen/config.h>
+#include <xen/sched.h>
+#include <xen/xenoprof.h>
+#include <xen/event.h>
+#include <xen/softirq.h>
+#include <xen/hypercall.h>
+#include <xen/guest_access.h>
+#include <asm/regs.h>
+#include <asm/types.h>
+#include <asm/msr.h>
+#include <asm/p2m.h>
+#include <asm/hvm/support.h>
+#include <asm/hvm/vmx/vmx.h>
+#include <asm/hvm/vmx/vmcs.h>
+#include <asm/vpmu.h>
+#include <asm/hvm/svm/svm.h>
+#include <asm/hvm/svm/vmcb.h>
+#include <asm/apic.h>
+#include <asm/nmi.h>
+#include <asm/p2m.h>
+#include <public/xenpmu.h>
+
+/*
+ * "vpmu" :     vpmu generally enabled
+ * "vpmu=off" : vpmu generally disabled
+ * "vpmu=bts" : vpmu enabled and Intel BTS feature switched on.
+ */
+uint32_t __read_mostly vpmu_mode = XENPMU_MODE_OFF;
+static void parse_vpmu_param(char *s);
+custom_param("vpmu", parse_vpmu_param);
+
+static DEFINE_PER_CPU(struct vcpu *, last_vcpu);
+static DEFINE_PER_CPU(struct vcpu *, sampled_vcpu);
+
+uint32_t vpmu_apic_vector = PMU_APIC_VECTOR;
+
+static void __init parse_vpmu_param(char *s)
+{
+    char *ss;
+
+    vpmu_mode = XENPMU_MODE_ON;
+    if (*s == '\0')
+        return;
+
+    do {
+        ss = strchr(s, ',');
+        if ( ss )
+            *ss = '\0';
+
+        switch ( parse_bool(s) )
+        {
+        case 0:
+            vpmu_mode = XENPMU_MODE_OFF;
+            return;
+        case -1:
+            if ( !strcmp(s, "nmi") )
+                vpmu_apic_vector = APIC_DM_NMI;
+            else if ( !strcmp(s, "bts") )
+                vpmu_mode |= XENPMU_FEATURE_INTEL_BTS << XENPMU_FEATURE_SHIFT;
+            else if ( !strcmp(s, "priv") )
+            {
+                vpmu_mode &= ~XENPMU_MODE_ON;
+                vpmu_mode |= XENPMU_MODE_PRIV;
+            }
+            else
+            {
+                printk("VPMU: unknown flag: %s - vpmu disabled!\n", s);
+                vpmu_mode = XENPMU_MODE_OFF;
+                return;
+            }
+        default:
+            break;
+        }
+
+        s = ss + 1;
+    } while ( ss );
+}
+
+void vpmu_lvtpc_update(uint32_t val)
+{
+    struct vpmu_struct *vpmu = vcpu_vpmu(current);
+
+    vpmu->hw_lapic_lvtpc = vpmu_apic_vector | (val & APIC_LVT_MASKED);
+
+    /* Postpone APIC updates for PV guests if PMU interrupt is pending */
+    if ( !is_pv_domain(current->domain) ||
+         !(current->arch.vpmu.xenpmu_data &&
+           current->arch.vpmu.xenpmu_data->pmu_flags & PMU_CACHED) )
+        apic_write(APIC_LVTPC, vpmu->hw_lapic_lvtpc);
+}
+
+static void vpmu_send_nmi(struct vcpu *v)
+{
+    struct vlapic *vlapic = vcpu_vlapic(v);
+    u32 vlapic_lvtpc;
+    unsigned char int_vec;
+
+    if ( !is_vlapic_lvtpc_enabled(vlapic) )
+        return;
+
+    vlapic_lvtpc = vlapic_get_reg(vlapic, APIC_LVTPC);
+    int_vec = vlapic_lvtpc & APIC_VECTOR_MASK;
+
+    if ( GET_APIC_DELIVERY_MODE(vlapic_lvtpc) == APIC_MODE_FIXED )
+        vlapic_set_irq(vcpu_vlapic(v), int_vec, 0);
+    else
+        v->nmi_pending = 1;
+}
+
+int vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
+{
+    struct vpmu_struct *vpmu = vcpu_vpmu(current);
+
+    if ( (vpmu_mode & XENPMU_MODE_PRIV) && !is_control_domain(current->domain) )
+        return 0;
+
+    if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->do_wrmsr )
+    {
+        int ret = vpmu->arch_vpmu_ops->do_wrmsr(msr, msr_content);
+
+        /*
+         * We may have received a PMU interrupt during WRMSR handling
+         * and since do_wrmsr may load VPMU context we should save
+         * (and unload) it again.
+         */
+        if ( !is_hvm_domain(current->domain) &&
+            (current->arch.vpmu.xenpmu_data->pmu_flags & PMU_CACHED) )
+        {
+            vpmu_set(vpmu, VPMU_CONTEXT_SAVE);
+            vpmu->arch_vpmu_ops->arch_vpmu_save(current);
+            vpmu_reset(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED);
+        }
+        return ret;
+    }
+    return 0;
+}
+
+int vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
+{
+    struct vpmu_struct *vpmu = vcpu_vpmu(current);
+
+    if ( (vpmu_mode & XENPMU_MODE_PRIV) && !is_control_domain(current->domain) )
+        return 0;
+
+    if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->do_rdmsr )
+    {
+        int ret = vpmu->arch_vpmu_ops->do_rdmsr(msr, msr_content);
+
+        if ( !is_hvm_domain(current->domain) &&
+            (current->arch.vpmu.xenpmu_data->pmu_flags & PMU_CACHED) )
+        {
+            vpmu_set(vpmu, VPMU_CONTEXT_SAVE);
+            vpmu->arch_vpmu_ops->arch_vpmu_save(current);
+            vpmu_reset(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED);
+        }
+        return ret;
+    }
+    return 0;
+}
+
+/* This routine may be called in NMI context */
+int vpmu_do_interrupt(struct cpu_user_regs *regs)
+{
+    struct vcpu *v = current;
+    struct vpmu_struct *vpmu;
+
+    /* dom0 will handle this interrupt */
+    if ( (vpmu_mode & XENPMU_MODE_PRIV) ||
+         (v->domain->domain_id >= DOMID_FIRST_RESERVED) )
+        v = dom0->vcpu[smp_processor_id() % dom0->max_vcpus];
+
+    vpmu = vcpu_vpmu(v);
+    if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
+        return 0;
+
+    if ( !is_hvm_domain(v->domain) || (vpmu_mode & XENPMU_MODE_PRIV) )
+    {
+        /* PV(H) guest or dom0 is doing system profiling */
+        struct cpu_user_regs *gregs;
+        int err;
+
+        if ( v->arch.vpmu.xenpmu_data->pmu_flags & PMU_CACHED )
+            return 1;
+
+        if ( is_pvh_domain(current->domain) && !(vpmu_mode & XENPMU_MODE_PRIV) )
+            if ( !vpmu->arch_vpmu_ops->do_interrupt(regs) )
+                return 0;
+
+        /* PV guest will be reading PMU MSRs from xenpmu_data */
+        vpmu_set(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED);
+        err = vpmu->arch_vpmu_ops->arch_vpmu_save(v);
+        vpmu_reset(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED);
+
+        if ( !is_hvm_domain(current->domain) )
+        {
+            uint16_t cs = (current->arch.flags & TF_kernel_mode) ? 0 : 0x3;
+
+            /* Store appropriate registers in xenpmu_data */
+            if ( is_pv_32bit_domain(current->domain) )
+            {
+                gregs = guest_cpu_user_regs();
+
+                if ( (vpmu_mode & XENPMU_MODE_PRIV) &&
+                     !is_pv_32bit_domain(v->domain) )
+                    memcpy(&v->arch.vpmu.xenpmu_data->pmu.r.regs,
+                           gregs, sizeof(struct cpu_user_regs));
+                else
+                {
+                    /*
+                     * 32-bit dom0 cannot process Xen's addresses (which are
+                     * 64-bit) and therefore we treat it the same way as a
+                     * non-privileged PV 32-bit domain.
+                     */
+
+                    struct compat_cpu_user_regs *cmp;
+
+                    cmp = (struct compat_cpu_user_regs *)
+                        &v->arch.vpmu.xenpmu_data->pmu.r.regs;
+                    XLAT_cpu_user_regs(cmp, gregs);
+                }
+            }
+            else if ( !is_control_domain(current->domain) &&
+                      !is_idle_vcpu(current) )
+            {
+                /* PV(H) guest */
+                gregs = guest_cpu_user_regs();
+                memcpy(&v->arch.vpmu.xenpmu_data->pmu.r.regs,
+                       gregs, sizeof(struct cpu_user_regs));
+            }
+            else
+                memcpy(&v->arch.vpmu.xenpmu_data->pmu.r.regs,
+                       regs, sizeof(struct cpu_user_regs));
+
+            gregs = &v->arch.vpmu.xenpmu_data->pmu.r.regs;
+            if ( !is_pvh_domain(current->domain) )
+                gregs->cs = cs;
+            else if ( !(vpmu_apic_vector & APIC_DM_NMI) )
+            {
+                struct segment_register seg_cs;
+
+                hvm_get_segment_register(current, x86_seg_cs, &seg_cs);
+                gregs->cs = seg_cs.attr.fields.dpl;
+            }
+        }
+        else
+        {
+            /* HVM guest */
+            struct segment_register cs;
+
+            gregs = guest_cpu_user_regs();
+            memcpy(&v->arch.vpmu.xenpmu_data->pmu.r.regs,
+                   gregs, sizeof(struct cpu_user_regs));
+
+            /* This is unsafe in NMI context, we'll do it in softint handler */
+            if ( !(vpmu_apic_vector & APIC_DM_NMI ) )
+            {
+                hvm_get_segment_register(current, x86_seg_cs, &cs);
+                gregs = &v->arch.vpmu.xenpmu_data->pmu.r.regs;
+                gregs->cs = cs.attr.fields.dpl;
+            }
+        }
+
+        v->arch.vpmu.xenpmu_data->domain_id = current->domain->domain_id;
+        v->arch.vpmu.xenpmu_data->vcpu_id = current->vcpu_id;
+        v->arch.vpmu.xenpmu_data->pcpu_id = smp_processor_id();
+
+        if ( !is_pvh_domain(current->domain) || (vpmu_mode & XENPMU_MODE_PRIV) )
+            v->arch.vpmu.xenpmu_data->pmu_flags |= PMU_CACHED;
+        apic_write(APIC_LVTPC, vpmu->hw_lapic_lvtpc | APIC_LVT_MASKED);
+        vpmu->hw_lapic_lvtpc |= APIC_LVT_MASKED;
+
+        if ( vpmu_apic_vector & APIC_DM_NMI )
+        {
+            per_cpu(sampled_vcpu, smp_processor_id()) = current;
+            raise_softirq(PMU_SOFTIRQ);
+        }
+        else
+            send_guest_vcpu_virq(v, VIRQ_XENPMU);
+
+        return 1;
+    }
+    else if ( vpmu->arch_vpmu_ops )
+    {
+        if ( !vpmu->arch_vpmu_ops->do_interrupt(regs) )
+            return 0;
+
+        if ( vpmu_apic_vector & APIC_DM_NMI )
+        {
+            per_cpu(sampled_vcpu, smp_processor_id()) = current;
+            raise_softirq(PMU_SOFTIRQ);
+        }
+        else
+            vpmu_send_nmi(v);
+
+        return 1;
+    }
+
+    return 0;
+}
+
+void vpmu_do_cpuid(unsigned int input,
+                   unsigned int *eax, unsigned int *ebx,
+                   unsigned int *ecx, unsigned int *edx)
+{
+    struct vpmu_struct *vpmu = vcpu_vpmu(current);
+
+    if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->do_cpuid )
+        vpmu->arch_vpmu_ops->do_cpuid(input, eax, ebx, ecx, edx);
+}
+
+static void vpmu_save_force(void *arg)
+{
+    struct vcpu *v = (struct vcpu *)arg;
+    struct vpmu_struct *vpmu = vcpu_vpmu(v);
+
+    if ( !vpmu_is_set_all(vpmu, VPMU_CONTEXT_ALLOCATED | VPMU_CONTEXT_LOADED) )
+        return;
+
+    vpmu_set(vpmu, VPMU_CONTEXT_SAVE);
+
+    if ( vpmu->arch_vpmu_ops )
+        (void)vpmu->arch_vpmu_ops->arch_vpmu_save(v);
+
+    vpmu_reset(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED);
+
+    per_cpu(last_vcpu, smp_processor_id()) = NULL;
+}
+
+void vpmu_save(struct vcpu *v)
+{
+    struct vpmu_struct *vpmu = vcpu_vpmu(v);
+    int pcpu = smp_processor_id();
+
+    if ( !vpmu_is_set_all(vpmu, VPMU_CONTEXT_ALLOCATED | VPMU_CONTEXT_LOADED) )
+       return;
+
+    vpmu->last_pcpu = pcpu;
+    per_cpu(last_vcpu, pcpu) = v;
+
+    if ( vpmu->arch_vpmu_ops )
+        if ( vpmu->arch_vpmu_ops->arch_vpmu_save(v) )
+            vpmu_reset(vpmu, VPMU_CONTEXT_LOADED);
+
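+    /* Mask PMU interrupts on this pCPU now that the context has been saved */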
+    apic_write(APIC_LVTPC, vpmu_apic_vector | APIC_LVT_MASKED);
+}
+
+void vpmu_load(struct vcpu *v)
+{
+    struct vpmu_struct *vpmu = vcpu_vpmu(v);
+    int pcpu = smp_processor_id();
+    struct vcpu *prev = NULL;
+
+    if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
+        return;
+
+    /* This VCPU last ran on a different pCPU */
+    if ( vpmu->last_pcpu != pcpu )
+    {
+        /*
+         * Get the context from last pcpu that we ran on. Note that if another
+         * VCPU is running there it must have saved this VCPU's context before
+         * starting to run (see below).
+         * There should be no race since remote pcpu will disable interrupts
+         * before saving the context.
+         */
+        if ( vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) )
+            on_selected_cpus(cpumask_of(vpmu->last_pcpu),
+                             vpmu_save_force, (void *)v, 1);
+    }
+
+    /* Prevent forced context save from remote CPU */
+    local_irq_disable();
+
+    prev = per_cpu(last_vcpu, pcpu);
+
+    if ( prev != v && prev )
+    {
+        vpmu = vcpu_vpmu(prev);
+
+        /* Someone ran here before us */
+        vpmu_save_force(prev);
+
+        vpmu = vcpu_vpmu(v);
+    }
+
+    local_irq_enable();
+
+    /* 
+     * Only when PMU is counting and is not cached (for PV guests) do
+     * we load PMU context immediately.
+     */
+    if ( !vpmu_is_set(vpmu, VPMU_RUNNING) ||
+         (is_pv_domain(v->domain) &&
+          vpmu->xenpmu_data->pmu_flags & PMU_CACHED) )
+        return;
+
+    if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->arch_vpmu_load )
+    {
+        apic_write_around(APIC_LVTPC, vpmu->hw_lapic_lvtpc);
+        vpmu->arch_vpmu_ops->arch_vpmu_load(v);
+    }
+
+    vpmu_set(vpmu, VPMU_CONTEXT_LOADED);
+}
+
+void vpmu_initialise(struct vcpu *v)
+{
+    struct vpmu_struct *vpmu = vcpu_vpmu(v);
+    uint8_t vendor = current_cpu_data.x86_vendor;
+
+    if ( vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
+        vpmu_destroy(v);
+    vpmu_clear(vpmu);
+    vpmu->context = NULL;
+
+    switch ( vendor )
+    {
+    case X86_VENDOR_AMD:
+        if ( svm_vpmu_initialise(v, vpmu_mode) != 0 )
+            vpmu_mode = XENPMU_MODE_OFF;
+        return;
+
+    case X86_VENDOR_INTEL:
+        if ( vmx_vpmu_initialise(v, vpmu_mode) != 0 )
+            vpmu_mode = XENPMU_MODE_OFF;
+        return;
+
+    default:
+        printk("VPMU: Initialization failed. "
+               "Unknown CPU vendor %d\n", vendor);
+        vpmu_mode = XENPMU_MODE_OFF;
+        return;
+    }
+}
+
+void vpmu_destroy(struct vcpu *v)
+{
+    struct vpmu_struct *vpmu = vcpu_vpmu(v);
+
+    if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->arch_vpmu_destroy )
+    {
+        /* Unload VPMU first. This will stop counters */
+        on_selected_cpus(cpumask_of(vcpu_vpmu(v)->last_pcpu),
+                         vpmu_save_force, (void *)v, 1);
+
+        vpmu->arch_vpmu_ops->arch_vpmu_destroy(v);
+    }
+}
+
+/* Dump some vpmu information on the console. Used in keyhandler dump_domains(). */
+void vpmu_dump(struct vcpu *v)
+{
+    struct vpmu_struct *vpmu = vcpu_vpmu(v);
+
+    if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->arch_vpmu_dump )
+        vpmu->arch_vpmu_ops->arch_vpmu_dump(v);
+}
+
+/* Process the softirq set by PMU NMI handler */
+static void pmu_softnmi(void)
+{
+    struct cpu_user_regs *regs;
+    struct vcpu *v, *sampled = per_cpu(sampled_vcpu, smp_processor_id());
+
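+    /*
+     * Route the sample to a dom0 vCPU when dom0 is doing system-wide
+     * profiling or when the sample hit Xen/idle context; otherwise
+     * deliver it to the sampled guest.
+     */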
+    if ( vpmu_mode & XENPMU_MODE_PRIV ||
+         sampled->domain->domain_id >= DOMID_FIRST_RESERVED )
+        v = dom0->vcpu[smp_processor_id() % dom0->max_vcpus];
+    else
+    {
+        if ( is_hvm_domain(sampled->domain) )
+        {
+            vpmu_send_nmi(sampled);
+            return;
+        }
+        v = sampled;
+    }
+
+    regs = &v->arch.vpmu.xenpmu_data->pmu.r.regs;
+    if ( !is_pv_domain(sampled->domain) )
+    {
+        struct segment_register cs;
+
+        hvm_get_segment_register(sampled, x86_seg_cs, &cs);
+        regs->cs = cs.attr.fields.dpl;
+    }
+
+    send_guest_vcpu_virq(v, VIRQ_XENPMU);
+}
+
+int pmu_nmi_interrupt(struct cpu_user_regs *regs, int cpu)
+{
+    return vpmu_do_interrupt(regs);
+}
+
+
+static int pvpmu_init(struct domain *d, xen_pmu_params_t *params)
+{
+    struct vcpu *v;
+    struct page_info *page;
+    uint64_t gmfn = params->d.val;
+    static int pvpmu_initted = 0;
+
+    if ( params->vcpu < 0 || params->vcpu >= d->max_vcpus )
+        return -EINVAL;
+
+    page = get_page_from_gfn(d, gmfn, NULL, P2M_ALLOC);
+    if ( !page )
+        return -EINVAL;
+
+    v = d->vcpu[params->vcpu];
+    v->arch.vpmu.xenpmu_data = __map_domain_page_global(page);
+    if ( !v->arch.vpmu.xenpmu_data )
+    {
+        put_page(page);
+        return -EINVAL;
+    }
+
+    if ( !pvpmu_initted )
+    {
+        if (reserve_lapic_nmi() == 0)
+            set_nmi_callback(pmu_nmi_interrupt);
+        else
+        {
+            printk("Failed to reserve PMU NMI\n");
+            put_page(page);
+            return -EBUSY;
+        }
+        open_softirq(PMU_SOFTIRQ, pmu_softnmi);
+
+        pvpmu_initted = 1;
+    }
+
+    vpmu_initialise(v);
+
+    return 0;
+}
+
+static void pvpmu_finish(struct domain *d, xen_pmu_params_t *params)
+{
+    struct vcpu *v;
+    uint64_t mfn;
+
+    if ( params->vcpu < 0 || params->vcpu >= d->max_vcpus )
+        return;
+
+    v = d->vcpu[params->vcpu];
+    if (v != current)
+        vcpu_pause(v);
+
+    if ( v->arch.vpmu.xenpmu_data )
+    {
+        mfn = domain_page_map_to_mfn(v->arch.vpmu.xenpmu_data);
+        if ( mfn_valid(mfn) )
+        {
+            unmap_domain_page_global(v->arch.vpmu.xenpmu_data);
+            put_page(mfn_to_page(mfn));
+        }
+    }
+    vpmu_destroy(v);
+
+    if (v != current)
+        vcpu_unpause(v);
+}
+
+long do_xenpmu_op(int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg)
+{
+    int ret = -EINVAL;
+    xen_pmu_params_t pmu_params;
+    uint32_t mode;
+
+    switch ( op )
+    {
+    case XENPMU_mode_set:
+        if ( !is_control_domain(current->domain) )
+            return -EPERM;
+
+        if ( copy_from_guest(&pmu_params, arg, 1) )
+            return -EFAULT;
+
+        mode = (uint32_t)pmu_params.d.val & XENPMU_MODE_MASK;
+        if ( (mode & ~(XENPMU_MODE_ON | XENPMU_MODE_PRIV)) ||
+             ((mode & XENPMU_MODE_ON) && (mode & XENPMU_MODE_PRIV)) )
+            return -EINVAL;
+
+        vpmu_mode &= ~XENPMU_MODE_MASK;
+        vpmu_mode |= mode;
+
+        ret = 0;
+        break;
+
+    case XENPMU_mode_get:
+        pmu_params.d.val = vpmu_mode & XENPMU_MODE_MASK;
+        pmu_params.v.version.maj = XENPMU_VER_MAJ;
+        pmu_params.v.version.min = XENPMU_VER_MIN;
+        if ( copy_to_guest(arg, &pmu_params, 1) )
+            return -EFAULT;
+        ret = 0;
+        break;
+
+    case XENPMU_feature_set:
+        if ( !is_control_domain(current->domain) )
+            return -EPERM;
+
+        if ( copy_from_guest(&pmu_params, arg, 1) )
+            return -EFAULT;
+
+        if ( (uint32_t)pmu_params.d.val & ~XENPMU_FEATURE_INTEL_BTS )
+            return -EINVAL;
+
+        vpmu_mode &= ~XENPMU_FEATURE_MASK;
+        vpmu_mode |= (uint32_t)pmu_params.d.val << XENPMU_FEATURE_SHIFT;
+
+        ret = 0;
+        break;
+
+    case XENPMU_feature_get:
+        pmu_params.d.val = vpmu_mode & XENPMU_FEATURE_MASK;
+        if ( copy_to_guest(arg, &pmu_params, 1) )
+            return -EFAULT;
+        ret = 0;
+        break;
+
+    case XENPMU_init:
+        if ( copy_from_guest(&pmu_params, arg, 1) )
+            return -EFAULT;
+        ret = pvpmu_init(current->domain, &pmu_params);
+        break;
+
+    case XENPMU_finish:
+        if ( copy_from_guest(&pmu_params, arg, 1) )
+            return -EFAULT;
+        pvpmu_finish(current->domain, &pmu_params);
+        break;
+
+    case XENPMU_lvtpc_set:
+        if ( copy_from_guest(&pmu_params, arg, 1) )
+            return -EFAULT;
+
+        vpmu_lvtpc_update((uint32_t)pmu_params.d.val);
+        ret = 0;
+        break;
+    case XENPMU_flush:
+        current->arch.vpmu.xenpmu_data->pmu_flags &= ~PMU_CACHED;
+        vpmu_load(current);
+        vpmu_lvtpc_update(current->arch.vpmu.xenpmu_data->pmu.l.lapic_lvtpc);
+        ret = 0;
+        break;
+    }
+
+    return ret;
+}
diff --git a/xen/arch/x86/vpmu_amd.c b/xen/arch/x86/vpmu_amd.c
new file mode 100644
index 0000000..a0629d4
--- /dev/null
+++ b/xen/arch/x86/vpmu_amd.c
@@ -0,0 +1,499 @@
+/*
+ * vpmu_amd.c: AMD specific PMU virtualization for HVM and PV(H) domains.
+ *
+ * Copyright (c) 2010, Advanced Micro Devices, Inc.
+ * Parts of this code are Copyright (c) 2007, Intel Corporation
+ *
+ * Author: Wei Wang <wei.wang2@amd.com>
+ * Tested by: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
+ * Place - Suite 330, Boston, MA 02111-1307 USA.
+ *
+ */
+
+#include <xen/config.h>
+#include <xen/xenoprof.h>
+#include <xen/hvm/save.h>
+#include <xen/sched.h>
+#include <xen/irq.h>
+#include <asm/apic.h>
+#include <asm/hvm/vlapic.h>
+#include <asm/vpmu.h>
+#include <public/xenpmu.h>
+
+#define MSR_F10H_EVNTSEL_GO_SHIFT   40
+#define MSR_F10H_EVNTSEL_EN_SHIFT   22
+#define MSR_F10H_COUNTER_LENGTH     48
+
+#define is_guest_mode(msr) ((msr) & (1ULL << MSR_F10H_EVNTSEL_GO_SHIFT))
+#define is_pmu_enabled(msr) ((msr) & (1ULL << MSR_F10H_EVNTSEL_EN_SHIFT))
+#define set_guest_mode(msr) (msr |= (1ULL << MSR_F10H_EVNTSEL_GO_SHIFT))
+#define is_overflowed(msr) (!((msr) & (1ULL << (MSR_F10H_COUNTER_LENGTH-1))))
+
+static unsigned int __read_mostly num_counters;
+static const u32 __read_mostly *counters;
+static const u32 __read_mostly *ctrls;
+static bool_t __read_mostly k7_counters_mirrored;
+
+#define F10H_NUM_COUNTERS   4
+#define F15H_NUM_COUNTERS   6
+#define AMD_MAX_COUNTERS    6
+
+/* PMU Counter MSRs. */
+static const u32 AMD_F10H_COUNTERS[] = {
+    MSR_K7_PERFCTR0,
+    MSR_K7_PERFCTR1,
+    MSR_K7_PERFCTR2,
+    MSR_K7_PERFCTR3
+};
+
+/* PMU Control MSRs. */
+static const u32 AMD_F10H_CTRLS[] = {
+    MSR_K7_EVNTSEL0,
+    MSR_K7_EVNTSEL1,
+    MSR_K7_EVNTSEL2,
+    MSR_K7_EVNTSEL3
+};
+
+static const u32 AMD_F15H_COUNTERS[] = {
+    MSR_AMD_FAM15H_PERFCTR0,
+    MSR_AMD_FAM15H_PERFCTR1,
+    MSR_AMD_FAM15H_PERFCTR2,
+    MSR_AMD_FAM15H_PERFCTR3,
+    MSR_AMD_FAM15H_PERFCTR4,
+    MSR_AMD_FAM15H_PERFCTR5
+};
+
+static const u32 AMD_F15H_CTRLS[] = {
+    MSR_AMD_FAM15H_EVNTSEL0,
+    MSR_AMD_FAM15H_EVNTSEL1,
+    MSR_AMD_FAM15H_EVNTSEL2,
+    MSR_AMD_FAM15H_EVNTSEL3,
+    MSR_AMD_FAM15H_EVNTSEL4,
+    MSR_AMD_FAM15H_EVNTSEL5
+};
+
+static inline int get_pmu_reg_type(u32 addr)
+{
+    if ( (addr >= MSR_K7_EVNTSEL0) && (addr <= MSR_K7_EVNTSEL3) )
+        return MSR_TYPE_CTRL;
+
+    if ( (addr >= MSR_K7_PERFCTR0) && (addr <= MSR_K7_PERFCTR3) )
+        return MSR_TYPE_COUNTER;
+
+    if ( (addr >= MSR_AMD_FAM15H_EVNTSEL0) &&
+         (addr <= MSR_AMD_FAM15H_PERFCTR5 ) )
+    {
+        if (addr & 1)
+            return MSR_TYPE_COUNTER;
+        else
+            return MSR_TYPE_CTRL;
+    }
+
+    /* unsupported registers */
+    return -1;
+}
+
+static inline u32 get_fam15h_addr(u32 addr)
+{
+    switch ( addr )
+    {
+    case MSR_K7_PERFCTR0:
+        return MSR_AMD_FAM15H_PERFCTR0;
+    case MSR_K7_PERFCTR1:
+        return MSR_AMD_FAM15H_PERFCTR1;
+    case MSR_K7_PERFCTR2:
+        return MSR_AMD_FAM15H_PERFCTR2;
+    case MSR_K7_PERFCTR3:
+        return MSR_AMD_FAM15H_PERFCTR3;
+    case MSR_K7_EVNTSEL0:
+        return MSR_AMD_FAM15H_EVNTSEL0;
+    case MSR_K7_EVNTSEL1:
+        return MSR_AMD_FAM15H_EVNTSEL1;
+    case MSR_K7_EVNTSEL2:
+        return MSR_AMD_FAM15H_EVNTSEL2;
+    case MSR_K7_EVNTSEL3:
+        return MSR_AMD_FAM15H_EVNTSEL3;
+    default:
+        break;
+    }
+
+    return addr;
+}
+
+static void amd_vpmu_set_msr_bitmap(struct vcpu *v)
+{
+    unsigned int i;
+    struct vpmu_struct *vpmu = vcpu_vpmu(v);
+    struct xen_pmu_amd_ctxt *ctxt = vpmu->context;
+
+    for ( i = 0; i < num_counters; i++ )
+    {
+        svm_intercept_msr(v, counters[i], MSR_INTERCEPT_NONE);
+        svm_intercept_msr(v, ctrls[i], MSR_INTERCEPT_WRITE);
+    }
+
+    ctxt->msr_bitmap_set = 1;
+}
+
+static void amd_vpmu_unset_msr_bitmap(struct vcpu *v)
+{
+    unsigned int i;
+    struct vpmu_struct *vpmu = vcpu_vpmu(v);
+    struct xen_pmu_amd_ctxt *ctxt = vpmu->context;
+
+    for ( i = 0; i < num_counters; i++ )
+    {
+        svm_intercept_msr(v, counters[i], MSR_INTERCEPT_RW);
+        svm_intercept_msr(v, ctrls[i], MSR_INTERCEPT_RW);
+    }
+
+    ctxt->msr_bitmap_set = 0;
+}
+
+static int amd_vpmu_do_interrupt(struct cpu_user_regs *regs)
+{
+    return 1;
+}
+
+static inline void context_load(struct vcpu *v)
+{
+    unsigned int i;
+    struct vpmu_struct *vpmu = vcpu_vpmu(v);
+    struct xen_pmu_amd_ctxt *ctxt = vpmu->context;
+    uint64_t *counter_regs = vpmu_reg_pointer(ctxt, counters);
+    uint64_t *ctrl_regs = vpmu_reg_pointer(ctxt, ctrls);
+
+    for ( i = 0; i < num_counters; i++ )
+    {
+        wrmsrl(counters[i], counter_regs[i]);
+        wrmsrl(ctrls[i], ctrl_regs[i]);
+    }
+}
+
+static void amd_vpmu_load(struct vcpu *v)
+{
+    struct vpmu_struct *vpmu = vcpu_vpmu(v);
+    struct xen_pmu_amd_ctxt *ctxt = vpmu->context;
+
+    vpmu_reset(vpmu, VPMU_FROZEN);
+
+    if ( vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) )
+    {
+        unsigned int i;
+        uint64_t *ctrl_regs = vpmu_reg_pointer(ctxt, ctrls);
+
+        for ( i = 0; i < num_counters; i++ )
+            wrmsrl(ctrls[i], ctrl_regs[i]);
+
+        return;
+    }
+
+    context_load(v);
+}
+
+static inline void context_save(struct vcpu *v)
+{
+    unsigned int i;
+    struct vpmu_struct *vpmu = vcpu_vpmu(v);
+    struct xen_pmu_amd_ctxt *ctxt = vpmu->context;
+    uint64_t *counter_regs = vpmu_reg_pointer(ctxt, counters);
+
+    /* No need to save controls -- they are saved in amd_vpmu_do_wrmsr */
+    for ( i = 0; i < num_counters; i++ )
+        rdmsrl(counters[i], counter_regs[i]);
+}
+
+static int amd_vpmu_save(struct vcpu *v)
+{
+    struct vpmu_struct *vpmu = vcpu_vpmu(v);
+    struct xen_pmu_amd_ctxt *ctx = vpmu->context;
+    unsigned int i;
+
+    if ( !vpmu_is_set(vpmu, VPMU_FROZEN) )
+    {
+        for ( i = 0; i < num_counters; i++ )
+            wrmsrl(ctrls[i], 0);
+
+        vpmu_set(vpmu, VPMU_FROZEN);
+    }
+
+    if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_SAVE) )
+        return 0;
+
+    context_save(v);
+
+    if ( !is_pv_domain(v->domain) &&
+         !vpmu_is_set(vpmu, VPMU_RUNNING) && ctx->msr_bitmap_set )
+        amd_vpmu_unset_msr_bitmap(v);
+
+    return 1;
+}
+
+static void context_update(unsigned int msr, u64 msr_content)
+{
+    unsigned int i;
+    struct vcpu *v = current;
+    struct vpmu_struct *vpmu = vcpu_vpmu(v);
+    struct xen_pmu_amd_ctxt *ctxt = vpmu->context;
+    uint64_t *counter_regs = vpmu_reg_pointer(ctxt, counters);
+    uint64_t *ctrl_regs = vpmu_reg_pointer(ctxt, ctrls);
+
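+    /* On family 15h the legacy K7 MSRs alias the new EVNTSEL/PERFCTR MSRs */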
+    if ( k7_counters_mirrored &&
+        ((msr >= MSR_K7_EVNTSEL0) && (msr <= MSR_K7_PERFCTR3)) )
+    {
+        msr = get_fam15h_addr(msr);
+    }
+
+    for ( i = 0; i < num_counters; i++ )
+    {
+        if ( msr == ctrls[i] )
+        {
+            ctrl_regs[i] = msr_content;
+            return;
+        }
+        else if ( msr == counters[i] )
+        {
+            counter_regs[i] = msr_content;
+            return;
+        }
+    }
+}
+
+static int amd_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
+{
+    struct vcpu *v = current;
+    struct vpmu_struct *vpmu = vcpu_vpmu(v);
+
+    /* For all counters, enable guest only mode for HVM guest */
+    if ( !is_pv_domain(v->domain) && (get_pmu_reg_type(msr) == MSR_TYPE_CTRL) &&
+        !(is_guest_mode(msr_content)) )
+    {
+        set_guest_mode(msr_content);
+    }
+
+    /* check if the first counter is enabled */
+    if ( (get_pmu_reg_type(msr) == MSR_TYPE_CTRL) &&
+        is_pmu_enabled(msr_content) && !vpmu_is_set(vpmu, VPMU_RUNNING) )
+    {
+        if ( !acquire_pmu_ownership(PMU_OWNER_HVM) )
+            return 1;
+        vpmu_set(vpmu, VPMU_RUNNING);
+
+        if ( !is_pv_domain(v->domain) &&
+             !((struct xen_pmu_amd_ctxt *)vpmu->context)->msr_bitmap_set )
+            amd_vpmu_set_msr_bitmap(v);
+    }
+
+    /* stop saving & restore if guest stops first counter */
+    if ( (get_pmu_reg_type(msr) == MSR_TYPE_CTRL) &&
+        (is_pmu_enabled(msr_content) == 0) && vpmu_is_set(vpmu, VPMU_RUNNING) )
+    {
+        vpmu_reset(vpmu, VPMU_RUNNING);
+        if ( !is_pv_domain(v->domain) &&
+             ((struct xen_pmu_amd_ctxt *)vpmu->context)->msr_bitmap_set )
+            amd_vpmu_unset_msr_bitmap(v);
+        release_pmu_ownship(PMU_OWNER_HVM);
+    }
+
+    if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED)
+        || vpmu_is_set(vpmu, VPMU_FROZEN) )
+    {
+        context_load(v);
+        vpmu_set(vpmu, VPMU_CONTEXT_LOADED);
+        vpmu_reset(vpmu, VPMU_FROZEN);
+    }
+
+    /* Update vpmu context immediately */
+    context_update(msr, msr_content);
+
+    /* Write to hw counters */
+    wrmsrl(msr, msr_content);
+    return 1;
+}
+
+static int amd_vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
+{
+    struct vcpu *v = current;
+    struct vpmu_struct *vpmu = vcpu_vpmu(v);
+
+    if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED)
+        || vpmu_is_set(vpmu, VPMU_FROZEN) )
+    {
+        context_load(v);
+        vpmu_set(vpmu, VPMU_CONTEXT_LOADED);
+        vpmu_reset(vpmu, VPMU_FROZEN);
+    }
+
+    rdmsrl(msr, *msr_content);
+
+    return 1;
+}
+
+static int amd_vpmu_initialise(struct vcpu *v)
+{
+    struct xen_pmu_amd_ctxt *ctxt;
+    struct vpmu_struct *vpmu = vcpu_vpmu(v);
+    uint8_t family = current_cpu_data.x86;
+
+    if ( vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
+        return 0;
+
+    if ( counters == NULL )
+    {
+        switch ( family )
+        {
+        case 0x15:
+            num_counters = F15H_NUM_COUNTERS;
+            counters = AMD_F15H_COUNTERS;
+            ctrls = AMD_F15H_CTRLS;
+            k7_counters_mirrored = 1;
+            break;
+        case 0x10:
+        case 0x12:
+        case 0x14:
+        case 0x16:
+        default:
+            num_counters = F10H_NUM_COUNTERS;
+            counters = AMD_F10H_COUNTERS;
+            ctrls = AMD_F10H_CTRLS;
+            k7_counters_mirrored = 0;
+            break;
+        }
+    }
+
+    if ( !is_pv_domain(v->domain) )
+    {
+        ctxt = xzalloc_bytes(sizeof(struct xen_pmu_amd_ctxt) + 
+                             sizeof(uint64_t) * AMD_MAX_COUNTERS + 
+                             sizeof(uint64_t) * AMD_MAX_COUNTERS);
+        if ( !ctxt )
+        {
+            gdprintk(XENLOG_WARNING, "Insufficient memory for PMU, "
+                     " PMU feature is unavailable on domain %d vcpu %d.\n",
+                     v->vcpu_id, v->domain->domain_id);
+            return -ENOMEM;
+        }
+    }
+    else
+        ctxt = &v->arch.vpmu.xenpmu_data->pmu.c.amd;
+
+    ctxt->counters = sizeof(struct xen_pmu_amd_ctxt);
+    ctxt->ctrls = ctxt->counters + sizeof(uint64_t) * AMD_MAX_COUNTERS;
+
+    vpmu->context = ctxt;
+    vpmu_set(vpmu, VPMU_CONTEXT_ALLOCATED);
+    return 0;
+}
+
+static void amd_vpmu_destroy(struct vcpu *v)
+{
+    struct vpmu_struct *vpmu = vcpu_vpmu(v);
+
+    if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
+        return;
+
+    if ( !is_pv_domain(v->domain) )
+    {
+        if ( ((struct xen_pmu_amd_ctxt *)vpmu->context)->msr_bitmap_set )
+            amd_vpmu_unset_msr_bitmap(v);
+
+        xfree(vpmu->context);
+        release_pmu_ownship(PMU_OWNER_HVM);
+    }
+
+    vpmu->context = NULL;
+    vpmu_clear(vpmu);
+}
+
+/* VPMU part of the 'q' keyhandler */
+static void amd_vpmu_dump(const struct vcpu *v)
+{
+    const struct vpmu_struct *vpmu = vcpu_vpmu(v);
+    const struct xen_pmu_amd_ctxt *ctxt = vpmu->context;
+    uint64_t *counter_regs = vpmu_reg_pointer(ctxt, counters);
+    uint64_t *ctrl_regs = vpmu_reg_pointer(ctxt, ctrls);
+    unsigned int i;
+
+    printk("    VPMU state: 0x%x ", vpmu->flags);
+    if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
+    {
+         printk("\n");
+         return;
+    }
+
+    printk("(");
+    if ( vpmu_is_set(vpmu, VPMU_PASSIVE_DOMAIN_ALLOCATED) )
+        printk("PASSIVE_DOMAIN_ALLOCATED, ");
+    if ( vpmu_is_set(vpmu, VPMU_FROZEN) )
+        printk("FROZEN, ");
+    if ( vpmu_is_set(vpmu, VPMU_CONTEXT_SAVE) )
+        printk("SAVE, ");
+    if ( vpmu_is_set(vpmu, VPMU_RUNNING) )
+        printk("RUNNING, ");
+    if ( vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) )
+        printk("LOADED, ");
+    printk("ALLOCATED)\n");
+
+    for ( i = 0; i < num_counters; i++ )
+    {
+        uint64_t ctrl, cntr;
+
+        rdmsrl(ctrls[i], ctrl);
+        rdmsrl(counters[i], cntr);
+        printk("      %#x: %#lx (%#lx in HW)    %#x: %#lx (%#lx in HW)\n",
+               ctrls[i], ctrl_regs[i], ctrl,
+               counters[i], counter_regs[i], cntr);
+    }
+}
+
+struct arch_vpmu_ops amd_vpmu_ops = {
+    .do_wrmsr = amd_vpmu_do_wrmsr,
+    .do_rdmsr = amd_vpmu_do_rdmsr,
+    .do_interrupt = amd_vpmu_do_interrupt,
+    .arch_vpmu_destroy = amd_vpmu_destroy,
+    .arch_vpmu_save = amd_vpmu_save,
+    .arch_vpmu_load = amd_vpmu_load,
+    .arch_vpmu_dump = amd_vpmu_dump
+};
+
+int svm_vpmu_initialise(struct vcpu *v, unsigned int vpmu_flags)
+{
+    struct vpmu_struct *vpmu = vcpu_vpmu(v);
+    uint8_t family = current_cpu_data.x86;
+    int ret = 0;
+
+    /* vpmu enabled? */
+    if ( vpmu_flags == XENPMU_MODE_OFF )
+        return 0;
+
+    switch ( family )
+    {
+    case 0x10:
+    case 0x12:
+    case 0x14:
+    case 0x15:
+    case 0x16:
+        ret = amd_vpmu_initialise(v);
+        if ( !ret )
+            vpmu->arch_vpmu_ops = &amd_vpmu_ops;
+        return ret;
+    }
+
+    printk("VPMU: Initialization failed. "
+           "AMD processor family %d has not "
+           "been supported\n", family);
+    return -EINVAL;
+}
+
diff --git a/xen/arch/x86/vpmu_intel.c b/xen/arch/x86/vpmu_intel.c
new file mode 100644
index 0000000..4323aaf
--- /dev/null
+++ b/xen/arch/x86/vpmu_intel.c
@@ -0,0 +1,936 @@
+/*
+ * vpmu_intel.c: CORE 2 specific PMU virtualization for HVM and PV(H) domains.
+ *
+ * Copyright (c) 2007, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
+ * Place - Suite 330, Boston, MA 02111-1307 USA.
+ *
+ * Author: Haitao Shan <haitao.shan@intel.com>
+ */
+
+#include <xen/config.h>
+#include <xen/sched.h>
+#include <xen/xenoprof.h>
+#include <xen/irq.h>
+#include <asm/system.h>
+#include <asm/regs.h>
+#include <asm/types.h>
+#include <asm/apic.h>
+#include <asm/traps.h>
+#include <asm/msr.h>
+#include <asm/msr-index.h>
+#include <asm/hvm/support.h>
+#include <asm/hvm/vlapic.h>
+#include <asm/hvm/vmx/vmx.h>
+#include <asm/hvm/vmx/vmcs.h>
+#include <public/sched.h>
+#include <public/hvm/save.h>
+#include <public/xenpmu.h>
+#include <asm/vpmu.h>
+
+/*
+ * See Intel SDM Vol 2a Instruction Set Reference chapter 3 for CPUID
+ * instruction.
+ * cpuid 0xa - Architectural Performance Monitoring Leaf
+ * Register eax
+ */
+#define PMU_VERSION_SHIFT        0  /* Version ID */
+#define PMU_VERSION_BITS         8  /* 8 bits 0..7 */
+#define PMU_VERSION_MASK         (((1 << PMU_VERSION_BITS) - 1) << PMU_VERSION_SHIFT)
+
+#define PMU_GENERAL_NR_SHIFT     8  /* Number of general pmu registers */
+#define PMU_GENERAL_NR_BITS      8  /* 8 bits 8..15 */
+#define PMU_GENERAL_NR_MASK      (((1 << PMU_GENERAL_NR_BITS) - 1) << PMU_GENERAL_NR_SHIFT)
+
+#define PMU_GENERAL_WIDTH_SHIFT 16  /* Width of general pmu registers */
+#define PMU_GENERAL_WIDTH_BITS   8  /* 8 bits 16..23 */
+#define PMU_GENERAL_WIDTH_MASK  (((1 << PMU_GENERAL_WIDTH_BITS) - 1) << PMU_GENERAL_WIDTH_SHIFT)
+/* Register edx */
+#define PMU_FIXED_NR_SHIFT       0  /* Number of fixed pmu registers */
+#define PMU_FIXED_NR_BITS        5  /* 5 bits 0..4 */
+#define PMU_FIXED_NR_MASK        (((1 << PMU_FIXED_NR_BITS) -1) << PMU_FIXED_NR_SHIFT)
+
+#define PMU_FIXED_WIDTH_SHIFT    5  /* Width of fixed pmu registers */
+#define PMU_FIXED_WIDTH_BITS     8  /* 8 bits 5..12 */
+#define PMU_FIXED_WIDTH_MASK     (((1 << PMU_FIXED_WIDTH_BITS) -1) << PMU_FIXED_WIDTH_SHIFT)
+
+/* Alias registers (0x4c1) for full-width writes to PMCs */
+#define MSR_PMC_ALIAS_MASK       (~(MSR_IA32_PERFCTR0 ^ MSR_IA32_A_PERFCTR0))
+static bool_t __read_mostly full_width_write;
+
+/* Intel-specific VPMU features */
+#define VPMU_CPU_HAS_DS                     0x100 /* Has Debug Store */
+#define VPMU_CPU_HAS_BTS                    0x200 /* Has Branch Trace Store */
+
+/*
+ * MSR_CORE_PERF_FIXED_CTR_CTRL contains the configuration of all fixed
+ * counters. 4 bits for every counter.
+ */
+#define FIXED_CTR_CTRL_BITS 4
+#define FIXED_CTR_CTRL_MASK ((1 << FIXED_CTR_CTRL_BITS) - 1)
+
+/* Number of general-purpose and fixed performance counters */
+static unsigned int __read_mostly arch_pmc_cnt, fixed_pmc_cnt;
+
+/*
+ * QUIRK to work around an issue on various family 6 CPUs.
+ * The issue leads to endless PMC interrupt loops on the processor.
+ * If the interrupt handler is running and a PMC reaches the value 0, this
+ * value remains there forever and immediately triggers a new interrupt after
+ * the handler finishes.
+ * The workaround is to read all flagged counters and, if a counter's value
+ * is 0, to write 1 (or any other non-zero value) into it.
+ * No erratum exists and the real cause of this behaviour is unknown.
+ */
+bool_t __read_mostly is_pmc_quirk;
+
+static void check_pmc_quirk(void)
+{
+    if ( current_cpu_data.x86 == 6 )
+        is_pmc_quirk = 1;
+    else
+        is_pmc_quirk = 0;
+}
+
+static void handle_pmc_quirk(u64 msr_content)
+{
+    int i;
+    u64 val;
+
+    if ( !is_pmc_quirk )
+        return;
+
+    val = msr_content;
+    for ( i = 0; i < arch_pmc_cnt; i++ )
+    {
+        if ( val & 0x1 )
+        {
+            u64 cnt;
+            rdmsrl(MSR_P6_PERFCTR0 + i, cnt);
+            if ( cnt == 0 )
+                wrmsrl(MSR_P6_PERFCTR0 + i, 1);
+        }
+        val >>= 1;
+    }
+    val = msr_content >> 32;
+    for ( i = 0; i < fixed_pmc_cnt; i++ )
+    {
+        if ( val & 0x1 )
+        {
+            u64 cnt;
+            rdmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, cnt);
+            if ( cnt == 0 )
+                wrmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, 1);
+        }
+        val >>= 1;
+    }
+}
+
+/*
+ * Read the number of general counters via CPUID.EAX[0xa].EAX[8..15]
+ */
+static int core2_get_arch_pmc_count(void)
+{
+    u32 eax;
+
+    eax = cpuid_eax(0xa);
+    return ( (eax & PMU_GENERAL_NR_MASK) >> PMU_GENERAL_NR_SHIFT );
+}
+
+/*
+ * Read the number of fixed counters via CPUID.EDX[0xa].EDX[0..4]
+ */
+static int core2_get_fixed_pmc_count(void)
+{
+    u32 edx;
+
+    edx = cpuid_edx(0xa);
+    return ( (edx & PMU_FIXED_NR_MASK) >> PMU_FIXED_NR_SHIFT );
+}
+
+/* edx bits 5-12: Bit width of fixed-function performance counters  */
+static int core2_get_bitwidth_fix_count(void)
+{
+    u32 edx;
+
+    edx = cpuid_edx(0xa);
+    return ( (edx & PMU_FIXED_WIDTH_MASK) >> PMU_FIXED_WIDTH_SHIFT );
+}
+
+static int is_core2_vpmu_msr(u32 msr_index, int *type, int *index)
+{
+    int i;
+    u32 msr_index_pmc;
+
+    for ( i = 0; i < fixed_pmc_cnt; i++ )
+    {
+        if ( msr_index == MSR_CORE_PERF_FIXED_CTR0 + i )
+        {
+            *type = MSR_TYPE_COUNTER;
+            *index = i;
+            return 1;
+        }
+    }
+
+    if ( (msr_index == MSR_CORE_PERF_FIXED_CTR_CTRL ) ||
+        (msr_index == MSR_IA32_DS_AREA) ||
+        (msr_index == MSR_IA32_PEBS_ENABLE) )
+    {
+        *type = MSR_TYPE_CTRL;
+        return 1;
+    }
+
+    if ( (msr_index == MSR_CORE_PERF_GLOBAL_CTRL) ||
+         (msr_index == MSR_CORE_PERF_GLOBAL_STATUS) ||
+         (msr_index == MSR_CORE_PERF_GLOBAL_OVF_CTRL) )
+    {
+        *type = MSR_TYPE_GLOBAL;
+        return 1;
+    }
+
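+    /* Fold the full-width alias MSRs onto the architectural PMC range */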
+    msr_index_pmc = msr_index & MSR_PMC_ALIAS_MASK;
+    if ( (msr_index_pmc >= MSR_IA32_PERFCTR0) &&
+         (msr_index_pmc < (MSR_IA32_PERFCTR0 + arch_pmc_cnt)) )
+    {
+        *type = MSR_TYPE_ARCH_COUNTER;
+        *index = msr_index_pmc - MSR_IA32_PERFCTR0;
+        return 1;
+    }
+
+    if ( (msr_index >= MSR_P6_EVNTSEL0) &&
+         (msr_index < (MSR_P6_EVNTSEL0 + arch_pmc_cnt)) )
+    {
+        *type = MSR_TYPE_ARCH_CTRL;
+        *index = msr_index - MSR_P6_EVNTSEL0;
+        return 1;
+    }
+
+    return 0;
+}
+
+#define msraddr_to_bitpos(x) (((x)&0xffff) + ((x)>>31)*0x2000)
+static void core2_vpmu_set_msr_bitmap(unsigned long *msr_bitmap)
+{
+    int i;
+
+    /* Allow Read/Write PMU Counters MSR Directly. */
+    for ( i = 0; i < fixed_pmc_cnt; i++ )
+    {
+        clear_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR0 + i), msr_bitmap);
+        clear_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR0 + i),
+                  msr_bitmap + 0x800/BYTES_PER_LONG);
+    }
+    for ( i = 0; i < arch_pmc_cnt; i++ )
+    {
+        clear_bit(msraddr_to_bitpos(MSR_IA32_PERFCTR0+i), msr_bitmap);
+        clear_bit(msraddr_to_bitpos(MSR_IA32_PERFCTR0+i),
+                  msr_bitmap + 0x800/BYTES_PER_LONG);
+
+        if ( full_width_write )
+        {
+            clear_bit(msraddr_to_bitpos(MSR_IA32_A_PERFCTR0 + i), msr_bitmap);
+            clear_bit(msraddr_to_bitpos(MSR_IA32_A_PERFCTR0 + i),
+                      msr_bitmap + 0x800/BYTES_PER_LONG);
+        }
+    }
+
+    /* Allow Read PMU Non-global Controls Directly. */
+    for ( i = 0; i < arch_pmc_cnt; i++ )
+         clear_bit(msraddr_to_bitpos(MSR_P6_EVNTSEL0 + i), msr_bitmap);
+
+    clear_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR_CTRL), msr_bitmap);
+    clear_bit(msraddr_to_bitpos(MSR_IA32_PEBS_ENABLE), msr_bitmap);
+    clear_bit(msraddr_to_bitpos(MSR_IA32_DS_AREA), msr_bitmap);
+}
+
+static void core2_vpmu_unset_msr_bitmap(unsigned long *msr_bitmap)
+{
+    int i;
+
+    for ( i = 0; i < fixed_pmc_cnt; i++ )
+    {
+        set_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR0 + i), msr_bitmap);
+        set_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR0 + i),
+                msr_bitmap + 0x800/BYTES_PER_LONG);
+    }
+    for ( i = 0; i < arch_pmc_cnt; i++ )
+    {
+        set_bit(msraddr_to_bitpos(MSR_IA32_PERFCTR0 + i), msr_bitmap);
+        set_bit(msraddr_to_bitpos(MSR_IA32_PERFCTR0 + i),
+                msr_bitmap + 0x800/BYTES_PER_LONG);
+
+        if ( full_width_write )
+        {
+            set_bit(msraddr_to_bitpos(MSR_IA32_A_PERFCTR0 + i), msr_bitmap);
+            set_bit(msraddr_to_bitpos(MSR_IA32_A_PERFCTR0 + i),
+                      msr_bitmap + 0x800/BYTES_PER_LONG);
+        }
+    }
+
+    for ( i = 0; i < arch_pmc_cnt; i++ )
+        set_bit(msraddr_to_bitpos(MSR_P6_EVNTSEL0 + i), msr_bitmap);
+
+    set_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR_CTRL), msr_bitmap);
+    set_bit(msraddr_to_bitpos(MSR_IA32_PEBS_ENABLE), msr_bitmap);
+    set_bit(msraddr_to_bitpos(MSR_IA32_DS_AREA), msr_bitmap);
+}
+
+static inline void __core2_vpmu_save(struct vcpu *v)
+{
+    int i;
+    struct xen_pmu_intel_ctxt *core2_vpmu_cxt = vcpu_vpmu(v)->context;
+    uint64_t *fixed_counters = vpmu_reg_pointer(core2_vpmu_cxt, fixed_counters);
+    struct xen_pmu_cntr_pair *xen_pmu_cntr_pair =
+        vpmu_reg_pointer(core2_vpmu_cxt, arch_counters);
+
+    for ( i = 0; i < fixed_pmc_cnt; i++ )
+        rdmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, fixed_counters[i]);
+    for ( i = 0; i < arch_pmc_cnt; i++ )
+        rdmsrl(MSR_IA32_PERFCTR0 + i, xen_pmu_cntr_pair[i].counter);
+
+    if ( is_pv_domain(v->domain) )
+        rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, core2_vpmu_cxt->global_status);
+}
+
+static int core2_vpmu_save(struct vcpu *v)
+{
+    struct vpmu_struct *vpmu = vcpu_vpmu(v);
+
+    if ( !vpmu_is_set_all(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED) )
+        return 0;
+
+    if ( is_pv_domain(v->domain) )
+        wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0);
+
+    __core2_vpmu_save(v);
+
+    /* Unset PMU MSR bitmap to trap lazy load. */
+    if ( !vpmu_is_set(vpmu, VPMU_RUNNING) && cpu_has_vmx_msr_bitmap
+        && !is_pv_domain(v->domain) )
+        core2_vpmu_unset_msr_bitmap(v->arch.hvm_vmx.msr_bitmap);
+
+    return 1;
+}
+
+static inline void __core2_vpmu_load(struct vcpu *v)
+{
+    unsigned int i, pmc_start;
+    struct xen_pmu_intel_ctxt *core2_vpmu_cxt = vcpu_vpmu(v)->context;
+    uint64_t *fixed_counters = vpmu_reg_pointer(core2_vpmu_cxt, fixed_counters);
+    struct xen_pmu_cntr_pair *xen_pmu_cntr_pair =
+        vpmu_reg_pointer(core2_vpmu_cxt, arch_counters);
+
+    for ( i = 0; i < fixed_pmc_cnt; i++ )
+        wrmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, fixed_counters[i]);
+
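+    /* Use the full-width counter aliases when the CPU supports them */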
+    if ( full_width_write )
+        pmc_start = MSR_IA32_A_PERFCTR0;
+    else
+        pmc_start = MSR_IA32_PERFCTR0;
+    for ( i = 0; i < arch_pmc_cnt; i++ )
+    {
+        wrmsrl(pmc_start + i, xen_pmu_cntr_pair[i].counter);
+        wrmsrl(MSR_P6_EVNTSEL0 + i, xen_pmu_cntr_pair[i].control);
+    }
+
+    wrmsrl(MSR_CORE_PERF_FIXED_CTR_CTRL, core2_vpmu_cxt->fixed_ctrl);
+    wrmsrl(MSR_IA32_DS_AREA, core2_vpmu_cxt->ds_area);
+    wrmsrl(MSR_IA32_PEBS_ENABLE, core2_vpmu_cxt->pebs_enable);
+
+    if ( is_pv_domain(v->domain) )
+    {
+        wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, core2_vpmu_cxt->global_ovf_ctrl);
+        core2_vpmu_cxt->global_ovf_ctrl = 0;
+        wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, core2_vpmu_cxt->global_ctrl);
+    }
+}
+
+static void core2_vpmu_load(struct vcpu *v)
+{
+    struct vpmu_struct *vpmu = vcpu_vpmu(v);
+
+    if ( vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) )
+        return;
+
+    __core2_vpmu_load(v);
+}
+
+static int core2_vpmu_alloc_resource(struct vcpu *v)
+{
+    struct vpmu_struct *vpmu = vcpu_vpmu(v);
+    struct xen_pmu_intel_ctxt *core2_vpmu_cxt;
+
+    if ( !is_pv_domain(v->domain) )
+    {
+        if ( !acquire_pmu_ownership(PMU_OWNER_HVM) )
+            return 0;
+
+        wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0);
+        if ( vmx_add_host_load_msr(MSR_CORE_PERF_GLOBAL_CTRL) )
+            goto out_err;
+
+        if ( vmx_add_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL) )
+            goto out_err;
+        vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, 0);
+
+        core2_vpmu_cxt = xzalloc_bytes(sizeof(struct xen_pmu_intel_ctxt) +
+                                       sizeof(uint64_t) * fixed_pmc_cnt +
+                                       sizeof(struct xen_pmu_cntr_pair) * arch_pmc_cnt);
+        if ( !core2_vpmu_cxt )
+            goto out_err;
+    }
+    else
+    {
+        core2_vpmu_cxt = &v->arch.vpmu.xenpmu_data->pmu.c.intel;
+        vpmu_set(vpmu, VPMU_CONTEXT_ALLOCATED);
+    }
+
+    core2_vpmu_cxt->fixed_counters = sizeof(struct xen_pmu_intel_ctxt);
+    core2_vpmu_cxt->arch_counters = core2_vpmu_cxt->fixed_counters +
+      sizeof(uint64_t) * fixed_pmc_cnt;
+
+    vpmu->context = (void *)core2_vpmu_cxt;
+
+    vpmu_set(vpmu, VPMU_CONTEXT_ALLOCATED);
+
+    return 1;
+
+out_err:
+    vmx_rm_host_load_msr(MSR_CORE_PERF_GLOBAL_CTRL);
+    vmx_rm_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL);
+    release_pmu_ownship(PMU_OWNER_HVM);
+
+    printk("Failed to allocate VPMU resources for domain %u vcpu %u\n",
+           v->domain->domain_id, v->vcpu_id);
+
+    return 0;
+}
+
+static int core2_vpmu_msr_common_check(u32 msr_index, int *type, int *index)
+{
+    struct vpmu_struct *vpmu = vcpu_vpmu(current);
+
+    if ( !is_core2_vpmu_msr(msr_index, type, index) )
+        return 0;
+
+    if ( unlikely(!vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED)) &&
+         !core2_vpmu_alloc_resource(current) )
+        return 0;
+
+    /* Do the lazy load stuff. */
+    if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) )
+    {
+        __core2_vpmu_load(current);
+        vpmu_set(vpmu, VPMU_CONTEXT_LOADED);
+        if ( cpu_has_vmx_msr_bitmap && !is_pv_domain(current->domain) )
+            core2_vpmu_set_msr_bitmap(current->arch.hvm_vmx.msr_bitmap);
+    }
+    return 1;
+}
+
+static void inject_trap(struct vcpu *v, unsigned int trapno)
+{
+    if ( !is_pv_domain(v->domain) )
+        hvm_inject_hw_exception(trapno, 0);
+    else
+        send_guest_trap(v->domain, v->vcpu_id, trapno);
+}
+
+static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
+{
+    u64 global_ctrl, non_global_ctrl;
+    unsigned pmu_enable = 0;
+    int i, tmp;
+    int type = -1, index = -1;
+    struct vcpu *v = current;
+    struct vpmu_struct *vpmu = vcpu_vpmu(v);
+    struct xen_pmu_intel_ctxt *core2_vpmu_cxt = NULL;
+
+    if ( !core2_vpmu_msr_common_check(msr, &type, &index) )
+    {
+        /* Special handling for BTS */
+        if ( msr == MSR_IA32_DEBUGCTLMSR )
+        {
+            uint64_t supported = IA32_DEBUGCTLMSR_TR | IA32_DEBUGCTLMSR_BTS |
+                                 IA32_DEBUGCTLMSR_BTINT;
+
+            if ( cpu_has(&current_cpu_data, X86_FEATURE_DSCPL) )
+                supported |= IA32_DEBUGCTLMSR_BTS_OFF_OS |
+                             IA32_DEBUGCTLMSR_BTS_OFF_USR;
+            if ( msr_content & supported )
+            {
+                if ( vpmu_is_set(vpmu, VPMU_CPU_HAS_BTS) )
+                    return 1;
+                gdprintk(XENLOG_WARNING, "Debug Store is not supported on this cpu\n");
+                inject_trap(v, TRAP_gp_fault);
+                return 0;
+            }
+        }
+        return 0;
+    }
+
+    core2_vpmu_cxt = vpmu->context;
+    switch ( msr )
+    {
+    case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
+        core2_vpmu_cxt->global_status &= ~msr_content;
+        wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, msr_content);
+        return 1;
+    case MSR_CORE_PERF_GLOBAL_STATUS:
+        gdprintk(XENLOG_INFO, "Can not write readonly MSR: "
+                 "MSR_PERF_GLOBAL_STATUS(0x38E)!\n");
+        inject_trap(v, TRAP_gp_fault);
+        return 1;
+    case MSR_IA32_PEBS_ENABLE:
+        if ( msr_content & 1 )
+            gdprintk(XENLOG_WARNING, "Guest is trying to enable PEBS, "
+                     "which is not supported.\n");
+        core2_vpmu_cxt->pebs_enable = msr_content;
+        return 1;
+    case MSR_IA32_DS_AREA:
+        if ( vpmu_is_set(vpmu, VPMU_CPU_HAS_DS) )
+        {
+            if ( !is_canonical_address(msr_content) )
+            {
+                gdprintk(XENLOG_WARNING,
+                         "Illegal address for IA32_DS_AREA: %#" PRIx64 "x\n",
+                         msr_content);
+                inject_trap(v, TRAP_gp_fault);
+                return 1;
+            }
+            core2_vpmu_cxt->ds_area = msr_content;
+            break;
+        }
+        gdprintk(XENLOG_WARNING, "Guest setting of DTS is ignored.\n");
+        return 1;
+    case MSR_CORE_PERF_GLOBAL_CTRL:
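+        /* Count counters enabled in both the global and per-event controls */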
+        global_ctrl = msr_content;
+        for ( i = 0; i < arch_pmc_cnt; i++ )
+        {
+            rdmsrl(MSR_P6_EVNTSEL0+i, non_global_ctrl);
+            pmu_enable += global_ctrl & (non_global_ctrl >> 22) & 1;
+            global_ctrl >>= 1;
+        }
+
+        rdmsrl(MSR_CORE_PERF_FIXED_CTR_CTRL, non_global_ctrl);
+        global_ctrl = msr_content >> 32;
+        for ( i = 0; i < fixed_pmc_cnt; i++ )
+        {
+            pmu_enable += (global_ctrl & 1) & ((non_global_ctrl & 0x3)? 1 : 0);
+            non_global_ctrl >>= FIXED_CTR_CTRL_BITS;
+            global_ctrl >>= 1;
+        }
+        core2_vpmu_cxt->global_ctrl = msr_content;
+        break;
+    case MSR_CORE_PERF_FIXED_CTR_CTRL:
+        non_global_ctrl = msr_content;
+        if ( !is_pv_domain(v->domain) )
+            vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, &global_ctrl);
+        else
+            rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, global_ctrl);
+        global_ctrl >>= 32;
+        for ( i = 0; i < fixed_pmc_cnt; i++ )
+        {
+            pmu_enable += (global_ctrl & 1) & ((non_global_ctrl & 0x3)? 1 : 0);
+            non_global_ctrl >>= 4;
+            global_ctrl >>= 1;
+        }
+        core2_vpmu_cxt->fixed_ctrl = msr_content;
+        break;
+    default:
+        tmp = msr - MSR_P6_EVNTSEL0;
+        if ( tmp >= 0 && tmp < arch_pmc_cnt )
+        {
+            struct xen_pmu_cntr_pair *xen_pmu_cntr_pair =
+                vpmu_reg_pointer(core2_vpmu_cxt, arch_counters);
+
+            if ( !is_pv_domain(v->domain) )
+                vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, &global_ctrl);
+            else
+                rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, global_ctrl);
+            xen_pmu_cntr_pair[tmp].control = msr_content;
+            for ( i = 0; i < arch_pmc_cnt && !pmu_enable; i++ )
+                pmu_enable += (global_ctrl >> i) &
+                    (xen_pmu_cntr_pair[i].control >> 22) & 1;
+        }
+    }
+
+    pmu_enable += (core2_vpmu_cxt->ds_area != 0);
+    if ( pmu_enable )
+        vpmu_set(vpmu, VPMU_RUNNING);
+    else
+        vpmu_reset(vpmu, VPMU_RUNNING);
+
+    if ( type != MSR_TYPE_GLOBAL )
+    {
+        u64 mask;
+        int inject_gp = 0;
+        switch ( type )
+        {
+        case MSR_TYPE_ARCH_CTRL:      /* MSR_P6_EVNTSEL[0,...] */
+            mask = ~((1ull << 32) - 1);
+            if (msr_content & mask)
+                inject_gp = 1;
+            break;
+        case MSR_TYPE_CTRL:           /* IA32_FIXED_CTR_CTRL */
+            if  ( msr == MSR_IA32_DS_AREA )
+                break;
+            /* 4 bits per counter, currently 3 fixed counters implemented. */
+            mask = ~((1ull << (fixed_pmc_cnt * FIXED_CTR_CTRL_BITS)) - 1);
+            if (msr_content & mask)
+                inject_gp = 1;
+            break;
+        case MSR_TYPE_COUNTER:        /* IA32_FIXED_CTR[0-2] */
+            mask = ~((1ull << core2_get_bitwidth_fix_count()) - 1);
+            if (msr_content & mask)
+                inject_gp = 1;
+            break;
+        }
+
+        if (inject_gp) 
+            inject_trap(v, TRAP_gp_fault);
+        else
+            wrmsrl(msr, msr_content);
+    }
+    else
+    {
+       if ( !is_pv_domain(v->domain) )
+           vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, msr_content);
+       else
+           wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, msr_content);
+    }
+
+    return 1;
+}
+
+static int core2_vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
+{
+    int type = -1, index = -1;
+    struct vcpu *v = current;
+    struct vpmu_struct *vpmu = vcpu_vpmu(v);
+    struct xen_pmu_intel_ctxt *core2_vpmu_cxt = NULL;
+
+    if ( core2_vpmu_msr_common_check(msr, &type, &index) )
+    {
+        core2_vpmu_cxt = vpmu->context;
+        switch ( msr )
+        {
+        case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
+            *msr_content = 0;
+            break;
+        case MSR_CORE_PERF_GLOBAL_STATUS:
+            *msr_content = core2_vpmu_cxt->global_status;
+            break;
+        case MSR_CORE_PERF_GLOBAL_CTRL:
+            if ( !is_pv_domain(v->domain) )
+                vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, msr_content);
+            else
+                rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, *msr_content);
+            break;
+        default:
+            rdmsrl(msr, *msr_content);
+        }
+    }
+    else
+    {
+        /* Extension for BTS */
+        if ( msr == MSR_IA32_MISC_ENABLE )
+        {
+            if ( vpmu_is_set(vpmu, VPMU_CPU_HAS_BTS) )
+                *msr_content &= ~MSR_IA32_MISC_ENABLE_BTS_UNAVAIL;
+        }
+        else
+            return 0;
+    }
+
+    return 1;
+}
+
+static void core2_vpmu_do_cpuid(unsigned int input,
+                                unsigned int *eax, unsigned int *ebx,
+                                unsigned int *ecx, unsigned int *edx)
+{
+    if (input == 0x1)
+    {
+        struct vpmu_struct *vpmu = vcpu_vpmu(current);
+
+        if ( vpmu_is_set(vpmu, VPMU_CPU_HAS_DS) )
+        {
+            /* Switch on the 'Debug Store' feature in CPUID.EAX[1]:EDX[21] */
+            *edx |= cpufeat_mask(X86_FEATURE_DS);
+            if ( cpu_has(&current_cpu_data, X86_FEATURE_DTES64) )
+                *ecx |= cpufeat_mask(X86_FEATURE_DTES64);
+            if ( cpu_has(&current_cpu_data, X86_FEATURE_DSCPL) )
+                *ecx |= cpufeat_mask(X86_FEATURE_DSCPL);
+        }
+    }
+}
+
+/* Dump vpmu info on console, called in the context of keyhandler 'q'. */
+static void core2_vpmu_dump(const struct vcpu *v)
+{
+    const struct vpmu_struct *vpmu = vcpu_vpmu(v);
+    int i;
+    const struct xen_pmu_intel_ctxt *core2_vpmu_cxt = NULL;
+    u64 val;
+    uint64_t *fixed_counters = vpmu_reg_pointer(core2_vpmu_cxt, fixed_counters);
+    struct xen_pmu_cntr_pair *xen_pmu_cntr_pair =
+        vpmu_reg_pointer(core2_vpmu_cxt, arch_counters);
+
+    if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
+         return;
+
+    if ( !vpmu_is_set(vpmu, VPMU_RUNNING) )
+    {
+        if ( vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) )
+            printk("    vPMU loaded\n");
+        else
+            printk("    vPMU allocated\n");
+        return;
+    }
+
+    printk("    vPMU running\n");
+    core2_vpmu_cxt = vpmu->context;
+
+    /* Print the contents of the counter and its configuration msr. */
+    for ( i = 0; i < arch_pmc_cnt; i++ )
+        printk("      general_%d: 0x%016lx ctrl: 0x%016lx\n",
+            i, xen_pmu_cntr_pair[i].counter, xen_pmu_cntr_pair[i].control);
+
+    /*
+     * The configuration of the fixed counter is 4 bits each in the
+     * MSR_CORE_PERF_FIXED_CTR_CTRL.
+     */
+    val = core2_vpmu_cxt->fixed_ctrl;
+    for ( i = 0; i < fixed_pmc_cnt; i++ )
+    {
+        printk("      fixed_%d:   0x%016lx ctrl: %#lx\n",
+               i, fixed_counters[i],
+               val & FIXED_CTR_CTRL_MASK);
+        val >>= FIXED_CTR_CTRL_BITS;
+    }
+}
+
+static int core2_vpmu_do_interrupt(struct cpu_user_regs *regs)
+{
+    struct vcpu *v = current;
+    u64 msr_content;
+    struct vpmu_struct *vpmu = vcpu_vpmu(v);
+    struct xen_pmu_intel_ctxt *core2_vpmu_cxt = vpmu->context;
+
+    rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, msr_content);
+    if ( msr_content )
+    {
+        if ( is_pmc_quirk )
+            handle_pmc_quirk(msr_content);
+        core2_vpmu_cxt->global_status |= msr_content;
+        msr_content = 0xC000000700000000 | ((1 << arch_pmc_cnt) - 1);
+        wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, msr_content);
+    }
+    else
+    {
+        /* No PMC overflow but perhaps a Trace Message interrupt. */
+        __vmread(GUEST_IA32_DEBUGCTL, &msr_content);
+        if ( !(msr_content & IA32_DEBUGCTLMSR_TR) )
+            return 0;
+    }
+
+    return 1;
+}
+
+static int core2_vpmu_initialise(struct vcpu *v, unsigned int vpmu_flags)
+{
+    struct vpmu_struct *vpmu = vcpu_vpmu(v);
+    u64 msr_content;
+    struct cpuinfo_x86 *c = &current_cpu_data;
+
+    if ( !(vpmu_flags & (XENPMU_FEATURE_INTEL_BTS << XENPMU_FEATURE_SHIFT)) )
+        goto func_out;
+    /* Check the 'Debug Store' feature in the CPUID.EAX[1]:EDX[21] */
+    if ( cpu_has(c, X86_FEATURE_DS) )
+    {
+        if ( !cpu_has(c, X86_FEATURE_DTES64) )
+        {
+            printk(XENLOG_G_WARNING "CPU doesn't support 64-bit DS Area"
+                   " - Debug Store disabled for d%d:v%d\n",
+                   v->domain->domain_id, v->vcpu_id);
+            goto func_out;
+        }
+        vpmu_set(vpmu, VPMU_CPU_HAS_DS);
+        rdmsrl(MSR_IA32_MISC_ENABLE, msr_content);
+        if ( msr_content & MSR_IA32_MISC_ENABLE_BTS_UNAVAIL )
+        {
+            /* If BTS_UNAVAIL is set reset the DS feature. */
+            vpmu_reset(vpmu, VPMU_CPU_HAS_DS);
+            printk(XENLOG_G_WARNING "CPU has set BTS_UNAVAIL"
+                   " - Debug Store disabled for d%d:v%d\n",
+                   v->domain->domain_id, v->vcpu_id);
+        }
+        else
+        {
+            vpmu_set(vpmu, VPMU_CPU_HAS_BTS);
+            if ( !cpu_has(c, X86_FEATURE_DSCPL) )
+                printk(XENLOG_G_INFO
+                       "vpmu: CPU doesn't support CPL-Qualified BTS\n");
+            printk("******************************************************\n");
+            printk("** WARNING: Emulation of BTS Feature is switched on **\n");
+            printk("** Using this processor feature in a virtualized    **\n");
+            printk("** environment is not 100%% safe.                    **\n");
+            printk("** Setting the DS buffer address with wrong values  **\n");
+            printk("** may lead to hypervisor hangs or crashes.         **\n");
+            printk("** It is NOT recommended for production use!        **\n");
+            printk("******************************************************\n");
+        }
+    }
+func_out:
+
+    arch_pmc_cnt = core2_get_arch_pmc_count();
+    fixed_pmc_cnt = core2_get_fixed_pmc_count();
+    check_pmc_quirk();
+
+    /* PV domains can allocate resources immediately */
+    if ( is_pv_domain(v->domain) && !core2_vpmu_alloc_resource(v) )
+            return 1;
+
+    return 0;
+}
+
+static void core2_vpmu_destroy(struct vcpu *v)
+{
+    struct vpmu_struct *vpmu = vcpu_vpmu(v);
+
+    if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
+        return;
+
+    if ( !is_pv_domain(v->domain) )
+    {
+        xfree(vpmu->context);
+        if ( cpu_has_vmx_msr_bitmap )
+            core2_vpmu_unset_msr_bitmap(v->arch.hvm_vmx.msr_bitmap);
+    }
+
+    release_pmu_ownship(PMU_OWNER_HVM);
+    vpmu_clear(vpmu);
+}
+
+struct arch_vpmu_ops core2_vpmu_ops = {
+    .do_wrmsr = core2_vpmu_do_wrmsr,
+    .do_rdmsr = core2_vpmu_do_rdmsr,
+    .do_interrupt = core2_vpmu_do_interrupt,
+    .do_cpuid = core2_vpmu_do_cpuid,
+    .arch_vpmu_destroy = core2_vpmu_destroy,
+    .arch_vpmu_save = core2_vpmu_save,
+    .arch_vpmu_load = core2_vpmu_load,
+    .arch_vpmu_dump = core2_vpmu_dump
+};
+
+static void core2_no_vpmu_do_cpuid(unsigned int input,
+                                unsigned int *eax, unsigned int *ebx,
+                                unsigned int *ecx, unsigned int *edx)
+{
+    /*
+     * As in this case the vpmu is not enabled reset some bits in the
+     * architectural performance monitoring related part.
+     */
+    if ( input == 0xa )
+    {
+        *eax &= ~PMU_VERSION_MASK;
+        *eax &= ~PMU_GENERAL_NR_MASK;
+        *eax &= ~PMU_GENERAL_WIDTH_MASK;
+
+        *edx &= ~PMU_FIXED_NR_MASK;
+        *edx &= ~PMU_FIXED_WIDTH_MASK;
+    }
+}
+
+/*
+ * If its a vpmu msr set it to 0.
+ */
+static int core2_no_vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
+{
+    int type = -1, index = -1;
+    if ( !is_core2_vpmu_msr(msr, &type, &index) )
+        return 0;
+    *msr_content = 0;
+    return 1;
+}
+
+/*
+ * These functions are used in case vpmu is not enabled.
+ */
+struct arch_vpmu_ops core2_no_vpmu_ops = {
+    .do_rdmsr = core2_no_vpmu_do_rdmsr,
+    .do_cpuid = core2_no_vpmu_do_cpuid,
+};
+
+int vmx_vpmu_initialise(struct vcpu *v, unsigned int vpmu_flags)
+{
+    struct vpmu_struct *vpmu = vcpu_vpmu(v);
+    uint8_t family = current_cpu_data.x86;
+    uint8_t cpu_model = current_cpu_data.x86_model;
+    int ret = 0;
+
+    vpmu->arch_vpmu_ops = &core2_no_vpmu_ops;
+    if ( vpmu_flags == XENPMU_MODE_OFF )
+        return 0;
+
+    if ( family == 6 )
+    {
+        u64 caps;
+
+        rdmsrl(MSR_IA32_PERF_CAPABILITIES, caps);
+        full_width_write = (caps >> 13) & 1;
+
+        switch ( cpu_model )
+        {
+        /* Core2: */
+        case 0x0f: /* original 65 nm celeron/pentium/core2/xeon, "Merom"/"Conroe" */
+        case 0x16: /* single-core 65 nm celeron/core2solo "Merom-L"/"Conroe-L" */
+        case 0x17: /* 45 nm celeron/core2/xeon "Penryn"/"Wolfdale" */
+        case 0x1d: /* six-core 45 nm xeon "Dunnington" */
+
+        case 0x2a: /* SandyBridge */
+        case 0x2d: /* SandyBridge, "Romley-EP" */
+
+        /* Nehalem: */
+        case 0x1a: /* 45 nm nehalem, "Bloomfield" */
+        case 0x1e: /* 45 nm nehalem, "Lynnfield", "Clarksfield", "Jasper Forest" */
+        case 0x2e: /* 45 nm nehalem-ex, "Beckton" */
+
+        /* Westmere: */
+        case 0x25: /* 32 nm nehalem, "Clarkdale", "Arrandale" */
+        case 0x2c: /* 32 nm nehalem, "Gulftown", "Westmere-EP" */
+        case 0x27: /* 32 nm Westmere-EX */
+
+        case 0x3a: /* IvyBridge */
+        case 0x3e: /* IvyBridge EP */
+
+        /* Haswell: */
+        case 0x3c:
+        case 0x3f:
+        case 0x45:
+        case 0x46:
+            ret = core2_vpmu_initialise(v, vpmu_flags);
+            if ( !ret )
+                vpmu->arch_vpmu_ops = &core2_vpmu_ops;
+            return ret;
+        }
+    }
+
+    printk("VPMU: Initialization failed. "
+           "Intel processor family %d model %d has not "
+           "been supported\n", family, cpu_model);
+    return -EINVAL;
+}
+
diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index ed81cfb..d27df39 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -20,7 +20,7 @@
 #define __ASM_X86_HVM_VMX_VMCS_H__
 
 #include <asm/hvm/io.h>
-#include <asm/hvm/vpmu.h>
+#include <asm/vpmu.h>
 #include <irq_vectors.h>
 
 extern void vmcs_dump_vcpu(struct vcpu *v);
diff --git a/xen/include/asm-x86/hvm/vpmu.h b/xen/include/asm-x86/hvm/vpmu.h
deleted file mode 100644
index 29bb977..0000000
--- a/xen/include/asm-x86/hvm/vpmu.h
+++ /dev/null
@@ -1,98 +0,0 @@
-/*
- * vpmu.h: PMU virtualization for HVM domain.
- *
- * Copyright (c) 2007, Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along with
- * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
- * Place - Suite 330, Boston, MA 02111-1307 USA.
- *
- * Author: Haitao Shan <haitao.shan@intel.com>
- */
-
-#ifndef __ASM_X86_HVM_VPMU_H_
-#define __ASM_X86_HVM_VPMU_H_
-
-#include <public/xenpmu.h>
-
-#define vcpu_vpmu(vcpu)   (&((vcpu)->arch.vpmu))
-#define vpmu_vcpu(vpmu)   (container_of((vpmu), struct vcpu, arch.vpmu))
-
-#define MSR_TYPE_COUNTER            0
-#define MSR_TYPE_CTRL               1
-#define MSR_TYPE_GLOBAL             2
-#define MSR_TYPE_ARCH_COUNTER       3
-#define MSR_TYPE_ARCH_CTRL          4
-
-/* Start of PMU register bank */
-#define vpmu_reg_pointer(ctxt, offset) ((void *)((uintptr_t)ctxt + \
-                                                 (uintptr_t)ctxt->offset))
-
-/* Arch specific operations shared by all vpmus */
-struct arch_vpmu_ops {
-    int (*do_wrmsr)(unsigned int msr, uint64_t msr_content);
-    int (*do_rdmsr)(unsigned int msr, uint64_t *msr_content);
-    int (*do_interrupt)(struct cpu_user_regs *regs);
-    void (*do_cpuid)(unsigned int input,
-                     unsigned int *eax, unsigned int *ebx,
-                     unsigned int *ecx, unsigned int *edx);
-    void (*arch_vpmu_destroy)(struct vcpu *v);
-    int (*arch_vpmu_save)(struct vcpu *v);
-    void (*arch_vpmu_load)(struct vcpu *v);
-    void (*arch_vpmu_dump)(const struct vcpu *);
-};
-
-int vmx_vpmu_initialise(struct vcpu *, unsigned int flags);
-int svm_vpmu_initialise(struct vcpu *, unsigned int flags);
-
-struct vpmu_struct {
-    u32 flags;
-    u32 last_pcpu;
-    u32 hw_lapic_lvtpc;
-    void *context;
-    struct arch_vpmu_ops *arch_vpmu_ops;
-    xen_pmu_data_t *xenpmu_data;
-};
-
-/* VPMU states */
-#define VPMU_CONTEXT_ALLOCATED              0x1
-#define VPMU_CONTEXT_LOADED                 0x2
-#define VPMU_RUNNING                        0x4
-#define VPMU_CONTEXT_SAVE                   0x8   /* Force context save */
-#define VPMU_FROZEN                         0x10  /* Stop counters while VCPU is not running */
-#define VPMU_PASSIVE_DOMAIN_ALLOCATED       0x20
-
-#define vpmu_set(_vpmu, _x)         ((_vpmu)->flags |= (_x))
-#define vpmu_reset(_vpmu, _x)       ((_vpmu)->flags &= ~(_x))
-#define vpmu_is_set(_vpmu, _x)      ((_vpmu)->flags & (_x))
-#define vpmu_is_set_all(_vpmu, _x)  (((_vpmu)->flags & (_x)) == (_x))
-#define vpmu_clear(_vpmu)           ((_vpmu)->flags = 0)
-
-void vpmu_lvtpc_update(uint32_t val);
-int vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content);
-int vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content);
-int vpmu_do_interrupt(struct cpu_user_regs *regs);
-void vpmu_do_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx,
-                                       unsigned int *ecx, unsigned int *edx);
-void vpmu_initialise(struct vcpu *v);
-void vpmu_destroy(struct vcpu *v);
-void vpmu_save(struct vcpu *v);
-void vpmu_load(struct vcpu *v);
-void vpmu_dump(struct vcpu *v);
-
-extern int acquire_pmu_ownership(int pmu_ownership);
-extern void release_pmu_ownership(int pmu_ownership);
-
-extern uint32_t vpmu_mode;
-
-#endif /* __ASM_X86_HVM_VPMU_H_*/
-
diff --git a/xen/include/asm-x86/vpmu.h b/xen/include/asm-x86/vpmu.h
new file mode 100644
index 0000000..863be59
--- /dev/null
+++ b/xen/include/asm-x86/vpmu.h
@@ -0,0 +1,98 @@
+/*
+ * vpmu.h: PMU virtualization.
+ *
+ * Copyright (c) 2007, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
+ * Place - Suite 330, Boston, MA 02111-1307 USA.
+ *
+ * Author: Haitao Shan <haitao.shan@intel.com>
+ */
+
+#ifndef __ASM_X86_VPMU_H_
+#define __ASM_X86_VPMU_H_
+
+#include <public/xenpmu.h>
+
+#define vcpu_vpmu(vcpu)   (&((vcpu)->arch.vpmu))
+#define vpmu_vcpu(vpmu)   (container_of((vpmu), struct vcpu, arch.vpmu))
+
+#define MSR_TYPE_COUNTER            0
+#define MSR_TYPE_CTRL               1
+#define MSR_TYPE_GLOBAL             2
+#define MSR_TYPE_ARCH_COUNTER       3
+#define MSR_TYPE_ARCH_CTRL          4
+
+/* Start of PMU register bank */
+#define vpmu_reg_pointer(ctxt, offset) ((void *)((uintptr_t)ctxt + \
+                                                 (uintptr_t)ctxt->offset))
+
+/* Arch specific operations shared by all vpmus */
+struct arch_vpmu_ops {
+    int (*do_wrmsr)(unsigned int msr, uint64_t msr_content);
+    int (*do_rdmsr)(unsigned int msr, uint64_t *msr_content);
+    int (*do_interrupt)(struct cpu_user_regs *regs);
+    void (*do_cpuid)(unsigned int input,
+                     unsigned int *eax, unsigned int *ebx,
+                     unsigned int *ecx, unsigned int *edx);
+    void (*arch_vpmu_destroy)(struct vcpu *v);
+    int (*arch_vpmu_save)(struct vcpu *v);
+    void (*arch_vpmu_load)(struct vcpu *v);
+    void (*arch_vpmu_dump)(const struct vcpu *);
+};
+
+int vmx_vpmu_initialise(struct vcpu *, unsigned int flags);
+int svm_vpmu_initialise(struct vcpu *, unsigned int flags);
+
+struct vpmu_struct {
+    u32 flags;
+    u32 last_pcpu;
+    u32 hw_lapic_lvtpc;
+    void *context;
+    struct arch_vpmu_ops *arch_vpmu_ops;
+    xen_pmu_data_t *xenpmu_data;
+};
+
+/* VPMU states */
+#define VPMU_CONTEXT_ALLOCATED              0x1
+#define VPMU_CONTEXT_LOADED                 0x2
+#define VPMU_RUNNING                        0x4
+#define VPMU_CONTEXT_SAVE                   0x8   /* Force context save */
+#define VPMU_FROZEN                         0x10  /* Stop counters while VCPU is not running */
+#define VPMU_PASSIVE_DOMAIN_ALLOCATED       0x20
+
+#define vpmu_set(_vpmu, _x)         ((_vpmu)->flags |= (_x))
+#define vpmu_reset(_vpmu, _x)       ((_vpmu)->flags &= ~(_x))
+#define vpmu_is_set(_vpmu, _x)      ((_vpmu)->flags & (_x))
+#define vpmu_is_set_all(_vpmu, _x)  (((_vpmu)->flags & (_x)) == (_x))
+#define vpmu_clear(_vpmu)           ((_vpmu)->flags = 0)
+
+void vpmu_lvtpc_update(uint32_t val);
+int vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content);
+int vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content);
+int vpmu_do_interrupt(struct cpu_user_regs *regs);
+void vpmu_do_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx,
+                                       unsigned int *ecx, unsigned int *edx);
+void vpmu_initialise(struct vcpu *v);
+void vpmu_destroy(struct vcpu *v);
+void vpmu_save(struct vcpu *v);
+void vpmu_load(struct vcpu *v);
+void vpmu_dump(struct vcpu *v);
+
+extern int acquire_pmu_ownership(int pmu_ownership);
+extern void release_pmu_ownership(int pmu_ownership);
+
+extern uint32_t vpmu_mode;
+
+#endif /* __ASM_X86_VPMU_H_*/
+
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 01/17] common/symbols: Export hypervisor symbols to privileged guest
  2014-01-21 19:08 ` [PATCH v4 01/17] common/symbols: Export hypervisor symbols to privileged guest Boris Ostrovsky
@ 2014-01-24 14:16   ` Jan Beulich
  0 siblings, 0 replies; 48+ messages in thread
From: Jan Beulich @ 2014-01-24 14:16 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: keir, suravee.suthikulpanit, andrew.cooper3, eddie.dong,
	dietmar.hahn, xen-devel, jun.nakajima

>>> On 21.01.14 at 20:08, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote:
> @@ -601,6 +602,23 @@ ret_t do_platform_op(XEN_GUEST_HANDLE_PARAM(xen_platform_op_t) u_xenpf_op)
>      }
>      break;
>  
> +    case XENPF_get_symbol:
> +    {
> +        char name[XEN_KSYM_NAME_LEN + 1];
> +        XEN_GUEST_HANDLE_64(char) nameh;

Why _64?

> +
> +        guest_from_compat_handle(nameh, op->u.symdata.u.name);
> +
> +        ret = xensyms_read(&op->u.symdata.symnum, &op->u.symdata.type,
> +                           &op->u.symdata.address, name);
> +
> +        if ( !ret && copy_to_guest(nameh, name, XEN_KSYM_NAME_LEN + 1) )

Afaict symbols_expand_symbol() always zero terminates its
output, so I can't see why you're not properly using strlen() here.
The way you do it now you're leaking hypervisor stack contents
to the caller.
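
I.e. presumably just something like (sketch, same variables as in the
hunk above):

    if ( !ret && copy_to_guest(nameh, name, strlen(name) + 1) )
        ret = -EFAULT;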

> +int xensyms_read(uint32_t *symnum, uint32_t *type, uint64_t *address, char *name)
> +{
> +    if ( *symnum > symbols_num_syms )
> +        return -ERANGE;
> +    if ( *symnum == symbols_num_syms )
> +        return 0;
> +
> +    spin_lock(&symbols_mutex);
> +
> +    if ( *symnum == 0 )
> +        next_offset = next_symbol = 0;
> +    if ( next_symbol != *symnum )
> +        /* Non-sequential access */
> +        next_offset = get_symbol_offset(*symnum);
> +
> +    *type = symbols_get_symbol_type(next_offset);
> +    next_offset = symbols_expand_symbol(next_offset, name);
> +    *address = symbols_offsets[*symnum] + SYMBOLS_ORIGIN;
> +
> +    next_symbol = ++(*symnum);

Pointless parentheses.

> +#define XENPF_get_symbol   61
> +#define XEN_KSYM_NAME_LEN 127
> +struct xenpf_symdata {
> +    /* IN variables */
> +    uint32_t symnum;
> +
> +    /* OUT variables */
> +    uint32_t type;
> +    uint64_t address;
> +
> +    union {
> +        XEN_GUEST_HANDLE(char) name;
> +        uint64_t pad;
> +    } u;

Since you need to do translation anyway, I don't see what good
the padding field (and hence the union) here does.
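
I.e. (sketch) simply

    struct xenpf_symdata {
        /* IN variables */
        uint32_t symnum;

        /* OUT variables */
        uint32_t type;
        uint64_t address;

        XEN_GUEST_HANDLE(char) name;
    };

would seem to do.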

> --- a/xen/include/xen/symbols.h
> +++ b/xen/include/xen/symbols.h
> @@ -2,8 +2,8 @@
>  #define _XEN_SYMBOLS_H
>  
>  #include <xen/types.h>
> -
> -#define KSYM_NAME_LEN 127
> +#include <public/xen.h>

I don't think you really need this one.

> +#include <public/platform.h>
>  
>  /* Lookup an address. */
>  const char *symbols_lookup(unsigned long addr,
> @@ -11,4 +11,7 @@ const char *symbols_lookup(unsigned long addr,
>                             unsigned long *offset,
>                             char *namebuf);
>  
> +extern int xensyms_read(uint32_t *symnum, uint32_t *type,
> +                        uint64_t *address, char *name);
> +

Please be consistent at least within individual files: There's no
"extern" in the existing function declaration here, so there
shouldn't be one here.

Jan

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 03/17] x86/VPMU: Minor VPMU cleanup
  2014-01-21 19:08 ` [PATCH v4 03/17] x86/VPMU: Minor VPMU cleanup Boris Ostrovsky
@ 2014-01-24 14:28   ` Jan Beulich
  0 siblings, 0 replies; 48+ messages in thread
From: Jan Beulich @ 2014-01-24 14:28 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: keir, suravee.suthikulpanit, andrew.cooper3, eddie.dong,
	dietmar.hahn, xen-devel, jun.nakajima

>>> On 21.01.14 at 20:08, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote:
> --- a/xen/arch/x86/hvm/svm/vpmu.c
> +++ b/xen/arch/x86/hvm/svm/vpmu.c
> @@ -236,7 +236,8 @@ static int amd_vpmu_save(struct vcpu *v)
>  
>      context_save(v);
>  
> -    if ( !vpmu_is_set(vpmu, VPMU_RUNNING) && ctx->msr_bitmap_set )
> +    if ( !is_pv_domain(v->domain) && 

If I understand the intentions right, this is supposed to be
is_hvm_container_domain(). See the mail archives or talk to George D
if you want to learn about the intended distinction between the two.

Further this is inconsistent with the patch description saying "Make
sure that we only touch MSR bitmap on HVM guests", as that would
exclude PVH ones. With the three models in place now, you have to
be careful to not cause confusion by imprecise statements.
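
If the intention really is "HVM or PVH", that would presumably read
(sketch):

    if ( is_hvm_container_domain(v->domain) &&
         !vpmu_is_set(vpmu, VPMU_RUNNING) && ctx->msr_bitmap_set )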

Jan

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 07/17] x86/VPMU: Add public xenpmu.h
  2014-01-21 19:08 ` [PATCH v4 07/17] x86/VPMU: Add public xenpmu.h Boris Ostrovsky
@ 2014-01-24 14:54   ` Jan Beulich
  2014-01-24 16:49     ` Boris Ostrovsky
  0 siblings, 1 reply; 48+ messages in thread
From: Jan Beulich @ 2014-01-24 14:54 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: keir, suravee.suthikulpanit, andrew.cooper3, eddie.dong,
	dietmar.hahn, xen-devel, jun.nakajima

>>> On 21.01.14 at 20:08, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote:
> Add xenpmu.h header file,

To me, naming a public Xen header (other than the core one) xen*.h
is redundant. There's no information lost if you just called it pmu.h.

Also I think you ought to use plural here.

> --- /dev/null
> +++ b/xen/include/public/arch-x86/xenpmu.h
> @@ -0,0 +1,66 @@
> +#ifndef __XEN_PUBLIC_ARCH_X86_PMU_H__
> +#define __XEN_PUBLIC_ARCH_X86_PMU_H__
> +
> +/* x86-specific PMU definitions */
> +
> +#include "xen.h"

Why?

> +struct xen_pmu_intel_ctxt {
> +    uint64_t global_ctrl;
> +    uint64_t global_ovf_ctrl;
> +    uint64_t global_status;
> +    uint64_t fixed_ctrl;
> +    uint64_t ds_area;
> +    uint64_t pebs_enable;
> +    uint64_t debugctl;
> +    uint64_t fixed_counters;  /* Offset to fixed counter MSRs */
> +    uint64_t arch_counters;   /* Offset to architectural counter MSRs */

I think these last two could easily be uint32_t.

> +/* Shared between hypervisor and PV domain */
> +struct xen_pmu_data {
> +    uint32_t domain_id;
> +    uint32_t vcpu_id;
> +    uint32_t pcpu_id;
> +    uint32_t pmu_flags;
> +
> +    xen_arch_pmu_t pmu;
> +};

So if this got included by an architecture independent source file
on ARM, how would this build? You at least need a stub definition
there for xen_arch_pmu_t afaict (if already give the impression -
further up - that you're supporting ARM compilation of this header).

Jan

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 08/17] x86/VPMU: Make vpmu not HVM-specific
  2014-01-21 19:08 ` [PATCH v4 08/17] x86/VPMU: Make vpmu not HVM-specific Boris Ostrovsky
@ 2014-01-24 14:59   ` Jan Beulich
  0 siblings, 0 replies; 48+ messages in thread
From: Jan Beulich @ 2014-01-24 14:59 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: keir, suravee.suthikulpanit, andrew.cooper3, eddie.dong,
	dietmar.hahn, xen-devel, jun.nakajima

>>> On 21.01.14 at 20:08, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote:
> -#define vcpu_vpmu(vcpu)   (&((vcpu)->arch.hvm_vcpu.vpmu))
> -#define vpmu_vcpu(vpmu)   (container_of((vpmu), struct vcpu, \
> -                                          arch.hvm_vcpu.vpmu))
> +#define vcpu_vpmu(vcpu)   (&((vcpu)->arch.vpmu))
> +#define vpmu_vcpu(vpmu)   (container_of((vpmu), struct vcpu, arch.vpmu))

If you already edit this, I'd prefer if you also stripped the various
redundant parentheses to make them better readable:

#define vcpu_vpmu(vcpu)   (&(vcpu)->arch.vpmu)
#define vpmu_vcpu(vpmu)   container_of(vpmu, struct vcpu, arch.vpmu)

Jan

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 09/17] x86/VPMU: Interface for setting PMU mode and flags
  2014-01-21 19:08 ` [PATCH v4 09/17] x86/VPMU: Interface for setting PMU mode and flags Boris Ostrovsky
@ 2014-01-24 15:10   ` Jan Beulich
  2014-01-24 17:13     ` Boris Ostrovsky
  0 siblings, 1 reply; 48+ messages in thread
From: Jan Beulich @ 2014-01-24 15:10 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: keir, suravee.suthikulpanit, andrew.cooper3, eddie.dong,
	dietmar.hahn, xen-devel, jun.nakajima

>>> On 21.01.14 at 20:08, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote:
> +long do_xenpmu_op(int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg)
> +{
> +    int ret = -EINVAL;
> +    xen_pmu_params_t pmu_params;
> +    uint32_t mode;
> +
> +    switch ( op )
> +    {
> +    case XENPMU_mode_set:
> +        if ( !is_control_domain(current->domain) )
> +            return -EPERM;
> +
> +        if ( copy_from_guest(&pmu_params, arg, 1) )
> +            return -EFAULT;
> +
> +        mode = (uint32_t)pmu_params.d.val & XENPMU_MODE_MASK;
> +        if ( mode & ~XENPMU_MODE_ON )
> +            return -EINVAL;

Please, if you add a new interface, think carefully about future
extension room: Here you ignore the upper 32 bits of .val instead
of making sure they're zero, thus making it impossible to assign
them some meaning later on.
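
E.g. something along these lines (sketch, re-using the names from the
patch):

    if ( pmu_params.d.val & ~(uint64_t)XENPMU_MODE_MASK )
        return -EINVAL;

    mode = pmu_params.d.val & XENPMU_MODE_MASK;
    if ( mode & ~XENPMU_MODE_ON )
        return -EINVAL;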

> +
> +        vpmu_mode &= ~XENPMU_MODE_MASK;
> +        vpmu_mode |= mode;
> +
> +        ret = 0;
> +        break;
> +
> +    case XENPMU_mode_get:
> +        pmu_params.d.val = vpmu_mode & XENPMU_MODE_MASK;
> +        pmu_params.v.version.maj = XENPMU_VER_MAJ;
> +        pmu_params.v.version.min = XENPMU_VER_MIN;
> +        if ( copy_to_guest(arg, &pmu_params, 1) )

__copy_to_guest().

> +            return -EFAULT;
> +        ret = 0;
> +        break;
> +
> +    case XENPMU_feature_set:
> +        if ( !is_control_domain(current->domain) )
> +            return -EPERM;
> +
> +        if ( copy_from_guest(&pmu_params, arg, 1) )
> +            return -EFAULT;
> +
> +        if ( (uint32_t)pmu_params.d.val & ~XENPMU_FEATURE_INTEL_BTS )
> +            return -EINVAL;

See above.

> +
> +        vpmu_mode &= ~XENPMU_FEATURE_MASK;
> +        vpmu_mode |= (uint32_t)pmu_params.d.val << XENPMU_FEATURE_SHIFT;
> +
> +        ret = 0;
> +        break;
> +
> +    case XENPMU_feature_get:
> +        pmu_params.d.val = vpmu_mode & XENPMU_FEATURE_MASK;
> +        if ( copy_to_guest(arg, &pmu_params, 1) )

See above.

> --- a/xen/include/public/xen.h
> +++ b/xen/include/public/xen.h
> @@ -101,6 +101,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_ulong_t);
>  #define __HYPERVISOR_kexec_op             37
>  #define __HYPERVISOR_tmem_op              38
>  #define __HYPERVISOR_xc_reserved_op       39 /* reserved for XenClient */
> +#define __HYPERVISOR_xenpmu_op            40
>  
>  /* Architecture-specific hypercall definitions. */
>  #define __HYPERVISOR_arch_0               48

Are you certain this wouldn't better be an architecture-specific
hypercall? Just like with Machine Check, I don't think all
architectures are guaranteed to have (or ever get) performance
monitoring capabilities.

> +/* Parameters structure for HYPERVISOR_xenpmu_op call */
> +struct xen_pmu_params {
> +    /* IN/OUT parameters */
> +    union {
> +        struct version {
> +            uint8_t maj;
> +            uint8_t min;
> +        } version;
> +        uint64_t pad;
> +    } v;

Looking at the implementation above I don't see this ever being an
IN parameter.

Jan

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 07/17] x86/VPMU: Add public xenpmu.h
  2014-01-24 14:54   ` Jan Beulich
@ 2014-01-24 16:49     ` Boris Ostrovsky
  2014-01-24 16:57       ` Jan Beulich
  0 siblings, 1 reply; 48+ messages in thread
From: Boris Ostrovsky @ 2014-01-24 16:49 UTC (permalink / raw)
  To: Jan Beulich
  Cc: keir, suravee.suthikulpanit, andrew.cooper3, eddie.dong,
	dietmar.hahn, xen-devel, jun.nakajima

On 01/24/2014 09:54 AM, Jan Beulich wrote:
>>>> On 21.01.14 at 20:08, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote:
>> Add xenpmu.h header file,
> To me, naming a public Xen header (other than the core one) xen*.h
> is redundant. There's no information lost if you just called it pmu.h.

I was trying to keep filename and top-level data structures the same 
(although now that I changed xenpmu_ prefix to xen_pmu_ they no longer 
are).

>
> Also I think you ought to use plural here.

I'd prefer to keep the arch-independent and -dependent file names the same.

...

>
>> +struct xen_pmu_intel_ctxt {
>> +    uint64_t global_ctrl;
>> +    uint64_t global_ovf_ctrl;
>> +    uint64_t global_status;
>> +    uint64_t fixed_ctrl;
>> +    uint64_t ds_area;
>> +    uint64_t pebs_enable;
>> +    uint64_t debugctl;
>> +    uint64_t fixed_counters;  /* Offset to fixed counter MSRs */
>> +    uint64_t arch_counters;   /* Offset to architectural counter MSRs */
> I think these last two could easily be uint32_t.
>
>> +/* Shared between hypervisor and PV domain */
>> +struct xen_pmu_data {
>> +    uint32_t domain_id;
>> +    uint32_t vcpu_id;
>> +    uint32_t pcpu_id;
>> +    uint32_t pmu_flags;
>> +
>> +    xen_arch_pmu_t pmu;
>> +};
> So if this got included by an architecture independent source file
> on ARM, how would this build? You at least need a stub definition
> there for xen_arch_pmu_t afaict (if already give the impression -
> further up - that you're supporting ARM compilation of this header).

I was supposed to have an entry in arch-arm.h but dropped it somewhere 
along the way. I'll put it back.
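
Something along these lines, presumably (sketch only; the exact name and
layout are TBD until ARM gets real PMU support):

    /* public/arch-arm.h */
    struct xen_pmu_arch {
        uint32_t pad[2];     /* placeholder, no PMU support yet */
    };
    typedef struct xen_pmu_arch xen_arch_pmu_t;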

-boris

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 07/17] x86/VPMU: Add public xenpmu.h
  2014-01-24 16:49     ` Boris Ostrovsky
@ 2014-01-24 16:57       ` Jan Beulich
  0 siblings, 0 replies; 48+ messages in thread
From: Jan Beulich @ 2014-01-24 16:57 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: keir, suravee.suthikulpanit, andrew.cooper3, eddie.dong,
	dietmar.hahn, xen-devel, jun.nakajima

>>> On 24.01.14 at 17:49, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote:
> On 01/24/2014 09:54 AM, Jan Beulich wrote:
>>>>> On 21.01.14 at 20:08, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote:
>>> Add xenpmu.h header file,
>> To me, naming a public Xen header (other than the core one) xen*.h
>> is redundant. There's no information lost if you just called it pmu.h.
> 
> I was trying to keep filename and top-level data structures the same 
> (although now that I changed xenpmu_ prefix to xen_pmu_ they no longer 
> are).
> 
>>
>> Also I think you ought to use plural here.
> 
> I'd prefer to keep the arch-independent and -dependent file names the same.

Right, that's appreciated. Nevertheless it's two of them, i.e.
"Add pmu.h header files, ..."

Jan

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 09/17] x86/VPMU: Interface for setting PMU mode and flags
  2014-01-24 15:10   ` Jan Beulich
@ 2014-01-24 17:13     ` Boris Ostrovsky
  2014-01-27  8:34       ` Jan Beulich
  0 siblings, 1 reply; 48+ messages in thread
From: Boris Ostrovsky @ 2014-01-24 17:13 UTC (permalink / raw)
  To: Jan Beulich
  Cc: keir, suravee.suthikulpanit, andrew.cooper3, eddie.dong,
	dietmar.hahn, xen-devel, jun.nakajima

On 01/24/2014 10:10 AM, Jan Beulich wrote:
>>>> On 21.01.14 at 20:08, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote:
>> +long do_xenpmu_op(int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg)
>> +{
>> +    int ret = -EINVAL;
>> +    xen_pmu_params_t pmu_params;
>> +    uint32_t mode;
>> +
>> +    switch ( op )
>> +    {
>> +    case XENPMU_mode_set:
>> +        if ( !is_control_domain(current->domain) )
>> +            return -EPERM;
>> +
>> +        if ( copy_from_guest(&pmu_params, arg, 1) )
>> +            return -EFAULT;
>> +
>> +        mode = (uint32_t)pmu_params.d.val & XENPMU_MODE_MASK;
>> +        if ( mode & ~XENPMU_MODE_ON )
>> +            return -EINVAL;
> Please, if you add a new interface, think carefully about future
> extension room: Here you ignore the upper 32 bits of .val instead
> of making sure they're zero, thus making it impossible to assign
> them some meaning later on.

I think I can leave this as is for now --- I am storing VPMU mode and 
VPMU features in the Xen-private vpmu_mode, which is a 64-bit value.

What I probably should do is remove XENPMU_MODE_MASK (and 
XENPMU_FEATURE_SHIFT  and XENPMU_FEATURE_MASK) from the public header 
since Linux passes down 64-bit pmu_params.d.val without any format 
assumptions anyway.

>
>> --- a/xen/include/public/xen.h
>> +++ b/xen/include/public/xen.h
>> @@ -101,6 +101,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_ulong_t);
>>   #define __HYPERVISOR_kexec_op             37
>>   #define __HYPERVISOR_tmem_op              38
>>   #define __HYPERVISOR_xc_reserved_op       39 /* reserved for XenClient */
>> +#define __HYPERVISOR_xenpmu_op            40
>>   
>>   /* Architecture-specific hypercall definitions. */
>>   #define __HYPERVISOR_arch_0               48
> Are you certain this wouldn't better be an architecture-specific
> hypercall? Just like with Machine Check, I don't think all
> architectures are guaranteed to have (or ever get) performance
> monitoring capabilities.

An architecture doesn't necessarily need to have HW performance 
monitoring support. In principle this interface can be used for passing 
any performance-related data (e.g. collected by the hypervisor) to the 
guest.

>> +/* Parameters structure for HYPERVISOR_xenpmu_op call */
>> +struct xen_pmu_params {
>> +    /* IN/OUT parameters */
>> +    union {
>> +        struct version {
>> +            uint8_t maj;
>> +            uint8_t min;
>> +        } version;
>> +        uint64_t pad;
>> +    } v;
> Looking at the implementation above I don't see this ever being an
> IN parameter.

Currently Xen doesn't care about version but in the future a guest may 
specify what version of PMU it wants to use (I hope this day will never 
come though...)

-boris

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 09/17] x86/VPMU: Interface for setting PMU mode and flags
  2014-01-24 17:13     ` Boris Ostrovsky
@ 2014-01-27  8:34       ` Jan Beulich
  2014-01-27 15:20         ` Boris Ostrovsky
  0 siblings, 1 reply; 48+ messages in thread
From: Jan Beulich @ 2014-01-27  8:34 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: keir, suravee.suthikulpanit, andrew.cooper3, eddie.dong,
	dietmar.hahn, xen-devel, jun.nakajima

>>> On 24.01.14 at 18:13, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote:
> On 01/24/2014 10:10 AM, Jan Beulich wrote:
>>>>> On 21.01.14 at 20:08, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote:
>>> +long do_xenpmu_op(int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg)
>>> +{
>>> +    int ret = -EINVAL;
>>> +    xen_pmu_params_t pmu_params;
>>> +    uint32_t mode;
>>> +
>>> +    switch ( op )
>>> +    {
>>> +    case XENPMU_mode_set:
>>> +        if ( !is_control_domain(current->domain) )
>>> +            return -EPERM;
>>> +
>>> +        if ( copy_from_guest(&pmu_params, arg, 1) )
>>> +            return -EFAULT;
>>> +
>>> +        mode = (uint32_t)pmu_params.d.val & XENPMU_MODE_MASK;
>>> +        if ( mode & ~XENPMU_MODE_ON )
>>> +            return -EINVAL;
>> Please, if you add a new interface, think carefully about future
>> extension room: Here you ignore the upper 32 bits of .val instead
>> of making sure they're zero, thus making it impossible to assign
>> them some meaning later on.
> 
> I think I can leave this as is for now --- I am storing VPMU mode and 
> VPMU features in the Xen-private vpmu_mode, which is a 64-bit value.

You should drop the cast to a 32-bit value at the very least -
"leave this as is for now" reads like you don#t need to make
any changes.

>>> +/* Parameters structure for HYPERVISOR_xenpmu_op call */
>>> +struct xen_pmu_params {
>>> +    /* IN/OUT parameters */
>>> +    union {
>>> +        struct version {
>>> +            uint8_t maj;
>>> +            uint8_t min;
>>> +        } version;
>>> +        uint64_t pad;
>>> +    } v;
>> Looking at the implementation above I don't see this ever being an
>> IN parameter.
> 
> Currently Xen doesn't care about version but in the future a guest may 
> specify what version of PMU it wants to use (I hope this day will never 
> come though...)

At which time you'd need to add something like a set-version sub-op,
which would then also be the time to make this IN/OUT. Right now
it is just OUT and hence should be marked as such.

Jan

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 09/17] x86/VPMU: Interface for setting PMU mode and flags
  2014-01-27  8:34       ` Jan Beulich
@ 2014-01-27 15:20         ` Boris Ostrovsky
  2014-01-27 15:29           ` Jan Beulich
  0 siblings, 1 reply; 48+ messages in thread
From: Boris Ostrovsky @ 2014-01-27 15:20 UTC (permalink / raw)
  To: Jan Beulich
  Cc: keir, suravee.suthikulpanit, andrew.cooper3, eddie.dong,
	dietmar.hahn, xen-devel, jun.nakajima

On 01/27/2014 03:34 AM, Jan Beulich wrote:
>>>> +long do_xenpmu_op(int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg)
>>>> +{
>>>> +    int ret = -EINVAL;
>>>> +    xen_pmu_params_t pmu_params;
>>>> +    uint32_t mode;
>>>> +
>>>> +    switch ( op )
>>>> +    {
>>>> +    case XENPMU_mode_set:
>>>> +        if ( !is_control_domain(current->domain) )
>>>> +            return -EPERM;
>>>> +
>>>> +        if ( copy_from_guest(&pmu_params, arg, 1) )
>>>> +            return -EFAULT;
>>>> +
>>>> +        mode = (uint32_t)pmu_params.d.val & XENPMU_MODE_MASK;
>>>> +        if ( mode & ~XENPMU_MODE_ON )
>>>> +            return -EINVAL;
>>> Please, if you add a new interface, think carefully about future
>>> extension room: Here you ignore the upper 32 bits of .val instead
>>> of making sure they're zero, thus making it impossible to assign
>>> them some meaning later on.
>> I think I can leave this as is for now --- I am storing VPMU mode and
>> VPMU features in the Xen-private vpmu_mode, which is a 64-bit value.
> You should drop the cast to a 32-bit value at the very least -
> "leave this as is for now" reads like you don#t need to make
> any changes.

mode is stored in the lower 32 bits of vpmu_mode variable a few lines below

      vpmu_mode &= ~XENPMU_MODE_MASK; // XENPMU_MODE_MASK - 0xffffffff
      vpmu_mode |= mode;

so the cast needs to happen somewhere. I can move it to the line above
although I am not sure what difference that would make.

-boris

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 09/17] x86/VPMU: Interface for setting PMU mode and flags
  2014-01-27 15:20         ` Boris Ostrovsky
@ 2014-01-27 15:29           ` Jan Beulich
  0 siblings, 0 replies; 48+ messages in thread
From: Jan Beulich @ 2014-01-27 15:29 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: keir, suravee.suthikulpanit, andrew.cooper3, eddie.dong,
	dietmar.hahn, xen-devel, jun.nakajima

>>> On 27.01.14 at 16:20, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote:
> On 01/27/2014 03:34 AM, Jan Beulich wrote:
>>>>> +long do_xenpmu_op(int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg)
>>>>> +{
>>>>> +    int ret = -EINVAL;
>>>>> +    xen_pmu_params_t pmu_params;
>>>>> +    uint32_t mode;
>>>>> +
>>>>> +    switch ( op )
>>>>> +    {
>>>>> +    case XENPMU_mode_set:
>>>>> +        if ( !is_control_domain(current->domain) )
>>>>> +            return -EPERM;
>>>>> +
>>>>> +        if ( copy_from_guest(&pmu_params, arg, 1) )
>>>>> +            return -EFAULT;
>>>>> +
>>>>> +        mode = (uint32_t)pmu_params.d.val & XENPMU_MODE_MASK;
>>>>> +        if ( mode & ~XENPMU_MODE_ON )
>>>>> +            return -EINVAL;
>>>> Please, if you add a new interface, think carefully about future
>>>> extension room: Here you ignore the upper 32 bits of .val instead
>>>> of making sure they're zero, thus making it impossible to assign
>>>> them some meaning later on.
>>> I think I can leave this as is for now --- I am storing VPMU mode and
>>> VPMU features in the Xen-private vpmu_mode, which is a 64-bit value.
>> You should drop the cast to a 32-bit value at the very least -
>> "leave this as is for now" reads like you don#t need to make
>> any changes.
> 
> mode is stored in the lower 32 bits of vpmu_mode variable a few lines below
> 
>       vpmu_mode &= ~XENPMU_MODE_MASK; // XENPMU_MODE_MASK - 0xffffffff
>       vpmu_mode |= mode;
> 
> so the cast needs to happen somewhere. I can move it to the line above 
> although I
> am not sure what difference that would make.

I don't really care what you do here, so long as you don't ignore
data passed into the hypercall.

Jan

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 10/17] x86/VPMU: Initialize PMU for PV guests
  2014-01-21 19:08 ` [PATCH v4 10/17] x86/VPMU: Initialize PMU for PV guests Boris Ostrovsky
@ 2014-01-31 16:58   ` Jan Beulich
  0 siblings, 0 replies; 48+ messages in thread
From: Jan Beulich @ 2014-01-31 16:58 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: keir, suravee.suthikulpanit, andrew.cooper3, eddie.dong,
	dietmar.hahn, xen-devel, jun.nakajima

>>> On 21.01.14 at 20:08, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote:
> +static int pvpmu_init(struct domain *d, xen_pmu_params_t *params)
> +{
> +    struct vcpu *v;
> +    struct page_info *page;
> +    uint64_t gmfn = params->d.val;
> +
> +    if ( params->vcpu < 0 || params->vcpu >= d->max_vcpus )
> +        return -EINVAL;
> +
> +    page = get_page_from_gfn(d, gmfn, NULL, P2M_ALLOC);
> +    if ( !page )
> +        return -EINVAL;
> +
> +    v = d->vcpu[params->vcpu];
> +    v->arch.vpmu.xenpmu_data = __map_domain_page_global(page);
> +    if ( !v->arch.vpmu.xenpmu_data )
> +    {
> +        put_page(page);
> +        return -EINVAL;
> +    }
> +
> +    vpmu_initialise(v);
> +
> +    return 0;
> +}

This being for a PV guest, you need to obtain a write type reference
to the page, or else you risk the guest re-using the page for
something that mustn't be written to in uncontrolled ways (like a
page or descriptor table). See e.g. map_vcpu_info().
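
Following the map_vcpu_info() pattern, roughly (sketch):

    page = get_page_from_gfn(d, gmfn, NULL, P2M_ALLOC);
    if ( !page )
        return -EINVAL;

    if ( !get_page_type(page, PGT_writable_page) )
    {
        put_page(page);
        return -EINVAL;
    }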

Jan

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 11/17] x86/VPMU: Add support for PMU register handling on PV guests
  2014-01-21 19:08 ` [PATCH v4 11/17] x86/VPMU: Add support for PMU register handling on " Boris Ostrovsky
@ 2014-02-04 11:14   ` Jan Beulich
  2014-02-04 15:07     ` Boris Ostrovsky
  0 siblings, 1 reply; 48+ messages in thread
From: Jan Beulich @ 2014-02-04 11:14 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: keir, suravee.suthikulpanit, andrew.cooper3, eddie.dong,
	dietmar.hahn, xen-devel, jun.nakajima

>>> On 21.01.14 at 20:08, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote:
> --- a/xen/arch/x86/hvm/vpmu.c
> +++ b/xen/arch/x86/hvm/vpmu.c
> @@ -400,6 +400,14 @@ long do_xenpmu_op(int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg)
>              return -EFAULT;
>          pvpmu_finish(current->domain, &pmu_params);
>          break;
> +
> +    case XENPMU_lvtpc_set:
> +        if ( copy_from_guest(&pmu_params, arg, 1) )
> +            return -EFAULT;
> +
> +        vpmu_lvtpc_update((uint32_t)pmu_params.d.val);

Once again, please don't ignore (parts of) hypercall input values.

Jan

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 12/17] x86/VPMU: Handle PMU interrupts for PV guests
  2014-01-21 19:08 ` [PATCH v4 12/17] x86/VPMU: Handle PMU interrupts for " Boris Ostrovsky
@ 2014-02-04 11:22   ` Jan Beulich
  2014-02-04 15:26     ` Boris Ostrovsky
  0 siblings, 1 reply; 48+ messages in thread
From: Jan Beulich @ 2014-02-04 11:22 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: keir, suravee.suthikulpanit, andrew.cooper3, eddie.dong,
	dietmar.hahn, xen-devel, jun.nakajima

>>> On 21.01.14 at 20:08, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote:
>  int vpmu_do_interrupt(struct cpu_user_regs *regs)
>  {
>      struct vcpu *v = current;
> -    struct vpmu_struct *vpmu = vcpu_vpmu(v);
> +    struct vpmu_struct *vpmu;
>  
> -    if ( vpmu->arch_vpmu_ops )
> +    /* dom0 will handle this interrupt */
> +    if ( v->domain->domain_id >= DOMID_FIRST_RESERVED )
> +        v = dom0->vcpu[smp_processor_id() % dom0->max_vcpus];
> +
> +    vpmu = vcpu_vpmu(v);
> +    if ( !is_hvm_domain(v->domain) )
> +    {
> +        /* PV guest or dom0 is doing system profiling */
> +        const struct cpu_user_regs *gregs;
> +        int err;
> +
> +        if ( v->arch.vpmu.xenpmu_data->pmu_flags & PMU_CACHED )
> +            return 1;
> +
> +        /* PV guest will be reading PMU MSRs from xenpmu_data */
> +        vpmu_set(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED);
> +        err = vpmu->arch_vpmu_ops->arch_vpmu_save(v);
> +        vpmu_reset(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED);
> +
> +        /* Store appropriate registers in xenpmu_data */
> +        if ( is_pv_32bit_domain(current->domain) )
> +        {
> +            /*
> +             * 32-bit dom0 cannot process Xen's addresses (which are 64 bit)
> +             * and therefore we treat it the same way as a non-priviledged
> +             * PV 32-bit domain.
> +             */
> +            struct compat_cpu_user_regs *cmp;
> +
> +            gregs = guest_cpu_user_regs();
> +
> +            cmp = (struct compat_cpu_user_regs *)
> +                    &v->arch.vpmu.xenpmu_data->pmu.r.regs;

Deliberate type changes like this can easily (and more readably as
well as more forward compatibly) be done using (void *).
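
I.e. simply

    cmp = (void *)&v->arch.vpmu.xenpmu_data->pmu.r.regs;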

> +            XLAT_cpu_user_regs(cmp, gregs);
> +        }
> +        else if ( !is_control_domain(current->domain) &&
> +                 !is_idle_vcpu(current) )
> +        {
> +            /* PV guest */
> +            gregs = guest_cpu_user_regs();
> +            memcpy(&v->arch.vpmu.xenpmu_data->pmu.r.regs,
> +                   gregs, sizeof(struct cpu_user_regs));
> +        }
> +        else
> +            memcpy(&v->arch.vpmu.xenpmu_data->pmu.r.regs,
> +                   regs, sizeof(struct cpu_user_regs));
> +
> +        v->arch.vpmu.xenpmu_data->domain_id = current->domain->domain_id;
> +        v->arch.vpmu.xenpmu_data->vcpu_id = current->vcpu_id;
> +        v->arch.vpmu.xenpmu_data->pcpu_id = smp_processor_id();
> +
> +        v->arch.vpmu.xenpmu_data->pmu_flags |= PMU_CACHED;
> +        apic_write(APIC_LVTPC, vpmu->hw_lapic_lvtpc | APIC_LVT_MASKED);
> +        vpmu->hw_lapic_lvtpc |= APIC_LVT_MASKED;
> +
> +        send_guest_vcpu_virq(v, VIRQ_XENPMU);
> +
> +        return 1;
> +    }
> +    else if ( vpmu->arch_vpmu_ops )

If the previous (and only) if() branch returns unconditionally, using
"else if" is more confusing then clarifying imo (and in any case
needlessly growing the patch, even if just by a bit).
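
I.e. just

    if ( vpmu->arch_vpmu_ops )

at the outer level, once the big !is_hvm_domain() branch above it ends in
the unconditional "return 1".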

Jan

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 13/17] x86/VPMU: Add privileged PMU mode
  2014-01-21 19:08 ` [PATCH v4 13/17] x86/VPMU: Add privileged PMU mode Boris Ostrovsky
@ 2014-02-04 11:31   ` Jan Beulich
  2014-02-04 15:53     ` Boris Ostrovsky
  0 siblings, 1 reply; 48+ messages in thread
From: Jan Beulich @ 2014-02-04 11:31 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: keir, suravee.suthikulpanit, andrew.cooper3, eddie.dong,
	dietmar.hahn, xen-devel, jun.nakajima

>>> On 21.01.14 at 20:08, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote:
> @@ -152,33 +162,62 @@ int vpmu_do_interrupt(struct cpu_user_regs *regs)
>          err = vpmu->arch_vpmu_ops->arch_vpmu_save(v);
>          vpmu_reset(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED);
>  
> -        /* Store appropriate registers in xenpmu_data */
> -        if ( is_pv_32bit_domain(current->domain) )
> +        if ( !is_hvm_domain(current->domain) )
>          {
> -            /*
> -             * 32-bit dom0 cannot process Xen's addresses (which are 64 bit)
> -             * and therefore we treat it the same way as a non-priviledged
> -             * PV 32-bit domain.
> -             */
> -            struct compat_cpu_user_regs *cmp;
> -
> -            gregs = guest_cpu_user_regs();
> -
> -            cmp = (struct compat_cpu_user_regs *)
> -                    &v->arch.vpmu.xenpmu_data->pmu.r.regs;
> -            XLAT_cpu_user_regs(cmp, gregs);
> +            uint16_t cs = (current->arch.flags & TF_kernel_mode) ? 0 : 0x3;

The surrounding if checks !hvm, i.e. both PV and PVH can make it
here. But TF_kernel_mode is meaningful for PV only.
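
One way this might be handled (sketch; whether SS.DPL is the right notion
of "kernel mode" for a PVH vCPU here is for you to decide):

    if ( is_pv_domain(current->domain) )
        cs = (current->arch.flags & TF_kernel_mode) ? 0 : 3;
    else
    {
        struct segment_register seg;

        hvm_get_segment_register(current, x86_seg_ss, &seg);
        cs = seg.attr.fields.dpl;
    }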

> +
> +            /* Store appropriate registers in xenpmu_data */
> +            if ( is_pv_32bit_domain(current->domain) )
> +            {
> +                gregs = guest_cpu_user_regs();
> +
> +                if ( (vpmu_mode & XENPMU_MODE_PRIV) &&
> +                     !is_pv_32bit_domain(v->domain) )
> +                    memcpy(&v->arch.vpmu.xenpmu_data->pmu.r.regs,
> +                           gregs, sizeof(struct cpu_user_regs));
> +                else 
> +                {
> +                    /*
> +                     * 32-bit dom0 cannot process Xen's addresses (which are
> +                     * 64 bit) and therefore we treat it the same way as a
> +                     * non-priviledged PV 32-bit domain.
> +                     */
> +
> +                    struct compat_cpu_user_regs *cmp;
> +
> +                    cmp = (struct compat_cpu_user_regs *)
> +                        &v->arch.vpmu.xenpmu_data->pmu.r.regs;
> +                    XLAT_cpu_user_regs(cmp, gregs);
> +                }
> +            }
> +            else if ( !is_control_domain(current->domain) &&
> +                      !is_idle_vcpu(current) )
> +            {
> +                /* PV guest */
> +                gregs = guest_cpu_user_regs();
> +                memcpy(&v->arch.vpmu.xenpmu_data->pmu.r.regs,
> +                       gregs, sizeof(struct cpu_user_regs));
> +            }
> +            else
> +                memcpy(&v->arch.vpmu.xenpmu_data->pmu.r.regs,
> +                       regs, sizeof(struct cpu_user_regs));
> +
> +            gregs = &v->arch.vpmu.xenpmu_data->pmu.r.regs;
> +            gregs->cs = cs;

And now you store a NUL selector (i.e. just the RPL bits) into the
output field?

>          }
> -        else if ( !is_control_domain(current->domain) &&
> -                 !is_idle_vcpu(current) )
> +        else
>          {
> -            /* PV guest */
> +            /* HVM guest */
> +            struct segment_register cs;
> +
>              gregs = guest_cpu_user_regs();
>              memcpy(&v->arch.vpmu.xenpmu_data->pmu.r.regs,
>                     gregs, sizeof(struct cpu_user_regs));
> +
> +            hvm_get_segment_register(current, x86_seg_cs, &cs);
> +            gregs = &v->arch.vpmu.xenpmu_data->pmu.r.regs;
> +            gregs->cs = cs.attr.fields.dpl;

And here too? If that's intended, a code comment is a must.

Jan

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 14/17] x86/VPMU: Save VPMU state for PV guests during context switch
  2014-01-21 19:08 ` [PATCH v4 14/17] x86/VPMU: Save VPMU state for PV guests during context switch Boris Ostrovsky
@ 2014-02-04 11:38   ` Jan Beulich
  2014-02-04 15:56     ` Boris Ostrovsky
  0 siblings, 1 reply; 48+ messages in thread
From: Jan Beulich @ 2014-02-04 11:38 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: keir, suravee.suthikulpanit, andrew.cooper3, eddie.dong,
	dietmar.hahn, xen-devel, jun.nakajima

>>> On 21.01.14 at 20:08, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote:
> Save VPMU state during context switch for both HVM and PV guests unless we
> are in PMU privileged mode (i.e. dom0 is doing all profiling) and the 
> switched
> out domain is not the control domain. The latter condition is needed because
> me may have just turned the privileged PMU mode on and thus need to save 
> last domain.

While this is understandable, ...

> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -1444,17 +1444,16 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
>      }
>  
>      if (prev != next)
> -        update_runstate_area(prev);
> -
> -    if ( is_hvm_vcpu(prev) )
>      {
> -        if (prev != next)
> +        update_runstate_area(prev);
> +        if ( !(vpmu_mode & XENPMU_MODE_PRIV) ||
> +             !is_control_domain(prev->domain) )
>              vpmu_save(prev);

... I'd really like you to investigate ways to achieve the same effect
without this extra second condition added to the context switch path.
E.g. by synchronously issuing a save on all affected vCPU-s when
privileged mode gets turned on.
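
As a rough sketch of that idea (assuming the per-CPU tracking of the last
vPMU owner in vpmu.c can be re-used here):

    static void vpmu_save_force(void *arg)
    {
        struct vcpu *v = this_cpu(last_vcpu);

        if ( v != NULL )
            vpmu_save(v);
    }

    /* at XENPMU_mode_set time, after vpmu_mode has been updated: */
    on_each_cpu(vpmu_save_force, NULL, 1);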

Jan

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 15/17] x86/VPMU: NMI-based VPMU support
  2014-01-21 19:09 ` [PATCH v4 15/17] x86/VPMU: NMI-based VPMU support Boris Ostrovsky
@ 2014-02-04 11:48   ` Jan Beulich
  2014-02-04 16:31     ` Boris Ostrovsky
  0 siblings, 1 reply; 48+ messages in thread
From: Jan Beulich @ 2014-02-04 11:48 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: keir, suravee.suthikulpanit, andrew.cooper3, eddie.dong,
	dietmar.hahn, xen-devel, jun.nakajima

>>> On 21.01.14 at 20:09, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote:
> Add support for using NMIs as PMU interrupts.
> 
> Most of processing is still performed by vpmu_do_interrupt(). However, since
> certain operations are not NMI-safe we defer them to a softint that 
> vpmu_do_interrupt()
> will schedule:
> * For PV guests that would be send_guest_vcpu_virq() and 
> hvm_get_segment_register().

Makes no sense - why would hvm_get_segment_register() be of any
relevance to PV guests?

And then I'm still missing a reasonable level of analysis that the
previously non-NMI-only interrupt handler is now safe to use in NMI
context.

> +uint32_t vpmu_apic_vector = PMU_APIC_VECTOR;

Considering that you store APIC_DM_NMI into this variable in the
NMI case, it needs to be named differently (or else I'd be tempted
to convert it to uint8_t the first time I stumble across it).
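
A name along the following lines would avoid that (just a suggestion, not
what the patch uses):

/* Holds a complete LVTPC value - either a plain vector or APIC_DM_NMI. */
uint32_t __read_mostly vpmu_interrupt_type = PMU_APIC_VECTOR;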

> +static void vpmu_send_nmi(struct vcpu *v)
> +{
> +    struct vlapic *vlapic = vcpu_vlapic(v);

Please ASSERT() that you have HVM data available before doing
anything that would be unsafe in PV (and maybe PVH?) context.
This will then at once serve as documentation, clarifying that the
function must only be used for suitable vCPU-s.
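
A sketch of what is being asked for (only the prologue of the quoted
function is shown; the delivery logic itself stays as in the patch):

static void vpmu_send_nmi(struct vcpu *v)
{
    struct vlapic *vlapic;

    /* Only HVM vCPU-s have a virtual local APIC to deliver through. */
    ASSERT(is_hvm_vcpu(v));

    vlapic = vcpu_vlapic(v);
    /* ... NMI delivery via the vlapic continues here ... */
}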

> +/* Process the softirq set by PMU NMI handler */
> +static void pmu_softnmi(void)
> +{
> +    struct cpu_user_regs *regs;
> +    struct vcpu *v, *sampled = per_cpu(sampled_vcpu, smp_processor_id());
> +
> +    if ( vpmu_mode & XENPMU_MODE_PRIV ||

() around the & please.

>  static int pvpmu_init(struct domain *d, xen_pmu_params_t *params)
>  {
>      struct vcpu *v;
>      struct page_info *page;
>      uint64_t gmfn = params->d.val;
> -
> +    static int pvpmu_initted = 0;

bool_t? __read_mostly?
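
I.e. presumably:

    static bool_t __read_mostly pvpmu_initted;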

Jan

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 16/17] x86/VPMU: Support for PVH guests
  2014-01-21 19:09 ` [PATCH v4 16/17] x86/VPMU: Support for PVH guests Boris Ostrovsky
@ 2014-02-04 11:51   ` Jan Beulich
  2014-02-04 16:44     ` Boris Ostrovsky
  0 siblings, 1 reply; 48+ messages in thread
From: Jan Beulich @ 2014-02-04 11:51 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: keir, suravee.suthikulpanit, andrew.cooper3, eddie.dong,
	dietmar.hahn, xen-devel, jun.nakajima

>>> On 21.01.14 at 20:09, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote:
> +        if ( is_pvh_domain(current->domain) && !(vpmu_mode & XENPMU_MODE_PRIV) )
> +            if ( !vpmu->arch_vpmu_ops->do_interrupt(regs) )
> +                return 0;

Please fold chained if()s like this one.
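
I.e. presumably:

        if ( is_pvh_domain(current->domain) &&
             !(vpmu_mode & XENPMU_MODE_PRIV) &&
             !vpmu->arch_vpmu_ops->do_interrupt(regs) )
            return 0;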

> @@ -237,7 +242,7 @@ int vpmu_do_interrupt(struct cpu_user_regs *regs)
>              else if ( !is_control_domain(current->domain) &&
>                        !is_idle_vcpu(current) )
>              {
> -                /* PV guest */
> +                /* PV(H) guest */

I would have expected PVH guests to use the HVM paths here, not
the PV ones. Can you clarify why you do it the other way around?

Jan

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 11/17] x86/VPMU: Add support for PMU register handling on PV guests
  2014-02-04 11:14   ` Jan Beulich
@ 2014-02-04 15:07     ` Boris Ostrovsky
  0 siblings, 0 replies; 48+ messages in thread
From: Boris Ostrovsky @ 2014-02-04 15:07 UTC (permalink / raw)
  To: Jan Beulich
  Cc: keir, suravee.suthikulpanit, andrew.cooper3, eddie.dong,
	dietmar.hahn, xen-devel, jun.nakajima

On 02/04/2014 06:14 AM, Jan Beulich wrote:
>>>> On 21.01.14 at 20:08, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote:
>> --- a/xen/arch/x86/hvm/vpmu.c
>> +++ b/xen/arch/x86/hvm/vpmu.c
>> @@ -400,6 +400,14 @@ long do_xenpmu_op(int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg)
>>               return -EFAULT;
>>           pvpmu_finish(current->domain, &pmu_params);
>>           break;
>> +
>> +    case XENPMU_lvtpc_set:
>> +        if ( copy_from_guest(&pmu_params, arg, 1) )
>> +            return -EFAULT;
>> +
>> +        vpmu_lvtpc_update((uint32_t)pmu_params.d.val);
> Once again, please don't ignore (parts of) hypercall input values.

I can actually pass this value in the shared area where I already have a 
uint32_t for LVTPC. It will also save us from doing the copy.
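
Roughly like this (the field name used for the shared LVTPC value below is
a guess, not the actual layout):

    case XENPMU_lvtpc_set:
        /* Take the new LVTPC value from the already-mapped shared page
         * instead of copying the hypercall argument. */
        vpmu_lvtpc_update(current->arch.vpmu.xenpmu_data->pmu.l.lapic_lvtpc);
        break;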

-boris

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 12/17] x86/VPMU: Handle PMU interrupts for PV guests
  2014-02-04 11:22   ` Jan Beulich
@ 2014-02-04 15:26     ` Boris Ostrovsky
  2014-02-04 15:50       ` Jan Beulich
  0 siblings, 1 reply; 48+ messages in thread
From: Boris Ostrovsky @ 2014-02-04 15:26 UTC (permalink / raw)
  To: Jan Beulich
  Cc: keir, suravee.suthikulpanit, andrew.cooper3, eddie.dong,
	dietmar.hahn, xen-devel, jun.nakajima

On 02/04/2014 06:22 AM, Jan Beulich wrote:
>>>> On 21.01.14 at 20:08, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote:
>>   int vpmu_do_interrupt(struct cpu_user_regs *regs)
>>   {
>>       struct vcpu *v = current;
>> -    struct vpmu_struct *vpmu = vcpu_vpmu(v);
>> +    struct vpmu_struct *vpmu;
>>   
>> -    if ( vpmu->arch_vpmu_ops )
>> +    /* dom0 will handle this interrupt */
>> +    if ( v->domain->domain_id >= DOMID_FIRST_RESERVED )
>> +        v = dom0->vcpu[smp_processor_id() % dom0->max_vcpus];
>> +
>> +    vpmu = vcpu_vpmu(v);
>> +    if ( !is_hvm_domain(v->domain) )
>> +    {
>> +        /* PV guest or dom0 is doing system profiling */
>> +        const struct cpu_user_regs *gregs;
>> +        int err;
>> +
>> +        if ( v->arch.vpmu.xenpmu_data->pmu_flags & PMU_CACHED )
>> +            return 1;
>> +
>> +        /* PV guest will be reading PMU MSRs from xenpmu_data */
>> +        vpmu_set(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED);
>> +        err = vpmu->arch_vpmu_ops->arch_vpmu_save(v);
>> +        vpmu_reset(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED);
>> +
>> +        /* Store appropriate registers in xenpmu_data */
>> +        if ( is_pv_32bit_domain(current->domain) )
>> +        {
>> +            /*
>> +             * 32-bit dom0 cannot process Xen's addresses (which are 64 bit)
>> +             * and therefore we treat it the same way as a non-priviledged
>> +             * PV 32-bit domain.
>> +             */
>> +            struct compat_cpu_user_regs *cmp;
>> +
>> +            gregs = guest_cpu_user_regs();
>> +
>> +            cmp = (struct compat_cpu_user_regs *)
>> +                    &v->arch.vpmu.xenpmu_data->pmu.r.regs;
> Deliberate type changes like this can easily (and more readably as
> well as more forward compatibly) be done using (void *).
>
>> +            XLAT_cpu_user_regs(cmp, gregs);
>> +        }
>> +        else if ( !is_control_domain(current->domain) &&
>> +                 !is_idle_vcpu(current) )
>> +        {
>> +            /* PV guest */
>> +            gregs = guest_cpu_user_regs();
>> +            memcpy(&v->arch.vpmu.xenpmu_data->pmu.r.regs,
>> +                   gregs, sizeof(struct cpu_user_regs));
>> +        }
>> +        else
>> +            memcpy(&v->arch.vpmu.xenpmu_data->pmu.r.regs,
>> +                   regs, sizeof(struct cpu_user_regs));
>> +
>> +        v->arch.vpmu.xenpmu_data->domain_id = current->domain->domain_id;
>> +        v->arch.vpmu.xenpmu_data->vcpu_id = current->vcpu_id;
>> +        v->arch.vpmu.xenpmu_data->pcpu_id = smp_processor_id();
>> +
>> +        v->arch.vpmu.xenpmu_data->pmu_flags |= PMU_CACHED;
>> +        apic_write(APIC_LVTPC, vpmu->hw_lapic_lvtpc | APIC_LVT_MASKED);
>> +        vpmu->hw_lapic_lvtpc |= APIC_LVT_MASKED;
>> +
>> +        send_guest_vcpu_virq(v, VIRQ_XENPMU);
>> +
>> +        return 1;
>> +    }
>> +    else if ( vpmu->arch_vpmu_ops )
> If the previous (and only) if() branch returns unconditionally, using
> "else if" is more confusing then clarifying imo (and in any case
> needlessly growing the patch, even if just by a bit).

Not sure I understand what you are saying here.

Here is the code structure:

int vpmu_do_interrupt(struct cpu_user_regs *regs)
{
    if ( !is_hvm_domain(v->domain) || (vpmu_mode & XENPMU_MODE_PRIV) )
    {
        // work
        return 1;
    }
    else if ( vpmu->arch_vpmu_ops )
    {
        if ( !vpmu->arch_vpmu_ops->do_interrupt(regs) )
            return 0;

        // other work
        return 1;
    }

    return 0;
}

What do you propose?


-boris

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 12/17] x86/VPMU: Handle PMU interrupts for PV guests
  2014-02-04 15:26     ` Boris Ostrovsky
@ 2014-02-04 15:50       ` Jan Beulich
  0 siblings, 0 replies; 48+ messages in thread
From: Jan Beulich @ 2014-02-04 15:50 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: keir, suravee.suthikulpanit, andrew.cooper3, eddie.dong,
	dietmar.hahn, xen-devel, jun.nakajima

>>> On 04.02.14 at 16:26, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote:
> On 02/04/2014 06:22 AM, Jan Beulich wrote:
>>>>> On 21.01.14 at 20:08, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote:
>>> +        return 1;
>>> +    }
>>> +    else if ( vpmu->arch_vpmu_ops )
>> If the previous (and only) if() branch returns unconditionally, using
>> "else if" is more confusing then clarifying imo (and in any case
>> needlessly growing the patch, even if just by a bit).
> 
> Not sure I understand what you are saying here.
> 
> Here is the code structure:
> 
> int vpmu_do_interrupt(struct cpu_user_regs *regs)
> {
>     if ( !is_hvm_domain(v->domain) || (vpmu_mode & XENPMU_MODE_PRIV) )
>     {
>         // work
>         return 1;
>     }
>     else if ( vpmu->arch_vpmu_ops )
>     {

    if ( !is_hvm_domain(v->domain) || (vpmu_mode & XENPMU_MODE_PRIV) )
    {
        // work
        return 1;
    }
    if ( vpmu->arch_vpmu_ops )
    {
        ...

Jan

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 13/17] x86/VPMU: Add privileged PMU mode
  2014-02-04 11:31   ` Jan Beulich
@ 2014-02-04 15:53     ` Boris Ostrovsky
  2014-02-04 16:01       ` Jan Beulich
  0 siblings, 1 reply; 48+ messages in thread
From: Boris Ostrovsky @ 2014-02-04 15:53 UTC (permalink / raw)
  To: Jan Beulich
  Cc: keir, suravee.suthikulpanit, andrew.cooper3, eddie.dong,
	dietmar.hahn, xen-devel, jun.nakajima

On 02/04/2014 06:31 AM, Jan Beulich wrote:
>>>> On 21.01.14 at 20:08, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote:
>> @@ -152,33 +162,62 @@ int vpmu_do_interrupt(struct cpu_user_regs *regs)
>>           err = vpmu->arch_vpmu_ops->arch_vpmu_save(v);
>>           vpmu_reset(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED);
>>   
>> -        /* Store appropriate registers in xenpmu_data */
>> -        if ( is_pv_32bit_domain(current->domain) )
>> +        if ( !is_hvm_domain(current->domain) )
>>           {
>> -            /*
>> -             * 32-bit dom0 cannot process Xen's addresses (which are 64 bit)
>> -             * and therefore we treat it the same way as a non-priviledged
>> -             * PV 32-bit domain.
>> -             */
>> -            struct compat_cpu_user_regs *cmp;
>> -
>> -            gregs = guest_cpu_user_regs();
>> -
>> -            cmp = (struct compat_cpu_user_regs *)
>> -                    &v->arch.vpmu.xenpmu_data->pmu.r.regs;
>> -            XLAT_cpu_user_regs(cmp, gregs);
>> +            uint16_t cs = (current->arch.flags & TF_kernel_mode) ? 0 : 0x3;
> The surrounding if checks !hvm, i.e. both PV and PVH can make it
> here. But TF_kernel_mode is meaningful for PV only.

As of this patch PVH doesn't work, so you won't get into this code path;
the later patch (#16) addresses this. (Although that patch unnecessarily
calculates cs in the line above for PVH, only to call
hvm_get_segment_register() later; I'll move that into the !pvh clause there.)
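
Roughly along these lines (just a sketch of the rearrangement I have in
mind, not the actual follow-up change):

            uint16_t cs;

            if ( !is_pvh_domain(current->domain) )
                /* PV: only the ring of the interrupted context is wanted. */
                cs = (current->arch.flags & TF_kernel_mode) ? 0 : 0x3;
            else
            {
                struct segment_register seg;

                hvm_get_segment_register(current, x86_seg_cs, &seg);
                cs = seg.attr.fields.dpl;
            }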

>
>> +
>> +            /* Store appropriate registers in xenpmu_data */
>> +            if ( is_pv_32bit_domain(current->domain) )
>> +            {
>> +                gregs = guest_cpu_user_regs();
>> +
>> +                if ( (vpmu_mode & XENPMU_MODE_PRIV) &&
>> +                     !is_pv_32bit_domain(v->domain) )
>> +                    memcpy(&v->arch.vpmu.xenpmu_data->pmu.r.regs,
>> +                           gregs, sizeof(struct cpu_user_regs));
>> +                else
>> +                {
>> +                    /*
>> +                     * 32-bit dom0 cannot process Xen's addresses (which are
>> +                     * 64 bit) and therefore we treat it the same way as a
>> +                     * non-priviledged PV 32-bit domain.
>> +                     */
>> +
>> +                    struct compat_cpu_user_regs *cmp;
>> +
>> +                    cmp = (struct compat_cpu_user_regs *)
>> +                        &v->arch.vpmu.xenpmu_data->pmu.r.regs;
>> +                    XLAT_cpu_user_regs(cmp, gregs);
>> +                }
>> +            }
>> +            else if ( !is_control_domain(current->domain) &&
>> +                      !is_idle_vcpu(current) )
>> +            {
>> +                /* PV guest */
>> +                gregs = guest_cpu_user_regs();
>> +                memcpy(&v->arch.vpmu.xenpmu_data->pmu.r.regs,
>> +                       gregs, sizeof(struct cpu_user_regs));
>> +            }
>> +            else
>> +                memcpy(&v->arch.vpmu.xenpmu_data->pmu.r.regs,
>> +                       regs, sizeof(struct cpu_user_regs));
>> +
>> +            gregs = &v->arch.vpmu.xenpmu_data->pmu.r.regs;
>> +            gregs->cs = cs;
> And now you store a NUL selector (i.e. just the RPL bits) into the
> output field?
>>           }
>> -        else if ( !is_control_domain(current->domain) &&
>> -                 !is_idle_vcpu(current) )
>> +        else
>>           {
>> -            /* PV guest */
>> +            /* HVM guest */
>> +            struct segment_register cs;
>> +
>>               gregs = guest_cpu_user_regs();
>>               memcpy(&v->arch.vpmu.xenpmu_data->pmu.r.regs,
>>                      gregs, sizeof(struct cpu_user_regs));
>> +
>> +            hvm_get_segment_register(current, x86_seg_cs, &cs);
>> +            gregs = &v->arch.vpmu.xenpmu_data->pmu.r.regs;
>> +            gregs->cs = cs.attr.fields.dpl;
> And here too? If that's intended, a code comment is a must.

This is an HVM-only path; PVH and PV don't go here, so cs should be valid.

-boris

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 14/17] x86/VPMU: Save VPMU state for PV guests during context switch
  2014-02-04 11:38   ` Jan Beulich
@ 2014-02-04 15:56     ` Boris Ostrovsky
  0 siblings, 0 replies; 48+ messages in thread
From: Boris Ostrovsky @ 2014-02-04 15:56 UTC (permalink / raw)
  To: Jan Beulich
  Cc: keir, suravee.suthikulpanit, andrew.cooper3, eddie.dong,
	dietmar.hahn, xen-devel, jun.nakajima

On 02/04/2014 06:38 AM, Jan Beulich wrote:
>>>> On 21.01.14 at 20:08, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote:
>> Save VPMU state during context switch for both HVM and PV guests unless we
>> are in PMU privileged mode (i.e. dom0 is doing all profiling) and the
>> switched
>> out domain is not the control domain. The latter condition is needed because
>> we may have just turned the privileged PMU mode on and thus need to save
>> the last domain.
> While this is understandable, ...
>
>> --- a/xen/arch/x86/domain.c
>> +++ b/xen/arch/x86/domain.c
>> @@ -1444,17 +1444,16 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
>>       }
>>   
>>       if (prev != next)
>> -        update_runstate_area(prev);
>> -
>> -    if ( is_hvm_vcpu(prev) )
>>       {
>> -        if (prev != next)
>> +        update_runstate_area(prev);
>> +        if ( !(vpmu_mode & XENPMU_MODE_PRIV) ||
>> +             !is_control_domain(prev->domain) )
>>               vpmu_save(prev);
> ... I'd really like you to investigate ways to achieve the same effect
> without this extra second condition added to the context switch path.
> E.g. by synchronously issuing a save on all affected vCPU-s when
> privileged mode gets turned on.

Yes, I should do something like that.

-boris

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 13/17] x86/VPMU: Add privileged PMU mode
  2014-02-04 15:53     ` Boris Ostrovsky
@ 2014-02-04 16:01       ` Jan Beulich
  2014-02-04 16:13         ` Boris Ostrovsky
  0 siblings, 1 reply; 48+ messages in thread
From: Jan Beulich @ 2014-02-04 16:01 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: keir, suravee.suthikulpanit, andrew.cooper3, eddie.dong,
	dietmar.hahn, xen-devel, jun.nakajima

>>> On 04.02.14 at 16:53, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote:
> On 02/04/2014 06:31 AM, Jan Beulich wrote:
>>>>> On 21.01.14 at 20:08, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote:
>>> +            gregs = &v->arch.vpmu.xenpmu_data->pmu.r.regs;
>>> +            gregs->cs = cs;
>> And now you store a NUL selector (i.e. just the RPL bits) into the
>> output field?
>>>           }
>>> -        else if ( !is_control_domain(current->domain) &&
>>> -                 !is_idle_vcpu(current) )
>>> +        else
>>>           {
>>> -            /* PV guest */
>>> +            /* HVM guest */
>>> +            struct segment_register cs;
>>> +
>>>               gregs = guest_cpu_user_regs();
>>>               memcpy(&v->arch.vpmu.xenpmu_data->pmu.r.regs,
>>>                      gregs, sizeof(struct cpu_user_regs));
>>> +
>>> +            hvm_get_segment_register(current, x86_seg_cs, &cs);
>>> +            gregs = &v->arch.vpmu.xenpmu_data->pmu.r.regs;
>>> +            gregs->cs = cs.attr.fields.dpl;
>> And here too? If that's intended, a code comment is a must.
> 
> This is an HVM-only path; PVH and PV don't go here, so cs should be valid.

Isn't the reply of mine a few lines up in PV code? And why would
the selector being wrong for HVM be okay?

Jan

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 13/17] x86/VPMU: Add privileged PMU mode
  2014-02-04 16:01       ` Jan Beulich
@ 2014-02-04 16:13         ` Boris Ostrovsky
  2014-02-04 16:39           ` Jan Beulich
  0 siblings, 1 reply; 48+ messages in thread
From: Boris Ostrovsky @ 2014-02-04 16:13 UTC (permalink / raw)
  To: Jan Beulich
  Cc: keir, jun.nakajima, andrew.cooper3, eddie.dong, dietmar.hahn,
	xen-devel, suravee.suthikulpanit

On 02/04/2014 11:01 AM, Jan Beulich wrote:
>>>> On 04.02.14 at 16:53, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote:
>> On 02/04/2014 06:31 AM, Jan Beulich wrote:
>>>>>> On 21.01.14 at 20:08, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote:
>>>> +            gregs = &v->arch.vpmu.xenpmu_data->pmu.r.regs;
>>>> +            gregs->cs = cs;
>>> And now you store a NUL selector (i.e. just the RPL bits) into the
>>> output field?
>>>>            }
>>>> -        else if ( !is_control_domain(current->domain) &&
>>>> -                 !is_idle_vcpu(current) )
>>>> +        else
>>>>            {
>>>> -            /* PV guest */
>>>> +            /* HVM guest */
>>>> +            struct segment_register cs;
>>>> +
>>>>                gregs = guest_cpu_user_regs();
>>>>                memcpy(&v->arch.vpmu.xenpmu_data->pmu.r.regs,
>>>>                       gregs, sizeof(struct cpu_user_regs));
>>>> +
>>>> +            hvm_get_segment_register(current, x86_seg_cs, &cs);
>>>> +            gregs = &v->arch.vpmu.xenpmu_data->pmu.r.regs;
>>>> +            gregs->cs = cs.attr.fields.dpl;
>>> And here too? If that's intended, a code comment is a must.
>> This is an HVM-only path; PVH and PV don't go here, so cs should be valid.
> Isn't the reply of mine a few lines up in PV code? And why would
> the selector being wrong for HVM be okay?


This clause is for privileged profiling: we are in the PV clause (because
'v' here is a dom0 vCPU) even though the interrupt was taken while an HVM
guest was running.

The diff is somewhat difficult to follow so here is the flow:

    // in privileged mode 'v' is dom0's CPU.
    if ( !is_hvm_domain(v->domain) || (vpmu_mode & XENPMU_MODE_PRIV) )
    {
         if ( !is_hvm_domain(current->domain) )
         {
              // either PV (including dom0) or Xen is interrupted
         }
         else
         {
              // This is the clause we are discussing. 'current' is HVM
              hvm_get_segment_register(current, x86_seg_cs, &cs);
         }
         send_guest_vcpu_virq(v, VIRQ_XENPMU);
         return 1;
    }

So I think CS should be correct for the guest, no?


-boris

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 15/17] x86/VPMU: NMI-based VPMU support
  2014-02-04 11:48   ` Jan Beulich
@ 2014-02-04 16:31     ` Boris Ostrovsky
  2014-02-04 16:41       ` Jan Beulich
  0 siblings, 1 reply; 48+ messages in thread
From: Boris Ostrovsky @ 2014-02-04 16:31 UTC (permalink / raw)
  To: Jan Beulich
  Cc: keir, suravee.suthikulpanit, andrew.cooper3, eddie.dong,
	dietmar.hahn, xen-devel, jun.nakajima

On 02/04/2014 06:48 AM, Jan Beulich wrote:
>>>> On 21.01.14 at 20:09, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote:
>> Add support for using NMIs as PMU interrupts.
>>
>> Most of processing is still performed by vpmu_do_interrupt(). However, since
>> certain operations are not NMI-safe we defer them to a softint that
>> vpmu_do_interrupt()
>> will schedule:
>> * For PV guests that would be send_guest_vcpu_virq() and
>> hvm_get_segment_register().
> Makes no sense - why would hvm_get_segment_register() be of any
> relevance to PV guests?

Poorly written explanation. What I meant here is that if we are in 
privileged profiling mode and the interrupted guest is an HVM one then 
we'll need to get CS for that guest, not for the guest doing profiling 
(i.e. dom0). I'll rewrite this.

>
> And then I'm still missing a reasonable level of analysis that the
> previously non-NMI-only interrupt handler is now safe to use in NMI
> context.

How about this?

With send_guest_vcpu_virq() and hvm_get_segment_register() for PV(H) and 
vlapic accesses for HVM moved to softint, the only routines/macros that
vpmu_do_interrupt() calls in NMI mode are:
* memcpy()
* querying domain type (is_XX_domain())
* guest_cpu_user_regs()
* XLAT_cpu_user_regs()
* raise_softirq()
* vcpu_vpmu()
* vpmu_ops->arch_vpmu_save()
* vpmu_ops->do_interrupt() (in the future for PVH support)

The latter two can only access PMU MSRs.

-boris

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 13/17] x86/VPMU: Add privileged PMU mode
  2014-02-04 16:13         ` Boris Ostrovsky
@ 2014-02-04 16:39           ` Jan Beulich
  0 siblings, 0 replies; 48+ messages in thread
From: Jan Beulich @ 2014-02-04 16:39 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: keir, suravee.suthikulpanit, andrew.cooper3, eddie.dong,
	dietmar.hahn, xen-devel, jun.nakajima

>>> On 04.02.14 at 17:13, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote:
> On 02/04/2014 11:01 AM, Jan Beulich wrote:
>>>>> On 04.02.14 at 16:53, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote:
>>> On 02/04/2014 06:31 AM, Jan Beulich wrote:
>>>>>>> On 21.01.14 at 20:08, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote:
>>>>> +            gregs = &v->arch.vpmu.xenpmu_data->pmu.r.regs;
>>>>> +            gregs->cs = cs;
>>>> And now you store a NUL selector (i.e. just the RPL bits) into the
>>>> output field?
>>>>>            }
>>>>> -        else if ( !is_control_domain(current->domain) &&
>>>>> -                 !is_idle_vcpu(current) )
>>>>> +        else
>>>>>            {
>>>>> -            /* PV guest */
>>>>> +            /* HVM guest */
>>>>> +            struct segment_register cs;
>>>>> +
>>>>>                gregs = guest_cpu_user_regs();
>>>>>                memcpy(&v->arch.vpmu.xenpmu_data->pmu.r.regs,
>>>>>                       gregs, sizeof(struct cpu_user_regs));
>>>>> +
>>>>> +            hvm_get_segment_register(current, x86_seg_cs, &cs);
>>>>> +            gregs = &v->arch.vpmu.xenpmu_data->pmu.r.regs;
>>>>> +            gregs->cs = cs.attr.fields.dpl;
>>>> And here too? If that's intended, a code comment is a must.
>>> This is an HVM-only path; PVH and PV don't go here, so cs should be valid.
>> Isn't the reply of mine a few lines up in PV code? And why would
>> the selector being wrong for HVM be okay?
> 
> 
> This clause is for privileged profiling: we are in PV clause even though 
> the interrupt is taken by an HVM guest.
> 
> The diff is somewhat difficult to follow so here is the flow:
> 
>     // in privileged mode 'v' is dom0's CPU.
>     if ( !is_hvm_domain(v->domain) || (vpmu_mode & XENPMU_MODE_PRIV) )
>     {
>          if ( !is_hvm_domain(current->domain) )
>          {
>               // either PV (including dom0) or Xen is interrupted
>          }
>          else
>          {
>               // This is the clause we are discussing. 'current' is HVM
>               hvm_get_segment_register(current, x86_seg_cs, &cs);
>          }
>          send_guest_vcpu_virq(v, VIRQ_XENPMU);
>          return 1;
>     }
> 
> So I think CS should be correct for the guest, no?

Honestly - I can't tell. All I can tell is that there's a bogus setting
of cs to 0 or 3 (depending on whether in kernel mode) or to the
dpl field of the descriptor read from the hardware - all of which
are wrong without a clear comment stating why doing it this way
is okay/acceptable/necessary-for-the-time-being.

So either drop this bogus code, or comment it properly.
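
For reference, if the ring-only encoding really is intended, the kind of
comment being asked for could read something like:

            /*
             * Only the privilege level of the interrupted context is
             * reported: the profiler merely needs to tell user and kernel
             * samples apart, so cs carries the ring (0/3 for PV, the CS
             * DPL for HVM) rather than a full selector value.
             */
            gregs->cs = cs;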

Jan

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 15/17] x86/VPMU: NMI-based VPMU support
  2014-02-04 16:31     ` Boris Ostrovsky
@ 2014-02-04 16:41       ` Jan Beulich
  2014-02-04 16:50         ` Boris Ostrovsky
  0 siblings, 1 reply; 48+ messages in thread
From: Jan Beulich @ 2014-02-04 16:41 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: keir, suravee.suthikulpanit, andrew.cooper3, eddie.dong,
	dietmar.hahn, xen-devel, jun.nakajima

>>> On 04.02.14 at 17:31, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote:
> On 02/04/2014 06:48 AM, Jan Beulich wrote:
>> And then I'm still missing a reasonable level of analysis that the
>> previously non-NMI-only interrupt handler is now safe to use in NMI
>> context.
> 
> How about this?

Looks okay, except ...

> With send_guest_vcpu_virq() and hvm_get_segment_register() for PV(H) and 
> vlapic accesses for HVM moved to softint, the only routines/macros that
> vpmu_do_interrupt() calls in NMI mode are:
> * memcpy()
> * querying domain type (is_XX_domain())
> * guest_cpu_user_regs()
> * XLAT_cpu_user_regs()
> * raise_softirq()
> * vcpu_vpmu()
> * vpmu_ops->arch_vpmu_save()
> * vpmu_ops->do_interrupt() (in the future for PVH support)
> 
> The latter two can only access PMU MSRs.

... that this additionally needs to exclude things like
{rd,wr}msr_safe() (i.e. stuff raising exceptions that normally
get recovered from).

Jan

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 16/17] x86/VPMU: Support for PVH guests
  2014-02-04 11:51   ` Jan Beulich
@ 2014-02-04 16:44     ` Boris Ostrovsky
  0 siblings, 0 replies; 48+ messages in thread
From: Boris Ostrovsky @ 2014-02-04 16:44 UTC (permalink / raw)
  To: Jan Beulich
  Cc: keir, suravee.suthikulpanit, andrew.cooper3, eddie.dong,
	dietmar.hahn, xen-devel, jun.nakajima

On 02/04/2014 06:51 AM, Jan Beulich wrote:
>>>> On 21.01.14 at 20:09, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote:
>> +        if ( is_pvh_domain(current->domain) && !(vpmu_mode & XENPMU_MODE_PRIV) )
>> +            if ( !vpmu->arch_vpmu_ops->do_interrupt(regs) )
>> +                return 0;
> Please fold chained if()s like this one.
>
>> @@ -237,7 +242,7 @@ int vpmu_do_interrupt(struct cpu_user_regs *regs)
>>               else if ( !is_control_domain(current->domain) &&
>>                         !is_idle_vcpu(current) )
>>               {
>> -                /* PV guest */
>> +                /* PV(H) guest */
> I would have expected PVH guests to use the HVM paths here, not
> the PV ones. Can you clarify why you do it the other way around?

I could go either way, but because PVH uses event channels for interrupts
I went with the PV path. To use the HVM route I'd need to send NMIs to the
guest, and that is currently not quite working.

-boris

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v4 15/17] x86/VPMU: NMI-based VPMU support
  2014-02-04 16:41       ` Jan Beulich
@ 2014-02-04 16:50         ` Boris Ostrovsky
  0 siblings, 0 replies; 48+ messages in thread
From: Boris Ostrovsky @ 2014-02-04 16:50 UTC (permalink / raw)
  To: Jan Beulich
  Cc: keir, suravee.suthikulpanit, andrew.cooper3, eddie.dong,
	dietmar.hahn, xen-devel, jun.nakajima

On 02/04/2014 11:41 AM, Jan Beulich wrote:
>>>> On 04.02.14 at 17:31, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote:
>> On 02/04/2014 06:48 AM, Jan Beulich wrote:
>>> And then I'm still missing a reasonable level of analysis that the
>>> previously non-NMI-only interrupt handler is now safe to use in NMI
>>> context.
>> How about this?
> Looks okay, except ...
>
>> With send_guest_vcpu_virq() and hvm_get_segment_register() for PV(H) and
>> vlapic accesses for HVM moved to softint, the only routines/macros that
>> vpmu_do_interrupt() calls in NMI mode are:
>> * memcpy()
>> * querying domain type (is_XX_domain())
>> * guest_cpu_user_regs()
>> * XLAT_cpu_user_regs()
>> * raise_softirq()
>> * vcpu_vpmu()
>> * vpmu_ops->arch_vpmu_save()
>> * vpmu_ops->do_interrupt() (in the future for PVH support)
>>
>> The latter two can only access PMU MSRs.
> ... that this additionally needs to exclude things like
> {rd,wr}msr_safe() (i.e. stuff raising exceptions that normally
> get recovered from).


I'll add that. And probably a comment in vendor-specific code to remind 
people that these routines need to be NMI-safe.


-boris

^ permalink raw reply	[flat|nested] 48+ messages in thread

end of thread, other threads:[~2014-02-04 16:50 UTC | newest]

Thread overview: 48+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-01-21 19:08 [PATCH v4 00/17] x86/PMU: Xen PMU PV support Boris Ostrovsky
2014-01-21 19:08 ` [PATCH v4 01/17] common/symbols: Export hypervisor symbols to privileged guest Boris Ostrovsky
2014-01-24 14:16   ` Jan Beulich
2014-01-21 19:08 ` [PATCH v4 02/17] x86/VPMU: Stop AMD counters when called from vpmu_save_force() Boris Ostrovsky
2014-01-21 19:08 ` [PATCH v4 03/17] x86/VPMU: Minor VPMU cleanup Boris Ostrovsky
2014-01-24 14:28   ` Jan Beulich
2014-01-21 19:08 ` [PATCH v4 04/17] intel/VPMU: Clean up Intel VPMU code Boris Ostrovsky
2014-01-21 19:08 ` [PATCH v4 05/17] x86/VPMU: Handle APIC_LVTPC accesses Boris Ostrovsky
2014-01-21 19:08 ` [PATCH v4 06/17] intel/VPMU: MSR_CORE_PERF_GLOBAL_CTRL should be initialized to zero Boris Ostrovsky
2014-01-21 19:08 ` [PATCH v4 07/17] x86/VPMU: Add public xenpmu.h Boris Ostrovsky
2014-01-24 14:54   ` Jan Beulich
2014-01-24 16:49     ` Boris Ostrovsky
2014-01-24 16:57       ` Jan Beulich
2014-01-21 19:08 ` [PATCH v4 08/17] x86/VPMU: Make vpmu not HVM-specific Boris Ostrovsky
2014-01-24 14:59   ` Jan Beulich
2014-01-21 19:08 ` [PATCH v4 09/17] x86/VPMU: Interface for setting PMU mode and flags Boris Ostrovsky
2014-01-24 15:10   ` Jan Beulich
2014-01-24 17:13     ` Boris Ostrovsky
2014-01-27  8:34       ` Jan Beulich
2014-01-27 15:20         ` Boris Ostrovsky
2014-01-27 15:29           ` Jan Beulich
2014-01-21 19:08 ` [PATCH v4 10/17] x86/VPMU: Initialize PMU for PV guests Boris Ostrovsky
2014-01-31 16:58   ` Jan Beulich
2014-01-21 19:08 ` [PATCH v4 11/17] x86/VPMU: Add support for PMU register handling on " Boris Ostrovsky
2014-02-04 11:14   ` Jan Beulich
2014-02-04 15:07     ` Boris Ostrovsky
2014-01-21 19:08 ` [PATCH v4 12/17] x86/VPMU: Handle PMU interrupts for " Boris Ostrovsky
2014-02-04 11:22   ` Jan Beulich
2014-02-04 15:26     ` Boris Ostrovsky
2014-02-04 15:50       ` Jan Beulich
2014-01-21 19:08 ` [PATCH v4 13/17] x86/VPMU: Add privileged PMU mode Boris Ostrovsky
2014-02-04 11:31   ` Jan Beulich
2014-02-04 15:53     ` Boris Ostrovsky
2014-02-04 16:01       ` Jan Beulich
2014-02-04 16:13         ` Boris Ostrovsky
2014-02-04 16:39           ` Jan Beulich
2014-01-21 19:08 ` [PATCH v4 14/17] x86/VPMU: Save VPMU state for PV guests during context switch Boris Ostrovsky
2014-02-04 11:38   ` Jan Beulich
2014-02-04 15:56     ` Boris Ostrovsky
2014-01-21 19:09 ` [PATCH v4 15/17] x86/VPMU: NMI-based VPMU support Boris Ostrovsky
2014-02-04 11:48   ` Jan Beulich
2014-02-04 16:31     ` Boris Ostrovsky
2014-02-04 16:41       ` Jan Beulich
2014-02-04 16:50         ` Boris Ostrovsky
2014-01-21 19:09 ` [PATCH v4 16/17] x86/VPMU: Support for PVH guests Boris Ostrovsky
2014-02-04 11:51   ` Jan Beulich
2014-02-04 16:44     ` Boris Ostrovsky
2014-01-21 19:09 ` [PATCH v4 17/17] x86/VPMU: Move VPMU files up from hvm/ directory Boris Ostrovsky
