* [PATCH v7 00/19] x86/PMU: Xen PMU PV(H) support
@ 2014-06-06 17:39 Boris Ostrovsky
  2014-06-06 17:39 ` [PATCH v7 01/19] common/symbols: Export hypervisor symbols to privileged guest Boris Ostrovsky
                   ` (18 more replies)
  0 siblings, 19 replies; 58+ messages in thread
From: Boris Ostrovsky @ 2014-06-06 17:39 UTC (permalink / raw)
  To: JBeulich, kevin.tian, dietmar.hahn, suravee.suthikulpanit
  Cc: tim, boris.ostrovsky, keir, jun.nakajima, xen-devel


Here is the seventh version of PV(H) PMU patches.

Changes in v7:

* When reading hypervisor symbols, make the caller pass the buffer length
  (as opposed to having this length be part of the API). Make the
  hypervisor buffer static, make xensyms_read() return a zero-length
  string on end-of-symbols. Make the 'type' field of xenpf_symdata a char,
  drop the compat_pf_symdata definition.
* Spread PVH support across patches as opposed to lumping it into a
  separate patch
* Rename vpmu_is_set_all() to vpmu_are_all_set()
* Split VPMU cleanup patch in two
* Use memmove when copying VMX guest and host MSRs
* Make padding of xen_arch_pmu's context union a constant that does not
  depend on arch context size.
* Set interface version to 0.1
* Check pointer validity in pvpmu_init/destroy()
* Fixed crash in core2_vpmu_dump()
* Fixed crash in vmx_add_msr()
* Break handling of Intel and AMD MSRs in traps.c into separate cases
* Pass full CS selector to guests
* Add lock in pvpmu init code to prevent potential race


Changes in v6:

* Two new patches:
  o Merge VMX MSR add/remove routines in vmcs.c (patch 5)
  o Merge VPMU read/write MSR routines in vpmu.c (patch 14)
* Check for pending NMI softirq after saving VPMU context to prevent a newly-scheduled
  guest from overwriting sampled_vcpu written by the de-scheduled VCPU.
* Keep track of enabled counters on Intel. This was removed in earlier patches and
  was a mistake. As a result of this change struct vpmu will have a pointer to private
  context data (i.e. data that is not exposed to a PV(H) guest). Use this private pointer
  on SVM as well for storing MSR bitmap status (it was unnecessarily exposed to PV guests
  earlier).
  Dropped Reviewed-by: and Tested-by: tags from patch 4 since it needs to be reviewed
  again (core2_vpmu_do_wrmsr() routine, mostly)
* Replaced references to dom0 with hardware_domain (and is_control_domain with
  is_hardware_domain for consistency)
* Prevent non-privileged domains from reading PMU MSRs in VPMU_PRIV_MODE
* Reverted unnecessary changes in vpmu_initialise()'s switch statement
* Fixed comment in vpmu_do_interrupt


Changes in v5:

* Dropped patch number 2 ("Stop AMD counters when called from vpmu_save_force()")
  as no longer needed
* Added patch number 2 that marks context as loaded before PMU registers are
  loaded. This prevents a situation where a PMU interrupt may occur while the
  context is still viewed as not loaded. (This is really a bug fix for existing
  VPMU code)
* Renamed xenpmu.h files to pmu.h
* More careful use of is_pv_domain(), is_hvm_domain(), is_pvh_domain() and
  has_hvm_container_domain(). Also explicitly disabled support for PVH until
  patch 16 to make the distinction between usage of the above macros more clear.
* Added support for disabling VPMU at runtime.
* Disable VPMUs for non-privileged domains when switching to privileged
  profiling mode
* Added ARM stub for xen_arch_pmu_t
* Separated vpmu_mode from vpmu_features
* Moved CS register query to make sure we use appropriate query mechanism for
  various guest types.
* LVTPC is now set from value in shared area, not copied from dom0
* Various code and comments cleanup as suggested by Jan.

Changes in v4:

* Added support for PVH guests:
  o changes in pvpmu_init() to accommodate both PV and PVH guests, still in patch 10
  o more careful use of is_hvm_domain
  o Additional patch (16)
* Moved HVM interrupt handling out of vpmu_do_interrupt() for NMI-safe handling
* Fixed dom0's VCPU selection in privileged mode
* Added a cast to cpu_user_regs_t in the register copy for 32-bit PV guests in
  vpmu_do_interrupt. (don't want to expose compat_cpu_user_regs in a public header)
* Renamed public structures by prefixing them with "xen_"
* Added an entry for xenpf_symdata in xlat.lst
* Fixed pv_cpuid check for vpmu-specific cpuid adjustments
* Various code style fixes
* Eliminated anonymous unions
* Added more verbiage to NMI patch description


Changes in v3:

* Moved PMU MSR banks out from architectural context data structures to allow
for future expansion without protocol changes
* PMU interrupts can be either NMIs or regular vector interrupts (the latter
is the default)
* Context is now marked as PMU_CACHED by the hypervisor code to avoid certain
race conditions with the guest
* Fixed races with PV guest in MSR access handlers
* More Intel VPMU cleanup
* Moved NMI-unsafe code from NMI handler
* Dropped changes to vcpu->is_running
* Added LVTPC apic handling (cached for PV guests)
* Separated privileged profiling mode into a standalone patch
* Separated NMI handling into a standalone patch


Changes in v2:

* Xen symbols are exported as a data structure (as opposed to a set of formatted
strings in v1). Even though one symbol per hypercall is returned, performance
appears to be acceptable: reading the whole file from dom0 userland takes on average
about twice as long as reading /proc/kallsyms
* More cleanup of Intel VPMU code to simplify publicly exported structures
* There are architecture-independent and x86-specific public include files (ARM
has a stub)
* General cleanup of public include files to make them more presentable (and
to make auto doc generation better)
* Setting of vcpu->is_running is now done on ARM in schedule_tail as well (making
changes to common/schedule.c architecture-independent). Note that this is not
tested since I don't have access to ARM hardware.
* PCPU ID of interrupted processor is now passed to PV guest


The following patch series adds PMU support in Xen for PV(H)
guests. There is a companion patchset for the Linux kernel. In addition,
another set of changes will be provided (later) for userland perf
code.

This version has the following limitations:
* For accurate profiling of dom0/Xen, dom0 VCPUs should be pinned.
* Hypervisor code is only profiled on processors that have running dom0 VCPUs
on them.
* No backtrace support.
* Will fail to load under XSM: we ran out of bits in the permissions vector and
this needs to be fixed separately

A few notes that may help reviewing:

* A shared data structure (xenpmu_data_t) between each PV VCPU and hypervisor
CPU is used for passing register values as well as PMU state at the time of the
PMU interrupt.
* PMU interrupts are taken by the hypervisor either as NMIs or as regular vector
interrupts for both HVM and PV(H). The interrupts are sent as NMIs to HVM guests
and as virtual interrupts to PV(H) guests.
* A PV guest's interrupt handler does not read/write PMU MSRs directly. Instead, it
accesses xenpmu_data_t and flushes it to HW before returning.
* PMU mode is controlled at runtime via /sys/hypervisor/pmu/pmu/{pmu_mode,pmu_flags}
in addition to the 'vpmu' boot option (which is preserved for backward compatibility).
The following modes are provided:
  * disable: VPMU is off
  * enable: VPMU is on. Guests can profile themselves, dom0 profiles itself and Xen
  * priv_enable: dom0-only profiling. dom0 collects samples for everyone. Sampling
    in guests is suspended.
* /proc/xen/xensyms file exports the hypervisor's symbols to dom0 (similar to
/proc/kallsyms)
* VPMU infrastructure is now used for HVM, PV and PVH and therefore has been moved
up from the hvm subtree



Boris Ostrovsky (19):
  common/symbols: Export hypervisor symbols to privileged guest
  VPMU: Mark context LOADED before registers are loaded
  x86/VPMU: Set MSR bitmaps only for HVM/PVH guests
  x86/VPMU: Make vpmu macros a bit more efficient
  intel/VPMU: Clean up Intel VPMU code
  vmx: Merge MSR management routines
  x86/VPMU: Handle APIC_LVTPC accesses
  intel/VPMU: MSR_CORE_PERF_GLOBAL_CTRL should be initialized to zero
  x86/VPMU: Add public xenpmu.h
  x86/VPMU: Make vpmu not HVM-specific
  x86/VPMU: Interface for setting PMU mode and flags
  x86/VPMU: Initialize PMU for PV guests
  x86/VPMU: Add support for PMU register handling on PV guests
  x86/VPMU: Handle PMU interrupts for PV guests
  x86/VPMU: Merge vpmu_rdmsr and vpmu_wrmsr
  x86/VPMU: Add privileged PMU mode
  x86/VPMU: Save VPMU state for PV guests during context switch
  x86/VPMU: NMI-based VPMU support
  x86/VPMU: Move VPMU files up from hvm/ directory

 xen/arch/x86/Makefile                    |   1 +
 xen/arch/x86/domain.c                    |  15 +-
 xen/arch/x86/hvm/Makefile                |   1 -
 xen/arch/x86/hvm/hvm.c                   |   3 +-
 xen/arch/x86/hvm/svm/Makefile            |   1 -
 xen/arch/x86/hvm/svm/svm.c               |  17 +-
 xen/arch/x86/hvm/svm/vpmu.c              | 494 ----------------
 xen/arch/x86/hvm/vlapic.c                |   5 +-
 xen/arch/x86/hvm/vmx/Makefile            |   1 -
 xen/arch/x86/hvm/vmx/vmcs.c              | 113 ++--
 xen/arch/x86/hvm/vmx/vmx.c               |  24 +-
 xen/arch/x86/hvm/vmx/vpmu_core2.c        | 935 ------------------------------
 xen/arch/x86/hvm/vpmu.c                  | 266 ---------
 xen/arch/x86/oprofile/op_model_ppro.c    |   8 +-
 xen/arch/x86/platform_hypercall.c        |  21 +
 xen/arch/x86/traps.c                     |  56 +-
 xen/arch/x86/vpmu.c                      | 750 ++++++++++++++++++++++++
 xen/arch/x86/vpmu_amd.c                  | 512 +++++++++++++++++
 xen/arch/x86/vpmu_intel.c                | 954 +++++++++++++++++++++++++++++++
 xen/arch/x86/x86_64/compat/entry.S       |   4 +
 xen/arch/x86/x86_64/entry.S              |   4 +
 xen/common/event_channel.c               |   1 +
 xen/common/symbols.c                     |  54 ++
 xen/include/asm-x86/domain.h             |   2 +
 xen/include/asm-x86/hvm/vcpu.h           |   3 -
 xen/include/asm-x86/hvm/vmx/vmcs.h       |  10 +-
 xen/include/asm-x86/hvm/vmx/vpmu_core2.h |  51 --
 xen/include/asm-x86/hvm/vpmu.h           | 104 ----
 xen/include/asm-x86/vpmu.h               | 102 ++++
 xen/include/public/arch-arm.h            |   3 +
 xen/include/public/arch-x86/pmu.h        |  58 ++
 xen/include/public/platform.h            |  19 +
 xen/include/public/pmu.h                 |  99 ++++
 xen/include/public/xen.h                 |   2 +
 xen/include/xen/hypercall.h              |   4 +
 xen/include/xen/softirq.h                |   1 +
 xen/include/xen/symbols.h                |   3 +
 xen/include/xlat.lst                     |   1 +
 38 files changed, 2779 insertions(+), 1923 deletions(-)
 delete mode 100644 xen/arch/x86/hvm/svm/vpmu.c
 delete mode 100644 xen/arch/x86/hvm/vmx/vpmu_core2.c
 delete mode 100644 xen/arch/x86/hvm/vpmu.c
 create mode 100644 xen/arch/x86/vpmu.c
 create mode 100644 xen/arch/x86/vpmu_amd.c
 create mode 100644 xen/arch/x86/vpmu_intel.c
 delete mode 100644 xen/include/asm-x86/hvm/vmx/vpmu_core2.h
 delete mode 100644 xen/include/asm-x86/hvm/vpmu.h
 create mode 100644 xen/include/asm-x86/vpmu.h
 create mode 100644 xen/include/public/arch-x86/pmu.h
 create mode 100644 xen/include/public/pmu.h

-- 
1.8.1.4

* [PATCH v7 01/19] common/symbols: Export hypervisor symbols to privileged guest
  2014-06-06 17:39 [PATCH v7 00/19] x86/PMU: Xen PMU PV(H) support Boris Ostrovsky
@ 2014-06-06 17:39 ` Boris Ostrovsky
  2014-06-06 17:57   ` Andrew Cooper
  2014-06-10 11:31   ` Jan Beulich
  2014-06-06 17:39 ` [PATCH v7 02/19] VPMU: Mark context LOADED before registers are loaded Boris Ostrovsky
                   ` (17 subsequent siblings)
  18 siblings, 2 replies; 58+ messages in thread
From: Boris Ostrovsky @ 2014-06-06 17:39 UTC (permalink / raw)
  To: JBeulich, kevin.tian, dietmar.hahn, suravee.suthikulpanit
  Cc: tim, boris.ostrovsky, keir, jun.nakajima, xen-devel

Export Xen's symbols as {<address><type><name>} triplets via a new XENPF_get_symbol
hypercall. (A usage sketch follows the diffstat below.)

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
Tested-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
---
 xen/arch/x86/platform_hypercall.c | 21 +++++++++++++++
 xen/common/symbols.c              | 54 +++++++++++++++++++++++++++++++++++++++
 xen/include/public/platform.h     | 19 ++++++++++++++
 xen/include/xen/symbols.h         |  3 +++
 xen/include/xlat.lst              |  1 +
 5 files changed, 98 insertions(+)
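
As a reviewing aid, here is a minimal sketch of how a dom0 consumer could walk
the symbol table with this hypercall. xensym_next() is a hypothetical wrapper
around the XENPF_get_symbol platform op (the real plumbing lives in the
companion Linux patches); the termination checks follow the xensyms_read()
semantics below, which returns a zero-length name at end-of-symbols and
otherwise advances the symbol number.

    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical wrapper around the XENPF_get_symbol platform op. */
    int xensym_next(uint32_t *symnum, char *type, uint64_t *address,
                    char *name, uint32_t namelen);

    static void dump_xen_symbols(void)
    {
        uint32_t symnum = 0;
        uint64_t addr;
        char name[128];
        char type;

        for ( ;; )
        {
            uint32_t prev = symnum;

            if ( xensym_next(&symnum, &type, &addr, name, sizeof(name)) < 0 )
                break;                      /* -ERANGE or -EFAULT */
            if ( name[0] == '\0' || symnum == prev )
                break;                      /* end of symbol table */
            printf("%016"PRIx64" %c %s\n", addr, type, name);
        }
    }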

diff --git a/xen/arch/x86/platform_hypercall.c b/xen/arch/x86/platform_hypercall.c
index 2162811..2c602e7 100644
--- a/xen/arch/x86/platform_hypercall.c
+++ b/xen/arch/x86/platform_hypercall.c
@@ -23,6 +23,7 @@
 #include <xen/cpu.h>
 #include <xen/pmstat.h>
 #include <xen/irq.h>
+#include <xen/symbols.h>
 #include <asm/current.h>
 #include <public/platform.h>
 #include <acpi/cpufreq/processor_perf.h>
@@ -601,6 +602,26 @@ ret_t do_platform_op(XEN_GUEST_HANDLE_PARAM(xen_platform_op_t) u_xenpf_op)
     }
     break;
 
+    case XENPF_get_symbol:
+    {
+        static char name[KSYM_NAME_LEN + 1];
+        XEN_GUEST_HANDLE(char) nameh;
+        uint32_t namelen;
+
+        guest_from_compat_handle(nameh, op->u.symdata.name);
+
+        ret = xensyms_read(&op->u.symdata.symnum, &op->u.symdata.type,
+                           &op->u.symdata.address, name);
+
+        namelen = min((uint32_t)strlen(name), op->u.symdata.namelen - 1);
+        name[namelen] = '\0'; /* if strlen(name) >= op->u.symdata.namelen */
+        if ( !ret && copy_to_guest(nameh, name, namelen) )
+            ret = -EFAULT;
+        if ( !ret && __copy_field_to_guest(u_xenpf_op, op, u.symdata) )
+            ret = -EFAULT;
+    }
+    break;
+
     default:
         ret = -ENOSYS;
         break;
diff --git a/xen/common/symbols.c b/xen/common/symbols.c
index bc2fde6..2c0942d 100644
--- a/xen/common/symbols.c
+++ b/xen/common/symbols.c
@@ -17,6 +17,8 @@
 #include <xen/lib.h>
 #include <xen/string.h>
 #include <xen/spinlock.h>
+#include <public/platform.h>
+#include <xen/guest_access.h>
 
 #ifdef SYMBOLS_ORIGIN
 extern const unsigned int symbols_offsets[1];
@@ -148,3 +150,55 @@ const char *symbols_lookup(unsigned long addr,
     *offset = addr - symbols_address(low);
     return namebuf;
 }
+
+/*
+ * Get symbol type information. This is encoded as a single char at the
+ * beginning of the symbol name.
+ */
+static char symbols_get_symbol_type(unsigned int off)
+{
+    /*
+     * Get just the first code, look it up in the token table,
+     * and return the first char from this token.
+     */
+    return symbols_token_table[symbols_token_index[symbols_names[off + 1]]];
+}
+
+int xensyms_read(uint32_t *symnum, char *type,
+                 uint64_t *address, char *name)
+{
+    /*
+     * Symbols are most likely accessed sequentially so we remember position
+     * from previous read. This can help us avoid the extra call to
+     * get_symbol_offset().
+     */
+    static uint64_t next_symbol, next_offset;
+    static DEFINE_SPINLOCK(symbols_mutex);
+
+    if ( *symnum > symbols_num_syms )
+        return -ERANGE;
+    if ( *symnum == symbols_num_syms )
+    {
+        /* No more symbols */
+        name[0] = '\0';
+        return 0;
+    }
+
+    spin_lock(&symbols_mutex);
+
+    if ( *symnum == 0 )
+        next_offset = next_symbol = 0;
+    if ( next_symbol != *symnum )
+        /* Non-sequential access */
+        next_offset = get_symbol_offset(*symnum);
+
+    *type = symbols_get_symbol_type(next_offset);
+    next_offset = symbols_expand_symbol(next_offset, name);
+    *address = symbols_offsets[*symnum] + SYMBOLS_ORIGIN;
+
+    next_symbol = ++*symnum;
+
+    spin_unlock(&symbols_mutex);
+
+    return 0;
+}
diff --git a/xen/include/public/platform.h b/xen/include/public/platform.h
index 053b9fa..afe301c 100644
--- a/xen/include/public/platform.h
+++ b/xen/include/public/platform.h
@@ -527,6 +527,24 @@ struct xenpf_core_parking {
 typedef struct xenpf_core_parking xenpf_core_parking_t;
 DEFINE_XEN_GUEST_HANDLE(xenpf_core_parking_t);
 
+#define XENPF_get_symbol   61
+struct xenpf_symdata {
+    /* IN variables */
+    uint32_t namelen; /* size of 'name' buffer */
+
+    /* IN/OUT variables */
+    uint32_t symnum;  /* IN:  Symbol to read                            */
+                      /* OUT: Next available symbol. If same as IN then */
+                      /*      we reached the end                        */
+
+    /* OUT variables */
+    char type;
+    uint64_t address;
+    XEN_GUEST_HANDLE(char) name;
+};
+typedef struct xenpf_symdata xenpf_symdata_t;
+DEFINE_XEN_GUEST_HANDLE(xenpf_symdata_t);
+
 /*
  * ` enum neg_errnoval
  * ` HYPERVISOR_platform_op(const struct xen_platform_op*);
@@ -553,6 +571,7 @@ struct xen_platform_op {
         struct xenpf_cpu_hotadd        cpu_add;
         struct xenpf_mem_hotadd        mem_add;
         struct xenpf_core_parking      core_parking;
+        struct xenpf_symdata           symdata;
         uint8_t                        pad[128];
     } u;
 };
diff --git a/xen/include/xen/symbols.h b/xen/include/xen/symbols.h
index 87cd77d..1fa0537 100644
--- a/xen/include/xen/symbols.h
+++ b/xen/include/xen/symbols.h
@@ -11,4 +11,7 @@ const char *symbols_lookup(unsigned long addr,
                            unsigned long *offset,
                            char *namebuf);
 
+int xensyms_read(uint32_t *symnum, char *type,
+                 uint64_t *address, char *name);
+
 #endif /*_XEN_SYMBOLS_H*/
diff --git a/xen/include/xlat.lst b/xen/include/xlat.lst
index 9a35dd7..c8fafef 100644
--- a/xen/include/xlat.lst
+++ b/xen/include/xlat.lst
@@ -86,6 +86,7 @@
 ?	processor_px			platform.h
 !	psd_package			platform.h
 ?	xenpf_enter_acpi_sleep		platform.h
+!	xenpf_symdata			platform.h
 ?	xenpf_pcpuinfo			platform.h
 ?	xenpf_pcpu_version		platform.h
 !	sched_poll			sched.h
-- 
1.8.1.4

* [PATCH v7 02/19] VPMU: Mark context LOADED before registers are loaded
  2014-06-06 17:39 [PATCH v7 00/19] x86/PMU: Xen PMU PV(H) support Boris Ostrovsky
  2014-06-06 17:39 ` [PATCH v7 01/19] common/symbols: Export hypervisor symbols to privileged guest Boris Ostrovsky
@ 2014-06-06 17:39 ` Boris Ostrovsky
  2014-06-06 17:59   ` Andrew Cooper
  2014-06-06 17:39 ` [PATCH v7 03/19] x86/VPMU: Set MSR bitmaps only for HVM/PVH guests Boris Ostrovsky
                   ` (16 subsequent siblings)
  18 siblings, 1 reply; 58+ messages in thread
From: Boris Ostrovsky @ 2014-06-06 17:39 UTC (permalink / raw)
  To: JBeulich, kevin.tian, dietmar.hahn, suravee.suthikulpanit
  Cc: tim, boris.ostrovsky, keir, jun.nakajima, xen-devel

Because a PMU interrupt may be generated as soon as PMU registers are loaded (or,
more precisely, as soon as the HW PMU is "armed"), we don't want to delay marking
the context as LOADED until after the registers are loaded. Otherwise, during
interrupt handling VPMU_CONTEXT_LOADED may not be set and this could be confusing.

(Technically, only SVM needs this change right now since VMX will "arm" the PMU
later, during VMRUN, when the global control register is loaded from the VMCS.
However, both AMD and Intel code will require this patch when we introduce PV VPMU.)

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
Tested-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
---
 xen/arch/x86/hvm/svm/vpmu.c       | 2 ++
 xen/arch/x86/hvm/vmx/vpmu_core2.c | 2 ++
 xen/arch/x86/hvm/vpmu.c           | 3 +--
 3 files changed, 5 insertions(+), 2 deletions(-)
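
Schematically, the window being closed (illustration only, not code from the
patch):

    /* Before: on SVM the counters may fire here ...                      */
    context_load(v);
    vpmu_set(vpmu, VPMU_CONTEXT_LOADED);  /* ... with LOADED still clear  */

    /* After: the flag is already set by the time the hardware is armed  */
    vpmu_set(vpmu, VPMU_CONTEXT_LOADED);
    context_load(v);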

diff --git a/xen/arch/x86/hvm/svm/vpmu.c b/xen/arch/x86/hvm/svm/vpmu.c
index 66a3815..3ac7d53 100644
--- a/xen/arch/x86/hvm/svm/vpmu.c
+++ b/xen/arch/x86/hvm/svm/vpmu.c
@@ -203,6 +203,8 @@ static void amd_vpmu_load(struct vcpu *v)
         return;
     }
 
+    vpmu_set(vpmu, VPMU_CONTEXT_LOADED);
+
     context_load(v);
 }
 
diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c
index 3129ebd..ccd14d9 100644
--- a/xen/arch/x86/hvm/vmx/vpmu_core2.c
+++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c
@@ -369,6 +369,8 @@ static void core2_vpmu_load(struct vcpu *v)
     if ( vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) )
         return;
 
+    vpmu_set(vpmu, VPMU_CONTEXT_LOADED);
+
     __core2_vpmu_load(v);
 }
 
diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c
index 21fbaba..63765fa 100644
--- a/xen/arch/x86/hvm/vpmu.c
+++ b/xen/arch/x86/hvm/vpmu.c
@@ -211,10 +211,9 @@ void vpmu_load(struct vcpu *v)
     if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->arch_vpmu_load )
     {
         apic_write_around(APIC_LVTPC, vpmu->hw_lapic_lvtpc);
+        /* Arch code needs to set VPMU_CONTEXT_LOADED */
         vpmu->arch_vpmu_ops->arch_vpmu_load(v);
     }
-
-    vpmu_set(vpmu, VPMU_CONTEXT_LOADED);
 }
 
 void vpmu_initialise(struct vcpu *v)
-- 
1.8.1.4

* [PATCH v7 03/19] x86/VPMU: Set MSR bitmaps only for HVM/PVH guests
  2014-06-06 17:39 [PATCH v7 00/19] x86/PMU: Xen PMU PV(H) support Boris Ostrovsky
  2014-06-06 17:39 ` [PATCH v7 01/19] common/symbols: Export hypervisor symbols to privileged guest Boris Ostrovsky
  2014-06-06 17:39 ` [PATCH v7 02/19] VPMU: Mark context LOADED before registers are loaded Boris Ostrovsky
@ 2014-06-06 17:39 ` Boris Ostrovsky
  2014-06-06 18:02   ` Andrew Cooper
  2014-06-06 17:40 ` [PATCH v7 04/19] x86/VPMU: Make vpmu macros a bit more efficient Boris Ostrovsky
                   ` (15 subsequent siblings)
  18 siblings, 1 reply; 58+ messages in thread
From: Boris Ostrovsky @ 2014-06-06 17:39 UTC (permalink / raw)
  To: JBeulich, kevin.tian, dietmar.hahn, suravee.suthikulpanit
  Cc: tim, boris.ostrovsky, keir, jun.nakajima, xen-devel

In preparation for making VPMU code shared with PV, make sure that we update
MSR bitmaps only for HVM/PVH guests.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
---
 xen/arch/x86/hvm/svm/vpmu.c       | 21 +++++++++++++--------
 xen/arch/x86/hvm/vmx/vpmu_core2.c |  8 +++++---
 2 files changed, 18 insertions(+), 11 deletions(-)
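
The recurring shape of the change, lifted from the AMD hunks below: every
MSR-bitmap manipulation gains a container-domain check, so that PV guests,
which have no VMX/SVM MSR bitmaps, skip it:

    if ( has_hvm_container_domain(v->domain) && ctx->msr_bitmap_set )
        amd_vpmu_unset_msr_bitmap(v);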

diff --git a/xen/arch/x86/hvm/svm/vpmu.c b/xen/arch/x86/hvm/svm/vpmu.c
index 3ac7d53..f76c81e 100644
--- a/xen/arch/x86/hvm/svm/vpmu.c
+++ b/xen/arch/x86/hvm/svm/vpmu.c
@@ -244,7 +244,8 @@ static int amd_vpmu_save(struct vcpu *v)
 
     context_save(v);
 
-    if ( !vpmu_is_set(vpmu, VPMU_RUNNING) && ctx->msr_bitmap_set )
+    if ( !vpmu_is_set(vpmu, VPMU_RUNNING) &&
+         has_hvm_container_domain(v->domain) && ctx->msr_bitmap_set )
         amd_vpmu_unset_msr_bitmap(v);
 
     return 1;
@@ -284,8 +285,9 @@ static int amd_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
 
     /* For all counters, enable guest only mode for HVM guest */
-    if ( (get_pmu_reg_type(msr) == MSR_TYPE_CTRL) &&
-        !(is_guest_mode(msr_content)) )
+    if ( has_hvm_container_domain(v->domain) &&
+         (get_pmu_reg_type(msr) == MSR_TYPE_CTRL) &&
+         !(is_guest_mode(msr_content)) )
     {
         set_guest_mode(msr_content);
     }
@@ -300,8 +302,9 @@ static int amd_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
         apic_write(APIC_LVTPC, PMU_APIC_VECTOR);
         vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR;
 
-        if ( !((struct amd_vpmu_context *)vpmu->context)->msr_bitmap_set )
-            amd_vpmu_set_msr_bitmap(v);
+        if ( has_hvm_container_domain(v->domain) &&
+             !((struct amd_vpmu_context *)vpmu->context)->msr_bitmap_set )
+             amd_vpmu_set_msr_bitmap(v);
     }
 
     /* stop saving & restore if guest stops first counter */
@@ -311,8 +314,9 @@ static int amd_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
         apic_write(APIC_LVTPC, PMU_APIC_VECTOR | APIC_LVT_MASKED);
         vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR | APIC_LVT_MASKED;
         vpmu_reset(vpmu, VPMU_RUNNING);
-        if ( ((struct amd_vpmu_context *)vpmu->context)->msr_bitmap_set )
-            amd_vpmu_unset_msr_bitmap(v);
+        if ( has_hvm_container_domain(v->domain) &&
+             ((struct amd_vpmu_context *)vpmu->context)->msr_bitmap_set )
+             amd_vpmu_unset_msr_bitmap(v);
         release_pmu_ownship(PMU_OWNER_HVM);
     }
 
@@ -403,7 +407,8 @@ static void amd_vpmu_destroy(struct vcpu *v)
     if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
         return;
 
-    if ( ((struct amd_vpmu_context *)vpmu->context)->msr_bitmap_set )
+    if ( has_hvm_container_domain(v->domain) &&
+         ((struct amd_vpmu_context *)vpmu->context)->msr_bitmap_set )
         amd_vpmu_unset_msr_bitmap(v);
 
     xfree(vpmu->context);
diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c
index ccd14d9..f5f249f 100644
--- a/xen/arch/x86/hvm/vmx/vpmu_core2.c
+++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c
@@ -335,7 +335,8 @@ static int core2_vpmu_save(struct vcpu *v)
     __core2_vpmu_save(v);
 
     /* Unset PMU MSR bitmap to trap lazy load. */
-    if ( !vpmu_is_set(vpmu, VPMU_RUNNING) && cpu_has_vmx_msr_bitmap )
+    if ( !vpmu_is_set(vpmu, VPMU_RUNNING) &&
+         has_hvm_container_domain(v->domain) && cpu_has_vmx_msr_bitmap)
         core2_vpmu_unset_msr_bitmap(v->arch.hvm_vmx.msr_bitmap);
 
     return 1;
@@ -448,7 +449,8 @@ static int core2_vpmu_msr_common_check(u32 msr_index, int *type, int *index)
     {
         __core2_vpmu_load(current);
         vpmu_set(vpmu, VPMU_CONTEXT_LOADED);
-        if ( cpu_has_vmx_msr_bitmap )
+        if ( has_hvm_container_domain(current->domain) &&
+             cpu_has_vmx_msr_bitmap )
             core2_vpmu_set_msr_bitmap(current->arch.hvm_vmx.msr_bitmap);
     }
     return 1;
@@ -815,7 +817,7 @@ static void core2_vpmu_destroy(struct vcpu *v)
         return;
     xfree(core2_vpmu_cxt->pmu_enable);
     xfree(vpmu->context);
-    if ( cpu_has_vmx_msr_bitmap )
+    if ( has_hvm_container_domain(v->domain) && cpu_has_vmx_msr_bitmap )
         core2_vpmu_unset_msr_bitmap(v->arch.hvm_vmx.msr_bitmap);
     release_pmu_ownship(PMU_OWNER_HVM);
     vpmu_reset(vpmu, VPMU_CONTEXT_ALLOCATED);
-- 
1.8.1.4

* [PATCH v7 04/19] x86/VPMU: Make vpmu macros a bit more efficient
  2014-06-06 17:39 [PATCH v7 00/19] x86/PMU: Xen PMU PV(H) support Boris Ostrovsky
                   ` (2 preceding siblings ...)
  2014-06-06 17:39 ` [PATCH v7 03/19] x86/VPMU: Set MSR bitmaps only for HVM/PVH guests Boris Ostrovsky
@ 2014-06-06 17:40 ` Boris Ostrovsky
  2014-06-06 18:13   ` Andrew Cooper
  2014-06-06 17:40 ` [PATCH v7 05/19] intel/VPMU: Clean up Intel VPMU code Boris Ostrovsky
                   ` (14 subsequent siblings)
  18 siblings, 1 reply; 58+ messages in thread
From: Boris Ostrovsky @ 2014-06-06 17:40 UTC (permalink / raw)
  To: JBeulich, kevin.tian, dietmar.hahn, suravee.suthikulpanit
  Cc: tim, boris.ostrovsky, keir, jun.nakajima, xen-devel

Update macros that modify VPMU flags to allow changing multiple bits at once.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
Tested-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
---
 xen/arch/x86/hvm/vmx/vpmu_core2.c | 5 +----
 xen/arch/x86/hvm/vpmu.c           | 3 +--
 xen/include/asm-x86/hvm/vpmu.h    | 9 +++++----
 3 files changed, 7 insertions(+), 10 deletions(-)
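
The distinction matters when several bits are passed at once (illustrative
snippet; per the macro definitions below, vpmu_is_set() has OR semantics while
the new vpmu_are_all_set() has AND semantics):

    /* Non-zero if ANY of the requested bits is set: */
    int any = vpmu_is_set(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED);

    /* Non-zero only if ALL of the requested bits are set: */
    int all = vpmu_are_all_set(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED);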

diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c
index f5f249f..bf3cd33 100644
--- a/xen/arch/x86/hvm/vmx/vpmu_core2.c
+++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c
@@ -326,10 +326,7 @@ static int core2_vpmu_save(struct vcpu *v)
 {
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
 
-    if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_SAVE) )
-        return 0;
-
-    if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) ) 
+    if ( !vpmu_are_all_set(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED) )
         return 0;
 
     __core2_vpmu_save(v);
diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c
index 63765fa..d456aa6 100644
--- a/xen/arch/x86/hvm/vpmu.c
+++ b/xen/arch/x86/hvm/vpmu.c
@@ -143,8 +143,7 @@ void vpmu_save(struct vcpu *v)
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
     int pcpu = smp_processor_id();
 
-    if ( !(vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) &&
-           vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED)) )
+    if ( !vpmu_are_all_set(vpmu, VPMU_CONTEXT_ALLOCATED | VPMU_CONTEXT_LOADED) )
        return;
 
     vpmu->last_pcpu = pcpu;
diff --git a/xen/include/asm-x86/hvm/vpmu.h b/xen/include/asm-x86/hvm/vpmu.h
index 40f63fb..7cd7266 100644
--- a/xen/include/asm-x86/hvm/vpmu.h
+++ b/xen/include/asm-x86/hvm/vpmu.h
@@ -81,10 +81,11 @@ struct vpmu_struct {
 #define VPMU_CPU_HAS_BTS                    0x200 /* Has Branch Trace Store */
 
 
-#define vpmu_set(_vpmu, _x)    ((_vpmu)->flags |= (_x))
-#define vpmu_reset(_vpmu, _x)  ((_vpmu)->flags &= ~(_x))
-#define vpmu_is_set(_vpmu, _x) ((_vpmu)->flags & (_x))
-#define vpmu_clear(_vpmu)      ((_vpmu)->flags = 0)
+#define vpmu_set(_vpmu, _x)         ((_vpmu)->flags |= (_x))
+#define vpmu_reset(_vpmu, _x)       ((_vpmu)->flags &= ~(_x))
+#define vpmu_is_set(_vpmu, _x)      ((_vpmu)->flags & (_x))
+#define vpmu_are_all_set(_vpmu, _x) (((_vpmu)->flags & (_x)) == (_x))
+#define vpmu_clear(_vpmu)           ((_vpmu)->flags = 0)
 
 int vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content);
 int vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content);
-- 
1.8.1.4

* [PATCH v7 05/19] intel/VPMU: Clean up Intel VPMU code
  2014-06-06 17:39 [PATCH v7 00/19] x86/PMU: Xen PMU PV(H) support Boris Ostrovsky
                   ` (3 preceding siblings ...)
  2014-06-06 17:40 ` [PATCH v7 04/19] x86/VPMU: Make vpmu macros a bit more efficient Boris Ostrovsky
@ 2014-06-06 17:40 ` Boris Ostrovsky
  2014-06-06 18:34   ` Andrew Cooper
  2014-06-06 17:40 ` [PATCH v7 06/19] vmx: Merge MSR management routines Boris Ostrovsky
                   ` (13 subsequent siblings)
  18 siblings, 1 reply; 58+ messages in thread
From: Boris Ostrovsky @ 2014-06-06 17:40 UTC (permalink / raw)
  To: JBeulich, kevin.tian, dietmar.hahn, suravee.suthikulpanit
  Cc: tim, boris.ostrovsky, keir, jun.nakajima, xen-devel

Remove struct pmumsr and core2_pmu_enable. Replace static MSR structures with
fields in core2_vpmu_context.

Call core2_get_pmc_count() once, during initialization.

Properly clean up when core2_vpmu_alloc_resource() fails and add routines
to remove MSRs from VMCS.


Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
---
 xen/arch/x86/hvm/vmx/vmcs.c              |  51 +++++
 xen/arch/x86/hvm/vmx/vpmu_core2.c        | 337 ++++++++++++++-----------------
 xen/include/asm-x86/hvm/vmx/vmcs.h       |   2 +
 xen/include/asm-x86/hvm/vmx/vpmu_core2.h |  19 --
 4 files changed, 205 insertions(+), 204 deletions(-)
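
A note that may help when reading the wrmsr changes: the new enabled_cntrs
field mirrors the MSR_CORE_PERF_GLOBAL_CTRL bit layout (sketch matching the
hunks below):

    /*
     * enabled_cntrs, PERF_GLOBAL_CTRL format:
     *   bits [arch_pmc_cnt-1:0]       - general-purpose counter i enabled
     *   bits [32+fixed_pmc_cnt-1:32]  - fixed-function counter i enabled
     */
    core2_vpmu_cxt->enabled_cntrs |= 1ULL << i;          /* general PMC i */
    core2_vpmu_cxt->enabled_cntrs |= (1ULL << 32) << i;  /* fixed PMC i   */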

diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index 7564895..50259d3 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -1208,6 +1208,32 @@ int vmx_add_guest_msr(u32 msr)
     return 0;
 }
 
+void vmx_rm_guest_msr(u32 msr)
+{
+    struct vcpu *curr = current;
+    unsigned int idx, msr_count = curr->arch.hvm_vmx.msr_count;
+    struct vmx_msr_entry *msr_area = curr->arch.hvm_vmx.msr_area;
+
+    if ( msr_area == NULL )
+        return;
+
+    for ( idx = 0; idx < msr_count; idx++ )
+        if ( msr_area[idx].index == msr )
+            break;
+
+    if ( idx == msr_count )
+        return;
+
+    --msr_count;
+    memmove(&msr_area[idx], &msr_area[idx + 1],
+            sizeof(struct vmx_msr_entry) * (msr_count - idx));
+    msr_area[msr_count].index = 0;
+
+    curr->arch.hvm_vmx.msr_count = msr_count;
+    __vmwrite(VM_EXIT_MSR_STORE_COUNT, msr_count);
+    __vmwrite(VM_ENTRY_MSR_LOAD_COUNT, msr_count);
+}
+
 int vmx_add_host_load_msr(u32 msr)
 {
     struct vcpu *curr = current;
@@ -1238,6 +1264,31 @@ int vmx_add_host_load_msr(u32 msr)
     return 0;
 }
 
+void vmx_rm_host_load_msr(u32 msr)
+{
+    struct vcpu *curr = current;
+    unsigned int idx,  msr_count = curr->arch.hvm_vmx.host_msr_count;
+    struct vmx_msr_entry *msr_area = curr->arch.hvm_vmx.host_msr_area;
+
+    if ( msr_area == NULL )
+        return;
+
+    for ( idx = 0; idx < msr_count; idx++ )
+        if ( msr_area[idx].index == msr )
+            break;
+
+    if ( idx == msr_count )
+        return;
+
+    --msr_count;
+    memmove(&msr_area[idx], &msr_area[idx + 1],
+            sizeof(struct vmx_msr_entry) * (msr_count - idx));
+    msr_area[msr_count].index = 0;
+
+    curr->arch.hvm_vmx.host_msr_count = msr_count;
+    __vmwrite(VM_EXIT_MSR_LOAD_COUNT, msr_count);
+}
+
 void vmx_set_eoi_exit_bitmap(struct vcpu *v, u8 vector)
 {
     if ( !test_and_set_bit(vector, v->arch.hvm_vmx.eoi_exit_bitmap) )
diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c
index bf3cd33..6df9f56 100644
--- a/xen/arch/x86/hvm/vmx/vpmu_core2.c
+++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c
@@ -69,6 +69,27 @@
 static bool_t __read_mostly full_width_write;
 
 /*
+ * MSR_CORE_PERF_FIXED_CTR_CTRL contains the configuration of all fixed
+ * counters. 4 bits for every counter.
+ */
+#define FIXED_CTR_CTRL_BITS 4
+#define FIXED_CTR_CTRL_MASK ((1 << FIXED_CTR_CTRL_BITS) - 1)
+
+#define VPMU_CORE2_MAX_FIXED_PMCS     4
+struct core2_vpmu_context {
+    u64 fixed_ctrl;
+    u64 ds_area;
+    u64 pebs_enable;
+    u64 global_ovf_status;
+    u64 enabled_cntrs;  /* Follows PERF_GLOBAL_CTRL MSR format */
+    u64 fix_counters[VPMU_CORE2_MAX_FIXED_PMCS];
+    struct arch_msr_pair arch_msr_pair[1];
+};
+
+/* Number of general-purpose and fixed performance counters */
+static unsigned int __read_mostly arch_pmc_cnt, fixed_pmc_cnt;
+
+/*
  * QUIRK to workaround an issue on various family 6 cpus.
  * The issue leads to endless PMC interrupt loops on the processor.
  * If the interrupt handler is running and a pmc reaches the value 0, this
@@ -88,11 +109,8 @@ static void check_pmc_quirk(void)
         is_pmc_quirk = 0;    
 }
 
-static int core2_get_pmc_count(void);
 static void handle_pmc_quirk(u64 msr_content)
 {
-    int num_gen_pmc = core2_get_pmc_count();
-    int num_fix_pmc  = 3;
     int i;
     u64 val;
 
@@ -100,7 +118,7 @@ static void handle_pmc_quirk(u64 msr_content)
         return;
 
     val = msr_content;
-    for ( i = 0; i < num_gen_pmc; i++ )
+    for ( i = 0; i < arch_pmc_cnt; i++ )
     {
         if ( val & 0x1 )
         {
@@ -112,7 +130,7 @@ static void handle_pmc_quirk(u64 msr_content)
         val >>= 1;
     }
     val = msr_content >> 32;
-    for ( i = 0; i < num_fix_pmc; i++ )
+    for ( i = 0; i < fixed_pmc_cnt; i++ )
     {
         if ( val & 0x1 )
         {
@@ -125,75 +143,42 @@ static void handle_pmc_quirk(u64 msr_content)
     }
 }
 
-static const u32 core2_fix_counters_msr[] = {
-    MSR_CORE_PERF_FIXED_CTR0,
-    MSR_CORE_PERF_FIXED_CTR1,
-    MSR_CORE_PERF_FIXED_CTR2
-};
-
 /*
- * MSR_CORE_PERF_FIXED_CTR_CTRL contains the configuration of all fixed
- * counters. 4 bits for every counter.
+ * Read the number of general counters via CPUID.EAX[0xa].EAX[8..15]
  */
-#define FIXED_CTR_CTRL_BITS 4
-#define FIXED_CTR_CTRL_MASK ((1 << FIXED_CTR_CTRL_BITS) - 1)
-
-/* The index into the core2_ctrls_msr[] of this MSR used in core2_vpmu_dump() */
-#define MSR_CORE_PERF_FIXED_CTR_CTRL_IDX 0
-
-/* Core 2 Non-architectual Performance Control MSRs. */
-static const u32 core2_ctrls_msr[] = {
-    MSR_CORE_PERF_FIXED_CTR_CTRL,
-    MSR_IA32_PEBS_ENABLE,
-    MSR_IA32_DS_AREA
-};
-
-struct pmumsr {
-    unsigned int num;
-    const u32 *msr;
-};
-
-static const struct pmumsr core2_fix_counters = {
-    VPMU_CORE2_NUM_FIXED,
-    core2_fix_counters_msr
-};
+static int core2_get_arch_pmc_count(void)
+{
+    u32 eax;
 
-static const struct pmumsr core2_ctrls = {
-    VPMU_CORE2_NUM_CTRLS,
-    core2_ctrls_msr
-};
-static int arch_pmc_cnt;
+    eax = cpuid_eax(0xa);
+    return ( (eax & PMU_GENERAL_NR_MASK) >> PMU_GENERAL_NR_SHIFT );
+}
 
 /*
- * Read the number of general counters via CPUID.EAX[0xa].EAX[8..15]
+ * Read the number of fixed counters via CPUID.EDX[0xa].EDX[0..4]
  */
-static int core2_get_pmc_count(void)
+static int core2_get_fixed_pmc_count(void)
 {
-    u32 eax, ebx, ecx, edx;
-
-    if ( arch_pmc_cnt == 0 )
-    {
-        cpuid(0xa, &eax, &ebx, &ecx, &edx);
-        arch_pmc_cnt = (eax & PMU_GENERAL_NR_MASK) >> PMU_GENERAL_NR_SHIFT;
-    }
+    u32 eax;
 
-    return arch_pmc_cnt;
+    eax = cpuid_eax(0xa);
+    return ( (eax & PMU_FIXED_NR_MASK) >> PMU_FIXED_NR_SHIFT );
 }
 
 static u64 core2_calc_intial_glb_ctrl_msr(void)
 {
-    int arch_pmc_bits = (1 << core2_get_pmc_count()) - 1;
-    u64 fix_pmc_bits  = (1 << 3) - 1;
-    return ((fix_pmc_bits << 32) | arch_pmc_bits);
+    int arch_pmc_bits = (1 << arch_pmc_cnt) - 1;
+    u64 fix_pmc_bits  = (1 << fixed_pmc_cnt) - 1;
+    return ( (fix_pmc_bits << 32) | arch_pmc_bits );
 }
 
 /* edx bits 5-12: Bit width of fixed-function performance counters  */
 static int core2_get_bitwidth_fix_count(void)
 {
-    u32 eax, ebx, ecx, edx;
+    u32 edx;
 
-    cpuid(0xa, &eax, &ebx, &ecx, &edx);
-    return ((edx & PMU_FIXED_WIDTH_MASK) >> PMU_FIXED_WIDTH_SHIFT);
+    edx = cpuid_edx(0xa);
+    return ( (edx & PMU_FIXED_WIDTH_MASK) >> PMU_FIXED_WIDTH_SHIFT );
 }
 
 static int is_core2_vpmu_msr(u32 msr_index, int *type, int *index)
@@ -201,9 +186,9 @@ static int is_core2_vpmu_msr(u32 msr_index, int *type, int *index)
     int i;
     u32 msr_index_pmc;
 
-    for ( i = 0; i < core2_fix_counters.num; i++ )
+    for ( i = 0; i < fixed_pmc_cnt; i++ )
     {
-        if ( core2_fix_counters.msr[i] == msr_index )
+        if ( msr_index == MSR_CORE_PERF_FIXED_CTR0 + i )
         {
             *type = MSR_TYPE_COUNTER;
             *index = i;
@@ -211,14 +196,12 @@ static int is_core2_vpmu_msr(u32 msr_index, int *type, int *index)
         }
     }
 
-    for ( i = 0; i < core2_ctrls.num; i++ )
+    if ( (msr_index == MSR_CORE_PERF_FIXED_CTR_CTRL ) ||
+        (msr_index == MSR_IA32_DS_AREA) ||
+        (msr_index == MSR_IA32_PEBS_ENABLE) )
     {
-        if ( core2_ctrls.msr[i] == msr_index )
-        {
-            *type = MSR_TYPE_CTRL;
-            *index = i;
-            return 1;
-        }
+        *type = MSR_TYPE_CTRL;
+        return 1;
     }
 
     if ( (msr_index == MSR_CORE_PERF_GLOBAL_CTRL) ||
@@ -231,7 +214,7 @@ static int is_core2_vpmu_msr(u32 msr_index, int *type, int *index)
 
     msr_index_pmc = msr_index & MSR_PMC_ALIAS_MASK;
     if ( (msr_index_pmc >= MSR_IA32_PERFCTR0) &&
-         (msr_index_pmc < (MSR_IA32_PERFCTR0 + core2_get_pmc_count())) )
+         (msr_index_pmc < (MSR_IA32_PERFCTR0 + arch_pmc_cnt)) )
     {
         *type = MSR_TYPE_ARCH_COUNTER;
         *index = msr_index_pmc - MSR_IA32_PERFCTR0;
@@ -239,7 +222,7 @@ static int is_core2_vpmu_msr(u32 msr_index, int *type, int *index)
     }
 
     if ( (msr_index >= MSR_P6_EVNTSEL0) &&
-         (msr_index < (MSR_P6_EVNTSEL0 + core2_get_pmc_count())) )
+         (msr_index < (MSR_P6_EVNTSEL0 + arch_pmc_cnt)) )
     {
         *type = MSR_TYPE_ARCH_CTRL;
         *index = msr_index - MSR_P6_EVNTSEL0;
@@ -254,13 +237,13 @@ static void core2_vpmu_set_msr_bitmap(unsigned long *msr_bitmap)
     int i;
 
     /* Allow Read/Write PMU Counters MSR Directly. */
-    for ( i = 0; i < core2_fix_counters.num; i++ )
+    for ( i = 0; i < fixed_pmc_cnt; i++ )
     {
-        clear_bit(msraddr_to_bitpos(core2_fix_counters.msr[i]), msr_bitmap);
-        clear_bit(msraddr_to_bitpos(core2_fix_counters.msr[i]),
+        clear_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR0 + i), msr_bitmap);
+        clear_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR0 + i),
                   msr_bitmap + 0x800/BYTES_PER_LONG);
     }
-    for ( i = 0; i < core2_get_pmc_count(); i++ )
+    for ( i = 0; i < arch_pmc_cnt; i++ )
     {
         clear_bit(msraddr_to_bitpos(MSR_IA32_PERFCTR0+i), msr_bitmap);
         clear_bit(msraddr_to_bitpos(MSR_IA32_PERFCTR0+i),
@@ -275,26 +258,28 @@ static void core2_vpmu_set_msr_bitmap(unsigned long *msr_bitmap)
     }
 
     /* Allow Read PMU Non-global Controls Directly. */
-    for ( i = 0; i < core2_ctrls.num; i++ )
-        clear_bit(msraddr_to_bitpos(core2_ctrls.msr[i]), msr_bitmap);
-    for ( i = 0; i < core2_get_pmc_count(); i++ )
-        clear_bit(msraddr_to_bitpos(MSR_P6_EVNTSEL0+i), msr_bitmap);
+    for ( i = 0; i < arch_pmc_cnt; i++ )
+         clear_bit(msraddr_to_bitpos(MSR_P6_EVNTSEL0 + i), msr_bitmap);
+
+    clear_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR_CTRL), msr_bitmap);
+    clear_bit(msraddr_to_bitpos(MSR_IA32_PEBS_ENABLE), msr_bitmap);
+    clear_bit(msraddr_to_bitpos(MSR_IA32_DS_AREA), msr_bitmap);
 }
 
 static void core2_vpmu_unset_msr_bitmap(unsigned long *msr_bitmap)
 {
     int i;
 
-    for ( i = 0; i < core2_fix_counters.num; i++ )
+    for ( i = 0; i < fixed_pmc_cnt; i++ )
     {
-        set_bit(msraddr_to_bitpos(core2_fix_counters.msr[i]), msr_bitmap);
-        set_bit(msraddr_to_bitpos(core2_fix_counters.msr[i]),
+        set_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR0 + i), msr_bitmap);
+        set_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR0 + i),
                 msr_bitmap + 0x800/BYTES_PER_LONG);
     }
-    for ( i = 0; i < core2_get_pmc_count(); i++ )
+    for ( i = 0; i < arch_pmc_cnt; i++ )
     {
-        set_bit(msraddr_to_bitpos(MSR_IA32_PERFCTR0+i), msr_bitmap);
-        set_bit(msraddr_to_bitpos(MSR_IA32_PERFCTR0+i),
+        set_bit(msraddr_to_bitpos(MSR_IA32_PERFCTR0 + i), msr_bitmap);
+        set_bit(msraddr_to_bitpos(MSR_IA32_PERFCTR0 + i),
                 msr_bitmap + 0x800/BYTES_PER_LONG);
 
         if ( full_width_write )
@@ -305,10 +290,12 @@ static void core2_vpmu_unset_msr_bitmap(unsigned long *msr_bitmap)
         }
     }
 
-    for ( i = 0; i < core2_ctrls.num; i++ )
-        set_bit(msraddr_to_bitpos(core2_ctrls.msr[i]), msr_bitmap);
-    for ( i = 0; i < core2_get_pmc_count(); i++ )
-        set_bit(msraddr_to_bitpos(MSR_P6_EVNTSEL0+i), msr_bitmap);
+    for ( i = 0; i < arch_pmc_cnt; i++ )
+        set_bit(msraddr_to_bitpos(MSR_P6_EVNTSEL0 + i), msr_bitmap);
+
+    set_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR_CTRL), msr_bitmap);
+    set_bit(msraddr_to_bitpos(MSR_IA32_PEBS_ENABLE), msr_bitmap);
+    set_bit(msraddr_to_bitpos(MSR_IA32_DS_AREA), msr_bitmap);
 }
 
 static inline void __core2_vpmu_save(struct vcpu *v)
@@ -316,10 +303,10 @@ static inline void __core2_vpmu_save(struct vcpu *v)
     int i;
     struct core2_vpmu_context *core2_vpmu_cxt = vcpu_vpmu(v)->context;
 
-    for ( i = 0; i < core2_fix_counters.num; i++ )
-        rdmsrl(core2_fix_counters.msr[i], core2_vpmu_cxt->fix_counters[i]);
-    for ( i = 0; i < core2_get_pmc_count(); i++ )
-        rdmsrl(MSR_IA32_PERFCTR0+i, core2_vpmu_cxt->arch_msr_pair[i].counter);
+    for ( i = 0; i < fixed_pmc_cnt; i++ )
+        rdmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, core2_vpmu_cxt->fix_counters[i]);
+    for ( i = 0; i < arch_pmc_cnt; i++ )
+        rdmsrl(MSR_IA32_PERFCTR0 + i, core2_vpmu_cxt->arch_msr_pair[i].counter);
 }
 
 static int core2_vpmu_save(struct vcpu *v)
@@ -344,20 +331,22 @@ static inline void __core2_vpmu_load(struct vcpu *v)
     unsigned int i, pmc_start;
     struct core2_vpmu_context *core2_vpmu_cxt = vcpu_vpmu(v)->context;
 
-    for ( i = 0; i < core2_fix_counters.num; i++ )
-        wrmsrl(core2_fix_counters.msr[i], core2_vpmu_cxt->fix_counters[i]);
+    for ( i = 0; i < fixed_pmc_cnt; i++ )
+        wrmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, core2_vpmu_cxt->fix_counters[i]);
 
     if ( full_width_write )
         pmc_start = MSR_IA32_A_PERFCTR0;
     else
         pmc_start = MSR_IA32_PERFCTR0;
-    for ( i = 0; i < core2_get_pmc_count(); i++ )
+    for ( i = 0; i < arch_pmc_cnt; i++ )
+    {
         wrmsrl(pmc_start + i, core2_vpmu_cxt->arch_msr_pair[i].counter);
+        wrmsrl(MSR_P6_EVNTSEL0 + i, core2_vpmu_cxt->arch_msr_pair[i].control);
+    }
 
-    for ( i = 0; i < core2_ctrls.num; i++ )
-        wrmsrl(core2_ctrls.msr[i], core2_vpmu_cxt->ctrls[i]);
-    for ( i = 0; i < core2_get_pmc_count(); i++ )
-        wrmsrl(MSR_P6_EVNTSEL0+i, core2_vpmu_cxt->arch_msr_pair[i].control);
+    wrmsrl(MSR_CORE_PERF_FIXED_CTR_CTRL, core2_vpmu_cxt->fixed_ctrl);
+    wrmsrl(MSR_IA32_DS_AREA, core2_vpmu_cxt->ds_area);
+    wrmsrl(MSR_IA32_PEBS_ENABLE, core2_vpmu_cxt->pebs_enable);
 }
 
 static void core2_vpmu_load(struct vcpu *v)
@@ -376,56 +365,39 @@ static int core2_vpmu_alloc_resource(struct vcpu *v)
 {
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
     struct core2_vpmu_context *core2_vpmu_cxt;
-    struct core2_pmu_enable *pmu_enable;
 
     if ( !acquire_pmu_ownership(PMU_OWNER_HVM) )
         return 0;
 
     wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0);
     if ( vmx_add_host_load_msr(MSR_CORE_PERF_GLOBAL_CTRL) )
-        return 0;
+        goto out_err;
 
     if ( vmx_add_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL) )
-        return 0;
+        goto out_err;
     vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL,
                  core2_calc_intial_glb_ctrl_msr());
 
-    pmu_enable = xzalloc_bytes(sizeof(struct core2_pmu_enable) +
-                               core2_get_pmc_count() - 1);
-    if ( !pmu_enable )
-        goto out1;
-
     core2_vpmu_cxt = xzalloc_bytes(sizeof(struct core2_vpmu_context) +
-                    (core2_get_pmc_count()-1)*sizeof(struct arch_msr_pair));
+                    (arch_pmc_cnt-1)*sizeof(struct arch_msr_pair));
     if ( !core2_vpmu_cxt )
-        goto out2;
-    core2_vpmu_cxt->pmu_enable = pmu_enable;
+        goto out_err;
+
     vpmu->context = (void *)core2_vpmu_cxt;
 
+    vpmu_set(vpmu, VPMU_CONTEXT_ALLOCATED);
+
     return 1;
- out2:
-    xfree(pmu_enable);
- out1:
-    gdprintk(XENLOG_WARNING, "Insufficient memory for PMU, PMU feature is "
-             "unavailable on domain %d vcpu %d.\n",
-             v->vcpu_id, v->domain->domain_id);
-    return 0;
-}
 
-static void core2_vpmu_save_msr_context(struct vcpu *v, int type,
-                                       int index, u64 msr_data)
-{
-    struct core2_vpmu_context *core2_vpmu_cxt = vcpu_vpmu(v)->context;
+out_err:
+    vmx_rm_host_load_msr(MSR_CORE_PERF_GLOBAL_CTRL);
+    vmx_rm_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL);
+    release_pmu_ownship(PMU_OWNER_HVM);
 
-    switch ( type )
-    {
-    case MSR_TYPE_CTRL:
-        core2_vpmu_cxt->ctrls[index] = msr_data;
-        break;
-    case MSR_TYPE_ARCH_CTRL:
-        core2_vpmu_cxt->arch_msr_pair[index].control = msr_data;
-        break;
-    }
+    printk("Failed to allocate VPMU resources for domain %u vcpu %u\n",
+           v->vcpu_id, v->domain->domain_id);
+
+    return 0;
 }
 
 static int core2_vpmu_msr_common_check(u32 msr_index, int *type, int *index)
@@ -436,10 +408,8 @@ static int core2_vpmu_msr_common_check(u32 msr_index, int *type, int *index)
         return 0;
 
     if ( unlikely(!vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED)) &&
-	 (vpmu->context != NULL ||
-	  !core2_vpmu_alloc_resource(current)) )
+         !core2_vpmu_alloc_resource(current) )
         return 0;
-    vpmu_set(vpmu, VPMU_CONTEXT_ALLOCATED);
 
     /* Do the lazy load staff. */
     if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) )
@@ -455,8 +425,7 @@ static int core2_vpmu_msr_common_check(u32 msr_index, int *type, int *index)
 
 static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
 {
-    u64 global_ctrl, non_global_ctrl;
-    char pmu_enable = 0;
+    u64 global_ctrl;
     int i, tmp;
     int type = -1, index = -1;
     struct vcpu *v = current;
@@ -501,6 +470,7 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
         if ( msr_content & 1 )
             gdprintk(XENLOG_WARNING, "Guest is trying to enable PEBS, "
                      "which is not supported.\n");
+        core2_vpmu_cxt->pebs_enable = msr_content;
         return 1;
     case MSR_IA32_DS_AREA:
         if ( vpmu_is_set(vpmu, VPMU_CPU_HAS_DS) )
@@ -513,57 +483,48 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
                 hvm_inject_hw_exception(TRAP_gp_fault, 0);
                 return 1;
             }
-            core2_vpmu_cxt->pmu_enable->ds_area_enable = msr_content ? 1 : 0;
+            core2_vpmu_cxt->ds_area = msr_content;
             break;
         }
         gdprintk(XENLOG_WARNING, "Guest setting of DTS is ignored.\n");
         return 1;
     case MSR_CORE_PERF_GLOBAL_CTRL:
         global_ctrl = msr_content;
-        for ( i = 0; i < core2_get_pmc_count(); i++ )
-        {
-            rdmsrl(MSR_P6_EVNTSEL0+i, non_global_ctrl);
-            core2_vpmu_cxt->pmu_enable->arch_pmc_enable[i] =
-                    global_ctrl & (non_global_ctrl >> 22) & 1;
-            global_ctrl >>= 1;
-        }
-
-        rdmsrl(MSR_CORE_PERF_FIXED_CTR_CTRL, non_global_ctrl);
-        global_ctrl = msr_content >> 32;
-        for ( i = 0; i < core2_fix_counters.num; i++ )
-        {
-            core2_vpmu_cxt->pmu_enable->fixed_ctr_enable[i] =
-                (global_ctrl & 1) & ((non_global_ctrl & 0x3)? 1: 0);
-            non_global_ctrl >>= FIXED_CTR_CTRL_BITS;
-            global_ctrl >>= 1;
-        }
         break;
     case MSR_CORE_PERF_FIXED_CTR_CTRL:
-        non_global_ctrl = msr_content;
         vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, &global_ctrl);
-        global_ctrl >>= 32;
-        for ( i = 0; i < core2_fix_counters.num; i++ )
+        core2_vpmu_cxt->enabled_cntrs &=
+                ~(((1ULL << VPMU_CORE2_MAX_FIXED_PMCS) - 1) << 32);
+        if ( msr_content != 0 )
         {
-            core2_vpmu_cxt->pmu_enable->fixed_ctr_enable[i] =
-                (global_ctrl & 1) & ((non_global_ctrl & 0x3)? 1: 0);
-            non_global_ctrl >>= 4;
-            global_ctrl >>= 1;
+            u64 val = msr_content;
+            for ( i = 0; i < fixed_pmc_cnt; i++ )
+            {
+                if ( val & 3 )
+                    core2_vpmu_cxt->enabled_cntrs |= (1ULL << 32) << i;
+                val >>= FIXED_CTR_CTRL_BITS;
+            }
         }
+
+        core2_vpmu_cxt->fixed_ctrl = msr_content;
         break;
     default:
         tmp = msr - MSR_P6_EVNTSEL0;
-        vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, &global_ctrl);
-        if ( tmp >= 0 && tmp < core2_get_pmc_count() )
-            core2_vpmu_cxt->pmu_enable->arch_pmc_enable[tmp] =
-                (global_ctrl >> tmp) & (msr_content >> 22) & 1;
+        if ( tmp >= 0 && tmp < arch_pmc_cnt )
+        {
+            vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, &global_ctrl);
+
+            if ( msr_content & (1ULL << 22) )
+                core2_vpmu_cxt->enabled_cntrs |= 1ULL << tmp;
+            else
+                core2_vpmu_cxt->enabled_cntrs &= ~(1ULL << tmp);
+
+            core2_vpmu_cxt->arch_msr_pair[tmp].control = msr_content;
+        }
     }
 
-    for ( i = 0; i < core2_fix_counters.num; i++ )
-        pmu_enable |= core2_vpmu_cxt->pmu_enable->fixed_ctr_enable[i];
-    for ( i = 0; i < core2_get_pmc_count(); i++ )
-        pmu_enable |= core2_vpmu_cxt->pmu_enable->arch_pmc_enable[i];
-    pmu_enable |= core2_vpmu_cxt->pmu_enable->ds_area_enable;
-    if ( pmu_enable )
+    if ((global_ctrl & core2_vpmu_cxt->enabled_cntrs) ||
+        (core2_vpmu_cxt->ds_area != 0)  )
         vpmu_set(vpmu, VPMU_RUNNING);
     else
         vpmu_reset(vpmu, VPMU_RUNNING);
@@ -581,7 +542,6 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
         vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR | APIC_LVT_MASKED;
     }
 
-    core2_vpmu_save_msr_context(v, type, index, msr_content);
     if ( type != MSR_TYPE_GLOBAL )
     {
         u64 mask;
@@ -597,7 +557,7 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
             if  ( msr == MSR_IA32_DS_AREA )
                 break;
             /* 4 bits per counter, currently 3 fixed counters implemented. */
-            mask = ~((1ull << (VPMU_CORE2_NUM_FIXED * FIXED_CTR_CTRL_BITS)) - 1);
+            mask = ~((1ull << (fixed_pmc_cnt * FIXED_CTR_CTRL_BITS)) - 1);
             if (msr_content & mask)
                 inject_gp = 1;
             break;
@@ -682,7 +642,7 @@ static void core2_vpmu_do_cpuid(unsigned int input,
 static void core2_vpmu_dump(const struct vcpu *v)
 {
     const struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    int i, num;
+    int i;
     const struct core2_vpmu_context *core2_vpmu_cxt = NULL;
     u64 val;
 
@@ -700,27 +660,25 @@ static void core2_vpmu_dump(const struct vcpu *v)
 
     printk("    vPMU running\n");
     core2_vpmu_cxt = vpmu->context;
-    num = core2_get_pmc_count();
+
     /* Print the contents of the counter and its configuration msr. */
-    for ( i = 0; i < num; i++ )
+    for ( i = 0; i < arch_pmc_cnt; i++ )
     {
         const struct arch_msr_pair *msr_pair = core2_vpmu_cxt->arch_msr_pair;
 
-        if ( core2_vpmu_cxt->pmu_enable->arch_pmc_enable[i] )
-            printk("      general_%d: 0x%016lx ctrl: 0x%016lx\n",
-                   i, msr_pair[i].counter, msr_pair[i].control);
+        printk("      general_%d: 0x%016lx ctrl: 0x%016lx\n",
+               i, msr_pair[i].counter, msr_pair[i].control);
     }
     /*
      * The configuration of the fixed counter is 4 bits each in the
      * MSR_CORE_PERF_FIXED_CTR_CTRL.
      */
-    val = core2_vpmu_cxt->ctrls[MSR_CORE_PERF_FIXED_CTR_CTRL_IDX];
-    for ( i = 0; i < core2_fix_counters.num; i++ )
+    val = core2_vpmu_cxt->fixed_ctrl;
+    for ( i = 0; i < fixed_pmc_cnt; i++ )
     {
-        if ( core2_vpmu_cxt->pmu_enable->fixed_ctr_enable[i] )
-            printk("      fixed_%d:   0x%016lx ctrl: %#lx\n",
-                   i, core2_vpmu_cxt->fix_counters[i],
-                   val & FIXED_CTR_CTRL_MASK);
+        printk("      fixed_%d:   0x%016lx ctrl: %#lx\n",
+               i, core2_vpmu_cxt->fix_counters[i],
+               val & FIXED_CTR_CTRL_MASK);
         val >>= FIXED_CTR_CTRL_BITS;
     }
 }
@@ -738,7 +696,7 @@ static int core2_vpmu_do_interrupt(struct cpu_user_regs *regs)
         if ( is_pmc_quirk )
             handle_pmc_quirk(msr_content);
         core2_vpmu_cxt->global_ovf_status |= msr_content;
-        msr_content = 0xC000000700000000 | ((1 << core2_get_pmc_count()) - 1);
+        msr_content = 0xC000000700000000 | ((1 << arch_pmc_cnt) - 1);
         wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, msr_content);
     }
     else
@@ -801,18 +759,27 @@ static int core2_vpmu_initialise(struct vcpu *v, unsigned int vpmu_flags)
         }
     }
 func_out:
+
+    arch_pmc_cnt = core2_get_arch_pmc_count();
+    fixed_pmc_cnt = core2_get_fixed_pmc_count();
+    if ( fixed_pmc_cnt > VPMU_CORE2_MAX_FIXED_PMCS )
+    {
+        fixed_pmc_cnt = VPMU_CORE2_MAX_FIXED_PMCS;
+        printk(XENLOG_G_WARNING "Limiting number of fixed counters to %d\n",
+               fixed_pmc_cnt);
+    }
     check_pmc_quirk();
+
     return 0;
 }
 
 static void core2_vpmu_destroy(struct vcpu *v)
 {
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    struct core2_vpmu_context *core2_vpmu_cxt = vpmu->context;
 
     if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
         return;
-    xfree(core2_vpmu_cxt->pmu_enable);
+
     xfree(vpmu->context);
     if ( has_hvm_container_domain(v->domain) && cpu_has_vmx_msr_bitmap )
         core2_vpmu_unset_msr_bitmap(v->arch.hvm_vmx.msr_bitmap);
diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index 445b39f..50befe1 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -480,7 +480,9 @@ void vmx_enable_intercept_for_msr(struct vcpu *v, u32 msr, int type);
 int vmx_read_guest_msr(u32 msr, u64 *val);
 int vmx_write_guest_msr(u32 msr, u64 val);
 int vmx_add_guest_msr(u32 msr);
+void vmx_rm_guest_msr(u32 msr);
 int vmx_add_host_load_msr(u32 msr);
+void vmx_rm_host_load_msr(u32 msr);
 void vmx_vmcs_switch(struct vmcs_struct *from, struct vmcs_struct *to);
 void vmx_set_eoi_exit_bitmap(struct vcpu *v, u8 vector);
 void vmx_clear_eoi_exit_bitmap(struct vcpu *v, u8 vector);
diff --git a/xen/include/asm-x86/hvm/vmx/vpmu_core2.h b/xen/include/asm-x86/hvm/vmx/vpmu_core2.h
index 60b05fd..410372d 100644
--- a/xen/include/asm-x86/hvm/vmx/vpmu_core2.h
+++ b/xen/include/asm-x86/hvm/vmx/vpmu_core2.h
@@ -23,29 +23,10 @@
 #ifndef __ASM_X86_HVM_VPMU_CORE_H_
 #define __ASM_X86_HVM_VPMU_CORE_H_
 
-/* Currently only 3 fixed counters are supported. */
-#define VPMU_CORE2_NUM_FIXED 3
-/* Currently only 3 Non-architectual Performance Control MSRs */
-#define VPMU_CORE2_NUM_CTRLS 3
-
 struct arch_msr_pair {
     u64 counter;
     u64 control;
 };
 
-struct core2_pmu_enable {
-    char ds_area_enable;
-    char fixed_ctr_enable[VPMU_CORE2_NUM_FIXED];
-    char arch_pmc_enable[1];
-};
-
-struct core2_vpmu_context {
-    struct core2_pmu_enable *pmu_enable;
-    u64 fix_counters[VPMU_CORE2_NUM_FIXED];
-    u64 ctrls[VPMU_CORE2_NUM_CTRLS];
-    u64 global_ovf_status;
-    struct arch_msr_pair arch_msr_pair[1];
-};
-
 #endif /* __ASM_X86_HVM_VPMU_CORE_H_ */
 
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v7 06/19] vmx: Merge MSR management routines
  2014-06-06 17:39 [PATCH v7 00/19] x86/PMU: Xen PMU PV(H) support Boris Ostrovsky
                   ` (4 preceding siblings ...)
  2014-06-06 17:40 ` [PATCH v7 05/19] intel/VPMU: Clean up Intel VPMU code Boris Ostrovsky
@ 2014-06-06 17:40 ` Boris Ostrovsky
  2014-06-11  9:55   ` Jan Beulich
  2014-06-06 17:40 ` [PATCH v7 07/19] x86/VPMU: Handle APIC_LVTPC accesses Boris Ostrovsky
                   ` (12 subsequent siblings)
  18 siblings, 1 reply; 58+ messages in thread
From: Boris Ostrovsky @ 2014-06-06 17:40 UTC (permalink / raw)
  To: JBeulich, kevin.tian, dietmar.hahn, suravee.suthikulpanit
  Cc: tim, boris.ostrovsky, keir, jun.nakajima, xen-devel

vmx_add_host_load_msr()/vmx_rm_host_load_msr() and
vmx_add_guest_msr()/vmx_rm_guest_msr() share a fair amount of code. Merge
them to simplify code maintenance.
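
As an illustration (error handling trimmed), callers now select the MSR
list via the type argument instead of picking between two near-identical
functions; a minimal sketch mirroring the vpmu_core2.c usage below:

    /* Sketch only: track an MSR on both the guest and the host list. */
    if ( vmx_add_msr(MSR_CORE_PERF_GLOBAL_CTRL, VMX_GUEST_MSR) ||
         vmx_add_msr(MSR_CORE_PERF_GLOBAL_CTRL, VMX_HOST_MSR) )
        goto out_err;
    /* ... */
    vmx_rm_msr(MSR_CORE_PERF_GLOBAL_CTRL, VMX_GUEST_MSR);
    vmx_rm_msr(MSR_CORE_PERF_GLOBAL_CTRL, VMX_HOST_MSR);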

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
---
 xen/arch/x86/hvm/vmx/vmcs.c        | 156 +++++++++++++++++--------------------
 xen/arch/x86/hvm/vmx/vmx.c         |   4 +-
 xen/arch/x86/hvm/vmx/vpmu_core2.c  |   8 +-
 xen/include/asm-x86/hvm/vmx/vmcs.h |  10 ++-
 4 files changed, 85 insertions(+), 93 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index 50259d3..436e8f4 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -1176,117 +1176,107 @@ int vmx_write_guest_msr(u32 msr, u64 val)
     return -ESRCH;
 }
 
-int vmx_add_guest_msr(u32 msr)
+int vmx_add_msr(u32 msr, u8 type)
 {
     struct vcpu *curr = current;
-    unsigned int i, msr_count = curr->arch.hvm_vmx.msr_count;
-    struct vmx_msr_entry *msr_area = curr->arch.hvm_vmx.msr_area;
+    unsigned int idx, *msr_count;
+    struct vmx_msr_entry **msr_area;
 
-    if ( msr_area == NULL )
+    ASSERT( (type == VMX_GUEST_MSR) || (type == VMX_HOST_MSR) );
+
+    if ( type == VMX_GUEST_MSR )
     {
-        if ( (msr_area = alloc_xenheap_page()) == NULL )
-            return -ENOMEM;
-        curr->arch.hvm_vmx.msr_area = msr_area;
-        __vmwrite(VM_EXIT_MSR_STORE_ADDR, virt_to_maddr(msr_area));
-        __vmwrite(VM_ENTRY_MSR_LOAD_ADDR, virt_to_maddr(msr_area));
+        msr_count = &curr->arch.hvm_vmx.msr_count;
+        msr_area = &curr->arch.hvm_vmx.msr_area;
+    }
+    else
+    {
+        msr_count = &curr->arch.hvm_vmx.host_msr_count;
+        msr_area = &curr->arch.hvm_vmx.host_msr_area;
     }
 
-    for ( i = 0; i < msr_count; i++ )
-        if ( msr_area[i].index == msr )
-            return 0;
-
-    if ( msr_count == (PAGE_SIZE / sizeof(struct vmx_msr_entry)) )
-        return -ENOSPC;
-
-    msr_area[msr_count].index = msr;
-    msr_area[msr_count].mbz   = 0;
-    msr_area[msr_count].data  = 0;
-    curr->arch.hvm_vmx.msr_count = ++msr_count;
-    __vmwrite(VM_EXIT_MSR_STORE_COUNT, msr_count);
-    __vmwrite(VM_ENTRY_MSR_LOAD_COUNT, msr_count);
-
-    return 0;
-}
-
-void vmx_rm_guest_msr(u32 msr)
-{
-    struct vcpu *curr = current;
-    unsigned int idx, msr_count = curr->arch.hvm_vmx.msr_count;
-    struct vmx_msr_entry *msr_area = curr->arch.hvm_vmx.msr_area;
-
-    if ( msr_area == NULL )
-        return;
-
-    for ( idx = 0; idx < msr_count; idx++ )
-        if ( msr_area[idx].index == msr )
-            break;
-
-    if ( idx == msr_count )
-        return;
-
-    --msr_count;
-    memmove(&msr_area[idx], &msr_area[idx + 1],
-            sizeof(struct vmx_msr_entry) * (msr_count - idx));
-    msr_area[msr_count].index = 0;
-
-    curr->arch.hvm_vmx.msr_count = msr_count;
-    __vmwrite(VM_EXIT_MSR_STORE_COUNT, msr_count);
-    __vmwrite(VM_ENTRY_MSR_LOAD_COUNT, msr_count);
-}
-
-int vmx_add_host_load_msr(u32 msr)
-{
-    struct vcpu *curr = current;
-    unsigned int i, msr_count = curr->arch.hvm_vmx.host_msr_count;
-    struct vmx_msr_entry *msr_area = curr->arch.hvm_vmx.host_msr_area;
-
-    if ( msr_area == NULL )
+    if ( *msr_area == NULL )
     {
-        if ( (msr_area = alloc_xenheap_page()) == NULL )
+        if ( (*msr_area = alloc_xenheap_page()) == NULL )
             return -ENOMEM;
-        curr->arch.hvm_vmx.host_msr_area = msr_area;
-        __vmwrite(VM_EXIT_MSR_LOAD_ADDR, virt_to_maddr(msr_area));
+
+        if ( type == VMX_GUEST_MSR )
+        {
+            __vmwrite(VM_EXIT_MSR_STORE_ADDR, virt_to_maddr(*msr_area));
+            __vmwrite(VM_ENTRY_MSR_LOAD_ADDR, virt_to_maddr(*msr_area));
+        }
+        else
+            __vmwrite(VM_EXIT_MSR_LOAD_ADDR, virt_to_maddr(*msr_area));
     }
 
-    for ( i = 0; i < msr_count; i++ )
-        if ( msr_area[i].index == msr )
+    for ( idx = 0; idx < *msr_count; idx++ )
+        if ( (*msr_area)[idx].index == msr )
             return 0;
 
-    if ( msr_count == (PAGE_SIZE / sizeof(struct vmx_msr_entry)) )
+    if ( *msr_count == (PAGE_SIZE / sizeof(struct vmx_msr_entry)) )
         return -ENOSPC;
 
-    msr_area[msr_count].index = msr;
-    msr_area[msr_count].mbz   = 0;
-    rdmsrl(msr, msr_area[msr_count].data);
-    curr->arch.hvm_vmx.host_msr_count = ++msr_count;
-    __vmwrite(VM_EXIT_MSR_LOAD_COUNT, msr_count);
+    (*msr_area)[*msr_count].index = msr;
+    (*msr_area)[*msr_count].mbz   = 0;
+    (*msr_count)++;
+    if ( type == VMX_GUEST_MSR )
+    {
+        (*msr_area)[*msr_count - 1].data  = 0;
+        __vmwrite(VM_EXIT_MSR_STORE_COUNT, *msr_count);
+        __vmwrite(VM_ENTRY_MSR_LOAD_COUNT, *msr_count);
+    }
+    else
+    {
+        rdmsrl(msr, (*msr_area)[*msr_count - 1].data);
+        __vmwrite(VM_EXIT_MSR_LOAD_COUNT, *msr_count);
+    }
 
     return 0;
 }
 
-void vmx_rm_host_load_msr(u32 msr)
+void vmx_rm_msr(u32 msr, u8 type)
 {
     struct vcpu *curr = current;
-    unsigned int idx,  msr_count = curr->arch.hvm_vmx.host_msr_count;
-    struct vmx_msr_entry *msr_area = curr->arch.hvm_vmx.host_msr_area;
+    unsigned int idx, *msr_count;
+    struct vmx_msr_entry **msr_area;
 
-    if ( msr_area == NULL )
+    ASSERT( (type == VMX_GUEST_MSR) || (type == VMX_HOST_MSR) );
+
+    if ( type == VMX_GUEST_MSR )
+    {
+        msr_count = &curr->arch.hvm_vmx.msr_count;
+        msr_area = &curr->arch.hvm_vmx.msr_area;
+    }
+    else
+    {
+        msr_count = &curr->arch.hvm_vmx.host_msr_count;
+        msr_area = &curr->arch.hvm_vmx.host_msr_area;
+    }
+
+    if ( *msr_area == NULL )
         return;
 
-    for ( idx = 0; idx < msr_count; idx++ )
-        if ( msr_area[idx].index == msr )
+    for ( idx = 0; idx < *msr_count; idx++ )
+        if ( (*msr_area)[idx].index == msr )
             break;
 
-    if ( idx == msr_count )
+    if ( idx == *msr_count )
         return;
 
-    --msr_count;
-    memmove(&msr_area[idx], &msr_area[idx + 1],
-            sizeof(struct vmx_msr_entry) * (msr_count - idx));
-    msr_area[msr_count].index = 0;
+    --(*msr_count);
+    memmove(&(*msr_area)[idx], &(*msr_area)[idx + 1],
+            sizeof(struct vmx_msr_entry) * (*msr_count - idx));
+    (*msr_area)[*msr_count].index = 0;
 
-    curr->arch.hvm_vmx.host_msr_count = msr_count;
-    __vmwrite(VM_EXIT_MSR_LOAD_COUNT, msr_count);
+    if ( type == VMX_GUEST_MSR )
+    {
+        __vmwrite(VM_EXIT_MSR_STORE_COUNT, *msr_count);
+        __vmwrite(VM_ENTRY_MSR_LOAD_COUNT, *msr_count);
+    }
+    else
+    {
+        __vmwrite(VM_EXIT_MSR_LOAD_COUNT, *msr_count);
+    }
 }
 
 void vmx_set_eoi_exit_bitmap(struct vcpu *v, u8 vector)
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index d45fb7f..6f55c9e 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -2254,12 +2254,12 @@ static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content)
 
             for ( ; (rc == 0) && lbr->count; lbr++ )
                 for ( i = 0; (rc == 0) && (i < lbr->count); i++ )
-                    if ( (rc = vmx_add_guest_msr(lbr->base + i)) == 0 )
+                    if ( (rc = vmx_add_msr(lbr->base + i, VMX_GUEST_MSR)) == 0 )
                         vmx_disable_intercept_for_msr(v, lbr->base + i, MSR_TYPE_R | MSR_TYPE_W);
         }
 
         if ( (rc < 0) ||
-             (vmx_add_host_load_msr(msr) < 0) )
+             (vmx_add_msr(msr, VMX_HOST_MSR) < 0) )
             hvm_inject_hw_exception(TRAP_machine_check, 0);
         else
         {
diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c
index 6df9f56..7c49f8a 100644
--- a/xen/arch/x86/hvm/vmx/vpmu_core2.c
+++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c
@@ -370,10 +370,10 @@ static int core2_vpmu_alloc_resource(struct vcpu *v)
         return 0;
 
     wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0);
-    if ( vmx_add_host_load_msr(MSR_CORE_PERF_GLOBAL_CTRL) )
+    if ( vmx_add_msr(MSR_CORE_PERF_GLOBAL_CTRL, VMX_HOST_MSR) )
         goto out_err;
 
-    if ( vmx_add_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL) )
+    if ( vmx_add_msr(MSR_CORE_PERF_GLOBAL_CTRL, VMX_GUEST_MSR) )
         goto out_err;
     vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL,
                  core2_calc_intial_glb_ctrl_msr());
@@ -390,8 +390,8 @@ static int core2_vpmu_alloc_resource(struct vcpu *v)
     return 1;
 
 out_err:
-    vmx_rm_host_load_msr(MSR_CORE_PERF_GLOBAL_CTRL);
-    vmx_rm_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL);
+    vmx_rm_msr(MSR_CORE_PERF_GLOBAL_CTRL, VMX_HOST_MSR);
+    vmx_rm_msr(MSR_CORE_PERF_GLOBAL_CTRL, VMX_GUEST_MSR);
     release_pmu_ownship(PMU_OWNER_HVM);
 
     printk("Failed to allocate VPMU resources for domain %u vcpu %u\n",
diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index 50befe1..dd34b2c 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -475,14 +475,16 @@ enum vmcs_field {
 
 #define MSR_TYPE_R 1
 #define MSR_TYPE_W 2
+
+#define VMX_GUEST_MSR 0
+#define VMX_HOST_MSR  1
+
 void vmx_disable_intercept_for_msr(struct vcpu *v, u32 msr, int type);
 void vmx_enable_intercept_for_msr(struct vcpu *v, u32 msr, int type);
 int vmx_read_guest_msr(u32 msr, u64 *val);
 int vmx_write_guest_msr(u32 msr, u64 val);
-int vmx_add_guest_msr(u32 msr);
-void vmx_rm_guest_msr(u32 msr);
-int vmx_add_host_load_msr(u32 msr);
-void vmx_rm_host_load_msr(u32 msr);
+int vmx_add_msr(u32 msr, u8 type);
+void vmx_rm_msr(u32 msr, u8 type);
 void vmx_vmcs_switch(struct vmcs_struct *from, struct vmcs_struct *to);
 void vmx_set_eoi_exit_bitmap(struct vcpu *v, u8 vector);
 void vmx_clear_eoi_exit_bitmap(struct vcpu *v, u8 vector);
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v7 07/19] x86/VPMU: Handle APIC_LVTPC accesses
  2014-06-06 17:39 [PATCH v7 00/19] x86/PMU: Xen PMU PV(H) support Boris Ostrovsky
                   ` (5 preceding siblings ...)
  2014-06-06 17:40 ` [PATCH v7 06/19] vmx: Merge MSR management routines Boris Ostrovsky
@ 2014-06-06 17:40 ` Boris Ostrovsky
  2014-06-06 17:40 ` [PATCH v7 08/19] intel/VPMU: MSR_CORE_PERF_GLOBAL_CTRL should be initialized to zero Boris Ostrovsky
                   ` (11 subsequent siblings)
  18 siblings, 0 replies; 58+ messages in thread
From: Boris Ostrovsky @ 2014-06-06 17:40 UTC (permalink / raw)
  To: JBeulich, kevin.tian, dietmar.hahn, suravee.suthikulpanit
  Cc: tim, boris.ostrovsky, keir, jun.nakajima, xen-devel

Update the APIC_LVTPC vector when an HVM guest writes to it.
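
The bookkeeping now lives in one place: the vlapic write path hands the
guest's value to the new vpmu_lvtpc_update(), which keeps Xen's own PMU
vector and honours only the guest's mask bit. A minimal sketch of the
resulting behaviour (mirroring the helper added below):

    /* Guest wrote 'val' to APIC_LVTPC; only APIC_LVT_MASKED is taken over. */
    vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR | (val & APIC_LVT_MASKED);
    apic_write(APIC_LVTPC, vpmu->hw_lapic_lvtpc);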

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
Tested-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
---
 xen/arch/x86/hvm/svm/vpmu.c       |  4 ----
 xen/arch/x86/hvm/vlapic.c         |  5 ++++-
 xen/arch/x86/hvm/vmx/vpmu_core2.c | 17 -----------------
 xen/arch/x86/hvm/vpmu.c           |  8 ++++++++
 xen/include/asm-x86/hvm/vpmu.h    |  1 +
 5 files changed, 13 insertions(+), 22 deletions(-)

diff --git a/xen/arch/x86/hvm/svm/vpmu.c b/xen/arch/x86/hvm/svm/vpmu.c
index f76c81e..5c0f5df 100644
--- a/xen/arch/x86/hvm/svm/vpmu.c
+++ b/xen/arch/x86/hvm/svm/vpmu.c
@@ -299,8 +299,6 @@ static int amd_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
         if ( !acquire_pmu_ownership(PMU_OWNER_HVM) )
             return 1;
         vpmu_set(vpmu, VPMU_RUNNING);
-        apic_write(APIC_LVTPC, PMU_APIC_VECTOR);
-        vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR;
 
         if ( has_hvm_container_domain(v->domain) &&
              !((struct amd_vpmu_context *)vpmu->context)->msr_bitmap_set )
@@ -311,8 +309,6 @@ static int amd_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
     if ( (get_pmu_reg_type(msr) == MSR_TYPE_CTRL) &&
         (is_pmu_enabled(msr_content) == 0) && vpmu_is_set(vpmu, VPMU_RUNNING) )
     {
-        apic_write(APIC_LVTPC, PMU_APIC_VECTOR | APIC_LVT_MASKED);
-        vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR | APIC_LVT_MASKED;
         vpmu_reset(vpmu, VPMU_RUNNING);
         if ( has_hvm_container_domain(v->domain) &&
              ((struct amd_vpmu_context *)vpmu->context)->msr_bitmap_set )
diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
index cd7e872..f1b543c 100644
--- a/xen/arch/x86/hvm/vlapic.c
+++ b/xen/arch/x86/hvm/vlapic.c
@@ -38,6 +38,7 @@
 #include <asm/hvm/support.h>
 #include <asm/hvm/vmx/vmx.h>
 #include <asm/hvm/nestedhvm.h>
+#include <asm/hvm/vpmu.h>
 #include <public/hvm/ioreq.h>
 #include <public/hvm/params.h>
 
@@ -734,8 +735,10 @@ static int vlapic_reg_write(struct vcpu *v,
             vlapic_adjust_i8259_target(v->domain);
             pt_may_unmask_irq(v->domain, NULL);
         }
-        if ( (offset == APIC_LVTT) && !(val & APIC_LVT_MASKED) )
+        else if ( (offset == APIC_LVTT) && !(val & APIC_LVT_MASKED) )
             pt_may_unmask_irq(NULL, &vlapic->pt);
+        else if ( offset == APIC_LVTPC )
+            vpmu_lvtpc_update(val);
         break;
 
     case APIC_TMICT:
diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c
index 7c49f8a..94cfd6d 100644
--- a/xen/arch/x86/hvm/vmx/vpmu_core2.c
+++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c
@@ -529,19 +529,6 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
     else
         vpmu_reset(vpmu, VPMU_RUNNING);
 
-    /* Setup LVTPC in local apic */
-    if ( vpmu_is_set(vpmu, VPMU_RUNNING) &&
-         is_vlapic_lvtpc_enabled(vcpu_vlapic(v)) )
-    {
-        apic_write_around(APIC_LVTPC, PMU_APIC_VECTOR);
-        vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR;
-    }
-    else
-    {
-        apic_write_around(APIC_LVTPC, PMU_APIC_VECTOR | APIC_LVT_MASKED);
-        vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR | APIC_LVT_MASKED;
-    }
-
     if ( type != MSR_TYPE_GLOBAL )
     {
         u64 mask;
@@ -707,10 +694,6 @@ static int core2_vpmu_do_interrupt(struct cpu_user_regs *regs)
             return 0;
     }
 
-    /* HW sets the MASK bit when performance counter interrupt occurs*/
-    vpmu->hw_lapic_lvtpc = apic_read(APIC_LVTPC) & ~APIC_LVT_MASKED;
-    apic_write_around(APIC_LVTPC, vpmu->hw_lapic_lvtpc);
-
     return 1;
 }
 
diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c
index d456aa6..4ddac7d 100644
--- a/xen/arch/x86/hvm/vpmu.c
+++ b/xen/arch/x86/hvm/vpmu.c
@@ -64,6 +64,14 @@ static void __init parse_vpmu_param(char *s)
     }
 }
 
+void vpmu_lvtpc_update(uint32_t val)
+{
+    struct vpmu_struct *vpmu = vcpu_vpmu(current);
+
+    vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR | (val & APIC_LVT_MASKED);
+    apic_write(APIC_LVTPC, vpmu->hw_lapic_lvtpc);
+}
+
 int vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
 {
     struct vpmu_struct *vpmu = vcpu_vpmu(current);
diff --git a/xen/include/asm-x86/hvm/vpmu.h b/xen/include/asm-x86/hvm/vpmu.h
index 7cd7266..fe181fc 100644
--- a/xen/include/asm-x86/hvm/vpmu.h
+++ b/xen/include/asm-x86/hvm/vpmu.h
@@ -87,6 +87,7 @@ struct vpmu_struct {
 #define vpmu_are_all_set(_vpmu, _x) (((_vpmu)->flags & (_x)) == (_x))
 #define vpmu_clear(_vpmu)           ((_vpmu)->flags = 0)
 
+void vpmu_lvtpc_update(uint32_t val);
 int vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content);
 int vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content);
 int vpmu_do_interrupt(struct cpu_user_regs *regs);
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v7 08/19] intel/VPMU: MSR_CORE_PERF_GLOBAL_CTRL should be initialized to zero
  2014-06-06 17:39 [PATCH v7 00/19] x86/PMU: Xen PMU PV(H) support Boris Ostrovsky
                   ` (6 preceding siblings ...)
  2014-06-06 17:40 ` [PATCH v7 07/19] x86/VPMU: Handle APIC_LVTPC accesses Boris Ostrovsky
@ 2014-06-06 17:40 ` Boris Ostrovsky
  2014-06-06 17:40 ` [PATCH v7 09/19] x86/VPMU: Add public xenpmu.h Boris Ostrovsky
                   ` (10 subsequent siblings)
  18 siblings, 0 replies; 58+ messages in thread
From: Boris Ostrovsky @ 2014-06-06 17:40 UTC (permalink / raw)
  To: JBeulich, kevin.tian, dietmar.hahn, suravee.suthikulpanit
  Cc: tim, boris.ostrovsky, keir, jun.nakajima, xen-devel

The MSR_CORE_PERF_GLOBAL_CTRL register should be initialized to zero. It
is up to the guest to set it so that the counters are enabled.
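
For example, a guest wanting the first two general-purpose counters
running is expected to enable them itself, along the lines of the sketch
below (in practice this write comes from the guest's perf subsystem):

    /* Guest-side sketch: enable arch counters 0 and 1 only. */
    wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0x3);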

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
Tested-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
---
 xen/arch/x86/hvm/vmx/vpmu_core2.c | 10 +---------
 1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c
index 94cfd6d..c7322f5 100644
--- a/xen/arch/x86/hvm/vmx/vpmu_core2.c
+++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c
@@ -165,13 +165,6 @@ static int core2_get_fixed_pmc_count(void)
     return ( (eax & PMU_FIXED_NR_MASK) >> PMU_FIXED_NR_SHIFT );
 }
 
-static u64 core2_calc_intial_glb_ctrl_msr(void)
-{
-    int arch_pmc_bits = (1 << arch_pmc_cnt) - 1;
-    u64 fix_pmc_bits  = (1 << fixed_pmc_cnt) - 1;
-    return ( (fix_pmc_bits << 32) | arch_pmc_bits );
-}
-
 /* edx bits 5-12: Bit width of fixed-function performance counters  */
 static int core2_get_bitwidth_fix_count(void)
 {
@@ -375,8 +368,7 @@ static int core2_vpmu_alloc_resource(struct vcpu *v)
 
     if ( vmx_add_msr(MSR_CORE_PERF_GLOBAL_CTRL, VMX_GUEST_MSR) )
         goto out_err;
-    vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL,
-                 core2_calc_intial_glb_ctrl_msr());
+    vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, 0);
 
     core2_vpmu_cxt = xzalloc_bytes(sizeof(struct core2_vpmu_context) +
                     (arch_pmc_cnt-1)*sizeof(struct arch_msr_pair));
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v7 09/19] x86/VPMU: Add public xenpmu.h
  2014-06-06 17:39 [PATCH v7 00/19] x86/PMU: Xen PMU PV(H) support Boris Ostrovsky
                   ` (7 preceding siblings ...)
  2014-06-06 17:40 ` [PATCH v7 08/19] intel/VPMU: MSR_CORE_PERF_GLOBAL_CTRL should be initialized to zero Boris Ostrovsky
@ 2014-06-06 17:40 ` Boris Ostrovsky
  2014-06-06 19:32   ` Andrew Cooper
  2014-06-11 10:03   ` Jan Beulich
  2014-06-06 17:40 ` [PATCH v7 10/19] x86/VPMU: Make vpmu not HVM-specific Boris Ostrovsky
                   ` (9 subsequent siblings)
  18 siblings, 2 replies; 58+ messages in thread
From: Boris Ostrovsky @ 2014-06-06 17:40 UTC (permalink / raw)
  To: JBeulich, kevin.tian, dietmar.hahn, suravee.suthikulpanit
  Cc: tim, boris.ostrovsky, keir, jun.nakajima, xen-devel

Add public pmu.h header files and move to them the various macros and
structures that will be shared between the hypervisor and PV guests.

Move the MSR banks out of the architectural PMU structures to allow for
larger sizes in the future. The banks are allocated immediately after the
context structure, and the PMU structures store offsets to them.
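
To make the offset scheme concrete: a bank is carved out of the same
allocation as the context and located again via vpmu_reg_pointer(). A
sketch following the AMD path below (NULL check omitted):

    struct xen_pmu_amd_ctxt *ctxt;
    uint64_t *counter_regs;

    /* The context is followed immediately by the two MSR banks. */
    ctxt = xzalloc_bytes(sizeof(struct xen_pmu_amd_ctxt) +
                         sizeof(uint64_t) * AMD_MAX_COUNTERS +
                         sizeof(uint64_t) * AMD_MAX_COUNTERS);
    ctxt->counters = sizeof(struct xen_pmu_amd_ctxt);
    ctxt->ctrls = ctxt->counters + sizeof(uint64_t) * AMD_MAX_COUNTERS;

    /* vpmu_reg_pointer(ctxt, counters) resolves to
       (uint64_t *)((uintptr_t)ctxt + ctxt->counters). */
    counter_regs = vpmu_reg_pointer(ctxt, counters);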

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
Tested-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
---
 xen/arch/x86/hvm/svm/vpmu.c              |  81 +++++++++++----------
 xen/arch/x86/hvm/vmx/vpmu_core2.c        | 117 +++++++++++++++++--------------
 xen/arch/x86/hvm/vpmu.c                  |   5 ++
 xen/arch/x86/oprofile/op_model_ppro.c    |   6 +-
 xen/include/asm-x86/hvm/vmx/vpmu_core2.h |  32 ---------
 xen/include/asm-x86/hvm/vpmu.h           |  16 ++---
 xen/include/public/arch-arm.h            |   3 +
 xen/include/public/arch-x86/pmu.h        |  58 +++++++++++++++
 xen/include/public/pmu.h                 |  38 ++++++++++
 9 files changed, 224 insertions(+), 132 deletions(-)
 delete mode 100644 xen/include/asm-x86/hvm/vmx/vpmu_core2.h
 create mode 100644 xen/include/public/arch-x86/pmu.h
 create mode 100644 xen/include/public/pmu.h

diff --git a/xen/arch/x86/hvm/svm/vpmu.c b/xen/arch/x86/hvm/svm/vpmu.c
index 5c0f5df..b5ca577 100644
--- a/xen/arch/x86/hvm/svm/vpmu.c
+++ b/xen/arch/x86/hvm/svm/vpmu.c
@@ -30,10 +30,7 @@
 #include <asm/apic.h>
 #include <asm/hvm/vlapic.h>
 #include <asm/hvm/vpmu.h>
-
-#define F10H_NUM_COUNTERS 4
-#define F15H_NUM_COUNTERS 6
-#define MAX_NUM_COUNTERS F15H_NUM_COUNTERS
+#include <public/pmu.h>
 
 #define MSR_F10H_EVNTSEL_GO_SHIFT   40
 #define MSR_F10H_EVNTSEL_EN_SHIFT   22
@@ -49,6 +46,10 @@ static const u32 __read_mostly *counters;
 static const u32 __read_mostly *ctrls;
 static bool_t __read_mostly k7_counters_mirrored;
 
+#define F10H_NUM_COUNTERS   4
+#define F15H_NUM_COUNTERS   6
+#define AMD_MAX_COUNTERS    6
+
 /* PMU Counter MSRs. */
 static const u32 AMD_F10H_COUNTERS[] = {
     MSR_K7_PERFCTR0,
@@ -83,12 +84,10 @@ static const u32 AMD_F15H_CTRLS[] = {
     MSR_AMD_FAM15H_EVNTSEL5
 };
 
-/* storage for context switching */
-struct amd_vpmu_context {
-    u64 counters[MAX_NUM_COUNTERS];
-    u64 ctrls[MAX_NUM_COUNTERS];
-    bool_t msr_bitmap_set;
-};
+/* Use private context as a flag for MSR bitmap */
+#define msr_bitmap_on(vpmu)    { (vpmu)->priv_context = (void *)-1L; }
+#define msr_bitmap_off(vpmu)   { (vpmu)->priv_context = NULL; }
+#define is_msr_bitmap_on(vpmu) ((vpmu)->priv_context != NULL)
 
 static inline int get_pmu_reg_type(u32 addr)
 {
@@ -142,7 +141,6 @@ static void amd_vpmu_set_msr_bitmap(struct vcpu *v)
 {
     unsigned int i;
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    struct amd_vpmu_context *ctxt = vpmu->context;
 
     for ( i = 0; i < num_counters; i++ )
     {
@@ -150,14 +148,13 @@ static void amd_vpmu_set_msr_bitmap(struct vcpu *v)
         svm_intercept_msr(v, ctrls[i], MSR_INTERCEPT_WRITE);
     }
 
-    ctxt->msr_bitmap_set = 1;
+    msr_bitmap_on(vpmu);
 }
 
 static void amd_vpmu_unset_msr_bitmap(struct vcpu *v)
 {
     unsigned int i;
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    struct amd_vpmu_context *ctxt = vpmu->context;
 
     for ( i = 0; i < num_counters; i++ )
     {
@@ -165,7 +162,7 @@ static void amd_vpmu_unset_msr_bitmap(struct vcpu *v)
         svm_intercept_msr(v, ctrls[i], MSR_INTERCEPT_RW);
     }
 
-    ctxt->msr_bitmap_set = 0;
+    msr_bitmap_off(vpmu);
 }
 
 static int amd_vpmu_do_interrupt(struct cpu_user_regs *regs)
@@ -177,19 +174,22 @@ static inline void context_load(struct vcpu *v)
 {
     unsigned int i;
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    struct amd_vpmu_context *ctxt = vpmu->context;
+    struct xen_pmu_amd_ctxt *ctxt = vpmu->context;
+    uint64_t *counter_regs = vpmu_reg_pointer(ctxt, counters);
+    uint64_t *ctrl_regs = vpmu_reg_pointer(ctxt, ctrls);
 
     for ( i = 0; i < num_counters; i++ )
     {
-        wrmsrl(counters[i], ctxt->counters[i]);
-        wrmsrl(ctrls[i], ctxt->ctrls[i]);
+        wrmsrl(counters[i], counter_regs[i]);
+        wrmsrl(ctrls[i], ctrl_regs[i]);
     }
 }
 
 static void amd_vpmu_load(struct vcpu *v)
 {
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    struct amd_vpmu_context *ctxt = vpmu->context;
+    struct xen_pmu_amd_ctxt *ctxt = vpmu->context;
+    uint64_t *ctrl_regs = vpmu_reg_pointer(ctxt, ctrls);
 
     vpmu_reset(vpmu, VPMU_FROZEN);
 
@@ -198,7 +198,7 @@ static void amd_vpmu_load(struct vcpu *v)
         unsigned int i;
 
         for ( i = 0; i < num_counters; i++ )
-            wrmsrl(ctrls[i], ctxt->ctrls[i]);
+            wrmsrl(ctrls[i], ctrl_regs[i]);
 
         return;
     }
@@ -212,17 +212,17 @@ static inline void context_save(struct vcpu *v)
 {
     unsigned int i;
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    struct amd_vpmu_context *ctxt = vpmu->context;
+    struct xen_pmu_amd_ctxt *ctxt = vpmu->context;
+    uint64_t *counter_regs = vpmu_reg_pointer(ctxt, counters);
 
     /* No need to save controls -- they are saved in amd_vpmu_do_wrmsr */
     for ( i = 0; i < num_counters; i++ )
-        rdmsrl(counters[i], ctxt->counters[i]);
+        rdmsrl(counters[i], counter_regs[i]);
 }
 
 static int amd_vpmu_save(struct vcpu *v)
 {
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    struct amd_vpmu_context *ctx = vpmu->context;
     unsigned int i;
 
     /*
@@ -245,7 +245,7 @@ static int amd_vpmu_save(struct vcpu *v)
     context_save(v);
 
     if ( !vpmu_is_set(vpmu, VPMU_RUNNING) &&
-         has_hvm_container_domain(v->domain) && ctx->msr_bitmap_set )
+         has_hvm_container_domain(v->domain) && is_msr_bitmap_on(vpmu) )
         amd_vpmu_unset_msr_bitmap(v);
 
     return 1;
@@ -256,7 +256,9 @@ static void context_update(unsigned int msr, u64 msr_content)
     unsigned int i;
     struct vcpu *v = current;
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    struct amd_vpmu_context *ctxt = vpmu->context;
+    struct xen_pmu_amd_ctxt *ctxt = vpmu->context;
+    uint64_t *counter_regs = vpmu_reg_pointer(ctxt, counters);
+    uint64_t *ctrl_regs = vpmu_reg_pointer(ctxt, ctrls);
 
     if ( k7_counters_mirrored &&
         ((msr >= MSR_K7_EVNTSEL0) && (msr <= MSR_K7_PERFCTR3)) )
@@ -268,12 +270,12 @@ static void context_update(unsigned int msr, u64 msr_content)
     {
        if ( msr == ctrls[i] )
        {
-           ctxt->ctrls[i] = msr_content;
+           ctrl_regs[i] = msr_content;
            return;
        }
         else if (msr == counters[i] )
         {
-            ctxt->counters[i] = msr_content;
+            counter_regs[i] = msr_content;
             return;
         }
     }
@@ -300,8 +302,7 @@ static int amd_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
             return 1;
         vpmu_set(vpmu, VPMU_RUNNING);
 
-        if ( has_hvm_container_domain(v->domain) &&
-             !((struct amd_vpmu_context *)vpmu->context)->msr_bitmap_set )
+        if ( has_hvm_container_domain(v->domain) && !is_msr_bitmap_on(vpmu) )
              amd_vpmu_set_msr_bitmap(v);
     }
 
@@ -310,8 +311,7 @@ static int amd_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
         (is_pmu_enabled(msr_content) == 0) && vpmu_is_set(vpmu, VPMU_RUNNING) )
     {
         vpmu_reset(vpmu, VPMU_RUNNING);
-        if ( has_hvm_container_domain(v->domain) &&
-             ((struct amd_vpmu_context *)vpmu->context)->msr_bitmap_set )
+        if ( has_hvm_container_domain(v->domain) && is_msr_bitmap_on(vpmu) )
              amd_vpmu_unset_msr_bitmap(v);
         release_pmu_ownship(PMU_OWNER_HVM);
     }
@@ -352,7 +352,7 @@ static int amd_vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
 
 static int amd_vpmu_initialise(struct vcpu *v)
 {
-    struct amd_vpmu_context *ctxt;
+    struct xen_pmu_amd_ctxt *ctxt;
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
     uint8_t family = current_cpu_data.x86;
 
@@ -382,7 +382,9 @@ static int amd_vpmu_initialise(struct vcpu *v)
 	 }
     }
 
-    ctxt = xzalloc(struct amd_vpmu_context);
+    ctxt = xzalloc_bytes(sizeof(struct xen_pmu_amd_ctxt) +
+                         sizeof(uint64_t) * AMD_MAX_COUNTERS +
+                         sizeof(uint64_t) * AMD_MAX_COUNTERS);
     if ( !ctxt )
     {
         gdprintk(XENLOG_WARNING, "Insufficient memory for PMU, "
@@ -391,7 +393,11 @@ static int amd_vpmu_initialise(struct vcpu *v)
         return -ENOMEM;
     }
 
+    ctxt->counters = sizeof(struct xen_pmu_amd_ctxt);
+    ctxt->ctrls = ctxt->counters + sizeof(uint64_t) * AMD_MAX_COUNTERS;
+
     vpmu->context = ctxt;
+    vpmu->priv_context = NULL;
     vpmu_set(vpmu, VPMU_CONTEXT_ALLOCATED);
     return 0;
 }
@@ -403,8 +409,7 @@ static void amd_vpmu_destroy(struct vcpu *v)
     if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
         return;
 
-    if ( has_hvm_container_domain(v->domain) &&
-         ((struct amd_vpmu_context *)vpmu->context)->msr_bitmap_set )
+    if ( has_hvm_container_domain(v->domain) && is_msr_bitmap_on(vpmu) )
         amd_vpmu_unset_msr_bitmap(v);
 
     xfree(vpmu->context);
@@ -421,7 +426,9 @@ static void amd_vpmu_destroy(struct vcpu *v)
 static void amd_vpmu_dump(const struct vcpu *v)
 {
     const struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    const struct amd_vpmu_context *ctxt = vpmu->context;
+    const struct xen_pmu_amd_ctxt *ctxt = vpmu->context;
+    uint64_t *counter_regs = vpmu_reg_pointer(ctxt, counters);
+    uint64_t *ctrl_regs = vpmu_reg_pointer(ctxt, ctrls);
     unsigned int i;
 
     printk("    VPMU state: 0x%x ", vpmu->flags);
@@ -451,8 +458,8 @@ static void amd_vpmu_dump(const struct vcpu *v)
         rdmsrl(ctrls[i], ctrl);
         rdmsrl(counters[i], cntr);
         printk("      %#x: %#lx (%#lx in HW)    %#x: %#lx (%#lx in HW)\n",
-               ctrls[i], ctxt->ctrls[i], ctrl,
-               counters[i], ctxt->counters[i], cntr);
+               ctrls[i], ctrl_regs[i], ctrl,
+               counters[i], counter_regs[i], cntr);
     }
 }
 
diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c
index c7322f5..b5bc8ea 100644
--- a/xen/arch/x86/hvm/vmx/vpmu_core2.c
+++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c
@@ -35,8 +35,8 @@
 #include <asm/hvm/vmx/vmcs.h>
 #include <public/sched.h>
 #include <public/hvm/save.h>
+#include <public/pmu.h>
 #include <asm/hvm/vpmu.h>
-#include <asm/hvm/vmx/vpmu_core2.h>
 
 /*
  * See Intel SDM Vol 2a Instruction Set Reference chapter 3 for CPUID
@@ -68,6 +68,10 @@
 #define MSR_PMC_ALIAS_MASK       (~(MSR_IA32_PERFCTR0 ^ MSR_IA32_A_PERFCTR0))
 static bool_t __read_mostly full_width_write;
 
+/* Intel-specific VPMU features */
+#define VPMU_CPU_HAS_DS                     0x100 /* Has Debug Store */
+#define VPMU_CPU_HAS_BTS                    0x200 /* Has Branch Trace Store */
+
 /*
  * MSR_CORE_PERF_FIXED_CTR_CTRL contains the configuration of all fixed
  * counters. 4 bits for every counter.
@@ -75,17 +79,6 @@ static bool_t __read_mostly full_width_write;
 #define FIXED_CTR_CTRL_BITS 4
 #define FIXED_CTR_CTRL_MASK ((1 << FIXED_CTR_CTRL_BITS) - 1)
 
-#define VPMU_CORE2_MAX_FIXED_PMCS     4
-struct core2_vpmu_context {
-    u64 fixed_ctrl;
-    u64 ds_area;
-    u64 pebs_enable;
-    u64 global_ovf_status;
-    u64 enabled_cntrs;  /* Follows PERF_GLOBAL_CTRL MSR format */
-    u64 fix_counters[VPMU_CORE2_MAX_FIXED_PMCS];
-    struct arch_msr_pair arch_msr_pair[1];
-};
-
 /* Number of general-purpose and fixed performance counters */
 static unsigned int __read_mostly arch_pmc_cnt, fixed_pmc_cnt;
 
@@ -225,6 +218,7 @@ static int is_core2_vpmu_msr(u32 msr_index, int *type, int *index)
     return 0;
 }
 
+#define msraddr_to_bitpos(x) (((x)&0xffff) + ((x)>>31)*0x2000)
 static void core2_vpmu_set_msr_bitmap(unsigned long *msr_bitmap)
 {
     int i;
@@ -294,12 +288,15 @@ static void core2_vpmu_unset_msr_bitmap(unsigned long *msr_bitmap)
 static inline void __core2_vpmu_save(struct vcpu *v)
 {
     int i;
-    struct core2_vpmu_context *core2_vpmu_cxt = vcpu_vpmu(v)->context;
+    struct xen_pmu_intel_ctxt *core2_vpmu_cxt = vcpu_vpmu(v)->context;
+    uint64_t *fixed_counters = vpmu_reg_pointer(core2_vpmu_cxt, fixed_counters);
+    struct xen_pmu_cntr_pair *xen_pmu_cntr_pair =
+        vpmu_reg_pointer(core2_vpmu_cxt, arch_counters);
 
     for ( i = 0; i < fixed_pmc_cnt; i++ )
-        rdmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, core2_vpmu_cxt->fix_counters[i]);
+        rdmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, fixed_counters[i]);
     for ( i = 0; i < arch_pmc_cnt; i++ )
-        rdmsrl(MSR_IA32_PERFCTR0 + i, core2_vpmu_cxt->arch_msr_pair[i].counter);
+        rdmsrl(MSR_IA32_PERFCTR0 + i, xen_pmu_cntr_pair[i].counter);
 }
 
 static int core2_vpmu_save(struct vcpu *v)
@@ -322,10 +319,13 @@ static int core2_vpmu_save(struct vcpu *v)
 static inline void __core2_vpmu_load(struct vcpu *v)
 {
     unsigned int i, pmc_start;
-    struct core2_vpmu_context *core2_vpmu_cxt = vcpu_vpmu(v)->context;
+    struct xen_pmu_intel_ctxt *core2_vpmu_cxt = vcpu_vpmu(v)->context;
+    uint64_t *fixed_counters = vpmu_reg_pointer(core2_vpmu_cxt, fixed_counters);
+    struct xen_pmu_cntr_pair *xen_pmu_cntr_pair =
+        vpmu_reg_pointer(core2_vpmu_cxt, arch_counters);
 
     for ( i = 0; i < fixed_pmc_cnt; i++ )
-        wrmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, core2_vpmu_cxt->fix_counters[i]);
+        wrmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, fixed_counters[i]);
 
     if ( full_width_write )
         pmc_start = MSR_IA32_A_PERFCTR0;
@@ -333,8 +333,8 @@ static inline void __core2_vpmu_load(struct vcpu *v)
         pmc_start = MSR_IA32_PERFCTR0;
     for ( i = 0; i < arch_pmc_cnt; i++ )
     {
-        wrmsrl(pmc_start + i, core2_vpmu_cxt->arch_msr_pair[i].counter);
-        wrmsrl(MSR_P6_EVNTSEL0 + i, core2_vpmu_cxt->arch_msr_pair[i].control);
+        wrmsrl(pmc_start + i, xen_pmu_cntr_pair[i].counter);
+        wrmsrl(MSR_P6_EVNTSEL0 + i, xen_pmu_cntr_pair[i].control);
     }
 
     wrmsrl(MSR_CORE_PERF_FIXED_CTR_CTRL, core2_vpmu_cxt->fixed_ctrl);
@@ -357,7 +357,8 @@ static void core2_vpmu_load(struct vcpu *v)
 static int core2_vpmu_alloc_resource(struct vcpu *v)
 {
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    struct core2_vpmu_context *core2_vpmu_cxt;
+    struct xen_pmu_intel_ctxt *core2_vpmu_cxt = NULL;
+    uint64_t *p = NULL;
 
     if ( !acquire_pmu_ownership(PMU_OWNER_HVM) )
         return 0;
@@ -370,12 +371,20 @@ static int core2_vpmu_alloc_resource(struct vcpu *v)
         goto out_err;
     vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, 0);
 
-    core2_vpmu_cxt = xzalloc_bytes(sizeof(struct core2_vpmu_context) +
-                    (arch_pmc_cnt-1)*sizeof(struct arch_msr_pair));
-    if ( !core2_vpmu_cxt )
+    core2_vpmu_cxt = xzalloc_bytes(sizeof(struct xen_pmu_intel_ctxt) +
+                                   sizeof(uint64_t) * fixed_pmc_cnt +
+                                   sizeof(struct xen_pmu_cntr_pair) *
+                                   arch_pmc_cnt);
+    p = xzalloc_bytes(sizeof(uint64_t));
+    if ( !core2_vpmu_cxt || !p )
         goto out_err;
 
-    vpmu->context = (void *)core2_vpmu_cxt;
+    core2_vpmu_cxt->fixed_counters = sizeof(struct xen_pmu_intel_ctxt);
+    core2_vpmu_cxt->arch_counters = core2_vpmu_cxt->fixed_counters +
+                                    sizeof(uint64_t) * fixed_pmc_cnt;
+
+    vpmu->context = core2_vpmu_cxt;
+    vpmu->priv_context = p;
 
     vpmu_set(vpmu, VPMU_CONTEXT_ALLOCATED);
 
@@ -386,6 +395,9 @@ out_err:
     vmx_rm_msr(MSR_CORE_PERF_GLOBAL_CTRL, VMX_GUEST_MSR);
     release_pmu_ownship(PMU_OWNER_HVM);
 
+    xfree(core2_vpmu_cxt);
+    xfree(p);
+
     printk("Failed to allocate VPMU resources for domain %u vcpu %u\n",
            v->vcpu_id, v->domain->domain_id);
 
@@ -422,7 +434,8 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
     int type = -1, index = -1;
     struct vcpu *v = current;
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    struct core2_vpmu_context *core2_vpmu_cxt = NULL;
+    struct xen_pmu_intel_ctxt *core2_vpmu_cxt = NULL;
+    uint64_t *enabled_cntrs;
 
     if ( !core2_vpmu_msr_common_check(msr, &type, &index) )
     {
@@ -448,10 +461,11 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
     }
 
     core2_vpmu_cxt = vpmu->context;
+    enabled_cntrs = vpmu->priv_context;
     switch ( msr )
     {
     case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
-        core2_vpmu_cxt->global_ovf_status &= ~msr_content;
+        core2_vpmu_cxt->global_status &= ~msr_content;
         return 1;
     case MSR_CORE_PERF_GLOBAL_STATUS:
         gdprintk(XENLOG_INFO, "Can not write readonly MSR: "
@@ -485,15 +499,14 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
         break;
     case MSR_CORE_PERF_FIXED_CTR_CTRL:
         vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, &global_ctrl);
-        core2_vpmu_cxt->enabled_cntrs &=
-                ~(((1ULL << VPMU_CORE2_MAX_FIXED_PMCS) - 1) << 32);
+        *enabled_cntrs &= ~(((1ULL << fixed_pmc_cnt) - 1) << 32);
         if ( msr_content != 0 )
         {
             u64 val = msr_content;
             for ( i = 0; i < fixed_pmc_cnt; i++ )
             {
                 if ( val & 3 )
-                    core2_vpmu_cxt->enabled_cntrs |= (1ULL << 32) << i;
+                    *enabled_cntrs |= (1ULL << 32) << i;
                 val >>= FIXED_CTR_CTRL_BITS;
             }
         }
@@ -504,19 +517,21 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
         tmp = msr - MSR_P6_EVNTSEL0;
         if ( tmp >= 0 && tmp < arch_pmc_cnt )
         {
+            struct xen_pmu_cntr_pair *xen_pmu_cntr_pair =
+                vpmu_reg_pointer(core2_vpmu_cxt, arch_counters);
+
             vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, &global_ctrl);
 
             if ( msr_content & (1ULL << 22) )
-                core2_vpmu_cxt->enabled_cntrs |= 1ULL << tmp;
+                *enabled_cntrs |= 1ULL << tmp;
             else
-                core2_vpmu_cxt->enabled_cntrs &= ~(1ULL << tmp);
+                *enabled_cntrs &= ~(1ULL << tmp);
 
-            core2_vpmu_cxt->arch_msr_pair[tmp].control = msr_content;
+            xen_pmu_cntr_pair[tmp].control = msr_content;
         }
     }
 
-    if ((global_ctrl & core2_vpmu_cxt->enabled_cntrs) ||
-        (core2_vpmu_cxt->ds_area != 0)  )
+    if ( (global_ctrl & *enabled_cntrs) || (core2_vpmu_cxt->ds_area != 0) )
         vpmu_set(vpmu, VPMU_RUNNING);
     else
         vpmu_reset(vpmu, VPMU_RUNNING);
@@ -562,7 +577,7 @@ static int core2_vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
     int type = -1, index = -1;
     struct vcpu *v = current;
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    struct core2_vpmu_context *core2_vpmu_cxt = NULL;
+    struct xen_pmu_intel_ctxt *core2_vpmu_cxt = NULL;
 
     if ( core2_vpmu_msr_common_check(msr, &type, &index) )
     {
@@ -573,7 +588,7 @@ static int core2_vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
             *msr_content = 0;
             break;
         case MSR_CORE_PERF_GLOBAL_STATUS:
-            *msr_content = core2_vpmu_cxt->global_ovf_status;
+            *msr_content = core2_vpmu_cxt->global_status;
             break;
         case MSR_CORE_PERF_GLOBAL_CTRL:
             vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, msr_content);
@@ -622,10 +637,12 @@ static void core2_vpmu_dump(const struct vcpu *v)
 {
     const struct vpmu_struct *vpmu = vcpu_vpmu(v);
     int i;
-    const struct core2_vpmu_context *core2_vpmu_cxt = NULL;
+    const struct xen_pmu_intel_ctxt *core2_vpmu_cxt = vpmu->context;
     u64 val;
+    uint64_t *fixed_counters;
+    struct xen_pmu_cntr_pair *cntr_pair;
 
-    if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
+    if ( !core2_vpmu_cxt || !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
          return;
 
     if ( !vpmu_is_set(vpmu, VPMU_RUNNING) )
@@ -638,16 +655,15 @@ static void core2_vpmu_dump(const struct vcpu *v)
     }
 
     printk("    vPMU running\n");
-    core2_vpmu_cxt = vpmu->context;
+
+    cntr_pair = vpmu_reg_pointer(core2_vpmu_cxt, arch_counters);
+    fixed_counters = vpmu_reg_pointer(core2_vpmu_cxt, fixed_counters);
 
     /* Print the contents of the counter and its configuration msr. */
     for ( i = 0; i < arch_pmc_cnt; i++ )
-    {
-        const struct arch_msr_pair *msr_pair = core2_vpmu_cxt->arch_msr_pair;
-
         printk("      general_%d: 0x%016lx ctrl: 0x%016lx\n",
-               i, msr_pair[i].counter, msr_pair[i].control);
-    }
+               i, cntr_pair[i].counter, cntr_pair[i].control);
+
     /*
      * The configuration of the fixed counter is 4 bits each in the
      * MSR_CORE_PERF_FIXED_CTR_CTRL.
@@ -656,7 +672,7 @@ static void core2_vpmu_dump(const struct vcpu *v)
     for ( i = 0; i < fixed_pmc_cnt; i++ )
     {
         printk("      fixed_%d:   0x%016lx ctrl: %#lx\n",
-               i, core2_vpmu_cxt->fix_counters[i],
+               i, fixed_counters[i],
                val & FIXED_CTR_CTRL_MASK);
         val >>= FIXED_CTR_CTRL_BITS;
     }
@@ -667,14 +683,14 @@ static int core2_vpmu_do_interrupt(struct cpu_user_regs *regs)
     struct vcpu *v = current;
     u64 msr_content;
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    struct core2_vpmu_context *core2_vpmu_cxt = vpmu->context;
+    struct xen_pmu_intel_ctxt *core2_vpmu_cxt = vpmu->context;
 
     rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, msr_content);
     if ( msr_content )
     {
         if ( is_pmc_quirk )
             handle_pmc_quirk(msr_content);
-        core2_vpmu_cxt->global_ovf_status |= msr_content;
+        core2_vpmu_cxt->global_status |= msr_content;
         msr_content = 0xC000000700000000 | ((1 << arch_pmc_cnt) - 1);
         wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, msr_content);
     }
@@ -737,12 +753,6 @@ func_out:
 
     arch_pmc_cnt = core2_get_arch_pmc_count();
     fixed_pmc_cnt = core2_get_fixed_pmc_count();
-    if ( fixed_pmc_cnt > VPMU_CORE2_MAX_FIXED_PMCS )
-    {
-        fixed_pmc_cnt = VPMU_CORE2_MAX_FIXED_PMCS;
-        printk(XENLOG_G_WARNING "Limiting number of fixed counters to %d\n",
-               fixed_pmc_cnt);
-    }
     check_pmc_quirk();
 
     return 0;
@@ -756,6 +766,7 @@ static void core2_vpmu_destroy(struct vcpu *v)
         return;
 
     xfree(vpmu->context);
+    xfree(vpmu->priv_context);
     if ( has_hvm_container_domain(v->domain) && cpu_has_vmx_msr_bitmap )
         core2_vpmu_unset_msr_bitmap(v->arch.hvm_vmx.msr_bitmap);
     release_pmu_ownship(PMU_OWNER_HVM);
diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c
index 4ddac7d..56b48be 100644
--- a/xen/arch/x86/hvm/vpmu.c
+++ b/xen/arch/x86/hvm/vpmu.c
@@ -31,6 +31,7 @@
 #include <asm/hvm/svm/svm.h>
 #include <asm/hvm/svm/vmcb.h>
 #include <asm/apic.h>
+#include <public/pmu.h>
 
 /*
  * "vpmu" :     vpmu generally enabled
@@ -228,6 +229,10 @@ void vpmu_initialise(struct vcpu *v)
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
     uint8_t vendor = current_cpu_data.x86_vendor;
 
+    BUILD_BUG_ON((sizeof(struct xen_pmu_intel_ctxt) > XENPMU_CTXT_PAD_SZ) ||
+                 (sizeof(struct xen_pmu_amd_ctxt) > XENPMU_CTXT_PAD_SZ));
+
+
     if ( vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
         vpmu_destroy(v);
     vpmu_clear(vpmu);
diff --git a/xen/arch/x86/oprofile/op_model_ppro.c b/xen/arch/x86/oprofile/op_model_ppro.c
index 3225937..5aae2e7 100644
--- a/xen/arch/x86/oprofile/op_model_ppro.c
+++ b/xen/arch/x86/oprofile/op_model_ppro.c
@@ -20,11 +20,15 @@
 #include <asm/regs.h>
 #include <asm/current.h>
 #include <asm/hvm/vpmu.h>
-#include <asm/hvm/vmx/vpmu_core2.h>
 
 #include "op_x86_model.h"
 #include "op_counter.h"
 
+struct arch_msr_pair {
+    u64 counter;
+    u64 control;
+};
+
 /*
  * Intel "Architectural Performance Monitoring" CPUID
  * detection/enumeration details:
diff --git a/xen/include/asm-x86/hvm/vmx/vpmu_core2.h b/xen/include/asm-x86/hvm/vmx/vpmu_core2.h
deleted file mode 100644
index 410372d..0000000
--- a/xen/include/asm-x86/hvm/vmx/vpmu_core2.h
+++ /dev/null
@@ -1,32 +0,0 @@
-
-/*
- * vpmu_core2.h: CORE 2 specific PMU virtualization for HVM domain.
- *
- * Copyright (c) 2007, Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along with
- * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
- * Place - Suite 330, Boston, MA 02111-1307 USA.
- *
- * Author: Haitao Shan <haitao.shan@intel.com>
- */
-
-#ifndef __ASM_X86_HVM_VPMU_CORE_H_
-#define __ASM_X86_HVM_VPMU_CORE_H_
-
-struct arch_msr_pair {
-    u64 counter;
-    u64 control;
-};
-
-#endif /* __ASM_X86_HVM_VPMU_CORE_H_ */
-
diff --git a/xen/include/asm-x86/hvm/vpmu.h b/xen/include/asm-x86/hvm/vpmu.h
index fe181fc..83b5461 100644
--- a/xen/include/asm-x86/hvm/vpmu.h
+++ b/xen/include/asm-x86/hvm/vpmu.h
@@ -22,6 +22,8 @@
 #ifndef __ASM_X86_HVM_VPMU_H_
 #define __ASM_X86_HVM_VPMU_H_
 
+#include <public/pmu.h>
+
 /*
  * Flag bits given as a string on the hypervisor boot parameter 'vpmu'.
  * See arch/x86/hvm/vpmu.c.
@@ -29,12 +31,9 @@
 #define VPMU_BOOT_ENABLED 0x1    /* vpmu generally enabled. */
 #define VPMU_BOOT_BTS     0x2    /* Intel BTS feature wanted. */
 
-
-#define msraddr_to_bitpos(x) (((x)&0xffff) + ((x)>>31)*0x2000)
 #define vcpu_vpmu(vcpu)   (&((vcpu)->arch.hvm_vcpu.vpmu))
 #define vpmu_vcpu(vpmu)   (container_of((vpmu), struct vcpu, \
                                           arch.hvm_vcpu.vpmu))
-#define vpmu_domain(vpmu) (vpmu_vcpu(vpmu)->domain)
 
 #define MSR_TYPE_COUNTER            0
 #define MSR_TYPE_CTRL               1
@@ -42,6 +41,9 @@
 #define MSR_TYPE_ARCH_COUNTER       3
 #define MSR_TYPE_ARCH_CTRL          4
 
+/* Start of PMU register bank */
+#define vpmu_reg_pointer(ctxt, offset) ((void *)((uintptr_t)ctxt + \
+                                                 (uintptr_t)ctxt->offset))
 
 /* Arch specific operations shared by all vpmus */
 struct arch_vpmu_ops {
@@ -64,7 +66,8 @@ struct vpmu_struct {
     u32 flags;
     u32 last_pcpu;
     u32 hw_lapic_lvtpc;
-    void *context;
+    void *context;      /* May be shared with PV guest */
+    void *priv_context; /* hypervisor-only */
     struct arch_vpmu_ops *arch_vpmu_ops;
 };
 
@@ -76,11 +79,6 @@ struct vpmu_struct {
 #define VPMU_FROZEN                         0x10  /* Stop counters while VCPU is not running */
 #define VPMU_PASSIVE_DOMAIN_ALLOCATED       0x20
 
-/* VPMU features */
-#define VPMU_CPU_HAS_DS                     0x100 /* Has Debug Store */
-#define VPMU_CPU_HAS_BTS                    0x200 /* Has Branch Trace Store */
-
-
 #define vpmu_set(_vpmu, _x)         ((_vpmu)->flags |= (_x))
 #define vpmu_reset(_vpmu, _x)       ((_vpmu)->flags &= ~(_x))
 #define vpmu_is_set(_vpmu, _x)      ((_vpmu)->flags & (_x))
diff --git a/xen/include/public/arch-arm.h b/xen/include/public/arch-arm.h
index 7496556..e982b53 100644
--- a/xen/include/public/arch-arm.h
+++ b/xen/include/public/arch-arm.h
@@ -388,6 +388,9 @@ typedef uint64_t xen_callback_t;
 
 #endif
 
+/* Stub definition of PMU structure */
+typedef struct xen_arch_pmu {} xen_arch_pmu_t;
+
 #endif /*  __XEN_PUBLIC_ARCH_ARM_H__ */
 
 /*
diff --git a/xen/include/public/arch-x86/pmu.h b/xen/include/public/arch-x86/pmu.h
new file mode 100644
index 0000000..059ed52
--- /dev/null
+++ b/xen/include/public/arch-x86/pmu.h
@@ -0,0 +1,58 @@
+#ifndef __XEN_PUBLIC_ARCH_X86_PMU_H__
+#define __XEN_PUBLIC_ARCH_X86_PMU_H__
+
+/* x86-specific PMU definitions */
+
+/* AMD PMU registers and structures */
+struct xen_pmu_amd_ctxt {
+    uint32_t counters;       /* Offset to counter MSRs */
+    uint32_t ctrls;          /* Offset to control MSRs */
+};
+
+/* Intel PMU registers and structures */
+struct xen_pmu_cntr_pair {
+    uint64_t counter;
+    uint64_t control;
+};
+
+struct xen_pmu_intel_ctxt {
+    uint64_t global_ctrl;
+    uint64_t global_ovf_ctrl;
+    uint64_t global_status;
+    uint64_t fixed_ctrl;
+    uint64_t ds_area;
+    uint64_t pebs_enable;
+    uint64_t debugctl;
+    uint32_t fixed_counters;  /* Offset to fixed counter MSRs */
+    uint32_t arch_counters;   /* Offset to architectural counter MSRs */
+};
+
+struct xen_arch_pmu {
+    union {
+        struct cpu_user_regs regs;
+        uint8_t pad[256];
+    } r;
+    union {
+        uint32_t lapic_lvtpc;
+        uint64_t pad;
+    } l;
+    union {
+        struct xen_pmu_amd_ctxt amd;
+        struct xen_pmu_intel_ctxt intel;
+#define XENPMU_CTXT_PAD_SZ  128
+        uint8_t pad[XENPMU_CTXT_PAD_SZ];
+    } c;
+};
+typedef struct xen_arch_pmu xen_arch_pmu_t;
+
+#endif /* __XEN_PUBLIC_ARCH_X86_PMU_H__ */
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
+
diff --git a/xen/include/public/pmu.h b/xen/include/public/pmu.h
new file mode 100644
index 0000000..d237c25
--- /dev/null
+++ b/xen/include/public/pmu.h
@@ -0,0 +1,38 @@
+#ifndef __XEN_PUBLIC_PMU_H__
+#define __XEN_PUBLIC_PMU_H__
+
+#include "xen.h"
+#if defined(__i386__) || defined(__x86_64__)
+#include "arch-x86/pmu.h"
+#elif defined (__arm__) || defined (__aarch64__)
+#include "arch-arm.h"
+#else
+#error "Unsupported architecture"
+#endif
+
+#define XENPMU_VER_MAJ    0
+#define XENPMU_VER_MIN    1
+
+
+/* Shared between hypervisor and PV domain */
+struct xen_pmu_data {
+    uint32_t domain_id;
+    uint32_t vcpu_id;
+    uint32_t pcpu_id;
+    uint32_t pmu_flags;
+
+    xen_arch_pmu_t pmu;
+};
+typedef struct xen_pmu_data xen_pmu_data_t;
+
+#endif /* __XEN_PUBLIC_PMU_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v7 10/19] x86/VPMU: Make vpmu not HVM-specific
  2014-06-06 17:39 [PATCH v7 00/19] x86/PMU: Xen PMU PV(H) support Boris Ostrovsky
                   ` (8 preceding siblings ...)
  2014-06-06 17:40 ` [PATCH v7 09/19] x86/VPMU: Add public xenpmu.h Boris Ostrovsky
@ 2014-06-06 17:40 ` Boris Ostrovsky
  2014-06-11 10:07   ` Jan Beulich
  2014-06-06 17:40 ` [PATCH v7 11/19] x86/VPMU: Interface for setting PMU mode and flags Boris Ostrovsky
                   ` (8 subsequent siblings)
  18 siblings, 1 reply; 58+ messages in thread
From: Boris Ostrovsky @ 2014-06-06 17:40 UTC (permalink / raw)
  To: JBeulich, kevin.tian, dietmar.hahn, suravee.suthikulpanit
  Cc: tim, boris.ostrovsky, keir, jun.nakajima, xen-devel

The vpmu structure will be used for both HVM and PV guests. Move it from
struct hvm_vcpu to struct arch_vcpu.
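
After the move, code shared between PV and HVM paths can reach the VPMU
without going through hvm_vcpu, e.g. (a sketch):

    struct vpmu_struct *vpmu = vcpu_vpmu(v);    /* now &v->arch.vpmu */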

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
Tested-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
---
 xen/include/asm-x86/domain.h   | 2 ++
 xen/include/asm-x86/hvm/vcpu.h | 3 ---
 xen/include/asm-x86/hvm/vpmu.h | 5 ++---
 3 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index abf55fb..07b7ba8 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -403,6 +403,8 @@ struct arch_vcpu
     void (*ctxt_switch_from) (struct vcpu *);
     void (*ctxt_switch_to) (struct vcpu *);
 
+    struct vpmu_struct vpmu;
+
     /* Virtual Machine Extensions */
     union {
         struct pv_vcpu pv_vcpu;
diff --git a/xen/include/asm-x86/hvm/vcpu.h b/xen/include/asm-x86/hvm/vcpu.h
index db37232..be4bed0 100644
--- a/xen/include/asm-x86/hvm/vcpu.h
+++ b/xen/include/asm-x86/hvm/vcpu.h
@@ -150,9 +150,6 @@ struct hvm_vcpu {
     u32                 msr_tsc_aux;
     u64                 msr_tsc_adjust;
 
-    /* VPMU */
-    struct vpmu_struct  vpmu;
-
     union {
         struct arch_vmx_struct vmx;
         struct arch_svm_struct svm;
diff --git a/xen/include/asm-x86/hvm/vpmu.h b/xen/include/asm-x86/hvm/vpmu.h
index 83b5461..001e20b 100644
--- a/xen/include/asm-x86/hvm/vpmu.h
+++ b/xen/include/asm-x86/hvm/vpmu.h
@@ -31,9 +31,8 @@
 #define VPMU_BOOT_ENABLED 0x1    /* vpmu generally enabled. */
 #define VPMU_BOOT_BTS     0x2    /* Intel BTS feature wanted. */
 
-#define vcpu_vpmu(vcpu)   (&((vcpu)->arch.hvm_vcpu.vpmu))
-#define vpmu_vcpu(vpmu)   (container_of((vpmu), struct vcpu, \
-                                          arch.hvm_vcpu.vpmu))
+#define vcpu_vpmu(vcpu)   (&(vcpu)->arch.vpmu)
+#define vpmu_vcpu(vpmu)   container_of((vpmu), struct vcpu, arch.vpmu)
 
 #define MSR_TYPE_COUNTER            0
 #define MSR_TYPE_CTRL               1
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v7 11/19] x86/VPMU: Interface for setting PMU mode and flags
  2014-06-06 17:39 [PATCH v7 00/19] x86/PMU: Xen PMU PV(H) support Boris Ostrovsky
                   ` (9 preceding siblings ...)
  2014-06-06 17:40 ` [PATCH v7 10/19] x86/VPMU: Make vpmu not HVM-specific Boris Ostrovsky
@ 2014-06-06 17:40 ` Boris Ostrovsky
  2014-06-06 19:58   ` Andrew Cooper
  2014-06-11 10:14   ` Jan Beulich
  2014-06-06 17:40 ` [PATCH v7 12/19] x86/VPMU: Initialize PMU for PV guests Boris Ostrovsky
                   ` (7 subsequent siblings)
  18 siblings, 2 replies; 58+ messages in thread
From: Boris Ostrovsky @ 2014-06-06 17:40 UTC (permalink / raw)
  To: JBeulich, kevin.tian, dietmar.hahn, suravee.suthikulpanit
  Cc: tim, boris.ostrovsky, keir, jun.nakajima, xen-devel

Add a runtime interface for setting the PMU mode and flags. Three main
modes are provided:
* PMU off
* PMU on: Guests can access PMU MSRs and receive PMU interrupts. dom0
  profiles itself and the hypervisor.
* dom0-only PMU: dom0 collects samples for both itself and guests.

Among the feature flags, only Intel's BTS is currently supported.

Mode and flags are set via the HYPERVISOR_xenpmu_op hypercall.
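
A privileged guest would then switch modes along the lines of the sketch
below; the op and parameter structure come from the public header extended
by this patch, but take the exact field layout here as illustrative:

    /* Illustrative only: request self-profiling mode. */
    struct xen_pmu_params pmu_params;

    pmu_params.val = XENPMU_MODE_ON;
    if ( HYPERVISOR_xenpmu_op(XENPMU_mode_set, &pmu_params) != 0 )
        /* rejected, e.g. VPMU disabled on the Xen command line */;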

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
Tested-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
---
 xen/arch/x86/domain.c              |   4 +-
 xen/arch/x86/hvm/svm/vpmu.c        |   4 +-
 xen/arch/x86/hvm/vmx/vpmu_core2.c  |  10 +--
 xen/arch/x86/hvm/vpmu.c            | 121 ++++++++++++++++++++++++++++++++++---
 xen/arch/x86/x86_64/compat/entry.S |   4 ++
 xen/arch/x86/x86_64/entry.S        |   4 ++
 xen/include/asm-x86/hvm/vpmu.h     |  14 ++---
 xen/include/public/pmu.h           |  48 +++++++++++++++
 xen/include/public/xen.h           |   1 +
 xen/include/xen/hypercall.h        |   4 ++
 10 files changed, 188 insertions(+), 26 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index e896210..a3ac1e2 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1482,7 +1482,7 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
 
     if ( is_hvm_vcpu(prev) )
     {
-        if (prev != next)
+        if ( (prev != next) && (vpmu_mode & XENPMU_MODE_ON) )
             vpmu_save(prev);
 
         if ( !list_empty(&prev->arch.hvm_vcpu.tm_list) )
@@ -1526,7 +1526,7 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
                            !is_hardware_domain(next->domain));
     }
 
-    if (is_hvm_vcpu(next) && (prev != next) )
+    if ( is_hvm_vcpu(next) && (prev != next) && (vpmu_mode & XENPMU_MODE_ON) )
         /* Must be done with interrupts enabled */
         vpmu_load(next);
 
diff --git a/xen/arch/x86/hvm/svm/vpmu.c b/xen/arch/x86/hvm/svm/vpmu.c
index b5ca577..afeb32c 100644
--- a/xen/arch/x86/hvm/svm/vpmu.c
+++ b/xen/arch/x86/hvm/svm/vpmu.c
@@ -473,14 +473,14 @@ struct arch_vpmu_ops amd_vpmu_ops = {
     .arch_vpmu_dump = amd_vpmu_dump
 };
 
-int svm_vpmu_initialise(struct vcpu *v, unsigned int vpmu_flags)
+int svm_vpmu_initialise(struct vcpu *v)
 {
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
     uint8_t family = current_cpu_data.x86;
     int ret = 0;
 
     /* vpmu enabled? */
-    if ( !vpmu_flags )
+    if ( vpmu_mode == XENPMU_MODE_OFF )
         return 0;
 
     switch ( family )
diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c
index b5bc8ea..336a425 100644
--- a/xen/arch/x86/hvm/vmx/vpmu_core2.c
+++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c
@@ -705,13 +705,13 @@ static int core2_vpmu_do_interrupt(struct cpu_user_regs *regs)
     return 1;
 }
 
-static int core2_vpmu_initialise(struct vcpu *v, unsigned int vpmu_flags)
+static int core2_vpmu_initialise(struct vcpu *v)
 {
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
     u64 msr_content;
     struct cpuinfo_x86 *c = &current_cpu_data;
 
-    if ( !(vpmu_flags & VPMU_BOOT_BTS) )
+    if ( !(vpmu_features & XENPMU_FEATURE_INTEL_BTS) )
         goto func_out;
     /* Check the 'Debug Store' feature in the CPUID.EAX[1]:EDX[21] */
     if ( cpu_has(c, X86_FEATURE_DS) )
@@ -823,7 +823,7 @@ struct arch_vpmu_ops core2_no_vpmu_ops = {
     .do_cpuid = core2_no_vpmu_do_cpuid,
 };
 
-int vmx_vpmu_initialise(struct vcpu *v, unsigned int vpmu_flags)
+int vmx_vpmu_initialise(struct vcpu *v)
 {
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
     uint8_t family = current_cpu_data.x86;
@@ -831,7 +831,7 @@ int vmx_vpmu_initialise(struct vcpu *v, unsigned int vpmu_flags)
     int ret = 0;
 
     vpmu->arch_vpmu_ops = &core2_no_vpmu_ops;
-    if ( !vpmu_flags )
+    if ( vpmu_mode == XENPMU_MODE_OFF )
         return 0;
 
     if ( family == 6 )
@@ -874,7 +874,7 @@ int vmx_vpmu_initialise(struct vcpu *v, unsigned int vpmu_flags)
         /* future: */
         case 0x3d:
         case 0x4e:
-            ret = core2_vpmu_initialise(v, vpmu_flags);
+            ret = core2_vpmu_initialise(v);
             if ( !ret )
                 vpmu->arch_vpmu_ops = &core2_vpmu_ops;
             return ret;
diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c
index 56b48be..4336e0a 100644
--- a/xen/arch/x86/hvm/vpmu.c
+++ b/xen/arch/x86/hvm/vpmu.c
@@ -21,6 +21,7 @@
 #include <xen/config.h>
 #include <xen/sched.h>
 #include <xen/xenoprof.h>
+#include <xen/guest_access.h>
 #include <asm/regs.h>
 #include <asm/types.h>
 #include <asm/msr.h>
@@ -38,7 +39,8 @@
  * "vpmu=off" : vpmu generally disabled
  * "vpmu=bts" : vpmu enabled and Intel BTS feature switched on.
  */
-static unsigned int __read_mostly opt_vpmu_enabled;
+uint64_t __read_mostly vpmu_mode = XENPMU_MODE_OFF;
+uint64_t __read_mostly vpmu_features = 0;
 static void parse_vpmu_param(char *s);
 custom_param("vpmu", parse_vpmu_param);
 
@@ -52,7 +54,7 @@ static void __init parse_vpmu_param(char *s)
         break;
     default:
         if ( !strcmp(s, "bts") )
-            opt_vpmu_enabled |= VPMU_BOOT_BTS;
+            vpmu_features |= XENPMU_FEATURE_INTEL_BTS;
         else if ( *s )
         {
             printk("VPMU: unknown flag: %s - vpmu disabled!\n", s);
@@ -60,7 +62,7 @@ static void __init parse_vpmu_param(char *s)
         }
         /* fall through */
     case 1:
-        opt_vpmu_enabled |= VPMU_BOOT_ENABLED;
+        vpmu_mode = XENPMU_MODE_ON;
         break;
     }
 }
@@ -77,6 +79,9 @@ int vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
 {
     struct vpmu_struct *vpmu = vcpu_vpmu(current);
 
+    if ( !(vpmu_mode & XENPMU_MODE_ON) )
+        return 0;
+
     if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->do_wrmsr )
         return vpmu->arch_vpmu_ops->do_wrmsr(msr, msr_content);
     return 0;
@@ -86,6 +91,9 @@ int vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
 {
     struct vpmu_struct *vpmu = vcpu_vpmu(current);
 
+    if ( !(vpmu_mode & XENPMU_MODE_ON) )
+        return 0;
+
     if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->do_rdmsr )
         return vpmu->arch_vpmu_ops->do_rdmsr(msr, msr_content);
     return 0;
@@ -241,19 +249,19 @@ void vpmu_initialise(struct vcpu *v)
     switch ( vendor )
     {
     case X86_VENDOR_AMD:
-        if ( svm_vpmu_initialise(v, opt_vpmu_enabled) != 0 )
-            opt_vpmu_enabled = 0;
+        if ( svm_vpmu_initialise(v) != 0 )
+            vpmu_mode = XENPMU_MODE_OFF;
         break;
 
     case X86_VENDOR_INTEL:
-        if ( vmx_vpmu_initialise(v, opt_vpmu_enabled) != 0 )
-            opt_vpmu_enabled = 0;
+        if ( vmx_vpmu_initialise(v) != 0 )
+            vpmu_mode = XENPMU_MODE_OFF;
         break;
 
     default:
         printk("VPMU: Initialization failed. "
                "Unknown CPU vendor %d\n", vendor);
-        opt_vpmu_enabled = 0;
+        vpmu_mode = XENPMU_MODE_OFF;
         break;
     }
 }
@@ -275,3 +283,100 @@ void vpmu_dump(struct vcpu *v)
         vpmu->arch_vpmu_ops->arch_vpmu_dump(v);
 }
 
+/* Unload VPMU contexts */
+static void vpmu_unload_all(void)
+{
+    struct domain *d;
+    struct vcpu *v;
+    struct vpmu_struct *vpmu;
+
+    for_each_domain( d )
+    {
+        for_each_vcpu ( d, v )
+        {
+            if ( v != current )
+                vcpu_pause(v);
+            vpmu = vcpu_vpmu(v);
+
+            if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) )
+            {
+                if ( v != current )
+                    vcpu_unpause(v);
+                continue;
+            }
+
+            vpmu_set(vpmu, VPMU_CONTEXT_SAVE);
+            on_selected_cpus(cpumask_of(vpmu->last_pcpu),
+                             vpmu_save_force, (void *)v, 1);
+            vpmu_reset(vpmu, VPMU_CONTEXT_LOADED);
+
+            if ( v != current )
+                vcpu_unpause(v);
+        }
+    }
+}
+
+
+long do_xenpmu_op(int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg)
+{
+    int ret = -EINVAL;
+    xen_pmu_params_t pmu_params;
+
+    switch ( op )
+    {
+    case XENPMU_mode_set:
+        if ( !is_control_domain(current->domain) )
+            return -EPERM;
+
+        if ( copy_from_guest(&pmu_params, arg, 1) )
+            return -EFAULT;
+
+        if ( pmu_params.d.val & ~XENPMU_MODE_ON )
+            return -EINVAL;
+
+        vpmu_mode = pmu_params.d.val;
+        if ( vpmu_mode == XENPMU_MODE_OFF )
+            /*
+             * After this, VPMU context will never be loaded during context
+             * switch. We also prevent PMU MSR accesses (which can load
+             * context) when VPMU is disabled.
+             */
+            vpmu_unload_all();
+
+        ret = 0;
+        break;
+
+    case XENPMU_mode_get:
+        pmu_params.d.val = vpmu_mode;
+        pmu_params.v.version.maj = XENPMU_VER_MAJ;
+        pmu_params.v.version.min = XENPMU_VER_MIN;
+        if ( copy_to_guest(arg, &pmu_params, 1) )
+            return -EFAULT;
+        ret = 0;
+        break;
+
+    case XENPMU_feature_set:
+        if ( !is_control_domain(current->domain) )
+            return -EPERM;
+
+        if ( copy_from_guest(&pmu_params, arg, 1) )
+            return -EFAULT;
+
+        if ( pmu_params.d.val & ~XENPMU_FEATURE_INTEL_BTS )
+            return -EINVAL;
+
+        vpmu_features = pmu_params.d.val;
+
+        ret = 0;
+        break;
+
+    case XENPMU_feature_get:
+        pmu_params.d.val = vpmu_features;
+        if ( copy_to_guest(arg, &pmu_params, 1) )
+            return -EFAULT;
+        ret = 0;
+        break;
+     }
+
+    return ret;
+}
diff --git a/xen/arch/x86/x86_64/compat/entry.S b/xen/arch/x86/x86_64/compat/entry.S
index ac594c9..8587c46 100644
--- a/xen/arch/x86/x86_64/compat/entry.S
+++ b/xen/arch/x86/x86_64/compat/entry.S
@@ -417,6 +417,8 @@ ENTRY(compat_hypercall_table)
         .quad do_domctl
         .quad compat_kexec_op
         .quad do_tmem_op
+        .quad do_ni_hypercall           /* reserved for XenClient */
+        .quad do_xenpmu_op              /* 40 */
         .rept __HYPERVISOR_arch_0-((.-compat_hypercall_table)/8)
         .quad compat_ni_hypercall
         .endr
@@ -465,6 +467,8 @@ ENTRY(compat_hypercall_args_table)
         .byte 1 /* do_domctl                */
         .byte 2 /* compat_kexec_op          */
         .byte 1 /* do_tmem_op               */
+        .byte 0 /* reserved for XenClient   */
+        .byte 2 /* do_xenpmu_op             */  /* 40 */
         .rept __HYPERVISOR_arch_0-(.-compat_hypercall_args_table)
         .byte 0 /* compat_ni_hypercall      */
         .endr
diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
index 4796e65..cd4fe47 100644
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -769,6 +769,8 @@ ENTRY(hypercall_table)
         .quad do_domctl
         .quad do_kexec_op
         .quad do_tmem_op
+        .quad do_ni_hypercall       /* reserved for XenClient */
+        .quad do_xenpmu_op          /* 40 */
         .rept __HYPERVISOR_arch_0-((.-hypercall_table)/8)
         .quad do_ni_hypercall
         .endr
@@ -817,6 +819,8 @@ ENTRY(hypercall_args_table)
         .byte 1 /* do_domctl            */
         .byte 2 /* do_kexec             */
         .byte 1 /* do_tmem_op           */
+        .byte 0 /* reserved for XenClient */
+        .byte 2 /* do_xenpmu_op         */  /* 40 */
         .rept __HYPERVISOR_arch_0-(.-hypercall_args_table)
         .byte 0 /* do_ni_hypercall      */
         .endr
diff --git a/xen/include/asm-x86/hvm/vpmu.h b/xen/include/asm-x86/hvm/vpmu.h
index 001e20b..12a45c4 100644
--- a/xen/include/asm-x86/hvm/vpmu.h
+++ b/xen/include/asm-x86/hvm/vpmu.h
@@ -24,13 +24,6 @@
 
 #include <public/pmu.h>
 
-/*
- * Flag bits given as a string on the hypervisor boot parameter 'vpmu'.
- * See arch/x86/hvm/vpmu.c.
- */
-#define VPMU_BOOT_ENABLED 0x1    /* vpmu generally enabled. */
-#define VPMU_BOOT_BTS     0x2    /* Intel BTS feature wanted. */
-
 #define vcpu_vpmu(vcpu)   (&(vcpu)->arch.vpmu)
 #define vpmu_vcpu(vpmu)   container_of((vpmu), struct vcpu, arch.vpmu)
 
@@ -58,8 +51,8 @@ struct arch_vpmu_ops {
     void (*arch_vpmu_dump)(const struct vcpu *);
 };
 
-int vmx_vpmu_initialise(struct vcpu *, unsigned int flags);
-int svm_vpmu_initialise(struct vcpu *, unsigned int flags);
+int vmx_vpmu_initialise(struct vcpu *);
+int svm_vpmu_initialise(struct vcpu *);
 
 struct vpmu_struct {
     u32 flags;
@@ -99,5 +92,8 @@ void vpmu_dump(struct vcpu *v);
 extern int acquire_pmu_ownership(int pmu_ownership);
 extern void release_pmu_ownership(int pmu_ownership);
 
+extern uint64_t vpmu_mode;
+extern uint64_t vpmu_features;
+
 #endif /* __ASM_X86_HVM_VPMU_H_*/
 
diff --git a/xen/include/public/pmu.h b/xen/include/public/pmu.h
index d237c25..d41049f 100644
--- a/xen/include/public/pmu.h
+++ b/xen/include/public/pmu.h
@@ -13,6 +13,54 @@
 #define XENPMU_VER_MAJ    0
 #define XENPMU_VER_MIN    1
 
+/*
+ * ` enum neg_errnoval
+ * ` HYPERVISOR_xenpmu_op(enum xenpmu_op cmd, struct xenpmu_params *args);
+ *
+ * @cmd  == XENPMU_* (PMU operation)
+ * @args == struct xenpmu_params
+ */
+/* ` enum xenpmu_op { */
+#define XENPMU_mode_get        0 /* Also used for getting PMU version */
+#define XENPMU_mode_set        1
+#define XENPMU_feature_get     2
+#define XENPMU_feature_set     3
+/* ` } */
+
+/* Parameters structure for HYPERVISOR_xenpmu_op call */
+struct xen_pmu_params {
+    /* IN/OUT parameters */
+    union {
+        struct version {
+            uint8_t maj;
+            uint8_t min;
+        } version;
+        uint64_t pad;
+    } v;
+    union {
+        uint64_t val;
+        XEN_GUEST_HANDLE(void) valp;
+    } d;
+
+    /* IN parameters */
+    uint64_t vcpu;
+};
+typedef struct xen_pmu_params xen_pmu_params_t;
+DEFINE_XEN_GUEST_HANDLE(xen_pmu_params_t);
+
+/* PMU modes:
+ * - XENPMU_MODE_OFF:   No PMU virtualization
+ * - XENPMU_MODE_ON:    Guests can profile themselves, dom0 profiles
+ *                      itself and Xen
+ */
+#define XENPMU_MODE_OFF           0
+#define XENPMU_MODE_ON            (1<<0)
+
+/*
+ * PMU features:
+ * - XENPMU_FEATURE_INTEL_BTS: Intel BTS support (ignored on AMD)
+ */
+#define XENPMU_FEATURE_INTEL_BTS  1
 
 /* Shared between hypervisor and PV domain */
 struct xen_pmu_data {
diff --git a/xen/include/public/xen.h b/xen/include/public/xen.h
index a6a2092..0766790 100644
--- a/xen/include/public/xen.h
+++ b/xen/include/public/xen.h
@@ -101,6 +101,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_ulong_t);
 #define __HYPERVISOR_kexec_op             37
 #define __HYPERVISOR_tmem_op              38
 #define __HYPERVISOR_xc_reserved_op       39 /* reserved for XenClient */
+#define __HYPERVISOR_xenpmu_op            40
 
 /* Architecture-specific hypercall definitions. */
 #define __HYPERVISOR_arch_0               48
diff --git a/xen/include/xen/hypercall.h b/xen/include/xen/hypercall.h
index a9e5229..cf34547 100644
--- a/xen/include/xen/hypercall.h
+++ b/xen/include/xen/hypercall.h
@@ -14,6 +14,7 @@
 #include <public/event_channel.h>
 #include <public/tmem.h>
 #include <public/version.h>
+#include <public/pmu.h>
 #include <asm/hypercall.h>
 #include <xsm/xsm.h>
 
@@ -139,6 +140,9 @@ do_tmem_op(
 extern long
 do_xenoprof_op(int op, XEN_GUEST_HANDLE_PARAM(void) arg);
 
+extern long
+do_xenpmu_op(int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg);
+
 #ifdef CONFIG_COMPAT
 
 extern int
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v7 12/19] x86/VPMU: Initialize PMU for PV guests
  2014-06-06 17:39 [PATCH v7 00/19] x86/PMU: Xen PMU PV(H) support Boris Ostrovsky
                   ` (10 preceding siblings ...)
  2014-06-06 17:40 ` [PATCH v7 11/19] x86/VPMU: Interface for setting PMU mode and flags Boris Ostrovsky
@ 2014-06-06 17:40 ` Boris Ostrovsky
  2014-06-06 20:13   ` Andrew Cooper
  2014-06-11 10:20   ` Jan Beulich
  2014-06-06 17:40 ` [PATCH v7 13/19] x86/VPMU: Add support for PMU register handling on " Boris Ostrovsky
                   ` (6 subsequent siblings)
  18 siblings, 2 replies; 58+ messages in thread
From: Boris Ostrovsky @ 2014-06-06 17:40 UTC (permalink / raw)
  To: JBeulich, kevin.tian, dietmar.hahn, suravee.suthikulpanit
  Cc: tim, boris.ostrovsky, keir, jun.nakajima, xen-devel

Code for initializing/tearing down PMU for PV guests
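
As a guest-side sketch (the page allocator and virt_to_gfn() helper are
assumptions; only XENPMU_init/XENPMU_finish and xen_pmu_params come from
this patch), each VCPU registers a shared PMU page by passing its GFN in
d.val:

    /* Hypothetical per-VCPU PMU setup in a PV guest kernel. */
    struct xen_pmu_params p = { .vcpu = cpu };
    struct xen_pmu_data *pmu_data = alloc_shared_page();

    p.d.val = virt_to_gfn(pmu_data);
    if ( HYPERVISOR_xenpmu_op(XENPMU_init, &p) )
        free_shared_page(pmu_data);  /* Xen refused to map the page */

    /* ... and on teardown Xen unmaps the page and destroys the VPMU: */
    HYPERVISOR_xenpmu_op(XENPMU_finish, &p);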

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
Tested-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
---
 xen/arch/x86/hvm/hvm.c            |  3 +-
 xen/arch/x86/hvm/svm/svm.c        |  8 +++-
 xen/arch/x86/hvm/svm/vpmu.c       | 37 +++++++++++-------
 xen/arch/x86/hvm/vmx/vmx.c        |  9 ++++-
 xen/arch/x86/hvm/vmx/vpmu_core2.c | 67 ++++++++++++++++++++++----------
 xen/arch/x86/hvm/vpmu.c           | 82 ++++++++++++++++++++++++++++++++++++++-
 xen/common/event_channel.c        |  1 +
 xen/include/asm-x86/hvm/vpmu.h    |  1 +
 xen/include/public/pmu.h          |  2 +
 xen/include/public/xen.h          |  1 +
 10 files changed, 169 insertions(+), 42 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index d2190be..fab54ff 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -4031,7 +4031,8 @@ static hvm_hypercall_t *const pvh_hypercall64_table[NR_hypercalls] = {
     [ __HYPERVISOR_physdev_op ]      = (hvm_hypercall_t *)hvm_physdev_op,
     HYPERCALL(hvm_op),
     HYPERCALL(sysctl),
-    HYPERCALL(domctl)
+    HYPERCALL(domctl),
+    HYPERCALL(xenpmu_op)
 };
 
 int hvm_do_hypercall(struct cpu_user_regs *regs)
diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index 870e4ee..7358626 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -1150,7 +1150,9 @@ static int svm_vcpu_initialise(struct vcpu *v)
         return rc;
     }
 
-    vpmu_initialise(v);
+    /* PVH's VPMU is initialized via hypercall */
+    if ( is_hvm_domain(v->domain) )
+        vpmu_initialise(v);
 
     svm_guest_osvw_init(v);
 
@@ -1159,7 +1161,9 @@ static int svm_vcpu_initialise(struct vcpu *v)
 
 static void svm_vcpu_destroy(struct vcpu *v)
 {
-    vpmu_destroy(v);
+    /* PVH's VPMU is destroyed via hypercall */
+    if ( is_hvm_domain(v->domain) )
+        vpmu_destroy(v);
     svm_destroy_vmcb(v);
     passive_domain_destroy(v);
 }
diff --git a/xen/arch/x86/hvm/svm/vpmu.c b/xen/arch/x86/hvm/svm/vpmu.c
index afeb32c..96833ea 100644
--- a/xen/arch/x86/hvm/svm/vpmu.c
+++ b/xen/arch/x86/hvm/svm/vpmu.c
@@ -382,16 +382,21 @@ static int amd_vpmu_initialise(struct vcpu *v)
 	 }
     }
 
-    ctxt = xzalloc_bytes(sizeof(struct xen_pmu_amd_ctxt) + 
-			 sizeof(uint64_t) * AMD_MAX_COUNTERS + 
-			 sizeof(uint64_t) * AMD_MAX_COUNTERS);
-    if ( !ctxt )
+    if ( is_hvm_domain(v->domain) )
     {
-        gdprintk(XENLOG_WARNING, "Insufficient memory for PMU, "
-            " PMU feature is unavailable on domain %d vcpu %d.\n",
-            v->vcpu_id, v->domain->domain_id);
-        return -ENOMEM;
+        ctxt = xzalloc_bytes(sizeof(struct xen_pmu_amd_ctxt) +
+                             sizeof(uint64_t) * AMD_MAX_COUNTERS +
+                             sizeof(uint64_t) * AMD_MAX_COUNTERS);
+        if ( !ctxt )
+        {
+            gdprintk(XENLOG_WARNING, "Insufficient memory for PMU, "
+                     " PMU feature is unavailable on domain %d vcpu %d.\n",
+                     v->domain->domain_id, v->vcpu_id);
+            return -ENOMEM;
+        }
     }
+    else
+        ctxt = &v->arch.vpmu.xenpmu_data->pmu.c.amd;
 
     ctxt->counters = sizeof(struct xen_pmu_amd_ctxt);
     ctxt->ctrls = ctxt->counters + sizeof(uint64_t) * AMD_MAX_COUNTERS;
@@ -409,17 +414,19 @@ static void amd_vpmu_destroy(struct vcpu *v)
     if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
         return;
 
-    if ( has_hvm_container_domain(v->domain) && is_msr_bitmap_on(vpmu) )
-        amd_vpmu_unset_msr_bitmap(v);
+    if ( has_hvm_container_domain(v->domain) )
+    {
+        if ( is_msr_bitmap_on(vpmu) )
+            amd_vpmu_unset_msr_bitmap(v);
 
-    xfree(vpmu->context);
-    vpmu_reset(vpmu, VPMU_CONTEXT_ALLOCATED);
+        if ( is_hvm_domain(v->domain) )
+            xfree(vpmu->context);
 
-    if ( vpmu_is_set(vpmu, VPMU_RUNNING) )
-    {
-        vpmu_reset(vpmu, VPMU_RUNNING);
         release_pmu_ownship(PMU_OWNER_HVM);
     }
+
+    vpmu->context = NULL;
+    vpmu_clear(vpmu);
 }
 
 /* VPMU part of the 'q' keyhandler */
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 6f55c9e..5313112 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -116,7 +116,9 @@ static int vmx_vcpu_initialise(struct vcpu *v)
         return rc;
     }
 
-    vpmu_initialise(v);
+    /* PVH's VPMU is initialized via hypercall */
+    if ( is_hvm_domain(v->domain) )
+        vpmu_initialise(v);
 
     vmx_install_vlapic_mapping(v);
 
@@ -130,7 +132,10 @@ static int vmx_vcpu_initialise(struct vcpu *v)
 static void vmx_vcpu_destroy(struct vcpu *v)
 {
     vmx_destroy_vmcs(v);
-    vpmu_destroy(v);
+
+    /* PVH's VPMU is destroyed via hypercall */
+    if ( is_hvm_domain(v->domain) )
+        vpmu_destroy(v);
     passive_domain_destroy(v);
 }
 
diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c
index 336a425..f2f4851 100644
--- a/xen/arch/x86/hvm/vmx/vpmu_core2.c
+++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c
@@ -360,24 +360,36 @@ static int core2_vpmu_alloc_resource(struct vcpu *v)
     struct xen_pmu_intel_ctxt *core2_vpmu_cxt = NULL;
     uint64_t *p = NULL;
 
-    if ( !acquire_pmu_ownership(PMU_OWNER_HVM) )
-        return 0;
-
-    wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0);
-    if ( vmx_add_msr(MSR_CORE_PERF_GLOBAL_CTRL, VMX_HOST_MSR) )
+    p = xzalloc_bytes(sizeof(uint64_t));
+    if ( !p )
         goto out_err;
 
-    if ( vmx_add_msr(MSR_CORE_PERF_GLOBAL_CTRL, VMX_GUEST_MSR) )
-        goto out_err;
-    vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, 0);
+    if ( has_hvm_container_domain(v->domain) )
+    {
+        if ( !acquire_pmu_ownership(PMU_OWNER_HVM) )
+        {
+            xfree(p);
+            goto out_err;
+        }
+
+        wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0);
+        if ( vmx_add_msr(MSR_CORE_PERF_GLOBAL_CTRL, VMX_HOST_MSR) )
+            goto out_err_hvm;
+        if ( vmx_add_msr(MSR_CORE_PERF_GLOBAL_CTRL, VMX_GUEST_MSR) )
+            goto out_err_hvm;
+        vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, 0);
+    }
 
-    core2_vpmu_cxt = xzalloc_bytes(sizeof(struct xen_pmu_intel_ctxt) +
-                                   sizeof(uint64_t) * fixed_pmc_cnt +
-                                   sizeof(struct xen_pmu_cntr_pair) *
-                                   arch_pmc_cnt);
-    p = xzalloc_bytes(sizeof(uint64_t));
-    if ( !core2_vpmu_cxt || !p )
-        goto out_err;
+    if ( is_hvm_domain(v->domain) )
+    {
+        core2_vpmu_cxt = xzalloc_bytes(sizeof(struct xen_pmu_intel_ctxt) +
+                                       sizeof(uint64_t) * fixed_pmc_cnt +
+                                       sizeof(struct xen_pmu_cntr_pair) *
+                                       arch_pmc_cnt);
+        if ( !core2_vpmu_cxt )
+            goto out_err_hvm;
+    }
+    else
+    {
+        core2_vpmu_cxt = &v->arch.vpmu.xenpmu_data->pmu.c.intel;
+    }
 
     core2_vpmu_cxt->fixed_counters = sizeof(struct xen_pmu_intel_ctxt);
     core2_vpmu_cxt->arch_counters = core2_vpmu_cxt->fixed_counters +
@@ -390,7 +402,7 @@ static int core2_vpmu_alloc_resource(struct vcpu *v)
 
     return 1;
 
-out_err:
+out_err_hvm:
     vmx_rm_msr(MSR_CORE_PERF_GLOBAL_CTRL, VMX_HOST_MSR);
     vmx_rm_msr(MSR_CORE_PERF_GLOBAL_CTRL, VMX_GUEST_MSR);
     release_pmu_ownship(PMU_OWNER_HVM);
@@ -398,6 +410,7 @@ out_err:
     xfree(core2_vpmu_cxt);
     xfree(p);
 
+out_err:
     printk("Failed to allocate VPMU resources for domain %u vcpu %u\n",
            v->vcpu_id, v->domain->domain_id);
 
@@ -755,6 +768,10 @@ func_out:
     fixed_pmc_cnt = core2_get_fixed_pmc_count();
     check_pmc_quirk();
 
+    /* PV domains can allocate resources immediately */
+    if ( is_pv_domain(v->domain) && !core2_vpmu_alloc_resource(v) )
+        return 1;
+
     return 0;
 }
 
@@ -765,12 +782,20 @@ static void core2_vpmu_destroy(struct vcpu *v)
     if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
         return;
 
-    xfree(vpmu->context);
+    if ( has_hvm_container_domain(v->domain) )
+    {
+        if ( cpu_has_vmx_msr_bitmap )
+            core2_vpmu_unset_msr_bitmap(v->arch.hvm_vmx.msr_bitmap);
+
+        if ( is_hvm_domain(v->domain) )
+            xfree(vpmu->context);
+
+        release_pmu_ownship(PMU_OWNER_HVM);
+    }
+
     xfree(vpmu->priv_context);
-    if ( has_hvm_container_domain(v->domain) && cpu_has_vmx_msr_bitmap )
-        core2_vpmu_unset_msr_bitmap(v->arch.hvm_vmx.msr_bitmap);
-    release_pmu_ownship(PMU_OWNER_HVM);
-    vpmu_reset(vpmu, VPMU_CONTEXT_ALLOCATED);
+    vpmu->context = NULL;
+    vpmu_clear(vpmu);
 }
 
 struct arch_vpmu_ops core2_vpmu_ops = {
diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c
index 4336e0a..bceffa6 100644
--- a/xen/arch/x86/hvm/vpmu.c
+++ b/xen/arch/x86/hvm/vpmu.c
@@ -21,10 +21,14 @@
 #include <xen/config.h>
 #include <xen/sched.h>
 #include <xen/xenoprof.h>
+#include <xen/event.h>
+#include <xen/softirq.h>
+#include <xen/hypercall.h>
 #include <xen/guest_access.h>
 #include <asm/regs.h>
 #include <asm/types.h>
 #include <asm/msr.h>
+#include <asm/p2m.h>
 #include <asm/hvm/support.h>
 #include <asm/hvm/vmx/vmx.h>
 #include <asm/hvm/vmx/vmcs.h>
@@ -271,7 +275,13 @@ void vpmu_destroy(struct vcpu *v)
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
 
     if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->arch_vpmu_destroy )
+    {
+        /* Unload VPMU first. This will stop counters */
+        on_selected_cpus(cpumask_of(vcpu_vpmu(v)->last_pcpu),
+                         vpmu_save_force, v, 1);
+
         vpmu->arch_vpmu_ops->arch_vpmu_destroy(v);
+    }
 }
 
 /* Dump some vpmu informations on console. Used in keyhandler dump_domains(). */
@@ -316,6 +326,64 @@ static void vpmu_unload_all(void)
     }
 }
 
+static int pvpmu_init(struct domain *d, xen_pmu_params_t *params)
+{
+    struct vcpu *v;
+    struct page_info *page;
+    uint64_t gfn;
+
+    if ( !params || params->vcpu >= d->max_vcpus )
+        return -EINVAL;
+
+    gfn = params->d.val;
+    page = get_page_from_gfn(d, gfn, NULL, P2M_ALLOC);
+    if ( !page )
+        return -EINVAL;
+
+    if ( !get_page_type(page, PGT_writable_page) )
+    {
+        put_page(page);
+        return -EINVAL;
+    }
+
+    v = d->vcpu[params->vcpu];
+    v->arch.vpmu.xenpmu_data = __map_domain_page_global(page);
+    if ( !v->arch.vpmu.xenpmu_data )
+    {
+        put_page_and_type(page);
+        return -EINVAL;
+    }
+
+    vpmu_initialise(v);
+
+    return 0;
+}
+
+static void pvpmu_finish(struct domain *d, xen_pmu_params_t *params)
+{
+    struct vcpu *v;
+    uint64_t mfn;
+
+    if ( !params || params->vcpu >= d->max_vcpus )
+        return;
+
+    v = d->vcpu[params->vcpu];
+    if ( v != current )
+        vcpu_pause(v);
+
+    if ( v->arch.vpmu.xenpmu_data )
+    {
+        mfn = domain_page_map_to_mfn(v->arch.vpmu.xenpmu_data);
+        if ( mfn_valid(mfn) )
+        {
+            unmap_domain_page_global(v->arch.vpmu.xenpmu_data);
+            put_page_and_type(mfn_to_page(mfn));
+        }
+    }
+    vpmu_destroy(v);
+
+    if ( v != current )
+        vcpu_unpause(v);
+}
 
 long do_xenpmu_op(int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg)
 {
@@ -376,7 +444,19 @@ long do_xenpmu_op(int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg)
             return -EFAULT;
         ret = 0;
         break;
-     }
+
+    case XENPMU_init:
+        if ( copy_from_guest(&pmu_params, arg, 1) )
+            return -EFAULT;
+        ret = pvpmu_init(current->domain, &pmu_params);
+        break;
+
+    case XENPMU_finish:
+        if ( copy_from_guest(&pmu_params, arg, 1) )
+            return -EFAULT;
+        pvpmu_finish(current->domain, &pmu_params);
+        break;
+    }
 
     return ret;
 }
diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c
index 6853842..7f9b8c9 100644
--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -108,6 +108,7 @@ static int virq_is_global(uint32_t virq)
     case VIRQ_TIMER:
     case VIRQ_DEBUG:
     case VIRQ_XENOPROF:
+    case VIRQ_XENPMU:
         rc = 0;
         break;
     case VIRQ_ARCH_0 ... VIRQ_ARCH_7:
diff --git a/xen/include/asm-x86/hvm/vpmu.h b/xen/include/asm-x86/hvm/vpmu.h
index 12a45c4..24fb39b 100644
--- a/xen/include/asm-x86/hvm/vpmu.h
+++ b/xen/include/asm-x86/hvm/vpmu.h
@@ -61,6 +61,7 @@ struct vpmu_struct {
     void *context;      /* May be shared with PV guest */
     void *priv_context; /* hypervisor-only */
     struct arch_vpmu_ops *arch_vpmu_ops;
+    xen_pmu_data_t *xenpmu_data;
 };
 
 /* VPMU states */
diff --git a/xen/include/public/pmu.h b/xen/include/public/pmu.h
index d41049f..6da3596 100644
--- a/xen/include/public/pmu.h
+++ b/xen/include/public/pmu.h
@@ -25,6 +25,8 @@
 #define XENPMU_mode_set        1
 #define XENPMU_feature_get     2
 #define XENPMU_feature_set     3
+#define XENPMU_init            4
+#define XENPMU_finish          5
 /* ` } */
 
 /* Parameters structure for HYPERVISOR_xenpmu_op call */
diff --git a/xen/include/public/xen.h b/xen/include/public/xen.h
index 0766790..e4d0b79 100644
--- a/xen/include/public/xen.h
+++ b/xen/include/public/xen.h
@@ -161,6 +161,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_ulong_t);
 #define VIRQ_MEM_EVENT  10 /* G. (DOM0) A memory event has occured           */
 #define VIRQ_XC_RESERVED 11 /* G. Reserved for XenClient                     */
 #define VIRQ_ENOMEM     12 /* G. (DOM0) Low on heap memory       */
+#define VIRQ_XENPMU     13 /* V.  PMC interrupt                              */
 
 /* Architecture-specific VIRQ definitions. */
 #define VIRQ_ARCH_0    16
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v7 13/19] x86/VPMU: Add support for PMU register handling on PV guests
  2014-06-06 17:39 [PATCH v7 00/19] x86/PMU: Xen PMU PV(H) support Boris Ostrovsky
                   ` (11 preceding siblings ...)
  2014-06-06 17:40 ` [PATCH v7 12/19] x86/VPMU: Initialize PMU for PV guests Boris Ostrovsky
@ 2014-06-06 17:40 ` Boris Ostrovsky
  2014-06-06 20:26   ` Andrew Cooper
  2014-06-06 17:40 ` [PATCH v7 14/19] x86/VPMU: Handle PMU interrupts for " Boris Ostrovsky
                   ` (5 subsequent siblings)
  18 siblings, 1 reply; 58+ messages in thread
From: Boris Ostrovsky @ 2014-06-06 17:40 UTC (permalink / raw)
  To: JBeulich, kevin.tian, dietmar.hahn, suravee.suthikulpanit
  Cc: tim, boris.ostrovsky, keir, jun.nakajima, xen-devel

Intercept accesses to PMU MSRs and process them in the VPMU module.

Dump VPMU state for all domains (HVM and PV) when requested.
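
Concretely, after this patch a PV guest can program counters with plain
MSR accesses, e.g. (a sketch only; the event-select encoding is omitted
and the MSR names are those used in the traps.c hunks below):

    /* PV guest: counter MSR accesses #GP into Xen and are forwarded
     * to the VPMU instead of being ignored. */
    uint64_t evtsel = 0;     /* event-select encoding omitted here */
    uint64_t count;

    wrmsrl(MSR_P6_EVNTSEL0, evtsel);               /* vpmu_do_wrmsr() */
    wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 1ULL << 0);  /* enable counter 0 */
    /* ... run the code being profiled ... */
    rdmsrl(MSR_P6_PERFCTR0, count);                /* vpmu_do_rdmsr() */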

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
Tested-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
---
 xen/arch/x86/domain.c             |  3 +-
 xen/arch/x86/hvm/vmx/vpmu_core2.c | 69 ++++++++++++++++++++++++++++++---------
 xen/arch/x86/hvm/vpmu.c           |  7 ++++
 xen/arch/x86/traps.c              | 48 +++++++++++++++++++++++++--
 xen/include/public/pmu.h          |  1 +
 5 files changed, 108 insertions(+), 20 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index a3ac1e2..bb759dd 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -2012,8 +2012,7 @@ void arch_dump_vcpu_info(struct vcpu *v)
 {
     paging_dump_vcpu_info(v);
 
-    if ( is_hvm_vcpu(v) )
-        vpmu_dump(v);
+    vpmu_dump(v);
 }
 
 void domain_cpuid(
diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c
index f2f4851..3b56d3d 100644
--- a/xen/arch/x86/hvm/vmx/vpmu_core2.c
+++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c
@@ -27,6 +27,7 @@
 #include <asm/regs.h>
 #include <asm/types.h>
 #include <asm/apic.h>
+#include <asm/traps.h>
 #include <asm/msr.h>
 #include <asm/msr-index.h>
 #include <asm/hvm/support.h>
@@ -297,12 +298,18 @@ static inline void __core2_vpmu_save(struct vcpu *v)
         rdmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, fixed_counters[i]);
     for ( i = 0; i < arch_pmc_cnt; i++ )
         rdmsrl(MSR_IA32_PERFCTR0 + i, xen_pmu_cntr_pair[i].counter);
+
+    if ( !has_hvm_container_domain(v->domain) )
+        rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, core2_vpmu_cxt->global_status);
 }
 
 static int core2_vpmu_save(struct vcpu *v)
 {
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
 
+    if ( !has_hvm_container_domain(v->domain) )
+        wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0);
+
     if ( !vpmu_are_all_set(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED) )
         return 0;
 
@@ -340,6 +347,13 @@ static inline void __core2_vpmu_load(struct vcpu *v)
     wrmsrl(MSR_CORE_PERF_FIXED_CTR_CTRL, core2_vpmu_cxt->fixed_ctrl);
     wrmsrl(MSR_IA32_DS_AREA, core2_vpmu_cxt->ds_area);
     wrmsrl(MSR_IA32_PEBS_ENABLE, core2_vpmu_cxt->pebs_enable);
+
+    if ( !has_hvm_container_domain(v->domain) )
+    {
+        wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, core2_vpmu_cxt->global_ovf_ctrl);
+        core2_vpmu_cxt->global_ovf_ctrl = 0;
+        wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, core2_vpmu_cxt->global_ctrl);
+    }
 }
 
 static void core2_vpmu_load(struct vcpu *v)
@@ -440,9 +454,16 @@ static int core2_vpmu_msr_common_check(u32 msr_index, int *type, int *index)
     return 1;
 }
 
+static void inject_trap(struct vcpu *v, unsigned int trapno)
+{
+    if ( has_hvm_container_domain(v->domain) )
+        hvm_inject_hw_exception(trapno, 0);
+    else
+        send_guest_trap(v->domain, v->vcpu_id, trapno);
+}
+
 static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
 {
-    u64 global_ctrl;
     int i, tmp;
     int type = -1, index = -1;
     struct vcpu *v = current;
@@ -466,7 +487,7 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
                 if ( vpmu_is_set(vpmu, VPMU_CPU_HAS_BTS) )
                     return 1;
                 gdprintk(XENLOG_WARNING, "Debug Store is not supported on this cpu\n");
-                hvm_inject_hw_exception(TRAP_gp_fault, 0);
+                inject_trap(v, TRAP_gp_fault);
                 return 0;
             }
         }
@@ -479,11 +500,12 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
     {
     case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
         core2_vpmu_cxt->global_status &= ~msr_content;
+        wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, msr_content);
         return 1;
     case MSR_CORE_PERF_GLOBAL_STATUS:
         gdprintk(XENLOG_INFO, "Can not write readonly MSR: "
                  "MSR_PERF_GLOBAL_STATUS(0x38E)!\n");
-        hvm_inject_hw_exception(TRAP_gp_fault, 0);
+        inject_trap(v, TRAP_gp_fault);
         return 1;
     case MSR_IA32_PEBS_ENABLE:
         if ( msr_content & 1 )
@@ -499,7 +521,7 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
                 gdprintk(XENLOG_WARNING,
                          "Illegal address for IA32_DS_AREA: %#" PRIx64 "x\n",
                          msr_content);
-                hvm_inject_hw_exception(TRAP_gp_fault, 0);
+                inject_trap(v, TRAP_gp_fault);
                 return 1;
             }
             core2_vpmu_cxt->ds_area = msr_content;
@@ -508,10 +530,14 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
         gdprintk(XENLOG_WARNING, "Guest setting of DTS is ignored.\n");
         return 1;
     case MSR_CORE_PERF_GLOBAL_CTRL:
-        global_ctrl = msr_content;
+        core2_vpmu_cxt->global_ctrl = msr_content;
         break;
     case MSR_CORE_PERF_FIXED_CTR_CTRL:
-        vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, &global_ctrl);
+        if ( has_hvm_container_domain(v->domain) )
+            vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL,
+                               &core2_vpmu_cxt->global_ctrl);
+        else
+            rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, core2_vpmu_cxt->global_ctrl);
         *enabled_cntrs &= ~(((1ULL << fixed_pmc_cnt) - 1) << 32);
         if ( msr_content != 0 )
         {
@@ -533,7 +559,11 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
             struct xen_pmu_cntr_pair *xen_pmu_cntr_pair =
                 vpmu_reg_pointer(core2_vpmu_cxt, arch_counters);
 
-            vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, &global_ctrl);
+            if ( has_hvm_container_domain(v->domain) )
+                vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL,
+                                   &core2_vpmu_cxt->global_ctrl);
+            else
+                rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, core2_vpmu_cxt->global_ctrl);
 
             if ( msr_content & (1ULL << 22) )
                 *enabled_cntrs |= 1ULL << tmp;
@@ -544,7 +574,8 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
         }
     }
 
-    if ((global_ctrl & *enabled_cntrs) || (core2_vpmu_cxt->ds_area != 0)  )
+    if ( (core2_vpmu_cxt->global_ctrl & *enabled_cntrs) ||
+         (core2_vpmu_cxt->ds_area != 0) )
         vpmu_set(vpmu, VPMU_RUNNING);
     else
         vpmu_reset(vpmu, VPMU_RUNNING);
@@ -557,7 +588,7 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
         {
         case MSR_TYPE_ARCH_CTRL:      /* MSR_P6_EVNTSEL[0,...] */
             mask = ~((1ull << 32) - 1);
-            if (msr_content & mask)
+            if ( msr_content & mask )
                 inject_gp = 1;
             break;
         case MSR_TYPE_CTRL:           /* IA32_FIXED_CTR_CTRL */
@@ -565,22 +596,27 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
                 break;
             /* 4 bits per counter, currently 3 fixed counters implemented. */
             mask = ~((1ull << (fixed_pmc_cnt * FIXED_CTR_CTRL_BITS)) - 1);
-            if (msr_content & mask)
+            if ( msr_content & mask )
                 inject_gp = 1;
             break;
         case MSR_TYPE_COUNTER:        /* IA32_FIXED_CTR[0-2] */
             mask = ~((1ull << core2_get_bitwidth_fix_count()) - 1);
-            if (msr_content & mask)
+            if ( msr_content & mask )
                 inject_gp = 1;
             break;
         }
-        if (inject_gp)
-            hvm_inject_hw_exception(TRAP_gp_fault, 0);
+        if ( inject_gp )
+            inject_trap(v, TRAP_gp_fault);
         else
             wrmsrl(msr, msr_content);
     }
     else
-        vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, msr_content);
+    {
+        if ( has_hvm_container_domain(v->domain) )
+            vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, msr_content);
+        else
+            wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, msr_content);
+    }
 
     return 1;
 }
@@ -604,7 +640,10 @@ static int core2_vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
             *msr_content = core2_vpmu_cxt->global_status;
             break;
         case MSR_CORE_PERF_GLOBAL_CTRL:
-            vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, msr_content);
+            if ( has_hvm_container_domain(v->domain) )
+                vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, msr_content);
+            else
+                rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, *msr_content);
             break;
         default:
             rdmsrl(msr, *msr_content);
diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c
index bceffa6..ca0534b 100644
--- a/xen/arch/x86/hvm/vpmu.c
+++ b/xen/arch/x86/hvm/vpmu.c
@@ -456,6 +456,13 @@ long do_xenpmu_op(int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg)
             return -EFAULT;
         pvpmu_finish(current->domain, &pmu_params);
         break;
+
+    case XENPMU_lvtpc_set:
+        if ( current->arch.vpmu.xenpmu_data == NULL )
+            return -EINVAL;
+        vpmu_lvtpc_update(current->arch.vpmu.xenpmu_data->pmu.l.lapic_lvtpc);
+        ret = 0;
+        break;
     }
 
     return ret;
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 136821f..57d06c9 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -72,6 +72,7 @@
 #include <asm/apic.h>
 #include <asm/mc146818rtc.h>
 #include <asm/hpet.h>
+#include <asm/hvm/vpmu.h>
 #include <public/arch-x86/cpuid.h>
 #include <xsm/xsm.h>
 
@@ -877,8 +878,10 @@ void pv_cpuid(struct cpu_user_regs *regs)
         __clear_bit(X86_FEATURE_TOPOEXT % 32, &c);
         break;
 
+    case 0x0000000a: /* Architectural Performance Monitor Features (Intel) */
+        break;
+
     case 0x00000005: /* MONITOR/MWAIT */
-    case 0x0000000a: /* Architectural Performance Monitor Features */
     case 0x0000000b: /* Extended Topology Enumeration */
     case 0x8000000a: /* SVM revision and features */
     case 0x8000001b: /* Instruction Based Sampling */
@@ -894,6 +897,9 @@ void pv_cpuid(struct cpu_user_regs *regs)
     }
 
  out:
+    /* VPMU may decide to modify some of the leaves */
+    vpmu_do_cpuid(regs->eax, &a, &b, &c, &d);
+
     regs->eax = a;
     regs->ebx = b;
     regs->ecx = c;
@@ -1921,6 +1927,7 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
     char io_emul_stub[32];
     void (*io_emul)(struct cpu_user_regs *) __attribute__((__regparm__(1)));
     uint64_t val, msr_content;
+    bool_t vpmu_msr;
 
     if ( !read_descriptor(regs->cs, v, regs,
                           &code_base, &code_limit, &ar,
@@ -2411,6 +2418,7 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
         uint32_t eax = regs->eax;
         uint32_t edx = regs->edx;
         msr_content = ((uint64_t)edx << 32) | eax;
+        vpmu_msr = 0;
         switch ( (u32)regs->ecx )
         {
         case MSR_FS_BASE:
@@ -2547,7 +2555,19 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
             if ( v->arch.debugreg[7] & DR7_ACTIVE_MASK )
                 wrmsrl(regs->_ecx, msr_content);
             break;
-
+        case MSR_P6_PERFCTR0...MSR_P6_PERFCTR1:
+        case MSR_P6_EVNTSEL0...MSR_P6_EVNTSEL1:
+        case MSR_CORE_PERF_FIXED_CTR0...MSR_CORE_PERF_FIXED_CTR2:
+        case MSR_CORE_PERF_FIXED_CTR_CTRL...MSR_CORE_PERF_GLOBAL_OVF_CTRL:
+            if ( boot_cpu_data.x86_vendor == X86_VENDOR_INTEL )
+                vpmu_msr = 1;
+        case MSR_AMD_FAM15H_EVNTSEL0...MSR_AMD_FAM15H_PERFCTR5:
+            if ( vpmu_msr || (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) )
+            {
+                if ( !vpmu_do_wrmsr(regs->ecx, msr_content) )
+                    goto invalid;
+                break;
+            }
         default:
             if ( wrmsr_hypervisor_regs(regs->ecx, msr_content) == 1 )
                 break;
@@ -2579,6 +2599,8 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
         break;
 
     case 0x32: /* RDMSR */
+        vpmu_msr = 0;
+
         switch ( (u32)regs->ecx )
         {
         case MSR_FS_BASE:
@@ -2649,7 +2671,27 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
                             [regs->_ecx - MSR_AMD64_DR1_ADDRESS_MASK + 1];
             regs->edx = 0;
             break;
-
+        case MSR_IA32_PERF_CAPABILITIES:
+            /* No extra capabilities are supported */
+            regs->eax = regs->edx = 0;
+            break;
+        case MSR_P6_PERFCTR0...MSR_P6_PERFCTR1:
+        case MSR_P6_EVNTSEL0...MSR_P6_EVNTSEL1:
+        case MSR_CORE_PERF_FIXED_CTR0...MSR_CORE_PERF_FIXED_CTR2:
+        case MSR_CORE_PERF_FIXED_CTR_CTRL...MSR_CORE_PERF_GLOBAL_OVF_CTRL:
+            if ( boot_cpu_data.x86_vendor == X86_VENDOR_INTEL )
+                vpmu_msr = 1;
+        case MSR_AMD_FAM15H_EVNTSEL0...MSR_AMD_FAM15H_PERFCTR5:
+            if ( vpmu_msr || (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) )
+            {
+                if ( vpmu_do_rdmsr(regs->ecx, &msr_content) )
+                {
+                    regs->eax = (uint32_t)msr_content;
+                    regs->edx = (uint32_t)(msr_content >> 32);
+                    break;
+                }
+                goto rdmsr_normal;
+            }
         default:
             if ( rdmsr_hypervisor_regs(regs->ecx, &val) )
             {
diff --git a/xen/include/public/pmu.h b/xen/include/public/pmu.h
index 6da3596..7341de9 100644
--- a/xen/include/public/pmu.h
+++ b/xen/include/public/pmu.h
@@ -27,6 +27,7 @@
 #define XENPMU_feature_set     3
 #define XENPMU_init            4
 #define XENPMU_finish          5
+#define XENPMU_lvtpc_set       6
 /* ` } */
 
 /* Parameters structure for HYPERVISOR_xenpmu_op call */
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v7 14/19] x86/VPMU: Handle PMU interrupts for PV guests
  2014-06-06 17:39 [PATCH v7 00/19] x86/PMU: Xen PMU PV(H) support Boris Ostrovsky
                   ` (12 preceding siblings ...)
  2014-06-06 17:40 ` [PATCH v7 13/19] x86/VPMU: Add support for PMU register handling on " Boris Ostrovsky
@ 2014-06-06 17:40 ` Boris Ostrovsky
  2014-06-06 20:40   ` Andrew Cooper
  2014-06-06 17:40 ` [PATCH v7 15/19] x86/VPMU: Merge vpmu_rdmsr and vpmu_wrmsr Boris Ostrovsky
                   ` (4 subsequent siblings)
  18 siblings, 1 reply; 58+ messages in thread
From: Boris Ostrovsky @ 2014-06-06 17:40 UTC (permalink / raw)
  To: JBeulich, kevin.tian, dietmar.hahn, suravee.suthikulpanit
  Cc: tim, boris.ostrovsky, keir, jun.nakajima, xen-devel

Add support for handling PMU interrupts for PV guests.

The VPMU of the interrupted VCPU is unloaded until the guest issues the
XENPMU_flush hypercall. This allows the guest to access PMU MSR values that
are stored in the VPMU context, which is shared between the hypervisor and
the domain, thus avoiding traps to the hypervisor.
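
A guest-side sketch of the resulting protocol (hypothetical handler; only
the xen_pmu_data fields, PMU_CACHED and XENPMU_flush come from this patch):

    /* Hypothetical PV guest handler for VIRQ_XENPMU. */
    static void xenpmu_virq_handler(void)
    {
        struct xen_pmu_data *pmu = this_cpu_pmu_page; /* from XENPMU_init */

        /* PMU_CACHED is set: registers and MSR state were saved into
         * the shared page, so reading them does not trap to Xen. */
        consume_sample(&pmu->pmu.r.regs, pmu->domain_id, pmu->vcpu_id);

        /* Write cached MSR values back to hardware, clear PMU_CACHED
         * and re-arm the LVTPC. */
        HYPERVISOR_xenpmu_op(XENPMU_flush, NULL);
    }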

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
Tested-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
---
 xen/arch/x86/hvm/vpmu.c  | 134 +++++++++++++++++++++++++++++++++++++++++++++--
 xen/include/public/pmu.h |   7 +++
 2 files changed, 136 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c
index ca0534b..ce842e4 100644
--- a/xen/arch/x86/hvm/vpmu.c
+++ b/xen/arch/x86/hvm/vpmu.c
@@ -76,7 +76,12 @@ void vpmu_lvtpc_update(uint32_t val)
     struct vpmu_struct *vpmu = vcpu_vpmu(current);
 
     vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR | (val & APIC_LVT_MASKED);
-    apic_write(APIC_LVTPC, vpmu->hw_lapic_lvtpc);
+
+    /* Postpone APIC updates for PV(H) guests if PMU interrupt is pending */
+    if ( is_hvm_domain(current->domain) ||
+         !(current->arch.vpmu.xenpmu_data &&
+           (current->arch.vpmu.xenpmu_data->pmu_flags & PMU_CACHED)) )
+        apic_write(APIC_LVTPC, vpmu->hw_lapic_lvtpc);
 }
 
 int vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
@@ -87,7 +92,23 @@ int vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
         return 0;
 
     if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->do_wrmsr )
-        return vpmu->arch_vpmu_ops->do_wrmsr(msr, msr_content);
+    {
+        int ret = vpmu->arch_vpmu_ops->do_wrmsr(msr, msr_content);
+
+        /*
+         * We may have received a PMU interrupt during WRMSR handling
+         * and since do_wrmsr may load VPMU context we should save
+         * (and unload) it again.
+         */
+        if ( !is_hvm_domain(current->domain) &&
+             (current->arch.vpmu.xenpmu_data->pmu_flags & PMU_CACHED) )
+        {
+            vpmu_set(vpmu, VPMU_CONTEXT_SAVE);
+            vpmu->arch_vpmu_ops->arch_vpmu_save(current);
+            vpmu_reset(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED);
+        }
+        return ret;
+    }
     return 0;
 }
 
@@ -99,14 +120,109 @@ int vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
         return 0;
 
     if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->do_rdmsr )
-        return vpmu->arch_vpmu_ops->do_rdmsr(msr, msr_content);
+    {
+        int ret = vpmu->arch_vpmu_ops->do_rdmsr(msr, msr_content);
+
+        if ( !is_hvm_domain(current->domain) &&
+            (current->arch.vpmu.xenpmu_data->pmu_flags & PMU_CACHED) )
+        {
+            vpmu_set(vpmu, VPMU_CONTEXT_SAVE);
+            vpmu->arch_vpmu_ops->arch_vpmu_save(current);
+            vpmu_reset(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED);
+        }
+        return ret;
+    }
     return 0;
 }
 
+static struct vcpu *choose_dom0_vcpu(void)
+{
+    struct vcpu *v;
+    unsigned idx = smp_processor_id() % hardware_domain->max_vcpus;
+
+    v = hardware_domain->vcpu[idx];
+
+    /*
+     * If the index is not populated, search the vcpu array downwards
+     * until a valid vcpu is found.
+     */
+    while ( !v && idx-- )
+        v = hardware_domain->vcpu[idx];
+
+    return v;
+}
+
 int vpmu_do_interrupt(struct cpu_user_regs *regs)
 {
     struct vcpu *v = current;
-    struct vpmu_struct *vpmu = vcpu_vpmu(v);
+    struct vpmu_struct *vpmu;
+
+    /* dom0 will handle interrupt for special domains (e.g. idle domain) */
+    if ( v->domain->domain_id >= DOMID_FIRST_RESERVED )
+    {
+        v = choose_dom0_vcpu();
+        if ( !v )
+            return 0;
+    }
+
+    vpmu = vcpu_vpmu(v);
+    if ( !is_hvm_domain(v->domain) )
+    {
+        /* PV(H) guest or dom0 is doing system profiling */
+        const struct cpu_user_regs *gregs;
+        int err;
+
+        if ( v->arch.vpmu.xenpmu_data->pmu_flags & PMU_CACHED )
+            return 1;
+
+        if ( is_pvh_domain(current->domain) &&
+             !vpmu->arch_vpmu_ops->do_interrupt(regs) )
+            return 0;
+
+        /* PV guest will be reading PMU MSRs from xenpmu_data */
+        vpmu_set(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED);
+        err = vpmu->arch_vpmu_ops->arch_vpmu_save(v);
+        vpmu_reset(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED);
+
+        /* Store appropriate registers in xenpmu_data */
+        if ( is_pv_32bit_domain(current->domain) )
+        {
+            /*
+             * 32-bit dom0 cannot process Xen's addresses (which are 64 bit)
+             * and therefore we treat it the same way as a non-privileged
+             * PV 32-bit domain.
+             */
+            struct compat_cpu_user_regs *cmp;
+
+            gregs = guest_cpu_user_regs();
+
+            cmp = (void *)&v->arch.vpmu.xenpmu_data->pmu.r.regs;
+            XLAT_cpu_user_regs(cmp, gregs);
+        }
+        else if ( !is_hardware_domain(current->domain) &&
+                 !is_idle_vcpu(current) )
+        {
+            /* PV(H) guest */
+            gregs = guest_cpu_user_regs();
+            memcpy(&v->arch.vpmu.xenpmu_data->pmu.r.regs,
+                   gregs, sizeof(struct cpu_user_regs));
+        }
+        else
+            memcpy(&v->arch.vpmu.xenpmu_data->pmu.r.regs,
+                   regs, sizeof(struct cpu_user_regs));
+
+        v->arch.vpmu.xenpmu_data->domain_id = current->domain->domain_id;
+        v->arch.vpmu.xenpmu_data->vcpu_id = current->vcpu_id;
+        v->arch.vpmu.xenpmu_data->pcpu_id = smp_processor_id();
+
+        v->arch.vpmu.xenpmu_data->pmu_flags |= PMU_CACHED;
+        apic_write(APIC_LVTPC, vpmu->hw_lapic_lvtpc | APIC_LVT_MASKED);
+        vpmu->hw_lapic_lvtpc |= APIC_LVT_MASKED;
+
+        send_guest_vcpu_virq(v, VIRQ_XENPMU);
+
+        return 1;
+    }
 
     if ( vpmu->arch_vpmu_ops )
     {
@@ -225,7 +341,9 @@ void vpmu_load(struct vcpu *v)
     local_irq_enable();
 
     /* Only when PMU is counting, we load PMU context immediately. */
-    if ( !vpmu_is_set(vpmu, VPMU_RUNNING) )
+    if ( !vpmu_is_set(vpmu, VPMU_RUNNING) ||
+         (!is_hvm_domain(v->domain) &&
+          vpmu->xenpmu_data->pmu_flags & PMU_CACHED) )
         return;
 
     if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->arch_vpmu_load )
@@ -463,6 +581,12 @@ long do_xenpmu_op(int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg)
         vpmu_lvtpc_update(current->arch.vpmu.xenpmu_data->pmu.l.lapic_lvtpc);
         ret = 0;
         break;
+    case XENPMU_flush:
+        current->arch.vpmu.xenpmu_data->pmu_flags &= ~PMU_CACHED;
+        vpmu_lvtpc_update(current->arch.vpmu.xenpmu_data->pmu.l.lapic_lvtpc);
+        vpmu_load(current);
+        ret = 0;
+        break;
     }
 
     return ret;
diff --git a/xen/include/public/pmu.h b/xen/include/public/pmu.h
index 7341de9..67ce5de 100644
--- a/xen/include/public/pmu.h
+++ b/xen/include/public/pmu.h
@@ -28,6 +28,7 @@
 #define XENPMU_init            4
 #define XENPMU_finish          5
 #define XENPMU_lvtpc_set       6
+#define XENPMU_flush           7 /* Write cached MSR values to HW     */
 /* ` } */
 
 /* Parameters structure for HYPERVISOR_xenpmu_op call */
@@ -65,6 +66,12 @@ DEFINE_XEN_GUEST_HANDLE(xen_pmu_params_t);
  */
 #define XENPMU_FEATURE_INTEL_BTS  1
 
+/*
+ * PMU MSRs are cached in the context, so the PV guest doesn't need to trap
+ * to the hypervisor.
+ */
+#define PMU_CACHED 1
+
 /* Shared between hypervisor and PV domain */
 struct xen_pmu_data {
     uint32_t domain_id;
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v7 15/19] x86/VPMU: Merge vpmu_rdmsr and vpmu_wrmsr
  2014-06-06 17:39 [PATCH v7 00/19] x86/PMU: Xen PMU PV(H) support Boris Ostrovsky
                   ` (13 preceding siblings ...)
  2014-06-06 17:40 ` [PATCH v7 14/19] x86/VPMU: Handle PMU interrupts for " Boris Ostrovsky
@ 2014-06-06 17:40 ` Boris Ostrovsky
  2014-06-06 17:40 ` [PATCH v7 16/19] x86/VPMU: Add privileged PMU mode Boris Ostrovsky
                   ` (3 subsequent siblings)
  18 siblings, 0 replies; 58+ messages in thread
From: Boris Ostrovsky @ 2014-06-06 17:40 UTC (permalink / raw)
  To: JBeulich, kevin.tian, dietmar.hahn, suravee.suthikulpanit
  Cc: tim, boris.ostrovsky, keir, jun.nakajima, xen-devel

The two routines share most of their logic.
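
Call sites now pass a direction flag instead of picking a routine, along
these lines (a minimal sketch mirroring the updated callers; msr and
msr_content are as in the intercept handlers):

    uint64_t val = msr_content;

    /* Write path: the value is passed by pointer so both directions
     * share one signature. */
    vpmu_do_msr(msr, &val, VPMU_MSR_WRITE);

    /* Read path: val receives the (possibly virtualized) MSR contents. */
    vpmu_do_msr(msr, &val, VPMU_MSR_READ);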

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
 xen/arch/x86/hvm/svm/svm.c     |  9 ++++++---
 xen/arch/x86/hvm/vmx/vmx.c     | 11 +++++++----
 xen/arch/x86/hvm/vpmu.c        | 42 +++++++++++++++---------------------------
 xen/arch/x86/traps.c           |  4 ++--
 xen/include/asm-x86/hvm/vpmu.h |  6 ++++--
 5 files changed, 34 insertions(+), 38 deletions(-)

diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index 7358626..46161e7 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -1634,7 +1634,7 @@ static int svm_msr_read_intercept(unsigned int msr, uint64_t *msr_content)
     case MSR_AMD_FAM15H_EVNTSEL3:
     case MSR_AMD_FAM15H_EVNTSEL4:
     case MSR_AMD_FAM15H_EVNTSEL5:
-        vpmu_do_rdmsr(msr, msr_content);
+        vpmu_do_msr(msr, msr_content, VPMU_MSR_READ);
         break;
 
     case MSR_AMD64_DR0_ADDRESS_MASK:
@@ -1785,9 +1785,12 @@ static int svm_msr_write_intercept(unsigned int msr, uint64_t msr_content)
     case MSR_AMD_FAM15H_EVNTSEL3:
     case MSR_AMD_FAM15H_EVNTSEL4:
     case MSR_AMD_FAM15H_EVNTSEL5:
-        vpmu_do_wrmsr(msr, msr_content);
-        break;
+    {
+        uint64_t msr_val = msr_content;
 
+        vpmu_do_msr(msr, &msr_val, VPMU_MSR_WRITE);
+        break;
+    }
     case MSR_IA32_MCx_MISC(4): /* Threshold register */
     case MSR_F10_MC4_MISC1 ... MSR_F10_MC4_MISC3:
         /*
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 5313112..b620b98 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -2070,11 +2070,11 @@ static int vmx_msr_read_intercept(unsigned int msr, uint64_t *msr_content)
         *msr_content |= MSR_IA32_MISC_ENABLE_BTS_UNAVAIL |
                        MSR_IA32_MISC_ENABLE_PEBS_UNAVAIL;
         /* Perhaps vpmu will change some bits. */
-        if ( vpmu_do_rdmsr(msr, msr_content) )
+        if ( vpmu_do_msr(msr, msr_content, VPMU_MSR_READ) )
             goto done;
         break;
     default:
-        if ( vpmu_do_rdmsr(msr, msr_content) )
+        if ( vpmu_do_msr(msr, msr_content, VPMU_MSR_READ) )
             break;
         if ( passive_domain_do_rdmsr(msr, msr_content) )
             goto done;
@@ -2225,6 +2225,7 @@ void vmx_vlapic_msr_changed(struct vcpu *v)
 static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content)
 {
     struct vcpu *v = current;
+    uint64_t msr_val;
 
     HVM_DBG_LOG(DBG_LEVEL_1, "ecx=%#x, msr_value=%#"PRIx64, msr, msr_content);
 
@@ -2248,7 +2249,8 @@ static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content)
         if ( msr_content & ~supported )
         {
             /* Perhaps some other bits are supported in vpmu. */
-            if ( !vpmu_do_wrmsr(msr, msr_content) )
+            msr_val = msr_content;
+            if ( !vpmu_do_msr(msr, &msr_val, VPMU_MSR_WRITE) )
                 break;
         }
         if ( msr_content & IA32_DEBUGCTLMSR_LBR )
@@ -2279,7 +2281,8 @@ static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content)
             goto gp_fault;
         break;
     default:
-        if ( vpmu_do_wrmsr(msr, msr_content) )
+        msr_val = msr_content;
+        if ( vpmu_do_msr(msr, &msr_val, VPMU_MSR_WRITE) )
             return X86EMUL_OKAY;
         if ( passive_domain_do_wrmsr(msr, msr_content) )
             return X86EMUL_OKAY;
diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c
index ce842e4..4452ed6 100644
--- a/xen/arch/x86/hvm/vpmu.c
+++ b/xen/arch/x86/hvm/vpmu.c
@@ -84,20 +84,29 @@ void vpmu_lvtpc_update(uint32_t val)
         apic_write(APIC_LVTPC, vpmu->hw_lapic_lvtpc);
 }
 
-int vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
+int vpmu_do_msr(unsigned int msr, uint64_t *msr_content, uint8_t rw)
 {
     struct vpmu_struct *vpmu = vcpu_vpmu(current);
 
     if ( !(vpmu_mode & XENPMU_MODE_ON) )
         return 0;
 
-    if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->do_wrmsr )
+    ASSERT((rw == VPMU_MSR_READ) || (rw == VPMU_MSR_WRITE));
+
+    if ( vpmu->arch_vpmu_ops )
     {
-        int ret = vpmu->arch_vpmu_ops->do_wrmsr(msr, msr_content);
+        int ret;
+
+        if ( (rw == VPMU_MSR_READ) && vpmu->arch_vpmu_ops->do_rdmsr )
+            ret = vpmu->arch_vpmu_ops->do_rdmsr(msr, msr_content);
+        else if ( vpmu->arch_vpmu_ops->do_wrmsr )
+            ret = vpmu->arch_vpmu_ops->do_wrmsr(msr, *msr_content);
+        else
+            return 0;
 
         /*
-         * We may have received a PMU interrupt during WRMSR handling
-         * and since do_wrmsr may load VPMU context we should save
+         * We may have received a PMU interrupt while handling MSR access
+         * and since do_wr/rdmsr may load VPMU context we should save
          * (and unload) it again.
          */
         if ( !is_hvm_domain(current->domain) &&
@@ -107,31 +116,10 @@ int vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
             vpmu->arch_vpmu_ops->arch_vpmu_save(current);
             vpmu_reset(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED);
         }
-        return ret;
-    }
-    return 0;
-}
-
-int vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
-{
-    struct vpmu_struct *vpmu = vcpu_vpmu(current);
 
-    if ( !(vpmu_mode & XENPMU_MODE_ON) )
-        return 0;
-
-    if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->do_rdmsr )
-    {
-        int ret = vpmu->arch_vpmu_ops->do_rdmsr(msr, msr_content);
-
-        if ( !is_hvm_domain(current->domain) &&
-            (current->arch.vpmu.xenpmu_data->pmu_flags & PMU_CACHED) )
-        {
-            vpmu_set(vpmu, VPMU_CONTEXT_SAVE);
-            vpmu->arch_vpmu_ops->arch_vpmu_save(current);
-            vpmu_reset(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED);
-        }
         return ret;
     }
+
     return 0;
 }
 
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 57d06c9..85b5b15 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -2564,7 +2564,7 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
         case MSR_AMD_FAM15H_EVNTSEL0...MSR_AMD_FAM15H_PERFCTR5:
             if ( vpmu_msr || (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) )
             {
-                if ( !vpmu_do_wrmsr(regs->ecx, msr_content) )
+                if ( !vpmu_do_msr(regs->ecx, &msr_content, VPMU_MSR_WRITE) )
                     goto invalid;
                 break;
             }
@@ -2684,7 +2684,7 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
         case MSR_AMD_FAM15H_EVNTSEL0...MSR_AMD_FAM15H_PERFCTR5:
             if ( vpmu_msr || (boot_cpu_data.x86_vendor != X86_VENDOR_AMD) )
             {
-                if ( vpmu_do_rdmsr(regs->ecx, &msr_content) )
+                if ( vpmu_do_msr(regs->ecx, &msr_content, VPMU_MSR_READ) )
                 {
                     regs->eax = (uint32_t)msr_content;
                     regs->edx = (uint32_t)(msr_content >> 32);
diff --git a/xen/include/asm-x86/hvm/vpmu.h b/xen/include/asm-x86/hvm/vpmu.h
index 24fb39b..0cfde19 100644
--- a/xen/include/asm-x86/hvm/vpmu.h
+++ b/xen/include/asm-x86/hvm/vpmu.h
@@ -78,9 +78,11 @@ struct vpmu_struct {
 #define vpmu_are_all_set(_vpmu, _x) (((_vpmu)->flags & (_x)) == (_x))
 #define vpmu_clear(_vpmu)           ((_vpmu)->flags = 0)
 
+#define VPMU_MSR_READ  0
+#define VPMU_MSR_WRITE 1
+
 void vpmu_lvtpc_update(uint32_t val);
-int vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content);
-int vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content);
+int vpmu_do_msr(unsigned int msr, uint64_t *msr_content, uint8_t rw);
 int vpmu_do_interrupt(struct cpu_user_regs *regs);
 void vpmu_do_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx,
                                        unsigned int *ecx, unsigned int *edx);
-- 
1.8.1.4

* [PATCH v7 16/19] x86/VPMU: Add privileged PMU mode
  2014-06-06 17:39 [PATCH v7 00/19] x86/PMU: Xen PMU PV(H) support Boris Ostrovsky
                   ` (14 preceding siblings ...)
  2014-06-06 17:40 ` [PATCH v7 15/19] x86/VPMU: Merge vpmu_rdmsr and vpmu_wrmsr Boris Ostrovsky
@ 2014-06-06 17:40 ` Boris Ostrovsky
  2014-06-06 17:40 ` [PATCH v7 17/19] x86/VPMU: Save VPMU state for PV guests during context switch Boris Ostrovsky
                   ` (2 subsequent siblings)
  18 siblings, 0 replies; 58+ messages in thread
From: Boris Ostrovsky @ 2014-06-06 17:40 UTC (permalink / raw)
  To: JBeulich, kevin.tian, dietmar.hahn, suravee.suthikulpanit
  Cc: tim, boris.ostrovsky, keir, jun.nakajima, xen-devel

Add support for a privileged PMU mode, which allows the privileged domain
(dom0) to profile both itself (and the hypervisor) and the guests. While
this mode is on, profiling in the guests is disabled.
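
As a standalone illustration, the mode-argument validation that this patch
adds to do_xenpmu_op() reduces to the check sketched below. This is only a
sketch built on the XENPMU_MODE_* constants from the public header;
xenpmu_mode_valid() is a hypothetical helper name, not something the patch
introduces:

    #include <stdint.h>
    #include <stdio.h>

    #define XENPMU_MODE_OFF   0
    #define XENPMU_MODE_ON    (1<<0)
    #define XENPMU_MODE_PRIV  (1<<1)

    /*
     * A mode value is valid if it contains no unknown bits and does not
     * request self-profiling (ON) and privileged profiling (PRIV) at the
     * same time.
     */
    static int xenpmu_mode_valid(uint64_t val)
    {
        if ( val & ~(XENPMU_MODE_ON | XENPMU_MODE_PRIV) )
            return 0;
        return !((val & XENPMU_MODE_ON) && (val & XENPMU_MODE_PRIV));
    }

    int main(void)
    {
        printf("OFF:     %d\n", xenpmu_mode_valid(XENPMU_MODE_OFF));  /* 1 */
        printf("ON:      %d\n", xenpmu_mode_valid(XENPMU_MODE_ON));   /* 1 */
        printf("PRIV:    %d\n", xenpmu_mode_valid(XENPMU_MODE_PRIV)); /* 1 */
        printf("ON|PRIV: %d\n",
               xenpmu_mode_valid(XENPMU_MODE_ON | XENPMU_MODE_PRIV)); /* 0 */
        return 0;
    }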

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
Tested-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
---
 xen/arch/x86/hvm/vpmu.c  | 99 +++++++++++++++++++++++++++++++++++-------------
 xen/arch/x86/traps.c     | 10 +++++
 xen/include/public/pmu.h |  3 ++
 3 files changed, 86 insertions(+), 26 deletions(-)

diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c
index 4452ed6..8f98182 100644
--- a/xen/arch/x86/hvm/vpmu.c
+++ b/xen/arch/x86/hvm/vpmu.c
@@ -88,7 +88,9 @@ int vpmu_do_msr(unsigned int msr, uint64_t *msr_content, uint8_t rw)
 {
     struct vpmu_struct *vpmu = vcpu_vpmu(current);
 
-    if ( !(vpmu_mode & XENPMU_MODE_ON) )
+    if ( (vpmu_mode == XENPMU_MODE_OFF) ||
+         ((vpmu_mode & XENPMU_MODE_PRIV) &&
+          !is_hardware_domain(current->domain)) )
         return 0;
 
     ASSERT((rw == VPMU_MSR_READ) || (rw == VPMU_MSR_WRITE));
@@ -145,8 +147,12 @@ int vpmu_do_interrupt(struct cpu_user_regs *regs)
     struct vcpu *v = current;
     struct vpmu_struct *vpmu;
 
-    /* dom0 will handle interrupt for special domains (e.g. idle domain) */
-    if ( v->domain->domain_id >= DOMID_FIRST_RESERVED )
+    /*
+     * dom0 will handle interrupt for special domains (e.g. idle domain) or,
+     * in XENPMU_MODE_PRIV, for everyone.
+     */
+    if ( (vpmu_mode & XENPMU_MODE_PRIV) ||
+         (v->domain->domain_id >= DOMID_FIRST_RESERVED) )
     {
         v = choose_dom0_vcpu();
         if ( !v )
@@ -154,7 +160,7 @@ int vpmu_do_interrupt(struct cpu_user_regs *regs)
     }
 
     vpmu = vcpu_vpmu(v);
-    if ( !is_hvm_domain(v->domain) )
+    if ( !is_hvm_domain(v->domain) || (vpmu_mode & XENPMU_MODE_PRIV) )
     {
         /* PV(H) guest or dom0 is doing system profiling */
         const struct cpu_user_regs *gregs;
@@ -163,7 +169,8 @@ int vpmu_do_interrupt(struct cpu_user_regs *regs)
         if ( v->arch.vpmu.xenpmu_data->pmu_flags & PMU_CACHED )
             return 1;
 
-        if ( is_pvh_domain(current->domain) &&
+        if ( !(vpmu_mode & XENPMU_MODE_PRIV) &&
+             is_pvh_domain(current->domain) &&
              !vpmu->arch_vpmu_ops->do_interrupt(regs) )
             return 0;
 
@@ -172,32 +179,67 @@ int vpmu_do_interrupt(struct cpu_user_regs *regs)
         err = vpmu->arch_vpmu_ops->arch_vpmu_save(v);
         vpmu_reset(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED);
 
-        /* Store appropriate registers in xenpmu_data */
-        if ( is_pv_32bit_domain(current->domain) )
+        if ( !is_hvm_domain(current->domain) )
         {
-            /*
-             * 32-bit dom0 cannot process Xen's addresses (which are 64 bit)
-             * and therefore we treat it the same way as a non-priviledged
-             * PV 32-bit domain.
-             */
-            struct compat_cpu_user_regs *cmp;
+            /* Store appropriate registers in xenpmu_data */
+            if ( is_pv_32bit_domain(current->domain) )
+            {
+                /*
+                 * 32-bit dom0 cannot process Xen's addresses (which are 64 bit)
+                 * and therefore we treat it the same way as a non-privileged
+                 * PV 32-bit domain.
+                 */
+                struct compat_cpu_user_regs *cmp;
 
-            gregs = guest_cpu_user_regs();
+                gregs = guest_cpu_user_regs();
 
-            cmp = (void *)&v->arch.vpmu.xenpmu_data->pmu.r.regs;
-            XLAT_cpu_user_regs(cmp, gregs);
+                cmp = (void *)&v->arch.vpmu.xenpmu_data->pmu.r.regs;
+                XLAT_cpu_user_regs(cmp, gregs);
+
+                /* Adjust RPL for kernel mode */
+                if ( (cmp->cs & 3) == 1 )
+                    cmp->cs &= ~3;
+            }
+            else
+            {
+                if ( !is_hardware_domain(current->domain) &&
+                     !is_idle_vcpu(current) )
+                {
+                    /* 64-bit unprivileged PV(H) guest */
+                    gregs = guest_cpu_user_regs();
+                    memcpy(&v->arch.vpmu.xenpmu_data->pmu.r.regs,
+                           gregs, sizeof(struct cpu_user_regs));
+                }
+                else
+                    memcpy(&v->arch.vpmu.xenpmu_data->pmu.r.regs,
+                           regs, sizeof(struct cpu_user_regs));
+
+                if ( !is_pvh_domain(current->domain) )
+                {
+                    if ( current->arch.flags & TF_kernel_mode )
+                        v->arch.vpmu.xenpmu_data->pmu.r.regs.cs &= ~3;
+                }
+                else
+                {
+                    struct segment_register seg_cs;
+
+                    hvm_get_segment_register(current, x86_seg_cs, &seg_cs);
+                    v->arch.vpmu.xenpmu_data->pmu.r.regs.cs = seg_cs.sel;
+                }
+            }
         }
-        else if ( !is_hardware_domain(current->domain) &&
-                 !is_idle_vcpu(current) )
+        else
         {
-            /* PV(H) guest */
+            /* HVM guest */
+            struct segment_register seg_cs;
+
             gregs = guest_cpu_user_regs();
             memcpy(&v->arch.vpmu.xenpmu_data->pmu.r.regs,
                    gregs, sizeof(struct cpu_user_regs));
+
+            hvm_get_segment_register(current, x86_seg_cs, &seg_cs);
+            v->arch.vpmu.xenpmu_data->pmu.r.regs.cs = seg_cs.sel;
         }
-        else
-            memcpy(&v->arch.vpmu.xenpmu_data->pmu.r.regs,
-                   regs, sizeof(struct cpu_user_regs));
 
         v->arch.vpmu.xenpmu_data->domain_id = current->domain->domain_id;
         v->arch.vpmu.xenpmu_data->vcpu_id = current->vcpu_id;
@@ -505,15 +547,20 @@ long do_xenpmu_op(int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg)
         if ( copy_from_guest(&pmu_params, arg, 1) )
             return -EFAULT;
 
-        if ( pmu_params.d.val & ~XENPMU_MODE_ON )
+        if ( (pmu_params.d.val & ~(XENPMU_MODE_ON | XENPMU_MODE_PRIV)) ||
+             ((pmu_params.d.val & XENPMU_MODE_ON) &&
+              (pmu_params.d.val & XENPMU_MODE_PRIV)) )
             return -EINVAL;
 
         vpmu_mode = pmu_params.d.val;
-        if ( vpmu_mode == XENPMU_MODE_OFF )
+
+        if ( (vpmu_mode == XENPMU_MODE_OFF) || (vpmu_mode & XENPMU_MODE_PRIV) )
             /*
              * After this VPMU context will never be loaded during context
-             * switch. We also prevent PMU MSR accesses (which can load
-             * context) when VPMU is disabled.
+             * switch. Because PMU MSR accesses load VPMU context we don't
+             * allow them when VPMU is off and, for non-privileged domains,
+             * when we are in privileged mode. (We do want these accesses to
+             * load VPMU context for the control domain in this mode.)
              */
             vpmu_unload_all();
 
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 85b5b15..8206662 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -2564,6 +2564,10 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
         case MSR_AMD_FAM15H_EVNTSEL0...MSR_AMD_FAM15H_PERFCTR5:
             if ( vpmu_msr || (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) )
             {
+                if ( (vpmu_mode & XENPMU_MODE_PRIV) &&
+                     !is_hardware_domain(v->domain) )
+                    break;
+
                 if ( !vpmu_do_msr(regs->ecx, &msr_content, VPMU_MSR_WRITE) )
                     goto invalid;
                 break;
@@ -2690,6 +2694,12 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
                     regs->edx = (uint32_t)(msr_content >> 32);
                     break;
                 }
+                else if ( !is_hardware_domain(v->domain) )
+                {
+                    /* Don't leak PMU MSRs to unprivileged domains */
+                    regs->eax = regs->edx = 0;
+                    break;
+                }
                 goto rdmsr_normal;
             }
         default:
diff --git a/xen/include/public/pmu.h b/xen/include/public/pmu.h
index 67ce5de..c170a1e 100644
--- a/xen/include/public/pmu.h
+++ b/xen/include/public/pmu.h
@@ -56,9 +56,12 @@ DEFINE_XEN_GUEST_HANDLE(xen_pmu_params_t);
  * - XENPMU_MODE_OFF:   No PMU virtualization
  * - XENPMU_MODE_ON:    Guests can profile themselves, dom0 profiles
  *                      itself and Xen
+ * - XENPMU_MODE_PRIV:  Only dom0 has access to VPMU and it profiles
+ *                      everyone: itself, the hypervisor and the guests.
  */
 #define XENPMU_MODE_OFF           0
 #define XENPMU_MODE_ON            (1<<0)
+#define XENPMU_MODE_PRIV          (1<<1)
 
 /*
  * PMU features:
-- 
1.8.1.4

* [PATCH v7 17/19] x86/VPMU: Save VPMU state for PV guests during context switch
  2014-06-06 17:39 [PATCH v7 00/19] x86/PMU: Xen PMU PV(H) support Boris Ostrovsky
                   ` (15 preceding siblings ...)
  2014-06-06 17:40 ` [PATCH v7 16/19] x86/VPMU: Add privileged PMU mode Boris Ostrovsky
@ 2014-06-06 17:40 ` Boris Ostrovsky
  2014-06-06 17:40 ` [PATCH v7 18/19] x86/VPMU: NMI-based VPMU support Boris Ostrovsky
  2014-06-06 17:40 ` [PATCH v7 19/19] x86/VPMU: Move VPMU files up from hvm/ directory Boris Ostrovsky
  18 siblings, 0 replies; 58+ messages in thread
From: Boris Ostrovsky @ 2014-06-06 17:40 UTC (permalink / raw)
  To: JBeulich, kevin.tian, dietmar.hahn, suravee.suthikulpanit
  Cc: tim, boris.ostrovsky, keir, jun.nakajima, xen-devel

Save VPMU state during context switch for both HVM and PV guests unless we
are in PMU privileged mode (i.e. dom0 is doing all profiling).
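
The resulting ordering in context_switch() can be summarized by the
standalone model below. The stub functions merely stand in for the real Xen
routines; only the control flow (save before interrupts are disabled, load
after they are re-enabled) is taken from the patch:

    #include <stdio.h>

    #define XENPMU_MODE_ON (1<<0)
    static unsigned int vpmu_mode = XENPMU_MODE_ON;

    struct vcpu { int id; };

    /* Stubs standing in for the real Xen routines. */
    static void vpmu_save(struct vcpu *v) { printf("save VPMU of vcpu %d\n", v->id); }
    static void vpmu_load(struct vcpu *v) { printf("load VPMU of vcpu %d\n", v->id); }
    static void local_irq_disable(void)   { printf("irqs off\n"); }
    static void local_irq_enable(void)    { printf("irqs on\n"); }

    static void context_switch(struct vcpu *prev, struct vcpu *next)
    {
        /*
         * Save the outgoing vcpu's VPMU state for PV and HVM alike,
         * unless VPMU is off or dom0 is doing all the profiling.
         */
        if ( (prev != next) && (vpmu_mode & XENPMU_MODE_ON) )
            vpmu_save(prev);

        local_irq_disable();
        /* ... the actual switch of CPU state happens here ... */
        local_irq_enable();

        /* Loading may raise a PMU interrupt, so interrupts must be on. */
        if ( (prev != next) && (vpmu_mode & XENPMU_MODE_ON) )
            vpmu_load(next);
    }

    int main(void)
    {
        struct vcpu a = { 0 }, b = { 1 };

        context_switch(&a, &b);
        return 0;
    }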

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
Tested-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
---
 xen/arch/x86/domain.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index bb759dd..e9e5335 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1478,16 +1478,14 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
     }
 
     if ( prev != next )
-        _update_runstate_area(prev);
-
-    if ( is_hvm_vcpu(prev) )
     {
-        if ( (prev != next) && (vpmu_mode & XENPMU_MODE_ON) )
+        _update_runstate_area(prev);
+        if ( vpmu_mode & XENPMU_MODE_ON )
             vpmu_save(prev);
+    }
 
-        if ( !list_empty(&prev->arch.hvm_vcpu.tm_list) )
+    if ( is_hvm_vcpu(prev) && !list_empty(&prev->arch.hvm_vcpu.tm_list) )
             pt_save_timer(prev);
-    }
 
     local_irq_disable();
 
@@ -1526,7 +1524,7 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
                            !is_hardware_domain(next->domain));
     }
 
-    if ( is_hvm_vcpu(next) && (prev != next) && (vpmu_mode & XENPMU_MODE_ON) )
+    if ( (prev != next) && (vpmu_mode & XENPMU_MODE_ON) )
         /* Must be done with interrupts enabled */
         vpmu_load(next);
 
-- 
1.8.1.4

* [PATCH v7 18/19] x86/VPMU: NMI-based VPMU support
  2014-06-06 17:39 [PATCH v7 00/19] x86/PMU: Xen PMU PV(H) support Boris Ostrovsky
                   ` (16 preceding siblings ...)
  2014-06-06 17:40 ` [PATCH v7 17/19] x86/VPMU: Save VPMU state for PV guests during context switch Boris Ostrovsky
@ 2014-06-06 17:40 ` Boris Ostrovsky
  2014-06-06 17:40 ` [PATCH v7 19/19] x86/VPMU: Move VPMU files up from hvm/ directory Boris Ostrovsky
  18 siblings, 0 replies; 58+ messages in thread
From: Boris Ostrovsky @ 2014-06-06 17:40 UTC (permalink / raw)
  To: JBeulich, kevin.tian, dietmar.hahn, suravee.suthikulpanit
  Cc: tim, boris.ostrovsky, keir, jun.nakajima, xen-devel

Add support for using NMIs as PMU interrupts.

Most of the processing is still performed by vpmu_do_interrupt(). However,
since certain operations are not NMI-safe we defer them to a softirq that
vpmu_do_interrupt() will schedule (a sketch of this pattern follows the
lists below):
* For PV guests that would be send_guest_vcpu_virq()
* For HVM guests it's the VLAPIC accesses and hvm_get_segment_register() (the
latter can be called in privileged profiling mode when the interrupted guest
is an HVM one).

With send_guest_vcpu_virq() and hvm_get_segment_register() for PV(H) and the
VLAPIC accesses for HVM moved to the softirq handler, the only routines/macros
that vpmu_do_interrupt() calls in NMI mode are:
* memcpy()
* querying domain type (is_XX_domain())
* guest_cpu_user_regs()
* XLAT_cpu_user_regs()
* raise_softirq()
* vcpu_vpmu()
* vpmu_ops->arch_vpmu_save()
* vpmu_ops->do_interrupt() (in the future for PVH support)

The latter two only access PMU MSRs with {rd,wr}msrl() (not the _safe
versions, which would not be NMI-safe).
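
The deferral itself follows a record-and-raise pattern. Below is a
standalone model of that pattern only (not the Xen code), with the per-CPU
variable and the softirq machinery reduced to plain globals and direct
calls:

    #include <stddef.h>
    #include <stdio.h>

    struct vcpu { int id; };

    /* Stand-ins for per_cpu(sampled_vcpu, ...) and the raised softirq. */
    static struct vcpu *sampled_vcpu;
    static int pmu_softirq_pending;

    /* Not NMI-safe in the real code: it may take locks. */
    static void send_guest_vcpu_virq(struct vcpu *v)
    {
        printf("deliver VIRQ_XENPMU to vcpu %d\n", v->id);
    }

    /* Runs in NMI context: only record the sample and raise the softirq. */
    static void vpmu_do_interrupt(struct vcpu *sampled)
    {
        /* arch_vpmu_save() would run here; it only uses {rd,wr}msrl(). */
        sampled_vcpu = sampled;
        pmu_softirq_pending = 1;      /* raise_softirq(PMU_SOFTIRQ) */
    }

    /* Runs later, outside NMI context: do the non-NMI-safe work. */
    static void pmu_softnmi(void)
    {
        struct vcpu *v = sampled_vcpu;

        if ( !pmu_softirq_pending || (v == NULL) )
            return;
        pmu_softirq_pending = 0;
        sampled_vcpu = NULL;

        send_guest_vcpu_virq(v);
    }

    int main(void)
    {
        struct vcpu v0 = { 0 };

        vpmu_do_interrupt(&v0);  /* the "NMI" fires */
        pmu_softnmi();           /* the softirq runs afterwards */
        return 0;
    }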

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
Tested-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
---
 xen/arch/x86/hvm/svm/vpmu.c       |   1 +
 xen/arch/x86/hvm/vmx/vpmu_core2.c |   1 +
 xen/arch/x86/hvm/vpmu.c           | 188 +++++++++++++++++++++++++++++++-------
 xen/include/xen/softirq.h         |   1 +
 4 files changed, 158 insertions(+), 33 deletions(-)

diff --git a/xen/arch/x86/hvm/svm/vpmu.c b/xen/arch/x86/hvm/svm/vpmu.c
index 96833ea..06dfc01 100644
--- a/xen/arch/x86/hvm/svm/vpmu.c
+++ b/xen/arch/x86/hvm/svm/vpmu.c
@@ -220,6 +220,7 @@ static inline void context_save(struct vcpu *v)
         rdmsrl(counters[i], counter_regs[i]);
 }
 
+/* Must be NMI-safe */
 static int amd_vpmu_save(struct vcpu *v)
 {
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c
index 3b56d3d..ac4998b 100644
--- a/xen/arch/x86/hvm/vmx/vpmu_core2.c
+++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c
@@ -303,6 +303,7 @@ static inline void __core2_vpmu_save(struct vcpu *v)
         rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, core2_vpmu_cxt->global_status);
 }
 
+/* Must be NMI-safe */
 static int core2_vpmu_save(struct vcpu *v)
 {
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c
index 8f98182..c6280dc 100644
--- a/xen/arch/x86/hvm/vpmu.c
+++ b/xen/arch/x86/hvm/vpmu.c
@@ -36,6 +36,7 @@
 #include <asm/hvm/svm/svm.h>
 #include <asm/hvm/svm/vmcb.h>
 #include <asm/apic.h>
+#include <asm/nmi.h>
 #include <public/pmu.h>
 
 /*
@@ -48,34 +49,60 @@ uint64_t __read_mostly vpmu_features = 0;
 static void parse_vpmu_param(char *s);
 custom_param("vpmu", parse_vpmu_param);
 
+static void pmu_softnmi(void);
+
 static DEFINE_PER_CPU(struct vcpu *, last_vcpu);
+static DEFINE_PER_CPU(struct vcpu *, sampled_vcpu);
+
+static uint32_t __read_mostly vpmu_interrupt_type = PMU_APIC_VECTOR;
 
 static void __init parse_vpmu_param(char *s)
 {
-    switch ( parse_bool(s) )
-    {
-    case 0:
-        break;
-    default:
-        if ( !strcmp(s, "bts") )
-            vpmu_features |= XENPMU_FEATURE_INTEL_BTS;
-        else if ( *s )
+    char *ss;
+
+    vpmu_mode = XENPMU_MODE_ON;
+    if ( *s == '\0' )
+        return;
+
+    do {
+        ss = strchr(s, ',');
+        if ( ss )
+            *ss = '\0';
+
+        switch ( parse_bool(s) )
         {
-            printk("VPMU: unknown flag: %s - vpmu disabled!\n", s);
+        case 0:
+            vpmu_mode = XENPMU_MODE_OFF;
+            return;
+        case -1:
+            if ( !strcmp(s, "nmi") )
+                vpmu_interrupt_type = APIC_DM_NMI;
+            else if ( !strcmp(s, "bts") )
+                vpmu_features |= XENPMU_FEATURE_INTEL_BTS;
+            else if ( !strcmp(s, "priv") )
+            {
+                vpmu_mode &= ~XENPMU_MODE_ON;
+                vpmu_mode |= XENPMU_MODE_PRIV;
+            }
+            else
+            {
+                printk("VPMU: unknown flag: %s - vpmu disabled!\n", s);
+                vpmu_mode = XENPMU_MODE_OFF;
+                return;
+            }
+        default:
             break;
         }
-        /* fall through */
-    case 1:
-        vpmu_mode = XENPMU_MODE_ON;
-        break;
-    }
+
+        s = ss + 1;
+    } while ( ss );
 }
 
 void vpmu_lvtpc_update(uint32_t val)
 {
     struct vpmu_struct *vpmu = vcpu_vpmu(current);
 
-    vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR | (val & APIC_LVT_MASKED);
+    vpmu->hw_lapic_lvtpc = vpmu_interrupt_type | (val & APIC_LVT_MASKED);
 
     /* Postpone APIC updates for PV(H) guests if PMU interrupt is pending */
     if ( is_hvm_domain(current->domain) ||
@@ -84,6 +111,24 @@ void vpmu_lvtpc_update(uint32_t val)
         apic_write(APIC_LVTPC, vpmu->hw_lapic_lvtpc);
 }
 
+static void vpmu_send_interrupt(struct vcpu *v)
+{
+    struct vlapic *vlapic;
+    u32 vlapic_lvtpc;
+
+    ASSERT(is_hvm_vcpu(v));
+
+    vlapic = vcpu_vlapic(v);
+    if ( !is_vlapic_lvtpc_enabled(vlapic) )
+        return;
+
+    vlapic_lvtpc = vlapic_get_reg(vlapic, APIC_LVTPC);
+    if ( GET_APIC_DELIVERY_MODE(vlapic_lvtpc) == APIC_MODE_FIXED )
+        vlapic_set_irq(vcpu_vlapic(v), vlapic_lvtpc & APIC_VECTOR_MASK, 0);
+    else
+        v->nmi_pending = 1;
+}
+
 int vpmu_do_msr(unsigned int msr, uint64_t *msr_content, uint8_t rw)
 {
     struct vpmu_struct *vpmu = vcpu_vpmu(current);
@@ -142,6 +187,7 @@ static struct vcpu *choose_dom0_vcpu(void)
     return v;
 }
 
+/* This routine may be called in NMI context */
 int vpmu_do_interrupt(struct cpu_user_regs *regs)
 {
     struct vcpu *v = current;
@@ -219,8 +265,9 @@ int vpmu_do_interrupt(struct cpu_user_regs *regs)
                     if ( current->arch.flags & TF_kernel_mode )
                         v->arch.vpmu.xenpmu_data->pmu.r.regs.cs &= ~3;
                 }
-                else
+                else if ( !(vpmu_interrupt_type & APIC_DM_NMI) )
                 {
+                    /* Unsafe in NMI context, defer to softint later */
                     struct segment_register seg_cs;
 
                     hvm_get_segment_register(current, x86_seg_cs, &seg_cs);
@@ -237,8 +284,12 @@ int vpmu_do_interrupt(struct cpu_user_regs *regs)
             memcpy(&v->arch.vpmu.xenpmu_data->pmu.r.regs,
                    gregs, sizeof(struct cpu_user_regs));
 
-            hvm_get_segment_register(current, x86_seg_cs, &seg_cs);
-            v->arch.vpmu.xenpmu_data->pmu.r.regs.cs = seg_cs.sel;
+            /* This is unsafe in NMI context, we'll do it in softint handler */
+            if ( !(vpmu_interrupt_type & APIC_DM_NMI) )
+            {
+                hvm_get_segment_register(current, x86_seg_cs, &seg_cs);
+                v->arch.vpmu.xenpmu_data->pmu.r.regs.cs = seg_cs.sel;
+            }
         }
 
         v->arch.vpmu.xenpmu_data->domain_id = current->domain->domain_id;
@@ -249,30 +300,30 @@ int vpmu_do_interrupt(struct cpu_user_regs *regs)
         apic_write(APIC_LVTPC, vpmu->hw_lapic_lvtpc | APIC_LVT_MASKED);
         vpmu->hw_lapic_lvtpc |= APIC_LVT_MASKED;
 
-        send_guest_vcpu_virq(v, VIRQ_XENPMU);
+        if ( vpmu_interrupt_type & APIC_DM_NMI )
+        {
+            per_cpu(sampled_vcpu, smp_processor_id()) = current;
+            raise_softirq(PMU_SOFTIRQ);
+        }
+        else
+            send_guest_vcpu_virq(v, VIRQ_XENPMU);
 
         return 1;
     }
 
     if ( vpmu->arch_vpmu_ops )
     {
-        struct vlapic *vlapic = vcpu_vlapic(v);
-        u32 vlapic_lvtpc;
-        unsigned char int_vec;
-
         if ( !vpmu->arch_vpmu_ops->do_interrupt(regs) )
             return 0;
 
-        if ( !is_vlapic_lvtpc_enabled(vlapic) )
-            return 1;
-
-        vlapic_lvtpc = vlapic_get_reg(vlapic, APIC_LVTPC);
-        int_vec = vlapic_lvtpc & APIC_VECTOR_MASK;
-
-        if ( GET_APIC_DELIVERY_MODE(vlapic_lvtpc) == APIC_MODE_FIXED )
-            vlapic_set_irq(vcpu_vlapic(v), int_vec, 0);
+        if ( vpmu_interrupt_type & APIC_DM_NMI )
+        {
+            per_cpu(sampled_vcpu, smp_processor_id()) = current;
+            raise_softirq(PMU_SOFTIRQ);
+        }
         else
-            v->nmi_pending = 1;
+            vpmu_send_interrupt(v);
+
         return 1;
     }
 
@@ -303,6 +354,8 @@ static void vpmu_save_force(void *arg)
     vpmu_reset(vpmu, VPMU_CONTEXT_SAVE);
 
     per_cpu(last_vcpu, smp_processor_id()) = NULL;
+
+    pmu_softnmi();
 }
 
 void vpmu_save(struct vcpu *v)
@@ -320,7 +373,10 @@ void vpmu_save(struct vcpu *v)
         if ( vpmu->arch_vpmu_ops->arch_vpmu_save(v) )
             vpmu_reset(vpmu, VPMU_CONTEXT_LOADED);
 
-    apic_write(APIC_LVTPC, PMU_APIC_VECTOR | APIC_LVT_MASKED);
+    apic_write(APIC_LVTPC, vpmu_interrupt_type | APIC_LVT_MASKED);
+
+    /* Make sure there are no outstanding PMU NMIs */
+    pmu_softnmi();
 }
 
 void vpmu_load(struct vcpu *v)
@@ -365,6 +421,8 @@ void vpmu_load(struct vcpu *v)
         vpmu_save_force(prev);
         vpmu_reset(vpmu, VPMU_CONTEXT_LOADED);
 
+        pmu_softnmi();
+
         vpmu = vcpu_vpmu(v);
     }
 
@@ -474,11 +532,55 @@ static void vpmu_unload_all(void)
     }
 }
 
+/* Process the softirq set by PMU NMI handler */
+static void pmu_softnmi(void)
+{
+    struct vcpu *v, *sampled = this_cpu(sampled_vcpu);
+
+    if ( sampled == NULL )
+        return;
+    this_cpu(sampled_vcpu) = NULL;
+
+    if ( (vpmu_mode & XENPMU_MODE_PRIV) ||
+         (sampled->domain->domain_id >= DOMID_FIRST_RESERVED) )
+    {
+        v = choose_dom0_vcpu();
+        if ( !v )
+            return;
+    }
+    else
+    {
+        if ( is_hvm_domain(sampled->domain) )
+        {
+            vpmu_send_interrupt(sampled);
+            return;
+        }
+        v = sampled;
+    }
+
+    if ( has_hvm_container_domain(sampled->domain) )
+    {
+        struct segment_register seg_cs;
+
+        hvm_get_segment_register(sampled, x86_seg_cs, &seg_cs);
+        v->arch.vpmu.xenpmu_data->pmu.r.regs.cs = seg_cs.sel;
+    }
+
+    send_guest_vcpu_virq(v, VIRQ_XENPMU);
+}
+
+int pmu_nmi_interrupt(struct cpu_user_regs *regs, int cpu)
+{
+    return vpmu_do_interrupt(regs);
+}
+
 static int pvpmu_init(struct domain *d, xen_pmu_params_t *params)
 {
     struct vcpu *v;
     struct page_info *page;
     uint64_t gfn = params->d.val;
+    static bool_t __read_mostly pvpmu_initted = 0;
+    static DEFINE_SPINLOCK(init_lock);
 
     if ( !params || params->vcpu >= d->max_vcpus )
         return -EINVAL;
@@ -501,6 +603,26 @@ static int pvpmu_init(struct domain *d, xen_pmu_params_t *params)
         return -EINVAL;
     }
 
+    spin_lock(&init_lock);
+
+    if ( !pvpmu_initted )
+    {
+        if ( reserve_lapic_nmi() == 0 )
+            set_nmi_callback(pmu_nmi_interrupt);
+        else
+        {
+            spin_unlock(&init_lock);
+            printk("Failed to reserve PMU NMI\n");
+            put_page(page);
+            return -EBUSY;
+        }
+        open_softirq(PMU_SOFTIRQ, pmu_softnmi);
+
+        pvpmu_initted = 1;
+    }
+
+    spin_unlock(&init_lock);
+
     vpmu_initialise(v);
 
     return 0;
diff --git a/xen/include/xen/softirq.h b/xen/include/xen/softirq.h
index 0c0d481..5829fa4 100644
--- a/xen/include/xen/softirq.h
+++ b/xen/include/xen/softirq.h
@@ -8,6 +8,7 @@ enum {
     NEW_TLBFLUSH_CLOCK_PERIOD_SOFTIRQ,
     RCU_SOFTIRQ,
     TASKLET_SOFTIRQ,
+    PMU_SOFTIRQ,
     NR_COMMON_SOFTIRQS
 };
 
-- 
1.8.1.4

* [PATCH v7 19/19] x86/VPMU: Move VPMU files up from hvm/ directory
  2014-06-06 17:39 [PATCH v7 00/19] x86/PMU: Xen PMU PV(H) support Boris Ostrovsky
                   ` (17 preceding siblings ...)
  2014-06-06 17:40 ` [PATCH v7 18/19] x86/VPMU: NMI-based VPMU support Boris Ostrovsky
@ 2014-06-06 17:40 ` Boris Ostrovsky
  2014-06-06 21:05   ` Andrew Cooper
  18 siblings, 1 reply; 58+ messages in thread
From: Boris Ostrovsky @ 2014-06-06 17:40 UTC (permalink / raw)
  To: JBeulich, kevin.tian, dietmar.hahn, suravee.suthikulpanit
  Cc: tim, boris.ostrovsky, keir, jun.nakajima, xen-devel

Since the PMU code is no longer HVM-specific, we can move the VPMU-related
files up from the arch/x86/hvm/ directory.

Specifically:
    arch/x86/hvm/vpmu.c -> arch/x86/vpmu.c
    arch/x86/hvm/svm/vpmu.c -> arch/x86/vpmu_amd.c
    arch/x86/hvm/vmx/vpmu_core2.c -> arch/x86/vpmu_intel.c
    include/asm-x86/hvm/vpmu.h -> include/asm-x86/vpmu.h

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
Tested-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
---
 xen/arch/x86/Makefile                 |   1 +
 xen/arch/x86/hvm/Makefile             |   1 -
 xen/arch/x86/hvm/svm/Makefile         |   1 -
 xen/arch/x86/hvm/svm/vpmu.c           | 512 ------------------
 xen/arch/x86/hvm/vlapic.c             |   2 +-
 xen/arch/x86/hvm/vmx/Makefile         |   1 -
 xen/arch/x86/hvm/vmx/vpmu_core2.c     | 954 ----------------------------------
 xen/arch/x86/hvm/vpmu.c               | 750 --------------------------
 xen/arch/x86/oprofile/op_model_ppro.c |   2 +-
 xen/arch/x86/traps.c                  |   2 +-
 xen/arch/x86/vpmu.c                   | 750 ++++++++++++++++++++++++++
 xen/arch/x86/vpmu_amd.c               | 512 ++++++++++++++++++
 xen/arch/x86/vpmu_intel.c             | 954 ++++++++++++++++++++++++++++++++++
 xen/include/asm-x86/hvm/vmx/vmcs.h    |   2 +-
 xen/include/asm-x86/hvm/vpmu.h        | 102 ----
 xen/include/asm-x86/vpmu.h            | 102 ++++
 16 files changed, 2323 insertions(+), 2325 deletions(-)
 delete mode 100644 xen/arch/x86/hvm/svm/vpmu.c
 delete mode 100644 xen/arch/x86/hvm/vmx/vpmu_core2.c
 delete mode 100644 xen/arch/x86/hvm/vpmu.c
 create mode 100644 xen/arch/x86/vpmu.c
 create mode 100644 xen/arch/x86/vpmu_amd.c
 create mode 100644 xen/arch/x86/vpmu_intel.c
 delete mode 100644 xen/include/asm-x86/hvm/vpmu.h
 create mode 100644 xen/include/asm-x86/vpmu.h

diff --git a/xen/arch/x86/Makefile b/xen/arch/x86/Makefile
index d502bdf..cf85dda 100644
--- a/xen/arch/x86/Makefile
+++ b/xen/arch/x86/Makefile
@@ -58,6 +58,7 @@ obj-y += crash.o
 obj-y += tboot.o
 obj-y += hpet.o
 obj-y += xstate.o
+obj-y += vpmu.o vpmu_amd.o vpmu_intel.o
 
 obj-$(crash_debug) += gdbstub.o
 
diff --git a/xen/arch/x86/hvm/Makefile b/xen/arch/x86/hvm/Makefile
index eea5555..742b83b 100644
--- a/xen/arch/x86/hvm/Makefile
+++ b/xen/arch/x86/hvm/Makefile
@@ -22,4 +22,3 @@ obj-y += vlapic.o
 obj-y += vmsi.o
 obj-y += vpic.o
 obj-y += vpt.o
-obj-y += vpmu.o
\ No newline at end of file
diff --git a/xen/arch/x86/hvm/svm/Makefile b/xen/arch/x86/hvm/svm/Makefile
index a10a55e..760d295 100644
--- a/xen/arch/x86/hvm/svm/Makefile
+++ b/xen/arch/x86/hvm/svm/Makefile
@@ -6,4 +6,3 @@ obj-y += nestedsvm.o
 obj-y += svm.o
 obj-y += svmdebug.o
 obj-y += vmcb.o
-obj-y += vpmu.o
diff --git a/xen/arch/x86/hvm/svm/vpmu.c b/xen/arch/x86/hvm/svm/vpmu.c
deleted file mode 100644
index 06dfc01..0000000
--- a/xen/arch/x86/hvm/svm/vpmu.c
+++ /dev/null
@@ -1,512 +0,0 @@
-/*
- * vpmu.c: PMU virtualization for HVM domain.
- *
- * Copyright (c) 2010, Advanced Micro Devices, Inc.
- * Parts of this code are Copyright (c) 2007, Intel Corporation
- *
- * Author: Wei Wang <wei.wang2@amd.com>
- * Tested by: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along with
- * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
- * Place - Suite 330, Boston, MA 02111-1307 USA.
- *
- */
-
-#include <xen/config.h>
-#include <xen/xenoprof.h>
-#include <xen/hvm/save.h>
-#include <xen/sched.h>
-#include <xen/irq.h>
-#include <asm/apic.h>
-#include <asm/hvm/vlapic.h>
-#include <asm/hvm/vpmu.h>
-#include <public/pmu.h>
-
-#define MSR_F10H_EVNTSEL_GO_SHIFT   40
-#define MSR_F10H_EVNTSEL_EN_SHIFT   22
-#define MSR_F10H_COUNTER_LENGTH     48
-
-#define is_guest_mode(msr) ((msr) & (1ULL << MSR_F10H_EVNTSEL_GO_SHIFT))
-#define is_pmu_enabled(msr) ((msr) & (1ULL << MSR_F10H_EVNTSEL_EN_SHIFT))
-#define set_guest_mode(msr) (msr |= (1ULL << MSR_F10H_EVNTSEL_GO_SHIFT))
-#define is_overflowed(msr) (!((msr) & (1ULL << (MSR_F10H_COUNTER_LENGTH-1))))
-
-static unsigned int __read_mostly num_counters;
-static const u32 __read_mostly *counters;
-static const u32 __read_mostly *ctrls;
-static bool_t __read_mostly k7_counters_mirrored;
-
-#define F10H_NUM_COUNTERS   4
-#define F15H_NUM_COUNTERS   6
-#define AMD_MAX_COUNTERS    6
-
-/* PMU Counter MSRs. */
-static const u32 AMD_F10H_COUNTERS[] = {
-    MSR_K7_PERFCTR0,
-    MSR_K7_PERFCTR1,
-    MSR_K7_PERFCTR2,
-    MSR_K7_PERFCTR3
-};
-
-/* PMU Control MSRs. */
-static const u32 AMD_F10H_CTRLS[] = {
-    MSR_K7_EVNTSEL0,
-    MSR_K7_EVNTSEL1,
-    MSR_K7_EVNTSEL2,
-    MSR_K7_EVNTSEL3
-};
-
-static const u32 AMD_F15H_COUNTERS[] = {
-    MSR_AMD_FAM15H_PERFCTR0,
-    MSR_AMD_FAM15H_PERFCTR1,
-    MSR_AMD_FAM15H_PERFCTR2,
-    MSR_AMD_FAM15H_PERFCTR3,
-    MSR_AMD_FAM15H_PERFCTR4,
-    MSR_AMD_FAM15H_PERFCTR5
-};
-
-static const u32 AMD_F15H_CTRLS[] = {
-    MSR_AMD_FAM15H_EVNTSEL0,
-    MSR_AMD_FAM15H_EVNTSEL1,
-    MSR_AMD_FAM15H_EVNTSEL2,
-    MSR_AMD_FAM15H_EVNTSEL3,
-    MSR_AMD_FAM15H_EVNTSEL4,
-    MSR_AMD_FAM15H_EVNTSEL5
-};
-
-/* Use private context as a flag for MSR bitmap */
-#define msr_bitmap_on(vpmu)    { (vpmu)->priv_context = (void *)-1L; }
-#define msr_bitmap_off(vpmu)   { (vpmu)->priv_context = NULL; }
-#define is_msr_bitmap_on(vpmu) ((vpmu)->priv_context != NULL)
-
-static inline int get_pmu_reg_type(u32 addr)
-{
-    if ( (addr >= MSR_K7_EVNTSEL0) && (addr <= MSR_K7_EVNTSEL3) )
-        return MSR_TYPE_CTRL;
-
-    if ( (addr >= MSR_K7_PERFCTR0) && (addr <= MSR_K7_PERFCTR3) )
-        return MSR_TYPE_COUNTER;
-
-    if ( (addr >= MSR_AMD_FAM15H_EVNTSEL0) &&
-         (addr <= MSR_AMD_FAM15H_PERFCTR5 ) )
-    {
-        if (addr & 1)
-            return MSR_TYPE_COUNTER;
-        else
-            return MSR_TYPE_CTRL;
-    }
-
-    /* unsupported registers */
-    return -1;
-}
-
-static inline u32 get_fam15h_addr(u32 addr)
-{
-    switch ( addr )
-    {
-    case MSR_K7_PERFCTR0:
-        return MSR_AMD_FAM15H_PERFCTR0;
-    case MSR_K7_PERFCTR1:
-        return MSR_AMD_FAM15H_PERFCTR1;
-    case MSR_K7_PERFCTR2:
-        return MSR_AMD_FAM15H_PERFCTR2;
-    case MSR_K7_PERFCTR3:
-        return MSR_AMD_FAM15H_PERFCTR3;
-    case MSR_K7_EVNTSEL0:
-        return MSR_AMD_FAM15H_EVNTSEL0;
-    case MSR_K7_EVNTSEL1:
-        return MSR_AMD_FAM15H_EVNTSEL1;
-    case MSR_K7_EVNTSEL2:
-        return MSR_AMD_FAM15H_EVNTSEL2;
-    case MSR_K7_EVNTSEL3:
-        return MSR_AMD_FAM15H_EVNTSEL3;
-    default:
-        break;
-    }
-
-    return addr;
-}
-
-static void amd_vpmu_set_msr_bitmap(struct vcpu *v)
-{
-    unsigned int i;
-    struct vpmu_struct *vpmu = vcpu_vpmu(v);
-
-    for ( i = 0; i < num_counters; i++ )
-    {
-        svm_intercept_msr(v, counters[i], MSR_INTERCEPT_NONE);
-        svm_intercept_msr(v, ctrls[i], MSR_INTERCEPT_WRITE);
-    }
-
-    msr_bitmap_on(vpmu);
-}
-
-static void amd_vpmu_unset_msr_bitmap(struct vcpu *v)
-{
-    unsigned int i;
-    struct vpmu_struct *vpmu = vcpu_vpmu(v);
-
-    for ( i = 0; i < num_counters; i++ )
-    {
-        svm_intercept_msr(v, counters[i], MSR_INTERCEPT_RW);
-        svm_intercept_msr(v, ctrls[i], MSR_INTERCEPT_RW);
-    }
-
-    msr_bitmap_off(vpmu);
-}
-
-static int amd_vpmu_do_interrupt(struct cpu_user_regs *regs)
-{
-    return 1;
-}
-
-static inline void context_load(struct vcpu *v)
-{
-    unsigned int i;
-    struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    struct xen_pmu_amd_ctxt *ctxt = vpmu->context;
-    uint64_t *counter_regs = vpmu_reg_pointer(ctxt, counters);
-    uint64_t *ctrl_regs = vpmu_reg_pointer(ctxt, ctrls);
-
-    for ( i = 0; i < num_counters; i++ )
-    {
-        wrmsrl(counters[i], counter_regs[i]);
-        wrmsrl(ctrls[i], ctrl_regs[i]);
-    }
-}
-
-static void amd_vpmu_load(struct vcpu *v)
-{
-    struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    struct xen_pmu_amd_ctxt *ctxt = vpmu->context;
-    uint64_t *ctrl_regs = vpmu_reg_pointer(ctxt, ctrls);
-
-    vpmu_reset(vpmu, VPMU_FROZEN);
-
-    if ( vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) )
-    {
-        unsigned int i;
-
-        for ( i = 0; i < num_counters; i++ )
-            wrmsrl(ctrls[i], ctrl_regs[i]);
-
-        return;
-    }
-
-    vpmu_set(vpmu, VPMU_CONTEXT_LOADED);
-
-    context_load(v);
-}
-
-static inline void context_save(struct vcpu *v)
-{
-    unsigned int i;
-    struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    struct xen_pmu_amd_ctxt *ctxt = vpmu->context;
-    uint64_t *counter_regs = vpmu_reg_pointer(ctxt, counters);
-
-    /* No need to save controls -- they are saved in amd_vpmu_do_wrmsr */
-    for ( i = 0; i < num_counters; i++ )
-        rdmsrl(counters[i], counter_regs[i]);
-}
-
-/* Must be NMI-safe */
-static int amd_vpmu_save(struct vcpu *v)
-{
-    struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    unsigned int i;
-
-    /*
-     * Stop the counters. If we came here via vpmu_save_force (i.e.
-     * when VPMU_CONTEXT_SAVE is set) counters are already stopped.
-     */
-    if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_SAVE) )
-    {
-        vpmu_set(vpmu, VPMU_FROZEN);
-
-        for ( i = 0; i < num_counters; i++ )
-            wrmsrl(ctrls[i], 0);
-
-        return 0;
-    }
-
-    if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) )
-        return 0;
-
-    context_save(v);
-
-    if ( !vpmu_is_set(vpmu, VPMU_RUNNING) &&
-         has_hvm_container_domain(v->domain) && is_msr_bitmap_on(vpmu) )
-        amd_vpmu_unset_msr_bitmap(v);
-
-    return 1;
-}
-
-static void context_update(unsigned int msr, u64 msr_content)
-{
-    unsigned int i;
-    struct vcpu *v = current;
-    struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    struct xen_pmu_amd_ctxt *ctxt = vpmu->context;
-    uint64_t *counter_regs = vpmu_reg_pointer(ctxt, counters);
-    uint64_t *ctrl_regs = vpmu_reg_pointer(ctxt, ctrls);
-
-    if ( k7_counters_mirrored &&
-        ((msr >= MSR_K7_EVNTSEL0) && (msr <= MSR_K7_PERFCTR3)) )
-    {
-        msr = get_fam15h_addr(msr);
-    }
-
-    for ( i = 0; i < num_counters; i++ )
-    {
-       if ( msr == ctrls[i] )
-       {
-           ctrl_regs[i] = msr_content;
-           return;
-       }
-        else if (msr == counters[i] )
-        {
-            counter_regs[i] = msr_content;
-            return;
-        }
-    }
-}
-
-static int amd_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
-{
-    struct vcpu *v = current;
-    struct vpmu_struct *vpmu = vcpu_vpmu(v);
-
-    /* For all counters, enable guest only mode for HVM guest */
-    if ( has_hvm_container_domain(v->domain) &&
-         (get_pmu_reg_type(msr) == MSR_TYPE_CTRL) &&
-         !(is_guest_mode(msr_content)) )
-    {
-        set_guest_mode(msr_content);
-    }
-
-    /* check if the first counter is enabled */
-    if ( (get_pmu_reg_type(msr) == MSR_TYPE_CTRL) &&
-        is_pmu_enabled(msr_content) && !vpmu_is_set(vpmu, VPMU_RUNNING) )
-    {
-        if ( !acquire_pmu_ownership(PMU_OWNER_HVM) )
-            return 1;
-        vpmu_set(vpmu, VPMU_RUNNING);
-
-        if ( has_hvm_container_domain(v->domain) && is_msr_bitmap_on(vpmu) )
-             amd_vpmu_set_msr_bitmap(v);
-    }
-
-    /* stop saving & restore if guest stops first counter */
-    if ( (get_pmu_reg_type(msr) == MSR_TYPE_CTRL) &&
-        (is_pmu_enabled(msr_content) == 0) && vpmu_is_set(vpmu, VPMU_RUNNING) )
-    {
-        vpmu_reset(vpmu, VPMU_RUNNING);
-        if ( has_hvm_container_domain(v->domain) && is_msr_bitmap_on(vpmu) )
-             amd_vpmu_unset_msr_bitmap(v);
-        release_pmu_ownship(PMU_OWNER_HVM);
-    }
-
-    if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED)
-        || vpmu_is_set(vpmu, VPMU_FROZEN) )
-    {
-        context_load(v);
-        vpmu_set(vpmu, VPMU_CONTEXT_LOADED);
-        vpmu_reset(vpmu, VPMU_FROZEN);
-    }
-
-    /* Update vpmu context immediately */
-    context_update(msr, msr_content);
-
-    /* Write to hw counters */
-    wrmsrl(msr, msr_content);
-    return 1;
-}
-
-static int amd_vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
-{
-    struct vcpu *v = current;
-    struct vpmu_struct *vpmu = vcpu_vpmu(v);
-
-    if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED)
-        || vpmu_is_set(vpmu, VPMU_FROZEN) )
-    {
-        context_load(v);
-        vpmu_set(vpmu, VPMU_CONTEXT_LOADED);
-        vpmu_reset(vpmu, VPMU_FROZEN);
-    }
-
-    rdmsrl(msr, *msr_content);
-
-    return 1;
-}
-
-static int amd_vpmu_initialise(struct vcpu *v)
-{
-    struct xen_pmu_amd_ctxt *ctxt;
-    struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    uint8_t family = current_cpu_data.x86;
-
-    if ( vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
-        return 0;
-
-    if ( counters == NULL )
-    {
-         switch ( family )
-	 {
-	 case 0x15:
-	     num_counters = F15H_NUM_COUNTERS;
-	     counters = AMD_F15H_COUNTERS;
-	     ctrls = AMD_F15H_CTRLS;
-	     k7_counters_mirrored = 1;
-	     break;
-	 case 0x10:
-	 case 0x12:
-	 case 0x14:
-	 case 0x16:
-	 default:
-	     num_counters = F10H_NUM_COUNTERS;
-	     counters = AMD_F10H_COUNTERS;
-	     ctrls = AMD_F10H_CTRLS;
-	     k7_counters_mirrored = 0;
-	     break;
-	 }
-    }
-
-    if ( is_hvm_domain(v->domain) )
-    {
-        ctxt = xzalloc_bytes(sizeof(struct xen_pmu_amd_ctxt) +
-                             sizeof(uint64_t) * AMD_MAX_COUNTERS +
-                             sizeof(uint64_t) * AMD_MAX_COUNTERS);
-        if ( !ctxt )
-        {
-            gdprintk(XENLOG_WARNING, "Insufficient memory for PMU, "
-                     " PMU feature is unavailable on domain %d vcpu %d.\n",
-                     v->vcpu_id, v->domain->domain_id);
-            return -ENOMEM;
-        }
-    }
-    else
-        ctxt = &v->arch.vpmu.xenpmu_data->pmu.c.amd;
-
-    ctxt->counters = sizeof(struct xen_pmu_amd_ctxt);
-    ctxt->ctrls = ctxt->counters + sizeof(uint64_t) * AMD_MAX_COUNTERS;
-
-    vpmu->context = ctxt;
-    vpmu->priv_context = NULL;
-    vpmu_set(vpmu, VPMU_CONTEXT_ALLOCATED);
-    return 0;
-}
-
-static void amd_vpmu_destroy(struct vcpu *v)
-{
-    struct vpmu_struct *vpmu = vcpu_vpmu(v);
-
-    if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
-        return;
-
-    if ( has_hvm_container_domain(v->domain) )
-    {
-        if ( is_msr_bitmap_on(vpmu) )
-            amd_vpmu_unset_msr_bitmap(v);
-
-        if ( is_hvm_domain(v->domain) )
-            xfree(vpmu->context);
-
-        release_pmu_ownship(PMU_OWNER_HVM);
-    }
-
-    vpmu->context = NULL;
-    vpmu_clear(vpmu);
-}
-
-/* VPMU part of the 'q' keyhandler */
-static void amd_vpmu_dump(const struct vcpu *v)
-{
-    const struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    const struct xen_pmu_amd_ctxt *ctxt = vpmu->context;
-    uint64_t *counter_regs = vpmu_reg_pointer(ctxt, counters);
-    uint64_t *ctrl_regs = vpmu_reg_pointer(ctxt, ctrls);
-    unsigned int i;
-
-    printk("    VPMU state: 0x%x ", vpmu->flags);
-    if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
-    {
-         printk("\n");
-         return;
-    }
-
-    printk("(");
-    if ( vpmu_is_set(vpmu, VPMU_PASSIVE_DOMAIN_ALLOCATED) )
-        printk("PASSIVE_DOMAIN_ALLOCATED, ");
-    if ( vpmu_is_set(vpmu, VPMU_FROZEN) )
-        printk("FROZEN, ");
-    if ( vpmu_is_set(vpmu, VPMU_CONTEXT_SAVE) )
-        printk("SAVE, ");
-    if ( vpmu_is_set(vpmu, VPMU_RUNNING) )
-        printk("RUNNING, ");
-    if ( vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) )
-        printk("LOADED, ");
-    printk("ALLOCATED)\n");
-
-    for ( i = 0; i < num_counters; i++ )
-    {
-        uint64_t ctrl, cntr;
-
-        rdmsrl(ctrls[i], ctrl);
-        rdmsrl(counters[i], cntr);
-        printk("      %#x: %#lx (%#lx in HW)    %#x: %#lx (%#lx in HW)\n",
-               ctrls[i], ctrl_regs[i], ctrl,
-               counters[i], counter_regs[i], cntr);
-    }
-}
-
-struct arch_vpmu_ops amd_vpmu_ops = {
-    .do_wrmsr = amd_vpmu_do_wrmsr,
-    .do_rdmsr = amd_vpmu_do_rdmsr,
-    .do_interrupt = amd_vpmu_do_interrupt,
-    .arch_vpmu_destroy = amd_vpmu_destroy,
-    .arch_vpmu_save = amd_vpmu_save,
-    .arch_vpmu_load = amd_vpmu_load,
-    .arch_vpmu_dump = amd_vpmu_dump
-};
-
-int svm_vpmu_initialise(struct vcpu *v)
-{
-    struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    uint8_t family = current_cpu_data.x86;
-    int ret = 0;
-
-    /* vpmu enabled? */
-    if ( vpmu_mode == XENPMU_MODE_OFF )
-        return 0;
-
-    switch ( family )
-    {
-    case 0x10:
-    case 0x12:
-    case 0x14:
-    case 0x15:
-    case 0x16:
-        ret = amd_vpmu_initialise(v);
-        if ( !ret )
-            vpmu->arch_vpmu_ops = &amd_vpmu_ops;
-        return ret;
-    }
-
-    printk("VPMU: Initialization failed. "
-           "AMD processor family %d has not "
-           "been supported\n", family);
-    return -EINVAL;
-}
-
diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
index f1b543c..b3fe4d3 100644
--- a/xen/arch/x86/hvm/vlapic.c
+++ b/xen/arch/x86/hvm/vlapic.c
@@ -38,7 +38,7 @@
 #include <asm/hvm/support.h>
 #include <asm/hvm/vmx/vmx.h>
 #include <asm/hvm/nestedhvm.h>
-#include <asm/hvm/vpmu.h>
+#include <asm/vpmu.h>
 #include <public/hvm/ioreq.h>
 #include <public/hvm/params.h>
 
diff --git a/xen/arch/x86/hvm/vmx/Makefile b/xen/arch/x86/hvm/vmx/Makefile
index 373b3d9..04a29ce 100644
--- a/xen/arch/x86/hvm/vmx/Makefile
+++ b/xen/arch/x86/hvm/vmx/Makefile
@@ -3,5 +3,4 @@ obj-y += intr.o
 obj-y += realmode.o
 obj-y += vmcs.o
 obj-y += vmx.o
-obj-y += vpmu_core2.o
 obj-y += vvmx.o
diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c
deleted file mode 100644
index ac4998b..0000000
--- a/xen/arch/x86/hvm/vmx/vpmu_core2.c
+++ /dev/null
@@ -1,954 +0,0 @@
-/*
- * vpmu_core2.c: CORE 2 specific PMU virtualization for HVM domain.
- *
- * Copyright (c) 2007, Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along with
- * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
- * Place - Suite 330, Boston, MA 02111-1307 USA.
- *
- * Author: Haitao Shan <haitao.shan@intel.com>
- */
-
-#include <xen/config.h>
-#include <xen/sched.h>
-#include <xen/xenoprof.h>
-#include <xen/irq.h>
-#include <asm/system.h>
-#include <asm/regs.h>
-#include <asm/types.h>
-#include <asm/apic.h>
-#include <asm/traps.h>
-#include <asm/msr.h>
-#include <asm/msr-index.h>
-#include <asm/hvm/support.h>
-#include <asm/hvm/vlapic.h>
-#include <asm/hvm/vmx/vmx.h>
-#include <asm/hvm/vmx/vmcs.h>
-#include <public/sched.h>
-#include <public/hvm/save.h>
-#include <public/pmu.h>
-#include <asm/hvm/vpmu.h>
-
-/*
- * See Intel SDM Vol 2a Instruction Set Reference chapter 3 for CPUID
- * instruction.
- * cpuid 0xa - Architectural Performance Monitoring Leaf
- * Register eax
- */
-#define PMU_VERSION_SHIFT        0  /* Version ID */
-#define PMU_VERSION_BITS         8  /* 8 bits 0..7 */
-#define PMU_VERSION_MASK         (((1 << PMU_VERSION_BITS) - 1) << PMU_VERSION_SHIFT)
-
-#define PMU_GENERAL_NR_SHIFT     8  /* Number of general pmu registers */
-#define PMU_GENERAL_NR_BITS      8  /* 8 bits 8..15 */
-#define PMU_GENERAL_NR_MASK      (((1 << PMU_GENERAL_NR_BITS) - 1) << PMU_GENERAL_NR_SHIFT)
-
-#define PMU_GENERAL_WIDTH_SHIFT 16  /* Width of general pmu registers */
-#define PMU_GENERAL_WIDTH_BITS   8  /* 8 bits 16..23 */
-#define PMU_GENERAL_WIDTH_MASK  (((1 << PMU_GENERAL_WIDTH_BITS) - 1) << PMU_GENERAL_WIDTH_SHIFT)
-/* Register edx */
-#define PMU_FIXED_NR_SHIFT       0  /* Number of fixed pmu registers */
-#define PMU_FIXED_NR_BITS        5  /* 5 bits 0..4 */
-#define PMU_FIXED_NR_MASK        (((1 << PMU_FIXED_NR_BITS) -1) << PMU_FIXED_NR_SHIFT)
-
-#define PMU_FIXED_WIDTH_SHIFT    5  /* Width of fixed pmu registers */
-#define PMU_FIXED_WIDTH_BITS     8  /* 8 bits 5..12 */
-#define PMU_FIXED_WIDTH_MASK     (((1 << PMU_FIXED_WIDTH_BITS) -1) << PMU_FIXED_WIDTH_SHIFT)
-
-/* Alias registers (0x4c1) for full-width writes to PMCs */
-#define MSR_PMC_ALIAS_MASK       (~(MSR_IA32_PERFCTR0 ^ MSR_IA32_A_PERFCTR0))
-static bool_t __read_mostly full_width_write;
-
-/* Intel-specific VPMU features */
-#define VPMU_CPU_HAS_DS                     0x100 /* Has Debug Store */
-#define VPMU_CPU_HAS_BTS                    0x200 /* Has Branch Trace Store */
-
-/*
- * MSR_CORE_PERF_FIXED_CTR_CTRL contains the configuration of all fixed
- * counters. 4 bits for every counter.
- */
-#define FIXED_CTR_CTRL_BITS 4
-#define FIXED_CTR_CTRL_MASK ((1 << FIXED_CTR_CTRL_BITS) - 1)
-
-/* Number of general-purpose and fixed performance counters */
-static unsigned int __read_mostly arch_pmc_cnt, fixed_pmc_cnt;
-
-/*
- * QUIRK to workaround an issue on various family 6 cpus.
- * The issue leads to endless PMC interrupt loops on the processor.
- * If the interrupt handler is running and a pmc reaches the value 0, this
- * value remains forever and it triggers immediately a new interrupt after
- * finishing the handler.
- * A workaround is to read all flagged counters and if the value is 0 write
- * 1 (or another value != 0) into it.
- * There exist no errata and the real cause of this behaviour is unknown.
- */
-bool_t __read_mostly is_pmc_quirk;
-
-static void check_pmc_quirk(void)
-{
-    if ( current_cpu_data.x86 == 6 )
-        is_pmc_quirk = 1;
-    else
-        is_pmc_quirk = 0;    
-}
-
-static void handle_pmc_quirk(u64 msr_content)
-{
-    int i;
-    u64 val;
-
-    if ( !is_pmc_quirk )
-        return;
-
-    val = msr_content;
-    for ( i = 0; i < arch_pmc_cnt; i++ )
-    {
-        if ( val & 0x1 )
-        {
-            u64 cnt;
-            rdmsrl(MSR_P6_PERFCTR0 + i, cnt);
-            if ( cnt == 0 )
-                wrmsrl(MSR_P6_PERFCTR0 + i, 1);
-        }
-        val >>= 1;
-    }
-    val = msr_content >> 32;
-    for ( i = 0; i < fixed_pmc_cnt; i++ )
-    {
-        if ( val & 0x1 )
-        {
-            u64 cnt;
-            rdmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, cnt);
-            if ( cnt == 0 )
-                wrmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, 1);
-        }
-        val >>= 1;
-    }
-}
-
-/*
- * Read the number of general counters via CPUID.EAX[0xa].EAX[8..15]
- */
-static int core2_get_arch_pmc_count(void)
-{
-    u32 eax;
-
-    eax = cpuid_eax(0xa);
-    return ( (eax & PMU_GENERAL_NR_MASK) >> PMU_GENERAL_NR_SHIFT );
-}
-
-/*
- * Read the number of fixed counters via CPUID.EDX[0xa].EDX[0..4]
- */
-static int core2_get_fixed_pmc_count(void)
-{
-    u32 eax;
-
-    eax = cpuid_eax(0xa);
-    return ( (eax & PMU_FIXED_NR_MASK) >> PMU_FIXED_NR_SHIFT );
-}
-
-/* edx bits 5-12: Bit width of fixed-function performance counters  */
-static int core2_get_bitwidth_fix_count(void)
-{
-    u32 edx;
-
-    edx = cpuid_edx(0xa);
-    return ( (edx & PMU_FIXED_WIDTH_MASK) >> PMU_FIXED_WIDTH_SHIFT );
-}
-
-static int is_core2_vpmu_msr(u32 msr_index, int *type, int *index)
-{
-    int i;
-    u32 msr_index_pmc;
-
-    for ( i = 0; i < fixed_pmc_cnt; i++ )
-    {
-        if ( msr_index == MSR_CORE_PERF_FIXED_CTR0 + i )
-        {
-            *type = MSR_TYPE_COUNTER;
-            *index = i;
-            return 1;
-        }
-    }
-
-    if ( (msr_index == MSR_CORE_PERF_FIXED_CTR_CTRL ) ||
-        (msr_index == MSR_IA32_DS_AREA) ||
-        (msr_index == MSR_IA32_PEBS_ENABLE) )
-    {
-        *type = MSR_TYPE_CTRL;
-        return 1;
-    }
-
-    if ( (msr_index == MSR_CORE_PERF_GLOBAL_CTRL) ||
-         (msr_index == MSR_CORE_PERF_GLOBAL_STATUS) ||
-         (msr_index == MSR_CORE_PERF_GLOBAL_OVF_CTRL) )
-    {
-        *type = MSR_TYPE_GLOBAL;
-        return 1;
-    }
-
-    msr_index_pmc = msr_index & MSR_PMC_ALIAS_MASK;
-    if ( (msr_index_pmc >= MSR_IA32_PERFCTR0) &&
-         (msr_index_pmc < (MSR_IA32_PERFCTR0 + arch_pmc_cnt)) )
-    {
-        *type = MSR_TYPE_ARCH_COUNTER;
-        *index = msr_index_pmc - MSR_IA32_PERFCTR0;
-        return 1;
-    }
-
-    if ( (msr_index >= MSR_P6_EVNTSEL0) &&
-         (msr_index < (MSR_P6_EVNTSEL0 + arch_pmc_cnt)) )
-    {
-        *type = MSR_TYPE_ARCH_CTRL;
-        *index = msr_index - MSR_P6_EVNTSEL0;
-        return 1;
-    }
-
-    return 0;
-}
-
-#define msraddr_to_bitpos(x) (((x)&0xffff) + ((x)>>31)*0x2000)
-static void core2_vpmu_set_msr_bitmap(unsigned long *msr_bitmap)
-{
-    int i;
-
-    /* Allow Read/Write PMU Counters MSR Directly. */
-    for ( i = 0; i < fixed_pmc_cnt; i++ )
-    {
-        clear_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR0 + i), msr_bitmap);
-        clear_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR0 + i),
-                  msr_bitmap + 0x800/BYTES_PER_LONG);
-    }
-    for ( i = 0; i < arch_pmc_cnt; i++ )
-    {
-        clear_bit(msraddr_to_bitpos(MSR_IA32_PERFCTR0+i), msr_bitmap);
-        clear_bit(msraddr_to_bitpos(MSR_IA32_PERFCTR0+i),
-                  msr_bitmap + 0x800/BYTES_PER_LONG);
-
-        if ( full_width_write )
-        {
-            clear_bit(msraddr_to_bitpos(MSR_IA32_A_PERFCTR0 + i), msr_bitmap);
-            clear_bit(msraddr_to_bitpos(MSR_IA32_A_PERFCTR0 + i),
-                      msr_bitmap + 0x800/BYTES_PER_LONG);
-        }
-    }
-
-    /* Allow Read PMU Non-global Controls Directly. */
-    for ( i = 0; i < arch_pmc_cnt; i++ )
-         clear_bit(msraddr_to_bitpos(MSR_P6_EVNTSEL0 + i), msr_bitmap);
-
-    clear_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR_CTRL), msr_bitmap);
-    clear_bit(msraddr_to_bitpos(MSR_IA32_PEBS_ENABLE), msr_bitmap);
-    clear_bit(msraddr_to_bitpos(MSR_IA32_DS_AREA), msr_bitmap);
-}
-
-static void core2_vpmu_unset_msr_bitmap(unsigned long *msr_bitmap)
-{
-    int i;
-
-    for ( i = 0; i < fixed_pmc_cnt; i++ )
-    {
-        set_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR0 + i), msr_bitmap);
-        set_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR0 + i),
-                msr_bitmap + 0x800/BYTES_PER_LONG);
-    }
-    for ( i = 0; i < arch_pmc_cnt; i++ )
-    {
-        set_bit(msraddr_to_bitpos(MSR_IA32_PERFCTR0 + i), msr_bitmap);
-        set_bit(msraddr_to_bitpos(MSR_IA32_PERFCTR0 + i),
-                msr_bitmap + 0x800/BYTES_PER_LONG);
-
-        if ( full_width_write )
-        {
-            set_bit(msraddr_to_bitpos(MSR_IA32_A_PERFCTR0 + i), msr_bitmap);
-            set_bit(msraddr_to_bitpos(MSR_IA32_A_PERFCTR0 + i),
-                      msr_bitmap + 0x800/BYTES_PER_LONG);
-        }
-    }
-
-    for ( i = 0; i < arch_pmc_cnt; i++ )
-        set_bit(msraddr_to_bitpos(MSR_P6_EVNTSEL0 + i), msr_bitmap);
-
-    set_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR_CTRL), msr_bitmap);
-    set_bit(msraddr_to_bitpos(MSR_IA32_PEBS_ENABLE), msr_bitmap);
-    set_bit(msraddr_to_bitpos(MSR_IA32_DS_AREA), msr_bitmap);
-}
-
-static inline void __core2_vpmu_save(struct vcpu *v)
-{
-    int i;
-    struct xen_pmu_intel_ctxt *core2_vpmu_cxt = vcpu_vpmu(v)->context;
-    uint64_t *fixed_counters = vpmu_reg_pointer(core2_vpmu_cxt, fixed_counters);
-    struct xen_pmu_cntr_pair *xen_pmu_cntr_pair =
-        vpmu_reg_pointer(core2_vpmu_cxt, arch_counters);
-
-    for ( i = 0; i < fixed_pmc_cnt; i++ )
-        rdmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, fixed_counters[i]);
-    for ( i = 0; i < arch_pmc_cnt; i++ )
-        rdmsrl(MSR_IA32_PERFCTR0 + i, xen_pmu_cntr_pair[i].counter);
-
-    if ( !has_hvm_container_domain(v->domain) )
-        rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, core2_vpmu_cxt->global_status);
-}
-
-/* Must be NMI-safe */
-static int core2_vpmu_save(struct vcpu *v)
-{
-    struct vpmu_struct *vpmu = vcpu_vpmu(v);
-
-    if ( !has_hvm_container_domain(v->domain) )
-        wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0);
-
-    if ( !vpmu_are_all_set(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED) )
-        return 0;
-
-    __core2_vpmu_save(v);
-
-    /* Unset PMU MSR bitmap to trap lazy load. */
-    if ( !vpmu_is_set(vpmu, VPMU_RUNNING) &&
-         has_hvm_container_domain(v->domain) && cpu_has_vmx_msr_bitmap)
-        core2_vpmu_unset_msr_bitmap(v->arch.hvm_vmx.msr_bitmap);
-
-    return 1;
-}
-
-static inline void __core2_vpmu_load(struct vcpu *v)
-{
-    unsigned int i, pmc_start;
-    struct xen_pmu_intel_ctxt *core2_vpmu_cxt = vcpu_vpmu(v)->context;
-    uint64_t *fixed_counters = vpmu_reg_pointer(core2_vpmu_cxt, fixed_counters);
-    struct xen_pmu_cntr_pair *xen_pmu_cntr_pair =
-        vpmu_reg_pointer(core2_vpmu_cxt, arch_counters);
-
-    for ( i = 0; i < fixed_pmc_cnt; i++ )
-        wrmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, fixed_counters[i]);
-
-    if ( full_width_write )
-        pmc_start = MSR_IA32_A_PERFCTR0;
-    else
-        pmc_start = MSR_IA32_PERFCTR0;
-    for ( i = 0; i < arch_pmc_cnt; i++ )
-    {
-        wrmsrl(pmc_start + i, xen_pmu_cntr_pair[i].counter);
-        wrmsrl(MSR_P6_EVNTSEL0 + i, xen_pmu_cntr_pair[i].control);
-    }
-
-    wrmsrl(MSR_CORE_PERF_FIXED_CTR_CTRL, core2_vpmu_cxt->fixed_ctrl);
-    wrmsrl(MSR_IA32_DS_AREA, core2_vpmu_cxt->ds_area);
-    wrmsrl(MSR_IA32_PEBS_ENABLE, core2_vpmu_cxt->pebs_enable);
-
-    if ( !has_hvm_container_domain(v->domain) )
-    {
-        wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, core2_vpmu_cxt->global_ovf_ctrl);
-        core2_vpmu_cxt->global_ovf_ctrl = 0;
-        wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, core2_vpmu_cxt->global_ctrl);
-    }
-}
-
-static void core2_vpmu_load(struct vcpu *v)
-{
-    struct vpmu_struct *vpmu = vcpu_vpmu(v);
-
-    if ( vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) )
-        return;
-
-    vpmu_set(vpmu, VPMU_CONTEXT_LOADED);
-
-    __core2_vpmu_load(v);
-}
-
-static int core2_vpmu_alloc_resource(struct vcpu *v)
-{
-    struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    struct xen_pmu_intel_ctxt *core2_vpmu_cxt = NULL;
-    uint64_t *p = NULL;
-
-    p = xzalloc_bytes(sizeof(uint64_t));
-    if ( !p )
-        goto out_err;
-
-    if ( has_hvm_container_domain(v->domain) )
-    {
-        if ( !acquire_pmu_ownership(PMU_OWNER_HVM) )
-            goto out_err;
-
-        wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0);
-        if ( vmx_add_msr(MSR_CORE_PERF_GLOBAL_CTRL, VMX_HOST_MSR) )
-            goto out_err_hvm;
-        if ( vmx_add_msr(MSR_CORE_PERF_GLOBAL_CTRL, VMX_GUEST_MSR) )
-            goto out_err_hvm;
-        vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, 0);
-    }
-
-    if ( is_hvm_domain(v->domain) )
-    {
-        core2_vpmu_cxt = xzalloc_bytes(sizeof(struct xen_pmu_intel_ctxt) +
-                                       sizeof(uint64_t) * fixed_pmc_cnt +
-                                       sizeof(struct xen_pmu_cntr_pair) *
-                                       arch_pmc_cnt);
-        if ( !core2_vpmu_cxt )
-            goto out_err_hvm;
-    }
-    else
-    {
-        core2_vpmu_cxt = &v->arch.vpmu.xenpmu_data->pmu.c.intel;
-    }
-
-    core2_vpmu_cxt->fixed_counters = sizeof(struct xen_pmu_intel_ctxt);
-    core2_vpmu_cxt->arch_counters = core2_vpmu_cxt->fixed_counters +
-                                    sizeof(uint64_t) * fixed_pmc_cnt;
-
-    vpmu->context = core2_vpmu_cxt;
-    vpmu->priv_context = p;
-
-    vpmu_set(vpmu, VPMU_CONTEXT_ALLOCATED);
-
-    return 1;
-
-out_err_hvm:
-    vmx_rm_msr(MSR_CORE_PERF_GLOBAL_CTRL, VMX_HOST_MSR);
-    vmx_rm_msr(MSR_CORE_PERF_GLOBAL_CTRL, VMX_GUEST_MSR);
-    release_pmu_ownship(PMU_OWNER_HVM);
-
-    xfree(core2_vpmu_cxt);
-    xfree(p);
-
-out_err:
-    printk("Failed to allocate VPMU resources for domain %u vcpu %u\n",
-           v->vcpu_id, v->domain->domain_id);
-
-    return 0;
-}
-
-static int core2_vpmu_msr_common_check(u32 msr_index, int *type, int *index)
-{
-    struct vpmu_struct *vpmu = vcpu_vpmu(current);
-
-    if ( !is_core2_vpmu_msr(msr_index, type, index) )
-        return 0;
-
-    if ( unlikely(!vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED)) &&
-         !core2_vpmu_alloc_resource(current) )
-        return 0;
-
-    /* Do the lazy load stuff. */
-    if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) )
-    {
-        __core2_vpmu_load(current);
-        vpmu_set(vpmu, VPMU_CONTEXT_LOADED);
-        if ( has_hvm_container_domain(current->domain) &&
-             cpu_has_vmx_msr_bitmap )
-            core2_vpmu_set_msr_bitmap(current->arch.hvm_vmx.msr_bitmap);
-    }
-    return 1;
-}
-
-static void inject_trap(struct vcpu *v, unsigned int trapno)
-{
-    if ( has_hvm_container_domain(v->domain) )
-        hvm_inject_hw_exception(trapno, 0);
-    else
-        send_guest_trap(v->domain, v->vcpu_id, trapno);
-}
-
-static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
-{
-    int i, tmp;
-    int type = -1, index = -1;
-    struct vcpu *v = current;
-    struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    struct xen_pmu_intel_ctxt *core2_vpmu_cxt = NULL;
-    uint64_t *enabled_cntrs;
-
-    if ( !core2_vpmu_msr_common_check(msr, &type, &index) )
-    {
-        /* Special handling for BTS */
-        if ( msr == MSR_IA32_DEBUGCTLMSR )
-        {
-            uint64_t supported = IA32_DEBUGCTLMSR_TR | IA32_DEBUGCTLMSR_BTS |
-                                 IA32_DEBUGCTLMSR_BTINT;
-
-            if ( cpu_has(&current_cpu_data, X86_FEATURE_DSCPL) )
-                supported |= IA32_DEBUGCTLMSR_BTS_OFF_OS |
-                             IA32_DEBUGCTLMSR_BTS_OFF_USR;
-            if ( msr_content & supported )
-            {
-                if ( vpmu_is_set(vpmu, VPMU_CPU_HAS_BTS) )
-                    return 1;
-                gdprintk(XENLOG_WARNING, "Debug Store is not supported on this cpu\n");
-                inject_trap(v, TRAP_gp_fault);
-                return 0;
-            }
-        }
-        return 0;
-    }
-
-    core2_vpmu_cxt = vpmu->context;
-    enabled_cntrs = vpmu->priv_context;
-    switch ( msr )
-    {
-    case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
-        core2_vpmu_cxt->global_status &= ~msr_content;
-        wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, msr_content);
-        return 1;
-    case MSR_CORE_PERF_GLOBAL_STATUS:
-        gdprintk(XENLOG_INFO, "Can not write readonly MSR: "
-                 "MSR_PERF_GLOBAL_STATUS(0x38E)!\n");
-        inject_trap(v, TRAP_gp_fault);
-        return 1;
-    case MSR_IA32_PEBS_ENABLE:
-        if ( msr_content & 1 )
-            gdprintk(XENLOG_WARNING, "Guest is trying to enable PEBS, "
-                     "which is not supported.\n");
-        core2_vpmu_cxt->pebs_enable = msr_content;
-        return 1;
-    case MSR_IA32_DS_AREA:
-        if ( vpmu_is_set(vpmu, VPMU_CPU_HAS_DS) )
-        {
-            if ( !is_canonical_address(msr_content) )
-            {
-                gdprintk(XENLOG_WARNING,
-                         "Illegal address for IA32_DS_AREA: %#" PRIx64 "x\n",
-                         msr_content);
-                inject_trap(v, TRAP_gp_fault);
-                return 1;
-            }
-            core2_vpmu_cxt->ds_area = msr_content;
-            break;
-        }
-        gdprintk(XENLOG_WARNING, "Guest setting of DTS is ignored.\n");
-        return 1;
-    case MSR_CORE_PERF_GLOBAL_CTRL:
-        core2_vpmu_cxt->global_ctrl = msr_content;
-        break;
-    case MSR_CORE_PERF_FIXED_CTR_CTRL:
-        if ( has_hvm_container_domain(v->domain) )
-            vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL,
-                               &core2_vpmu_cxt->global_ctrl);
-        else
-            rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, core2_vpmu_cxt->global_ctrl);
-        *enabled_cntrs &= ~(((1ULL << fixed_pmc_cnt) - 1) << 32);
-        if ( msr_content != 0 )
-        {
-            u64 val = msr_content;
-            for ( i = 0; i < fixed_pmc_cnt; i++ )
-            {
-                if ( val & 3 )
-                    *enabled_cntrs |= (1ULL << 32) << i;
-                val >>= FIXED_CTR_CTRL_BITS;
-            }
-        }
-
-        core2_vpmu_cxt->fixed_ctrl = msr_content;
-        break;
-    default:
-        tmp = msr - MSR_P6_EVNTSEL0;
-        if ( tmp >= 0 && tmp < arch_pmc_cnt )
-        {
-            struct xen_pmu_cntr_pair *xen_pmu_cntr_pair =
-                vpmu_reg_pointer(core2_vpmu_cxt, arch_counters);
-
-            if ( has_hvm_container_domain(v->domain) )
-                vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL,
-                                   &core2_vpmu_cxt->global_ctrl);
-            else
-                rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, core2_vpmu_cxt->global_ctrl);
-
-            if ( msr_content & (1ULL << 22) )
-                *enabled_cntrs |= 1ULL << tmp;
-            else
-                *enabled_cntrs &= ~(1ULL << tmp);
-
-            xen_pmu_cntr_pair[tmp].control = msr_content;
-        }
-    }
-
-    if ( (core2_vpmu_cxt->global_ctrl & *enabled_cntrs) ||
-         (core2_vpmu_cxt->ds_area != 0) )
-        vpmu_set(vpmu, VPMU_RUNNING);
-    else
-        vpmu_reset(vpmu, VPMU_RUNNING);
-
-    if ( type != MSR_TYPE_GLOBAL )
-    {
-        u64 mask;
-        int inject_gp = 0;
-        switch ( type )
-        {
-        case MSR_TYPE_ARCH_CTRL:      /* MSR_P6_EVNTSEL[0,...] */
-            mask = ~((1ull << 32) - 1);
-            if ( msr_content & mask )
-                inject_gp = 1;
-            break;
-        case MSR_TYPE_CTRL:           /* IA32_FIXED_CTR_CTRL */
-            if  ( msr == MSR_IA32_DS_AREA )
-                break;
-            /* 4 bits per counter, currently 3 fixed counters implemented. */
-            mask = ~((1ull << (fixed_pmc_cnt * FIXED_CTR_CTRL_BITS)) - 1);
-            if ( msr_content & mask )
-                inject_gp = 1;
-            break;
-        case MSR_TYPE_COUNTER:        /* IA32_FIXED_CTR[0-2] */
-            mask = ~((1ull << core2_get_bitwidth_fix_count()) - 1);
-            if ( msr_content & mask )
-                inject_gp = 1;
-            break;
-        }
-        if ( inject_gp )
-            inject_trap(v, TRAP_gp_fault);
-        else
-            wrmsrl(msr, msr_content);
-    }
-    else
-    {
-       if ( has_hvm_container_domain(v->domain) )
-           vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, msr_content);
-       else
-           wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, msr_content);
-    }
-
-    return 1;
-}
-
-static int core2_vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
-{
-    int type = -1, index = -1;
-    struct vcpu *v = current;
-    struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    struct xen_pmu_intel_ctxt *core2_vpmu_cxt = NULL;
-
-    if ( core2_vpmu_msr_common_check(msr, &type, &index) )
-    {
-        core2_vpmu_cxt = vpmu->context;
-        switch ( msr )
-        {
-        case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
-            *msr_content = 0;
-            break;
-        case MSR_CORE_PERF_GLOBAL_STATUS:
-            *msr_content = core2_vpmu_cxt->global_status;
-            break;
-        case MSR_CORE_PERF_GLOBAL_CTRL:
-            if ( has_hvm_container_domain(v->domain) )
-                vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, msr_content);
-            else
-                rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, *msr_content);
-            break;
-        default:
-            rdmsrl(msr, *msr_content);
-        }
-    }
-    else
-    {
-        /* Extension for BTS */
-        if ( msr == MSR_IA32_MISC_ENABLE )
-        {
-            if ( vpmu_is_set(vpmu, VPMU_CPU_HAS_BTS) )
-                *msr_content &= ~MSR_IA32_MISC_ENABLE_BTS_UNAVAIL;
-        }
-        else
-            return 0;
-    }
-
-    return 1;
-}
-
-static void core2_vpmu_do_cpuid(unsigned int input,
-                                unsigned int *eax, unsigned int *ebx,
-                                unsigned int *ecx, unsigned int *edx)
-{
-    if (input == 0x1)
-    {
-        struct vpmu_struct *vpmu = vcpu_vpmu(current);
-
-        if ( vpmu_is_set(vpmu, VPMU_CPU_HAS_DS) )
-        {
-            /* Switch on the 'Debug Store' feature in CPUID.EAX[1]:EDX[21] */
-            *edx |= cpufeat_mask(X86_FEATURE_DS);
-            if ( cpu_has(&current_cpu_data, X86_FEATURE_DTES64) )
-                *ecx |= cpufeat_mask(X86_FEATURE_DTES64);
-            if ( cpu_has(&current_cpu_data, X86_FEATURE_DSCPL) )
-                *ecx |= cpufeat_mask(X86_FEATURE_DSCPL);
-        }
-    }
-}
-
-/* Dump vpmu info on console, called in the context of keyhandler 'q'. */
-static void core2_vpmu_dump(const struct vcpu *v)
-{
-    const struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    int i;
-    const struct xen_pmu_intel_ctxt *core2_vpmu_cxt = vpmu->context;
-    u64 val;
-    uint64_t *fixed_counters;
-    struct xen_pmu_cntr_pair *cntr_pair;
-
-    if ( !core2_vpmu_cxt || !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
-         return;
-
-    if ( !vpmu_is_set(vpmu, VPMU_RUNNING) )
-    {
-        if ( vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) )
-            printk("    vPMU loaded\n");
-        else
-            printk("    vPMU allocated\n");
-        return;
-    }
-
-    printk("    vPMU running\n");
-
-    cntr_pair = vpmu_reg_pointer(core2_vpmu_cxt, arch_counters);
-    fixed_counters = vpmu_reg_pointer(core2_vpmu_cxt, fixed_counters);
-
-    /* Print the contents of the counter and its configuration msr. */
-    for ( i = 0; i < arch_pmc_cnt; i++ )
-        printk("      general_%d: 0x%016lx ctrl: 0x%016lx\n",
-            i, cntr_pair[i].counter, cntr_pair[i].control);
-
-    /*
-     * The configuration of the fixed counter is 4 bits each in the
-     * MSR_CORE_PERF_FIXED_CTR_CTRL.
-     */
-    val = core2_vpmu_cxt->fixed_ctrl;
-    for ( i = 0; i < fixed_pmc_cnt; i++ )
-    {
-        printk("      fixed_%d:   0x%016lx ctrl: %#lx\n",
-               i, fixed_counters[i],
-               val & FIXED_CTR_CTRL_MASK);
-        val >>= FIXED_CTR_CTRL_BITS;
-    }
-}
-
-static int core2_vpmu_do_interrupt(struct cpu_user_regs *regs)
-{
-    struct vcpu *v = current;
-    u64 msr_content;
-    struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    struct xen_pmu_intel_ctxt *core2_vpmu_cxt = vpmu->context;
-
-    rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, msr_content);
-    if ( msr_content )
-    {
-        if ( is_pmc_quirk )
-            handle_pmc_quirk(msr_content);
-        core2_vpmu_cxt->global_status |= msr_content;
-        msr_content = 0xC000000700000000 | ((1 << arch_pmc_cnt) - 1);
-        wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, msr_content);
-    }
-    else
-    {
-        /* No PMC overflow but perhaps a Trace Message interrupt. */
-        __vmread(GUEST_IA32_DEBUGCTL, &msr_content);
-        if ( !(msr_content & IA32_DEBUGCTLMSR_TR) )
-            return 0;
-    }
-
-    return 1;
-}
-
-static int core2_vpmu_initialise(struct vcpu *v)
-{
-    struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    u64 msr_content;
-    struct cpuinfo_x86 *c = &current_cpu_data;
-
-    if ( !(vpmu_features & XENPMU_FEATURE_INTEL_BTS) )
-        goto func_out;
-    /* Check the 'Debug Store' feature in the CPUID.EAX[1]:EDX[21] */
-    if ( cpu_has(c, X86_FEATURE_DS) )
-    {
-        if ( !cpu_has(c, X86_FEATURE_DTES64) )
-        {
-            printk(XENLOG_G_WARNING "CPU doesn't support 64-bit DS Area"
-                   " - Debug Store disabled for %pv\n",
-                   v);
-            goto func_out;
-        }
-        vpmu_set(vpmu, VPMU_CPU_HAS_DS);
-        rdmsrl(MSR_IA32_MISC_ENABLE, msr_content);
-        if ( msr_content & MSR_IA32_MISC_ENABLE_BTS_UNAVAIL )
-        {
-            /* If BTS_UNAVAIL is set reset the DS feature. */
-            vpmu_reset(vpmu, VPMU_CPU_HAS_DS);
-            printk(XENLOG_G_WARNING "CPU has set BTS_UNAVAIL"
-                   " - Debug Store disabled for %pv\n",
-                   v);
-        }
-        else
-        {
-            vpmu_set(vpmu, VPMU_CPU_HAS_BTS);
-            if ( !cpu_has(c, X86_FEATURE_DSCPL) )
-                printk(XENLOG_G_INFO
-                       "vpmu: CPU doesn't support CPL-Qualified BTS\n");
-            printk("******************************************************\n");
-            printk("** WARNING: Emulation of BTS Feature is switched on **\n");
-            printk("** Using this processor feature in a virtualized    **\n");
-            printk("** environment is not 100%% safe.                    **\n");
-            printk("** Setting the DS buffer address with wrong values  **\n");
-            printk("** may lead to hypervisor hangs or crashes.         **\n");
-            printk("** It is NOT recommended for production use!        **\n");
-            printk("******************************************************\n");
-        }
-    }
-func_out:
-
-    arch_pmc_cnt = core2_get_arch_pmc_count();
-    fixed_pmc_cnt = core2_get_fixed_pmc_count();
-    check_pmc_quirk();
-
-    /* PV domains can allocate resources immediately */
-    if ( is_pv_domain(v->domain) && !core2_vpmu_alloc_resource(v) )
-        return 1;
-
-    return 0;
-}
-
-static void core2_vpmu_destroy(struct vcpu *v)
-{
-    struct vpmu_struct *vpmu = vcpu_vpmu(v);
-
-    if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
-        return;
-
-    if ( has_hvm_container_domain(v->domain) )
-    {
-        if ( cpu_has_vmx_msr_bitmap )
-            core2_vpmu_unset_msr_bitmap(v->arch.hvm_vmx.msr_bitmap);
-
-        if ( is_hvm_domain(v->domain) )
-            xfree(vpmu->context);
-
-        release_pmu_ownship(PMU_OWNER_HVM);
-    }
-
-    xfree(vpmu->priv_context);
-    vpmu->context = NULL;
-    vpmu_clear(vpmu);
-}
-
-struct arch_vpmu_ops core2_vpmu_ops = {
-    .do_wrmsr = core2_vpmu_do_wrmsr,
-    .do_rdmsr = core2_vpmu_do_rdmsr,
-    .do_interrupt = core2_vpmu_do_interrupt,
-    .do_cpuid = core2_vpmu_do_cpuid,
-    .arch_vpmu_destroy = core2_vpmu_destroy,
-    .arch_vpmu_save = core2_vpmu_save,
-    .arch_vpmu_load = core2_vpmu_load,
-    .arch_vpmu_dump = core2_vpmu_dump
-};
-
-static void core2_no_vpmu_do_cpuid(unsigned int input,
-                                unsigned int *eax, unsigned int *ebx,
-                                unsigned int *ecx, unsigned int *edx)
-{
-    /*
-     * As in this case the vpmu is not enabled reset some bits in the
-     * architectural performance monitoring related part.
-     */
-    if ( input == 0xa )
-    {
-        *eax &= ~PMU_VERSION_MASK;
-        *eax &= ~PMU_GENERAL_NR_MASK;
-        *eax &= ~PMU_GENERAL_WIDTH_MASK;
-
-        *edx &= ~PMU_FIXED_NR_MASK;
-        *edx &= ~PMU_FIXED_WIDTH_MASK;
-    }
-}
-
-/*
- * If it's a VPMU MSR, set it to 0.
- */
-static int core2_no_vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
-{
-    int type = -1, index = -1;
-    if ( !is_core2_vpmu_msr(msr, &type, &index) )
-        return 0;
-    *msr_content = 0;
-    return 1;
-}
-
-/*
- * These functions are used in case vpmu is not enabled.
- */
-struct arch_vpmu_ops core2_no_vpmu_ops = {
-    .do_rdmsr = core2_no_vpmu_do_rdmsr,
-    .do_cpuid = core2_no_vpmu_do_cpuid,
-};
-
-int vmx_vpmu_initialise(struct vcpu *v)
-{
-    struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    uint8_t family = current_cpu_data.x86;
-    uint8_t cpu_model = current_cpu_data.x86_model;
-    int ret = 0;
-
-    vpmu->arch_vpmu_ops = &core2_no_vpmu_ops;
-    if ( vpmu_mode == XENPMU_MODE_OFF )
-        return 0;
-
-    if ( family == 6 )
-    {
-        u64 caps;
-
-        rdmsrl(MSR_IA32_PERF_CAPABILITIES, caps);
-        full_width_write = (caps >> 13) & 1;
-
-        switch ( cpu_model )
-        {
-        /* Core2: */
-        case 0x0f: /* original 65 nm celeron/pentium/core2/xeon, "Merom"/"Conroe" */
-        case 0x16: /* single-core 65 nm celeron/core2solo "Merom-L"/"Conroe-L" */
-        case 0x17: /* 45 nm celeron/core2/xeon "Penryn"/"Wolfdale" */
-        case 0x1d: /* six-core 45 nm xeon "Dunnington" */
-
-        case 0x2a: /* SandyBridge */
-        case 0x2d: /* SandyBridge, "Romley-EP" */
-
-        /* Nehalem: */
-        case 0x1a: /* 45 nm nehalem, "Bloomfield" */
-        case 0x1e: /* 45 nm nehalem, "Lynnfield", "Clarksfield", "Jasper Forest" */
-        case 0x2e: /* 45 nm nehalem-ex, "Beckton" */
-
-        /* Westmere: */
-        case 0x25: /* 32 nm nehalem, "Clarkdale", "Arrandale" */
-        case 0x2c: /* 32 nm nehalem, "Gulftown", "Westmere-EP" */
-        case 0x27: /* 32 nm Westmere-EX */
-
-        case 0x3a: /* IvyBridge */
-        case 0x3e: /* IvyBridge EP */
-
-        /* Haswell: */
-        case 0x3c:
-        case 0x3f:
-        case 0x45:
-        case 0x46:
-
-        /* future: */
-        case 0x3d:
-        case 0x4e:
-            ret = core2_vpmu_initialise(v);
-            if ( !ret )
-                vpmu->arch_vpmu_ops = &core2_vpmu_ops;
-            return ret;
-        }
-    }
-
-    printk("VPMU: Initialization failed. "
-           "Intel processor family %d model %d has not "
-           "been supported\n", family, cpu_model);
-    return -EINVAL;
-}
-
diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c
deleted file mode 100644
index c6280dc..0000000
--- a/xen/arch/x86/hvm/vpmu.c
+++ /dev/null
@@ -1,750 +0,0 @@
-/*
- * vpmu.c: PMU virtualization for HVM domain.
- *
- * Copyright (c) 2007, Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along with
- * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
- * Place - Suite 330, Boston, MA 02111-1307 USA.
- *
- * Author: Haitao Shan <haitao.shan@intel.com>
- */
-#include <xen/config.h>
-#include <xen/sched.h>
-#include <xen/xenoprof.h>
-#include <xen/event.h>
-#include <xen/softirq.h>
-#include <xen/hypercall.h>
-#include <xen/guest_access.h>
-#include <asm/regs.h>
-#include <asm/types.h>
-#include <asm/msr.h>
-#include <asm/p2m.h>
-#include <asm/hvm/support.h>
-#include <asm/hvm/vmx/vmx.h>
-#include <asm/hvm/vmx/vmcs.h>
-#include <asm/hvm/vpmu.h>
-#include <asm/hvm/svm/svm.h>
-#include <asm/hvm/svm/vmcb.h>
-#include <asm/apic.h>
-#include <asm/nmi.h>
-#include <public/pmu.h>
-
-/*
- * "vpmu" :     vpmu generally enabled
- * "vpmu=off" : vpmu generally disabled
- * "vpmu=bts" : vpmu enabled and Intel BTS feature switched on.
- */
-uint64_t __read_mostly vpmu_mode = XENPMU_MODE_OFF;
-uint64_t __read_mostly vpmu_features = 0;
-static void parse_vpmu_param(char *s);
-custom_param("vpmu", parse_vpmu_param);
-
-static void pmu_softnmi(void);
-
-static DEFINE_PER_CPU(struct vcpu *, last_vcpu);
-static DEFINE_PER_CPU(struct vcpu *, sampled_vcpu);
-
-static uint32_t __read_mostly vpmu_interrupt_type = PMU_APIC_VECTOR;
-
-static void __init parse_vpmu_param(char *s)
-{
-    char *ss;
-
-    vpmu_mode = XENPMU_MODE_ON;
-    if (*s == '\0')
-        return;
-
-    do {
-        ss = strchr(s, ',');
-        if ( ss )
-            *ss = '\0';
-
-        switch  ( parse_bool(s) )
-        {
-        case 0:
-            vpmu_mode = XENPMU_MODE_OFF;
-            return;
-        case -1:
-            if ( !strcmp(s, "nmi") )
-                vpmu_interrupt_type = APIC_DM_NMI;
-            else if ( !strcmp(s, "bts") )
-                vpmu_features |= XENPMU_FEATURE_INTEL_BTS;
-            else if ( !strcmp(s, "priv") )
-            {
-                vpmu_mode &= ~XENPMU_MODE_ON;
-                vpmu_mode |= XENPMU_MODE_PRIV;
-            }
-            else
-            {
-                printk("VPMU: unknown flag: %s - vpmu disabled!\n", s);
-                vpmu_mode = XENPMU_MODE_OFF;
-                return;
-            }
-        default:
-            break;
-        }
-
-        s = ss + 1;
-    } while ( ss );
-}
-
-void vpmu_lvtpc_update(uint32_t val)
-{
-    struct vpmu_struct *vpmu = vcpu_vpmu(current);
-
-    vpmu->hw_lapic_lvtpc = vpmu_interrupt_type | (val & APIC_LVT_MASKED);
-
-    /* Postpone APIC updates for PV(H) guests if PMU interrupt is pending */
-    if ( is_hvm_domain(current->domain) ||
-         !(current->arch.vpmu.xenpmu_data &&
-           (current->arch.vpmu.xenpmu_data->pmu_flags & PMU_CACHED)) )
-        apic_write(APIC_LVTPC, vpmu->hw_lapic_lvtpc);
-}
-
-static void vpmu_send_interrupt(struct vcpu *v)
-{
-    struct vlapic *vlapic;
-    u32 vlapic_lvtpc;
-
-    ASSERT( is_hvm_vcpu(v) );
-
-    vlapic = vcpu_vlapic(v);
-    if ( !is_vlapic_lvtpc_enabled(vlapic) )
-        return;
-
-    vlapic_lvtpc = vlapic_get_reg(vlapic, APIC_LVTPC);
-    if ( GET_APIC_DELIVERY_MODE(vlapic_lvtpc) == APIC_MODE_FIXED )
-        vlapic_set_irq(vcpu_vlapic(v), vlapic_lvtpc & APIC_VECTOR_MASK, 0);
-    else
-        v->nmi_pending = 1;
-}
-
-int vpmu_do_msr(unsigned int msr, uint64_t *msr_content, uint8_t rw)
-{
-    struct vpmu_struct *vpmu = vcpu_vpmu(current);
-
-    if ( (vpmu_mode == XENPMU_MODE_OFF) ||
-         ((vpmu_mode & XENPMU_MODE_PRIV) &&
-          !is_hardware_domain(current->domain)) )
-        return 0;
-
-    ASSERT((rw == VPMU_MSR_READ) || (rw == VPMU_MSR_WRITE));
-
-    if ( vpmu->arch_vpmu_ops )
-    {
-        int ret;
-
-        if ( (rw == VPMU_MSR_READ) && vpmu->arch_vpmu_ops->do_rdmsr )
-            ret = vpmu->arch_vpmu_ops->do_rdmsr(msr, msr_content);
-        else if ( vpmu->arch_vpmu_ops->do_wrmsr )
-            ret = vpmu->arch_vpmu_ops->do_wrmsr(msr, *msr_content);
-        else
-            return 0;
-
-        /*
-         * We may have received a PMU interrupt while handling MSR access
-         * and since do_wr/rdmsr may load VPMU context we should save
-         * (and unload) it again.
-         */
-        if ( !is_hvm_domain(current->domain) &&
-             (current->arch.vpmu.xenpmu_data->pmu_flags & PMU_CACHED) )
-        {
-            vpmu_set(vpmu, VPMU_CONTEXT_SAVE);
-            vpmu->arch_vpmu_ops->arch_vpmu_save(current);
-            vpmu_reset(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED);
-        }
-
-        return ret;
-    }
-
-    return 0;
-}
-
-static struct vcpu *choose_dom0_vcpu(void)
-{
-    struct vcpu *v;
-    unsigned idx = smp_processor_id() % hardware_domain->max_vcpus;
-
-    v = hardware_domain->vcpu[idx];
-
-    /*
-     * If the index is not populated, search downwards in the vcpu array
-     * until a valid vcpu is found.
-     */
-    while ( !v && idx-- )
-        v = hardware_domain->vcpu[idx];
-
-    return v;
-}
-
-/* This routine may be called in NMI context */
-int vpmu_do_interrupt(struct cpu_user_regs *regs)
-{
-    struct vcpu *v = current;
-    struct vpmu_struct *vpmu;
-
-    /*
-     * dom0 will handle interrupt for special domains (e.g. idle domain) or,
-     * in XENPMU_MODE_PRIV, for everyone.
-     */
-    if ( (vpmu_mode & XENPMU_MODE_PRIV) ||
-         (v->domain->domain_id >= DOMID_FIRST_RESERVED) )
-    {
-        v = choose_dom0_vcpu();
-        if ( !v )
-            return 0;
-    }
-
-    vpmu = vcpu_vpmu(v);
-    if ( !is_hvm_domain(v->domain) || (vpmu_mode & XENPMU_MODE_PRIV) )
-    {
-        /* PV(H) guest or dom0 is doing system profiling */
-        const struct cpu_user_regs *gregs;
-        int err;
-
-        if ( v->arch.vpmu.xenpmu_data->pmu_flags & PMU_CACHED )
-            return 1;
-
-        if ( !(vpmu_mode & XENPMU_MODE_PRIV) &&
-             is_pvh_domain(current->domain) &&
-             !vpmu->arch_vpmu_ops->do_interrupt(regs) )
-            return 0;
-
-        /* PV guest will be reading PMU MSRs from xenpmu_data */
-        vpmu_set(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED);
-        err = vpmu->arch_vpmu_ops->arch_vpmu_save(v);
-        vpmu_reset(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED);
-
-        if ( !is_hvm_domain(current->domain) )
-        {
-            /* Store appropriate registers in xenpmu_data */
-            if ( is_pv_32bit_domain(current->domain) )
-            {
-                /*
-                 * 32-bit dom0 cannot process Xen's addresses (which are
-                 * 64-bit) and therefore we treat it the same way as a
-                 * non-privileged 32-bit PV domain.
-                 */
-                struct compat_cpu_user_regs *cmp;
-
-                gregs = guest_cpu_user_regs();
-
-                cmp = (void *)&v->arch.vpmu.xenpmu_data->pmu.r.regs;
-                XLAT_cpu_user_regs(cmp, gregs);
-
-                /* Adjust RPL for kernel mode */
-                if ((cmp->cs & 3) == 1)
-                    cmp->cs &= ~3;
-            }
-            else
-            {
-                if ( !is_hardware_domain(current->domain) &&
-                     !is_idle_vcpu(current) )
-                {
-                    /* 64-bit unprivileged PV(H) guest */
-                    gregs = guest_cpu_user_regs();
-                    memcpy(&v->arch.vpmu.xenpmu_data->pmu.r.regs,
-                           gregs, sizeof(struct cpu_user_regs));
-                }
-                else
-                    memcpy(&v->arch.vpmu.xenpmu_data->pmu.r.regs,
-                           regs, sizeof(struct cpu_user_regs));
-
-                if ( !is_pvh_domain(current->domain) )
-                {
-                    if ( current->arch.flags & TF_kernel_mode )
-                        v->arch.vpmu.xenpmu_data->pmu.r.regs.cs &= ~3;
-                }
-                else if ( !(vpmu_interrupt_type & APIC_DM_NMI) )
-                {
-                    /* Unsafe in NMI context, defer to softint later */
-                    struct segment_register seg_cs;
-
-                    hvm_get_segment_register(current, x86_seg_cs, &seg_cs);
-                    v->arch.vpmu.xenpmu_data->pmu.r.regs.cs = seg_cs.sel;
-                }
-            }
-        }
-        else
-        {
-            /* HVM guest */
-            struct segment_register seg_cs;
-
-            gregs = guest_cpu_user_regs();
-            memcpy(&v->arch.vpmu.xenpmu_data->pmu.r.regs,
-                   gregs, sizeof(struct cpu_user_regs));
-
-            /* This is unsafe in NMI context, we'll do it in softint handler */
-            if ( !(vpmu_interrupt_type & APIC_DM_NMI ) )
-            {
-                hvm_get_segment_register(current, x86_seg_cs, &seg_cs);
-                v->arch.vpmu.xenpmu_data->pmu.r.regs.cs = seg_cs.sel;
-            }
-        }
-
-        v->arch.vpmu.xenpmu_data->domain_id = current->domain->domain_id;
-        v->arch.vpmu.xenpmu_data->vcpu_id = current->vcpu_id;
-        v->arch.vpmu.xenpmu_data->pcpu_id = smp_processor_id();
-
-        v->arch.vpmu.xenpmu_data->pmu_flags |= PMU_CACHED;
-        apic_write(APIC_LVTPC, vpmu->hw_lapic_lvtpc | APIC_LVT_MASKED);
-        vpmu->hw_lapic_lvtpc |= APIC_LVT_MASKED;
-
-        if ( vpmu_interrupt_type & APIC_DM_NMI )
-        {
-            per_cpu(sampled_vcpu, smp_processor_id()) = current;
-            raise_softirq(PMU_SOFTIRQ);
-        }
-        else
-            send_guest_vcpu_virq(v, VIRQ_XENPMU);
-
-        return 1;
-    }
-
-    if ( vpmu->arch_vpmu_ops )
-    {
-        if ( !vpmu->arch_vpmu_ops->do_interrupt(regs) )
-            return 0;
-
-        if ( vpmu_interrupt_type & APIC_DM_NMI )
-        {
-            per_cpu(sampled_vcpu, smp_processor_id()) = current;
-            raise_softirq(PMU_SOFTIRQ);
-        }
-        else
-            vpmu_send_interrupt(v);
-
-        return 1;
-    }
-
-    return 0;
-}
-
-void vpmu_do_cpuid(unsigned int input,
-                   unsigned int *eax, unsigned int *ebx,
-                   unsigned int *ecx, unsigned int *edx)
-{
-    struct vpmu_struct *vpmu = vcpu_vpmu(current);
-
-    if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->do_cpuid )
-        vpmu->arch_vpmu_ops->do_cpuid(input, eax, ebx, ecx, edx);
-}
-
-static void vpmu_save_force(void *arg)
-{
-    struct vcpu *v = (struct vcpu *)arg;
-    struct vpmu_struct *vpmu = vcpu_vpmu(v);
-
-    if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) )
-        return;
-
-    if ( vpmu->arch_vpmu_ops )
-        (void)vpmu->arch_vpmu_ops->arch_vpmu_save(v);
-
-    vpmu_reset(vpmu, VPMU_CONTEXT_SAVE);
-
-    per_cpu(last_vcpu, smp_processor_id()) = NULL;
-
-    pmu_softnmi();
-}
-
-void vpmu_save(struct vcpu *v)
-{
-    struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    int pcpu = smp_processor_id();
-
-    if ( !vpmu_are_all_set(vpmu, VPMU_CONTEXT_ALLOCATED | VPMU_CONTEXT_LOADED) )
-       return;
-
-    vpmu->last_pcpu = pcpu;
-    per_cpu(last_vcpu, pcpu) = v;
-
-    if ( vpmu->arch_vpmu_ops )
-        if ( vpmu->arch_vpmu_ops->arch_vpmu_save(v) )
-            vpmu_reset(vpmu, VPMU_CONTEXT_LOADED);
-
-    apic_write(APIC_LVTPC, vpmu_interrupt_type | APIC_LVT_MASKED);
-
-    /* Make sure there are no outstanding PMU NMIs */
-    pmu_softnmi();
-}
-
-void vpmu_load(struct vcpu *v)
-{
-    struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    int pcpu = smp_processor_id();
-    struct vcpu *prev = NULL;
-
-    if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
-        return;
-
-    /* First time this VCPU is running here */
-    if ( vpmu->last_pcpu != pcpu )
-    {
-        /*
-         * Get the context from the last pcpu that we ran on. Note that if
-         * another VCPU is running there it must have saved this VCPU's
-         * context before starting to run (see below).
-         * There should be no race since the remote pcpu will disable
-         * interrupts before saving the context.
-         */
-        if ( vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) )
-        {
-            vpmu_set(vpmu, VPMU_CONTEXT_SAVE);
-            on_selected_cpus(cpumask_of(vpmu->last_pcpu),
-                             vpmu_save_force, (void *)v, 1);
-            vpmu_reset(vpmu, VPMU_CONTEXT_LOADED);
-        }
-    } 
-
-    /* Prevent forced context save from remote CPU */
-    local_irq_disable();
-
-    prev = per_cpu(last_vcpu, pcpu);
-
-    if ( prev != v && prev )
-    {
-        vpmu = vcpu_vpmu(prev);
-
-        /* Someone ran here before us */
-        vpmu_set(vpmu, VPMU_CONTEXT_SAVE);
-        vpmu_save_force(prev);
-        vpmu_reset(vpmu, VPMU_CONTEXT_LOADED);
-
-        pmu_softnmi();
-
-        vpmu = vcpu_vpmu(v);
-    }
-
-    local_irq_enable();
-
-    /* Load the PMU context immediately only when the PMU is counting. */
-    if ( !vpmu_is_set(vpmu, VPMU_RUNNING) ||
-         (!is_hvm_domain(v->domain) &&
-          vpmu->xenpmu_data->pmu_flags & PMU_CACHED) )
-        return;
-
-    if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->arch_vpmu_load )
-    {
-        apic_write_around(APIC_LVTPC, vpmu->hw_lapic_lvtpc);
-        /* Arch code needs to set VPMU_CONTEXT_LOADED */
-        vpmu->arch_vpmu_ops->arch_vpmu_load(v);
-    }
-}
-
-void vpmu_initialise(struct vcpu *v)
-{
-    struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    uint8_t vendor = current_cpu_data.x86_vendor;
-
-    BUILD_BUG_ON((sizeof(struct xen_pmu_intel_ctxt) > XENPMU_CTXT_PAD_SZ) ||
-                 (sizeof(struct xen_pmu_amd_ctxt) > XENPMU_CTXT_PAD_SZ));
-
-
-    if ( vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
-        vpmu_destroy(v);
-    vpmu_clear(vpmu);
-    vpmu->context = NULL;
-
-    switch ( vendor )
-    {
-    case X86_VENDOR_AMD:
-        if ( svm_vpmu_initialise(v) != 0 )
-            vpmu_mode = XENPMU_MODE_OFF;
-        break;
-
-    case X86_VENDOR_INTEL:
-        if ( vmx_vpmu_initialise(v) != 0 )
-            vpmu_mode = XENPMU_MODE_OFF;
-        break;
-
-    default:
-        printk("VPMU: Initialization failed. "
-               "Unknown CPU vendor %d\n", vendor);
-        vpmu_mode = XENPMU_MODE_OFF;
-        break;
-    }
-}
-
-void vpmu_destroy(struct vcpu *v)
-{
-    struct vpmu_struct *vpmu = vcpu_vpmu(v);
-
-    if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->arch_vpmu_destroy )
-    {
-        /* Unload VPMU first. This will stop counters */
-        on_selected_cpus(cpumask_of(vcpu_vpmu(v)->last_pcpu),
-                         vpmu_save_force, v, 1);
-
-        vpmu->arch_vpmu_ops->arch_vpmu_destroy(v);
-    }
-}
-
-/* Dump some vpmu information to the console. Used in keyhandler dump_domains(). */
-void vpmu_dump(struct vcpu *v)
-{
-    struct vpmu_struct *vpmu = vcpu_vpmu(v);
-
-    if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->arch_vpmu_dump )
-        vpmu->arch_vpmu_ops->arch_vpmu_dump(v);
-}
-
-/* Unload VPMU contexts */
-static void vpmu_unload_all(void)
-{
-    struct domain *d;
-    struct vcpu *v;
-    struct vpmu_struct *vpmu;
-
-    for_each_domain( d )
-    {
-        for_each_vcpu ( d, v )
-        {
-            if ( v != current )
-                vcpu_pause(v);
-            vpmu = vcpu_vpmu(v);
-
-            if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) )
-            {
-                if ( v != current )
-                    vcpu_unpause(v);
-                continue;
-            }
-
-            vpmu_set(vpmu, VPMU_CONTEXT_SAVE);
-            on_selected_cpus(cpumask_of(vpmu->last_pcpu),
-                             vpmu_save_force, (void *)v, 1);
-            vpmu_reset(vpmu, VPMU_CONTEXT_LOADED);
-
-            if ( v != current )
-                vcpu_unpause(v);
-        }
-    }
-}
-
-/* Process the softirq set by PMU NMI handler */
-static void pmu_softnmi(void)
-{
-    struct vcpu *v, *sampled = this_cpu(sampled_vcpu);
-
-    if ( sampled == NULL )
-        return;
-    this_cpu(sampled_vcpu) = NULL;
-
-    if ( (vpmu_mode & XENPMU_MODE_PRIV) ||
-         (sampled->domain->domain_id >= DOMID_FIRST_RESERVED) )
-    {
-            v = choose_dom0_vcpu();
-            if ( !v )
-                return;
-    }
-    else
-    {
-        if ( is_hvm_domain(sampled->domain) )
-        {
-            vpmu_send_interrupt(sampled);
-            return;
-        }
-        v = sampled;
-    }
-
-    if ( has_hvm_container_domain(sampled->domain) )
-    {
-        struct segment_register seg_cs;
-
-        hvm_get_segment_register(sampled, x86_seg_cs, &seg_cs);
-        v->arch.vpmu.xenpmu_data->pmu.r.regs.cs = seg_cs.sel;
-    }
-
-    send_guest_vcpu_virq(v, VIRQ_XENPMU);
-}
-
-int pmu_nmi_interrupt(struct cpu_user_regs *regs, int cpu)
-{
-    return vpmu_do_interrupt(regs);
-}
-
-static int pvpmu_init(struct domain *d, xen_pmu_params_t *params)
-{
-    struct vcpu *v;
-    struct page_info *page;
-    uint64_t gfn = params->d.val;
-    static bool_t __read_mostly pvpmu_initted = 0;
-    static DEFINE_SPINLOCK(init_lock);
-
-    if ( !params || params->vcpu >= d->max_vcpus )
-        return -EINVAL;
-
-    page = get_page_from_gfn(d, gfn, NULL, P2M_ALLOC);
-    if ( !page )
-        return -EINVAL;
-
-    if ( !get_page_type(page, PGT_writable_page) )
-    {
-        put_page(page);
-        return -EINVAL;
-    }
-
-    v = d->vcpu[params->vcpu];
-    v->arch.vpmu.xenpmu_data = __map_domain_page_global(page);
-    if ( !v->arch.vpmu.xenpmu_data )
-    {
-        put_page_and_type(page);
-        return -EINVAL;
-    }
-
-    spin_lock(&init_lock);
-
-    if ( !pvpmu_initted )
-    {
-        if ( reserve_lapic_nmi() == 0 )
-            set_nmi_callback(pmu_nmi_interrupt);
-        else
-        {
-            spin_unlock(&init_lock);
-            printk("Failed to reserve PMU NMI\n");
-            put_page(page);
-            return -EBUSY;
-        }
-        open_softirq(PMU_SOFTIRQ, pmu_softnmi);
-
-        pvpmu_initted = 1;
-    }
-
-    spin_unlock(&init_lock);
-
-    vpmu_initialise(v);
-
-    return 0;
-}
-
-static void pvpmu_finish(struct domain *d, xen_pmu_params_t *params)
-{
-    struct vcpu *v;
-    uint64_t mfn;
-
-    if ( !params || params->vcpu >= d->max_vcpus )
-        return;
-
-    v = d->vcpu[params->vcpu];
-    if (v != current)
-        vcpu_pause(v);
-
-    if ( v->arch.vpmu.xenpmu_data )
-    {
-        mfn = domain_page_map_to_mfn(v->arch.vpmu.xenpmu_data);
-        if ( mfn_valid(mfn) )
-        {
-            unmap_domain_page_global(v->arch.vpmu.xenpmu_data);
-            put_page_and_type(mfn_to_page(mfn));
-        }
-    }
-    vpmu_destroy(v);
-
-    if ( v != current )
-        vcpu_unpause(v);
-}
-
-long do_xenpmu_op(int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg)
-{
-    int ret = -EINVAL;
-    xen_pmu_params_t pmu_params;
-
-    switch ( op )
-    {
-    case XENPMU_mode_set:
-        if ( !is_control_domain(current->domain) )
-            return -EPERM;
-
-        if ( copy_from_guest(&pmu_params, arg, 1) )
-            return -EFAULT;
-
-        if ( (pmu_params.d.val & ~(XENPMU_MODE_ON | XENPMU_MODE_PRIV)) ||
-             ((pmu_params.d.val & XENPMU_MODE_ON) &&
-              (pmu_params.d.val & XENPMU_MODE_PRIV)) )
-            return -EINVAL;
-
-        vpmu_mode = pmu_params.d.val;
-
-        if ( (vpmu_mode == XENPMU_MODE_OFF) || (vpmu_mode & XENPMU_MODE_PRIV) )
-            /*
-             * After this, the VPMU context will never be loaded during a
-             * context switch. Because PMU MSR accesses load the VPMU
-             * context, we don't allow them when the VPMU is off and, for
-             * non-privileged domains, when we are in privileged mode. (We
-             * do want these accesses to load the VPMU context for the
-             * control domain in this mode.)
-             */
-            vpmu_unload_all();
-
-        ret = 0;
-        break;
-
-    case XENPMU_mode_get:
-        pmu_params.d.val = vpmu_mode;
-        pmu_params.v.version.maj = XENPMU_VER_MAJ;
-        pmu_params.v.version.min = XENPMU_VER_MIN;
-        if ( copy_to_guest(arg, &pmu_params, 1) )
-            return -EFAULT;
-        ret = 0;
-        break;
-
-    case XENPMU_feature_set:
-        if ( !is_control_domain(current->domain) )
-            return -EPERM;
-
-        if ( copy_from_guest(&pmu_params, arg, 1) )
-            return -EFAULT;
-
-        if ( pmu_params.d.val & ~XENPMU_FEATURE_INTEL_BTS )
-            return -EINVAL;
-
-        vpmu_features = pmu_params.d.val;
-
-        ret = 0;
-        break;
-
-    case XENPMU_feature_get:
-        pmu_params.d.val = vpmu_mode;
-        if ( copy_to_guest(arg, &pmu_params, 1) )
-            return -EFAULT;
-        ret = 0;
-        break;
-
-    case XENPMU_init:
-        if ( copy_from_guest(&pmu_params, arg, 1) )
-            return -EFAULT;
-        ret = pvpmu_init(current->domain, &pmu_params);
-        break;
-
-    case XENPMU_finish:
-        if ( copy_from_guest(&pmu_params, arg, 1) )
-            return -EFAULT;
-        pvpmu_finish(current->domain, &pmu_params);
-        break;
-
-    case XENPMU_lvtpc_set:
-        if ( current->arch.vpmu.xenpmu_data == NULL )
-            return -EINVAL;
-        vpmu_lvtpc_update(current->arch.vpmu.xenpmu_data->pmu.l.lapic_lvtpc);
-        ret = 0;
-        break;
-    case XENPMU_flush:
-        current->arch.vpmu.xenpmu_data->pmu_flags &= ~PMU_CACHED;
-        vpmu_lvtpc_update(current->arch.vpmu.xenpmu_data->pmu.l.lapic_lvtpc);
-        vpmu_load(current);
-        ret = 0;
-        break;
-    }
-
-    return ret;
-}
diff --git a/xen/arch/x86/oprofile/op_model_ppro.c b/xen/arch/x86/oprofile/op_model_ppro.c
index 5aae2e7..bf5d9a5 100644
--- a/xen/arch/x86/oprofile/op_model_ppro.c
+++ b/xen/arch/x86/oprofile/op_model_ppro.c
@@ -19,7 +19,7 @@
 #include <asm/processor.h>
 #include <asm/regs.h>
 #include <asm/current.h>
-#include <asm/hvm/vpmu.h>
+#include <asm/vpmu.h>
 
 #include "op_x86_model.h"
 #include "op_counter.h"
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 8206662..ced38ad 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -72,7 +72,7 @@
 #include <asm/apic.h>
 #include <asm/mc146818rtc.h>
 #include <asm/hpet.h>
-#include <asm/hvm/vpmu.h>
+#include <asm/vpmu.h>
 #include <public/arch-x86/cpuid.h>
 #include <xsm/xsm.h>
 
diff --git a/xen/arch/x86/vpmu.c b/xen/arch/x86/vpmu.c
new file mode 100644
index 0000000..8977cf3
--- /dev/null
+++ b/xen/arch/x86/vpmu.c
@@ -0,0 +1,750 @@
+/*
+ * vpmu.c: PMU virtualization for HVM and PV domains.
+ *
+ * Copyright (c) 2007, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
+ * Place - Suite 330, Boston, MA 02111-1307 USA.
+ *
+ * Author: Haitao Shan <haitao.shan@intel.com>
+ */
+#include <xen/config.h>
+#include <xen/sched.h>
+#include <xen/xenoprof.h>
+#include <xen/event.h>
+#include <xen/softirq.h>
+#include <xen/hypercall.h>
+#include <xen/guest_access.h>
+#include <asm/regs.h>
+#include <asm/types.h>
+#include <asm/msr.h>
+#include <asm/p2m.h>
+#include <asm/hvm/support.h>
+#include <asm/hvm/vmx/vmx.h>
+#include <asm/hvm/vmx/vmcs.h>
+#include <asm/vpmu.h>
+#include <asm/hvm/svm/svm.h>
+#include <asm/hvm/svm/vmcb.h>
+#include <asm/apic.h>
+#include <asm/nmi.h>
+#include <public/pmu.h>
+
+/*
+ * "vpmu" :     vpmu generally enabled
+ * "vpmu=off" : vpmu generally disabled
+ * "vpmu=bts" : vpmu enabled and Intel BTS feature switched on.
+ */
+uint64_t __read_mostly vpmu_mode = XENPMU_MODE_OFF;
+uint64_t __read_mostly vpmu_features = 0;
+static void parse_vpmu_param(char *s);
+custom_param("vpmu", parse_vpmu_param);
+
+static void pmu_softnmi(void);
+
+static DEFINE_PER_CPU(struct vcpu *, last_vcpu);
+static DEFINE_PER_CPU(struct vcpu *, sampled_vcpu);
+
+static uint32_t __read_mostly vpmu_interrupt_type = PMU_APIC_VECTOR;
+
+static void __init parse_vpmu_param(char *s)
+{
+    char *ss;
+
+    vpmu_mode = XENPMU_MODE_ON;
+    if ( *s == '\0' )
+        return;
+
+    do {
+        ss = strchr(s, ',');
+        if ( ss )
+            *ss = '\0';
+
+        switch ( parse_bool(s) )
+        {
+        case 0:
+            vpmu_mode = XENPMU_MODE_OFF;
+            return;
+        case -1:
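+            /* Not a plain boolean: check the remaining option keywords. */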
+            if ( !strcmp(s, "nmi") )
+                vpmu_interrupt_type = APIC_DM_NMI;
+            else if ( !strcmp(s, "bts") )
+                vpmu_features |= XENPMU_FEATURE_INTEL_BTS;
+            else if ( !strcmp(s, "priv") )
+            {
+                vpmu_mode &= ~XENPMU_MODE_ON;
+                vpmu_mode |= XENPMU_MODE_PRIV;
+            }
+            else
+            {
+                printk("VPMU: unknown flag: %s - vpmu disabled!\n", s);
+                vpmu_mode = XENPMU_MODE_OFF;
+                return;
+            }
+        default:
+            break;
+        }
+
+        s = ss + 1;
+    } while ( ss );
+}
+
+void vpmu_lvtpc_update(uint32_t val)
+{
+    struct vpmu_struct *vpmu = vcpu_vpmu(current);
+
+    vpmu->hw_lapic_lvtpc = vpmu_interrupt_type | (val & APIC_LVT_MASKED);
+
+    /* Postpone APIC updates for PV(H) guests if PMU interrupt is pending */
+    if ( is_hvm_domain(current->domain) ||
+         !(current->arch.vpmu.xenpmu_data &&
+           (current->arch.vpmu.xenpmu_data->pmu_flags & PMU_CACHED)) )
+        apic_write(APIC_LVTPC, vpmu->hw_lapic_lvtpc);
+}
+
+static void vpmu_send_interrupt(struct vcpu *v)
+{
+    struct vlapic *vlapic;
+    u32 vlapic_lvtpc;
+
+    ASSERT( is_hvm_vcpu(v) );
+
+    vlapic = vcpu_vlapic(v);
+    if ( !is_vlapic_lvtpc_enabled(vlapic) )
+        return;
+
+    vlapic_lvtpc = vlapic_get_reg(vlapic, APIC_LVTPC);
+    if ( GET_APIC_DELIVERY_MODE(vlapic_lvtpc) == APIC_MODE_FIXED )
+        vlapic_set_irq(vcpu_vlapic(v), vlapic_lvtpc & APIC_VECTOR_MASK, 0);
+    else
+        v->nmi_pending = 1;
+}
+
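+/*
+ * Dispatch a guest PMU MSR access to the vendor-specific handler.
+ * Returns nonzero if the VPMU consumed the access; zero tells the
+ * caller to fall back to default MSR handling.
+ */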
+int vpmu_do_msr(unsigned int msr, uint64_t *msr_content, uint8_t rw)
+{
+    struct vpmu_struct *vpmu = vcpu_vpmu(current);
+
+    if ( (vpmu_mode == XENPMU_MODE_OFF) ||
+         ((vpmu_mode & XENPMU_MODE_PRIV) &&
+          !is_hardware_domain(current->domain)) )
+        return 0;
+
+    ASSERT((rw == VPMU_MSR_READ) || (rw == VPMU_MSR_WRITE));
+
+    if ( vpmu->arch_vpmu_ops )
+    {
+        int ret;
+
+        if ( (rw == VPMU_MSR_READ) && vpmu->arch_vpmu_ops->do_rdmsr )
+            ret = vpmu->arch_vpmu_ops->do_rdmsr(msr, msr_content);
+        else if ( vpmu->arch_vpmu_ops->do_wrmsr )
+            ret = vpmu->arch_vpmu_ops->do_wrmsr(msr, *msr_content);
+        else
+            return 0;
+
+        /*
+         * We may have received a PMU interrupt while handling MSR access
+         * and since do_wr/rdmsr may load VPMU context we should save
+         * (and unload) it again.
+         */
+        if ( !is_hvm_domain(current->domain) &&
+             (current->arch.vpmu.xenpmu_data->pmu_flags & PMU_CACHED) )
+        {
+            vpmu_set(vpmu, VPMU_CONTEXT_SAVE);
+            vpmu->arch_vpmu_ops->arch_vpmu_save(current);
+            vpmu_reset(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED);
+        }
+
+        return ret;
+    }
+
+    return 0;
+}
+
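+/*
+ * Pick the hardware domain vcpu that a PMU sample taken on this pcpu
+ * should be delivered to.
+ */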
+static struct vcpu *choose_dom0_vcpu(void)
+{
+    struct vcpu *v;
+    unsigned idx = smp_processor_id() % hardware_domain->max_vcpus;
+
+    v = hardware_domain->vcpu[idx];
+
+    /*
+     * If the index is not populated, search downwards in the vcpu array
+     * until a valid vcpu is found.
+     */
+    while ( !v && idx-- )
+        v = hardware_domain->vcpu[idx];
+
+    return v;
+}
+
+/* This routine may be called in NMI context */
+int vpmu_do_interrupt(struct cpu_user_regs *regs)
+{
+    struct vcpu *v = current;
+    struct vpmu_struct *vpmu;
+
+    /*
+     * dom0 will handle interrupt for special domains (e.g. idle domain) or,
+     * in XENPMU_MODE_PRIV, for everyone.
+     */
+    if ( (vpmu_mode & XENPMU_MODE_PRIV) ||
+         (v->domain->domain_id >= DOMID_FIRST_RESERVED) )
+    {
+        v = choose_dom0_vcpu();
+        if ( !v )
+            return 0;
+    }
+
+    vpmu = vcpu_vpmu(v);
+    if ( !is_hvm_domain(v->domain) || (vpmu_mode & XENPMU_MODE_PRIV) )
+    {
+        /* PV(H) guest or dom0 is doing system profiling */
+        const struct cpu_user_regs *gregs;
+        int err;
+
+        if ( v->arch.vpmu.xenpmu_data->pmu_flags & PMU_CACHED )
+            return 1;
+
+        if ( !(vpmu_mode & XENPMU_MODE_PRIV) &&
+             is_pvh_domain(current->domain) &&
+             !vpmu->arch_vpmu_ops->do_interrupt(regs) )
+            return 0;
+
+        /* PV guest will be reading PMU MSRs from xenpmu_data */
+        vpmu_set(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED);
+        err = vpmu->arch_vpmu_ops->arch_vpmu_save(v);
+        vpmu_reset(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED);
+
+        if ( !is_hvm_domain(current->domain) )
+        {
+            /* Store appropriate registers in xenpmu_data */
+            if ( is_pv_32bit_domain(current->domain) )
+            {
+                /*
+                 * 32-bit dom0 cannot process Xen's addresses (which are
+                 * 64-bit) and therefore we treat it the same way as a
+                 * non-privileged 32-bit PV domain.
+                 */
+                struct compat_cpu_user_regs *cmp;
+
+                gregs = guest_cpu_user_regs();
+
+                cmp = (void *)&v->arch.vpmu.xenpmu_data->pmu.r.regs;
+                XLAT_cpu_user_regs(cmp, gregs);
+
+                /* Adjust RPL for kernel mode */
+                if ((cmp->cs & 3) == 1)
+                    cmp->cs &= ~3;
+            }
+            else
+            {
+                if ( !is_hardware_domain(current->domain) &&
+                     !is_idle_vcpu(current) )
+                {
+                    /* 64-bit unprivileged PV(H) guest */
+                    gregs = guest_cpu_user_regs();
+                    memcpy(&v->arch.vpmu.xenpmu_data->pmu.r.regs,
+                           gregs, sizeof(struct cpu_user_regs));
+                }
+                else
+                    memcpy(&v->arch.vpmu.xenpmu_data->pmu.r.regs,
+                           regs, sizeof(struct cpu_user_regs));
+
+                if ( !is_pvh_domain(current->domain) )
+                {
+                    if ( current->arch.flags & TF_kernel_mode )
+                        v->arch.vpmu.xenpmu_data->pmu.r.regs.cs &= ~3;
+                }
+                else if ( !(vpmu_interrupt_type & APIC_DM_NMI) )
+                {
+                    /* Unsafe in NMI context, defer to softint later */
+                    struct segment_register seg_cs;
+
+                    hvm_get_segment_register(current, x86_seg_cs, &seg_cs);
+                    v->arch.vpmu.xenpmu_data->pmu.r.regs.cs = seg_cs.sel;
+                }
+            }
+        }
+        else
+        {
+            /* HVM guest */
+            struct segment_register seg_cs;
+
+            gregs = guest_cpu_user_regs();
+            memcpy(&v->arch.vpmu.xenpmu_data->pmu.r.regs,
+                   gregs, sizeof(struct cpu_user_regs));
+
+            /* This is unsafe in NMI context, we'll do it in softint handler */
+            if ( !(vpmu_interrupt_type & APIC_DM_NMI ) )
+            {
+                hvm_get_segment_register(current, x86_seg_cs, &seg_cs);
+                v->arch.vpmu.xenpmu_data->pmu.r.regs.cs = seg_cs.sel;
+            }
+        }
+
+        v->arch.vpmu.xenpmu_data->domain_id = current->domain->domain_id;
+        v->arch.vpmu.xenpmu_data->vcpu_id = current->vcpu_id;
+        v->arch.vpmu.xenpmu_data->pcpu_id = smp_processor_id();
+
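+        /*
+         * Mark the sample as cached and mask PMU interrupts until the
+         * guest has consumed the sample and issued XENPMU_flush.
+         */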
+        v->arch.vpmu.xenpmu_data->pmu_flags |= PMU_CACHED;
+        apic_write(APIC_LVTPC, vpmu->hw_lapic_lvtpc | APIC_LVT_MASKED);
+        vpmu->hw_lapic_lvtpc |= APIC_LVT_MASKED;
+
+        if ( vpmu_interrupt_type & APIC_DM_NMI )
+        {
+            per_cpu(sampled_vcpu, smp_processor_id()) = current;
+            raise_softirq(PMU_SOFTIRQ);
+        }
+        else
+            send_guest_vcpu_virq(v, VIRQ_XENPMU);
+
+        return 1;
+    }
+
+    if ( vpmu->arch_vpmu_ops )
+    {
+        if ( !vpmu->arch_vpmu_ops->do_interrupt(regs) )
+            return 0;
+
+        if ( vpmu_interrupt_type & APIC_DM_NMI )
+        {
+            per_cpu(sampled_vcpu, smp_processor_id()) = current;
+            raise_softirq(PMU_SOFTIRQ);
+        }
+        else
+            vpmu_send_interrupt(v);
+
+        return 1;
+    }
+
+    return 0;
+}
+
+void vpmu_do_cpuid(unsigned int input,
+                   unsigned int *eax, unsigned int *ebx,
+                   unsigned int *ecx, unsigned int *edx)
+{
+    struct vpmu_struct *vpmu = vcpu_vpmu(current);
+
+    if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->do_cpuid )
+        vpmu->arch_vpmu_ops->do_cpuid(input, eax, ebx, ecx, edx);
+}
+
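+/*
+ * Save a vcpu's VPMU context. May run locally or, via IPI, on the pcpu
+ * where that vcpu's context was last loaded.
+ */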
+static void vpmu_save_force(void *arg)
+{
+    struct vcpu *v = (struct vcpu *)arg;
+    struct vpmu_struct *vpmu = vcpu_vpmu(v);
+
+    if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) )
+        return;
+
+    if ( vpmu->arch_vpmu_ops )
+        (void)vpmu->arch_vpmu_ops->arch_vpmu_save(v);
+
+    vpmu_reset(vpmu, VPMU_CONTEXT_SAVE);
+
+    per_cpu(last_vcpu, smp_processor_id()) = NULL;
+
+    pmu_softnmi();
+}
+
+void vpmu_save(struct vcpu *v)
+{
+    struct vpmu_struct *vpmu = vcpu_vpmu(v);
+    int pcpu = smp_processor_id();
+
+    if ( !vpmu_are_all_set(vpmu, VPMU_CONTEXT_ALLOCATED | VPMU_CONTEXT_LOADED) )
+       return;
+
+    vpmu->last_pcpu = pcpu;
+    per_cpu(last_vcpu, pcpu) = v;
+
+    if ( vpmu->arch_vpmu_ops )
+        if ( vpmu->arch_vpmu_ops->arch_vpmu_save(v) )
+            vpmu_reset(vpmu, VPMU_CONTEXT_LOADED);
+
+    apic_write(APIC_LVTPC, vpmu_interrupt_type | APIC_LVT_MASKED);
+
+    /* Make sure there are no outstanding PMU NMIs */
+    pmu_softnmi();
+}
+
+void vpmu_load(struct vcpu *v)
+{
+    struct vpmu_struct *vpmu = vcpu_vpmu(v);
+    int pcpu = smp_processor_id();
+    struct vcpu *prev = NULL;
+
+    if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
+        return;
+
+    /* First time this VCPU is running here */
+    if ( vpmu->last_pcpu != pcpu )
+    {
+        /*
+         * Get the context from the last pcpu that we ran on. Note that if
+         * another VCPU is running there it must have saved this VCPU's
+         * context before starting to run (see below).
+         * There should be no race since the remote pcpu will disable
+         * interrupts before saving the context.
+         */
+        if ( vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) )
+        {
+            vpmu_set(vpmu, VPMU_CONTEXT_SAVE);
+            on_selected_cpus(cpumask_of(vpmu->last_pcpu),
+                             vpmu_save_force, (void *)v, 1);
+            vpmu_reset(vpmu, VPMU_CONTEXT_LOADED);
+        }
+    } 
+
+    /* Prevent forced context save from remote CPU */
+    local_irq_disable();
+
+    prev = per_cpu(last_vcpu, pcpu);
+
+    if ( prev != v && prev )
+    {
+        vpmu = vcpu_vpmu(prev);
+
+        /* Someone ran here before us */
+        vpmu_set(vpmu, VPMU_CONTEXT_SAVE);
+        vpmu_save_force(prev);
+        vpmu_reset(vpmu, VPMU_CONTEXT_LOADED);
+
+        pmu_softnmi();
+
+        vpmu = vcpu_vpmu(v);
+    }
+
+    local_irq_enable();
+
+    /* Load the PMU context immediately only when the PMU is counting. */
+    if ( !vpmu_is_set(vpmu, VPMU_RUNNING) ||
+         (!is_hvm_domain(v->domain) &&
+          vpmu->xenpmu_data->pmu_flags & PMU_CACHED) )
+        return;
+
+    if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->arch_vpmu_load )
+    {
+        apic_write_around(APIC_LVTPC, vpmu->hw_lapic_lvtpc);
+        /* Arch code needs to set VPMU_CONTEXT_LOADED */
+        vpmu->arch_vpmu_ops->arch_vpmu_load(v);
+    }
+}
+
+void vpmu_initialise(struct vcpu *v)
+{
+    struct vpmu_struct *vpmu = vcpu_vpmu(v);
+    uint8_t vendor = current_cpu_data.x86_vendor;
+
+    BUILD_BUG_ON((sizeof(struct xen_pmu_intel_ctxt) > XENPMU_CTXT_PAD_SZ) ||
+                 (sizeof(struct xen_pmu_amd_ctxt) > XENPMU_CTXT_PAD_SZ));
+
+    if ( vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
+        vpmu_destroy(v);
+    vpmu_clear(vpmu);
+    vpmu->context = NULL;
+
+    switch ( vendor )
+    {
+    case X86_VENDOR_AMD:
+        if ( svm_vpmu_initialise(v) != 0 )
+            vpmu_mode = XENPMU_MODE_OFF;
+        break;
+
+    case X86_VENDOR_INTEL:
+        if ( vmx_vpmu_initialise(v) != 0 )
+            vpmu_mode = XENPMU_MODE_OFF;
+        break;
+
+    default:
+        printk("VPMU: Initialization failed. "
+               "Unknown CPU vendor %d\n", vendor);
+        vpmu_mode = XENPMU_MODE_OFF;
+        break;
+    }
+}
+
+void vpmu_destroy(struct vcpu *v)
+{
+    struct vpmu_struct *vpmu = vcpu_vpmu(v);
+
+    if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->arch_vpmu_destroy )
+    {
+        /* Unload VPMU first. This will stop counters */
+        on_selected_cpus(cpumask_of(vcpu_vpmu(v)->last_pcpu),
+                         vpmu_save_force, v, 1);
+
+        vpmu->arch_vpmu_ops->arch_vpmu_destroy(v);
+    }
+}
+
+/* Dump some vpmu information to the console. Used in keyhandler dump_domains(). */
+void vpmu_dump(struct vcpu *v)
+{
+    struct vpmu_struct *vpmu = vcpu_vpmu(v);
+
+    if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->arch_vpmu_dump )
+        vpmu->arch_vpmu_ops->arch_vpmu_dump(v);
+}
+
+/* Unload VPMU contexts */
+static void vpmu_unload_all(void)
+{
+    struct domain *d;
+    struct vcpu *v;
+    struct vpmu_struct *vpmu;
+
+    for_each_domain( d )
+    {
+        for_each_vcpu ( d, v )
+        {
+            if ( v != current )
+                vcpu_pause(v);
+            vpmu = vcpu_vpmu(v);
+
+            if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) )
+            {
+                if ( v != current )
+                    vcpu_unpause(v);
+                continue;
+            }
+
+            vpmu_set(vpmu, VPMU_CONTEXT_SAVE);
+            on_selected_cpus(cpumask_of(vpmu->last_pcpu),
+                             vpmu_save_force, (void *)v, 1);
+            vpmu_reset(vpmu, VPMU_CONTEXT_LOADED);
+
+            if ( v != current )
+                vcpu_unpause(v);
+        }
+    }
+}
+
+/* Process the softirq set by PMU NMI handler */
+static void pmu_softnmi(void)
+{
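+    /*
+     * Routing policy, as implemented below: in privileged mode, or when
+     * the sample was taken in Xen/idle context, notify a hardware-domain
+     * VCPU; otherwise notify the sampled guest itself -- via a virtual
+     * PMU interrupt for HVM, via VIRQ_XENPMU for PV(H).
+     */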
+    struct vcpu *v, *sampled = this_cpu(sampled_vcpu);
+
+    if ( sampled == NULL )
+        return;
+    this_cpu(sampled_vcpu) = NULL;
+
+    if ( (vpmu_mode & XENPMU_MODE_PRIV) ||
+         (sampled->domain->domain_id >= DOMID_FIRST_RESERVED) )
+    {
+        v = choose_dom0_vcpu();
+        if ( !v )
+            return;
+    }
+    else
+    {
+        if ( is_hvm_domain(sampled->domain) )
+        {
+            vpmu_send_interrupt(sampled);
+            return;
+        }
+        v = sampled;
+    }
+
+    if ( has_hvm_container_domain(sampled->domain) )
+    {
+        struct segment_register seg_cs;
+
+        hvm_get_segment_register(sampled, x86_seg_cs, &seg_cs);
+        v->arch.vpmu.xenpmu_data->pmu.r.regs.cs = seg_cs.sel;
+    }
+
+    send_guest_vcpu_virq(v, VIRQ_XENPMU);
+}
+
+int pmu_nmi_interrupt(struct cpu_user_regs *regs, int cpu)
+{
+    return vpmu_do_interrupt(regs);
+}
+
+static int pvpmu_init(struct domain *d, xen_pmu_params_t *params)
+{
+    struct vcpu *v;
+    struct page_info *page;
+    uint64_t gfn;
+    static bool_t __read_mostly pvpmu_initted = 0;
+    static DEFINE_SPINLOCK(init_lock);
+
+    /* Don't dereference params before validating it. */
+    if ( !params || params->vcpu >= d->max_vcpus )
+        return -EINVAL;
+
+    gfn = params->d.val;
+
+    page = get_page_from_gfn(d, gfn, NULL, P2M_ALLOC);
+    if ( !page )
+        return -EINVAL;
+
+    if ( !get_page_type(page, PGT_writable_page) )
+    {
+        put_page(page);
+        return -EINVAL;
+    }
+
+    v = d->vcpu[params->vcpu];
+    v->arch.vpmu.xenpmu_data = __map_domain_page_global(page);
+    if ( !v->arch.vpmu.xenpmu_data )
+    {
+        put_page_and_type(page);
+        return -EINVAL;
+    }
+
+    spin_lock(&init_lock);
+
+    if ( !pvpmu_initted )
+    {
+        if ( reserve_lapic_nmi() == 0 )
+            set_nmi_callback(pmu_nmi_interrupt);
+        else
+        {
+            spin_unlock(&init_lock);
+            printk("Failed to reserve PMU NMI\n");
+            unmap_domain_page_global(v->arch.vpmu.xenpmu_data);
+            v->arch.vpmu.xenpmu_data = NULL;
+            put_page_and_type(page);
+            return -EBUSY;
+        }
+        open_softirq(PMU_SOFTIRQ, pmu_softnmi);
+
+        pvpmu_initted = 1;
+    }
+
+    spin_unlock(&init_lock);
+
+    vpmu_initialise(v);
+
+    return 0;
+}
+
+static void pvpmu_finish(struct domain *d, xen_pmu_params_t *params)
+{
+    struct vcpu *v;
+    uint64_t mfn;
+
+    if ( !params || params->vcpu >= d->max_vcpus )
+        return;
+
+    v = d->vcpu[params->vcpu];
+    if ( v != current )
+        vcpu_pause(v);
+
+    if ( v->arch.vpmu.xenpmu_data )
+    {
+        mfn = domain_page_map_to_mfn(v->arch.vpmu.xenpmu_data);
+        if ( mfn_valid(mfn) )
+        {
+            unmap_domain_page_global(v->arch.vpmu.xenpmu_data);
+            put_page_and_type(mfn_to_page(mfn));
+        }
+    }
+    vpmu_destroy(v);
+
+    if ( v != current )
+        vcpu_unpause(v);
+}
+
+long do_xenpmu_op(int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg)
+{
+    int ret = -EINVAL;
+    xen_pmu_params_t pmu_params;
+
+    switch ( op )
+    {
+    case XENPMU_mode_set:
+        if ( !is_control_domain(current->domain) )
+            return -EPERM;
+
+        if ( copy_from_guest(&pmu_params, arg, 1) )
+            return -EFAULT;
+
+        if ( (pmu_params.d.val & ~(XENPMU_MODE_ON | XENPMU_MODE_PRIV)) ||
+             ((pmu_params.d.val & XENPMU_MODE_ON) &&
+              (pmu_params.d.val & XENPMU_MODE_PRIV)) )
+            return -EINVAL;
+
+        vpmu_mode = pmu_params.d.val;
+
+        if ( (vpmu_mode == XENPMU_MODE_OFF) || (vpmu_mode & XENPMU_MODE_PRIV) )
+            /*
+             * After this the VPMU context will never be loaded during a
+             * context switch. Because PMU MSR accesses load the VPMU
+             * context, we don't allow them when the VPMU is off and, for
+             * non-privileged domains, when we are in privileged mode. (We
+             * do want these accesses to load the VPMU context for the
+             * control domain in this mode.)
+             */
+            vpmu_unload_all();
+
+        ret = 0;
+        break;
+
+    case XENPMU_mode_get:
+        pmu_params.d.val = vpmu_mode;
+        pmu_params.v.version.maj = XENPMU_VER_MAJ;
+        pmu_params.v.version.min = XENPMU_VER_MIN;
+        if ( copy_to_guest(arg, &pmu_params, 1) )
+            return -EFAULT;
+        ret = 0;
+        break;
+
+    case XENPMU_feature_set:
+        if ( !is_control_domain(current->domain) )
+            return -EPERM;
+
+        if ( copy_from_guest(&pmu_params, arg, 1) )
+            return -EFAULT;
+
+        if ( pmu_params.d.val & ~XENPMU_FEATURE_INTEL_BTS )
+            return -EINVAL;
+
+        vpmu_features = pmu_params.d.val;
+
+        ret = 0;
+        break;
+
+    case XENPMU_feature_get:
+        pmu_params.d.val = vpmu_features;
+        if ( copy_to_guest(arg, &pmu_params, 1) )
+            return -EFAULT;
+        ret = 0;
+        break;
+
+    case XENPMU_init:
+        if ( copy_from_guest(&pmu_params, arg, 1) )
+            return -EFAULT;
+        ret = pvpmu_init(current->domain, &pmu_params);
+        break;
+
+    case XENPMU_finish:
+        if ( copy_from_guest(&pmu_params, arg, 1) )
+            return -EFAULT;
+        pvpmu_finish(current->domain, &pmu_params);
+        break;
+
+    case XENPMU_lvtpc_set:
+        if ( current->arch.vpmu.xenpmu_data == NULL )
+            return -EINVAL;
+        vpmu_lvtpc_update(current->arch.vpmu.xenpmu_data->pmu.l.lapic_lvtpc);
+        ret = 0;
+        break;
+
+    case XENPMU_flush:
+        if ( current->arch.vpmu.xenpmu_data == NULL )
+            return -EINVAL;
+        current->arch.vpmu.xenpmu_data->pmu_flags &= ~PMU_CACHED;
+        vpmu_lvtpc_update(current->arch.vpmu.xenpmu_data->pmu.l.lapic_lvtpc);
+        vpmu_load(current);
+        ret = 0;
+        break;
+    }
+
+    return ret;
+}
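
(Editorial aside, not part of the diff: a sketch of the intended calling
sequence for the new hypercall, inferred from the handler above.
HYPERVISOR_xenpmu_op() stands in for whatever guest-side wrapper the
kernel patches provide; virt_to_gfn(), pmu_page and this_vcpu_id are
hypothetical; error handling is elided.)

    /* Hardware domain: switch to privileged profiling mode. */
    xen_pmu_params_t p = { .d.val = XENPMU_MODE_PRIV };

    if ( HYPERVISOR_xenpmu_op(XENPMU_mode_set, &p) )
        /* mode unchanged -- VPMU stays in its previous state */;

    /* Any PV(H) guest VCPU: register a shared PMU data page first. */
    xen_pmu_params_t init = { .vcpu = this_vcpu_id };

    init.d.val = virt_to_gfn(pmu_page);
    if ( HYPERVISOR_xenpmu_op(XENPMU_init, &init) )
        /* VPMU unavailable for this VCPU */;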
diff --git a/xen/arch/x86/vpmu_amd.c b/xen/arch/x86/vpmu_amd.c
new file mode 100644
index 0000000..c4a6e97
--- /dev/null
+++ b/xen/arch/x86/vpmu_amd.c
@@ -0,0 +1,512 @@
+/*
+ * vpmu_amd.c: AMD-specific PMU virtualization.
+ *
+ * Copyright (c) 2010, Advanced Micro Devices, Inc.
+ * Parts of this code are Copyright (c) 2007, Intel Corporation
+ *
+ * Author: Wei Wang <wei.wang2@amd.com>
+ * Tested by: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
+ * Place - Suite 330, Boston, MA 02111-1307 USA.
+ *
+ */
+
+#include <xen/config.h>
+#include <xen/xenoprof.h>
+#include <xen/hvm/save.h>
+#include <xen/sched.h>
+#include <xen/irq.h>
+#include <asm/apic.h>
+#include <asm/hvm/vlapic.h>
+#include <asm/vpmu.h>
+#include <public/pmu.h>
+
+#define MSR_F10H_EVNTSEL_GO_SHIFT   40
+#define MSR_F10H_EVNTSEL_EN_SHIFT   22
+#define MSR_F10H_COUNTER_LENGTH     48
+
+#define is_guest_mode(msr) ((msr) & (1ULL << MSR_F10H_EVNTSEL_GO_SHIFT))
+#define is_pmu_enabled(msr) ((msr) & (1ULL << MSR_F10H_EVNTSEL_EN_SHIFT))
+#define set_guest_mode(msr) ((msr) |= (1ULL << MSR_F10H_EVNTSEL_GO_SHIFT))
+#define is_overflowed(msr) (!((msr) & (1ULL << (MSR_F10H_COUNTER_LENGTH-1))))
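+/*
+ * In an event-select MSR, bit 22 is the enable bit and bit 40 the
+ * "guest only" (GO) bit; a counter is considered overflowed once its
+ * top implemented bit (bit 47) reads back as zero.
+ */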
+
+static unsigned int __read_mostly num_counters;
+static const u32 __read_mostly *counters;
+static const u32 __read_mostly *ctrls;
+static bool_t __read_mostly k7_counters_mirrored;
+
+#define F10H_NUM_COUNTERS   4
+#define F15H_NUM_COUNTERS   6
+#define AMD_MAX_COUNTERS    6
+
+/* PMU Counter MSRs. */
+static const u32 AMD_F10H_COUNTERS[] = {
+    MSR_K7_PERFCTR0,
+    MSR_K7_PERFCTR1,
+    MSR_K7_PERFCTR2,
+    MSR_K7_PERFCTR3
+};
+
+/* PMU Control MSRs. */
+static const u32 AMD_F10H_CTRLS[] = {
+    MSR_K7_EVNTSEL0,
+    MSR_K7_EVNTSEL1,
+    MSR_K7_EVNTSEL2,
+    MSR_K7_EVNTSEL3
+};
+
+static const u32 AMD_F15H_COUNTERS[] = {
+    MSR_AMD_FAM15H_PERFCTR0,
+    MSR_AMD_FAM15H_PERFCTR1,
+    MSR_AMD_FAM15H_PERFCTR2,
+    MSR_AMD_FAM15H_PERFCTR3,
+    MSR_AMD_FAM15H_PERFCTR4,
+    MSR_AMD_FAM15H_PERFCTR5
+};
+
+static const u32 AMD_F15H_CTRLS[] = {
+    MSR_AMD_FAM15H_EVNTSEL0,
+    MSR_AMD_FAM15H_EVNTSEL1,
+    MSR_AMD_FAM15H_EVNTSEL2,
+    MSR_AMD_FAM15H_EVNTSEL3,
+    MSR_AMD_FAM15H_EVNTSEL4,
+    MSR_AMD_FAM15H_EVNTSEL5
+};
+
+/* Use private context as a flag for MSR bitmap */
+#define msr_bitmap_on(vpmu)    { (vpmu)->priv_context = (void *)-1L; }
+#define msr_bitmap_off(vpmu)   { (vpmu)->priv_context = NULL; }
+#define is_msr_bitmap_on(vpmu) ((vpmu)->priv_context != NULL)
+
+static inline int get_pmu_reg_type(u32 addr)
+{
+    if ( (addr >= MSR_K7_EVNTSEL0) && (addr <= MSR_K7_EVNTSEL3) )
+        return MSR_TYPE_CTRL;
+
+    if ( (addr >= MSR_K7_PERFCTR0) && (addr <= MSR_K7_PERFCTR3) )
+        return MSR_TYPE_COUNTER;
+
+    if ( (addr >= MSR_AMD_FAM15H_EVNTSEL0) &&
+         (addr <= MSR_AMD_FAM15H_PERFCTR5) )
+    {
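+        /*
+         * Fam15h performance MSRs interleave controls and counters
+         * (EVNTSEL0, PERFCTR0, EVNTSEL1, PERFCTR1, ...), so within this
+         * range odd addresses are counters and even ones controls.
+         */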
+        if ( addr & 1 )
+            return MSR_TYPE_COUNTER;
+        else
+            return MSR_TYPE_CTRL;
+    }
+
+    /* unsupported registers */
+    return -1;
+}
+
+static inline u32 get_fam15h_addr(u32 addr)
+{
+    switch ( addr )
+    {
+    case MSR_K7_PERFCTR0:
+        return MSR_AMD_FAM15H_PERFCTR0;
+    case MSR_K7_PERFCTR1:
+        return MSR_AMD_FAM15H_PERFCTR1;
+    case MSR_K7_PERFCTR2:
+        return MSR_AMD_FAM15H_PERFCTR2;
+    case MSR_K7_PERFCTR3:
+        return MSR_AMD_FAM15H_PERFCTR3;
+    case MSR_K7_EVNTSEL0:
+        return MSR_AMD_FAM15H_EVNTSEL0;
+    case MSR_K7_EVNTSEL1:
+        return MSR_AMD_FAM15H_EVNTSEL1;
+    case MSR_K7_EVNTSEL2:
+        return MSR_AMD_FAM15H_EVNTSEL2;
+    case MSR_K7_EVNTSEL3:
+        return MSR_AMD_FAM15H_EVNTSEL3;
+    default:
+        break;
+    }
+
+    return addr;
+}
+
+static void amd_vpmu_set_msr_bitmap(struct vcpu *v)
+{
+    unsigned int i;
+    struct vpmu_struct *vpmu = vcpu_vpmu(v);
+
+    for ( i = 0; i < num_counters; i++ )
+    {
+        svm_intercept_msr(v, counters[i], MSR_INTERCEPT_NONE);
+        svm_intercept_msr(v, ctrls[i], MSR_INTERCEPT_WRITE);
+    }
+
+    msr_bitmap_on(vpmu);
+}
+
+static void amd_vpmu_unset_msr_bitmap(struct vcpu *v)
+{
+    unsigned int i;
+    struct vpmu_struct *vpmu = vcpu_vpmu(v);
+
+    for ( i = 0; i < num_counters; i++ )
+    {
+        svm_intercept_msr(v, counters[i], MSR_INTERCEPT_RW);
+        svm_intercept_msr(v, ctrls[i], MSR_INTERCEPT_RW);
+    }
+
+    msr_bitmap_off(vpmu);
+}
+
+static int amd_vpmu_do_interrupt(struct cpu_user_regs *regs)
+{
+    return 1;
+}
+
+static inline void context_load(struct vcpu *v)
+{
+    unsigned int i;
+    struct vpmu_struct *vpmu = vcpu_vpmu(v);
+    struct xen_pmu_amd_ctxt *ctxt = vpmu->context;
+    uint64_t *counter_regs = vpmu_reg_pointer(ctxt, counters);
+    uint64_t *ctrl_regs = vpmu_reg_pointer(ctxt, ctrls);
+
+    for ( i = 0; i < num_counters; i++ )
+    {
+        wrmsrl(counters[i], counter_regs[i]);
+        wrmsrl(ctrls[i], ctrl_regs[i]);
+    }
+}
+
+static void amd_vpmu_load(struct vcpu *v)
+{
+    struct vpmu_struct *vpmu = vcpu_vpmu(v);
+    struct xen_pmu_amd_ctxt *ctxt = vpmu->context;
+    uint64_t *ctrl_regs = vpmu_reg_pointer(ctxt, ctrls);
+
+    vpmu_reset(vpmu, VPMU_FROZEN);
+
+    if ( vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) )
+    {
+        unsigned int i;
+
+        for ( i = 0; i < num_counters; i++ )
+            wrmsrl(ctrls[i], ctrl_regs[i]);
+
+        return;
+    }
+
+    vpmu_set(vpmu, VPMU_CONTEXT_LOADED);
+
+    context_load(v);
+}
+
+static inline void context_save(struct vcpu *v)
+{
+    unsigned int i;
+    struct vpmu_struct *vpmu = vcpu_vpmu(v);
+    struct xen_pmu_amd_ctxt *ctxt = vpmu->context;
+    uint64_t *counter_regs = vpmu_reg_pointer(ctxt, counters);
+
+    /* No need to save controls -- they are saved in amd_vpmu_do_wrmsr */
+    for ( i = 0; i < num_counters; i++ )
+        rdmsrl(counters[i], counter_regs[i]);
+}
+
+/* Must be NMI-safe */
+static int amd_vpmu_save(struct vcpu *v)
+{
+    struct vpmu_struct *vpmu = vcpu_vpmu(v);
+    unsigned int i;
+
+    /*
+     * Stop the counters. If we came here via vpmu_save_force (i.e.
+     * when VPMU_CONTEXT_SAVE is set) counters are already stopped.
+     */
+    if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_SAVE) )
+    {
+        vpmu_set(vpmu, VPMU_FROZEN);
+
+        for ( i = 0; i < num_counters; i++ )
+            wrmsrl(ctrls[i], 0);
+
+        return 0;
+    }
+
+    if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) )
+        return 0;
+
+    context_save(v);
+
+    if ( !vpmu_is_set(vpmu, VPMU_RUNNING) &&
+         has_hvm_container_domain(v->domain) && is_msr_bitmap_on(vpmu) )
+        amd_vpmu_unset_msr_bitmap(v);
+
+    return 1;
+}
+
+static void context_update(unsigned int msr, u64 msr_content)
+{
+    unsigned int i;
+    struct vcpu *v = current;
+    struct vpmu_struct *vpmu = vcpu_vpmu(v);
+    struct xen_pmu_amd_ctxt *ctxt = vpmu->context;
+    uint64_t *counter_regs = vpmu_reg_pointer(ctxt, counters);
+    uint64_t *ctrl_regs = vpmu_reg_pointer(ctxt, ctrls);
+
+    if ( k7_counters_mirrored &&
+         ((msr >= MSR_K7_EVNTSEL0) && (msr <= MSR_K7_PERFCTR3)) )
+    {
+        msr = get_fam15h_addr(msr);
+    }
+
+    for ( i = 0; i < num_counters; i++ )
+    {
+        if ( msr == ctrls[i] )
+        {
+            ctrl_regs[i] = msr_content;
+            return;
+        }
+        else if ( msr == counters[i] )
+        {
+            counter_regs[i] = msr_content;
+            return;
+        }
+    }
+}
+
+static int amd_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
+{
+    struct vcpu *v = current;
+    struct vpmu_struct *vpmu = vcpu_vpmu(v);
+
+    /* For all counters, enable guest only mode for HVM guest */
+    if ( has_hvm_container_domain(v->domain) &&
+         (get_pmu_reg_type(msr) == MSR_TYPE_CTRL) &&
+         !(is_guest_mode(msr_content)) )
+    {
+        set_guest_mode(msr_content);
+    }
+
+    /* check if the first counter is enabled */
+    if ( (get_pmu_reg_type(msr) == MSR_TYPE_CTRL) &&
+        is_pmu_enabled(msr_content) && !vpmu_is_set(vpmu, VPMU_RUNNING) )
+    {
+        if ( !acquire_pmu_ownership(PMU_OWNER_HVM) )
+            return 1;
+        vpmu_set(vpmu, VPMU_RUNNING);
+
+        if ( has_hvm_container_domain(v->domain) && is_msr_bitmap_on(vpmu) )
+             amd_vpmu_set_msr_bitmap(v);
+    }
+
+    /* Stop saving & restoring if the guest stops the first counter */
+    if ( (get_pmu_reg_type(msr) == MSR_TYPE_CTRL) &&
+        (is_pmu_enabled(msr_content) == 0) && vpmu_is_set(vpmu, VPMU_RUNNING) )
+    {
+        vpmu_reset(vpmu, VPMU_RUNNING);
+        if ( has_hvm_container_domain(v->domain) && is_msr_bitmap_on(vpmu) )
+             amd_vpmu_unset_msr_bitmap(v);
+        release_pmu_ownship(PMU_OWNER_HVM);
+    }
+
+    if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED)
+        || vpmu_is_set(vpmu, VPMU_FROZEN) )
+    {
+        context_load(v);
+        vpmu_set(vpmu, VPMU_CONTEXT_LOADED);
+        vpmu_reset(vpmu, VPMU_FROZEN);
+    }
+
+    /* Update vpmu context immediately */
+    context_update(msr, msr_content);
+
+    /* Write to hw counters */
+    wrmsrl(msr, msr_content);
+    return 1;
+}
+
+static int amd_vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
+{
+    struct vcpu *v = current;
+    struct vpmu_struct *vpmu = vcpu_vpmu(v);
+
+    if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED)
+        || vpmu_is_set(vpmu, VPMU_FROZEN) )
+    {
+        context_load(v);
+        vpmu_set(vpmu, VPMU_CONTEXT_LOADED);
+        vpmu_reset(vpmu, VPMU_FROZEN);
+    }
+
+    rdmsrl(msr, *msr_content);
+
+    return 1;
+}
+
+static int amd_vpmu_initialise(struct vcpu *v)
+{
+    struct xen_pmu_amd_ctxt *ctxt;
+    struct vpmu_struct *vpmu = vcpu_vpmu(v);
+    uint8_t family = current_cpu_data.x86;
+
+    if ( vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
+        return 0;
+
+    if ( counters == NULL )
+    {
+        switch ( family )
+        {
+        case 0x15:
+            num_counters = F15H_NUM_COUNTERS;
+            counters = AMD_F15H_COUNTERS;
+            ctrls = AMD_F15H_CTRLS;
+            k7_counters_mirrored = 1;
+            break;
+        case 0x10:
+        case 0x12:
+        case 0x14:
+        case 0x16:
+        default:
+            num_counters = F10H_NUM_COUNTERS;
+            counters = AMD_F10H_COUNTERS;
+            ctrls = AMD_F10H_CTRLS;
+            k7_counters_mirrored = 0;
+            break;
+        }
+    }
+
+    if ( is_hvm_domain(v->domain) )
+    {
+        ctxt = xzalloc_bytes(sizeof(struct xen_pmu_amd_ctxt) +
+                             sizeof(uint64_t) * AMD_MAX_COUNTERS +
+                             sizeof(uint64_t) * AMD_MAX_COUNTERS);
+        if ( !ctxt )
+        {
+            gdprintk(XENLOG_WARNING, "Insufficient memory for PMU, "
+                     "PMU feature is unavailable on domain %d vcpu %d.\n",
+                     v->domain->domain_id, v->vcpu_id);
+            return -ENOMEM;
+        }
+    }
+    else
+        ctxt = &v->arch.vpmu.xenpmu_data->pmu.c.amd;
+
+    ctxt->counters = sizeof(struct xen_pmu_amd_ctxt);
+    ctxt->ctrls = ctxt->counters + sizeof(uint64_t) * AMD_MAX_COUNTERS;
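+    /*
+     * These fields are byte offsets from the start of the (possibly
+     * guest-shared) context: the struct header is followed by
+     * AMD_MAX_COUNTERS counter values and then AMD_MAX_COUNTERS control
+     * values; vpmu_reg_pointer() converts the offsets back to pointers.
+     */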
+
+    vpmu->context = ctxt;
+    vpmu->priv_context = NULL;
+    vpmu_set(vpmu, VPMU_CONTEXT_ALLOCATED);
+    return 0;
+}
+
+static void amd_vpmu_destroy(struct vcpu *v)
+{
+    struct vpmu_struct *vpmu = vcpu_vpmu(v);
+
+    if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
+        return;
+
+    if ( has_hvm_container_domain(v->domain) )
+    {
+        if ( is_msr_bitmap_on(vpmu) )
+            amd_vpmu_unset_msr_bitmap(v);
+
+        if ( is_hvm_domain(v->domain) )
+            xfree(vpmu->context);
+
+        release_pmu_ownship(PMU_OWNER_HVM);
+    }
+
+    vpmu->context = NULL;
+    vpmu_clear(vpmu);
+}
+
+/* VPMU part of the 'q' keyhandler */
+static void amd_vpmu_dump(const struct vcpu *v)
+{
+    const struct vpmu_struct *vpmu = vcpu_vpmu(v);
+    const struct xen_pmu_amd_ctxt *ctxt = vpmu->context;
+    uint64_t *counter_regs = vpmu_reg_pointer(ctxt, counters);
+    uint64_t *ctrl_regs = vpmu_reg_pointer(ctxt, ctrls);
+    unsigned int i;
+
+    printk("    VPMU state: 0x%x ", vpmu->flags);
+    if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
+    {
+         printk("\n");
+         return;
+    }
+
+    printk("(");
+    if ( vpmu_is_set(vpmu, VPMU_PASSIVE_DOMAIN_ALLOCATED) )
+        printk("PASSIVE_DOMAIN_ALLOCATED, ");
+    if ( vpmu_is_set(vpmu, VPMU_FROZEN) )
+        printk("FROZEN, ");
+    if ( vpmu_is_set(vpmu, VPMU_CONTEXT_SAVE) )
+        printk("SAVE, ");
+    if ( vpmu_is_set(vpmu, VPMU_RUNNING) )
+        printk("RUNNING, ");
+    if ( vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) )
+        printk("LOADED, ");
+    printk("ALLOCATED)\n");
+
+    for ( i = 0; i < num_counters; i++ )
+    {
+        uint64_t ctrl, cntr;
+
+        rdmsrl(ctrls[i], ctrl);
+        rdmsrl(counters[i], cntr);
+        printk("      %#x: %#lx (%#lx in HW)    %#x: %#lx (%#lx in HW)\n",
+               ctrls[i], ctrl_regs[i], ctrl,
+               counters[i], counter_regs[i], cntr);
+    }
+}
+
+struct arch_vpmu_ops amd_vpmu_ops = {
+    .do_wrmsr = amd_vpmu_do_wrmsr,
+    .do_rdmsr = amd_vpmu_do_rdmsr,
+    .do_interrupt = amd_vpmu_do_interrupt,
+    .arch_vpmu_destroy = amd_vpmu_destroy,
+    .arch_vpmu_save = amd_vpmu_save,
+    .arch_vpmu_load = amd_vpmu_load,
+    .arch_vpmu_dump = amd_vpmu_dump
+};
+
+int svm_vpmu_initialise(struct vcpu *v)
+{
+    struct vpmu_struct *vpmu = vcpu_vpmu(v);
+    uint8_t family = current_cpu_data.x86;
+    int ret = 0;
+
+    /* vpmu enabled? */
+    if ( vpmu_mode == XENPMU_MODE_OFF )
+        return 0;
+
+    switch ( family )
+    {
+    case 0x10:
+    case 0x12:
+    case 0x14:
+    case 0x15:
+    case 0x16:
+        ret = amd_vpmu_initialise(v);
+        if ( !ret )
+            vpmu->arch_vpmu_ops = &amd_vpmu_ops;
+        return ret;
+    }
+
+    printk("VPMU: Initialization failed. "
+           "AMD processor family %d has not "
+           "been supported\n", family);
+    return -EINVAL;
+}
+
diff --git a/xen/arch/x86/vpmu_intel.c b/xen/arch/x86/vpmu_intel.c
new file mode 100644
index 0000000..7b4064b
--- /dev/null
+++ b/xen/arch/x86/vpmu_intel.c
@@ -0,0 +1,954 @@
+/*
+ * vpmu_intel.c: CORE 2 specific PMU virtualization.
+ *
+ * Copyright (c) 2007, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
+ * Place - Suite 330, Boston, MA 02111-1307 USA.
+ *
+ * Author: Haitao Shan <haitao.shan@intel.com>
+ */
+
+#include <xen/config.h>
+#include <xen/sched.h>
+#include <xen/xenoprof.h>
+#include <xen/irq.h>
+#include <asm/system.h>
+#include <asm/regs.h>
+#include <asm/types.h>
+#include <asm/apic.h>
+#include <asm/traps.h>
+#include <asm/msr.h>
+#include <asm/msr-index.h>
+#include <asm/hvm/support.h>
+#include <asm/hvm/vlapic.h>
+#include <asm/hvm/vmx/vmx.h>
+#include <asm/hvm/vmx/vmcs.h>
+#include <public/sched.h>
+#include <public/hvm/save.h>
+#include <public/pmu.h>
+#include <asm/vpmu.h>
+
+/*
+ * See Intel SDM Vol 2a Instruction Set Reference chapter 3 for CPUID
+ * instruction.
+ * cpuid 0xa - Architectural Performance Monitoring Leaf
+ * Register eax
+ */
+#define PMU_VERSION_SHIFT        0  /* Version ID */
+#define PMU_VERSION_BITS         8  /* 8 bits 0..7 */
+#define PMU_VERSION_MASK         (((1 << PMU_VERSION_BITS) - 1) << PMU_VERSION_SHIFT)
+
+#define PMU_GENERAL_NR_SHIFT     8  /* Number of general pmu registers */
+#define PMU_GENERAL_NR_BITS      8  /* 8 bits 8..15 */
+#define PMU_GENERAL_NR_MASK      (((1 << PMU_GENERAL_NR_BITS) - 1) << PMU_GENERAL_NR_SHIFT)
+
+#define PMU_GENERAL_WIDTH_SHIFT 16  /* Width of general pmu registers */
+#define PMU_GENERAL_WIDTH_BITS   8  /* 8 bits 16..23 */
+#define PMU_GENERAL_WIDTH_MASK  (((1 << PMU_GENERAL_WIDTH_BITS) - 1) << PMU_GENERAL_WIDTH_SHIFT)
+/* Register edx */
+#define PMU_FIXED_NR_SHIFT       0  /* Number of fixed pmu registers */
+#define PMU_FIXED_NR_BITS        5  /* 5 bits 0..4 */
+#define PMU_FIXED_NR_MASK        (((1 << PMU_FIXED_NR_BITS) -1) << PMU_FIXED_NR_SHIFT)
+
+#define PMU_FIXED_WIDTH_SHIFT    5  /* Width of fixed pmu registers */
+#define PMU_FIXED_WIDTH_BITS     8  /* 8 bits 5..12 */
+#define PMU_FIXED_WIDTH_MASK     (((1 << PMU_FIXED_WIDTH_BITS) -1) << PMU_FIXED_WIDTH_SHIFT)
+
+/* Alias registers (0x4c1) for full-width writes to PMCs */
+#define MSR_PMC_ALIAS_MASK       (~(MSR_IA32_PERFCTR0 ^ MSR_IA32_A_PERFCTR0))
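+/*
+ * E.g. MSR_IA32_PERFCTR0 (0xc1) and MSR_IA32_A_PERFCTR0 (0x4c1) differ
+ * only in bit 10, so masking that bit off maps a full-width alias onto
+ * the same counter index as its legacy counterpart.
+ */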
+static bool_t __read_mostly full_width_write;
+
+/* Intel-specific VPMU features */
+#define VPMU_CPU_HAS_DS                     0x100 /* Has Debug Store */
+#define VPMU_CPU_HAS_BTS                    0x200 /* Has Branch Trace Store */
+
+/*
+ * MSR_CORE_PERF_FIXED_CTR_CTRL contains the configuration of all fixed
+ * counters. 4 bits for every counter.
+ */
+#define FIXED_CTR_CTRL_BITS 4
+#define FIXED_CTR_CTRL_MASK ((1 << FIXED_CTR_CTRL_BITS) - 1)
+
+/* Number of general-purpose and fixed performance counters */
+static unsigned int __read_mostly arch_pmc_cnt, fixed_pmc_cnt;
+
+/*
+ * QUIRK to workaround an issue on various family 6 cpus.
+ * The issue leads to endless PMC interrupt loops on the processor.
+ * If the interrupt handler is running and a pmc reaches the value 0, this
+ * value remains forever and it triggers immediately a new interrupt after
+ * finishing the handler.
+ * A workaround is to read all flagged counters and, if a value is 0, write
+ * 1 (or any other non-zero value) into it.
+ * No erratum has been published, and the real cause of this behaviour is
+ * unknown.
+ */
+bool_t __read_mostly is_pmc_quirk;
+
+static void check_pmc_quirk(void)
+{
+    if ( current_cpu_data.x86 == 6 )
+        is_pmc_quirk = 1;
+    else
+        is_pmc_quirk = 0;
+}
+
+static void handle_pmc_quirk(u64 msr_content)
+{
+    int i;
+    u64 val;
+
+    if ( !is_pmc_quirk )
+        return;
+
+    val = msr_content;
+    for ( i = 0; i < arch_pmc_cnt; i++ )
+    {
+        if ( val & 0x1 )
+        {
+            u64 cnt;
+            rdmsrl(MSR_P6_PERFCTR0 + i, cnt);
+            if ( cnt == 0 )
+                wrmsrl(MSR_P6_PERFCTR0 + i, 1);
+        }
+        val >>= 1;
+    }
+    val = msr_content >> 32;
+    for ( i = 0; i < fixed_pmc_cnt; i++ )
+    {
+        if ( val & 0x1 )
+        {
+            u64 cnt;
+            rdmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, cnt);
+            if ( cnt == 0 )
+                wrmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, 1);
+        }
+        val >>= 1;
+    }
+}
+
+/*
+ * Read the number of general counters via CPUID.EAX[0xa].EAX[8..15]
+ */
+static int core2_get_arch_pmc_count(void)
+{
+    u32 eax;
+
+    eax = cpuid_eax(0xa);
+    return ( (eax & PMU_GENERAL_NR_MASK) >> PMU_GENERAL_NR_SHIFT );
+}
+
+/*
+ * Read the number of fixed counters via CPUID.EAX[0xa].EDX[0..4]
+ */
+static int core2_get_fixed_pmc_count(void)
+{
+    u32 edx;
+
+    edx = cpuid_edx(0xa);
+    return ( (edx & PMU_FIXED_NR_MASK) >> PMU_FIXED_NR_SHIFT );
+}
+
+/* edx bits 5-12: Bit width of fixed-function performance counters  */
+static int core2_get_bitwidth_fix_count(void)
+{
+    u32 edx;
+
+    edx = cpuid_edx(0xa);
+    return ( (edx & PMU_FIXED_WIDTH_MASK) >> PMU_FIXED_WIDTH_SHIFT );
+}
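+
+/*
+ * Worked example: a leaf-0xa EAX value of 0x07300403 (typical for
+ * Nehalem-class CPUs; shown for illustration only) decodes to version 3
+ * (bits 0-7), 4 general-purpose counters (bits 8-15) and a 48-bit
+ * counter width (bits 16-23).
+ */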
+
+static int is_core2_vpmu_msr(u32 msr_index, int *type, int *index)
+{
+    int i;
+    u32 msr_index_pmc;
+
+    for ( i = 0; i < fixed_pmc_cnt; i++ )
+    {
+        if ( msr_index == MSR_CORE_PERF_FIXED_CTR0 + i )
+        {
+            *type = MSR_TYPE_COUNTER;
+            *index = i;
+            return 1;
+        }
+    }
+
+    if ( (msr_index == MSR_CORE_PERF_FIXED_CTR_CTRL ) ||
+        (msr_index == MSR_IA32_DS_AREA) ||
+        (msr_index == MSR_IA32_PEBS_ENABLE) )
+    {
+        *type = MSR_TYPE_CTRL;
+        return 1;
+    }
+
+    if ( (msr_index == MSR_CORE_PERF_GLOBAL_CTRL) ||
+         (msr_index == MSR_CORE_PERF_GLOBAL_STATUS) ||
+         (msr_index == MSR_CORE_PERF_GLOBAL_OVF_CTRL) )
+    {
+        *type = MSR_TYPE_GLOBAL;
+        return 1;
+    }
+
+    msr_index_pmc = msr_index & MSR_PMC_ALIAS_MASK;
+    if ( (msr_index_pmc >= MSR_IA32_PERFCTR0) &&
+         (msr_index_pmc < (MSR_IA32_PERFCTR0 + arch_pmc_cnt)) )
+    {
+        *type = MSR_TYPE_ARCH_COUNTER;
+        *index = msr_index_pmc - MSR_IA32_PERFCTR0;
+        return 1;
+    }
+
+    if ( (msr_index >= MSR_P6_EVNTSEL0) &&
+         (msr_index < (MSR_P6_EVNTSEL0 + arch_pmc_cnt)) )
+    {
+        *type = MSR_TYPE_ARCH_CTRL;
+        *index = msr_index - MSR_P6_EVNTSEL0;
+        return 1;
+    }
+
+    return 0;
+}
+
+#define msraddr_to_bitpos(x) (((x)&0xffff) + ((x)>>31)*0x2000)
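+/*
+ * The VMX MSR bitmap covers MSRs 0x0-0x1fff and 0xc0000000-0xc0001fff:
+ * read intercepts occupy the first 0x800 bytes and write intercepts the
+ * second, with the high MSR range offset by 0x2000 bits -- which is what
+ * msraddr_to_bitpos() and the '+ 0x800/BYTES_PER_LONG' terms encode.
+ */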
+static void core2_vpmu_set_msr_bitmap(unsigned long *msr_bitmap)
+{
+    int i;
+
+    /* Allow Read/Write PMU Counters MSR Directly. */
+    for ( i = 0; i < fixed_pmc_cnt; i++ )
+    {
+        clear_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR0 + i), msr_bitmap);
+        clear_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR0 + i),
+                  msr_bitmap + 0x800/BYTES_PER_LONG);
+    }
+    for ( i = 0; i < arch_pmc_cnt; i++ )
+    {
+        clear_bit(msraddr_to_bitpos(MSR_IA32_PERFCTR0+i), msr_bitmap);
+        clear_bit(msraddr_to_bitpos(MSR_IA32_PERFCTR0+i),
+                  msr_bitmap + 0x800/BYTES_PER_LONG);
+
+        if ( full_width_write )
+        {
+            clear_bit(msraddr_to_bitpos(MSR_IA32_A_PERFCTR0 + i), msr_bitmap);
+            clear_bit(msraddr_to_bitpos(MSR_IA32_A_PERFCTR0 + i),
+                      msr_bitmap + 0x800/BYTES_PER_LONG);
+        }
+    }
+
+    /* Allow Read PMU Non-global Controls Directly. */
+    for ( i = 0; i < arch_pmc_cnt; i++ )
+         clear_bit(msraddr_to_bitpos(MSR_P6_EVNTSEL0 + i), msr_bitmap);
+
+    clear_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR_CTRL), msr_bitmap);
+    clear_bit(msraddr_to_bitpos(MSR_IA32_PEBS_ENABLE), msr_bitmap);
+    clear_bit(msraddr_to_bitpos(MSR_IA32_DS_AREA), msr_bitmap);
+}
+
+static void core2_vpmu_unset_msr_bitmap(unsigned long *msr_bitmap)
+{
+    int i;
+
+    for ( i = 0; i < fixed_pmc_cnt; i++ )
+    {
+        set_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR0 + i), msr_bitmap);
+        set_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR0 + i),
+                msr_bitmap + 0x800/BYTES_PER_LONG);
+    }
+    for ( i = 0; i < arch_pmc_cnt; i++ )
+    {
+        set_bit(msraddr_to_bitpos(MSR_IA32_PERFCTR0 + i), msr_bitmap);
+        set_bit(msraddr_to_bitpos(MSR_IA32_PERFCTR0 + i),
+                msr_bitmap + 0x800/BYTES_PER_LONG);
+
+        if ( full_width_write )
+        {
+            set_bit(msraddr_to_bitpos(MSR_IA32_A_PERFCTR0 + i), msr_bitmap);
+            set_bit(msraddr_to_bitpos(MSR_IA32_A_PERFCTR0 + i),
+                    msr_bitmap + 0x800/BYTES_PER_LONG);
+        }
+    }
+
+    for ( i = 0; i < arch_pmc_cnt; i++ )
+        set_bit(msraddr_to_bitpos(MSR_P6_EVNTSEL0 + i), msr_bitmap);
+
+    set_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR_CTRL), msr_bitmap);
+    set_bit(msraddr_to_bitpos(MSR_IA32_PEBS_ENABLE), msr_bitmap);
+    set_bit(msraddr_to_bitpos(MSR_IA32_DS_AREA), msr_bitmap);
+}
+
+static inline void __core2_vpmu_save(struct vcpu *v)
+{
+    int i;
+    struct xen_pmu_intel_ctxt *core2_vpmu_cxt = vcpu_vpmu(v)->context;
+    uint64_t *fixed_counters = vpmu_reg_pointer(core2_vpmu_cxt, fixed_counters);
+    struct xen_pmu_cntr_pair *xen_pmu_cntr_pair =
+        vpmu_reg_pointer(core2_vpmu_cxt, arch_counters);
+
+    for ( i = 0; i < fixed_pmc_cnt; i++ )
+        rdmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, fixed_counters[i]);
+    for ( i = 0; i < arch_pmc_cnt; i++ )
+        rdmsrl(MSR_IA32_PERFCTR0 + i, xen_pmu_cntr_pair[i].counter);
+
+    if ( !has_hvm_container_domain(v->domain) )
+        rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, core2_vpmu_cxt->global_status);
+}
+
+/* Must be NMI-safe */
+static int core2_vpmu_save(struct vcpu *v)
+{
+    struct vpmu_struct *vpmu = vcpu_vpmu(v);
+
+    if ( !has_hvm_container_domain(v->domain) )
+        wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0);
+
+    if ( !vpmu_are_all_set(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED) )
+        return 0;
+
+    __core2_vpmu_save(v);
+
+    /* Unset PMU MSR bitmap to trap lazy load. */
+    if ( !vpmu_is_set(vpmu, VPMU_RUNNING) &&
+         has_hvm_container_domain(v->domain) && cpu_has_vmx_msr_bitmap)
+        core2_vpmu_unset_msr_bitmap(v->arch.hvm_vmx.msr_bitmap);
+
+    return 1;
+}
+
+static inline void __core2_vpmu_load(struct vcpu *v)
+{
+    unsigned int i, pmc_start;
+    struct xen_pmu_intel_ctxt *core2_vpmu_cxt = vcpu_vpmu(v)->context;
+    uint64_t *fixed_counters = vpmu_reg_pointer(core2_vpmu_cxt, fixed_counters);
+    struct xen_pmu_cntr_pair *xen_pmu_cntr_pair =
+        vpmu_reg_pointer(core2_vpmu_cxt, arch_counters);
+
+    for ( i = 0; i < fixed_pmc_cnt; i++ )
+        wrmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, fixed_counters[i]);
+
+    if ( full_width_write )
+        pmc_start = MSR_IA32_A_PERFCTR0;
+    else
+        pmc_start = MSR_IA32_PERFCTR0;
+    for ( i = 0; i < arch_pmc_cnt; i++ )
+    {
+        wrmsrl(pmc_start + i, xen_pmu_cntr_pair[i].counter);
+        wrmsrl(MSR_P6_EVNTSEL0 + i, xen_pmu_cntr_pair[i].control);
+    }
+
+    wrmsrl(MSR_CORE_PERF_FIXED_CTR_CTRL, core2_vpmu_cxt->fixed_ctrl);
+    wrmsrl(MSR_IA32_DS_AREA, core2_vpmu_cxt->ds_area);
+    wrmsrl(MSR_IA32_PEBS_ENABLE, core2_vpmu_cxt->pebs_enable);
+
+    if ( !has_hvm_container_domain(v->domain) )
+    {
+        wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, core2_vpmu_cxt->global_ovf_ctrl);
+        core2_vpmu_cxt->global_ovf_ctrl = 0;
+        wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, core2_vpmu_cxt->global_ctrl);
+    }
+}
+
+static void core2_vpmu_load(struct vcpu *v)
+{
+    struct vpmu_struct *vpmu = vcpu_vpmu(v);
+
+    if ( vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) )
+        return;
+
+    vpmu_set(vpmu, VPMU_CONTEXT_LOADED);
+
+    __core2_vpmu_load(v);
+}
+
+static int core2_vpmu_alloc_resource(struct vcpu *v)
+{
+    struct vpmu_struct *vpmu = vcpu_vpmu(v);
+    struct xen_pmu_intel_ctxt *core2_vpmu_cxt = NULL;
+    uint64_t *p = NULL;
+
+    p = xzalloc_bytes(sizeof(uint64_t));
+    if ( !p )
+        goto out_err;
+
+    if ( has_hvm_container_domain(v->domain) )
+    {
+        if ( !acquire_pmu_ownership(PMU_OWNER_HVM) )
+            goto out_err;
+
+        wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0);
+        if ( vmx_add_msr(MSR_CORE_PERF_GLOBAL_CTRL, VMX_HOST_MSR) )
+            goto out_err_hvm;
+        if ( vmx_add_msr(MSR_CORE_PERF_GLOBAL_CTRL, VMX_GUEST_MSR) )
+            goto out_err_hvm;
+        vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, 0);
+    }
+
+    if ( is_hvm_domain(v->domain) )
+    {
+        core2_vpmu_cxt = xzalloc_bytes(sizeof(struct xen_pmu_intel_ctxt) +
+                                       sizeof(uint64_t) * fixed_pmc_cnt +
+                                       sizeof(struct xen_pmu_cntr_pair) *
+                                       arch_pmc_cnt);
+        if ( !core2_vpmu_cxt )
+            goto out_err_hvm;
+    }
+    else
+    {
+        core2_vpmu_cxt = &v->arch.vpmu.xenpmu_data->pmu.c.intel;
+    }
+
+    core2_vpmu_cxt->fixed_counters = sizeof(struct xen_pmu_intel_ctxt);
+    core2_vpmu_cxt->arch_counters = core2_vpmu_cxt->fixed_counters +
+                                    sizeof(uint64_t) * fixed_pmc_cnt;
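+    /*
+     * As on AMD, these are byte offsets into the (possibly guest-shared)
+     * context, resolved at use via vpmu_reg_pointer().
+     */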
+
+    vpmu->context = core2_vpmu_cxt;
+    vpmu->priv_context = p;
+
+    vpmu_set(vpmu, VPMU_CONTEXT_ALLOCATED);
+
+    return 1;
+
+out_err_hvm:
+    vmx_rm_msr(MSR_CORE_PERF_GLOBAL_CTRL, VMX_HOST_MSR);
+    vmx_rm_msr(MSR_CORE_PERF_GLOBAL_CTRL, VMX_GUEST_MSR);
+    release_pmu_ownship(PMU_OWNER_HVM);
+
+    xfree(core2_vpmu_cxt);
+
+out_err:
+    xfree(p);
+    printk("Failed to allocate VPMU resources for domain %u vcpu %u\n",
+           v->domain->domain_id, v->vcpu_id);
+
+    return 0;
+}
+
+static int core2_vpmu_msr_common_check(u32 msr_index, int *type, int *index)
+{
+    struct vpmu_struct *vpmu = vcpu_vpmu(current);
+
+    if ( !is_core2_vpmu_msr(msr_index, type, index) )
+        return 0;
+
+    if ( unlikely(!vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED)) &&
+         !core2_vpmu_alloc_resource(current) )
+        return 0;
+
+    /* Do the lazy load stuff. */
+    if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) )
+    {
+        __core2_vpmu_load(current);
+        vpmu_set(vpmu, VPMU_CONTEXT_LOADED);
+        if ( has_hvm_container_domain(current->domain) &&
+             cpu_has_vmx_msr_bitmap )
+            core2_vpmu_set_msr_bitmap(current->arch.hvm_vmx.msr_bitmap);
+    }
+    return 1;
+}
+
+static void inject_trap(struct vcpu *v, unsigned int trapno)
+{
+    if ( has_hvm_container_domain(v->domain) )
+        hvm_inject_hw_exception(trapno, 0);
+    else
+        send_guest_trap(v->domain, v->vcpu_id, trapno);
+}
+
+static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
+{
+    int i, tmp;
+    int type = -1, index = -1;
+    struct vcpu *v = current;
+    struct vpmu_struct *vpmu = vcpu_vpmu(v);
+    struct xen_pmu_intel_ctxt *core2_vpmu_cxt = NULL;
+    uint64_t *enabled_cntrs;
+
+    if ( !core2_vpmu_msr_common_check(msr, &type, &index) )
+    {
+        /* Special handling for BTS */
+        if ( msr == MSR_IA32_DEBUGCTLMSR )
+        {
+            uint64_t supported = IA32_DEBUGCTLMSR_TR | IA32_DEBUGCTLMSR_BTS |
+                                 IA32_DEBUGCTLMSR_BTINT;
+
+            if ( cpu_has(&current_cpu_data, X86_FEATURE_DSCPL) )
+                supported |= IA32_DEBUGCTLMSR_BTS_OFF_OS |
+                             IA32_DEBUGCTLMSR_BTS_OFF_USR;
+            if ( msr_content & supported )
+            {
+                if ( vpmu_is_set(vpmu, VPMU_CPU_HAS_BTS) )
+                    return 1;
+                gdprintk(XENLOG_WARNING, "Debug Store is not supported on this cpu\n");
+                inject_trap(v, TRAP_gp_fault);
+                return 0;
+            }
+        }
+        return 0;
+    }
+
+    core2_vpmu_cxt = vpmu->context;
+    enabled_cntrs = vpmu->priv_context;
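+    /*
+     * Bits 0-31 of *enabled_cntrs track the architectural counters and
+     * bits 32 and up the fixed ones, one bit per counter.
+     */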
+    switch ( msr )
+    {
+    case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
+        core2_vpmu_cxt->global_status &= ~msr_content;
+        wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, msr_content);
+        return 1;
+    case MSR_CORE_PERF_GLOBAL_STATUS:
+        gdprintk(XENLOG_INFO, "Cannot write read-only MSR: "
+                 "MSR_PERF_GLOBAL_STATUS(0x38E)!\n");
+        inject_trap(v, TRAP_gp_fault);
+        return 1;
+    case MSR_IA32_PEBS_ENABLE:
+        if ( msr_content & 1 )
+            gdprintk(XENLOG_WARNING, "Guest is trying to enable PEBS, "
+                     "which is not supported.\n");
+        core2_vpmu_cxt->pebs_enable = msr_content;
+        return 1;
+    case MSR_IA32_DS_AREA:
+        if ( vpmu_is_set(vpmu, VPMU_CPU_HAS_DS) )
+        {
+            if ( !is_canonical_address(msr_content) )
+            {
+                gdprintk(XENLOG_WARNING,
+                         "Illegal address for IA32_DS_AREA: %#" PRIx64 "\n",
+                         msr_content);
+                inject_trap(v, TRAP_gp_fault);
+                return 1;
+            }
+            core2_vpmu_cxt->ds_area = msr_content;
+            break;
+        }
+        gdprintk(XENLOG_WARNING, "Guest setting of DTS is ignored.\n");
+        return 1;
+    case MSR_CORE_PERF_GLOBAL_CTRL:
+        core2_vpmu_cxt->global_ctrl = msr_content;
+        break;
+    case MSR_CORE_PERF_FIXED_CTR_CTRL:
+        if ( has_hvm_container_domain(v->domain) )
+            vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL,
+                               &core2_vpmu_cxt->global_ctrl);
+        else
+            rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, core2_vpmu_cxt->global_ctrl);
+        *enabled_cntrs &= ~(((1ULL << fixed_pmc_cnt) - 1) << 32);
+        if ( msr_content != 0 )
+        {
+            u64 val = msr_content;
+            for ( i = 0; i < fixed_pmc_cnt; i++ )
+            {
+                if ( val & 3 )
+                    *enabled_cntrs |= (1ULL << 32) << i;
+                val >>= FIXED_CTR_CTRL_BITS;
+            }
+        }
+
+        core2_vpmu_cxt->fixed_ctrl = msr_content;
+        break;
+    default:
+        tmp = msr - MSR_P6_EVNTSEL0;
+        if ( tmp >= 0 && tmp < arch_pmc_cnt )
+        {
+            struct xen_pmu_cntr_pair *xen_pmu_cntr_pair =
+                vpmu_reg_pointer(core2_vpmu_cxt, arch_counters);
+
+            if ( has_hvm_container_domain(v->domain) )
+                vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL,
+                                   &core2_vpmu_cxt->global_ctrl);
+            else
+                rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, core2_vpmu_cxt->global_ctrl);
+
+            if ( msr_content & (1ULL << 22) )
+                *enabled_cntrs |= 1ULL << tmp;
+            else
+                *enabled_cntrs &= ~(1ULL << tmp);
+
+            xen_pmu_cntr_pair[tmp].control = msr_content;
+        }
+    }
+
+    if ( (core2_vpmu_cxt->global_ctrl & *enabled_cntrs) ||
+         (core2_vpmu_cxt->ds_area != 0) )
+        vpmu_set(vpmu, VPMU_RUNNING);
+    else
+        vpmu_reset(vpmu, VPMU_RUNNING);
+
+    if ( type != MSR_TYPE_GLOBAL )
+    {
+        u64 mask;
+        int inject_gp = 0;
+        switch ( type )
+        {
+        case MSR_TYPE_ARCH_CTRL:      /* MSR_P6_EVNTSEL[0,...] */
+            mask = ~((1ull << 32) - 1);
+            if ( msr_content & mask )
+                inject_gp = 1;
+            break;
+        case MSR_TYPE_CTRL:           /* IA32_FIXED_CTR_CTRL */
+            if ( msr == MSR_IA32_DS_AREA )
+                break;
+            /* 4 bits per fixed counter; bits above them must be zero. */
+            mask = ~((1ull << (fixed_pmc_cnt * FIXED_CTR_CTRL_BITS)) - 1);
+            if ( msr_content & mask )
+                inject_gp = 1;
+            break;
+        case MSR_TYPE_COUNTER:        /* IA32_FIXED_CTR[0-2] */
+            mask = ~((1ull << core2_get_bitwidth_fix_count()) - 1);
+            if ( msr_content & mask )
+                inject_gp = 1;
+            break;
+        }
+        if ( inject_gp )
+            inject_trap(v, TRAP_gp_fault);
+        else
+            wrmsrl(msr, msr_content);
+    }
+    else
+    {
+       if ( has_hvm_container_domain(v->domain) )
+           vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, msr_content);
+       else
+           wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, msr_content);
+    }
+
+    return 1;
+}
+
+static int core2_vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
+{
+    int type = -1, index = -1;
+    struct vcpu *v = current;
+    struct vpmu_struct *vpmu = vcpu_vpmu(v);
+    struct xen_pmu_intel_ctxt *core2_vpmu_cxt = NULL;
+
+    if ( core2_vpmu_msr_common_check(msr, &type, &index) )
+    {
+        core2_vpmu_cxt = vpmu->context;
+        switch ( msr )
+        {
+        case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
+            *msr_content = 0;
+            break;
+        case MSR_CORE_PERF_GLOBAL_STATUS:
+            *msr_content = core2_vpmu_cxt->global_status;
+            break;
+        case MSR_CORE_PERF_GLOBAL_CTRL:
+            if ( has_hvm_container_domain(v->domain) )
+                vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, msr_content);
+            else
+                rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, *msr_content);
+            break;
+        default:
+            rdmsrl(msr, *msr_content);
+        }
+    }
+    else
+    {
+        /* Extension for BTS */
+        if ( msr == MSR_IA32_MISC_ENABLE )
+        {
+            if ( vpmu_is_set(vpmu, VPMU_CPU_HAS_BTS) )
+                *msr_content &= ~MSR_IA32_MISC_ENABLE_BTS_UNAVAIL;
+        }
+        else
+            return 0;
+    }
+
+    return 1;
+}
+
+static void core2_vpmu_do_cpuid(unsigned int input,
+                                unsigned int *eax, unsigned int *ebx,
+                                unsigned int *ecx, unsigned int *edx)
+{
+    if ( input == 0x1 )
+    {
+        struct vpmu_struct *vpmu = vcpu_vpmu(current);
+
+        if ( vpmu_is_set(vpmu, VPMU_CPU_HAS_DS) )
+        {
+            /* Switch on the 'Debug Store' feature in CPUID.EAX[1]:EDX[21] */
+            *edx |= cpufeat_mask(X86_FEATURE_DS);
+            if ( cpu_has(&current_cpu_data, X86_FEATURE_DTES64) )
+                *ecx |= cpufeat_mask(X86_FEATURE_DTES64);
+            if ( cpu_has(&current_cpu_data, X86_FEATURE_DSCPL) )
+                *ecx |= cpufeat_mask(X86_FEATURE_DSCPL);
+        }
+    }
+}
+
+/* Dump vpmu info on console, called in the context of keyhandler 'q'. */
+static void core2_vpmu_dump(const struct vcpu *v)
+{
+    const struct vpmu_struct *vpmu = vcpu_vpmu(v);
+    int i;
+    const struct xen_pmu_intel_ctxt *core2_vpmu_cxt = vpmu->context;
+    u64 val;
+    uint64_t *fixed_counters;
+    struct xen_pmu_cntr_pair *cntr_pair;
+
+    if ( !core2_vpmu_cxt || !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
+         return;
+
+    if ( !vpmu_is_set(vpmu, VPMU_RUNNING) )
+    {
+        if ( vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) )
+            printk("    vPMU loaded\n");
+        else
+            printk("    vPMU allocated\n");
+        return;
+    }
+
+    printk("    vPMU running\n");
+
+    cntr_pair = vpmu_reg_pointer(core2_vpmu_cxt, arch_counters);
+    fixed_counters = vpmu_reg_pointer(core2_vpmu_cxt, fixed_counters);
+
+    /* Print the contents of the counter and its configuration msr. */
+    for ( i = 0; i < arch_pmc_cnt; i++ )
+        printk("      general_%d: 0x%016lx ctrl: 0x%016lx\n",
+            i, cntr_pair[i].counter, cntr_pair[i].control);
+
+    /*
+     * The configuration of the fixed counter is 4 bits each in the
+     * MSR_CORE_PERF_FIXED_CTR_CTRL.
+     */
+    val = core2_vpmu_cxt->fixed_ctrl;
+    for ( i = 0; i < fixed_pmc_cnt; i++ )
+    {
+        printk("      fixed_%d:   0x%016lx ctrl: %#lx\n",
+               i, fixed_counters[i],
+               val & FIXED_CTR_CTRL_MASK);
+        val >>= FIXED_CTR_CTRL_BITS;
+    }
+}
+
+static int core2_vpmu_do_interrupt(struct cpu_user_regs *regs)
+{
+    struct vcpu *v = current;
+    u64 msr_content;
+    struct vpmu_struct *vpmu = vcpu_vpmu(v);
+    struct xen_pmu_intel_ctxt *core2_vpmu_cxt = vpmu->context;
+
+    rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, msr_content);
+    if ( msr_content )
+    {
+        if ( is_pmc_quirk )
+            handle_pmc_quirk(msr_content);
+        core2_vpmu_cxt->global_status |= msr_content;
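+        /*
+         * Acknowledge everything: all architectural counters, fixed
+         * counters 0-2 (bits 32-34) and the CondChgd/OvfBuffer bits
+         * (63/62) of MSR_CORE_PERF_GLOBAL_OVF_CTRL.
+         */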
+        msr_content = 0xC000000700000000 | ((1 << arch_pmc_cnt) - 1);
+        wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, msr_content);
+    }
+    else
+    {
+        /* No PMC overflow but perhaps a Trace Message interrupt. */
+        __vmread(GUEST_IA32_DEBUGCTL, &msr_content);
+        if ( !(msr_content & IA32_DEBUGCTLMSR_TR) )
+            return 0;
+    }
+
+    return 1;
+}
+
+static int core2_vpmu_initialise(struct vcpu *v)
+{
+    struct vpmu_struct *vpmu = vcpu_vpmu(v);
+    u64 msr_content;
+    struct cpuinfo_x86 *c = &current_cpu_data;
+
+    if ( !(vpmu_features & XENPMU_FEATURE_INTEL_BTS) )
+        goto func_out;
+    /* Check the 'Debug Store' feature in the CPUID.EAX[1]:EDX[21] */
+    if ( cpu_has(c, X86_FEATURE_DS) )
+    {
+        if ( !cpu_has(c, X86_FEATURE_DTES64) )
+        {
+            printk(XENLOG_G_WARNING "CPU doesn't support 64-bit DS Area"
+                   " - Debug Store disabled for %pv\n",
+                   v);
+            goto func_out;
+        }
+        vpmu_set(vpmu, VPMU_CPU_HAS_DS);
+        rdmsrl(MSR_IA32_MISC_ENABLE, msr_content);
+        if ( msr_content & MSR_IA32_MISC_ENABLE_BTS_UNAVAIL )
+        {
+            /* If BTS_UNAVAIL is set reset the DS feature. */
+            vpmu_reset(vpmu, VPMU_CPU_HAS_DS);
+            printk(XENLOG_G_WARNING "CPU has set BTS_UNAVAIL"
+                   " - Debug Store disabled for %pv\n",
+                   v);
+        }
+        else
+        {
+            vpmu_set(vpmu, VPMU_CPU_HAS_BTS);
+            if ( !cpu_has(c, X86_FEATURE_DSCPL) )
+                printk(XENLOG_G_INFO
+                       "vpmu: CPU doesn't support CPL-Qualified BTS\n");
+            printk("******************************************************\n");
+            printk("** WARNING: Emulation of BTS Feature is switched on **\n");
+            printk("** Using this processor feature in a virtualized    **\n");
+            printk("** environment is not 100%% safe.                    **\n");
+            printk("** Setting the DS buffer address with wrong values  **\n");
+            printk("** may lead to hypervisor hangs or crashes.         **\n");
+            printk("** It is NOT recommended for production use!        **\n");
+            printk("******************************************************\n");
+        }
+    }
+func_out:
+
+    arch_pmc_cnt = core2_get_arch_pmc_count();
+    fixed_pmc_cnt = core2_get_fixed_pmc_count();
+    check_pmc_quirk();
+
+    /* PV domains can allocate resources immediately */
+    if ( is_pv_domain(v->domain) && !core2_vpmu_alloc_resource(v) )
+        return 1;
+
+    return 0;
+}
+
+static void core2_vpmu_destroy(struct vcpu *v)
+{
+    struct vpmu_struct *vpmu = vcpu_vpmu(v);
+
+    if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
+        return;
+
+    if ( has_hvm_container_domain(v->domain) )
+    {
+        if ( cpu_has_vmx_msr_bitmap )
+            core2_vpmu_unset_msr_bitmap(v->arch.hvm_vmx.msr_bitmap);
+
+        if ( is_hvm_domain(v->domain) )
+            xfree(vpmu->context);
+
+        release_pmu_ownship(PMU_OWNER_HVM);
+    }
+
+    xfree(vpmu->priv_context);
+    vpmu->context = NULL;
+    vpmu_clear(vpmu);
+}
+
+struct arch_vpmu_ops core2_vpmu_ops = {
+    .do_wrmsr = core2_vpmu_do_wrmsr,
+    .do_rdmsr = core2_vpmu_do_rdmsr,
+    .do_interrupt = core2_vpmu_do_interrupt,
+    .do_cpuid = core2_vpmu_do_cpuid,
+    .arch_vpmu_destroy = core2_vpmu_destroy,
+    .arch_vpmu_save = core2_vpmu_save,
+    .arch_vpmu_load = core2_vpmu_load,
+    .arch_vpmu_dump = core2_vpmu_dump
+};
+
+static void core2_no_vpmu_do_cpuid(unsigned int input,
+                                unsigned int *eax, unsigned int *ebx,
+                                unsigned int *ecx, unsigned int *edx)
+{
+    /*
+     * The vpmu is not enabled in this case, so clear the architectural
+     * performance-monitoring bits of the CPUID leaf.
+     */
+    if ( input == 0xa )
+    {
+        *eax &= ~PMU_VERSION_MASK;
+        *eax &= ~PMU_GENERAL_NR_MASK;
+        *eax &= ~PMU_GENERAL_WIDTH_MASK;
+
+        *edx &= ~PMU_FIXED_NR_MASK;
+        *edx &= ~PMU_FIXED_WIDTH_MASK;
+    }
+}
+
+/*
+ * If it's a vpmu MSR, report its content as 0.
+ */
+static int core2_no_vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
+{
+    int type = -1, index = -1;
+    if ( !is_core2_vpmu_msr(msr, &type, &index) )
+        return 0;
+    *msr_content = 0;
+    return 1;
+}
+
+/*
+ * These functions are used in case vpmu is not enabled.
+ */
+struct arch_vpmu_ops core2_no_vpmu_ops = {
+    .do_rdmsr = core2_no_vpmu_do_rdmsr,
+    .do_cpuid = core2_no_vpmu_do_cpuid,
+};
+
+int vmx_vpmu_initialise(struct vcpu *v)
+{
+    struct vpmu_struct *vpmu = vcpu_vpmu(v);
+    uint8_t family = current_cpu_data.x86;
+    uint8_t cpu_model = current_cpu_data.x86_model;
+    int ret = 0;
+
+    vpmu->arch_vpmu_ops = &core2_no_vpmu_ops;
+    if ( vpmu_mode == XENPMU_MODE_OFF )
+        return 0;
+
+    if ( family == 6 )
+    {
+        u64 caps;
+
+        rdmsrl(MSR_IA32_PERF_CAPABILITIES, caps);
+        full_width_write = (caps >> 13) & 1;
+
+        switch ( cpu_model )
+        {
+        /* Core2: */
+        case 0x0f: /* original 65 nm celeron/pentium/core2/xeon, "Merom"/"Conroe" */
+        case 0x16: /* single-core 65 nm celeron/core2solo "Merom-L"/"Conroe-L" */
+        case 0x17: /* 45 nm celeron/core2/xeon "Penryn"/"Wolfdale" */
+        case 0x1d: /* six-core 45 nm xeon "Dunnington" */
+
+        case 0x2a: /* SandyBridge */
+        case 0x2d: /* SandyBridge, "Romley-EP" */
+
+        /* Nehalem: */
+        case 0x1a: /* 45 nm nehalem, "Bloomfield" */
+        case 0x1e: /* 45 nm nehalem, "Lynnfield", "Clarksfield", "Jasper Forest" */
+        case 0x2e: /* 45 nm nehalem-ex, "Beckton" */
+
+        /* Westmere: */
+        case 0x25: /* 32 nm nehalem, "Clarkdale", "Arrandale" */
+        case 0x2c: /* 32 nm nehalem, "Gulftown", "Westmere-EP" */
+        case 0x27: /* 32 nm Westmere-EX */
+
+        case 0x3a: /* IvyBridge */
+        case 0x3e: /* IvyBridge EP */
+
+        /* Haswell: */
+        case 0x3c:
+        case 0x3f:
+        case 0x45:
+        case 0x46:
+
+        /* future: */
+        case 0x3d:
+        case 0x4e:
+            ret = core2_vpmu_initialise(v);
+            if ( !ret )
+                vpmu->arch_vpmu_ops = &core2_vpmu_ops;
+            return ret;
+        }
+    }
+
+    printk("VPMU: Initialization failed. "
+           "Intel processor family %d model %d has not "
+           "been supported\n", family, cpu_model);
+    return -EINVAL;
+}
+
diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index dd34b2c..a9823fa 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -20,7 +20,7 @@
 #define __ASM_X86_HVM_VMX_VMCS_H__
 
 #include <asm/hvm/io.h>
-#include <asm/hvm/vpmu.h>
+#include <asm/vpmu.h>
 #include <irq_vectors.h>
 
 extern void vmcs_dump_vcpu(struct vcpu *v);
diff --git a/xen/include/asm-x86/hvm/vpmu.h b/xen/include/asm-x86/hvm/vpmu.h
deleted file mode 100644
index 0cfde19..0000000
--- a/xen/include/asm-x86/hvm/vpmu.h
+++ /dev/null
@@ -1,102 +0,0 @@
-/*
- * vpmu.h: PMU virtualization for HVM domain.
- *
- * Copyright (c) 2007, Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along with
- * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
- * Place - Suite 330, Boston, MA 02111-1307 USA.
- *
- * Author: Haitao Shan <haitao.shan@intel.com>
- */
-
-#ifndef __ASM_X86_HVM_VPMU_H_
-#define __ASM_X86_HVM_VPMU_H_
-
-#include <public/pmu.h>
-
-#define vcpu_vpmu(vcpu)   (&(vcpu)->arch.vpmu)
-#define vpmu_vcpu(vpmu)   container_of((vpmu), struct vcpu, arch.vpmu)
-
-#define MSR_TYPE_COUNTER            0
-#define MSR_TYPE_CTRL               1
-#define MSR_TYPE_GLOBAL             2
-#define MSR_TYPE_ARCH_COUNTER       3
-#define MSR_TYPE_ARCH_CTRL          4
-
-/* Start of PMU register bank */
-#define vpmu_reg_pointer(ctxt, offset) ((void *)((uintptr_t)ctxt + \
-                                                 (uintptr_t)ctxt->offset))
-
-/* Arch specific operations shared by all vpmus */
-struct arch_vpmu_ops {
-    int (*do_wrmsr)(unsigned int msr, uint64_t msr_content);
-    int (*do_rdmsr)(unsigned int msr, uint64_t *msr_content);
-    int (*do_interrupt)(struct cpu_user_regs *regs);
-    void (*do_cpuid)(unsigned int input,
-                     unsigned int *eax, unsigned int *ebx,
-                     unsigned int *ecx, unsigned int *edx);
-    void (*arch_vpmu_destroy)(struct vcpu *v);
-    int (*arch_vpmu_save)(struct vcpu *v);
-    void (*arch_vpmu_load)(struct vcpu *v);
-    void (*arch_vpmu_dump)(const struct vcpu *);
-};
-
-int vmx_vpmu_initialise(struct vcpu *);
-int svm_vpmu_initialise(struct vcpu *);
-
-struct vpmu_struct {
-    u32 flags;
-    u32 last_pcpu;
-    u32 hw_lapic_lvtpc;
-    void *context;      /* May be shared with PV guest */
-    void *priv_context; /* hypervisor-only */
-    struct arch_vpmu_ops *arch_vpmu_ops;
-    xen_pmu_data_t *xenpmu_data;
-};
-
-/* VPMU states */
-#define VPMU_CONTEXT_ALLOCATED              0x1
-#define VPMU_CONTEXT_LOADED                 0x2
-#define VPMU_RUNNING                        0x4
-#define VPMU_CONTEXT_SAVE                   0x8   /* Force context save */
-#define VPMU_FROZEN                         0x10  /* Stop counters while VCPU is not running */
-#define VPMU_PASSIVE_DOMAIN_ALLOCATED       0x20
-
-#define vpmu_set(_vpmu, _x)         ((_vpmu)->flags |= (_x))
-#define vpmu_reset(_vpmu, _x)       ((_vpmu)->flags &= ~(_x))
-#define vpmu_is_set(_vpmu, _x)      ((_vpmu)->flags & (_x))
-#define vpmu_are_all_set(_vpmu, _x) (((_vpmu)->flags & (_x)) == (_x))
-#define vpmu_clear(_vpmu)           ((_vpmu)->flags = 0)
-
-#define VPMU_MSR_READ  0
-#define VPMU_MSR_WRITE 1
-
-void vpmu_lvtpc_update(uint32_t val);
-int vpmu_do_msr(unsigned int msr, uint64_t *msr_content, uint8_t rw);
-int vpmu_do_interrupt(struct cpu_user_regs *regs);
-void vpmu_do_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx,
-                                       unsigned int *ecx, unsigned int *edx);
-void vpmu_initialise(struct vcpu *v);
-void vpmu_destroy(struct vcpu *v);
-void vpmu_save(struct vcpu *v);
-void vpmu_load(struct vcpu *v);
-void vpmu_dump(struct vcpu *v);
-
-extern int acquire_pmu_ownership(int pmu_ownership);
-extern void release_pmu_ownership(int pmu_ownership);
-
-extern uint64_t vpmu_mode;
-extern uint64_t vpmu_features;
-
-#endif /* __ASM_X86_HVM_VPMU_H_*/
-
diff --git a/xen/include/asm-x86/vpmu.h b/xen/include/asm-x86/vpmu.h
new file mode 100644
index 0000000..0cfde19
--- /dev/null
+++ b/xen/include/asm-x86/vpmu.h
@@ -0,0 +1,102 @@
+/*
+ * vpmu.h: PMU virtualization for HVM domain.
+ *
+ * Copyright (c) 2007, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
+ * Place - Suite 330, Boston, MA 02111-1307 USA.
+ *
+ * Author: Haitao Shan <haitao.shan@intel.com>
+ */
+
+#ifndef __ASM_X86_HVM_VPMU_H_
+#define __ASM_X86_HVM_VPMU_H_
+
+#include <public/pmu.h>
+
+#define vcpu_vpmu(vcpu)   (&(vcpu)->arch.vpmu)
+#define vpmu_vcpu(vpmu)   container_of((vpmu), struct vcpu, arch.vpmu)
+
+#define MSR_TYPE_COUNTER            0
+#define MSR_TYPE_CTRL               1
+#define MSR_TYPE_GLOBAL             2
+#define MSR_TYPE_ARCH_COUNTER       3
+#define MSR_TYPE_ARCH_CTRL          4
+
+/* Start of PMU register bank */
+#define vpmu_reg_pointer(ctxt, offset) ((void *)((uintptr_t)ctxt + \
+                                                 (uintptr_t)ctxt->offset))
+
+/* Arch specific operations shared by all vpmus */
+struct arch_vpmu_ops {
+    int (*do_wrmsr)(unsigned int msr, uint64_t msr_content);
+    int (*do_rdmsr)(unsigned int msr, uint64_t *msr_content);
+    int (*do_interrupt)(struct cpu_user_regs *regs);
+    void (*do_cpuid)(unsigned int input,
+                     unsigned int *eax, unsigned int *ebx,
+                     unsigned int *ecx, unsigned int *edx);
+    void (*arch_vpmu_destroy)(struct vcpu *v);
+    int (*arch_vpmu_save)(struct vcpu *v);
+    void (*arch_vpmu_load)(struct vcpu *v);
+    void (*arch_vpmu_dump)(const struct vcpu *);
+};
+
+int vmx_vpmu_initialise(struct vcpu *);
+int svm_vpmu_initialise(struct vcpu *);
+
+struct vpmu_struct {
+    u32 flags;
+    u32 last_pcpu;
+    u32 hw_lapic_lvtpc;
+    void *context;      /* May be shared with PV guest */
+    void *priv_context; /* hypervisor-only */
+    struct arch_vpmu_ops *arch_vpmu_ops;
+    xen_pmu_data_t *xenpmu_data;
+};
+
+/* VPMU states */
+#define VPMU_CONTEXT_ALLOCATED              0x1
+#define VPMU_CONTEXT_LOADED                 0x2
+#define VPMU_RUNNING                        0x4
+#define VPMU_CONTEXT_SAVE                   0x8   /* Force context save */
+#define VPMU_FROZEN                         0x10  /* Stop counters while VCPU is not running */
+#define VPMU_PASSIVE_DOMAIN_ALLOCATED       0x20
+
+#define vpmu_set(_vpmu, _x)         ((_vpmu)->flags |= (_x))
+#define vpmu_reset(_vpmu, _x)       ((_vpmu)->flags &= ~(_x))
+#define vpmu_is_set(_vpmu, _x)      ((_vpmu)->flags & (_x))
+#define vpmu_are_all_set(_vpmu, _x) (((_vpmu)->flags & (_x)) == (_x))
+#define vpmu_clear(_vpmu)           ((_vpmu)->flags = 0)
+
+#define VPMU_MSR_READ  0
+#define VPMU_MSR_WRITE 1
+
+void vpmu_lvtpc_update(uint32_t val);
+int vpmu_do_msr(unsigned int msr, uint64_t *msr_content, uint8_t rw);
+int vpmu_do_interrupt(struct cpu_user_regs *regs);
+void vpmu_do_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx,
+                                       unsigned int *ecx, unsigned int *edx);
+void vpmu_initialise(struct vcpu *v);
+void vpmu_destroy(struct vcpu *v);
+void vpmu_save(struct vcpu *v);
+void vpmu_load(struct vcpu *v);
+void vpmu_dump(struct vcpu *v);
+
+extern int acquire_pmu_ownership(int pmu_ownership);
+extern void release_pmu_ownership(int pmu_ownership);
+
+extern uint64_t vpmu_mode;
+extern uint64_t vpmu_features;
+
+#endif /* __ASM_X86_HVM_VPMU_H_*/
+
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* Re: [PATCH v7 01/19] common/symbols: Export hypervisor symbols to privileged guest
  2014-06-06 17:39 ` [PATCH v7 01/19] common/symbols: Export hypervisor symbols to privileged guest Boris Ostrovsky
@ 2014-06-06 17:57   ` Andrew Cooper
  2014-06-06 19:05     ` Boris Ostrovsky
  2014-06-10 11:31   ` Jan Beulich
  1 sibling, 1 reply; 58+ messages in thread
From: Andrew Cooper @ 2014-06-06 17:57 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: kevin.tian, keir, JBeulich, jun.nakajima, tim, dietmar.hahn,
	xen-devel, suravee.suthikulpanit

On 06/06/14 18:39, Boris Ostrovsky wrote:
> Export Xen's symbols as {<address><type><name>} triplet via new XENPF_get_symbol
> hypercall
>
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Reviewed-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
> Tested-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
> ---
>  xen/arch/x86/platform_hypercall.c | 21 +++++++++++++++
>  xen/common/symbols.c              | 54 +++++++++++++++++++++++++++++++++++++++
>  xen/include/public/platform.h     | 19 ++++++++++++++
>  xen/include/xen/symbols.h         |  3 +++
>  xen/include/xlat.lst              |  1 +
>  5 files changed, 98 insertions(+)
>
> diff --git a/xen/arch/x86/platform_hypercall.c b/xen/arch/x86/platform_hypercall.c
> index 2162811..2c602e7 100644
> --- a/xen/arch/x86/platform_hypercall.c
> +++ b/xen/arch/x86/platform_hypercall.c
> @@ -23,6 +23,7 @@
>  #include <xen/cpu.h>
>  #include <xen/pmstat.h>
>  #include <xen/irq.h>
> +#include <xen/symbols.h>
>  #include <asm/current.h>
>  #include <public/platform.h>
>  #include <acpi/cpufreq/processor_perf.h>
> @@ -601,6 +602,26 @@ ret_t do_platform_op(XEN_GUEST_HANDLE_PARAM(xen_platform_op_t) u_xenpf_op)
>      }
>      break;
>  
> +    case XENPF_get_symbol:
> +    {
> +        static char name[KSYM_NAME_LEN + 1];

There needs to be some concurrency protection surrounding the use of
this static.  The symbol mutex in xensyms_read() is not sufficient.

As it currently stands, two vcpus issuing this hypercall at the same
time will collide and both will get junk back.

> diff --git a/xen/include/public/platform.h b/xen/include/public/platform.h
> index 053b9fa..afe301c 100644
> --- a/xen/include/public/platform.h
> +++ b/xen/include/public/platform.h
> @@ -527,6 +527,24 @@ struct xenpf_core_parking {
>  typedef struct xenpf_core_parking xenpf_core_parking_t;
>  DEFINE_XEN_GUEST_HANDLE(xenpf_core_parking_t);
>  
> +#define XENPF_get_symbol   61
> +struct xenpf_symdata {
> +    /* IN variables */
> +    uint32_t namelen; /* size of 'name' buffer */
> +
> +    /* IN/OUT variables */
> +    uint32_t symnum;  /* IN:  Symbol to read                            */
> +                      /* OUT: Next available symbol. If same as IN then */
> +                      /*      we reached the end                        */
> +
> +    /* OUT variables */
> +    char type;
> +    uint64_t address;
> +    XEN_GUEST_HANDLE(char) name;

Please can we avoid introducing non-explicitly-width'd types into the
public ABI.

If you swap name and type, use XEN_GUEST_HANDLE_64() and promote type to
a uint32, it should be 32/64 bit safe and not need translating.
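
Roughly (an untested sketch of that layout; whether trailing padding is
needed to keep sizeof identical for 32- and 64-bit callers would still
need checking):

struct xenpf_symdata {
    /* IN variables */
    uint32_t namelen; /* size of 'name' buffer */

    /* IN/OUT variables */
    uint32_t symnum;  /* IN:  Symbol to read                            */
                      /* OUT: Next available symbol. If same as IN then */
                      /*      we reached the end                        */

    /* OUT variables */
    XEN_GUEST_HANDLE_64(char) name;
    uint64_t address;
    uint32_t type;    /* promoted from char for explicit width */
};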

~Andrew

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v7 02/19] VPMU: Mark context LOADED before registers are loaded
  2014-06-06 17:39 ` [PATCH v7 02/19] VPMU: Mark context LOADED before registers are loaded Boris Ostrovsky
@ 2014-06-06 17:59   ` Andrew Cooper
  2014-06-10 15:29     ` Jan Beulich
  0 siblings, 1 reply; 58+ messages in thread
From: Andrew Cooper @ 2014-06-06 17:59 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: kevin.tian, keir, JBeulich, jun.nakajima, tim, dietmar.hahn,
	xen-devel, suravee.suthikulpanit

On 06/06/14 18:39, Boris Ostrovsky wrote:
> Because a PMU interrupt may be generated as soon as PMU registers are loaded (or,
> more precisely, as soon as HW PMU is "armed") we don't want to delay marking
> context as LOADED until after registers are loaded. Otherwise during interrupt
> handling VPMU_CONTEXT_LOADED may not be set and this could be confusing.
>
> (Technically, only SVM needs this change right now since VMX will "arm" PMU later,
> during VMRUN when global control register is loaded from VMCS. However, both
> AMD and Intel code will require this patch when we introduce PV VPMU).
>
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Acked-by: Kevin Tian <kevin.tian@intel.com>
> Reviewed-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
> Tested-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

> ---
>  xen/arch/x86/hvm/svm/vpmu.c       | 2 ++
>  xen/arch/x86/hvm/vmx/vpmu_core2.c | 2 ++
>  xen/arch/x86/hvm/vpmu.c           | 3 +--
>  3 files changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/xen/arch/x86/hvm/svm/vpmu.c b/xen/arch/x86/hvm/svm/vpmu.c
> index 66a3815..3ac7d53 100644
> --- a/xen/arch/x86/hvm/svm/vpmu.c
> +++ b/xen/arch/x86/hvm/svm/vpmu.c
> @@ -203,6 +203,8 @@ static void amd_vpmu_load(struct vcpu *v)
>          return;
>      }
>  
> +    vpmu_set(vpmu, VPMU_CONTEXT_LOADED);
> +
>      context_load(v);
>  }
>  
> diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c
> index 3129ebd..ccd14d9 100644
> --- a/xen/arch/x86/hvm/vmx/vpmu_core2.c
> +++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c
> @@ -369,6 +369,8 @@ static void core2_vpmu_load(struct vcpu *v)
>      if ( vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) )
>          return;
>  
> +    vpmu_set(vpmu, VPMU_CONTEXT_LOADED);
> +
>      __core2_vpmu_load(v);
>  }
>  
> diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c
> index 21fbaba..63765fa 100644
> --- a/xen/arch/x86/hvm/vpmu.c
> +++ b/xen/arch/x86/hvm/vpmu.c
> @@ -211,10 +211,9 @@ void vpmu_load(struct vcpu *v)
>      if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->arch_vpmu_load )
>      {
>          apic_write_around(APIC_LVTPC, vpmu->hw_lapic_lvtpc);
> +        /* Arch code needs to set VPMU_CONTEXT_LOADED */
>          vpmu->arch_vpmu_ops->arch_vpmu_load(v);
>      }
> -
> -    vpmu_set(vpmu, VPMU_CONTEXT_LOADED);
>  }
>  
>  void vpmu_initialise(struct vcpu *v)

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v7 03/19] x86/VPMU: Set MSR bitmaps only for HVM/PVH guests
  2014-06-06 17:39 ` [PATCH v7 03/19] x86/VPMU: Set MSR bitmaps only for HVM/PVH guests Boris Ostrovsky
@ 2014-06-06 18:02   ` Andrew Cooper
  2014-06-06 19:07     ` Boris Ostrovsky
  0 siblings, 1 reply; 58+ messages in thread
From: Andrew Cooper @ 2014-06-06 18:02 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: kevin.tian, keir, JBeulich, jun.nakajima, tim, dietmar.hahn,
	xen-devel, suravee.suthikulpanit

On 06/06/14 18:39, Boris Ostrovsky wrote:
> In preparation for making VPMU code shared with PV make sure that we update
> MSR bitmaps only for PV guests

Don't you mean "only for HVM guests" ?

~Andrew

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v7 04/19] x86/VPMU: Make vpmu macros a bit more efficient
  2014-06-06 17:40 ` [PATCH v7 04/19] x86/VPMU: Make vpmu macros a bit more efficient Boris Ostrovsky
@ 2014-06-06 18:13   ` Andrew Cooper
  2014-06-06 19:28     ` Boris Ostrovsky
  0 siblings, 1 reply; 58+ messages in thread
From: Andrew Cooper @ 2014-06-06 18:13 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: kevin.tian, keir, JBeulich, jun.nakajima, tim, dietmar.hahn,
	xen-devel, suravee.suthikulpanit

On 06/06/14 18:40, Boris Ostrovsky wrote:
> Update macros that modify VPMU flags to allow changing multiple bits at once.

Modify how?  It appears to only be an introduction of "vpmu_are_all_set()".

>
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Acked-by: Kevin Tian <kevin.tian@intel.com>
> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> Reviewed-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
> Tested-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
> ---
>  xen/arch/x86/hvm/vmx/vpmu_core2.c | 5 +----
>  xen/arch/x86/hvm/vpmu.c           | 3 +--
>  xen/include/asm-x86/hvm/vpmu.h    | 9 +++++----
>  3 files changed, 7 insertions(+), 10 deletions(-)
>
> diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c
> index f5f249f..bf3cd33 100644
> --- a/xen/arch/x86/hvm/vmx/vpmu_core2.c
> +++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c
> @@ -326,10 +326,7 @@ static int core2_vpmu_save(struct vcpu *v)
>  {
>      struct vpmu_struct *vpmu = vcpu_vpmu(v);
>  
> -    if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_SAVE) )
> -        return 0;
> -
> -    if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) ) 
> +    if ( !vpmu_are_all_set(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED) )
>          return 0;
>  
>      __core2_vpmu_save(v);
> diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c
> index 63765fa..d456aa6 100644
> --- a/xen/arch/x86/hvm/vpmu.c
> +++ b/xen/arch/x86/hvm/vpmu.c
> @@ -143,8 +143,7 @@ void vpmu_save(struct vcpu *v)
>      struct vpmu_struct *vpmu = vcpu_vpmu(v);
>      int pcpu = smp_processor_id();
>  
> -    if ( !(vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) &&
> -           vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED)) )
> +    if ( !vpmu_are_all_set(vpmu, VPMU_CONTEXT_ALLOCATED | VPMU_CONTEXT_LOADED) )
>         return;
>  
>      vpmu->last_pcpu = pcpu;
> diff --git a/xen/include/asm-x86/hvm/vpmu.h b/xen/include/asm-x86/hvm/vpmu.h
> index 40f63fb..7cd7266 100644
> --- a/xen/include/asm-x86/hvm/vpmu.h
> +++ b/xen/include/asm-x86/hvm/vpmu.h
> @@ -81,10 +81,11 @@ struct vpmu_struct {
>  #define VPMU_CPU_HAS_BTS                    0x200 /* Has Branch Trace Store */
>  
>  
> -#define vpmu_set(_vpmu, _x)    ((_vpmu)->flags |= (_x))
> -#define vpmu_reset(_vpmu, _x)  ((_vpmu)->flags &= ~(_x))
> -#define vpmu_is_set(_vpmu, _x) ((_vpmu)->flags & (_x))
> -#define vpmu_clear(_vpmu)      ((_vpmu)->flags = 0)
> +#define vpmu_set(_vpmu, _x)         ((_vpmu)->flags |= (_x))
> +#define vpmu_reset(_vpmu, _x)       ((_vpmu)->flags &= ~(_x))
> +#define vpmu_is_set(_vpmu, _x)      ((_vpmu)->flags & (_x))
> +#define vpmu_are_all_set(_vpmu, _x) (((_vpmu)->flags & (_x)) == (_x))
> +#define vpmu_clear(_vpmu)           ((_vpmu)->flags = 0)

These macros' implicit types don't quite match their implementation.

set, reset and clear are implicitly void. (and frankly clear and reset
are confusing given the other bitopt nomenclature, but I don't suggest
changing them).

is_set and are_all_set are implicitly bool, but don't have !!'s


I realise I am straying vastly into personal preference here, but I feel
these would be much nicer as static inlines with proper types, and const
correctness for {is,are_all}_set().
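
Untested, but something along these lines:

static inline void vpmu_set(struct vpmu_struct *vpmu, u32 mask)
{
    vpmu->flags |= mask;
}

static inline void vpmu_reset(struct vpmu_struct *vpmu, u32 mask)
{
    vpmu->flags &= ~mask;
}

static inline bool_t vpmu_is_set(const struct vpmu_struct *vpmu, u32 mask)
{
    return !!(vpmu->flags & mask);
}

static inline bool_t vpmu_are_all_set(const struct vpmu_struct *vpmu, u32 mask)
{
    return (vpmu->flags & mask) == mask;
}

static inline void vpmu_clear(struct vpmu_struct *vpmu)
{
    vpmu->flags = 0;
}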

~Andrew

>  
>  int vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content);
>  int vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content);

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v7 05/19] intel/VPMU: Clean up Intel VPMU code
  2014-06-06 17:40 ` [PATCH v7 05/19] intel/VPMU: Clean up Intel VPMU code Boris Ostrovsky
@ 2014-06-06 18:34   ` Andrew Cooper
  2014-06-06 19:43     ` Boris Ostrovsky
  2014-06-10  8:19     ` Jan Beulich
  0 siblings, 2 replies; 58+ messages in thread
From: Andrew Cooper @ 2014-06-06 18:34 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: kevin.tian, keir, JBeulich, jun.nakajima, tim, dietmar.hahn,
	xen-devel, suravee.suthikulpanit

On 06/06/14 18:40, Boris Ostrovsky wrote:
> Remove struct pmumsr and core2_pmu_enable. Replace static MSR structures with
> fields in core2_vpmu_context.
>
> Call core2_get_pmc_count() once, during initialization.
>
> Properly clean up when core2_vpmu_alloc_resource() fails and add routines
> to remove MSRs from VMCS.
>
>
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Acked-by: Kevin Tian <kevin.tian@intel.com>
>
> diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c
> @@ -125,75 +143,42 @@ static void handle_pmc_quirk(u64 msr_content)
>      }
>  }
>  
> -static const u32 core2_fix_counters_msr[] = {
> -    MSR_CORE_PERF_FIXED_CTR0,
> -    MSR_CORE_PERF_FIXED_CTR1,
> -    MSR_CORE_PERF_FIXED_CTR2
> -};
> -
>  /*
> - * MSR_CORE_PERF_FIXED_CTR_CTRL contains the configuration of all fixed
> - * counters. 4 bits for every counter.
> + * Read the number of general counters via CPUID.EAX[0xa].EAX[8..15]
>   */
> -#define FIXED_CTR_CTRL_BITS 4
> -#define FIXED_CTR_CTRL_MASK ((1 << FIXED_CTR_CTRL_BITS) - 1)
> -
> -/* The index into the core2_ctrls_msr[] of this MSR used in core2_vpmu_dump() */
> -#define MSR_CORE_PERF_FIXED_CTR_CTRL_IDX 0
> -
> -/* Core 2 Non-architectual Performance Control MSRs. */
> -static const u32 core2_ctrls_msr[] = {
> -    MSR_CORE_PERF_FIXED_CTR_CTRL,
> -    MSR_IA32_PEBS_ENABLE,
> -    MSR_IA32_DS_AREA
> -};
> -
> -struct pmumsr {
> -    unsigned int num;
> -    const u32 *msr;
> -};
> -
> -static const struct pmumsr core2_fix_counters = {
> -    VPMU_CORE2_NUM_FIXED,
> -    core2_fix_counters_msr
> -};
> +static int core2_get_arch_pmc_count(void)
> +{
> +    u32 eax;
>  
> -static const struct pmumsr core2_ctrls = {
> -    VPMU_CORE2_NUM_CTRLS,
> -    core2_ctrls_msr
> -};
> -static int arch_pmc_cnt;
> +    eax = cpuid_eax(0xa);
> +    return ( (eax & PMU_GENERAL_NR_MASK) >> PMU_GENERAL_NR_SHIFT );
> +}

This (and later) can be made much simpler.  The style guidelines easily
permit:

static int core2_get_arch_pmc_count(void)
{
    return (cpuid_eax(0xa) & PMU_GENERAL_NR_MASK) >> PMU_GENERAL_NR_SHIFT;
}

Unrelated to this code itself, I wonder whether Xen should gain some
mnemonics for cpuid leaves.

>  
>  /*
> - * Read the number of general counters via CPUID.EAX[0xa].EAX[8..15]
> + * Read the number of fixed counters via CPUID.EDX[0xa].EDX[0..4]
>   */
> -static int core2_get_pmc_count(void)
> +static int core2_get_fixed_pmc_count(void)
>  {
> -    u32 eax, ebx, ecx, edx;
> -
> -    if ( arch_pmc_cnt == 0 )
> -    {
> -        cpuid(0xa, &eax, &ebx, &ecx, &edx);
> -        arch_pmc_cnt = (eax & PMU_GENERAL_NR_MASK) >> PMU_GENERAL_NR_SHIFT;
> -    }
> +    u32 eax;
>  
> -    return arch_pmc_cnt;
> +    eax = cpuid_eax(0xa);
> +    return ( (eax & PMU_FIXED_NR_MASK) >> PMU_FIXED_NR_SHIFT );
>  }
>  
>  static u64 core2_calc_intial_glb_ctrl_msr(void)
>  {
> -    int arch_pmc_bits = (1 << core2_get_pmc_count()) - 1;
> -    u64 fix_pmc_bits  = (1 << 3) - 1;
> -    return ((fix_pmc_bits << 32) | arch_pmc_bits);
> +    int arch_pmc_bits = (1 << arch_pmc_cnt) - 1;
> +    u64 fix_pmc_bits  = (1 << fixed_pmc_cnt) - 1;

Newline after variable declarations.

> +    return ( (fix_pmc_bits << 32) | arch_pmc_bits );

Redundant brackets.

>  }
>  
>  /* edx bits 5-12: Bit width of fixed-function performance counters  */
>  static int core2_get_bitwidth_fix_count(void)
>  {
> -    u32 eax, ebx, ecx, edx;
> +    u32 edx;
>  
> -    cpuid(0xa, &eax, &ebx, &ecx, &edx);
> -    return ((edx & PMU_FIXED_WIDTH_MASK) >> PMU_FIXED_WIDTH_SHIFT);
> +    edx = cpuid_edx(0xa);
> +    return ( (edx & PMU_FIXED_WIDTH_MASK) >> PMU_FIXED_WIDTH_SHIFT );
>  }
>  
>  static int is_core2_vpmu_msr(u32 msr_index, int *type, int *index)
> @@ -201,9 +186,9 @@ static int is_core2_vpmu_msr(u32 msr_index, int *type, int *index)
>      int i;
>      u32 msr_index_pmc;
>  
> -    for ( i = 0; i < core2_fix_counters.num; i++ )
> +    for ( i = 0; i < fixed_pmc_cnt; i++ )
>      {
> -        if ( core2_fix_counters.msr[i] == msr_index )
> +        if ( msr_index == MSR_CORE_PERF_FIXED_CTR0 + i )
>          {
>              *type = MSR_TYPE_COUNTER;
>              *index = i;
> @@ -211,14 +196,12 @@ static int is_core2_vpmu_msr(u32 msr_index, int *type, int *index)
>          }
>      }
>  
> -    for ( i = 0; i < core2_ctrls.num; i++ )
> +    if ( (msr_index == MSR_CORE_PERF_FIXED_CTR_CTRL ) ||

Stray space before the closing bracket.

> +        (msr_index == MSR_IA32_DS_AREA) ||
> +        (msr_index == MSR_IA32_PEBS_ENABLE) )
>      {
> -        if ( core2_ctrls.msr[i] == msr_index )
> -        {
> -            *type = MSR_TYPE_CTRL;
> -            *index = i;
> -            return 1;
> -        }
> +        *type = MSR_TYPE_CTRL;
> +        return 1;
>      }
>  
>      if ( (msr_index == MSR_CORE_PERF_GLOBAL_CTRL) ||
> @@ -275,26 +258,28 @@ static void core2_vpmu_set_msr_bitmap(unsigned long *msr_bitmap)
>      }
>  
>      /* Allow Read PMU Non-global Controls Directly. */
> -    for ( i = 0; i < core2_ctrls.num; i++ )
> -        clear_bit(msraddr_to_bitpos(core2_ctrls.msr[i]), msr_bitmap);
> -    for ( i = 0; i < core2_get_pmc_count(); i++ )
> -        clear_bit(msraddr_to_bitpos(MSR_P6_EVNTSEL0+i), msr_bitmap);
> +    for ( i = 0; i < arch_pmc_cnt; i++ )
> +         clear_bit(msraddr_to_bitpos(MSR_P6_EVNTSEL0 + i), msr_bitmap);
> +
> +    clear_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR_CTRL), msr_bitmap);
> +    clear_bit(msraddr_to_bitpos(MSR_IA32_PEBS_ENABLE), msr_bitmap);
> +    clear_bit(msraddr_to_bitpos(MSR_IA32_DS_AREA), msr_bitmap);

This code looks as if "clear_msr_in_map(MSR_$FOO, msr_bitmap)" would be
a useful helper.
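
i.e. something like (reusing the existing msraddr_to_bitpos()/clear_bit()
pair; the name is only a suggestion):

static void clear_msr_in_map(u32 msr, unsigned long *msr_bitmap)
{
    clear_bit(msraddr_to_bitpos(msr), msr_bitmap);
}

    ...
    clear_msr_in_map(MSR_CORE_PERF_FIXED_CTR_CTRL, msr_bitmap);
    clear_msr_in_map(MSR_IA32_PEBS_ENABLE, msr_bitmap);
    clear_msr_in_map(MSR_IA32_DS_AREA, msr_bitmap);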

> @@ -801,18 +759,27 @@ static int core2_vpmu_initialise(struct vcpu *v, unsigned int vpmu_flags)
>          }
>      }
>  func_out:
> +
> +    arch_pmc_cnt = core2_get_arch_pmc_count();
> +    fixed_pmc_cnt = core2_get_fixed_pmc_count();
> +    if ( fixed_pmc_cnt > VPMU_CORE2_MAX_FIXED_PMCS )
> +    {
> +        fixed_pmc_cnt = VPMU_CORE2_MAX_FIXED_PMCS;
> +        printk(XENLOG_G_WARNING "Limiting number of fixed counters to %d\n",
> +               fixed_pmc_cnt);
> +    }
>      check_pmc_quirk();
> +
>      return 0;
>  }
>  
>  static void core2_vpmu_destroy(struct vcpu *v)
>  {
>      struct vpmu_struct *vpmu = vcpu_vpmu(v);
> -    struct core2_vpmu_context *core2_vpmu_cxt = vpmu->context;
>  
>      if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
>          return;
> -    xfree(core2_vpmu_cxt->pmu_enable);
> +
>      xfree(vpmu->context);

Is it really right to bail if !VPMU_CONTEXT_ALLOCATED ?  A stray
vpmu_clear() would result in a memory leak.

i.e. can VPMU_CONTEXT_ALLOCATED be deemed from "vpmu->context != NULL"
instead?
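
i.e. (sketch):

static void core2_vpmu_destroy(struct vcpu *v)
{
    struct vpmu_struct *vpmu = vcpu_vpmu(v);

    if ( vpmu->context == NULL )
        return;

    xfree(vpmu->context);
    /* ... */
}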

~Andrew

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v7 01/19] common/symbols: Export hypervisor symbols to privileged guest
  2014-06-06 17:57   ` Andrew Cooper
@ 2014-06-06 19:05     ` Boris Ostrovsky
  2014-06-06 19:13       ` Andrew Cooper
  0 siblings, 1 reply; 58+ messages in thread
From: Boris Ostrovsky @ 2014-06-06 19:05 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: kevin.tian, keir, JBeulich, jun.nakajima, tim, dietmar.hahn,
	xen-devel, suravee.suthikulpanit

On 06/06/2014 01:57 PM, Andrew Cooper wrote:
> On 06/06/14 18:39, Boris Ostrovsky wrote:
>> Export Xen's symbols as {<address><type><name>} triplet via new XENPF_get_symbol
>> hypercall
>>
>> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>> Reviewed-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
>> Tested-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
>> ---
>>   xen/arch/x86/platform_hypercall.c | 21 +++++++++++++++
>>   xen/common/symbols.c              | 54 +++++++++++++++++++++++++++++++++++++++
>>   xen/include/public/platform.h     | 19 ++++++++++++++
>>   xen/include/xen/symbols.h         |  3 +++
>>   xen/include/xlat.lst              |  1 +
>>   5 files changed, 98 insertions(+)
>>
>> diff --git a/xen/arch/x86/platform_hypercall.c b/xen/arch/x86/platform_hypercall.c
>> index 2162811..2c602e7 100644
>> --- a/xen/arch/x86/platform_hypercall.c
>> +++ b/xen/arch/x86/platform_hypercall.c
>> @@ -23,6 +23,7 @@
>>   #include <xen/cpu.h>
>>   #include <xen/pmstat.h>
>>   #include <xen/irq.h>
>> +#include <xen/symbols.h>
>>   #include <asm/current.h>
>>   #include <public/platform.h>
>>   #include <acpi/cpufreq/processor_perf.h>
>> @@ -601,6 +602,26 @@ ret_t do_platform_op(XEN_GUEST_HANDLE_PARAM(xen_platform_op_t) u_xenpf_op)
>>       }
>>       break;
>>   
>> +    case XENPF_get_symbol:
>> +    {
>> +        static char name[KSYM_NAME_LEN + 1];
> There needs to be some concurrency protection surrounding the use of
> this static.  The symbol mutex in xensyms_read() is not sufficient.
>
> As it currently stands, two vcpus issuing this hypercall at the same
> time will collide and both will get junk back.


We should be protected by xenpf_lock here.


>
>> diff --git a/xen/include/public/platform.h b/xen/include/public/platform.h
>> index 053b9fa..afe301c 100644
>> --- a/xen/include/public/platform.h
>> +++ b/xen/include/public/platform.h
>> @@ -527,6 +527,24 @@ struct xenpf_core_parking {
>>   typedef struct xenpf_core_parking xenpf_core_parking_t;
>>   DEFINE_XEN_GUEST_HANDLE(xenpf_core_parking_t);
>>   
>> +#define XENPF_get_symbol   61
>> +struct xenpf_symdata {
>> +    /* IN variables */
>> +    uint32_t namelen; /* size of 'name' buffer */
>> +
>> +    /* IN/OUT variables */
>> +    uint32_t symnum;  /* IN:  Symbol to read                            */
>> +                      /* OUT: Next available symbol. If same as IN then */
>> +                      /*      we reached the end                        */
>> +
>> +    /* OUT variables */
>> +    char type;
>> +    uint64_t address;
>> +    XEN_GUEST_HANDLE(char) name;
> Please can we avoid introducing non-explicitly-width'd types into the
> public ABI.
>
> If you swap name and type, use XEN_GUEST_HANDLE_64() and promote type to
> a uint32, it should be 32/64 bit safe and not need translating.

I thought we are going via compat_platform_op for 32-bit guests and thus 
pass pack(4)'d structures so translating is not needed. Is that not true?

(I did test this on 32-bit dom0 and it worked fine, although that alone 
is probably not a good argument).

-boris

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v7 03/19] x86/VPMU: Set MSR bitmaps only for HVM/PVH guests
  2014-06-06 18:02   ` Andrew Cooper
@ 2014-06-06 19:07     ` Boris Ostrovsky
  0 siblings, 0 replies; 58+ messages in thread
From: Boris Ostrovsky @ 2014-06-06 19:07 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: kevin.tian, keir, JBeulich, jun.nakajima, tim, dietmar.hahn,
	xen-devel, suravee.suthikulpanit

On 06/06/2014 02:02 PM, Andrew Cooper wrote:
> On 06/06/14 18:39, Boris Ostrovsky wrote:
>> In preparation for making VPMU code shared with PV make sure that we update
>> MSR bitmaps only for PV guests
> Don't you mean "only for HVM guests" ?

Right. ("only for HVM/PVH guests")

-boris

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v7 01/19] common/symbols: Export hypervisor symbols to privileged guest
  2014-06-06 19:05     ` Boris Ostrovsky
@ 2014-06-06 19:13       ` Andrew Cooper
  0 siblings, 0 replies; 58+ messages in thread
From: Andrew Cooper @ 2014-06-06 19:13 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: kevin.tian, keir, JBeulich, tim, dietmar.hahn, xen-devel,
	jun.nakajima, suravee.suthikulpanit

On 06/06/14 20:05, Boris Ostrovsky wrote:
> On 06/06/2014 01:57 PM, Andrew Cooper wrote:
>> On 06/06/14 18:39, Boris Ostrovsky wrote:
>>> Export Xen's symbols as {<address><type><name>} triplet via new
>>> XENPF_get_symbol
>>> hypercall
>>>
>>> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>>> Reviewed-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
>>> Tested-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
>>> ---
>>>   xen/arch/x86/platform_hypercall.c | 21 +++++++++++++++
>>>   xen/common/symbols.c              | 54
>>> +++++++++++++++++++++++++++++++++++++++
>>>   xen/include/public/platform.h     | 19 ++++++++++++++
>>>   xen/include/xen/symbols.h         |  3 +++
>>>   xen/include/xlat.lst              |  1 +
>>>   5 files changed, 98 insertions(+)
>>>
>>> diff --git a/xen/arch/x86/platform_hypercall.c
>>> b/xen/arch/x86/platform_hypercall.c
>>> index 2162811..2c602e7 100644
>>> --- a/xen/arch/x86/platform_hypercall.c
>>> +++ b/xen/arch/x86/platform_hypercall.c
>>> @@ -23,6 +23,7 @@
>>>   #include <xen/cpu.h>
>>>   #include <xen/pmstat.h>
>>>   #include <xen/irq.h>
>>> +#include <xen/symbols.h>
>>>   #include <asm/current.h>
>>>   #include <public/platform.h>
>>>   #include <acpi/cpufreq/processor_perf.h>
>>> @@ -601,6 +602,26 @@ ret_t
>>> do_platform_op(XEN_GUEST_HANDLE_PARAM(xen_platform_op_t) u_xenpf_op)
>>>       }
>>>       break;
>>>   +    case XENPF_get_symbol:
>>> +    {
>>> +        static char name[KSYM_NAME_LEN + 1];
>> There needs to be some concurrency protection surrounding the use of
>> this static.  The symbol mutex in xensyms_read() is not sufficient.
>>
>> As it currently stands, two vcpus issuing this hypercall at the same
>> time will collide and both will get junk back.
>
>
> We should be protected by xenpf_lock here.

So we are.  Perhaps worth leaving a small comment to that effect?
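
Something as small as:

    case XENPF_get_symbol:
    {
        /* Serialised against concurrent callers by xenpf_lock. */
        static char name[KSYM_NAME_LEN + 1];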

>
>
>>
>>> diff --git a/xen/include/public/platform.h
>>> b/xen/include/public/platform.h
>>> index 053b9fa..afe301c 100644
>>> --- a/xen/include/public/platform.h
>>> +++ b/xen/include/public/platform.h
>>> @@ -527,6 +527,24 @@ struct xenpf_core_parking {
>>>   typedef struct xenpf_core_parking xenpf_core_parking_t;
>>>   DEFINE_XEN_GUEST_HANDLE(xenpf_core_parking_t);
>>>   +#define XENPF_get_symbol   61
>>> +struct xenpf_symdata {
>>> +    /* IN variables */
>>> +    uint32_t namelen; /* size of 'name' buffer */
>>> +
>>> +    /* IN/OUT variables */
>>> +    uint32_t symnum;  /* IN:  Symbol to
>>> read                            */
>>> +                      /* OUT: Next available symbol. If same as IN
>>> then */
>>> +                      /*      we reached the
>>> end                        */
>>> +
>>> +    /* OUT variables */
>>> +    char type;
>>> +    uint64_t address;
>>> +    XEN_GUEST_HANDLE(char) name;
>> Please can we avoid introducing non-explicitly-width'd types into the
>> public ABI.
>>
>> If you swap name and type, use XEN_GUEST_HANDLE_64() and promote type to
>> a uint32, it should be 32/64 bit safe and not need translating.
>
> I thought we are going via compat_platform_op for 32-bit guests and
> thus pass pack(4)'d structures so translating is not needed. Is that
> not true?
>
> (I did test this on 32-bit dom0 and it worked fine, although that
> alone is probably not a good argument).
>
> -boris

Erm, pass.  Each time I try to get my head around the compat interface,
a new surprise pops up.  platform_op appears to be different to other
hypercall handlers I am more used to.

If it works, then probably best not to play with it.

~Andrew

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v7 04/19] x86/VPMU: Make vpmu macros a bit more efficient
  2014-06-06 18:13   ` Andrew Cooper
@ 2014-06-06 19:28     ` Boris Ostrovsky
  0 siblings, 0 replies; 58+ messages in thread
From: Boris Ostrovsky @ 2014-06-06 19:28 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: kevin.tian, keir, JBeulich, jun.nakajima, tim, dietmar.hahn,
	xen-devel, suravee.suthikulpanit

On 06/06/2014 02:13 PM, Andrew Cooper wrote:
> On 06/06/14 18:40, Boris Ostrovsky wrote:
>> Update macros that modify VPMU flags to allow changing multiple bits at once.
> Modify how?  It appears to only be an introduction of "vpmu_are_all_set()".

I meant "vpmu_is_set(foo) + vpmu_is_set(bar)" ==> vpmu_are_all_set(foo|bar).

So probably "Introduce macro that allows testing multiple bits at once".

>
> --- a/xen/include/asm-x86/hvm/vpmu.h
> +++ b/xen/include/asm-x86/hvm/vpmu.h
> @@ -81,10 +81,11 @@ struct vpmu_struct {
>   #define VPMU_CPU_HAS_BTS                    0x200 /* Has Branch Trace Store */
>   
>   
> -#define vpmu_set(_vpmu, _x)    ((_vpmu)->flags |= (_x))
> -#define vpmu_reset(_vpmu, _x)  ((_vpmu)->flags &= ~(_x))
> -#define vpmu_is_set(_vpmu, _x) ((_vpmu)->flags & (_x))
> -#define vpmu_clear(_vpmu)      ((_vpmu)->flags = 0)
> +#define vpmu_set(_vpmu, _x)         ((_vpmu)->flags |= (_x))
> +#define vpmu_reset(_vpmu, _x)       ((_vpmu)->flags &= ~(_x))
> +#define vpmu_is_set(_vpmu, _x)      ((_vpmu)->flags & (_x))
> +#define vpmu_are_all_set(_vpmu, _x) (((_vpmu)->flags & (_x)) == (_x))
> +#define vpmu_clear(_vpmu)           ((_vpmu)->flags = 0)
> These macros' implicit types don't quite match their implementation.
>
> set, reset and clear are implicitly void. (and frankly clear and reset
> are confusing given the other bitopt nomenclature, but I don't suggest
> changing them).
>
> is_set and are_all_set are implicitly bool, but don't have !!'s
>
>
> I realise I am straying vastly into personal preference here, but I feel
> these would be much nicer as static inlines with proper types, and const
> correctness for {is,are_all}_set().

I don't have a preference here. If there is no difference binary-wise 
(code size) between the two I can change these to inlines.

-boris

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v7 09/19] x86/VPMU: Add public xenpmu.h
  2014-06-06 17:40 ` [PATCH v7 09/19] x86/VPMU: Add public xenpmu.h Boris Ostrovsky
@ 2014-06-06 19:32   ` Andrew Cooper
  2014-06-11 10:03   ` Jan Beulich
  1 sibling, 0 replies; 58+ messages in thread
From: Andrew Cooper @ 2014-06-06 19:32 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: kevin.tian, keir, JBeulich, jun.nakajima, tim, dietmar.hahn,
	xen-devel, suravee.suthikulpanit

On 06/06/14 18:40, Boris Ostrovsky wrote:
> Add pmu.h header files, move various macros and structures that will be
> shared between hypervisor and PV guests to it.
>
> Move MSR banks out of architectural PMU structures to allow for larger sizes
> in the future. The banks are allocated immediately after the context and
> PMU structures store offsets to them.
>
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Reviewed-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
> Tested-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
> ---
>  xen/arch/x86/hvm/svm/vpmu.c              |  81 +++++++++++----------
>  xen/arch/x86/hvm/vmx/vpmu_core2.c        | 117 +++++++++++++++++--------------
>  xen/arch/x86/hvm/vpmu.c                  |   5 ++
>  xen/arch/x86/oprofile/op_model_ppro.c    |   6 +-
>  xen/include/asm-x86/hvm/vmx/vpmu_core2.h |  32 ---------
>  xen/include/asm-x86/hvm/vpmu.h           |  16 ++---
>  xen/include/public/arch-arm.h            |   3 +
>  xen/include/public/arch-x86/pmu.h        |  58 +++++++++++++++
>  xen/include/public/pmu.h                 |  38 ++++++++++
>  9 files changed, 224 insertions(+), 132 deletions(-)
>  delete mode 100644 xen/include/asm-x86/hvm/vmx/vpmu_core2.h
>  create mode 100644 xen/include/public/arch-x86/pmu.h
>  create mode 100644 xen/include/public/pmu.h
>
> diff --git a/xen/arch/x86/hvm/svm/vpmu.c b/xen/arch/x86/hvm/svm/vpmu.c
> @@ -382,7 +382,9 @@ static int amd_vpmu_initialise(struct vcpu *v)
>  	 }
>      }
>  
> -    ctxt = xzalloc(struct amd_vpmu_context);
> +    ctxt = xzalloc_bytes(sizeof(struct xen_pmu_amd_ctxt) + 
> +			 sizeof(uint64_t) * AMD_MAX_COUNTERS + 
> +			 sizeof(uint64_t) * AMD_MAX_COUNTERS);

Two lots of counters?  If so, then * 2 would be better.
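
i.e. (sketch):

    ctxt = xzalloc_bytes(sizeof(struct xen_pmu_amd_ctxt) +
                         2 * sizeof(uint64_t) * AMD_MAX_COUNTERS);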

>      if ( !ctxt )
>      {
>          gdprintk(XENLOG_WARNING, "Insufficient memory for PMU, "
> @@ -391,7 +393,11 @@ static int amd_vpmu_initialise(struct vcpu *v)
>          return -ENOMEM;
>      }
>  
> +    ctxt->counters = sizeof(struct xen_pmu_amd_ctxt);
> +    ctxt->ctrls = ctxt->counters + sizeof(uint64_t) * AMD_MAX_COUNTERS;
> +
>      vpmu->context = ctxt;
> +    vpmu->priv_context = NULL;
>      vpmu_set(vpmu, VPMU_CONTEXT_ALLOCATED);
>      return 0;
>  }
> @@ -403,8 +409,7 @@ static void amd_vpmu_destroy(struct vcpu *v)
>      if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
>          return;
>  
> -    if ( has_hvm_container_domain(v->domain) &&
> -         ((struct amd_vpmu_context *)vpmu->context)->msr_bitmap_set )
> +    if ( has_hvm_container_domain(v->domain) && is_msr_bitmap_on(vpmu) )
>          amd_vpmu_unset_msr_bitmap(v);
>  
>      xfree(vpmu->context);
> @@ -421,7 +426,9 @@ static void amd_vpmu_destroy(struct vcpu *v)
>  static void amd_vpmu_dump(const struct vcpu *v)
>  {
>      const struct vpmu_struct *vpmu = vcpu_vpmu(v);
> -    const struct amd_vpmu_context *ctxt = vpmu->context;
> +    const struct xen_pmu_amd_ctxt *ctxt = vpmu->context;
> +    uint64_t *counter_regs = vpmu_reg_pointer(ctxt, counters);
> +    uint64_t *ctrl_regs = vpmu_reg_pointer(ctxt, ctrls);

const for both of these.  The pointer arithmetic in vpmu_reg_pointer()
hides what should otherwise be a compiler error.

> diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c
> index 4ddac7d..56b48be 100644
> --- a/xen/arch/x86/hvm/vpmu.c
> +++ b/xen/arch/x86/hvm/vpmu.c
> @@ -31,6 +31,7 @@
>  #include <asm/hvm/svm/svm.h>
>  #include <asm/hvm/svm/vmcb.h>
>  #include <asm/apic.h>
> +#include <public/pmu.h>
>  
>  /*
>   * "vpmu" :     vpmu generally enabled
> @@ -228,6 +229,10 @@ void vpmu_initialise(struct vcpu *v)
>      struct vpmu_struct *vpmu = vcpu_vpmu(v);
>      uint8_t vendor = current_cpu_data.x86_vendor;
>  
> +    BUILD_BUG_ON((sizeof(struct xen_pmu_intel_ctxt) > XENPMU_CTXT_PAD_SZ) ||
> +                 (sizeof(struct xen_pmu_amd_ctxt) > XENPMU_CTXT_PAD_SZ));
> +

Can this be two BUILD_BUG_ON()s so the compiler tells you which
structure is too big?
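
i.e.:

    BUILD_BUG_ON(sizeof(struct xen_pmu_intel_ctxt) > XENPMU_CTXT_PAD_SZ);
    BUILD_BUG_ON(sizeof(struct xen_pmu_amd_ctxt) > XENPMU_CTXT_PAD_SZ);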

~Andrew

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v7 05/19] intel/VPMU: Clean up Intel VPMU code
  2014-06-06 18:34   ` Andrew Cooper
@ 2014-06-06 19:43     ` Boris Ostrovsky
  2014-06-10  8:19     ` Jan Beulich
  1 sibling, 0 replies; 58+ messages in thread
From: Boris Ostrovsky @ 2014-06-06 19:43 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: kevin.tian, keir, JBeulich, jun.nakajima, tim, dietmar.hahn,
	xen-devel, suravee.suthikulpanit

On 06/06/2014 02:34 PM, Andrew Cooper wrote:
>
> +static int core2_get_arch_pmc_count(void)
> +{
> +    u32 eax;
>   
> -static const struct pmumsr core2_ctrls = {
> -    VPMU_CORE2_NUM_CTRLS,
> -    core2_ctrls_msr
> -};
> -static int arch_pmc_cnt;
> +    eax = cpuid_eax(0xa);
> +    return ( (eax & PMU_GENERAL_NR_MASK) >> PMU_GENERAL_NR_SHIFT );
> +}
> This (and later) can be made much simpler.  The style guidelines easily
> permit:
>
> static int core2_get_arch_pmc_count(void)
> {
>      return (cpuid_eax(0xa) & PMU_GENERAL_NR_MASK) >> PMU_GENERAL_NR_SHIFT;
> }
>
> Unrelated to this code itself, I wonder whether Xen should gain some
> mnemonics for cpuid leaves.

Not sure what you mean here.

>   
>   static void core2_vpmu_destroy(struct vcpu *v)
>   {
>       struct vpmu_struct *vpmu = vcpu_vpmu(v);
> -    struct core2_vpmu_context *core2_vpmu_cxt = vpmu->context;
>   
>       if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
>           return;
> -    xfree(core2_vpmu_cxt->pmu_enable);
> +
>       xfree(vpmu->context);
> Is it really right to bail if !VPMU_CONTEXT_ALLOCATED ?  A stray
> vpmu_clear() would result in a memory leak.
>
> i.e. can VPMU_CONTEXT_ALLOCATED be deemed from "vpmu->context != NULL"
> instead?

I think so.

But the flag is used throughout VPMU code to indicate availability of 
context so for consistency I think we should continue using it here. And 
I don't want to drop it as a flag completely since checking for it is a 
tiny bit faster than checking for a NULL pointer: we often check multiple flags at 
once.

-boris

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v7 11/19] x86/VPMU: Interface for setting PMU mode and flags
  2014-06-06 17:40 ` [PATCH v7 11/19] x86/VPMU: Interface for setting PMU mode and flags Boris Ostrovsky
@ 2014-06-06 19:58   ` Andrew Cooper
  2014-06-06 20:42     ` Boris Ostrovsky
  2014-06-11 10:14   ` Jan Beulich
  1 sibling, 1 reply; 58+ messages in thread
From: Andrew Cooper @ 2014-06-06 19:58 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: kevin.tian, keir, JBeulich, jun.nakajima, tim, dietmar.hahn,
	xen-devel, suravee.suthikulpanit

On 06/06/14 18:40, Boris Ostrovsky wrote:
> Add runtime interface for setting PMU mode and flags. Three main modes are
> provided:
> * PMU off
> * PMU on: Guests can access PMU MSRs and receive PMU interrupts. dom0
>   profiles itself and the hypervisor.
> * dom0-only PMU: dom0 collects samples for both itself and guests.
>
> For feature flags only Intel's BTS is currently supported.
>
> Mode and flags are set via HYPERVISOR_xenpmu_op hypercall.
>
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Reviewed-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
> Tested-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
> ---
>
> @@ -275,3 +283,100 @@ void vpmu_dump(struct vcpu *v)
>          vpmu->arch_vpmu_ops->arch_vpmu_dump(v);
>  }
>  
> +/* Unload VPMU contexts */
> +static void vpmu_unload_all(void)
> +{
> +    struct domain *d;
> +    struct vcpu *v;
> +    struct vpmu_struct *vpmu;
> +
> +    for_each_domain( d )
> +    {
> +        for_each_vcpu ( d, v )
> +        {
> +            if ( v != current )
> +                vcpu_pause(v);
> +            vpmu = vcpu_vpmu(v);
> +
> +            if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) )
> +            {
> +                if ( v != current )
> +                    vcpu_unpause(v);
> +                continue;
> +            }
> +
> +            vpmu_set(vpmu, VPMU_CONTEXT_SAVE);
> +            on_selected_cpus(cpumask_of(vpmu->last_pcpu),
> +                             vpmu_save_force, (void *)v, 1);
> +            vpmu_reset(vpmu, VPMU_CONTEXT_LOADED);
> +
> +            if ( v != current )
> +                vcpu_unpause(v);
> +        }
> +    }
> +}

This is a very long function without any preemption.

To reduce its impact greatly, would it be better to have a global mode
controlling vcpu context switching?  You could enable
"VCPU_GLOBAL_DISABLING", then IPI all cpus at the same time to save
their vpmu context.  Once the IPI is complete, all vpmu context is clean.

This avoids a double loop and all the vcpu_pause/unpause()s.
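
Very roughly, the shape of the idea (a sketch only -- "VCPU_GLOBAL_DISABLING"
and vpmu_save_pcpu() are invented names here, and a context left loaded by an
already-descheduled vcpu would still need thought):

static void vpmu_save_pcpu(void *unused)
{
    struct vpmu_struct *vpmu = vcpu_vpmu(current);

    if ( vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) )
    {
        vpmu_set(vpmu, VPMU_CONTEXT_SAVE);
        vpmu_save_force(current);
        vpmu_reset(vpmu, VPMU_CONTEXT_LOADED);
    }
}

    /* ... */
    vpmu_mode |= VCPU_GLOBAL_DISABLING;
    on_each_cpu(vpmu_save_pcpu, NULL, 1);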

> +
> +
> +long do_xenpmu_op(int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg)
> +{
> +    int ret = -EINVAL;
> +    xen_pmu_params_t pmu_params;

This is probably going to want to be XSM'd up for permissions?
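
e.g. (xsm_pmu_op() being a hypothetical hook name, purely for illustration):

    ret = xsm_pmu_op(XSM_PRIV, current->domain, op);
    if ( ret )
        return ret;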

>
> diff --git a/xen/include/public/pmu.h b/xen/include/public/pmu.h
> index d237c25..d41049f 100644
> --- a/xen/include/public/pmu.h
> +++ b/xen/include/public/pmu.h
> @@ -13,6 +13,54 @@
>  #define XENPMU_VER_MAJ    0
>  #define XENPMU_VER_MIN    1
>  
> +/*
> + * ` enum neg_errnoval
> + * ` HYPERVISOR_xenpmu_op(enum xenpmu_op cmd, struct xenpmu_params *args);
> + *
> + * @cmd  == XENPMU_* (PMU operation)
> + * @args == struct xenpmu_params
> + */
> +/* ` enum xenpmu_op { */
> +#define XENPMU_mode_get        0 /* Also used for getting PMU version */
> +#define XENPMU_mode_set        1
> +#define XENPMU_feature_get     2
> +#define XENPMU_feature_set     3
> +/* ` } */
> +
> +/* Parameters structure for HYPERVISOR_xenpmu_op call */
> +struct xen_pmu_params {
> +    /* IN/OUT parameters */
> +    union {
> +        struct version {
> +            uint8_t maj;
> +            uint8_t min;
> +        } version;
> +        uint64_t pad;

introducing "struct version" into the namespace is very antisocial.

> +    } v;
> +    union {
> +        uint64_t val;
> +        XEN_GUEST_HANDLE(void) valp;
> +    } d;
> +
> +    /* IN parameters */
> +    uint64_t vcpu;
> +};

Bits are cheap these days.

struct xen_pmu_params {
    struct { uint32_t maj, min; } version;
    XEN_GUEST_HANDLE_64(void) valp;
    uint64_t vcpu;
};

Comes with far less padding/unions and is still 32/64bit safe.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v7 12/19] x86/VPMU: Initialize PMU for PV guests
  2014-06-06 17:40 ` [PATCH v7 12/19] x86/VPMU: Initialize PMU for PV guests Boris Ostrovsky
@ 2014-06-06 20:13   ` Andrew Cooper
  2014-06-06 20:49     ` Boris Ostrovsky
  2014-06-11 10:20   ` Jan Beulich
  1 sibling, 1 reply; 58+ messages in thread
From: Andrew Cooper @ 2014-06-06 20:13 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: kevin.tian, keir, JBeulich, jun.nakajima, tim, dietmar.hahn,
	xen-devel, suravee.suthikulpanit

On 06/06/14 18:40, Boris Ostrovsky wrote:
> Code for initializing/tearing down PMU for PV guests

PVH guests judging by the content?

>
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Acked-by: Kevin Tian <kevin.tian@intel.com>
> Reviewed-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
> Tested-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
> ---
>  xen/arch/x86/hvm/hvm.c            |  3 +-
>  xen/arch/x86/hvm/svm/svm.c        |  8 +++-
>  xen/arch/x86/hvm/svm/vpmu.c       | 37 +++++++++++-------
>  xen/arch/x86/hvm/vmx/vmx.c        |  9 ++++-
>  xen/arch/x86/hvm/vmx/vpmu_core2.c | 67 ++++++++++++++++++++++----------
>  xen/arch/x86/hvm/vpmu.c           | 82 ++++++++++++++++++++++++++++++++++++++-
>  xen/common/event_channel.c        |  1 +
>  xen/include/asm-x86/hvm/vpmu.h    |  1 +
>  xen/include/public/pmu.h          |  2 +
>  xen/include/public/xen.h          |  1 +
>  10 files changed, 169 insertions(+), 42 deletions(-)
>
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index d2190be..fab54ff 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -4031,7 +4031,8 @@ static hvm_hypercall_t *const pvh_hypercall64_table[NR_hypercalls] = {
>      [ __HYPERVISOR_physdev_op ]      = (hvm_hypercall_t *)hvm_physdev_op,
>      HYPERCALL(hvm_op),
>      HYPERCALL(sysctl),
> -    HYPERCALL(domctl)
> +    HYPERCALL(domctl),
> +    HYPERCALL(xenpmu_op)
>  };
>  
>  int hvm_do_hypercall(struct cpu_user_regs *regs)
> diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
> index 870e4ee..7358626 100644
> --- a/xen/arch/x86/hvm/svm/svm.c
> +++ b/xen/arch/x86/hvm/svm/svm.c
> @@ -1150,7 +1150,9 @@ static int svm_vcpu_initialise(struct vcpu *v)
>          return rc;
>      }
>  
> -    vpmu_initialise(v);
> +    /* PVH's VPMU is initialized via hypercall */
> +    if ( is_hvm_domain(v->domain) )
> +        vpmu_initialise(v);
>  

With the wish to merge HVM and PVH, is this wise?  What does
hypercall-based setup gain us over MSR-based setup, given that we have
to provide the MSR access to HVM guests anyway?

~Andrew

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v7 13/19] x86/VPMU: Add support for PMU register handling on PV guests
  2014-06-06 17:40 ` [PATCH v7 13/19] x86/VPMU: Add support for PMU register handling on " Boris Ostrovsky
@ 2014-06-06 20:26   ` Andrew Cooper
  2014-06-06 20:53     ` Boris Ostrovsky
  0 siblings, 1 reply; 58+ messages in thread
From: Andrew Cooper @ 2014-06-06 20:26 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: kevin.tian, keir, JBeulich, jun.nakajima, tim, dietmar.hahn,
	xen-devel, suravee.suthikulpanit

On 06/06/14 18:40, Boris Ostrovsky wrote:
> Intercept accesses to PMU MSRs and process them in VPMU module.
>
> Dump VPMU state for all domains (HVM and PV) when requested.
>
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Acked-by: Kevin Tian <kevin.tian@intel.com>
> Reviewed-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
> Tested-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
> ---
>  xen/arch/x86/domain.c             |  3 +-
>  xen/arch/x86/hvm/vmx/vpmu_core2.c | 69 ++++++++++++++++++++++++++++++---------
>  xen/arch/x86/hvm/vpmu.c           |  7 ++++
>  xen/arch/x86/traps.c              | 48 +++++++++++++++++++++++++--
>  xen/include/public/pmu.h          |  1 +
>  5 files changed, 108 insertions(+), 20 deletions(-)
>
> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
> index a3ac1e2..bb759dd 100644
> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -2012,8 +2012,7 @@ void arch_dump_vcpu_info(struct vcpu *v)
>  {
>      paging_dump_vcpu_info(v);
>  
> -    if ( is_hvm_vcpu(v) )
> -        vpmu_dump(v);
> +    vpmu_dump(v);
>  }
>  
>  void domain_cpuid(
> diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c
> @@ -440,9 +454,16 @@ static int core2_vpmu_msr_common_check(u32 msr_index, int *type, int *index)
>      return 1;
>  }
>  
> +static void inject_trap(struct vcpu *v, unsigned int trapno)
> +{
> +    if ( has_hvm_container_domain(v->domain) )
> +        hvm_inject_hw_exception(trapno, 0);
> +    else
> +        send_guest_trap(v->domain, v->vcpu_id, trapno);
> +}
> +

send_guest_trap() can't be used to send anything other than an NMI or
MCE, but you appear to be using it to try and send #GPs?

>  static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
>  {
> -    u64 global_ctrl;
>      int i, tmp;
>      int type = -1, index = -1;
>      struct vcpu *v = current;
> @@ -466,7 +487,7 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
>                  if ( vpmu_is_set(vpmu, VPMU_CPU_HAS_BTS) )
>                      return 1;
>                  gdprintk(XENLOG_WARNING, "Debug Store is not supported on this cpu\n");
> -                hvm_inject_hw_exception(TRAP_gp_fault, 0);
> +                inject_trap(v, TRAP_gp_fault);
>                  return 0;
>              }
>          }
> @@ -479,11 +500,12 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
>      {
>      case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
>          core2_vpmu_cxt->global_status &= ~msr_content;
> +        wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, msr_content);
>          return 1;
>      case MSR_CORE_PERF_GLOBAL_STATUS:
>          gdprintk(XENLOG_INFO, "Can not write readonly MSR: "
>                   "MSR_PERF_GLOBAL_STATUS(0x38E)!\n");
> -        hvm_inject_hw_exception(TRAP_gp_fault, 0);
> +        inject_trap(v, TRAP_gp_fault);
>          return 1;
>      case MSR_IA32_PEBS_ENABLE:
>          if ( msr_content & 1 )
> @@ -499,7 +521,7 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
>                  gdprintk(XENLOG_WARNING,
>                           "Illegal address for IA32_DS_AREA: %#" PRIx64 "x\n",
>                           msr_content);
> -                hvm_inject_hw_exception(TRAP_gp_fault, 0);
> +                inject_trap(v, TRAP_gp_fault);
>                  return 1;
>              }
>              core2_vpmu_cxt->ds_area = msr_content;
> @@ -508,10 +530,14 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
>          gdprintk(XENLOG_WARNING, "Guest setting of DTS is ignored.\n");
>          return 1;
>      case MSR_CORE_PERF_GLOBAL_CTRL:
> -        global_ctrl = msr_content;
> +        core2_vpmu_cxt->global_ctrl = msr_content;
>          break;
>      case MSR_CORE_PERF_FIXED_CTR_CTRL:
> -        vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, &global_ctrl);
> +        if ( has_hvm_container_domain(v->domain) )
> +            vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL,
> +                               &core2_vpmu_cxt->global_ctrl);
> +        else
> +            rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, core2_vpmu_cxt->global_ctrl);
>          *enabled_cntrs &= ~(((1ULL << fixed_pmc_cnt) - 1) << 32);
>          if ( msr_content != 0 )
>          {
> @@ -533,7 +559,11 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
>              struct xen_pmu_cntr_pair *xen_pmu_cntr_pair =
>                  vpmu_reg_pointer(core2_vpmu_cxt, arch_counters);
>  
> -            vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, &global_ctrl);
> +            if ( has_hvm_container_domain(v->domain) )
> +                vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL,
> +                                   &core2_vpmu_cxt->global_ctrl);
> +            else
> +                rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, core2_vpmu_cxt->global_ctrl);
>  
>              if ( msr_content & (1ULL << 22) )
>                  *enabled_cntrs |= 1ULL << tmp;
> @@ -544,7 +574,8 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
>          }
>      }
>  
> -    if ((global_ctrl & *enabled_cntrs) || (core2_vpmu_cxt->ds_area != 0)  )
> +    if ( (core2_vpmu_cxt->global_ctrl & *enabled_cntrs) ||
> +         (core2_vpmu_cxt->ds_area != 0) )
>          vpmu_set(vpmu, VPMU_RUNNING);
>      else
>          vpmu_reset(vpmu, VPMU_RUNNING);
> @@ -557,7 +588,7 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
>          {
>          case MSR_TYPE_ARCH_CTRL:      /* MSR_P6_EVNTSEL[0,...] */
>              mask = ~((1ull << 32) - 1);
> -            if (msr_content & mask)
> +            if ( msr_content & mask )

It would be helpful not to mix large functional changes and code
cleanup.  While I appreciate the cleanup, could it be moved to a
different patch?  The same comment applies to several other patches as well.

~Andrew

>                  inject_gp = 1;
>              break;
>          case MSR_TYPE_CTRL:           /* IA32_FIXED_CTR_CTRL */
> @@ -565,22 +596,27 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
>                  break;
>              /* 4 bits per counter, currently 3 fixed counters implemented. */
>              mask = ~((1ull << (fixed_pmc_cnt * FIXED_CTR_CTRL_BITS)) - 1);
> -            if (msr_content & mask)
> +            if ( msr_content & mask )
>                  inject_gp = 1;
>              break;
>          case MSR_TYPE_COUNTER:        /* IA32_FIXED_CTR[0-2] */
>              mask = ~((1ull << core2_get_bitwidth_fix_count()) - 1);
> -            if (msr_content & mask)
> +            if ( msr_content & mask )
>                  inject_gp = 1;
>              break;
>          }
> -        if (inject_gp)
> -            hvm_inject_hw_exception(TRAP_gp_fault, 0);
> +        if ( inject_gp )
> +            inject_trap(v, TRAP_gp_fault);
>          else
>              wrmsrl(msr, msr_content);
>      }
>      else
> -        vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, msr_content);
> +    {
> +       if ( has_hvm_container_domain(v->domain) )
> +           vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, msr_content);
> +       else
> +           wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, msr_content);
> +    }
>  
>      return 1;
>  }
> @@ -604,7 +640,10 @@ static int core2_vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
>              *msr_content = core2_vpmu_cxt->global_status;
>              break;
>          case MSR_CORE_PERF_GLOBAL_CTRL:
> -            vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, msr_content);
> +            if ( has_hvm_container_domain(v->domain) )
> +                vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, msr_content);
> +            else
> +                rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, *msr_content);
>              break;
>          default:
>              rdmsrl(msr, *msr_content);
> diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c
> index bceffa6..ca0534b 100644
> --- a/xen/arch/x86/hvm/vpmu.c
> +++ b/xen/arch/x86/hvm/vpmu.c
> @@ -456,6 +456,13 @@ long do_xenpmu_op(int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg)
>              return -EFAULT;
>          pvpmu_finish(current->domain, &pmu_params);
>          break;
> +
> +    case XENPMU_lvtpc_set:
> +        if ( current->arch.vpmu.xenpmu_data == NULL )
> +            return -EINVAL;
> +        vpmu_lvtpc_update(current->arch.vpmu.xenpmu_data->pmu.l.lapic_lvtpc);
> +        ret = 0;
> +        break;
>      }
>  
>      return ret;
> diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
> index 136821f..57d06c9 100644
> --- a/xen/arch/x86/traps.c
> +++ b/xen/arch/x86/traps.c
> @@ -72,6 +72,7 @@
>  #include <asm/apic.h>
>  #include <asm/mc146818rtc.h>
>  #include <asm/hpet.h>
> +#include <asm/hvm/vpmu.h>
>  #include <public/arch-x86/cpuid.h>
>  #include <xsm/xsm.h>
>  
> @@ -877,8 +878,10 @@ void pv_cpuid(struct cpu_user_regs *regs)
>          __clear_bit(X86_FEATURE_TOPOEXT % 32, &c);
>          break;
>  
> +    case 0x0000000a: /* Architectural Performance Monitor Features (Intel) */
> +        break;
> +
>      case 0x00000005: /* MONITOR/MWAIT */
> -    case 0x0000000a: /* Architectural Performance Monitor Features */
>      case 0x0000000b: /* Extended Topology Enumeration */
>      case 0x8000000a: /* SVM revision and features */
>      case 0x8000001b: /* Instruction Based Sampling */
> @@ -894,6 +897,9 @@ void pv_cpuid(struct cpu_user_regs *regs)
>      }
>  
>   out:
> +    /* VPMU may decide to modify some of the leaves */
> +    vpmu_do_cpuid(regs->eax, &a, &b, &c, &d);
> +
>      regs->eax = a;
>      regs->ebx = b;
>      regs->ecx = c;
> @@ -1921,6 +1927,7 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
>      char io_emul_stub[32];
>      void (*io_emul)(struct cpu_user_regs *) __attribute__((__regparm__(1)));
>      uint64_t val, msr_content;
> +    bool_t vpmu_msr;
>  
>      if ( !read_descriptor(regs->cs, v, regs,
>                            &code_base, &code_limit, &ar,
> @@ -2411,6 +2418,7 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
>          uint32_t eax = regs->eax;
>          uint32_t edx = regs->edx;
>          msr_content = ((uint64_t)edx << 32) | eax;
> +        vpmu_msr = 0;
>          switch ( (u32)regs->ecx )
>          {
>          case MSR_FS_BASE:
> @@ -2547,7 +2555,19 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
>              if ( v->arch.debugreg[7] & DR7_ACTIVE_MASK )
>                  wrmsrl(regs->_ecx, msr_content);
>              break;
> -
> +        case MSR_P6_PERFCTR0...MSR_P6_PERFCTR1:
> +        case MSR_P6_EVNTSEL0...MSR_P6_EVNTSEL1:
> +        case MSR_CORE_PERF_FIXED_CTR0...MSR_CORE_PERF_FIXED_CTR2:
> +        case MSR_CORE_PERF_FIXED_CTR_CTRL...MSR_CORE_PERF_GLOBAL_OVF_CTRL:
> +            if ( boot_cpu_data.x86_vendor == X86_VENDOR_INTEL )
> +                vpmu_msr = 1;
> +        case MSR_AMD_FAM15H_EVNTSEL0...MSR_AMD_FAM15H_PERFCTR5:
> +            if ( vpmu_msr || (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) )
> +            {
> +                if ( !vpmu_do_wrmsr(regs->ecx, msr_content) )
> +                    goto invalid;
> +                break;
> +            }
>          default:
>              if ( wrmsr_hypervisor_regs(regs->ecx, msr_content) == 1 )
>                  break;
> @@ -2579,6 +2599,8 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
>          break;
>  
>      case 0x32: /* RDMSR */
> +        vpmu_msr = 0;
> +
>          switch ( (u32)regs->ecx )
>          {
>          case MSR_FS_BASE:
> @@ -2649,7 +2671,27 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
>                              [regs->_ecx - MSR_AMD64_DR1_ADDRESS_MASK + 1];
>              regs->edx = 0;
>              break;
> -
> +        case MSR_IA32_PERF_CAPABILITIES:
> +            /* No extra capabilities are supported */
> +            regs->eax = regs->edx = 0;
> +            break;
> +        case MSR_P6_PERFCTR0...MSR_P6_PERFCTR1:
> +        case MSR_P6_EVNTSEL0...MSR_P6_EVNTSEL1:
> +        case MSR_CORE_PERF_FIXED_CTR0...MSR_CORE_PERF_FIXED_CTR2:
> +        case MSR_CORE_PERF_FIXED_CTR_CTRL...MSR_CORE_PERF_GLOBAL_OVF_CTRL:
> +            if ( boot_cpu_data.x86_vendor == X86_VENDOR_INTEL )
> +                vpmu_msr = 1;
> +        case MSR_AMD_FAM15H_EVNTSEL0...MSR_AMD_FAM15H_PERFCTR5:
> +            if ( vpmu_msr || (boot_cpu_data.x86_vendor != X86_VENDOR_AMD) )
> +            {
> +                if ( vpmu_do_rdmsr(regs->ecx, &msr_content) )
> +                {
> +                    regs->eax = (uint32_t)msr_content;
> +                    regs->edx = (uint32_t)(msr_content >> 32);
> +                    break;
> +                }
> +                goto rdmsr_normal;
> +            }
>          default:
>              if ( rdmsr_hypervisor_regs(regs->ecx, &val) )
>              {
> diff --git a/xen/include/public/pmu.h b/xen/include/public/pmu.h
> index 6da3596..7341de9 100644
> --- a/xen/include/public/pmu.h
> +++ b/xen/include/public/pmu.h
> @@ -27,6 +27,7 @@
>  #define XENPMU_feature_set     3
>  #define XENPMU_init            4
>  #define XENPMU_finish          5
> +#define XENPMU_lvtpc_set       6
>  /* ` } */
>  
>  /* Parameters structure for HYPERVISOR_xenpmu_op call */

* Re: [PATCH v7 14/19] x86/VPMU: Handle PMU interrupts for PV guests
  2014-06-06 17:40 ` [PATCH v7 14/19] x86/VPMU: Handle PMU interrupts for " Boris Ostrovsky
@ 2014-06-06 20:40   ` Andrew Cooper
  2014-06-06 21:21     ` Boris Ostrovsky
  0 siblings, 1 reply; 58+ messages in thread
From: Andrew Cooper @ 2014-06-06 20:40 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: kevin.tian, keir, JBeulich, jun.nakajima, tim, dietmar.hahn,
	xen-devel, suravee.suthikulpanit

On 06/06/14 18:40, Boris Ostrovsky wrote:
> Add support for handling PMU interrupts for PV guests.
>
> VPMU for the interrupted VCPU is unloaded until the guest issues XENPMU_flush
> hypercall. This allows the guest to access PMU MSR values that are stored in
> VPMU context which is shared between hypervisor and domain, thus avoiding
> traps to hypervisor.
>
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Acked-by: Kevin Tian <kevin.tian@intel.com>
> Reviewed-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
> Tested-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
> ---
>  xen/arch/x86/hvm/vpmu.c  | 134 +++++++++++++++++++++++++++++++++++++++++++++--
>  xen/include/public/pmu.h |   7 +++
>  2 files changed, 136 insertions(+), 5 deletions(-)
>
> diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c
> index ca0534b..ce842e4 100644
> --- a/xen/arch/x86/hvm/vpmu.c
> +++ b/xen/arch/x86/hvm/vpmu.c
> @@ -76,7 +76,12 @@ void vpmu_lvtpc_update(uint32_t val)
>      struct vpmu_struct *vpmu = vcpu_vpmu(current);
>  
>      vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR | (val & APIC_LVT_MASKED);
> -    apic_write(APIC_LVTPC, vpmu->hw_lapic_lvtpc);
> +
> +    /* Postpone APIC updates for PV(H) guests if PMU interrupt is pending */
> +    if ( is_hvm_domain(current->domain) ||
> +         !(current->arch.vpmu.xenpmu_data &&
> +           (current->arch.vpmu.xenpmu_data->pmu_flags & PMU_CACHED)) )
> +        apic_write(APIC_LVTPC, vpmu->hw_lapic_lvtpc);

Repeatedly referencing current is expensive (as it is an assembly block
which frobs the stack pointer).  Please use "struct vcpu *curr = current;".

Although these accesses look as if they should be through the vpmu
pointer anyway.
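
With both suggestions applied, the function would look roughly like this
(a sketch only; names as in the patch):

void vpmu_lvtpc_update(uint32_t val)
{
    struct vcpu *curr = current;               /* read 'current' only once */
    struct vpmu_struct *vpmu = vcpu_vpmu(curr);

    vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR | (val & APIC_LVT_MASKED);

    /* Postpone APIC updates for PV(H) guests if a PMU interrupt is pending */
    if ( is_hvm_domain(curr->domain) ||
         !(vpmu->xenpmu_data &&
           (vpmu->xenpmu_data->pmu_flags & PMU_CACHED)) )
        apic_write(APIC_LVTPC, vpmu->hw_lapic_lvtpc);
}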

>  }
>  
>  int vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
> @@ -87,7 +92,23 @@ int vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
>          return 0;
>  
>      if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->do_wrmsr )
> -        return vpmu->arch_vpmu_ops->do_wrmsr(msr, msr_content);
> +    {
> +        int ret = vpmu->arch_vpmu_ops->do_wrmsr(msr, msr_content);
> +
> +        /*
> +         * We may have received a PMU interrupt during WRMSR handling
> +         * and since do_wrmsr may load VPMU context we should save
> +         * (and unload) it again.
> +         */
> +        if ( !is_hvm_domain(current->domain) &&
> +             (current->arch.vpmu.xenpmu_data->pmu_flags & PMU_CACHED) )
> +        {
> +            vpmu_set(vpmu, VPMU_CONTEXT_SAVE);
> +            vpmu->arch_vpmu_ops->arch_vpmu_save(current);
> +            vpmu_reset(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED);
> +        }
> +        return ret;
> +    }
>      return 0;
>  }
>  
> @@ -99,14 +120,109 @@ int vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
>          return 0;
>  
>      if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->do_rdmsr )
> -        return vpmu->arch_vpmu_ops->do_rdmsr(msr, msr_content);
> +    {
> +        int ret = vpmu->arch_vpmu_ops->do_rdmsr(msr, msr_content);
> +
> +        if ( !is_hvm_domain(current->domain) &&
> +            (current->arch.vpmu.xenpmu_data->pmu_flags & PMU_CACHED) )
> +        {
> +            vpmu_set(vpmu, VPMU_CONTEXT_SAVE);
> +            vpmu->arch_vpmu_ops->arch_vpmu_save(current);
> +            vpmu_reset(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED);
> +        }
> +        return ret;
> +    }
>      return 0;
>  }
>  
> +static struct vcpu *choose_dom0_vcpu(void)

Probably want to rename this function in line with the
s/dom0/hardware_domain/ which has happened in the tree since you first
posted a series.

> +{
> +    struct vcpu *v;
> +    unsigned idx = smp_processor_id() % hardware_domain->max_vcpus;
> +
> +    v = hardware_domain->vcpu[idx];
> +
> +    /*
> +     * If index is not populated search downwards the vcpu array until
> +     * a valid vcpu can be found
> +     */
> +    while ( !v && idx-- )
> +        v = hardware_domain->vcpu[idx];
> +
> +    return v;
> +}
> +
>  int vpmu_do_interrupt(struct cpu_user_regs *regs)
>  {
>      struct vcpu *v = current;
> -    struct vpmu_struct *vpmu = vcpu_vpmu(v);
> +    struct vpmu_struct *vpmu;

Perhaps an ASSERT(system_state == SYS_STATE_active) to sanity check the
environment before we start assuming that dom0's vcpus are constructed?

~Andrew

> +
> +    /* dom0 will handle interrupt for special domains (e.g. idle domain) */
> +    if ( v->domain->domain_id >= DOMID_FIRST_RESERVED )
> +    {
> +        v = choose_dom0_vcpu();
> +        if ( !v )
> +            return 0;
> +    }
> +
> +    vpmu = vcpu_vpmu(v);
> +    if ( !is_hvm_domain(v->domain) )
> +    {
> +        /* PV(H) guest or dom0 is doing system profiling */
> +        const struct cpu_user_regs *gregs;
> +        int err;
> +
> +        if ( v->arch.vpmu.xenpmu_data->pmu_flags & PMU_CACHED )
> +            return 1;
> +
> +        if ( is_pvh_domain(current->domain) &&
> +             !vpmu->arch_vpmu_ops->do_interrupt(regs) )
> +            return 0;
> +
> +        /* PV guest will be reading PMU MSRs from xenpmu_data */
> +        vpmu_set(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED);
> +        err = vpmu->arch_vpmu_ops->arch_vpmu_save(v);
> +        vpmu_reset(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED);
> +
> +        /* Store appropriate registers in xenpmu_data */
> +        if ( is_pv_32bit_domain(current->domain) )
> +        {
> +            /*
> +             * 32-bit dom0 cannot process Xen's addresses (which are 64 bit)
> +             * and therefore we treat it the same way as a non-privileged
> +             * PV 32-bit domain.
> +             */
> +            struct compat_cpu_user_regs *cmp;
> +
> +            gregs = guest_cpu_user_regs();
> +
> +            cmp = (void *)&v->arch.vpmu.xenpmu_data->pmu.r.regs;
> +            XLAT_cpu_user_regs(cmp, gregs);
> +        }
> +        else if ( !is_hardware_domain(current->domain) &&
> +                 !is_idle_vcpu(current) )
> +        {
> +            /* PV(H) guest */
> +            gregs = guest_cpu_user_regs();
> +            memcpy(&v->arch.vpmu.xenpmu_data->pmu.r.regs,
> +                   gregs, sizeof(struct cpu_user_regs));
> +        }
> +        else
> +            memcpy(&v->arch.vpmu.xenpmu_data->pmu.r.regs,
> +                   regs, sizeof(struct cpu_user_regs));
> +
> +        v->arch.vpmu.xenpmu_data->domain_id = current->domain->domain_id;
> +        v->arch.vpmu.xenpmu_data->vcpu_id = current->vcpu_id;
> +        v->arch.vpmu.xenpmu_data->pcpu_id = smp_processor_id();
> +
> +        v->arch.vpmu.xenpmu_data->pmu_flags |= PMU_CACHED;
> +        apic_write(APIC_LVTPC, vpmu->hw_lapic_lvtpc | APIC_LVT_MASKED);
> +        vpmu->hw_lapic_lvtpc |= APIC_LVT_MASKED;
> +
> +        send_guest_vcpu_virq(v, VIRQ_XENPMU);
> +
> +        return 1;
> +    }
>  
>      if ( vpmu->arch_vpmu_ops )
>      {
> @@ -225,7 +341,9 @@ void vpmu_load(struct vcpu *v)
>      local_irq_enable();
>  
>      /* Only when PMU is counting, we load PMU context immediately. */
> -    if ( !vpmu_is_set(vpmu, VPMU_RUNNING) )
> +    if ( !vpmu_is_set(vpmu, VPMU_RUNNING) ||
> +         (!is_hvm_domain(v->domain) &&
> +          vpmu->xenpmu_data->pmu_flags & PMU_CACHED) )
>          return;
>  
>      if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->arch_vpmu_load )
> @@ -463,6 +581,12 @@ long do_xenpmu_op(int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg)
>          vpmu_lvtpc_update(current->arch.vpmu.xenpmu_data->pmu.l.lapic_lvtpc);
>          ret = 0;
>          break;
> +    case XENPMU_flush:
> +        current->arch.vpmu.xenpmu_data->pmu_flags &= ~PMU_CACHED;
> +        vpmu_lvtpc_update(current->arch.vpmu.xenpmu_data->pmu.l.lapic_lvtpc);
> +        vpmu_load(current);
> +        ret = 0;
> +        break;
>      }
>  
>      return ret;
> diff --git a/xen/include/public/pmu.h b/xen/include/public/pmu.h
> index 7341de9..67ce5de 100644
> --- a/xen/include/public/pmu.h
> +++ b/xen/include/public/pmu.h
> @@ -28,6 +28,7 @@
>  #define XENPMU_init            4
>  #define XENPMU_finish          5
>  #define XENPMU_lvtpc_set       6
> +#define XENPMU_flush           7 /* Write cached MSR values to HW     */
>  /* ` } */
>  
>  /* Parameters structure for HYPERVISOR_xenpmu_op call */
> @@ -65,6 +66,12 @@ DEFINE_XEN_GUEST_HANDLE(xen_pmu_params_t);
>   */
>  #define XENPMU_FEATURE_INTEL_BTS  1
>  
> +/*
> + * PMU MSRs are cached in the context so the PV guest doesn't need to trap to
> + * the hypervisor
> + */
> +#define PMU_CACHED 1
> +
>  /* Shared between hypervisor and PV domain */
>  struct xen_pmu_data {
>      uint32_t domain_id;

* Re: [PATCH v7 11/19] x86/VPMU: Interface for setting PMU mode and flags
  2014-06-06 19:58   ` Andrew Cooper
@ 2014-06-06 20:42     ` Boris Ostrovsky
  0 siblings, 0 replies; 58+ messages in thread
From: Boris Ostrovsky @ 2014-06-06 20:42 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: kevin.tian, keir, JBeulich, jun.nakajima, tim, dietmar.hahn,
	xen-devel, suravee.suthikulpanit

On 06/06/2014 03:58 PM, Andrew Cooper wrote:
> On 06/06/14 18:40, Boris Ostrovsky wrote:
>> Add runtime interface for setting PMU mode and flags. Three main modes are
>> provided:
>> * PMU off
>> * PMU on: Guests can access PMU MSRs and receive PMU interrupts. dom0
>>    profiles itself and the hypervisor.
>> * dom0-only PMU: dom0 collects samples for both itself and guests.
>>
>> For feature flags only Intel's BTS is currently supported.
>>
>> Mode and flags are set via HYPERVISOR_xenpmu_op hypercall.
>>
>> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>> Reviewed-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
>> Tested-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
>> ---
>>
>> @@ -275,3 +283,100 @@ void vpmu_dump(struct vcpu *v)
>>           vpmu->arch_vpmu_ops->arch_vpmu_dump(v);
>>   }
>>   
>> +/* Unload VPMU contexts */
>> +static void vpmu_unload_all(void)
>> +{
>> +    struct domain *d;
>> +    struct vcpu *v;
>> +    struct vpmu_struct *vpmu;
>> +
>> +    for_each_domain( d )
>> +    {
>> +        for_each_vcpu ( d, v )
>> +        {
>> +            if ( v != current )
>> +                vcpu_pause(v);
>> +            vpmu = vcpu_vpmu(v);
>> +
>> +            if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) )
>> +            {
>> +                if ( v != current )
>> +                    vcpu_unpause(v);
>> +                continue;
>> +            }
>> +
>> +            vpmu_set(vpmu, VPMU_CONTEXT_SAVE);
>> +            on_selected_cpus(cpumask_of(vpmu->last_pcpu),
>> +                             vpmu_save_force, (void *)v, 1);
>> +            vpmu_reset(vpmu, VPMU_CONTEXT_LOADED);
>> +
>> +            if ( v != current )
>> +                vcpu_unpause(v);
>> +        }
>> +    }
>> +}
> This is a very long function without any preemption.
>
> To reduce its impact greatly, would it be better to have a global mode
> controlling vcpu context switching?  You could enable
> "VCPU_GLOBAL_DISABLING", then IPI all cpus at the same time to save
> their vpmu context.  Once the IPI is complete, all vpmu context is clean.
>
> This avoids a double loop and all the vcpu_pause/unpause()s.

I can see the problem, but I am not sure what you are suggesting will
work (or perhaps I don't quite understand it). At the point the IPI
arrives, a processor may be in the middle of working on the VPMU context.

Let me think about this.
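
To make sure I understand, the suggestion would look roughly like this
(a sketch only; the VPMU_GLOBAL_DISABLING flag and the details are assumed):

static void vpmu_save_local(void *unused)
{
    /* vpmu.c already tracks the vCPU whose context is on this pCPU */
    struct vcpu *v = this_cpu(last_vcpu);
    struct vpmu_struct *vpmu;

    if ( v == NULL )
        return;

    vpmu = vcpu_vpmu(v);
    if ( vpmu->arch_vpmu_ops && vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) )
    {
        vpmu_set(vpmu, VPMU_CONTEXT_SAVE);
        vpmu->arch_vpmu_ops->arch_vpmu_save(v);
        vpmu_reset(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED);
    }
}

/* Set the global mode first, then IPI all CPUs at once: */
vpmu_mode |= VPMU_GLOBAL_DISABLING;    /* assumed flag */
on_selected_cpus(&cpu_online_map, vpmu_save_local, NULL, 1);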


>
>> +
>> +
>> +long do_xenpmu_op(int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg)
>> +{
>> +    int ret = -EINVAL;
>> +    xen_pmu_params_t pmu_params;
> This is probably going to want to be XSM'd up for permissions?

I haven't looked at this since earlier versions of the patch but when I 
did it turned out we were out of bits in the access vector (or whatever 
that structure was). I in fact mentioned this problem in the cover 
letter for the series.

-boris

* Re: [PATCH v7 12/19] x86/VPMU: Initialize PMU for PV guests
  2014-06-06 20:13   ` Andrew Cooper
@ 2014-06-06 20:49     ` Boris Ostrovsky
  0 siblings, 0 replies; 58+ messages in thread
From: Boris Ostrovsky @ 2014-06-06 20:49 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: kevin.tian, keir, JBeulich, jun.nakajima, tim, dietmar.hahn,
	xen-devel, suravee.suthikulpanit

On 06/06/2014 04:13 PM, Andrew Cooper wrote:
> On 06/06/14 18:40, Boris Ostrovsky wrote:
>> Code for initializing/tearing down PMU for PV guests
> PVH guests judging by the content?

PV and PVH actually.

>
>> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>> Acked-by: Kevin Tian <kevin.tian@intel.com>
>> Reviewed-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
>> Tested-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
>> ---
>>   xen/arch/x86/hvm/hvm.c            |  3 +-
>>   xen/arch/x86/hvm/svm/svm.c        |  8 +++-
>>   xen/arch/x86/hvm/svm/vpmu.c       | 37 +++++++++++-------
>>   xen/arch/x86/hvm/vmx/vmx.c        |  9 ++++-
>>   xen/arch/x86/hvm/vmx/vpmu_core2.c | 67 ++++++++++++++++++++++----------
>>   xen/arch/x86/hvm/vpmu.c           | 82 ++++++++++++++++++++++++++++++++++++++-
>>   xen/common/event_channel.c        |  1 +
>>   xen/include/asm-x86/hvm/vpmu.h    |  1 +
>>   xen/include/public/pmu.h          |  2 +
>>   xen/include/public/xen.h          |  1 +
>>   10 files changed, 169 insertions(+), 42 deletions(-)
>>
>> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
>> index d2190be..fab54ff 100644
>> --- a/xen/arch/x86/hvm/hvm.c
>> +++ b/xen/arch/x86/hvm/hvm.c
>> @@ -4031,7 +4031,8 @@ static hvm_hypercall_t *const pvh_hypercall64_table[NR_hypercalls] = {
>>       [ __HYPERVISOR_physdev_op ]      = (hvm_hypercall_t *)hvm_physdev_op,
>>       HYPERCALL(hvm_op),
>>       HYPERCALL(sysctl),
>> -    HYPERCALL(domctl)
>> +    HYPERCALL(domctl),
>> +    HYPERCALL(xenpmu_op)
>>   };
>>   
>>   int hvm_do_hypercall(struct cpu_user_regs *regs)
>> diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
>> index 870e4ee..7358626 100644
>> --- a/xen/arch/x86/hvm/svm/svm.c
>> +++ b/xen/arch/x86/hvm/svm/svm.c
>> @@ -1150,7 +1150,9 @@ static int svm_vcpu_initialise(struct vcpu *v)
>>           return rc;
>>       }
>>   
>> -    vpmu_initialise(v);
>> +    /* PVH's VPMU is initialized via hypercall */
>> +    if ( is_hvm_domain(v->domain) )
>> +        vpmu_initialise(v);
>>   
> With the wish to merge HVM and PVH, is this wise?  What does
> hypercall-based setup gain us over MSR based setup, given that we have
> to provide the MSR access to HVM guests anyway?


Interrupts.

We have to go via the event channel, and the handler for VIRQ_PMU in
Linux wants to access MSRs via the shared page (i.e. not perform actual
MSR accesses). That page is set up during the hypercall.

Once we add APIC support for PVH I think most of PMU support will follow 
HVM code. But for now we have to do it separately.
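
To illustrate, the guest-side flow is roughly as follows (a sketch only;
the per-CPU mapping of the shared page and the sample consumer are
assumptions, the constants are from the public header):

static void xen_pmu_virq_handler(void)
{
    struct xen_pmu_data *pmu = this_cpu_xenpmu_data;       /* assumed */

    if ( pmu->pmu_flags & PMU_CACHED )
    {
        /* PMU MSR values are read from the shared page -- no trapping */
        process_sample(&pmu->pmu.r.regs, pmu->domain_id);  /* assumed */

        /* Have Xen reload the hardware MSRs and re-arm the PMU */
        HYPERVISOR_xenpmu_op(XENPMU_flush, NULL);
    }
}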

-boris

* Re: [PATCH v7 13/19] x86/VPMU: Add support for PMU register handling on PV guests
  2014-06-06 20:26   ` Andrew Cooper
@ 2014-06-06 20:53     ` Boris Ostrovsky
  0 siblings, 0 replies; 58+ messages in thread
From: Boris Ostrovsky @ 2014-06-06 20:53 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: kevin.tian, keir, JBeulich, jun.nakajima, tim, dietmar.hahn,
	xen-devel, suravee.suthikulpanit

On 06/06/2014 04:26 PM, Andrew Cooper wrote:
>
>   }
>   
> +static void inject_trap(struct vcpu *v, unsigned int trapno)
> +{
> +    if ( has_hvm_container_domain(v->domain) )
> +        hvm_inject_hw_exception(trapno, 0);
> +    else
> +        send_guest_trap(v->domain, v->vcpu_id, trapno);
> +}
> +
> send_guest_trap() cant be used to send anything other than an nmi or
> mce, but you appear to be using it to try and send #PFs ?

Yes, this is wrong. I'll need to find the right interface.

>
>>           {
>>           case MSR_TYPE_ARCH_CTRL:      /* MSR_P6_EVNTSEL[0,...] */
>>               mask = ~((1ull << 32) - 1);
>> -            if (msr_content & mask)
>> +            if ( msr_content & mask )
> It would be helpful not to mix large functional changes and code
> cleanup.  While I appreciate the cleanup, could it be moved to a
> different patch?  The same comment applies to several other patches as well.

I *think* this is the only instance of me doing this. I'll see if there 
are other places.

-boris

* Re: [PATCH v7 19/19] x86/VPMU: Move VPMU files up from hvm/ directory
  2014-06-06 17:40 ` [PATCH v7 19/19] x86/VPMU: Move VPMU files up from hvm/ directory Boris Ostrovsky
@ 2014-06-06 21:05   ` Andrew Cooper
  0 siblings, 0 replies; 58+ messages in thread
From: Andrew Cooper @ 2014-06-06 21:05 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: kevin.tian, keir, JBeulich, jun.nakajima, tim, dietmar.hahn,
	xen-devel, suravee.suthikulpanit

On 06/06/14 18:40, Boris Ostrovsky wrote:
> Since PMU is now not HVM specific we can move VPMU-related files up from
> arch/x86/hvm/ directory.
>
> Specifically:
>     arch/x86/hvm/vpmu.c -> arch/x86/vpmu.c
>     arch/x86/hvm/svm/vpmu.c -> arch/x86/vpmu_amd.c
>     arch/x86/hvm/vmx/vpmu_core2.c -> arch/x86/vpmu_intel.c
>     include/asm-x86/hvm/vpmu.h -> include/asm-x86/vpmu.h
>
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Acked-by: Kevin Tian <kevin.tian@intel.com>
> Reviewed-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
> Tested-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>

Use -M please, to vastly reduce the size of this patch.

~Andrew

* Re: [PATCH v7 14/19] x86/VPMU: Handle PMU interrupts for PV guests
  2014-06-06 20:40   ` Andrew Cooper
@ 2014-06-06 21:21     ` Boris Ostrovsky
  0 siblings, 0 replies; 58+ messages in thread
From: Boris Ostrovsky @ 2014-06-06 21:21 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: kevin.tian, keir, JBeulich, jun.nakajima, tim, dietmar.hahn,
	xen-devel, suravee.suthikulpanit

On 06/06/2014 04:40 PM, Andrew Cooper wrote:
>   
> +
>   int vpmu_do_interrupt(struct cpu_user_regs *regs)
>   {
>       struct vcpu *v = current;
> -    struct vpmu_struct *vpmu = vcpu_vpmu(v);
> +    struct vpmu_struct *vpmu;
> Perhaps an ASSERT(system_state == SYS_STATE_active) to sanity check the
> environment before we start assuming that dom0s vcpus are constructed?

I think this may not work --- if we are being suspended we first change 
state to SYS_STATE_suspend and then start pausing domains. We may get a 
PMU interrupt between these two.

-boris

* Re: [PATCH v7 05/19] intel/VPMU: Clean up Intel VPMU code
  2014-06-06 18:34   ` Andrew Cooper
  2014-06-06 19:43     ` Boris Ostrovsky
@ 2014-06-10  8:19     ` Jan Beulich
  1 sibling, 0 replies; 58+ messages in thread
From: Jan Beulich @ 2014-06-10  8:19 UTC (permalink / raw)
  To: Andrew Cooper, Boris Ostrovsky
  Cc: kevin.tian, keir, suravee.suthikulpanit, tim, dietmar.hahn,
	xen-devel, jun.nakajima

>>> On 06.06.14 at 20:34, <andrew.cooper3@citrix.com> wrote:
> On 06/06/14 18:40, Boris Ostrovsky wrote:
>> -static int arch_pmc_cnt;
>> +    eax = cpuid_eax(0xa);
>> +    return ( (eax & PMU_GENERAL_NR_MASK) >> PMU_GENERAL_NR_SHIFT );
>> +}
> 
> This (and later) can be made much simpler.  The style guidelines easily
> permit:
> 
> static int core2_get_arch_pmc_count(void)
> {
>     return (cpuid_eax(0xa) & PMU_GENERAL_NR_MASK) >> PMU_GENERAL_NR_SHIFT;
> }

The above can be further simplified using MASK_EXTR(), at once
eliminating the need for *_SHIFT constants. 
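
I.e. (sketch):

static int core2_get_arch_pmc_count(void)
{
    /* MASK_EXTR() extracts the field using just the mask */
    return MASK_EXTR(cpuid_eax(0xa), PMU_GENERAL_NR_MASK);
}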

> Unrelated to this code itself, I wonder whether Xen should gain some
> mnemonics for cpuid leaves.

We already have XSTATE_CPUID, so it would indeed seem desirable
to name 0xa too.

Jan

* Re: [PATCH v7 01/19] common/symbols: Export hypervisor symbols to privileged guest
  2014-06-06 17:39 ` [PATCH v7 01/19] common/symbols: Export hypervisor symbols to privileged guest Boris Ostrovsky
  2014-06-06 17:57   ` Andrew Cooper
@ 2014-06-10 11:31   ` Jan Beulich
  1 sibling, 0 replies; 58+ messages in thread
From: Jan Beulich @ 2014-06-10 11:31 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: kevin.tian, keir, suravee.suthikulpanit, tim, dietmar.hahn,
	xen-devel, jun.nakajima

>>> On 06.06.14 at 19:39, <boris.ostrovsky@oracle.com> wrote:
> --- a/xen/include/public/platform.h
> +++ b/xen/include/public/platform.h
> @@ -527,6 +527,24 @@ struct xenpf_core_parking {
>  typedef struct xenpf_core_parking xenpf_core_parking_t;
>  DEFINE_XEN_GUEST_HANDLE(xenpf_core_parking_t);
>  
> +#define XENPF_get_symbol   61
> +struct xenpf_symdata {
> +    /* IN variables */
> +    uint32_t namelen; /* size of 'name' buffer */

I think if you want this to be an IN it should be named "namesize"
or "namesz"; when it is "namelen" one would naturally expect it
to be an IN/OUT (which I think I'd prefer).

Jan

* Re: [PATCH v7 02/19] VPMU: Mark context LOADED before registers are loaded
  2014-06-06 17:59   ` Andrew Cooper
@ 2014-06-10 15:29     ` Jan Beulich
  2014-06-10 15:40       ` Boris Ostrovsky
  0 siblings, 1 reply; 58+ messages in thread
From: Jan Beulich @ 2014-06-10 15:29 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: kevin.tian, keir, suravee.suthikulpanit, Andrew Cooper, tim,
	dietmar.hahn, xen-devel, jun.nakajima

>>> On 06.06.14 at 19:59, <andrew.cooper3@citrix.com> wrote:
> On 06/06/14 18:39, Boris Ostrovsky wrote:
>> Because a PMU interrupt may be generated as soon as PMU registers are loaded (or,
>> more precisely, as soon as HW PMU is "armed") we don't want to delay marking
>> context as LOADED until after registers are loaded. Otherwise during interrupt
>> handling VPMU_CONTEXT_LOADED may not be set and this could be confusing.
>>
>> (Technically, only SVM needs this change right now since VMX will "arm" PMU later,
>> during VMRUN when global control register is loaded from VMCS. However, both
>> AMD and Intel code will require this patch when we introduce PV VPMU).
>>
>> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>> Acked-by: Kevin Tian <kevin.tian@intel.com>
>> Reviewed-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
>> Tested-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
> 
> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

With this not having seen other comments, does it make sense to
apply this on its own, while you address comments you got on
subsequent patches?

Thanks, Jan

* Re: [PATCH v7 02/19] VPMU: Mark context LOADED before registers are loaded
  2014-06-10 15:29     ` Jan Beulich
@ 2014-06-10 15:40       ` Boris Ostrovsky
  0 siblings, 0 replies; 58+ messages in thread
From: Boris Ostrovsky @ 2014-06-10 15:40 UTC (permalink / raw)
  To: Jan Beulich
  Cc: kevin.tian, keir, suravee.suthikulpanit, Andrew Cooper, tim,
	dietmar.hahn, xen-devel, jun.nakajima

On 06/10/2014 11:29 AM, Jan Beulich wrote:
>>>> On 06.06.14 at 19:59, <andrew.cooper3@citrix.com> wrote:
>> On 06/06/14 18:39, Boris Ostrovsky wrote:
>>> Because a PMU interrupt may be generated as soon as PMU registers are loaded (or,
>>> more precisely, as soon as HW PMU is "armed") we don't want to delay marking
>>> context as LOADED until after registers are loaded. Otherwise during interrupt
>>> handling VPMU_CONTEXT_LOADED may not be set and this could be confusing.
>>>
>>> (Technically, only SVM needs this change right now since VMX will "arm" PMU later,
>>> during VMRUN when global control register is loaded from VMCS. However, both
>>> AMD and Intel code will require this patch when we introduce PV VPMU).
>>>
>>> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>>> Acked-by: Kevin Tian <kevin.tian@intel.com>
>>> Reviewed-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
>>> Tested-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
>> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
> With this not having seen other comments, does it make sense to
> apply this on its own, while you address comments you got on
> subsequent patches?

Yes, this can go in, especially given that it's the first patch 
affecting VPMU.

Thanks.
-boris

* Re: [PATCH v7 06/19] vmx: Merge MSR management routines
  2014-06-06 17:40 ` [PATCH v7 06/19] vmx: Merge MSR management routines Boris Ostrovsky
@ 2014-06-11  9:55   ` Jan Beulich
  0 siblings, 0 replies; 58+ messages in thread
From: Jan Beulich @ 2014-06-11  9:55 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: kevin.tian, keir, suravee.suthikulpanit, tim, dietmar.hahn,
	xen-devel, jun.nakajima

>>> On 06.06.14 at 19:40, <boris.ostrovsky@oracle.com> wrote:
> +    ASSERT( (type == VMX_GUEST_MSR) || (type == VMX_HOST_MSR) );

As hinted at by Andrew already, I think you need to do a global
coding style cleanup pass through this series, making sure you
have (possibly among other things) spaces/parentheses where
they belong and, apart from ambiguous cases where it's a matter
of taste, none where none should be. The example here has too
many spaces - ASSERT, other than if/for/while, is not a keyword.

Jan

* Re: [PATCH v7 09/19] x86/VPMU: Add public xenpmu.h
  2014-06-06 17:40 ` [PATCH v7 09/19] x86/VPMU: Add public xenpmu.h Boris Ostrovsky
  2014-06-06 19:32   ` Andrew Cooper
@ 2014-06-11 10:03   ` Jan Beulich
  2014-06-11 12:32     ` Boris Ostrovsky
  1 sibling, 1 reply; 58+ messages in thread
From: Jan Beulich @ 2014-06-11 10:03 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: kevin.tian, keir, suravee.suthikulpanit, tim, dietmar.hahn,
	xen-devel, jun.nakajima

>>> On 06.06.14 at 19:40, <boris.ostrovsky@oracle.com> wrote:
> @@ -228,6 +229,10 @@ void vpmu_initialise(struct vcpu *v)
>      struct vpmu_struct *vpmu = vcpu_vpmu(v);
>      uint8_t vendor = current_cpu_data.x86_vendor;
>  
> +    BUILD_BUG_ON((sizeof(struct xen_pmu_intel_ctxt) > XENPMU_CTXT_PAD_SZ) ||
> +                 (sizeof(struct xen_pmu_amd_ctxt) > XENPMU_CTXT_PAD_SZ));

This should be two BUILD_BUG_ON()s, so that if one triggers there's
no ambiguity in which of the structures grew too large.
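
I.e.:

BUILD_BUG_ON(sizeof(struct xen_pmu_intel_ctxt) > XENPMU_CTXT_PAD_SZ);
BUILD_BUG_ON(sizeof(struct xen_pmu_amd_ctxt) > XENPMU_CTXT_PAD_SZ);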

> +struct xen_pmu_amd_ctxt {
> +    uint32_t counters;       /* Offset to counter MSRs */
> +    uint32_t ctrls;          /* Offset to control MSRs */

Offsets relative to what?

> +struct xen_pmu_intel_ctxt {
> +    uint64_t global_ctrl;
> +    uint64_t global_ovf_ctrl;
> +    uint64_t global_status;
> +    uint64_t fixed_ctrl;
> +    uint64_t ds_area;
> +    uint64_t pebs_enable;
> +    uint64_t debugctl;
> +    uint32_t fixed_counters;  /* Offset to fixed counter MSRs */
> +    uint32_t arch_counters;   /* Offset to architectural counter MSRs */

Same question here.

> +struct xen_arch_pmu {
> +    union {
> +        struct cpu_user_regs regs;
> +        uint8_t pad[256];
> +    } r;
> +    union {
> +        uint32_t lapic_lvtpc;
> +        uint64_t pad;
> +    } l;
> +    union {
> +        struct xen_pmu_amd_ctxt amd;
> +        struct xen_pmu_intel_ctxt intel;
> +#define XENPMU_CTXT_PAD_SZ  128
> +        uint8_t pad[XENPMU_CTXT_PAD_SZ];

So assuming the variable size MSR arrays follow here, is 128 enough
for forward compatibility? Or is 128 indeed only expected to cover the
fixed size base structures (whichever way it is should perhaps be
made explicit by way of adding a comment)?

Jan

* Re: [PATCH v7 10/19] x86/VPMU: Make vpmu not HVM-specific
  2014-06-06 17:40 ` [PATCH v7 10/19] x86/VPMU: Make vpmu not HVM-specific Boris Ostrovsky
@ 2014-06-11 10:07   ` Jan Beulich
  2014-06-11 13:57     ` Is: Reviewed-by, Acked-by rules. Was:Re: " Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 58+ messages in thread
From: Jan Beulich @ 2014-06-11 10:07 UTC (permalink / raw)
  To: kevin.tian, Boris Ostrovsky
  Cc: keir, suravee.suthikulpanit, tim, dietmar.hahn, xen-devel, jun.nakajima

>>> On 06.06.14 at 19:40, <boris.ostrovsky@oracle.com> wrote:
> vpmu structure will be used for both HVM and PV guests. Move it from
> hvm_vcpu to arch_vcpu.
> 
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Acked-by: Kevin Tian <kevin.tian@intel.com>

I pointed this out on an earlier version of the series, perhaps on
another patch (but comments like this should be taken to apply
globally): An ack by a non-maintainer of any of the code being
modified is bogus/confusing - it should either be Reviewed-by or
left off.

> Reviewed-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
> Tested-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>

Acked-by: Jan Beulich <jbeulich@suse.com>

* Re: [PATCH v7 11/19] x86/VPMU: Interface for setting PMU mode and flags
  2014-06-06 17:40 ` [PATCH v7 11/19] x86/VPMU: Interface for setting PMU mode and flags Boris Ostrovsky
  2014-06-06 19:58   ` Andrew Cooper
@ 2014-06-11 10:14   ` Jan Beulich
  2014-06-17 15:01     ` Boris Ostrovsky
  1 sibling, 1 reply; 58+ messages in thread
From: Jan Beulich @ 2014-06-11 10:14 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: kevin.tian, keir, suravee.suthikulpanit, tim, dietmar.hahn,
	xen-devel, jun.nakajima

>>> On 06.06.14 at 19:40, <boris.ostrovsky@oracle.com> wrote:
> --- a/xen/arch/x86/x86_64/compat/entry.S
> +++ b/xen/arch/x86/x86_64/compat/entry.S
> @@ -417,6 +417,8 @@ ENTRY(compat_hypercall_table)
>          .quad do_domctl
>          .quad compat_kexec_op
>          .quad do_tmem_op
> +        .quad do_ni_hypercall           /* reserved for XenClient */
> +        .quad do_xenpmu_op              /* 40 */
>          .rept __HYPERVISOR_arch_0-((.-compat_hypercall_table)/8)
>          .quad compat_ni_hypercall
>          .endr
> @@ -465,6 +467,8 @@ ENTRY(compat_hypercall_args_table)
>          .byte 1 /* do_domctl                */
>          .byte 2 /* compat_kexec_op          */
>          .byte 1 /* do_tmem_op               */
> +        .byte 0 /* reserved for XenClient   */
> +        .byte 2 /* do_xenpmu_op             */  /* 40 */
>          .rept __HYPERVISOR_arch_0-(.-compat_hypercall_args_table)
>          .byte 0 /* compat_ni_hypercall      */
>          .endr

I'm pretty certain I said on an earlier version of the patch series
already: If you use the same handler for native and compat mode
hypercalls, you should add structure layout verification (see the ?
entries in xen/include/xlat.lst and how what they produce is being
used in various places). And the fact that the interface structure
contains a handle already makes clear that this check will fail - I
don't think there's a way around having at least a thin compat
mode wrapper here.
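
Presumably that means a '?' entry along these lines in xen/include/xlat.lst
(the exact spelling is an assumption):

?	pmu_params			pmu.h

plus an instance of the generated CHECK_pmu_params; in code that is built
for both native and compat guests.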

Jan

* Re: [PATCH v7 12/19] x86/VPMU: Initialize PMU for PV guests
  2014-06-06 17:40 ` [PATCH v7 12/19] x86/VPMU: Initialize PMU for PV guests Boris Ostrovsky
  2014-06-06 20:13   ` Andrew Cooper
@ 2014-06-11 10:20   ` Jan Beulich
  2014-06-11 12:49     ` Boris Ostrovsky
  1 sibling, 1 reply; 58+ messages in thread
From: Jan Beulich @ 2014-06-11 10:20 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: kevin.tian, keir, suravee.suthikulpanit, tim, dietmar.hahn,
	xen-devel, jun.nakajima

>>> On 06.06.14 at 19:40, <boris.ostrovsky@oracle.com> wrote:
> Code for initializing/tearing down PMU for PV guests
> ...
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -4031,7 +4031,8 @@ static hvm_hypercall_t *const pvh_hypercall64_table[NR_hypercalls] = {
>      [ __HYPERVISOR_physdev_op ]      = (hvm_hypercall_t *)hvm_physdev_op,
>      HYPERCALL(hvm_op),
>      HYPERCALL(sysctl),
> -    HYPERCALL(domctl)
> +    HYPERCALL(domctl),
> +    HYPERCALL(xenpmu_op)

How is this change related to the subject and description?

> +static int pvpmu_init(struct domain *d, xen_pmu_params_t *params)
> +{
> +    struct vcpu *v;
> +    struct page_info *page;
> +    uint64_t gfn = params->d.val;
> +
> +    if ( !params || params->vcpu >= d->max_vcpus )
> +        return -EINVAL;
> +
> +    page = get_page_from_gfn(d, gfn, NULL, P2M_ALLOC);
> +    if ( !page )
> +        return -EINVAL;
> +
> +    if ( !get_page_type(page, PGT_writable_page) )
> +    {
> +        put_page(page);
> +        return -EINVAL;
> +    }
> +
> +    v = d->vcpu[params->vcpu];

I'm sure I asked this before - what guarantees (now as well as
going forward) that you get non-NULL here? Various existing
code paths properly verify both d->vcpu and d->vcpu[...] not
being NULL, so you should do this consistently too.
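
That is, before dereferencing, something along the lines of:

if ( !params || (params->vcpu >= d->max_vcpus) )
    return -EINVAL;

if ( (d->vcpu == NULL) || ((v = d->vcpu[params->vcpu]) == NULL) )
    return -EINVAL;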

Jan

* Re: [PATCH v7 09/19] x86/VPMU: Add public xenpmu.h
  2014-06-11 10:03   ` Jan Beulich
@ 2014-06-11 12:32     ` Boris Ostrovsky
  0 siblings, 0 replies; 58+ messages in thread
From: Boris Ostrovsky @ 2014-06-11 12:32 UTC (permalink / raw)
  To: Jan Beulich
  Cc: kevin.tian, keir, suravee.suthikulpanit, tim, dietmar.hahn,
	xen-devel, jun.nakajima

On 06/11/2014 06:03 AM, Jan Beulich wrote:
>>>> On 06.06.14 at 19:40, <boris.ostrovsky@oracle.com> wrote:
>> @@ -228,6 +229,10 @@ void vpmu_initialise(struct vcpu *v)
>>       struct vpmu_struct *vpmu = vcpu_vpmu(v);
>>       uint8_t vendor = current_cpu_data.x86_vendor;
>>   
>> +    BUILD_BUG_ON((sizeof(struct xen_pmu_intel_ctxt) > XENPMU_CTXT_PAD_SZ) ||
>> +                 (sizeof(struct xen_pmu_amd_ctxt) > XENPMU_CTXT_PAD_SZ));
> This should be two BUILD_BUG_ON()s, so that if one triggers there's
> no ambiguity in which of the structures grew too large.
>
>> +struct xen_pmu_amd_ctxt {
>> +    uint32_t counters;       /* Offset to counter MSRs */
>> +    uint32_t ctrls;          /* Offset to control MSRs */
> Offsets relative to what?

Relative to the context union of xen_arch_pmu (i.e. to the start of
xen_pmu_amd_ctxt). I'll add a comment.
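
For example, the AMD counter array would then be located via (a sketch;
the 'c' union member name is an assumption):

struct xen_pmu_amd_ctxt *ctxt = &xenpmu_data->pmu.c.amd;
uint64_t *counter_regs = (uint64_t *)((char *)ctxt + ctxt->counters);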

>
>> +struct xen_pmu_intel_ctxt {
>> +    uint64_t global_ctrl;
>> +    uint64_t global_ovf_ctrl;
>> +    uint64_t global_status;
>> +    uint64_t fixed_ctrl;
>> +    uint64_t ds_area;
>> +    uint64_t pebs_enable;
>> +    uint64_t debugctl;
>> +    uint32_t fixed_counters;  /* Offset to fixed counter MSRs */
>> +    uint32_t arch_counters;   /* Offset to architectural counter MSRs */
> Same question here.
>
>> +struct xen_arch_pmu {
>> +    union {
>> +        struct cpu_user_regs regs;
>> +        uint8_t pad[256];
>> +    } r;
>> +    union {
>> +        uint32_t lapic_lvtpc;
>> +        uint64_t pad;
>> +    } l;
>> +    union {
>> +        struct xen_pmu_amd_ctxt amd;
>> +        struct xen_pmu_intel_ctxt intel;
>> +#define XENPMU_CTXT_PAD_SZ  128
>> +        uint8_t pad[XENPMU_CTXT_PAD_SZ];
> So assuming the variable size MSR arrays follow here, is 128 enough
> for forward compatibility? Or is 128 indeed only expected to cover the
> fixed size base structures (whichever way it is should perhaps be
> made explicit by way of adding a comment)?

128 is for the fixed part only (the larger of the two is currently
Intel's, at 64 bytes). Will add a comment.

-boris

* Re: [PATCH v7 12/19] x86/VPMU: Initialize PMU for PV guests
  2014-06-11 10:20   ` Jan Beulich
@ 2014-06-11 12:49     ` Boris Ostrovsky
  2014-06-11 12:53       ` Jan Beulich
  0 siblings, 1 reply; 58+ messages in thread
From: Boris Ostrovsky @ 2014-06-11 12:49 UTC (permalink / raw)
  To: Jan Beulich
  Cc: kevin.tian, keir, suravee.suthikulpanit, tim, dietmar.hahn,
	xen-devel, jun.nakajima

On 06/11/2014 06:20 AM, Jan Beulich wrote:
>>>> On 06.06.14 at 19:40, <boris.ostrovsky@oracle.com> wrote:
>> Code for initializing/tearing down PMU for PV guests
>> ...
>> --- a/xen/arch/x86/hvm/hvm.c
>> +++ b/xen/arch/x86/hvm/hvm.c
>> @@ -4031,7 +4031,8 @@ static hvm_hypercall_t *const pvh_hypercall64_table[NR_hypercalls] = {
>>       [ __HYPERVISOR_physdev_op ]      = (hvm_hypercall_t *)hvm_physdev_op,
>>       HYPERCALL(hvm_op),
>>       HYPERCALL(sysctl),
>> -    HYPERCALL(domctl)
>> +    HYPERCALL(domctl),
>> +    HYPERCALL(xenpmu_op)
> How is this change related to the subject and description?

Yes, this is a result of eliminating the standalone PVH patch and
spreading its content over other patches. I'll update the text.

>
>> +static int pvpmu_init(struct domain *d, xen_pmu_params_t *params)
>> +{
>> +    struct vcpu *v;
>> +    struct page_info *page;
>> +    uint64_t gfn = params->d.val;
>> +
>> +    if ( !params || params->vcpu >= d->max_vcpus )
>> +        return -EINVAL;
>> +
>> +    page = get_page_from_gfn(d, gfn, NULL, P2M_ALLOC);
>> +    if ( !page )
>> +        return -EINVAL;
>> +
>> +    if ( !get_page_type(page, PGT_writable_page) )
>> +    {
>> +        put_page(page);
>> +        return -EINVAL;
>> +    }
>> +
>> +    v = d->vcpu[params->vcpu];
> I'm sure I asked this before - what guarantees (now as well as
> going forward) that you get non-NULL here? Various existing
> code paths properly verify both d->vcpu and d->vcpu[...] not
> being NULL, so you should do this consistently too.

Ah, I thought you were referring to the params check last time. I'll
add the vcpu checks as well.

-boris

* Re: [PATCH v7 12/19] x86/VPMU: Initialize PMU for PV guests
  2014-06-11 12:49     ` Boris Ostrovsky
@ 2014-06-11 12:53       ` Jan Beulich
  0 siblings, 0 replies; 58+ messages in thread
From: Jan Beulich @ 2014-06-11 12:53 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: kevin.tian, keir, suravee.suthikulpanit, tim, dietmar.hahn,
	xen-devel, jun.nakajima

>>> On 11.06.14 at 14:49, <boris.ostrovsky@oracle.com> wrote:
> On 06/11/2014 06:20 AM, Jan Beulich wrote:
>>>>> On 06.06.14 at 19:40, <boris.ostrovsky@oracle.com> wrote:
>>> Code for initializing/tearing down PMU for PV guests
>>> ...
>>> --- a/xen/arch/x86/hvm/hvm.c
>>> +++ b/xen/arch/x86/hvm/hvm.c
>>> @@ -4031,7 +4031,8 @@ static hvm_hypercall_t *const 
> pvh_hypercall64_table[NR_hypercalls] = {
>>>       [ __HYPERVISOR_physdev_op ]      = (hvm_hypercall_t *)hvm_physdev_op,
>>>       HYPERCALL(hvm_op),
>>>       HYPERCALL(sysctl),
>>> -    HYPERCALL(domctl)
>>> +    HYPERCALL(domctl),
>>> +    HYPERCALL(xenpmu_op)
>> How is this change related to the subject and description?
> 
> Yes, this is result of my eliminating PVH patch as a standalone one and 
> spreading its content over other patches. I'll update the text.
> 
>>
>>> +static int pvpmu_init(struct domain *d, xen_pmu_params_t *params)
>>> +{
>>> +    struct vcpu *v;
>>> +    struct page_info *page;
>>> +    uint64_t gfn = params->d.val;
>>> +
>>> +    if ( !params || params->vcpu >= d->max_vcpus )
>>> +        return -EINVAL;
>>> +
>>> +    page = get_page_from_gfn(d, gfn, NULL, P2M_ALLOC);
>>> +    if ( !page )
>>> +        return -EINVAL;
>>> +
>>> +    if ( !get_page_type(page, PGT_writable_page) )
>>> +    {
>>> +        put_page(page);
>>> +        return -EINVAL;
>>> +    }
>>> +
>>> +    v = d->vcpu[params->vcpu];
>> I'm sure I asked this before - what guarantees (now as well as
>> going forward) that you get non-NULL here? Various existing
>> code paths properly verify both d->vcpu and d->vcpu[...] not
>> being NULL, so you should do this consistently too.
> 
> Ah, I thought you were referring to params check last time. I'll add 
> vcpu checks as well.

And drop checking param against NULL - I don't think that check
makes any sense.

Jan

* Is: Reviewed-by, Acked-by rules. Was:Re: [PATCH v7 10/19] x86/VPMU: Make vpmu not HVM-specific
  2014-06-11 10:07   ` Jan Beulich
@ 2014-06-11 13:57     ` Konrad Rzeszutek Wilk
  2014-06-11 20:25       ` Jan Beulich
  0 siblings, 1 reply; 58+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-06-11 13:57 UTC (permalink / raw)
  To: Jan Beulich, time
  Cc: kevin.tian, keir, jun.nakajima, tim, dietmar.hahn, xen-devel,
	suravee.suthikulpanit, Boris Ostrovsky

On Wed, Jun 11, 2014 at 11:07:14AM +0100, Jan Beulich wrote:
> >>> On 06.06.14 at 19:40, <boris.ostrovsky@oracle.com> wrote:
> > vpmu structure will be used for both HVM and PV guests. Move it from
> > hvm_vcpu to arch_vcpu.
> > 
> > Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> > Acked-by: Kevin Tian <kevin.tian@intel.com>
> 
> I pointed this out on an earlier version of the series, perhaps on
> another patch (but comments like this should be taken to apply
> globally): An ack by a non-maintainer of any of the code being
> modified is bogus/confusing - it should either be Reviewed-by or
> left off.

That is at odds with what the Linux style has:
---
13) When to use Acked-by: and Cc:

The Signed-off-by: tag indicates that the signer was involved in the
development of the patch, or that he/she was in the patch's delivery path.

If a person was not directly involved in the preparation or handling of a
patch but wishes to signify and record their approval of it then they can
arrange to have an Acked-by: line added to the patch's changelog.

Acked-by: is often used by the maintainer of the affected code when that
maintainer neither contributed to nor forwarded the patch.

Acked-by: is not as formal as Signed-off-by:.  It is a record that the acker
has at least reviewed the patch and has indicated acceptance.  Hence patch
mergers will sometimes manually convert an acker's "yep, looks good to me"
into an Acked-by:.

Acked-by: does not necessarily indicate acknowledgement of the entire patch.
For example, if a patch affects multiple subsystems and has an Acked-by: from
one subsystem maintainer then this usually indicates acknowledgement of just
the part which affects that maintainer's code.  Judgement should be used here.
When in doubt people should refer to the original discussion in the mailing
list archives.

If a person has had the opportunity to comment on a patch, but has not
provided such comments, you may optionally add a "Cc:" tag to the patch.
This is the only tag which might be added without an explicit action by the
person it names.  This tag documents that potentially interested parties
have been included in the discussion

---

This does allow non-maintainers to provide an 'Acked-by', which is a
lesser version of a 'Reviewed-by' from a non-maintainer, given that
'Reviewed-by' carries:

	Reviewer's statement of oversight

	By offering my Reviewed-by: tag, I state that:

 	 (a) I have carried out a technical review of this patch to
	     evaluate its appropriateness and readiness for inclusion into
	     the mainline kernel.

	 (b) Any problems, concerns, or questions relating to the patch
	     have been communicated back to the submitter.  I am satisfied
	     with the submitter's response to my comments.

	 (c) While there may be things that could be improved with this
	     submission, I believe that it is, at this time, (1) a
	     worthwhile modification to the kernel, and (2) free of known
	     issues which would argue against its inclusion.

	 (d) While I have reviewed the patch and believe it to be sound, I
	     do not (unless explicitly stated elsewhere) make any
	     warranties or guarantees that it will achieve its stated
	     purpose or function properly in any given situation.

Now obviously Xen != Linux, but if we want the 'Acked-by' or
'Reviewed-by' policy to be different from Linux's (which I think we
should not), we should have it written down somewhere so folks do not
get confused.

And I think George has a different opinion of when Reviewed-by
get carried on patches that undergo modifications. Sadly I don't see
anything in the Linux Documentation/Submitting* to explain what
the rules are for that.

> 
> > Reviewed-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
> > Tested-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
> 
> Acked-by: Jan Beulich <jbeulich@suse.com>

* Re: Is: Reviewed-by, Acked-by rules. Was:Re: [PATCH v7 10/19] x86/VPMU: Make vpmu not HVM-specific
  2014-06-11 13:57     ` Is: Reviewed-by, Acked-by rules. Was:Re: " Konrad Rzeszutek Wilk
@ 2014-06-11 20:25       ` Jan Beulich
  2014-06-12 11:10         ` George Dunlap
  0 siblings, 1 reply; 58+ messages in thread
From: Jan Beulich @ 2014-06-11 20:25 UTC (permalink / raw)
  To: konrad.wilk, time
  Cc: kevin.tian, keir, suravee.suthikulpanit, tim, dietmar.hahn,
	xen-devel, jun.nakajima, boris.ostrovsky

>>> Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> 06/11/14 3:57 PM >>>
>On Wed, Jun 11, 2014 at 11:07:14AM +0100, Jan Beulich wrote:
>> I pointed this out on an earlier version of the series, perhaps on
>> another patch (but comments like this should be taken to apply
>> globally): An ack by a non-maintainer of any of the code being
>> modified is bogus/confusing - it should either be Reviewed-by or
>> left off.
>
>That is at odds with what the Linux style has:

I think you said mostly the same when I made a similar statement a couple
of months ago; nothing has changed since then afaict.

>Now obviously Xen != Linux, but if we want to this 'Acked-by' or
>'Reviewed-by' policy to be different from Linux (which I think we
>should not), we should have it written somewhere so folks do not
>get confused.

Exactly - Xen != Linux. We've been following the maintainer-only-ack
style for quite a while now, with infrequent acks from non-maintainers
just ignored (or applied) silently. With this series, however, I felt it
would be better to mention the misuse, as it appeared on more than one patch.

>And I think George has a different opinion of when Reviewed-by
>get carried on patches that undergo modifications. Sadly I don't see
>anything in the Linux Documentation/Submitting* to explain what
>the rules are for that.

I don't recall having a "different" opinion here - at least I agree: With
non-benign changes to a patch, acks and reviews should be dropped.
Fixing bugs in a patch - which I think is the recent case you recall -
is clearly a non-benign change.

In fact I would go farther and suggest that more than a week or two
old tags should also get dropped. There have been a number of cases
not that long ago that new versions of patch series got posted weeks
if not months after the most recent earlier versions. And among them
have been cases where I wasn't really certain whether a tag with my
name was actually validly there, or valid anymore.

Jan

* Re: Is: Reviewed-by, Acked-by rules. Was:Re: [PATCH v7 10/19] x86/VPMU: Make vpmu not HVM-specific
  2014-06-11 20:25       ` Jan Beulich
@ 2014-06-12 11:10         ` George Dunlap
  2014-06-12 16:21           ` Jan Beulich
  0 siblings, 1 reply; 58+ messages in thread
From: George Dunlap @ 2014-06-12 11:10 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tian, Kevin, Keir Fraser, time, Nakajima, Jun, Tim Deegan,
	Dietmar Hahn, xen-devel, Suravee Suthikulpanit, Boris Ostrovsky

On Wed, Jun 11, 2014 at 9:25 PM, Jan Beulich <jbeulich@suse.com> wrote:
>>>> Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> 06/11/14 3:57 PM >>>
>>On Wed, Jun 11, 2014 at 11:07:14AM +0100, Jan Beulich wrote:
>>> I pointed this out on an earlier version of the series, perhaps on
>>> another patch (but comments like this should be taken to apply
>>> globally): An ack by a non-maintainer of any of the code being
>>> modified is bogus/confusing - it should either be Reviewed-by or
>>> left off.
>>
>>That is at odds with what the Linux style has:
>
> I think you said mostly the same when I made a similar statement a couple
> of months ago; nothing has changed since then afaict.
>
>>Now obviously Xen != Linux, but if we want the 'Acked-by' or
>>'Reviewed-by' policy to be different from Linux's (which I think we
>>should not), we should have it written down somewhere so folks do not
>>get confused.
>
> Exactly - Xen != Linux. We've been following the maintainer-only-ack
> style for quite a while now, with infrequent acks from non-maintainers
> just ignored (or applied) silently. With this series however, I felt it would
> be better to mention the misuse as it has been on more than one patch.

I'm not sure why you *object* to non-maintainer Acks, Jan -- people
who are either pushing the patch or applying the patch need to know
who the maintainers are anyway, if for no other reason than to know when
a "Reviewed-by" counts as a maintainer Ack or not.

It seems to me there are several times when non-maintainer Acks are
useful; for example, when someone has previously objected to a patch,
and is now OK with it, but has not actually reviewed it.

>>And I think George has a different opinion of when Reviewed-by
>>get carried on patches that undergo modifications. Sadly I don't see
>>anything in the Linux Documentation/Submitting* to explain what
>>the rules are for that.
>
> I don't recall having a "different" opinion here - at least I agree: With
> non-benign changes to a patch, acks and reviews should be dropped.
> Fixing bugs in a patch - which I think is the recent case you recall -
> is clearly a non-benign change.
>
> In fact I would go farther and suggest that more than a week or two
> old tags should also get dropped. There have been a number of cases
> not that long ago that new versions of patch series got posted weeks
> if not months after the most recent earlier versions. And among them
> have been cases where I wasn't really certain whether a tag with my
> name was actually validly there, or valid anymore.

FWIW, one example was Dario's soft-affinity series; when he was here
before the hackathon we actually discussed what the protocol should be
for that.  I didn't think it was clear, so I said he should retain
them, and people could complain if that's not what they wanted. :-)

For my part, I tend to appreciate being reminded of what my opinion
was the last time around, so I don't think time itself should expire
my tags.  The tags should expire when either the patch itself, or the
code it depends on, has changed in a functionally significant way.  It
should be the job of the person submitting the code to keep track of
that.

The only risk to doing things this way, I suppose, would be if code
the patch depends on had changed in a way that was incompatible, and
the patch author didn't notice; the "Reviewed-by" might lull the
maintainer into a false sense of security.

 -George

* Re: Is: Reviewed-by, Acked-by rules. Was:Re: [PATCH v7 10/19] x86/VPMU: Make vpmu not HVM-specific
  2014-06-12 11:10         ` George Dunlap
@ 2014-06-12 16:21           ` Jan Beulich
  0 siblings, 0 replies; 58+ messages in thread
From: Jan Beulich @ 2014-06-12 16:21 UTC (permalink / raw)
  To: dunlapg
  Cc: kevin.tian, keir, suravee.suthikulpanit, tim, time, dietmar.hahn,
	xen-devel, jun.nakajima, boris.ostrovsky

>>> George Dunlap <dunlapg@umich.edu> 06/12/14 1:10 PM >>>
>I'm not sure why you *object* to non-maintainer Acks, Jan -- people
>who are either pushing the patch or applying the patch need to know
>who the maintainers are anyway, if for no other reason to know when a
>"Reviewed-by" counts as a maintainer Ack or not.
>
>It seems to me there are several times when non-maintainer Acks are
>useful; for example, when someone has previously objected to a patch,
>and is now OK with it, but has not actually reviewed it.

This is a valid point. But other than in unusual (I would think) situations
like this - what meaning would an ack from a non-maintainer really have? It's
not been a proper review (which, by refusing non-maintainer acks I would
hope we might get more of), and it also doesn't count for anything else
from a purely abstract pov. And I guess - back to the point you make -
there are other ways to express that reasons for previous objection
have got eliminated. After all - along the lines of what you said - the
person pushing a patch is expected to have followed an eventual
discussion relatively closely anyway.

>>>And I think George has a different opinion of when Reviewed-by
>>>get carried on patches that undergo modifications. Sadly I don't see
>>>anything in the Linux Documentation/Submitting* to explain what
>>>the rules are for that.
>>
>> I don't recall having a "different" opinion here - at least I agree: With
>> non-benign changes to a patch, acks and reviews should be dropped.
>> Fixing bugs in a patch - which I think is the recent case you recall -
>> is clearly a non-benign change.
>>
>> In fact I would go farther and suggest that more than a week or two
>> old tags should also get dropped. There have been a number of cases
>> not that long ago that new versions of patch series got posted weeks
>> if not months after the most recent earlier versions. And among them
>> have been cases where I wasn't really certain whether a tag with my
>> name was actually validly there, or valid anymore.
>
>FWIW, one example was Dario's soft-affinity series; when he was here
>before the hackathon we actually discussed what the protocol should be
>for that.  I didn't think it was clear, so I said he should retain
>them, and people could complain if that's not what they wanted. :-)

And iirc you then were the one who complained...

>For my part, I tend to appreciate being reminded of what my opinion
>was the last time around, so I don't think time itself should expire
>my tags.  The tags should expire when either the patch itself, or the
>code it depends on, has changed in a functionally significant way.  It
>should be the job of the person submitting the code to keep track of
>that.

Yes, I appreciate that there are reasons to retain old tags, and I didn't
mean to push (or initiate the pushing of) any new policy here. I merely
wanted to express that, as you say:

>The only risk to doing things this way, I suppose, would be if code
>the patch depends on had changed in a way that was incompatible, and
>the patch author didn't notice; the "Reviewed-by" might lull the
>maintainer into a false sense of security.

 ... there's a risk with retaining tags over long periods of time. Ideally of
course things would get followed up within shorter timeframes, but I
know very well that this isn't always possible.

Jan


* Re: [PATCH v7 11/19] x86/VPMU: Interface for setting PMU mode and flags
  2014-06-11 10:14   ` Jan Beulich
@ 2014-06-17 15:01     ` Boris Ostrovsky
  2014-06-17 15:14       ` Jan Beulich
  0 siblings, 1 reply; 58+ messages in thread
From: Boris Ostrovsky @ 2014-06-17 15:01 UTC (permalink / raw)
  To: Jan Beulich
  Cc: kevin.tian, keir, suravee.suthikulpanit, tim, dietmar.hahn,
	xen-devel, jun.nakajima

On 06/11/2014 06:14 AM, Jan Beulich wrote:
>>>> On 06.06.14 at 19:40, <boris.ostrovsky@oracle.com> wrote:
>> --- a/xen/arch/x86/x86_64/compat/entry.S
>> +++ b/xen/arch/x86/x86_64/compat/entry.S
>> @@ -417,6 +417,8 @@ ENTRY(compat_hypercall_table)
>>           .quad do_domctl
>>           .quad compat_kexec_op
>>           .quad do_tmem_op
>> +        .quad do_ni_hypercall           /* reserved for XenClient */
>> +        .quad do_xenpmu_op              /* 40 */
>>           .rept __HYPERVISOR_arch_0-((.-compat_hypercall_table)/8)
>>           .quad compat_ni_hypercall
>>           .endr
>> @@ -465,6 +467,8 @@ ENTRY(compat_hypercall_args_table)
>>           .byte 1 /* do_domctl                */
>>           .byte 2 /* compat_kexec_op          */
>>           .byte 1 /* do_tmem_op               */
>> +        .byte 0 /* reserved for XenClient   */
>> +        .byte 2 /* do_xenpmu_op             */  /* 40 */
>>           .rept __HYPERVISOR_arch_0-(.-compat_hypercall_args_table)
>>           .byte 0 /* compat_ni_hypercall      */
>>           .endr
> I'm pretty certain I said on an earlier version of the patch series
> already: If you use the same handler for native and compat mode
> hypercalls, you should add structure layout verification (see the ?
> entries in xen/include/xlat.lst and how what they produce is being
> used in various places). And the fact that the interface structure
> contains a handle already makes clear that this check will fail - I
> don't think there's a way around having at least a thin compat
> mode wrapper here.


Right. And I decided to drop the handle since no one is using it now
anyway and I don't want to add a compat layer for something that may 
never get used.

I have a question though. I want to add a compat check for

struct xen_pmu_arch {
     union {
         struct cpu_user_regs regs;
         uint8_t pad[256];
     } r;
     union {
         uint32_t lapic_lvtpc;
         uint64_t pad;
     } l;
     union {
         struct xen_pmu_amd_ctxt amd;
         struct xen_pmu_intel_ctxt intel;
         uint8_t pad[128];
     } c;
};

which is a shared structure (i.e. it lives in shared memory rather than
being passed across a hypercall). The
problem I am having is that cpu_user_regs needs to be translated. Does 
the xlat framework allow checking a structure that needs one of its 
fields translated? I went through the scripts and it appears that it 
doesn't.

Thanks.
-boris
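
For reference, the layout-verification mechanism Jan refers to works
roughly as follows: an entry in xen/include/xlat.lst prefixed with '?'
makes the build generate a CHECK_<name> macro verifying that the native
and compat layouts of a public structure are identical, while '!'
generates translation helpers instead. A minimal sketch with
illustrative entry names (not necessarily the ones this series ends up
adding):

    # xen/include/xlat.lst (tab-separated: marker, name, public header)
    ?	pmu_params	pmu.h
    !	pmu_arch	arch-x86/pmu.h

A C file that serves both native and compat guests with the same
handler then instantiates the generated check at file scope:

    /* Build fails if the native and compat layouts of
       struct xen_pmu_params ever diverge. */
    CHECK_pmu_params;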


* Re: [PATCH v7 11/19] x86/VPMU: Interface for setting PMU mode and flags
  2014-06-17 15:01     ` Boris Ostrovsky
@ 2014-06-17 15:14       ` Jan Beulich
  0 siblings, 0 replies; 58+ messages in thread
From: Jan Beulich @ 2014-06-17 15:14 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: kevin.tian, keir, suravee.suthikulpanit, tim, dietmar.hahn,
	xen-devel, jun.nakajima

>>> On 17.06.14 at 17:01, <boris.ostrovsky@oracle.com> wrote:
> I have a question though. I want to add a compat check for
> 
> struct xen_pmu_arch {
>      union {
>          struct cpu_user_regs regs;
>          uint8_t pad[256];
>      } r;
>      union {
>          uint32_t lapic_lvtpc;
>          uint64_t pad;
>      } l;
>      union {
>          struct xen_pmu_amd_ctxt amd;
>          struct xen_pmu_intel_ctxt intel;
>          uint8_t pad[128];
>      } c;
> };
> 
> which is a shared structure (i.e. it lives in shared memory rather than
> being passed across a hypercall). The
> problem I am having is that cpu_user_regs needs to be translated. Does 
> the xlat framework allow checking a structure that needs one of its 
> fields translated? I went through the scripts and it appears that it 
> doesn't.

Indeed this is an all-or-nothing thing. Naming (and perhaps splitting
out) the latter two sub-unions may allow checking them individually
though.

Jan
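
Concretely, Jan's suggestion might look roughly like the following -
a sketch only, assuming the xlat scripts accept named sub-unions the
same way they accept structs; the type names are illustrative:

    struct xen_pmu_arch {
        union {
            struct cpu_user_regs regs;  /* still needs translation */
            uint8_t pad[256];
        } r;
        /* Named types can each get their own '?' entry in xlat.lst. */
        union xen_pmu_lvtpc {
            uint32_t lapic_lvtpc;
            uint64_t pad;
        } l;
        union xen_pmu_ctxt {
            struct xen_pmu_amd_ctxt amd;
            struct xen_pmu_intel_ctxt intel;
            uint8_t pad[128];
        } c;
    };

The regs member could alternatively be dealt with by replacing
cpu_user_regs with a small PMU-specific register structure that is
layout-identical for native and compat guests, removing the need for
translation altogether.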


end of thread

Thread overview: 58+ messages
2014-06-06 17:39 [PATCH v7 00/19] x86/PMU: Xen PMU PV(H) support Boris Ostrovsky
2014-06-06 17:39 ` [PATCH v7 01/19] common/symbols: Export hypervisor symbols to privileged guest Boris Ostrovsky
2014-06-06 17:57   ` Andrew Cooper
2014-06-06 19:05     ` Boris Ostrovsky
2014-06-06 19:13       ` Andrew Cooper
2014-06-10 11:31   ` Jan Beulich
2014-06-06 17:39 ` [PATCH v7 02/19] VPMU: Mark context LOADED before registers are loaded Boris Ostrovsky
2014-06-06 17:59   ` Andrew Cooper
2014-06-10 15:29     ` Jan Beulich
2014-06-10 15:40       ` Boris Ostrovsky
2014-06-06 17:39 ` [PATCH v7 03/19] x86/VPMU: Set MSR bitmaps only for HVM/PVH guests Boris Ostrovsky
2014-06-06 18:02   ` Andrew Cooper
2014-06-06 19:07     ` Boris Ostrovsky
2014-06-06 17:40 ` [PATCH v7 04/19] x86/VPMU: Make vpmu marcos a bit more efficient Boris Ostrovsky
2014-06-06 18:13   ` Andrew Cooper
2014-06-06 19:28     ` Boris Ostrovsky
2014-06-06 17:40 ` [PATCH v7 05/19] intel/VPMU: Clean up Intel VPMU code Boris Ostrovsky
2014-06-06 18:34   ` Andrew Cooper
2014-06-06 19:43     ` Boris Ostrovsky
2014-06-10  8:19     ` Jan Beulich
2014-06-06 17:40 ` [PATCH v7 06/19] vmx: Merge MSR management routines Boris Ostrovsky
2014-06-11  9:55   ` Jan Beulich
2014-06-06 17:40 ` [PATCH v7 07/19] x86/VPMU: Handle APIC_LVTPC accesses Boris Ostrovsky
2014-06-06 17:40 ` [PATCH v7 08/19] intel/VPMU: MSR_CORE_PERF_GLOBAL_CTRL should be initialized to zero Boris Ostrovsky
2014-06-06 17:40 ` [PATCH v7 09/19] x86/VPMU: Add public xenpmu.h Boris Ostrovsky
2014-06-06 19:32   ` Andrew Cooper
2014-06-11 10:03   ` Jan Beulich
2014-06-11 12:32     ` Boris Ostrovsky
2014-06-06 17:40 ` [PATCH v7 10/19] x86/VPMU: Make vpmu not HVM-specific Boris Ostrovsky
2014-06-11 10:07   ` Jan Beulich
2014-06-11 13:57     ` Is: Reviewed-by, Acked-by rules. Was:Re: " Konrad Rzeszutek Wilk
2014-06-11 20:25       ` Jan Beulich
2014-06-12 11:10         ` George Dunlap
2014-06-12 16:21           ` Jan Beulich
2014-06-06 17:40 ` [PATCH v7 11/19] x86/VPMU: Interface for setting PMU mode and flags Boris Ostrovsky
2014-06-06 19:58   ` Andrew Cooper
2014-06-06 20:42     ` Boris Ostrovsky
2014-06-11 10:14   ` Jan Beulich
2014-06-17 15:01     ` Boris Ostrovsky
2014-06-17 15:14       ` Jan Beulich
2014-06-06 17:40 ` [PATCH v7 12/19] x86/VPMU: Initialize PMU for PV guests Boris Ostrovsky
2014-06-06 20:13   ` Andrew Cooper
2014-06-06 20:49     ` Boris Ostrovsky
2014-06-11 10:20   ` Jan Beulich
2014-06-11 12:49     ` Boris Ostrovsky
2014-06-11 12:53       ` Jan Beulich
2014-06-06 17:40 ` [PATCH v7 13/19] x86/VPMU: Add support for PMU register handling on " Boris Ostrovsky
2014-06-06 20:26   ` Andrew Cooper
2014-06-06 20:53     ` Boris Ostrovsky
2014-06-06 17:40 ` [PATCH v7 14/19] x86/VPMU: Handle PMU interrupts for " Boris Ostrovsky
2014-06-06 20:40   ` Andrew Cooper
2014-06-06 21:21     ` Boris Ostrovsky
2014-06-06 17:40 ` [PATCH v7 15/19] x86/VPMU: Merge vpmu_rdmsr and vpmu_wrmsr Boris Ostrovsky
2014-06-06 17:40 ` [PATCH v7 16/19] x86/VPMU: Add privileged PMU mode Boris Ostrovsky
2014-06-06 17:40 ` [PATCH v7 17/19] x86/VPMU: Save VPMU state for PV guests during context switch Boris Ostrovsky
2014-06-06 17:40 ` [PATCH v7 18/19] x86/VPMU: NMI-based VPMU support Boris Ostrovsky
2014-06-06 17:40 ` [PATCH v7 19/19] x86/VPMU: Move VPMU files up from hvm/ directory Boris Ostrovsky
2014-06-06 21:05   ` Andrew Cooper
