All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v19 00/14] x86/PMU: Xen PMU PV(H) support
@ 2015-03-17 14:53 Boris Ostrovsky
  2015-03-17 14:53 ` [PATCH v19 01/14] x86/VPMU: VPMU should not exist when vpmu_initialise() is called Boris Ostrovsky
                   ` (14 more replies)
  0 siblings, 15 replies; 35+ messages in thread
From: Boris Ostrovsky @ 2015-03-17 14:53 UTC (permalink / raw)
  To: JBeulich, kevin.tian, suravee.suthikulpanit,
	Aravind.Gopalakrishnan, dietmar.hahn, dgdegra, andrew.cooper3
  Cc: boris.ostrovsky, tim, jun.nakajima, xen-devel


Changes in v19:

* Do not allow changing mode to/from OFF/ALL while guests are
  running. This significantly simplifies code due
  to large number of corner cases that I had to deal with. Most of the
  changes are in patch#5. This also makes patch 4 from last version
  unnecessary
* Defer NMI support (drop patch#14 from last version)
* Make patch#15 from last series be patch#1 (vpmu init cleanup)
* Other changes are listed per patch


Changes in v18:

* Return 1 (i.e. "handled") in vpmu_do_interrupt() if PMU_CACHED is
  set. This is needed since we can get an interrupt while this flag is
  set on AMD processors when multiple counters are in use (**) (AMD
  processor don't mask LVTPC when PMC interrupt happens and so there
  is a window in vpmu_do_interrupt() until it sets the mask
  bit). Patch #14
* Unload both current and last_vcpu (if different) vpmu and clear
  this_cpu(last_vcpu) in vpmu_unload_all. Patch #5
* Make major version check for certain xenpmu_ops. Patch #5
* Make xenpmu_op()'s first argument unsigned. Patch #5
* Don't use format specifier for __stringify(). Patch #6
* Don't print generic error in vpmu_init(). Patch #6
* Don't test for VPMU existance in vpmu_initialise(). New patch #15
* Added vpmu_disabled flag to make sure VPMU doesn't get reenabled from
  dom0 (for example when watchdog is active). Patch #5
* Updated tags on some patches to better reflect latest reviewed status)

(**) While testing this I discovered that AMD VPMU is quite broken for
HVM: when multiple counters are in use linux dom0 often gets
unexpected NMIs. This may have something to do with what I mentioned
in the first bullet. However, this doesn't appear to be related to
this patch series (or earlier VPMU patches) --- I can reproduce this
all the way back to 4.1

Changes in v17:
* Disable VPMU when unknown CPU vendor is detected (patch #2)
* Remove unnecessary vendor tests in vendor-specific init routines (patch #14)
* Remember first CPU that starts mode change and use it to stop the cycle (patch #13)
* If vpmu ops is not present, return 0 as value for VPMU MSR read (as opposed to 
  returning an error as was the case in previous patch.) (patch #18)
  * Change slightly vpmu_do_msr() logic as result of this chage (patch #20)
* stringify VPMU version (patch #14)
* Use 'CS > 1' to mark sample as PMU_SAMPLE_USER (patch #19)

Changes in v16:

* Many changes in VPMU mode patch (#13):
  * Replaced arguments to some vpmu routines (vcpu -> vpmu). New patch (#12)
  * Added vpmu_unload vpmu op to completely unload vpmu data (e.g clear
    MSR bitmaps). This routine may be called in context switch (vpmu_switch_to()).
  * Added vmx_write_guest_msr_vcpu() interface to write MSRs of non-current VCPU
  * Use cpumask_cycle instead of cpumask_next
  * Dropped timeout error
  * Adjusted types of mode variables
  * Don't allow oprofile to allocate its context on MSR access if VPMU context
    has already been allocated (which may happen when VMPU mode was set to off
    while the guest was running)
* vpmu_initialise() no longer turns off VPMU globally on failure. New patch (#2)
* vpmu_do_msr() will return 1 (failure) if vpmu_ops are not set. This is done to
  prevent PV guests that are not VPMU-enabled from wrongly assuming that they have
  access to counters (Linux check_hw_exists() will make this assumption) (patch #18)
* Add cpl field to shared structure that will be passed for HVM guests' samples
  (instead of PMU_SAMPLE_USER flag). Add PMU_SAMPLE_PV flag to mark whose sample
  is passed up. (Patches ## 10, 19, 22)

Changes in v15:

* Rewrote vpmu_force_context_switch() to use continue_hypercall_on_cpu()
* Added vpmu_init initcall that will call vendor-specific init routines
* Added a lock to vpmu_struct to serialize pmu_init()/pmu_finish()
* Use SS instead of CS for DPL (instead of RPL)
* Don't take lock for XENPMU_mode_get
* Make vmpu_mode/features an unsigned int (from uint64_t)
* Adjusted pvh_hypercall64_table[] order
* Replaced address range check [XEN_VIRT_START..XEN_VIRT_END] with guest_mode()
* A few style cleanups

Changes in v14:

* Moved struct xen_pmu_regs to pmu.h
* Moved CHECK_pmu_* to an earlier patch (when structures are first introduced)
* Added PMU_SAMPLE_REAL flag to indicate whether the sample was taken in real mode
* Simplified slightly setting rules for xenpmu_data flags
* Rewrote vpmu_force_context_switch() to again use continuations. (Returning EAGAIN
  to user would mean that VPMU mode may get into inconsistent state (across processors)
  and dealing with that is more compicated than I'd like).
* Fixed msraddr_to_bitpos() and converted it into an inline
* Replaced address range check in vmpu_do_interrupt() with guest_mode()
* No error returns from __initcall
* Rebased on top of recent VPMU changes
* Various cleanups

Changes in v13:

* Rearranged data in xenpf_symdata to eliminate a hole (no change in
  structure size)
* Removed unnecessary zeroing of last character in name string during
  symbol read hypercall
* Updated comment in access_vectors for pmu_use operation
* Compute AMD MSR bank size at runtime
* Moved a couple of BUILD_BUG_ON later, to when the structures they are
  checking are first used
* Added ss and eflags to xen_pmu_registers structure
* vpmu_force_context_switch() uses per-cpu tasklet pointers.
* Moved most of arch-specific VPMU initialization code into an __initcall()
  to avoid re-calculating/re-checking things more than once (new patch, #12)
* Replaced is_*_domain() with is_*_vcpu()
* choose_hwdom_vcpu() will now assume that hardware_domain->vcpu[idx]
  is always there (callers will still verify whether or not that's true)
* Fixed a couple of sampled vs. sampling tests in vpmu_do_interrupt()
* Pass CS to the guest unchanged, add pmu_flags's flag to indicate whether the
  sample was for a user or kernel space. Move pmu_flags from xen_pmu_data into
  xen_pmu_arch
* Removed local msr_content variable from vpmu_do_wrmsr()
* Re-arranged code in parse_vpmu_param()
* Made vpmu_interrupt_type checks test for value, not individual bits
* Moved PMU_SOFTIRQ definition into arch-specific header
* Moved vpmu*.c files into xen/arch/x86/cpu/ instead of xen/arch/x86/
* For hypervisor samples, report DOMID_XEN to the guest
* Changed logic to select which registers to report to the guest (include RIP
  check to see whether we are in the hypervisor)

Changes in v12:

* Added XSM support
* Made a valifity check before writing MSR_CORE_PERF_GLOBAL_OVF_CTRL
* Updated documentation for 'vpmu=nmi' option
* Added more text to a bunch of commit messages (per Konrad's request)

Changes in v11:

* Replaced cpu_user_regs with new xen_pmu_regs (IP, SP, CS) in xen_pmu_arch.
  - as part of this re-work noticed that CS registers were set in later patch then
    needed. Moved those changes to appropriate place
* Added new VPMU mode (XENPMU_MODE_HV). Now XENPMU_MODE_SELF will only provide dom0
  with its own samples only (i.e. no hypervisor data) and XENPMU_MODE_HV will be what
  XENPMU_MODE_SELF used to be.
* Kept  vmx_add_guest_msr()/vmx_add_host_load_msr() as wrappers around vmx_add_msr()
* Cleaned up VPMU context switch macros (moved  'if(prev!=next)' back to context_switch())
* Dropped hypercall continuation from vpmu_force_context_switch() and replaced it with
  -EAGAIN error if hypercall_preempt_check() is true after 2ms.
* Kept vpmu_do_rdmsr()/vpmu_do_wrmsr as wrapperd for vpmu_do_msr()
* Move context switching patch (#13) earlier in the series (for proper bisection support)
* Various comment updates and cleanups
* Dropped a bunch of Reviewed-by and all Tested-by tags

Changes in v10:

* Swapped address and name fields of xenpf_symdata (to make it smaller on 32-bit)
* Dropped vmx_rm_guest_msr() as it requires refcountig which makes code more complicated.
* Cleaned up vlapic_reg_write()
* Call vpmu_destroy() for both HVM and PVH VCPUs
* Verify that (xen_pmu_data+PMU register bank) fit into a page
* Return error codes from arch-specific VPMU init code
* Moved VPMU-related context switch logic into inlines
* vpmu_force_context_switch() changes:
  o Avoid greater than page-sized allocations
  o Prevent another VCPU from starting VPMU sync while the first sync is in progress
* Avoid stack leak in do_xenpmu_op()
* Checked validity of Intel VPMU MSR values before they are committed
* Fixed MSR handling in traps.c (avoid potential accesses to Intel MSRs on AMD)
* Fixed VCPU selection in interrupt handler for 32-bit dom0 (sampled => sampling)
* Clarified commit messages (patches 2, 13, 18) 
* Various cleanups

Changes in v9:

* Restore VPMU context after context_saved() is called in
  context_switch(). This is needed because vpmu_load() may end up
  calling vmx_vmcs_try_enter()->vcpu_pause() and that needs is_running
  to be correctly set/cleared. (patch 18, dropped review acks)
* Added patch 2 to properly manage VPMU_CONTEXT_LOADED
* Addressed most of Jan's comments.
  o Keep track of time in vpmu_force_context_switch() to properly break
    out of a loop when using hypercall continuations
  o Fixed logic in calling vpmu_do_msr() in emulate_privileged_op()
  o Cleaned up vpmu_interrupt() wrt vcpu variable names to (hopefully)
    make it more clear which vcpu we are using
  o Cleaned up vpmu_do_wrmsr()
  o Did *not* replace sizeof(uint64_t) with sizeof(variable) in
    amd_vpmu_initialise(): throughout the code registers are declared as
    uint64_t and if we are to add a new type (e.g. reg_t) this should be
    done in a separate patch, unrelated to this series.
  o Various more minor cleanups and code style fixes
  
Changes in v8:

* Cleaned up a bit definitions of struct xenpf_symdata and xen_pmu_params
* Added compat checks for vpmu structures
* Converted vpmu flag manipulation macros to inline routines
* Reimplemented vpmu_unload_all() to avoid long loops
* Reworked PMU fault generation and handling (new patch #12)
* Added checks for domain->vcpu[] non-NULLness
* Added more comments, renamed some routines and macros, code style cleanup


Changes in v7:

* When reading hypervisor symbols make the caller pass buffer length
  (as opposed to having this length be part of the API). Make the
  hypervisor buffer static, make xensyms_read() return zero-length
  string on end-of-symbols. Make 'type' field of xenpf_symdata a char,
  drop compat_pf_symdata definition.
* Spread PVH support across patches as opposed to lumping it into a
  separate patch
* Rename vpmu_is_set_all() to vpmu_are_all_set()
* Split VPMU cleanup patch in two
* Use memmove when copying VMX guest and host MSRs
* Make padding of xen_arch_pmu's context union a constand that does not
  depend on arch context size.
* Set interface version to 0.1
* Check pointer validity in pvpmu_init/destroy()
* Fixed crash in core2_vpmu_dump()
* Fixed crash in vmx_add_msr()
* Break handling of Intel and AMD MSRs in traps.c into separate cases
* Pass full CS selector to guests
* Add lock in pvpmu init code to prevent potential race


Changes in v6:

* Two new patches:
  o Merge VMX MSR add/remove routines in vmcs.c (patch 5)
  o Merge VPMU read/write MSR routines in vpmu.c (patch 14)
* Check for pending NMI softirq after saving VPMU context to prevent a newly-scheduled
  guest from overwriting sampled_vcpu written by de-scheduled VPCU.
* Keep track of enabled counters on Intel. This was removed in earlier patches and
  was a mistake. As result of this change struct vpmu will have a pointer to private
  context data (i.e. data that is not exposed to a PV(H) guest). Use this private pointer
  on SVM as well for storing MSR bitmap status (it was unnecessarily exposed to PV guests
  earlier).
  Dropped Reviewed-by: and Tested-by: tags from patch 4 since it needs to be reviewed
  agan (core2_vpmu_do_wrmsr() routine, mostly)
* Replaced references to dom0 with hardware_domain (and is_control_domain with
  is_hardware_domain for consistency)
* Prevent non-privileged domains from reading PMU MSRs in VPMU_PRIV_MODE
* Reverted unnecessary changes in vpmu_initialise()'s switch statement
* Fixed comment in vpmu_do_interrupt


Changes in v5:

* Dropped patch number 2 ("Stop AMD counters when called from vpmu_save_force()")
  as no longer needed
* Added patch number 2 that marks context as loaded before PMU registers are
  loaded. This prevents situation where a PMU interrupt may occur while context
  is still viewed as not loaded. (This is really a bug fix for exsiting VPMU
  code)
* Renamed xenpmu.h files to pmu.h
* More careful use of is_pv_domain(), is_hvm_domain(, is_pvh_domain and
  has_hvm_container_domain(). Also explicitly disabled support for PVH until
  patch 16 to make distinction between usage of the above macros more clear.
* Added support for disabling VPMU support during runtime.
* Disable VPMUs for non-privileged domains when switching to privileged
  profiling mode
* Added ARM stub for xen_arch_pmu_t
* Separated vpmu_mode from vpmu_features
* Moved CS register query to make sure we use appropriate query mechanism for
  various guest types.
* LVTPC is now set from value in shared area, not copied from dom0
* Various code and comments cleanup as suggested by Jan.

Changes in v4:

* Added support for PVH guests:
  o changes in pvpmu_init() to accommodate both PV and PVH guests, still in patch 10
  o more careful use of is_hvm_domain
  o Additional patch (16)
* Moved HVM interrupt handling out of vpmu_do_interrupt() for NMI-safe handling
* Fixed dom0's VCPU selection in privileged mode
* Added a cast in register copy for 32-bit PV guests cpu_user_regs_t in vpmu_do_interrupt.
  (don't want to expose compat_cpu_user_regs in a public header)
* Renamed public structures by prefixing them with "xen_"
* Added an entry for xenpf_symdata in xlat.lst
* Fixed pv_cpuid check for vpmu-specific cpuid adjustments
* Varios code style fixes
* Eliminated anonymous unions
* Added more verbiage to NMI patch description


Changes in v3:

* Moved PMU MSR banks out from architectural context data structures to allow
for future expansion without protocol changes
* PMU interrupts can be either NMIs or regular vector interrupts (the latter
is the default)
* Context is now marked as PMU_CACHED by the hypervisor code to avoid certain
race conditions with the guest
* Fixed races with PV guest in MSR access handlers
* More Intel VPMU cleanup
* Moved NMI-unsafe code from NMI handler
* Dropped changes to vcpu->is_running
* Added LVTPC apic handling (cached for PV guests)
* Separated privileged profiling mode into a standalone patch
* Separated NMI handling into a standalone patch


Changes in v2:

* Xen symbols are exported as data structure (as opoosed to a set of formatted
strings in v1). Even though one symbol per hypercall is returned performance
appears to be acceptable: reading whole file from dom0 userland takes on average
about twice as long as reading /proc/kallsyms
* More cleanup of Intel VPMU code to simplify publicly exported structures
* There is an architecture-independent and x86-specific public include files (ARM
has a stub)
* General cleanup of public include files to make them more presentable (and
to make auto doc generation better)
* Setting of vcpu->is_running is now done on ARM in schedule_tail as well (making
changes to common/schedule.c architecture-independent). Note that this is not
tested since I don't have access to ARM hardware.
* PCPU ID of interrupted processor is now passed to PV guest


The following patch series adds PMU support in Xen for PV(H)
guests. There is a companion patchset for Linux kernel. In addition,
another set of changes will be provided (later) for userland perf
code.

This version has following limitations:
* For accurate profiling of dom0/Xen dom0 VCPUs should be pinned.
* Hypervisor code is only profiled on processors that have running dom0 VCPUs
on them.
* No backtrace support.

A few notes that may help reviewing:

* A shared data structure (xenpmu_data_t) between each PV VPCU and hypervisor
CPU is used for passing registers' values as well as PMU state at the time of
PMU interrupt.
* PMU interrupts are taken by hypervisor either as NMIs or regular vector
interrupts for both HVM and PV(H). The interrupts are sent as NMIs to HVM guests
and as virtual interrupts to PV(H) guests
* PV guest's interrupt handler does not read/write PMU MSRs directly. Instead, it
accesses xenpmu_data_t and flushes it to HW it before returning.
* PMU mode is controlled at runtime via /sys/hypervisor/pmu/pmu/{pmu_mode,pmu_flags}
in addition to 'vpmu' boot option (which is preserved for back compatibility).
The following modes are provided:
  * disable: VPMU is off
  * enable: VPMU is on. Guests can profile themselves, dom0 profiles itself and Xen
  * priv_enable: dom0 only profiling. dom0 collects samples for everyone. Sampling
    in guests is suspended.
* /proc/xen/xensyms file exports hypervisor's symbols to dom0 (similar to
/proc/kallsyms)
* VPMU infrastructure is now used for HVM, PV and PVH and therefore has been moved
up from hvm subtree



Boris Ostrovsky (14):
  x86/VPMU: VPMU should not exist when vpmu_initialise() is called
  common/symbols: Export hypervisor symbols to privileged guest
  x86/VPMU: Add public xenpmu.h
  x86/VPMU: Make vpmu not HVM-specific
  x86/VPMU: Interface for setting PMU mode and flags
  x86/VPMU: Initialize VPMUs with __initcall
  x86/VPMU: Initialize PMU for PV(H) guests
  x86/VPMU: Save VPMU state for PV guests during context switch
  x86/VPMU: When handling MSR accesses, leave fault injection to callers
  x86/VPMU: Add support for PMU register handling on PV guests
  x86/VPMU: Handle PMU interrupts for PV guests
  x86/VPMU: Merge vpmu_rdmsr and vpmu_wrmsr
  x86/VPMU: Add privileged PMU mode
  x86/VPMU: Move VPMU files up from hvm/ directory

 tools/flask/policy/policy/modules/xen/xen.te |   7 +
 xen/arch/x86/cpu/Makefile                    |   1 +
 xen/arch/x86/cpu/vpmu.c                      | 791 ++++++++++++++++++++++
 xen/arch/x86/cpu/vpmu_amd.c                  | 498 ++++++++++++++
 xen/arch/x86/cpu/vpmu_intel.c                | 939 +++++++++++++++++++++++++++
 xen/arch/x86/domain.c                        |  27 +-
 xen/arch/x86/hvm/Makefile                    |   1 -
 xen/arch/x86/hvm/hvm.c                       |   1 +
 xen/arch/x86/hvm/svm/Makefile                |   1 -
 xen/arch/x86/hvm/svm/svm.c                   |  10 +-
 xen/arch/x86/hvm/svm/vpmu.c                  | 497 --------------
 xen/arch/x86/hvm/vlapic.c                    |   2 +-
 xen/arch/x86/hvm/vmx/Makefile                |   1 -
 xen/arch/x86/hvm/vmx/vmx.c                   |  28 +-
 xen/arch/x86/hvm/vmx/vpmu_core2.c            | 877 -------------------------
 xen/arch/x86/hvm/vpmu.c                      | 316 ---------
 xen/arch/x86/oprofile/op_model_ppro.c        |   8 +-
 xen/arch/x86/platform_hypercall.c            |  28 +
 xen/arch/x86/traps.c                         |  64 +-
 xen/arch/x86/x86_64/compat/entry.S           |   4 +
 xen/arch/x86/x86_64/entry.S                  |   4 +
 xen/common/event_channel.c                   |   1 +
 xen/common/symbols.c                         |  54 ++
 xen/include/Makefile                         |   3 +-
 xen/include/asm-x86/domain.h                 |   2 +
 xen/include/asm-x86/hvm/vcpu.h               |   3 -
 xen/include/asm-x86/hvm/vmx/vmcs.h           |   2 +-
 xen/include/asm-x86/hvm/vmx/vpmu_core2.h     |  32 -
 xen/include/asm-x86/hvm/vpmu.h               | 123 ----
 xen/include/asm-x86/vpmu.h                   | 143 ++++
 xen/include/public/arch-arm.h                |   3 +
 xen/include/public/arch-x86/pmu.h            |  97 +++
 xen/include/public/platform.h                |  19 +
 xen/include/public/pmu.h                     |  90 +++
 xen/include/public/xen.h                     |   2 +
 xen/include/xen/hypercall.h                  |   4 +
 xen/include/xen/symbols.h                    |   3 +
 xen/include/xlat.lst                         |   6 +
 xen/include/xsm/dummy.h                      |  20 +
 xen/include/xsm/xsm.h                        |   6 +
 xen/xsm/dummy.c                              |   1 +
 xen/xsm/flask/hooks.c                        |  28 +
 xen/xsm/flask/policy/access_vectors          |   6 +
 43 files changed, 2871 insertions(+), 1882 deletions(-)
 create mode 100644 xen/arch/x86/cpu/vpmu.c
 create mode 100644 xen/arch/x86/cpu/vpmu_amd.c
 create mode 100644 xen/arch/x86/cpu/vpmu_intel.c
 delete mode 100644 xen/arch/x86/hvm/svm/vpmu.c
 delete mode 100644 xen/arch/x86/hvm/vmx/vpmu_core2.c
 delete mode 100644 xen/arch/x86/hvm/vpmu.c
 delete mode 100644 xen/include/asm-x86/hvm/vmx/vpmu_core2.h
 delete mode 100644 xen/include/asm-x86/hvm/vpmu.h
 create mode 100644 xen/include/asm-x86/vpmu.h
 create mode 100644 xen/include/public/arch-x86/pmu.h
 create mode 100644 xen/include/public/pmu.h

-- 
1.8.1.4

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v19 01/14] x86/VPMU: VPMU should not exist when vpmu_initialise() is called
  2015-03-17 14:53 [PATCH v19 00/14] x86/PMU: Xen PMU PV(H) support Boris Ostrovsky
@ 2015-03-17 14:53 ` Boris Ostrovsky
  2015-03-19  8:43   ` Dietmar Hahn
  2015-03-17 14:53 ` [PATCH v19 02/14] common/symbols: Export hypervisor symbols to privileged guest Boris Ostrovsky
                   ` (13 subsequent siblings)
  14 siblings, 1 reply; 35+ messages in thread
From: Boris Ostrovsky @ 2015-03-17 14:53 UTC (permalink / raw)
  To: JBeulich, kevin.tian, suravee.suthikulpanit,
	Aravind.Gopalakrishnan, dietmar.hahn, dgdegra, andrew.cooper3
  Cc: boris.ostrovsky, tim, jun.nakajima, xen-devel

We don't need to try to destroy it since it can't be already allocated at the
time we try to initialize it.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
---

Changes in v19:
* Removed unnecesary test for VPMU_CONTEXT_ALLOCATED in svm/vpmu.c

 xen/arch/x86/hvm/svm/vpmu.c | 3 ---
 xen/arch/x86/hvm/vpmu.c     | 5 +----
 2 files changed, 1 insertion(+), 7 deletions(-)

diff --git a/xen/arch/x86/hvm/svm/vpmu.c b/xen/arch/x86/hvm/svm/vpmu.c
index 64dc167..6764070 100644
--- a/xen/arch/x86/hvm/svm/vpmu.c
+++ b/xen/arch/x86/hvm/svm/vpmu.c
@@ -359,9 +359,6 @@ static int amd_vpmu_initialise(struct vcpu *v)
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
     uint8_t family = current_cpu_data.x86;
 
-    if ( vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
-        return 0;
-
     if ( counters == NULL )
     {
          switch ( family )
diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c
index 0e6b6c0..c3273ee 100644
--- a/xen/arch/x86/hvm/vpmu.c
+++ b/xen/arch/x86/hvm/vpmu.c
@@ -236,10 +236,7 @@ void vpmu_initialise(struct vcpu *v)
     if ( is_pvh_vcpu(v) )
         return;
 
-    if ( vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
-        vpmu_destroy(v);
-    vpmu_clear(vpmu);
-    vpmu->context = NULL;
+    ASSERT(!vpmu->flags && !vpmu->context);
 
     switch ( vendor )
     {
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v19 02/14] common/symbols: Export hypervisor symbols to privileged guest
  2015-03-17 14:53 [PATCH v19 00/14] x86/PMU: Xen PMU PV(H) support Boris Ostrovsky
  2015-03-17 14:53 ` [PATCH v19 01/14] x86/VPMU: VPMU should not exist when vpmu_initialise() is called Boris Ostrovsky
@ 2015-03-17 14:53 ` Boris Ostrovsky
  2015-03-17 14:54 ` [PATCH v19 03/14] x86/VPMU: Add public xenpmu.h Boris Ostrovsky
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 35+ messages in thread
From: Boris Ostrovsky @ 2015-03-17 14:53 UTC (permalink / raw)
  To: JBeulich, kevin.tian, suravee.suthikulpanit,
	Aravind.Gopalakrishnan, dietmar.hahn, dgdegra, andrew.cooper3
  Cc: boris.ostrovsky, tim, jun.nakajima, xen-devel

Export Xen's symbols as {<address><type><name>} triplet via new XENPF_get_symbol
hypercall

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
Tested-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
---
 xen/arch/x86/platform_hypercall.c   | 28 +++++++++++++++++++
 xen/common/symbols.c                | 54 +++++++++++++++++++++++++++++++++++++
 xen/include/public/platform.h       | 19 +++++++++++++
 xen/include/xen/symbols.h           |  3 +++
 xen/include/xlat.lst                |  1 +
 xen/xsm/flask/hooks.c               |  4 +++
 xen/xsm/flask/policy/access_vectors |  2 ++
 7 files changed, 111 insertions(+)

diff --git a/xen/arch/x86/platform_hypercall.c b/xen/arch/x86/platform_hypercall.c
index 334d474..7626261 100644
--- a/xen/arch/x86/platform_hypercall.c
+++ b/xen/arch/x86/platform_hypercall.c
@@ -23,6 +23,7 @@
 #include <xen/cpu.h>
 #include <xen/pmstat.h>
 #include <xen/irq.h>
+#include <xen/symbols.h>
 #include <asm/current.h>
 #include <public/platform.h>
 #include <acpi/cpufreq/processor_perf.h>
@@ -798,6 +799,33 @@ ret_t do_platform_op(XEN_GUEST_HANDLE_PARAM(xen_platform_op_t) u_xenpf_op)
     }
     break;
 
+    case XENPF_get_symbol:
+    {
+        static char name[KSYM_NAME_LEN + 1]; /* protected by xenpf_lock */
+        XEN_GUEST_HANDLE(char) nameh;
+        uint32_t namelen, copylen;
+
+        guest_from_compat_handle(nameh, op->u.symdata.name);
+
+        ret = xensyms_read(&op->u.symdata.symnum, &op->u.symdata.type,
+                           &op->u.symdata.address, name);
+
+        namelen = strlen(name) + 1;
+
+        if ( namelen > op->u.symdata.namelen )
+            copylen = op->u.symdata.namelen;
+        else
+            copylen = namelen;
+
+        op->u.symdata.namelen = namelen;
+
+        if ( !ret && copy_to_guest(nameh, name, copylen) )
+            ret = -EFAULT;
+        if ( !ret && __copy_field_to_guest(u_xenpf_op, op, u.symdata) )
+            ret = -EFAULT;
+    }
+    break;
+
     default:
         ret = -ENOSYS;
         break;
diff --git a/xen/common/symbols.c b/xen/common/symbols.c
index bc2fde6..2c0942d 100644
--- a/xen/common/symbols.c
+++ b/xen/common/symbols.c
@@ -17,6 +17,8 @@
 #include <xen/lib.h>
 #include <xen/string.h>
 #include <xen/spinlock.h>
+#include <public/platform.h>
+#include <xen/guest_access.h>
 
 #ifdef SYMBOLS_ORIGIN
 extern const unsigned int symbols_offsets[1];
@@ -148,3 +150,55 @@ const char *symbols_lookup(unsigned long addr,
     *offset = addr - symbols_address(low);
     return namebuf;
 }
+
+/*
+ * Get symbol type information. This is encoded as a single char at the
+ * beginning of the symbol name.
+ */
+static char symbols_get_symbol_type(unsigned int off)
+{
+    /*
+     * Get just the first code, look it up in the token table,
+     * and return the first char from this token.
+     */
+    return symbols_token_table[symbols_token_index[symbols_names[off + 1]]];
+}
+
+int xensyms_read(uint32_t *symnum, char *type,
+                 uint64_t *address, char *name)
+{
+    /*
+     * Symbols are most likely accessed sequentially so we remember position
+     * from previous read. This can help us avoid the extra call to
+     * get_symbol_offset().
+     */
+    static uint64_t next_symbol, next_offset;
+    static DEFINE_SPINLOCK(symbols_mutex);
+
+    if ( *symnum > symbols_num_syms )
+        return -ERANGE;
+    if ( *symnum == symbols_num_syms )
+    {
+        /* No more symbols */
+        name[0] = '\0';
+        return 0;
+    }
+
+    spin_lock(&symbols_mutex);
+
+    if ( *symnum == 0 )
+        next_offset = next_symbol = 0;
+    if ( next_symbol != *symnum )
+        /* Non-sequential access */
+        next_offset = get_symbol_offset(*symnum);
+
+    *type = symbols_get_symbol_type(next_offset);
+    next_offset = symbols_expand_symbol(next_offset, name);
+    *address = symbols_offsets[*symnum] + SYMBOLS_ORIGIN;
+
+    next_symbol = ++*symnum;
+
+    spin_unlock(&symbols_mutex);
+
+    return 0;
+}
diff --git a/xen/include/public/platform.h b/xen/include/public/platform.h
index 82ec84e..1e6a6ce 100644
--- a/xen/include/public/platform.h
+++ b/xen/include/public/platform.h
@@ -590,6 +590,24 @@ struct xenpf_resource_op {
 typedef struct xenpf_resource_op xenpf_resource_op_t;
 DEFINE_XEN_GUEST_HANDLE(xenpf_resource_op_t);
 
+#define XENPF_get_symbol   63
+struct xenpf_symdata {
+    /* IN/OUT variables */
+    uint32_t namelen; /* IN:  size of name buffer                       */
+                      /* OUT: strlen(name) of hypervisor symbol (may be */
+                      /*      larger than what's been copied to guest)  */
+    uint32_t symnum;  /* IN:  Symbol to read                            */
+                      /* OUT: Next available symbol. If same as IN then */
+                      /*      we reached the end                        */
+
+    /* OUT variables */
+    XEN_GUEST_HANDLE(char) name;
+    uint64_t address;
+    char type;
+};
+typedef struct xenpf_symdata xenpf_symdata_t;
+DEFINE_XEN_GUEST_HANDLE(xenpf_symdata_t);
+
 /*
  * ` enum neg_errnoval
  * ` HYPERVISOR_platform_op(const struct xen_platform_op*);
@@ -619,6 +637,7 @@ struct xen_platform_op {
         struct xenpf_mem_hotadd        mem_add;
         struct xenpf_core_parking      core_parking;
         struct xenpf_resource_op       resource_op;
+        struct xenpf_symdata           symdata;
         uint8_t                        pad[128];
     } u;
 };
diff --git a/xen/include/xen/symbols.h b/xen/include/xen/symbols.h
index 87cd77d..1fa0537 100644
--- a/xen/include/xen/symbols.h
+++ b/xen/include/xen/symbols.h
@@ -11,4 +11,7 @@ const char *symbols_lookup(unsigned long addr,
                            unsigned long *offset,
                            char *namebuf);
 
+int xensyms_read(uint32_t *symnum, char *type,
+                 uint64_t *address, char *name);
+
 #endif /*_XEN_SYMBOLS_H*/
diff --git a/xen/include/xlat.lst b/xen/include/xlat.lst
index 9c9fd9a..906e6fc 100644
--- a/xen/include/xlat.lst
+++ b/xen/include/xlat.lst
@@ -89,6 +89,7 @@
 ?	processor_px			platform.h
 !	psd_package			platform.h
 ?	xenpf_enter_acpi_sleep		platform.h
+!	xenpf_symdata			platform.h
 ?	xenpf_pcpuinfo			platform.h
 ?	xenpf_pcpu_version		platform.h
 ?	xenpf_resource_entry		platform.h
diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
index 65094bb..c6431b5 100644
--- a/xen/xsm/flask/hooks.c
+++ b/xen/xsm/flask/hooks.c
@@ -1411,6 +1411,10 @@ static int flask_platform_op(uint32_t op)
         return avc_current_has_perm(SECINITSID_XEN, SECCLASS_XEN2,
                                     XEN2__RESOURCE_OP, NULL);
 
+    case XENPF_get_symbol:
+        return avc_has_perm(domain_sid(current->domain), SECINITSID_XEN,
+                            SECCLASS_XEN2, XEN2__GET_SYMBOL, NULL);
+
     default:
         printk("flask_platform_op: Unknown op %d\n", op);
         return -EPERM;
diff --git a/xen/xsm/flask/policy/access_vectors b/xen/xsm/flask/policy/access_vectors
index 8f44b9d..3a97577 100644
--- a/xen/xsm/flask/policy/access_vectors
+++ b/xen/xsm/flask/policy/access_vectors
@@ -84,6 +84,8 @@ class xen2
     resource_op
 # XEN_SYSCTL_psr_cmt_op
     psr_cmt_op
+# XENPF_get_symbol
+    get_symbol
 }
 
 # Classes domain and domain2 consist of operations that a domain performs on
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v19 03/14] x86/VPMU: Add public xenpmu.h
  2015-03-17 14:53 [PATCH v19 00/14] x86/PMU: Xen PMU PV(H) support Boris Ostrovsky
  2015-03-17 14:53 ` [PATCH v19 01/14] x86/VPMU: VPMU should not exist when vpmu_initialise() is called Boris Ostrovsky
  2015-03-17 14:53 ` [PATCH v19 02/14] common/symbols: Export hypervisor symbols to privileged guest Boris Ostrovsky
@ 2015-03-17 14:54 ` Boris Ostrovsky
  2015-03-17 14:54 ` [PATCH v19 04/14] x86/VPMU: Make vpmu not HVM-specific Boris Ostrovsky
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 35+ messages in thread
From: Boris Ostrovsky @ 2015-03-17 14:54 UTC (permalink / raw)
  To: JBeulich, kevin.tian, suravee.suthikulpanit,
	Aravind.Gopalakrishnan, dietmar.hahn, dgdegra, andrew.cooper3
  Cc: boris.ostrovsky, tim, jun.nakajima, xen-devel

Add pmu.h header files, move various macros and structures that will be
shared between hypervisor and PV guests to it.

Move MSR banks out of architectural PMU structures to allow for larger sizes
in the future. The banks are allocated immediately after the context and
PMU structures store offsets to them.

While making these updates, also:
* Remove unused vpmu_domain() macro from vpmu.h
* Convert msraddr_to_bitpos() into an inline and make it a little faster by
  realizing that all Intel's PMU-related MSRs are in the lower MSR range.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
Tested-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
---

Change in v19:
* Moved PMU-related structs in xlat.lst to alphabetical order

 xen/arch/x86/hvm/svm/vpmu.c              |  83 +++++++++++----------
 xen/arch/x86/hvm/vmx/vpmu_core2.c        | 123 +++++++++++++++++--------------
 xen/arch/x86/hvm/vpmu.c                  |  10 +++
 xen/arch/x86/oprofile/op_model_ppro.c    |   6 +-
 xen/include/Makefile                     |   3 +-
 xen/include/asm-x86/hvm/vmx/vpmu_core2.h |  32 --------
 xen/include/asm-x86/hvm/vpmu.h           |  16 ++--
 xen/include/public/arch-arm.h            |   3 +
 xen/include/public/arch-x86/pmu.h        |  91 +++++++++++++++++++++++
 xen/include/public/pmu.h                 |  38 ++++++++++
 xen/include/xlat.lst                     |   4 +
 11 files changed, 275 insertions(+), 134 deletions(-)
 delete mode 100644 xen/include/asm-x86/hvm/vmx/vpmu_core2.h
 create mode 100644 xen/include/public/arch-x86/pmu.h
 create mode 100644 xen/include/public/pmu.h

diff --git a/xen/arch/x86/hvm/svm/vpmu.c b/xen/arch/x86/hvm/svm/vpmu.c
index 6764070..a8b79df 100644
--- a/xen/arch/x86/hvm/svm/vpmu.c
+++ b/xen/arch/x86/hvm/svm/vpmu.c
@@ -30,10 +30,7 @@
 #include <asm/apic.h>
 #include <asm/hvm/vlapic.h>
 #include <asm/hvm/vpmu.h>
-
-#define F10H_NUM_COUNTERS 4
-#define F15H_NUM_COUNTERS 6
-#define MAX_NUM_COUNTERS F15H_NUM_COUNTERS
+#include <public/pmu.h>
 
 #define MSR_F10H_EVNTSEL_GO_SHIFT   40
 #define MSR_F10H_EVNTSEL_EN_SHIFT   22
@@ -49,6 +46,9 @@ static const u32 __read_mostly *counters;
 static const u32 __read_mostly *ctrls;
 static bool_t __read_mostly k7_counters_mirrored;
 
+#define F10H_NUM_COUNTERS   4
+#define F15H_NUM_COUNTERS   6
+
 /* PMU Counter MSRs. */
 static const u32 AMD_F10H_COUNTERS[] = {
     MSR_K7_PERFCTR0,
@@ -83,12 +83,14 @@ static const u32 AMD_F15H_CTRLS[] = {
     MSR_AMD_FAM15H_EVNTSEL5
 };
 
-/* storage for context switching */
-struct amd_vpmu_context {
-    u64 counters[MAX_NUM_COUNTERS];
-    u64 ctrls[MAX_NUM_COUNTERS];
-    bool_t msr_bitmap_set;
-};
+/* Use private context as a flag for MSR bitmap */
+#define msr_bitmap_on(vpmu)    do {                                    \
+                                   (vpmu)->priv_context = (void *)-1L; \
+                               } while (0)
+#define msr_bitmap_off(vpmu)   do {                                    \
+                                   (vpmu)->priv_context = NULL;        \
+                               } while (0)
+#define is_msr_bitmap_on(vpmu) ((vpmu)->priv_context != NULL)
 
 static inline int get_pmu_reg_type(u32 addr)
 {
@@ -142,7 +144,6 @@ static void amd_vpmu_set_msr_bitmap(struct vcpu *v)
 {
     unsigned int i;
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    struct amd_vpmu_context *ctxt = vpmu->context;
 
     for ( i = 0; i < num_counters; i++ )
     {
@@ -150,14 +151,13 @@ static void amd_vpmu_set_msr_bitmap(struct vcpu *v)
         svm_intercept_msr(v, ctrls[i], MSR_INTERCEPT_WRITE);
     }
 
-    ctxt->msr_bitmap_set = 1;
+    msr_bitmap_on(vpmu);
 }
 
 static void amd_vpmu_unset_msr_bitmap(struct vcpu *v)
 {
     unsigned int i;
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    struct amd_vpmu_context *ctxt = vpmu->context;
 
     for ( i = 0; i < num_counters; i++ )
     {
@@ -165,7 +165,7 @@ static void amd_vpmu_unset_msr_bitmap(struct vcpu *v)
         svm_intercept_msr(v, ctrls[i], MSR_INTERCEPT_RW);
     }
 
-    ctxt->msr_bitmap_set = 0;
+    msr_bitmap_off(vpmu);
 }
 
 static int amd_vpmu_do_interrupt(struct cpu_user_regs *regs)
@@ -177,19 +177,22 @@ static inline void context_load(struct vcpu *v)
 {
     unsigned int i;
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    struct amd_vpmu_context *ctxt = vpmu->context;
+    struct xen_pmu_amd_ctxt *ctxt = vpmu->context;
+    uint64_t *counter_regs = vpmu_reg_pointer(ctxt, counters);
+    uint64_t *ctrl_regs = vpmu_reg_pointer(ctxt, ctrls);
 
     for ( i = 0; i < num_counters; i++ )
     {
-        wrmsrl(counters[i], ctxt->counters[i]);
-        wrmsrl(ctrls[i], ctxt->ctrls[i]);
+        wrmsrl(counters[i], counter_regs[i]);
+        wrmsrl(ctrls[i], ctrl_regs[i]);
     }
 }
 
 static void amd_vpmu_load(struct vcpu *v)
 {
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    struct amd_vpmu_context *ctxt = vpmu->context;
+    struct xen_pmu_amd_ctxt *ctxt = vpmu->context;
+    uint64_t *ctrl_regs = vpmu_reg_pointer(ctxt, ctrls);
 
     vpmu_reset(vpmu, VPMU_FROZEN);
 
@@ -198,7 +201,7 @@ static void amd_vpmu_load(struct vcpu *v)
         unsigned int i;
 
         for ( i = 0; i < num_counters; i++ )
-            wrmsrl(ctrls[i], ctxt->ctrls[i]);
+            wrmsrl(ctrls[i], ctrl_regs[i]);
 
         return;
     }
@@ -212,17 +215,17 @@ static inline void context_save(struct vcpu *v)
 {
     unsigned int i;
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    struct amd_vpmu_context *ctxt = vpmu->context;
+    struct xen_pmu_amd_ctxt *ctxt = vpmu->context;
+    uint64_t *counter_regs = vpmu_reg_pointer(ctxt, counters);
 
     /* No need to save controls -- they are saved in amd_vpmu_do_wrmsr */
     for ( i = 0; i < num_counters; i++ )
-        rdmsrl(counters[i], ctxt->counters[i]);
+        rdmsrl(counters[i], counter_regs[i]);
 }
 
 static int amd_vpmu_save(struct vcpu *v)
 {
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    struct amd_vpmu_context *ctx = vpmu->context;
     unsigned int i;
 
     /*
@@ -245,7 +248,7 @@ static int amd_vpmu_save(struct vcpu *v)
     context_save(v);
 
     if ( !vpmu_is_set(vpmu, VPMU_RUNNING) &&
-         has_hvm_container_vcpu(v) && ctx->msr_bitmap_set )
+         has_hvm_container_vcpu(v) && is_msr_bitmap_on(vpmu) )
         amd_vpmu_unset_msr_bitmap(v);
 
     return 1;
@@ -256,7 +259,9 @@ static void context_update(unsigned int msr, u64 msr_content)
     unsigned int i;
     struct vcpu *v = current;
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    struct amd_vpmu_context *ctxt = vpmu->context;
+    struct xen_pmu_amd_ctxt *ctxt = vpmu->context;
+    uint64_t *counter_regs = vpmu_reg_pointer(ctxt, counters);
+    uint64_t *ctrl_regs = vpmu_reg_pointer(ctxt, ctrls);
 
     if ( k7_counters_mirrored &&
         ((msr >= MSR_K7_EVNTSEL0) && (msr <= MSR_K7_PERFCTR3)) )
@@ -268,12 +273,12 @@ static void context_update(unsigned int msr, u64 msr_content)
     {
        if ( msr == ctrls[i] )
        {
-           ctxt->ctrls[i] = msr_content;
+           ctrl_regs[i] = msr_content;
            return;
        }
         else if (msr == counters[i] )
         {
-            ctxt->counters[i] = msr_content;
+            counter_regs[i] = msr_content;
             return;
         }
     }
@@ -303,8 +308,7 @@ static int amd_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content,
             return 1;
         vpmu_set(vpmu, VPMU_RUNNING);
 
-        if ( has_hvm_container_vcpu(v) &&
-             !((struct amd_vpmu_context *)vpmu->context)->msr_bitmap_set )
+        if ( has_hvm_container_vcpu(v) && is_msr_bitmap_on(vpmu) )
              amd_vpmu_set_msr_bitmap(v);
     }
 
@@ -313,8 +317,7 @@ static int amd_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content,
         (is_pmu_enabled(msr_content) == 0) && vpmu_is_set(vpmu, VPMU_RUNNING) )
     {
         vpmu_reset(vpmu, VPMU_RUNNING);
-        if ( has_hvm_container_vcpu(v) &&
-             ((struct amd_vpmu_context *)vpmu->context)->msr_bitmap_set )
+        if ( has_hvm_container_vcpu(v) && is_msr_bitmap_on(vpmu) )
              amd_vpmu_unset_msr_bitmap(v);
         release_pmu_ownship(PMU_OWNER_HVM);
     }
@@ -355,7 +358,7 @@ static int amd_vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
 
 static int amd_vpmu_initialise(struct vcpu *v)
 {
-    struct amd_vpmu_context *ctxt;
+    struct xen_pmu_amd_ctxt *ctxt;
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
     uint8_t family = current_cpu_data.x86;
 
@@ -382,7 +385,8 @@ static int amd_vpmu_initialise(struct vcpu *v)
 	 }
     }
 
-    ctxt = xzalloc(struct amd_vpmu_context);
+    ctxt = xzalloc_bytes(sizeof(*ctxt) +
+                         2 * sizeof(uint64_t) * num_counters);
     if ( !ctxt )
     {
         gdprintk(XENLOG_WARNING, "Insufficient memory for PMU, "
@@ -391,7 +395,11 @@ static int amd_vpmu_initialise(struct vcpu *v)
         return -ENOMEM;
     }
 
+    ctxt->counters = sizeof(*ctxt);
+    ctxt->ctrls = ctxt->counters + sizeof(uint64_t) * num_counters;
+
     vpmu->context = ctxt;
+    vpmu->priv_context = NULL;
     vpmu_set(vpmu, VPMU_CONTEXT_ALLOCATED);
     return 0;
 }
@@ -400,8 +408,7 @@ static void amd_vpmu_destroy(struct vcpu *v)
 {
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
 
-    if ( has_hvm_container_vcpu(v) &&
-         ((struct amd_vpmu_context *)vpmu->context)->msr_bitmap_set )
+    if ( has_hvm_container_vcpu(v) && is_msr_bitmap_on(vpmu) )
         amd_vpmu_unset_msr_bitmap(v);
 
     xfree(vpmu->context);
@@ -418,7 +425,9 @@ static void amd_vpmu_destroy(struct vcpu *v)
 static void amd_vpmu_dump(const struct vcpu *v)
 {
     const struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    const struct amd_vpmu_context *ctxt = vpmu->context;
+    const struct xen_pmu_amd_ctxt *ctxt = vpmu->context;
+    const uint64_t *counter_regs = vpmu_reg_pointer(ctxt, counters);
+    const uint64_t *ctrl_regs = vpmu_reg_pointer(ctxt, ctrls);
     unsigned int i;
 
     printk("    VPMU state: 0x%x ", vpmu->flags);
@@ -448,8 +457,8 @@ static void amd_vpmu_dump(const struct vcpu *v)
         rdmsrl(ctrls[i], ctrl);
         rdmsrl(counters[i], cntr);
         printk("      %#x: %#lx (%#lx in HW)    %#x: %#lx (%#lx in HW)\n",
-               ctrls[i], ctxt->ctrls[i], ctrl,
-               counters[i], ctxt->counters[i], cntr);
+               ctrls[i], ctrl_regs[i], ctrl,
+               counters[i], counter_regs[i], cntr);
     }
 }
 
diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c
index 7793145..c2405bf 100644
--- a/xen/arch/x86/hvm/vmx/vpmu_core2.c
+++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c
@@ -35,8 +35,8 @@
 #include <asm/hvm/vmx/vmcs.h>
 #include <public/sched.h>
 #include <public/hvm/save.h>
+#include <public/pmu.h>
 #include <asm/hvm/vpmu.h>
-#include <asm/hvm/vmx/vpmu_core2.h>
 
 /*
  * See Intel SDM Vol 2a Instruction Set Reference chapter 3 for CPUID
@@ -68,6 +68,10 @@
 #define MSR_PMC_ALIAS_MASK       (~(MSR_IA32_PERFCTR0 ^ MSR_IA32_A_PERFCTR0))
 static bool_t __read_mostly full_width_write;
 
+/* Intel-specific VPMU features */
+#define VPMU_CPU_HAS_DS                     0x100 /* Has Debug Store */
+#define VPMU_CPU_HAS_BTS                    0x200 /* Has Branch Trace Store */
+
 /*
  * MSR_CORE_PERF_FIXED_CTR_CTRL contains the configuration of all fixed
  * counters. 4 bits for every counter.
@@ -75,17 +79,6 @@ static bool_t __read_mostly full_width_write;
 #define FIXED_CTR_CTRL_BITS 4
 #define FIXED_CTR_CTRL_MASK ((1 << FIXED_CTR_CTRL_BITS) - 1)
 
-#define VPMU_CORE2_MAX_FIXED_PMCS     4
-struct core2_vpmu_context {
-    u64 fixed_ctrl;
-    u64 ds_area;
-    u64 pebs_enable;
-    u64 global_ovf_status;
-    u64 enabled_cntrs;  /* Follows PERF_GLOBAL_CTRL MSR format */
-    u64 fix_counters[VPMU_CORE2_MAX_FIXED_PMCS];
-    struct arch_msr_pair arch_msr_pair[1];
-};
-
 /* Number of general-purpose and fixed performance counters */
 static unsigned int __read_mostly arch_pmc_cnt, fixed_pmc_cnt;
 
@@ -222,6 +215,12 @@ static int is_core2_vpmu_msr(u32 msr_index, int *type, int *index)
     }
 }
 
+static inline int msraddr_to_bitpos(int x)
+{
+    ASSERT(x == (x & 0x1fff));
+    return x;
+}
+
 static void core2_vpmu_set_msr_bitmap(unsigned long *msr_bitmap)
 {
     int i;
@@ -291,12 +290,15 @@ static void core2_vpmu_unset_msr_bitmap(unsigned long *msr_bitmap)
 static inline void __core2_vpmu_save(struct vcpu *v)
 {
     int i;
-    struct core2_vpmu_context *core2_vpmu_cxt = vcpu_vpmu(v)->context;
+    struct xen_pmu_intel_ctxt *core2_vpmu_cxt = vcpu_vpmu(v)->context;
+    uint64_t *fixed_counters = vpmu_reg_pointer(core2_vpmu_cxt, fixed_counters);
+    struct xen_pmu_cntr_pair *xen_pmu_cntr_pair =
+        vpmu_reg_pointer(core2_vpmu_cxt, arch_counters);
 
     for ( i = 0; i < fixed_pmc_cnt; i++ )
-        rdmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, core2_vpmu_cxt->fix_counters[i]);
+        rdmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, fixed_counters[i]);
     for ( i = 0; i < arch_pmc_cnt; i++ )
-        rdmsrl(MSR_IA32_PERFCTR0 + i, core2_vpmu_cxt->arch_msr_pair[i].counter);
+        rdmsrl(MSR_IA32_PERFCTR0 + i, xen_pmu_cntr_pair[i].counter);
 }
 
 static int core2_vpmu_save(struct vcpu *v)
@@ -319,10 +321,13 @@ static int core2_vpmu_save(struct vcpu *v)
 static inline void __core2_vpmu_load(struct vcpu *v)
 {
     unsigned int i, pmc_start;
-    struct core2_vpmu_context *core2_vpmu_cxt = vcpu_vpmu(v)->context;
+    struct xen_pmu_intel_ctxt *core2_vpmu_cxt = vcpu_vpmu(v)->context;
+    uint64_t *fixed_counters = vpmu_reg_pointer(core2_vpmu_cxt, fixed_counters);
+    struct xen_pmu_cntr_pair *xen_pmu_cntr_pair =
+        vpmu_reg_pointer(core2_vpmu_cxt, arch_counters);
 
     for ( i = 0; i < fixed_pmc_cnt; i++ )
-        wrmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, core2_vpmu_cxt->fix_counters[i]);
+        wrmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, fixed_counters[i]);
 
     if ( full_width_write )
         pmc_start = MSR_IA32_A_PERFCTR0;
@@ -330,8 +335,8 @@ static inline void __core2_vpmu_load(struct vcpu *v)
         pmc_start = MSR_IA32_PERFCTR0;
     for ( i = 0; i < arch_pmc_cnt; i++ )
     {
-        wrmsrl(pmc_start + i, core2_vpmu_cxt->arch_msr_pair[i].counter);
-        wrmsrl(MSR_P6_EVNTSEL(i), core2_vpmu_cxt->arch_msr_pair[i].control);
+        wrmsrl(pmc_start + i, xen_pmu_cntr_pair[i].counter);
+        wrmsrl(MSR_P6_EVNTSEL(i), xen_pmu_cntr_pair[i].control);
     }
 
     wrmsrl(MSR_CORE_PERF_FIXED_CTR_CTRL, core2_vpmu_cxt->fixed_ctrl);
@@ -354,7 +359,8 @@ static void core2_vpmu_load(struct vcpu *v)
 static int core2_vpmu_alloc_resource(struct vcpu *v)
 {
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    struct core2_vpmu_context *core2_vpmu_cxt;
+    struct xen_pmu_intel_ctxt *core2_vpmu_cxt = NULL;
+    uint64_t *p = NULL;
 
     if ( !acquire_pmu_ownership(PMU_OWNER_HVM) )
         return 0;
@@ -367,12 +373,20 @@ static int core2_vpmu_alloc_resource(struct vcpu *v)
         goto out_err;
     vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, 0);
 
-    core2_vpmu_cxt = xzalloc_bytes(sizeof(struct core2_vpmu_context) +
-                    (arch_pmc_cnt-1)*sizeof(struct arch_msr_pair));
-    if ( !core2_vpmu_cxt )
+    core2_vpmu_cxt = xzalloc_bytes(sizeof(*core2_vpmu_cxt) +
+                                   sizeof(uint64_t) * fixed_pmc_cnt +
+                                   sizeof(struct xen_pmu_cntr_pair) *
+                                   arch_pmc_cnt);
+    p = xzalloc(uint64_t);
+    if ( !core2_vpmu_cxt || !p )
         goto out_err;
 
-    vpmu->context = (void *)core2_vpmu_cxt;
+    core2_vpmu_cxt->fixed_counters = sizeof(*core2_vpmu_cxt);
+    core2_vpmu_cxt->arch_counters = core2_vpmu_cxt->fixed_counters +
+                                    sizeof(uint64_t) * fixed_pmc_cnt;
+
+    vpmu->context = core2_vpmu_cxt;
+    vpmu->priv_context = p;
 
     vpmu_set(vpmu, VPMU_CONTEXT_ALLOCATED);
 
@@ -381,6 +395,9 @@ static int core2_vpmu_alloc_resource(struct vcpu *v)
 out_err:
     release_pmu_ownship(PMU_OWNER_HVM);
 
+    xfree(core2_vpmu_cxt);
+    xfree(p);
+
     printk("Failed to allocate VPMU resources for domain %u vcpu %u\n",
            v->vcpu_id, v->domain->domain_id);
 
@@ -418,7 +435,8 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content,
     int type = -1, index = -1;
     struct vcpu *v = current;
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    struct core2_vpmu_context *core2_vpmu_cxt = NULL;
+    struct xen_pmu_intel_ctxt *core2_vpmu_cxt;
+    uint64_t *enabled_cntrs;
 
     if ( !core2_vpmu_msr_common_check(msr, &type, &index) )
     {
@@ -446,10 +464,11 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content,
     ASSERT(!supported);
 
     core2_vpmu_cxt = vpmu->context;
+    enabled_cntrs = vpmu->priv_context;
     switch ( msr )
     {
     case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
-        core2_vpmu_cxt->global_ovf_status &= ~msr_content;
+        core2_vpmu_cxt->global_status &= ~msr_content;
         return 1;
     case MSR_CORE_PERF_GLOBAL_STATUS:
         gdprintk(XENLOG_INFO, "Can not write readonly MSR: "
@@ -483,15 +502,14 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content,
         break;
     case MSR_CORE_PERF_FIXED_CTR_CTRL:
         vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, &global_ctrl);
-        core2_vpmu_cxt->enabled_cntrs &=
-                ~(((1ULL << VPMU_CORE2_MAX_FIXED_PMCS) - 1) << 32);
+        *enabled_cntrs &= ~(((1ULL << fixed_pmc_cnt) - 1) << 32);
         if ( msr_content != 0 )
         {
             u64 val = msr_content;
             for ( i = 0; i < fixed_pmc_cnt; i++ )
             {
                 if ( val & 3 )
-                    core2_vpmu_cxt->enabled_cntrs |= (1ULL << 32) << i;
+                    *enabled_cntrs |= (1ULL << 32) << i;
                 val >>= FIXED_CTR_CTRL_BITS;
             }
         }
@@ -502,19 +520,21 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content,
         tmp = msr - MSR_P6_EVNTSEL(0);
         if ( tmp >= 0 && tmp < arch_pmc_cnt )
         {
+            struct xen_pmu_cntr_pair *xen_pmu_cntr_pair =
+                vpmu_reg_pointer(core2_vpmu_cxt, arch_counters);
+
             vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, &global_ctrl);
 
             if ( msr_content & (1ULL << 22) )
-                core2_vpmu_cxt->enabled_cntrs |= 1ULL << tmp;
+                *enabled_cntrs |= 1ULL << tmp;
             else
-                core2_vpmu_cxt->enabled_cntrs &= ~(1ULL << tmp);
+                *enabled_cntrs &= ~(1ULL << tmp);
 
-            core2_vpmu_cxt->arch_msr_pair[tmp].control = msr_content;
+            xen_pmu_cntr_pair[tmp].control = msr_content;
         }
     }
 
-    if ( (global_ctrl & core2_vpmu_cxt->enabled_cntrs) ||
-         (core2_vpmu_cxt->ds_area != 0)  )
+    if ( (global_ctrl & *enabled_cntrs) || (core2_vpmu_cxt->ds_area != 0) )
         vpmu_set(vpmu, VPMU_RUNNING);
     else
         vpmu_reset(vpmu, VPMU_RUNNING);
@@ -560,7 +580,7 @@ static int core2_vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
     int type = -1, index = -1;
     struct vcpu *v = current;
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    struct core2_vpmu_context *core2_vpmu_cxt = NULL;
+    struct xen_pmu_intel_ctxt *core2_vpmu_cxt;
 
     if ( core2_vpmu_msr_common_check(msr, &type, &index) )
     {
@@ -571,7 +591,7 @@ static int core2_vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
             *msr_content = 0;
             break;
         case MSR_CORE_PERF_GLOBAL_STATUS:
-            *msr_content = core2_vpmu_cxt->global_ovf_status;
+            *msr_content = core2_vpmu_cxt->global_status;
             break;
         case MSR_CORE_PERF_GLOBAL_CTRL:
             vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, msr_content);
@@ -620,10 +640,12 @@ static void core2_vpmu_dump(const struct vcpu *v)
 {
     const struct vpmu_struct *vpmu = vcpu_vpmu(v);
     unsigned int i;
-    const struct core2_vpmu_context *core2_vpmu_cxt = NULL;
+    const struct xen_pmu_intel_ctxt *core2_vpmu_cxt = vpmu->context;
     u64 val;
+    uint64_t *fixed_counters;
+    struct xen_pmu_cntr_pair *cntr_pair;
 
-    if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
+    if ( !core2_vpmu_cxt || !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
          return;
 
     if ( !vpmu_is_set(vpmu, VPMU_RUNNING) )
@@ -636,16 +658,15 @@ static void core2_vpmu_dump(const struct vcpu *v)
     }
 
     printk("    vPMU running\n");
-    core2_vpmu_cxt = vpmu->context;
+
+    cntr_pair = vpmu_reg_pointer(core2_vpmu_cxt, arch_counters);
+    fixed_counters = vpmu_reg_pointer(core2_vpmu_cxt, fixed_counters);
 
     /* Print the contents of the counter and its configuration msr. */
     for ( i = 0; i < arch_pmc_cnt; i++ )
-    {
-        const struct arch_msr_pair *msr_pair = core2_vpmu_cxt->arch_msr_pair;
-
         printk("      general_%d: 0x%016lx ctrl: 0x%016lx\n",
-               i, msr_pair[i].counter, msr_pair[i].control);
-    }
+            i, cntr_pair[i].counter, cntr_pair[i].control);
+
     /*
      * The configuration of the fixed counter is 4 bits each in the
      * MSR_CORE_PERF_FIXED_CTR_CTRL.
@@ -654,7 +675,7 @@ static void core2_vpmu_dump(const struct vcpu *v)
     for ( i = 0; i < fixed_pmc_cnt; i++ )
     {
         printk("      fixed_%d:   0x%016lx ctrl: %#lx\n",
-               i, core2_vpmu_cxt->fix_counters[i],
+               i, fixed_counters[i],
                val & FIXED_CTR_CTRL_MASK);
         val >>= FIXED_CTR_CTRL_BITS;
     }
@@ -665,14 +686,14 @@ static int core2_vpmu_do_interrupt(struct cpu_user_regs *regs)
     struct vcpu *v = current;
     u64 msr_content;
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    struct core2_vpmu_context *core2_vpmu_cxt = vpmu->context;
+    struct xen_pmu_intel_ctxt *core2_vpmu_cxt = vpmu->context;
 
     rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, msr_content);
     if ( msr_content )
     {
         if ( is_pmc_quirk )
             handle_pmc_quirk(msr_content);
-        core2_vpmu_cxt->global_ovf_status |= msr_content;
+        core2_vpmu_cxt->global_status |= msr_content;
         msr_content = 0xC000000700000000 | ((1 << arch_pmc_cnt) - 1);
         wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, msr_content);
     }
@@ -739,13 +760,6 @@ static int core2_vpmu_initialise(struct vcpu *v, unsigned int vpmu_flags)
 
     arch_pmc_cnt = core2_get_arch_pmc_count();
     fixed_pmc_cnt = core2_get_fixed_pmc_count();
-    if ( fixed_pmc_cnt > VPMU_CORE2_MAX_FIXED_PMCS )
-    {
-        fixed_pmc_cnt = VPMU_CORE2_MAX_FIXED_PMCS;
-        printk(XENLOG_G_WARNING "Limiting number of fixed counters to %d\n",
-               fixed_pmc_cnt);
-    }
-
     check_pmc_quirk();
     return 0;
 }
@@ -755,6 +769,7 @@ static void core2_vpmu_destroy(struct vcpu *v)
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
 
     xfree(vpmu->context);
+    xfree(vpmu->priv_context);
     if ( has_hvm_container_vcpu(v) && cpu_has_vmx_msr_bitmap )
         core2_vpmu_unset_msr_bitmap(v->arch.hvm_vmx.msr_bitmap);
     release_pmu_ownship(PMU_OWNER_HVM);
diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c
index c3273ee..ef88528 100644
--- a/xen/arch/x86/hvm/vpmu.c
+++ b/xen/arch/x86/hvm/vpmu.c
@@ -32,6 +32,13 @@
 #include <asm/hvm/svm/svm.h>
 #include <asm/hvm/svm/vmcb.h>
 #include <asm/apic.h>
+#include <public/pmu.h>
+
+#include <compat/pmu.h>
+CHECK_pmu_intel_ctxt;
+CHECK_pmu_amd_ctxt;
+CHECK_pmu_cntr_pair;
+CHECK_pmu_regs;
 
 /*
  * "vpmu" :     vpmu generally enabled
@@ -233,6 +240,9 @@ void vpmu_initialise(struct vcpu *v)
     uint8_t vendor = current_cpu_data.x86_vendor;
     int ret;
 
+    BUILD_BUG_ON(sizeof(struct xen_pmu_intel_ctxt) > XENPMU_CTXT_PAD_SZ);
+    BUILD_BUG_ON(sizeof(struct xen_pmu_amd_ctxt) > XENPMU_CTXT_PAD_SZ);
+
     if ( is_pvh_vcpu(v) )
         return;
 
diff --git a/xen/arch/x86/oprofile/op_model_ppro.c b/xen/arch/x86/oprofile/op_model_ppro.c
index aa99e4d..ca429a1 100644
--- a/xen/arch/x86/oprofile/op_model_ppro.c
+++ b/xen/arch/x86/oprofile/op_model_ppro.c
@@ -20,11 +20,15 @@
 #include <asm/regs.h>
 #include <asm/current.h>
 #include <asm/hvm/vpmu.h>
-#include <asm/hvm/vmx/vpmu_core2.h>
 
 #include "op_x86_model.h"
 #include "op_counter.h"
 
+struct arch_msr_pair {
+    u64 counter;
+    u64 control;
+};
+
 /*
  * Intel "Architectural Performance Monitoring" CPUID
  * detection/enumeration details:
diff --git a/xen/include/Makefile b/xen/include/Makefile
index d48a642..aeded2c 100644
--- a/xen/include/Makefile
+++ b/xen/include/Makefile
@@ -22,10 +22,11 @@ headers-y := \
     compat/version.h \
     compat/xen.h \
     compat/xenoprof.h
+headers-$(CONFIG_X86)     += compat/arch-x86/pmu.h
 headers-$(CONFIG_X86)     += compat/arch-x86/xen-mca.h
 headers-$(CONFIG_X86)     += compat/arch-x86/xen.h
 headers-$(CONFIG_X86)     += compat/arch-x86/xen-$(compat-arch-y).h
-headers-y                 += compat/arch-$(compat-arch-y).h compat/xlat.h
+headers-y                 += compat/arch-$(compat-arch-y).h compat/pmu.h compat/xlat.h
 headers-$(FLASK_ENABLE)   += compat/xsm/flask_op.h
 
 cppflags-y                := -include public/xen-compat.h
diff --git a/xen/include/asm-x86/hvm/vmx/vpmu_core2.h b/xen/include/asm-x86/hvm/vmx/vpmu_core2.h
deleted file mode 100644
index 410372d..0000000
--- a/xen/include/asm-x86/hvm/vmx/vpmu_core2.h
+++ /dev/null
@@ -1,32 +0,0 @@
-
-/*
- * vpmu_core2.h: CORE 2 specific PMU virtualization for HVM domain.
- *
- * Copyright (c) 2007, Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along with
- * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
- * Place - Suite 330, Boston, MA 02111-1307 USA.
- *
- * Author: Haitao Shan <haitao.shan@intel.com>
- */
-
-#ifndef __ASM_X86_HVM_VPMU_CORE_H_
-#define __ASM_X86_HVM_VPMU_CORE_H_
-
-struct arch_msr_pair {
-    u64 counter;
-    u64 control;
-};
-
-#endif /* __ASM_X86_HVM_VPMU_CORE_H_ */
-
diff --git a/xen/include/asm-x86/hvm/vpmu.h b/xen/include/asm-x86/hvm/vpmu.h
index 9c4e65a..83eea7e 100644
--- a/xen/include/asm-x86/hvm/vpmu.h
+++ b/xen/include/asm-x86/hvm/vpmu.h
@@ -22,6 +22,8 @@
 #ifndef __ASM_X86_HVM_VPMU_H_
 #define __ASM_X86_HVM_VPMU_H_
 
+#include <public/pmu.h>
+
 /*
  * Flag bits given as a string on the hypervisor boot parameter 'vpmu'.
  * See arch/x86/hvm/vpmu.c.
@@ -29,12 +31,9 @@
 #define VPMU_BOOT_ENABLED 0x1    /* vpmu generally enabled. */
 #define VPMU_BOOT_BTS     0x2    /* Intel BTS feature wanted. */
 
-
-#define msraddr_to_bitpos(x) (((x)&0xffff) + ((x)>>31)*0x2000)
 #define vcpu_vpmu(vcpu)   (&((vcpu)->arch.hvm_vcpu.vpmu))
 #define vpmu_vcpu(vpmu)   (container_of((vpmu), struct vcpu, \
                                           arch.hvm_vcpu.vpmu))
-#define vpmu_domain(vpmu) (vpmu_vcpu(vpmu)->domain)
 
 #define MSR_TYPE_COUNTER            0
 #define MSR_TYPE_CTRL               1
@@ -42,6 +41,9 @@
 #define MSR_TYPE_ARCH_COUNTER       3
 #define MSR_TYPE_ARCH_CTRL          4
 
+/* Start of PMU register bank */
+#define vpmu_reg_pointer(ctxt, offset) ((void *)((uintptr_t)ctxt + \
+                                                 (uintptr_t)ctxt->offset))
 
 /* Arch specific operations shared by all vpmus */
 struct arch_vpmu_ops {
@@ -65,7 +67,8 @@ struct vpmu_struct {
     u32 flags;
     u32 last_pcpu;
     u32 hw_lapic_lvtpc;
-    void *context;
+    void *context;      /* May be shared with PV guest */
+    void *priv_context; /* hypervisor-only */
     struct arch_vpmu_ops *arch_vpmu_ops;
 };
 
@@ -77,11 +80,6 @@ struct vpmu_struct {
 #define VPMU_FROZEN                         0x10  /* Stop counters while VCPU is not running */
 #define VPMU_PASSIVE_DOMAIN_ALLOCATED       0x20
 
-/* VPMU features */
-#define VPMU_CPU_HAS_DS                     0x100 /* Has Debug Store */
-#define VPMU_CPU_HAS_BTS                    0x200 /* Has Branch Trace Store */
-
-
 static inline void vpmu_set(struct vpmu_struct *vpmu, const u32 mask)
 {
     vpmu->flags |= mask;
diff --git a/xen/include/public/arch-arm.h b/xen/include/public/arch-arm.h
index c2dcb66..f900925 100644
--- a/xen/include/public/arch-arm.h
+++ b/xen/include/public/arch-arm.h
@@ -420,6 +420,9 @@ typedef uint64_t xen_callback_t;
 
 #endif
 
+/* Stub definition of PMU structure */
+typedef struct xen_pmu_arch {} xen_pmu_arch_t;
+
 #endif /*  __XEN_PUBLIC_ARCH_ARM_H__ */
 
 /*
diff --git a/xen/include/public/arch-x86/pmu.h b/xen/include/public/arch-x86/pmu.h
new file mode 100644
index 0000000..8bbe341
--- /dev/null
+++ b/xen/include/public/arch-x86/pmu.h
@@ -0,0 +1,91 @@
+#ifndef __XEN_PUBLIC_ARCH_X86_PMU_H__
+#define __XEN_PUBLIC_ARCH_X86_PMU_H__
+
+/* x86-specific PMU definitions */
+
+/* AMD PMU registers and structures */
+struct xen_pmu_amd_ctxt {
+    /* Offsets to counter and control MSRs (relative to xen_pmu_arch.c.amd) */
+    uint32_t counters;
+    uint32_t ctrls;
+};
+typedef struct xen_pmu_amd_ctxt xen_pmu_amd_ctxt_t;
+DEFINE_XEN_GUEST_HANDLE(xen_pmu_amd_ctxt_t);
+
+/* Intel PMU registers and structures */
+struct xen_pmu_cntr_pair {
+    uint64_t counter;
+    uint64_t control;
+};
+typedef struct xen_pmu_cntr_pair xen_pmu_cntr_pair_t;
+DEFINE_XEN_GUEST_HANDLE(xen_pmu_cntr_pair_t);
+
+struct xen_pmu_intel_ctxt {
+    uint64_t global_ctrl;
+    uint64_t global_ovf_ctrl;
+    uint64_t global_status;
+    uint64_t fixed_ctrl;
+    uint64_t ds_area;
+    uint64_t pebs_enable;
+    uint64_t debugctl;
+    /*
+     * Offsets to fixed and architectural counter MSRs (relative to
+     * xen_pmu_arch.c.intel)
+     */
+    uint32_t fixed_counters;
+    uint32_t arch_counters;
+};
+typedef struct xen_pmu_intel_ctxt xen_pmu_intel_ctxt_t;
+DEFINE_XEN_GUEST_HANDLE(xen_pmu_intel_ctxt_t);
+
+/* Sampled domain's registers */
+struct xen_pmu_regs {
+    uint64_t ip;
+    uint64_t sp;
+    uint64_t flags;
+    uint16_t cs;
+    uint16_t ss;
+    uint8_t cpl;
+    uint8_t pad[3];
+};
+typedef struct xen_pmu_regs xen_pmu_regs_t;
+DEFINE_XEN_GUEST_HANDLE(xen_pmu_regs_t);
+
+struct xen_pmu_arch {
+    union {
+        struct xen_pmu_regs regs;
+        /* Padding for adding new registers to xen_pmu_regs in the future */
+#define XENPMU_REGS_PAD_SZ  64
+        uint8_t pad[XENPMU_REGS_PAD_SZ];
+    } r;
+    uint64_t pmu_flags;
+    union {
+        uint32_t lapic_lvtpc;
+        uint64_t pad;
+    } l;
+    union {
+        struct xen_pmu_amd_ctxt amd;
+        struct xen_pmu_intel_ctxt intel;
+
+        /*
+         * Padding for contexts (fixed parts only, does not include MSR banks
+         * that are specified by offsets
+         */
+#define XENPMU_CTXT_PAD_SZ  128
+        uint8_t pad[XENPMU_CTXT_PAD_SZ];
+    } c;
+};
+typedef struct xen_pmu_arch xen_pmu_arch_t;
+DEFINE_XEN_GUEST_HANDLE(xen_pmu_arch_t);
+
+#endif /* __XEN_PUBLIC_ARCH_X86_PMU_H__ */
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
+
diff --git a/xen/include/public/pmu.h b/xen/include/public/pmu.h
new file mode 100644
index 0000000..f97106d
--- /dev/null
+++ b/xen/include/public/pmu.h
@@ -0,0 +1,38 @@
+#ifndef __XEN_PUBLIC_PMU_H__
+#define __XEN_PUBLIC_PMU_H__
+
+#include "xen.h"
+#if defined(__i386__) || defined(__x86_64__)
+#include "arch-x86/pmu.h"
+#elif defined (__arm__) || defined (__aarch64__)
+#include "arch-arm.h"
+#else
+#error "Unsupported architecture"
+#endif
+
+#define XENPMU_VER_MAJ    0
+#define XENPMU_VER_MIN    1
+
+
+/* Shared between hypervisor and PV domain */
+struct xen_pmu_data {
+    uint32_t domain_id;
+    uint32_t vcpu_id;
+    uint32_t pcpu_id;
+
+    uint32_t pad;
+
+    xen_pmu_arch_t pmu;
+};
+
+#endif /* __XEN_PUBLIC_PMU_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/xlat.lst b/xen/include/xlat.lst
index 906e6fc..94eedad 100644
--- a/xen/include/xlat.lst
+++ b/xen/include/xlat.lst
@@ -8,6 +8,10 @@
 !	start_info			xen.h
 ?	vcpu_info			xen.h
 ?	vcpu_time_info			xen.h
+?	pmu_amd_ctxt			arch-x86/pmu.h
+?	pmu_cntr_pair			arch-x86/pmu.h
+?	pmu_intel_ctxt			arch-x86/pmu.h
+?	pmu_regs			arch-x86/pmu.h
 !	cpu_user_regs			arch-x86/xen-@arch@.h
 !	trap_info			arch-x86/xen.h
 ?	cpu_offline_action		arch-x86/xen-mca.h
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v19 04/14] x86/VPMU: Make vpmu not HVM-specific
  2015-03-17 14:53 [PATCH v19 00/14] x86/PMU: Xen PMU PV(H) support Boris Ostrovsky
                   ` (2 preceding siblings ...)
  2015-03-17 14:54 ` [PATCH v19 03/14] x86/VPMU: Add public xenpmu.h Boris Ostrovsky
@ 2015-03-17 14:54 ` Boris Ostrovsky
  2015-03-17 14:54 ` [PATCH v19 05/14] x86/VPMU: Interface for setting PMU mode and flags Boris Ostrovsky
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 35+ messages in thread
From: Boris Ostrovsky @ 2015-03-17 14:54 UTC (permalink / raw)
  To: JBeulich, kevin.tian, suravee.suthikulpanit,
	Aravind.Gopalakrishnan, dietmar.hahn, dgdegra, andrew.cooper3
  Cc: boris.ostrovsky, tim, jun.nakajima, xen-devel

vpmu structure will be used for both HVM and PV guests. Move it from
hvm_vcpu to arch_vcpu.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
Tested-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
---
 xen/include/asm-x86/domain.h   | 2 ++
 xen/include/asm-x86/hvm/vcpu.h | 3 ---
 xen/include/asm-x86/hvm/vpmu.h | 5 ++---
 3 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index 9cdffa8..2686a4f 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -434,6 +434,8 @@ struct arch_vcpu
     void (*ctxt_switch_from) (struct vcpu *);
     void (*ctxt_switch_to) (struct vcpu *);
 
+    struct vpmu_struct vpmu;
+
     /* Virtual Machine Extensions */
     union {
         struct pv_vcpu pv_vcpu;
diff --git a/xen/include/asm-x86/hvm/vcpu.h b/xen/include/asm-x86/hvm/vcpu.h
index 3d8f4dc..0faf60d 100644
--- a/xen/include/asm-x86/hvm/vcpu.h
+++ b/xen/include/asm-x86/hvm/vcpu.h
@@ -151,9 +151,6 @@ struct hvm_vcpu {
     u32                 msr_tsc_aux;
     u64                 msr_tsc_adjust;
 
-    /* VPMU */
-    struct vpmu_struct  vpmu;
-
     union {
         struct arch_vmx_struct vmx;
         struct arch_svm_struct svm;
diff --git a/xen/include/asm-x86/hvm/vpmu.h b/xen/include/asm-x86/hvm/vpmu.h
index 83eea7e..82bfa0e 100644
--- a/xen/include/asm-x86/hvm/vpmu.h
+++ b/xen/include/asm-x86/hvm/vpmu.h
@@ -31,9 +31,8 @@
 #define VPMU_BOOT_ENABLED 0x1    /* vpmu generally enabled. */
 #define VPMU_BOOT_BTS     0x2    /* Intel BTS feature wanted. */
 
-#define vcpu_vpmu(vcpu)   (&((vcpu)->arch.hvm_vcpu.vpmu))
-#define vpmu_vcpu(vpmu)   (container_of((vpmu), struct vcpu, \
-                                          arch.hvm_vcpu.vpmu))
+#define vcpu_vpmu(vcpu)   (&(vcpu)->arch.vpmu)
+#define vpmu_vcpu(vpmu)   container_of((vpmu), struct vcpu, arch.vpmu)
 
 #define MSR_TYPE_COUNTER            0
 #define MSR_TYPE_CTRL               1
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v19 05/14] x86/VPMU: Interface for setting PMU mode and flags
  2015-03-17 14:53 [PATCH v19 00/14] x86/PMU: Xen PMU PV(H) support Boris Ostrovsky
                   ` (3 preceding siblings ...)
  2015-03-17 14:54 ` [PATCH v19 04/14] x86/VPMU: Make vpmu not HVM-specific Boris Ostrovsky
@ 2015-03-17 14:54 ` Boris Ostrovsky
  2015-03-19 13:12   ` Dietmar Hahn
  2015-03-24 13:56   ` Jan Beulich
  2015-03-17 14:54 ` [PATCH v19 06/14] x86/VPMU: Initialize VPMUs with __initcall Boris Ostrovsky
                   ` (9 subsequent siblings)
  14 siblings, 2 replies; 35+ messages in thread
From: Boris Ostrovsky @ 2015-03-17 14:54 UTC (permalink / raw)
  To: JBeulich, kevin.tian, suravee.suthikulpanit,
	Aravind.Gopalakrishnan, dietmar.hahn, dgdegra, andrew.cooper3
  Cc: boris.ostrovsky, tim, jun.nakajima, xen-devel

Add runtime interface for setting PMU mode and flags. Three main modes are
provided:
* XENPMU_MODE_OFF:  PMU is not virtualized
* XENPMU_MODE_SELF: Guests can access PMU MSRs and receive PMU interrupts.
* XENPMU_MODE_HV: Same as XENPMU_MODE_SELF for non-proviledged guests, dom0
  can profile itself and the hypervisor.

Note that PMU modes are different from what can be provided at Xen's boot line
with 'vpmu' argument. An 'off' (or '0') value is equivalent to XENPMU_MODE_OFF.
Any other value, on the other hand, will cause VPMU mode to be set to
XENPMU_MODE_SELF during boot.

For feature flags only Intel's BTS is currently supported.

Mode and flags are set via HYPERVISOR_xenpmu_op hypercall.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
---

Changes in v19:
* Keep track of active vpmu count and allow certain mode changes only when the count
  is zero
* Drop vpmu_unload routines
* Revert to to using opt_vpmu_enabled
* Changes to oprofile code are no longer needed
* Changes to vmcs.c are no longer needed
* Simplified vpmu_switch_from/to inlines

 tools/flask/policy/policy/modules/xen/xen.te |   3 +
 xen/arch/x86/domain.c                        |   4 +-
 xen/arch/x86/hvm/svm/vpmu.c                  |   4 +-
 xen/arch/x86/hvm/vmx/vpmu_core2.c            |  10 +-
 xen/arch/x86/hvm/vpmu.c                      | 155 +++++++++++++++++++++++++--
 xen/arch/x86/x86_64/compat/entry.S           |   4 +
 xen/arch/x86/x86_64/entry.S                  |   4 +
 xen/include/asm-x86/hvm/vpmu.h               |  27 +++--
 xen/include/public/pmu.h                     |  45 ++++++++
 xen/include/public/xen.h                     |   1 +
 xen/include/xen/hypercall.h                  |   4 +
 xen/include/xlat.lst                         |   1 +
 xen/include/xsm/dummy.h                      |  15 +++
 xen/include/xsm/xsm.h                        |   6 ++
 xen/xsm/dummy.c                              |   1 +
 xen/xsm/flask/hooks.c                        |  18 ++++
 xen/xsm/flask/policy/access_vectors          |   2 +
 17 files changed, 279 insertions(+), 25 deletions(-)

diff --git a/tools/flask/policy/policy/modules/xen/xen.te b/tools/flask/policy/policy/modules/xen/xen.te
index c0128aa..870ff81 100644
--- a/tools/flask/policy/policy/modules/xen/xen.te
+++ b/tools/flask/policy/policy/modules/xen/xen.te
@@ -68,6 +68,9 @@ allow dom0_t xen_t:xen2 {
     resource_op
     psr_cmt_op
 };
+allow dom0_t xen_t:xen2 {
+    pmu_ctrl
+};
 allow dom0_t xen_t:mmu memorymap;
 
 # Allow dom0 to use these domctls on itself. For domctls acting on other
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 21f0766..60d9a80 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1536,7 +1536,7 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
     if ( is_hvm_vcpu(prev) )
     {
         if (prev != next)
-            vpmu_save(prev);
+            vpmu_switch_from(prev);
 
         if ( !list_empty(&prev->arch.hvm_vcpu.tm_list) )
             pt_save_timer(prev);
@@ -1581,7 +1581,7 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
 
     if (is_hvm_vcpu(next) && (prev != next) )
         /* Must be done with interrupts enabled */
-        vpmu_load(next);
+        vpmu_switch_to(next);
 
     context_saved(prev);
 
diff --git a/xen/arch/x86/hvm/svm/vpmu.c b/xen/arch/x86/hvm/svm/vpmu.c
index a8b79df..481ea7b 100644
--- a/xen/arch/x86/hvm/svm/vpmu.c
+++ b/xen/arch/x86/hvm/svm/vpmu.c
@@ -472,14 +472,14 @@ struct arch_vpmu_ops amd_vpmu_ops = {
     .arch_vpmu_dump = amd_vpmu_dump
 };
 
-int svm_vpmu_initialise(struct vcpu *v, unsigned int vpmu_flags)
+int svm_vpmu_initialise(struct vcpu *v)
 {
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
     uint8_t family = current_cpu_data.x86;
     int ret = 0;
 
     /* vpmu enabled? */
-    if ( !vpmu_flags )
+    if ( vpmu_mode == XENPMU_MODE_OFF )
         return 0;
 
     switch ( family )
diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c
index c2405bf..6280644 100644
--- a/xen/arch/x86/hvm/vmx/vpmu_core2.c
+++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c
@@ -708,13 +708,13 @@ static int core2_vpmu_do_interrupt(struct cpu_user_regs *regs)
     return 1;
 }
 
-static int core2_vpmu_initialise(struct vcpu *v, unsigned int vpmu_flags)
+static int core2_vpmu_initialise(struct vcpu *v)
 {
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
     u64 msr_content;
     static bool_t ds_warned;
 
-    if ( !(vpmu_flags & VPMU_BOOT_BTS) )
+    if ( !(vpmu_features & XENPMU_FEATURE_INTEL_BTS) )
         goto func_out;
     /* Check the 'Debug Store' feature in the CPUID.EAX[1]:EDX[21] */
     while ( boot_cpu_has(X86_FEATURE_DS) )
@@ -826,7 +826,7 @@ struct arch_vpmu_ops core2_no_vpmu_ops = {
     .do_cpuid = core2_no_vpmu_do_cpuid,
 };
 
-int vmx_vpmu_initialise(struct vcpu *v, unsigned int vpmu_flags)
+int vmx_vpmu_initialise(struct vcpu *v)
 {
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
     uint8_t family = current_cpu_data.x86;
@@ -834,7 +834,7 @@ int vmx_vpmu_initialise(struct vcpu *v, unsigned int vpmu_flags)
     int ret = 0;
 
     vpmu->arch_vpmu_ops = &core2_no_vpmu_ops;
-    if ( !vpmu_flags )
+    if ( vpmu_mode == XENPMU_MODE_OFF )
         return 0;
 
     if ( family == 6 )
@@ -877,7 +877,7 @@ int vmx_vpmu_initialise(struct vcpu *v, unsigned int vpmu_flags)
         /* future: */
         case 0x3d:
         case 0x4e:
-            ret = core2_vpmu_initialise(v, vpmu_flags);
+            ret = core2_vpmu_initialise(v);
             if ( !ret )
                 vpmu->arch_vpmu_ops = &core2_vpmu_ops;
             return ret;
diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c
index ef88528..164accf 100644
--- a/xen/arch/x86/hvm/vpmu.c
+++ b/xen/arch/x86/hvm/vpmu.c
@@ -21,6 +21,8 @@
 #include <xen/config.h>
 #include <xen/sched.h>
 #include <xen/xenoprof.h>
+#include <xen/event.h>
+#include <xen/guest_access.h>
 #include <asm/regs.h>
 #include <asm/types.h>
 #include <asm/msr.h>
@@ -33,8 +35,10 @@
 #include <asm/hvm/svm/vmcb.h>
 #include <asm/apic.h>
 #include <public/pmu.h>
+#include <xsm/xsm.h>
 
 #include <compat/pmu.h>
+CHECK_pmu_params;
 CHECK_pmu_intel_ctxt;
 CHECK_pmu_amd_ctxt;
 CHECK_pmu_cntr_pair;
@@ -46,9 +50,14 @@ CHECK_pmu_regs;
  * "vpmu=bts" : vpmu enabled and Intel BTS feature switched on.
  */
 static unsigned int __read_mostly opt_vpmu_enabled;
+unsigned int __read_mostly vpmu_mode = XENPMU_MODE_OFF;
+unsigned int __read_mostly vpmu_features = 0;
 static void parse_vpmu_param(char *s);
 custom_param("vpmu", parse_vpmu_param);
 
+static DEFINE_SPINLOCK(vpmu_lock);
+static unsigned vpmu_count;
+
 static DEFINE_PER_CPU(struct vcpu *, last_vcpu);
 
 static void __init parse_vpmu_param(char *s)
@@ -59,7 +68,7 @@ static void __init parse_vpmu_param(char *s)
         break;
     default:
         if ( !strcmp(s, "bts") )
-            opt_vpmu_enabled |= VPMU_BOOT_BTS;
+            vpmu_features |= XENPMU_FEATURE_INTEL_BTS;
         else if ( *s )
         {
             printk("VPMU: unknown flag: %s - vpmu disabled!\n", s);
@@ -67,7 +76,9 @@ static void __init parse_vpmu_param(char *s)
         }
         /* fall through */
     case 1:
-        opt_vpmu_enabled |= VPMU_BOOT_ENABLED;
+        /* Default VPMU mode */
+        vpmu_mode = XENPMU_MODE_SELF;
+        opt_vpmu_enabled = 1;
         break;
     }
 }
@@ -76,7 +87,7 @@ void vpmu_lvtpc_update(uint32_t val)
 {
     struct vpmu_struct *vpmu;
 
-    if ( !opt_vpmu_enabled )
+    if ( vpmu_mode == XENPMU_MODE_OFF )
         return;
 
     vpmu = vcpu_vpmu(current);
@@ -89,6 +100,9 @@ int vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content, uint64_t supported)
 {
     struct vpmu_struct *vpmu = vcpu_vpmu(current);
 
+    if ( vpmu_mode == XENPMU_MODE_OFF )
+        return 0;
+
     if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->do_wrmsr )
         return vpmu->arch_vpmu_ops->do_wrmsr(msr, msr_content, supported);
     return 0;
@@ -98,6 +112,12 @@ int vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
 {
     struct vpmu_struct *vpmu = vcpu_vpmu(current);
 
+    if ( vpmu_mode == XENPMU_MODE_OFF )
+    {
+        *msr_content = 0;
+        return 0;
+    }
+
     if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->do_rdmsr )
         return vpmu->arch_vpmu_ops->do_rdmsr(msr, msr_content);
     return 0;
@@ -248,28 +268,41 @@ void vpmu_initialise(struct vcpu *v)
 
     ASSERT(!vpmu->flags && !vpmu->context);
 
+    spin_lock(&vpmu_lock);
+    vpmu_count++; /* Prevent vpmu_mode from changing until we are done */
+    spin_unlock(&vpmu_lock);
+
     switch ( vendor )
     {
     case X86_VENDOR_AMD:
-        ret = svm_vpmu_initialise(v, opt_vpmu_enabled);
+        ret = svm_vpmu_initialise(v);
         break;
 
     case X86_VENDOR_INTEL:
-        ret = vmx_vpmu_initialise(v, opt_vpmu_enabled);
+        ret = vmx_vpmu_initialise(v);
         break;
 
     default:
-        if ( opt_vpmu_enabled )
+        if ( vpmu_mode != XENPMU_MODE_OFF )
         {
             printk(XENLOG_G_WARNING "VPMU: Unknown CPU vendor %d. "
                    "Disabling VPMU\n", vendor);
             opt_vpmu_enabled = 0;
+            vpmu_mode = XENPMU_MODE_OFF;
         }
-        return;
+        return; /* Don't bother restoring vpmu_count, VPMU is off forever */
     }
 
     if ( ret )
         printk(XENLOG_G_WARNING "VPMU: Initialization failed for %pv\n", v);
+
+    /* Intel needs to initialize VPMU ops even if VPMU is not in use */
+    if ( ret || (vpmu_mode == XENPMU_MODE_OFF) )
+    {
+        spin_lock(&vpmu_lock);
+        vpmu_count--;
+        spin_unlock(&vpmu_lock);
+    }
 }
 
 static void vpmu_clear_last(void *arg)
@@ -298,6 +331,10 @@ void vpmu_destroy(struct vcpu *v)
 
     if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->arch_vpmu_destroy )
         vpmu->arch_vpmu_ops->arch_vpmu_destroy(v);
+
+    spin_lock(&vpmu_lock);
+    vpmu_count--;
+    spin_unlock(&vpmu_lock);
 }
 
 /* Dump some vpmu informations on console. Used in keyhandler dump_domains(). */
@@ -309,6 +346,109 @@ void vpmu_dump(struct vcpu *v)
         vpmu->arch_vpmu_ops->arch_vpmu_dump(v);
 }
 
+long do_xenpmu_op(unsigned int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg)
+{
+    int ret;
+    struct xen_pmu_params pmu_params = {.val = 0};
+
+    if ( !opt_vpmu_enabled )
+        return -EOPNOTSUPP;
+
+    ret = xsm_pmu_op(XSM_OTHER, current->domain, op);
+    if ( ret )
+        return ret;
+
+    /* Check major version when parameters are specified */
+    switch ( op )
+    {
+    case XENPMU_mode_set:
+    case XENPMU_feature_set:
+        if ( copy_from_guest(&pmu_params, arg, 1) )
+            return -EFAULT;
+
+        if ( pmu_params.version.maj != XENPMU_VER_MAJ )
+            return -EINVAL;
+    }
+
+    switch ( op )
+    {
+    case XENPMU_mode_set:
+    {
+        if ( (pmu_params.val & ~(XENPMU_MODE_SELF | XENPMU_MODE_HV)) ||
+             (hweight64(pmu_params.val) > 1) )
+            return -EINVAL;
+
+        /* 32-bit dom0 can only sample itself. */
+        if ( is_pv_32bit_vcpu(current) && (pmu_params.val & XENPMU_MODE_HV) )
+            return -EINVAL;
+
+        spin_lock(&vpmu_lock);
+
+        /*
+         * We can always safely switch between XENPMU_MODE_SELF and
+         * XENPMU_MODE_HV while other VPMUs are active.
+         */
+        if ( (vpmu_count == 0) || (vpmu_mode == pmu_params.val) ||
+             ((vpmu_mode ^ pmu_params.val) ==
+              (XENPMU_MODE_SELF | XENPMU_MODE_HV)) )
+            vpmu_mode = pmu_params.val;
+        else
+        {
+            printk(XENLOG_WARNING "VPMU: Cannot change mode while"
+                                  " active VPMUs exist\n");
+            ret = -EBUSY;
+        }
+
+        spin_unlock(&vpmu_lock);
+
+        break;
+    }
+
+    case XENPMU_mode_get:
+        memset(&pmu_params, 0, sizeof(pmu_params));
+        pmu_params.val = vpmu_mode;
+
+        pmu_params.version.maj = XENPMU_VER_MAJ;
+        pmu_params.version.min = XENPMU_VER_MIN;
+
+        if ( copy_to_guest(arg, &pmu_params, 1) )
+            return -EFAULT;
+
+        break;
+
+    case XENPMU_feature_set:
+        if ( pmu_params.val & ~XENPMU_FEATURE_INTEL_BTS )
+            return -EINVAL;
+
+        spin_lock(&vpmu_lock);
+
+        if ( vpmu_count == 0 )
+            vpmu_features = pmu_params.val;
+        else
+        {
+            printk(XENLOG_WARNING "VPMU: Cannot change features while"
+                                  " active VPMUs exist\n");
+            ret = -EBUSY;
+        }
+
+        spin_unlock(&vpmu_lock);
+
+        break;
+
+    case XENPMU_feature_get:
+        pmu_params.val = vpmu_features;
+        if ( copy_field_to_guest(arg, &pmu_params, val) )
+            return -EFAULT;
+
+        break;
+
+    default:
+        ret = -EINVAL;
+    }
+
+    return ret;
+}
+
 static int __init vpmu_init(void)
 {
     /* NMI watchdog uses LVTPC and HW counter */
@@ -316,6 +456,7 @@ static int __init vpmu_init(void)
     {
         printk(XENLOG_WARNING "NMI watchdog is enabled. Turning VPMU off.\n");
         opt_vpmu_enabled = 0;
+        vpmu_mode = XENPMU_MODE_OFF;
     }
 
     return 0;
diff --git a/xen/arch/x86/x86_64/compat/entry.S b/xen/arch/x86/x86_64/compat/entry.S
index 5b0af61..7691a79 100644
--- a/xen/arch/x86/x86_64/compat/entry.S
+++ b/xen/arch/x86/x86_64/compat/entry.S
@@ -417,6 +417,8 @@ ENTRY(compat_hypercall_table)
         .quad do_domctl
         .quad compat_kexec_op
         .quad do_tmem_op
+        .quad do_ni_hypercall           /* reserved for XenClient */
+        .quad do_xenpmu_op              /* 40 */
         .rept __HYPERVISOR_arch_0-((.-compat_hypercall_table)/8)
         .quad compat_ni_hypercall
         .endr
@@ -466,6 +468,8 @@ ENTRY(compat_hypercall_args_table)
         .byte 1 /* do_domctl                */
         .byte 2 /* compat_kexec_op          */
         .byte 1 /* do_tmem_op               */
+        .byte 0 /* reserved for XenClient   */
+        .byte 2 /* do_xenpmu_op             */  /* 40 */
         .rept __HYPERVISOR_arch_0-(.-compat_hypercall_args_table)
         .byte 0 /* compat_ni_hypercall      */
         .endr
diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
index 2d25d57..403e97c 100644
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -764,6 +764,8 @@ ENTRY(hypercall_table)
         .quad do_domctl
         .quad do_kexec_op
         .quad do_tmem_op
+        .quad do_ni_hypercall       /* reserved for XenClient */
+        .quad do_xenpmu_op          /* 40 */
         .rept __HYPERVISOR_arch_0-((.-hypercall_table)/8)
         .quad do_ni_hypercall
         .endr
@@ -813,6 +815,8 @@ ENTRY(hypercall_args_table)
         .byte 1 /* do_domctl            */
         .byte 2 /* do_kexec             */
         .byte 1 /* do_tmem_op           */
+        .byte 0 /* reserved for XenClient */
+        .byte 2 /* do_xenpmu_op         */  /* 40 */
         .rept __HYPERVISOR_arch_0-(.-hypercall_args_table)
         .byte 0 /* do_ni_hypercall      */
         .endr
diff --git a/xen/include/asm-x86/hvm/vpmu.h b/xen/include/asm-x86/hvm/vpmu.h
index 82bfa0e..88ffc19 100644
--- a/xen/include/asm-x86/hvm/vpmu.h
+++ b/xen/include/asm-x86/hvm/vpmu.h
@@ -24,13 +24,6 @@
 
 #include <public/pmu.h>
 
-/*
- * Flag bits given as a string on the hypervisor boot parameter 'vpmu'.
- * See arch/x86/hvm/vpmu.c.
- */
-#define VPMU_BOOT_ENABLED 0x1    /* vpmu generally enabled. */
-#define VPMU_BOOT_BTS     0x2    /* Intel BTS feature wanted. */
-
 #define vcpu_vpmu(vcpu)   (&(vcpu)->arch.vpmu)
 #define vpmu_vcpu(vpmu)   container_of((vpmu), struct vcpu, arch.vpmu)
 
@@ -59,8 +52,8 @@ struct arch_vpmu_ops {
     void (*arch_vpmu_dump)(const struct vcpu *);
 };
 
-int vmx_vpmu_initialise(struct vcpu *, unsigned int flags);
-int svm_vpmu_initialise(struct vcpu *, unsigned int flags);
+int vmx_vpmu_initialise(struct vcpu *);
+int svm_vpmu_initialise(struct vcpu *);
 
 struct vpmu_struct {
     u32 flags;
@@ -116,5 +109,21 @@ void vpmu_dump(struct vcpu *v);
 extern int acquire_pmu_ownership(int pmu_ownership);
 extern void release_pmu_ownership(int pmu_ownership);
 
+extern unsigned int vpmu_mode;
+extern unsigned int vpmu_features;
+
+/* Context switch */
+static inline void vpmu_switch_from(struct vcpu *prev)
+{
+    if ( vpmu_mode & (XENPMU_MODE_SELF | XENPMU_MODE_HV) )
+        vpmu_save(prev);
+}
+
+static inline void vpmu_switch_to(struct vcpu *next)
+{
+    if ( vpmu_mode & (XENPMU_MODE_SELF | XENPMU_MODE_HV) )
+        vpmu_load(next);
+}
+
 #endif /* __ASM_X86_HVM_VPMU_H_*/
 
diff --git a/xen/include/public/pmu.h b/xen/include/public/pmu.h
index f97106d..66cc494 100644
--- a/xen/include/public/pmu.h
+++ b/xen/include/public/pmu.h
@@ -13,6 +13,51 @@
 #define XENPMU_VER_MAJ    0
 #define XENPMU_VER_MIN    1
 
+/*
+ * ` enum neg_errnoval
+ * ` HYPERVISOR_xenpmu_op(enum xenpmu_op cmd, struct xenpmu_params *args);
+ *
+ * @cmd  == XENPMU_* (PMU operation)
+ * @args == struct xenpmu_params
+ */
+/* ` enum xenpmu_op { */
+#define XENPMU_mode_get        0 /* Also used for getting PMU version */
+#define XENPMU_mode_set        1
+#define XENPMU_feature_get     2
+#define XENPMU_feature_set     3
+/* ` } */
+
+/* Parameters structure for HYPERVISOR_xenpmu_op call */
+struct xen_pmu_params {
+    /* IN/OUT parameters */
+    struct {
+        uint32_t maj;
+        uint32_t min;
+    } version;
+    uint64_t val;
+
+    /* IN parameters */
+    uint32_t vcpu;
+    uint32_t pad;
+};
+typedef struct xen_pmu_params xen_pmu_params_t;
+DEFINE_XEN_GUEST_HANDLE(xen_pmu_params_t);
+
+/* PMU modes:
+ * - XENPMU_MODE_OFF:   No PMU virtualization
+ * - XENPMU_MODE_SELF:  Guests can profile themselves
+ * - XENPMU_MODE_HV:    Guests can profile themselves, dom0 profiles
+ *                      itself and Xen
+ */
+#define XENPMU_MODE_OFF           0
+#define XENPMU_MODE_SELF          (1<<0)
+#define XENPMU_MODE_HV            (1<<1)
+
+/*
+ * PMU features:
+ * - XENPMU_FEATURE_INTEL_BTS: Intel BTS support (ignored on AMD)
+ */
+#define XENPMU_FEATURE_INTEL_BTS  1
 
 /* Shared between hypervisor and PV domain */
 struct xen_pmu_data {
diff --git a/xen/include/public/xen.h b/xen/include/public/xen.h
index 3703c39..0dd3c97 100644
--- a/xen/include/public/xen.h
+++ b/xen/include/public/xen.h
@@ -101,6 +101,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_ulong_t);
 #define __HYPERVISOR_kexec_op             37
 #define __HYPERVISOR_tmem_op              38
 #define __HYPERVISOR_xc_reserved_op       39 /* reserved for XenClient */
+#define __HYPERVISOR_xenpmu_op            40
 
 /* Architecture-specific hypercall definitions. */
 #define __HYPERVISOR_arch_0               48
diff --git a/xen/include/xen/hypercall.h b/xen/include/xen/hypercall.h
index eda8a36..ef665db 100644
--- a/xen/include/xen/hypercall.h
+++ b/xen/include/xen/hypercall.h
@@ -14,6 +14,7 @@
 #include <public/event_channel.h>
 #include <public/tmem.h>
 #include <public/version.h>
+#include <public/pmu.h>
 #include <asm/hypercall.h>
 #include <xsm/xsm.h>
 
@@ -144,6 +145,9 @@ do_tmem_op(
 extern long
 do_xenoprof_op(int op, XEN_GUEST_HANDLE_PARAM(void) arg);
 
+extern long
+do_xenpmu_op(unsigned int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg);
+
 #ifdef CONFIG_COMPAT
 
 extern int
diff --git a/xen/include/xlat.lst b/xen/include/xlat.lst
index 94eedad..19ab5f7 100644
--- a/xen/include/xlat.lst
+++ b/xen/include/xlat.lst
@@ -97,6 +97,7 @@
 ?	xenpf_pcpuinfo			platform.h
 ?	xenpf_pcpu_version		platform.h
 ?	xenpf_resource_entry		platform.h
+?	pmu_params			pmu.h
 !	sched_poll			sched.h
 ?	sched_remote_shutdown		sched.h
 ?	sched_shutdown			sched.h
diff --git a/xen/include/xsm/dummy.h b/xen/include/xsm/dummy.h
index f20e89c..c637454 100644
--- a/xen/include/xsm/dummy.h
+++ b/xen/include/xsm/dummy.h
@@ -655,4 +655,19 @@ static XSM_INLINE int xsm_ioport_mapping(XSM_DEFAULT_ARG struct domain *d, uint3
     return xsm_default_action(action, current->domain, d);
 }
 
+static XSM_INLINE int xsm_pmu_op (XSM_DEFAULT_ARG struct domain *d, int op)
+{
+    XSM_ASSERT_ACTION(XSM_OTHER);
+    switch ( op )
+    {
+    case XENPMU_mode_set:
+    case XENPMU_mode_get:
+    case XENPMU_feature_set:
+    case XENPMU_feature_get:
+        return xsm_default_action(XSM_PRIV, d, current->domain);
+    default:
+        return -EPERM;
+    }
+}
+
 #endif /* CONFIG_X86 */
diff --git a/xen/include/xsm/xsm.h b/xen/include/xsm/xsm.h
index 4ce089f..90edbb1 100644
--- a/xen/include/xsm/xsm.h
+++ b/xen/include/xsm/xsm.h
@@ -173,6 +173,7 @@ struct xsm_operations {
     int (*unbind_pt_irq) (struct domain *d, struct xen_domctl_bind_pt_irq *bind);
     int (*ioport_permission) (struct domain *d, uint32_t s, uint32_t e, uint8_t allow);
     int (*ioport_mapping) (struct domain *d, uint32_t s, uint32_t e, uint8_t allow);
+    int (*pmu_op) (struct domain *d, unsigned int op);
 #endif
 };
 
@@ -665,6 +666,11 @@ static inline int xsm_ioport_mapping (xsm_default_t def, struct domain *d, uint3
     return xsm_ops->ioport_mapping(d, s, e, allow);
 }
 
+static inline int xsm_pmu_op (xsm_default_t def, struct domain *d, int op)
+{
+    return xsm_ops->pmu_op(d, op);
+}
+
 #endif /* CONFIG_X86 */
 
 #endif /* XSM_NO_WRAPPERS */
diff --git a/xen/xsm/dummy.c b/xen/xsm/dummy.c
index 8eb3050..94f1cf0 100644
--- a/xen/xsm/dummy.c
+++ b/xen/xsm/dummy.c
@@ -144,5 +144,6 @@ void xsm_fixup_ops (struct xsm_operations *ops)
     set_to_dummy_if_null(ops, unbind_pt_irq);
     set_to_dummy_if_null(ops, ioport_permission);
     set_to_dummy_if_null(ops, ioport_mapping);
+    set_to_dummy_if_null(ops, pmu_op);
 #endif
 }
diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
index c6431b5..982e879 100644
--- a/xen/xsm/flask/hooks.c
+++ b/xen/xsm/flask/hooks.c
@@ -1505,6 +1505,23 @@ static int flask_unbind_pt_irq (struct domain *d, struct xen_domctl_bind_pt_irq
 {
     return current_has_perm(d, SECCLASS_RESOURCE, RESOURCE__REMOVE);
 }
+
+static int flask_pmu_op (struct domain *d, unsigned int op)
+{
+    u32 dsid = domain_sid(d);
+
+    switch ( op )
+    {
+    case XENPMU_mode_set:
+    case XENPMU_mode_get:
+    case XENPMU_feature_set:
+    case XENPMU_feature_get:
+        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_XEN2,
+                            XEN2__PMU_CTRL, NULL);
+    default:
+        return -EPERM;
+    }
+}
 #endif /* CONFIG_X86 */
 
 long do_flask_op(XEN_GUEST_HANDLE_PARAM(xsm_op_t) u_flask_op);
@@ -1627,6 +1644,7 @@ static struct xsm_operations flask_ops = {
     .unbind_pt_irq = flask_unbind_pt_irq,
     .ioport_permission = flask_ioport_permission,
     .ioport_mapping = flask_ioport_mapping,
+    .pmu_op = flask_pmu_op,
 #endif
 };
 
diff --git a/xen/xsm/flask/policy/access_vectors b/xen/xsm/flask/policy/access_vectors
index 3a97577..626850d 100644
--- a/xen/xsm/flask/policy/access_vectors
+++ b/xen/xsm/flask/policy/access_vectors
@@ -86,6 +86,8 @@ class xen2
     psr_cmt_op
 # XENPF_get_symbol
     get_symbol
+# PMU control
+    pmu_ctrl
 }
 
 # Classes domain and domain2 consist of operations that a domain performs on
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v19 06/14] x86/VPMU: Initialize VPMUs with __initcall
  2015-03-17 14:53 [PATCH v19 00/14] x86/PMU: Xen PMU PV(H) support Boris Ostrovsky
                   ` (4 preceding siblings ...)
  2015-03-17 14:54 ` [PATCH v19 05/14] x86/VPMU: Interface for setting PMU mode and flags Boris Ostrovsky
@ 2015-03-17 14:54 ` Boris Ostrovsky
  2015-03-17 14:54 ` [PATCH v19 07/14] x86/VPMU: Initialize PMU for PV(H) guests Boris Ostrovsky
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 35+ messages in thread
From: Boris Ostrovsky @ 2015-03-17 14:54 UTC (permalink / raw)
  To: JBeulich, kevin.tian, suravee.suthikulpanit,
	Aravind.Gopalakrishnan, dietmar.hahn, dgdegra, andrew.cooper3
  Cc: boris.ostrovsky, tim, jun.nakajima, xen-devel

Move some VPMU initilization operations into __initcalls to avoid performing
same tests and calculations for each vcpu.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
---
 xen/arch/x86/hvm/svm/vpmu.c       | 106 ++++++++++++--------------
 xen/arch/x86/hvm/vmx/vpmu_core2.c | 151 +++++++++++++++++++-------------------
 xen/arch/x86/hvm/vpmu.c           |  32 ++++++++
 xen/include/asm-x86/hvm/vpmu.h    |   2 +
 4 files changed, 155 insertions(+), 136 deletions(-)

diff --git a/xen/arch/x86/hvm/svm/vpmu.c b/xen/arch/x86/hvm/svm/vpmu.c
index 481ea7b..b60ca40 100644
--- a/xen/arch/x86/hvm/svm/vpmu.c
+++ b/xen/arch/x86/hvm/svm/vpmu.c
@@ -356,54 +356,6 @@ static int amd_vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
     return 1;
 }
 
-static int amd_vpmu_initialise(struct vcpu *v)
-{
-    struct xen_pmu_amd_ctxt *ctxt;
-    struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    uint8_t family = current_cpu_data.x86;
-
-    if ( counters == NULL )
-    {
-         switch ( family )
-	 {
-	 case 0x15:
-	     num_counters = F15H_NUM_COUNTERS;
-	     counters = AMD_F15H_COUNTERS;
-	     ctrls = AMD_F15H_CTRLS;
-	     k7_counters_mirrored = 1;
-	     break;
-	 case 0x10:
-	 case 0x12:
-	 case 0x14:
-	 case 0x16:
-	 default:
-	     num_counters = F10H_NUM_COUNTERS;
-	     counters = AMD_F10H_COUNTERS;
-	     ctrls = AMD_F10H_CTRLS;
-	     k7_counters_mirrored = 0;
-	     break;
-	 }
-    }
-
-    ctxt = xzalloc_bytes(sizeof(*ctxt) +
-                         2 * sizeof(uint64_t) * num_counters);
-    if ( !ctxt )
-    {
-        gdprintk(XENLOG_WARNING, "Insufficient memory for PMU, "
-            " PMU feature is unavailable on domain %d vcpu %d.\n",
-            v->vcpu_id, v->domain->domain_id);
-        return -ENOMEM;
-    }
-
-    ctxt->counters = sizeof(*ctxt);
-    ctxt->ctrls = ctxt->counters + sizeof(uint64_t) * num_counters;
-
-    vpmu->context = ctxt;
-    vpmu->priv_context = NULL;
-    vpmu_set(vpmu, VPMU_CONTEXT_ALLOCATED);
-    return 0;
-}
-
 static void amd_vpmu_destroy(struct vcpu *v)
 {
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
@@ -474,30 +426,62 @@ struct arch_vpmu_ops amd_vpmu_ops = {
 
 int svm_vpmu_initialise(struct vcpu *v)
 {
+    struct xen_pmu_amd_ctxt *ctxt;
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    uint8_t family = current_cpu_data.x86;
-    int ret = 0;
 
-    /* vpmu enabled? */
     if ( vpmu_mode == XENPMU_MODE_OFF )
         return 0;
 
-    switch ( family )
+    if ( !counters )
+        return -EINVAL;
+
+    ctxt = xzalloc_bytes(sizeof(*ctxt) +
+                         2 * sizeof(uint64_t) * num_counters);
+    if ( !ctxt )
     {
+        printk(XENLOG_G_WARNING "Insufficient memory for PMU, "
+               " PMU feature is unavailable on domain %d vcpu %d.\n",
+               v->vcpu_id, v->domain->domain_id);
+        return -ENOMEM;
+    }
+
+    ctxt->counters = sizeof(*ctxt);
+    ctxt->ctrls = ctxt->counters + sizeof(uint64_t) * num_counters;
+
+    vpmu->context = ctxt;
+    vpmu->priv_context = NULL;
+
+    vpmu->arch_vpmu_ops = &amd_vpmu_ops;
+
+    vpmu_set(vpmu, VPMU_CONTEXT_ALLOCATED);
+    return 0;
+}
+
+int __init amd_vpmu_init(void)
+{
+    switch ( current_cpu_data.x86 )
+    {
+    case 0x15:
+        num_counters = F15H_NUM_COUNTERS;
+        counters = AMD_F15H_COUNTERS;
+        ctrls = AMD_F15H_CTRLS;
+        k7_counters_mirrored = 1;
+        break;
     case 0x10:
     case 0x12:
     case 0x14:
-    case 0x15:
     case 0x16:
-        ret = amd_vpmu_initialise(v);
-        if ( !ret )
-            vpmu->arch_vpmu_ops = &amd_vpmu_ops;
-        return ret;
+        num_counters = F10H_NUM_COUNTERS;
+        counters = AMD_F10H_COUNTERS;
+        ctrls = AMD_F10H_CTRLS;
+        k7_counters_mirrored = 0;
+        break;
+    default:
+        printk(XENLOG_WARNING "VPMU: Unsupported CPU family %#x\n",
+               current_cpu_data.x86);
+        return -EINVAL;
     }
 
-    printk("VPMU: Initialization failed. "
-           "AMD processor family %d has not "
-           "been supported\n", family);
-    return -EINVAL;
+    return 0;
 }
 
diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c
index 6280644..17d1b04 100644
--- a/xen/arch/x86/hvm/vmx/vpmu_core2.c
+++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c
@@ -708,62 +708,6 @@ static int core2_vpmu_do_interrupt(struct cpu_user_regs *regs)
     return 1;
 }
 
-static int core2_vpmu_initialise(struct vcpu *v)
-{
-    struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    u64 msr_content;
-    static bool_t ds_warned;
-
-    if ( !(vpmu_features & XENPMU_FEATURE_INTEL_BTS) )
-        goto func_out;
-    /* Check the 'Debug Store' feature in the CPUID.EAX[1]:EDX[21] */
-    while ( boot_cpu_has(X86_FEATURE_DS) )
-    {
-        if ( !boot_cpu_has(X86_FEATURE_DTES64) )
-        {
-            if ( !ds_warned )
-                printk(XENLOG_G_WARNING "CPU doesn't support 64-bit DS Area"
-                       " - Debug Store disabled for guests\n");
-            break;
-        }
-        vpmu_set(vpmu, VPMU_CPU_HAS_DS);
-        rdmsrl(MSR_IA32_MISC_ENABLE, msr_content);
-        if ( msr_content & MSR_IA32_MISC_ENABLE_BTS_UNAVAIL )
-        {
-            /* If BTS_UNAVAIL is set reset the DS feature. */
-            vpmu_reset(vpmu, VPMU_CPU_HAS_DS);
-            if ( !ds_warned )
-                printk(XENLOG_G_WARNING "CPU has set BTS_UNAVAIL"
-                       " - Debug Store disabled for guests\n");
-            break;
-        }
-
-        vpmu_set(vpmu, VPMU_CPU_HAS_BTS);
-        if ( !ds_warned )
-        {
-            if ( !boot_cpu_has(X86_FEATURE_DSCPL) )
-                printk(XENLOG_G_INFO
-                       "vpmu: CPU doesn't support CPL-Qualified BTS\n");
-            printk("******************************************************\n");
-            printk("** WARNING: Emulation of BTS Feature is switched on **\n");
-            printk("** Using this processor feature in a virtualized    **\n");
-            printk("** environment is not 100%% safe.                    **\n");
-            printk("** Setting the DS buffer address with wrong values  **\n");
-            printk("** may lead to hypervisor hangs or crashes.         **\n");
-            printk("** It is NOT recommended for production use!        **\n");
-            printk("******************************************************\n");
-        }
-        break;
-    }
-    ds_warned = 1;
- func_out:
-
-    arch_pmc_cnt = core2_get_arch_pmc_count();
-    fixed_pmc_cnt = core2_get_fixed_pmc_count();
-    check_pmc_quirk();
-    return 0;
-}
-
 static void core2_vpmu_destroy(struct vcpu *v)
 {
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
@@ -829,23 +773,77 @@ struct arch_vpmu_ops core2_no_vpmu_ops = {
 int vmx_vpmu_initialise(struct vcpu *v)
 {
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    uint8_t family = current_cpu_data.x86;
-    uint8_t cpu_model = current_cpu_data.x86_model;
-    int ret = 0;
+    u64 msr_content;
+    static bool_t ds_warned;
 
     vpmu->arch_vpmu_ops = &core2_no_vpmu_ops;
     if ( vpmu_mode == XENPMU_MODE_OFF )
         return 0;
 
-    if ( family == 6 )
-    {
-        u64 caps;
+    if ( (arch_pmc_cnt + fixed_pmc_cnt) == 0 )
+        return -EINVAL;
 
-        rdmsrl(MSR_IA32_PERF_CAPABILITIES, caps);
-        full_width_write = (caps >> 13) & 1;
+    if ( !(vpmu_features & XENPMU_FEATURE_INTEL_BTS) )
+        goto func_out;
+    /* Check the 'Debug Store' feature in the CPUID.EAX[1]:EDX[21] */
+    while ( boot_cpu_has(X86_FEATURE_DS) )
+    {
+        if ( !boot_cpu_has(X86_FEATURE_DTES64) )
+        {
+            if ( !ds_warned )
+                printk(XENLOG_G_WARNING "CPU doesn't support 64-bit DS Area"
+                       " - Debug Store disabled for guests\n");
+            break;
+        }
+        vpmu_set(vpmu, VPMU_CPU_HAS_DS);
+        rdmsrl(MSR_IA32_MISC_ENABLE, msr_content);
+        if ( msr_content & MSR_IA32_MISC_ENABLE_BTS_UNAVAIL )
+        {
+            /* If BTS_UNAVAIL is set reset the DS feature. */
+            vpmu_reset(vpmu, VPMU_CPU_HAS_DS);
+            if ( !ds_warned )
+                printk(XENLOG_G_WARNING "CPU has set BTS_UNAVAIL"
+                       " - Debug Store disabled for guests\n");
+            break;
+        }
 
-        switch ( cpu_model )
+        vpmu_set(vpmu, VPMU_CPU_HAS_BTS);
+        if ( !ds_warned )
         {
+            if ( !boot_cpu_has(X86_FEATURE_DSCPL) )
+                printk(XENLOG_G_INFO
+                       "vpmu: CPU doesn't support CPL-Qualified BTS\n");
+            printk("******************************************************\n");
+            printk("** WARNING: Emulation of BTS Feature is switched on **\n");
+            printk("** Using this processor feature in a virtualized    **\n");
+            printk("** environment is not 100%% safe.                    **\n");
+            printk("** Setting the DS buffer address with wrong values  **\n");
+            printk("** may lead to hypervisor hangs or crashes.         **\n");
+            printk("** It is NOT recommended for production use!        **\n");
+            printk("******************************************************\n");
+        }
+        break;
+    }
+    ds_warned = 1;
+ func_out:
+
+    vpmu->arch_vpmu_ops = &core2_vpmu_ops;
+
+    return 0;
+}
+
+int __init core2_vpmu_init(void)
+{
+    u64 caps;
+
+    if ( current_cpu_data.x86 != 6 )
+    {
+        printk(XENLOG_WARNING "VPMU: only family 6 is supported\n");
+        return -EINVAL;
+    }
+
+    switch ( current_cpu_data.x86_model )
+    {
         /* Core2: */
         case 0x0f: /* original 65 nm celeron/pentium/core2/xeon, "Merom"/"Conroe" */
         case 0x16: /* single-core 65 nm celeron/core2solo "Merom-L"/"Conroe-L" */
@@ -877,16 +875,19 @@ int vmx_vpmu_initialise(struct vcpu *v)
         /* future: */
         case 0x3d:
         case 0x4e:
-            ret = core2_vpmu_initialise(v);
-            if ( !ret )
-                vpmu->arch_vpmu_ops = &core2_vpmu_ops;
-            return ret;
-        }
+            break;
+    default:
+        printk(XENLOG_WARNING "VPMU: Unsupported CPU model %#x\n",
+               current_cpu_data.x86_model);
+        return -EINVAL;
     }
 
-    printk("VPMU: Initialization failed. "
-           "Intel processor family %d model %d has not "
-           "been supported\n", family, cpu_model);
-    return -EINVAL;
-}
+    arch_pmc_cnt = core2_get_arch_pmc_count();
+    fixed_pmc_cnt = core2_get_fixed_pmc_count();
+    rdmsrl(MSR_IA32_PERF_CAPABILITIES, caps);
+    full_width_write = (caps >> 13) & 1;
 
+    check_pmc_quirk();
+
+    return 0;
+}
diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c
index 164accf..4b1a06a 100644
--- a/xen/arch/x86/hvm/vpmu.c
+++ b/xen/arch/x86/hvm/vpmu.c
@@ -451,14 +451,46 @@ long do_xenpmu_op(unsigned int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg)
 
 static int __init vpmu_init(void)
 {
+    int vendor = current_cpu_data.x86_vendor;
+
+    if ( !opt_vpmu_enabled )
+    {
+        printk(XENLOG_INFO "VPMU: disabled\n");
+        return 0;
+    }
+
     /* NMI watchdog uses LVTPC and HW counter */
     if ( opt_watchdog && opt_vpmu_enabled )
     {
         printk(XENLOG_WARNING "NMI watchdog is enabled. Turning VPMU off.\n");
         opt_vpmu_enabled = 0;
         vpmu_mode = XENPMU_MODE_OFF;
+        return 0;
+    }
+
+    switch ( vendor )
+    {
+    case X86_VENDOR_AMD:
+        if ( amd_vpmu_init() )
+           vpmu_mode = XENPMU_MODE_OFF;
+        break;
+    case X86_VENDOR_INTEL:
+        if ( core2_vpmu_init() )
+           vpmu_mode = XENPMU_MODE_OFF;
+        break;
+    default:
+        printk(XENLOG_WARNING "VPMU: Unknown CPU vendor: %d. "
+               "Turning VPMU off.\n", vendor);
+        vpmu_mode = XENPMU_MODE_OFF;
+        break;
     }
 
+    if ( vpmu_mode != XENPMU_MODE_OFF )
+        printk(XENLOG_INFO "VPMU: version " __stringify(XENPMU_VER_MAJ) "."
+               __stringify(XENPMU_VER_MIN) "\n");
+    else
+        opt_vpmu_enabled = 0;
+
     return 0;
 }
 __initcall(vpmu_init);
diff --git a/xen/include/asm-x86/hvm/vpmu.h b/xen/include/asm-x86/hvm/vpmu.h
index 88ffc19..96f7666 100644
--- a/xen/include/asm-x86/hvm/vpmu.h
+++ b/xen/include/asm-x86/hvm/vpmu.h
@@ -52,7 +52,9 @@ struct arch_vpmu_ops {
     void (*arch_vpmu_dump)(const struct vcpu *);
 };
 
+int core2_vpmu_init(void);
 int vmx_vpmu_initialise(struct vcpu *);
+int amd_vpmu_init(void);
 int svm_vpmu_initialise(struct vcpu *);
 
 struct vpmu_struct {
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v19 07/14] x86/VPMU: Initialize PMU for PV(H) guests
  2015-03-17 14:53 [PATCH v19 00/14] x86/PMU: Xen PMU PV(H) support Boris Ostrovsky
                   ` (5 preceding siblings ...)
  2015-03-17 14:54 ` [PATCH v19 06/14] x86/VPMU: Initialize VPMUs with __initcall Boris Ostrovsky
@ 2015-03-17 14:54 ` Boris Ostrovsky
  2015-03-24 14:08   ` Jan Beulich
  2015-03-17 14:54 ` [PATCH v19 08/14] x86/VPMU: Save VPMU state for PV guests during context switch Boris Ostrovsky
                   ` (7 subsequent siblings)
  14 siblings, 1 reply; 35+ messages in thread
From: Boris Ostrovsky @ 2015-03-17 14:54 UTC (permalink / raw)
  To: JBeulich, kevin.tian, suravee.suthikulpanit,
	Aravind.Gopalakrishnan, dietmar.hahn, dgdegra, andrew.cooper3
  Cc: boris.ostrovsky, tim, jun.nakajima, xen-devel

Code for initializing/tearing down PMU for PV guests

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
---

Changes in v19:
* Keep track of PV(H) VPMU count for non-dom0 VPMUs
* Move vpmu.xenpmu_data test in pvpmu_init() under lock
* Return better error codes in pvpmu_init()

 tools/flask/policy/policy/modules/xen/xen.te |   4 +
 xen/arch/x86/domain.c                        |   2 +
 xen/arch/x86/hvm/hvm.c                       |   1 +
 xen/arch/x86/hvm/svm/svm.c                   |   4 +-
 xen/arch/x86/hvm/svm/vpmu.c                  |  44 ++++++----
 xen/arch/x86/hvm/vmx/vmx.c                   |   4 +-
 xen/arch/x86/hvm/vmx/vpmu_core2.c            |  79 ++++++++++++-----
 xen/arch/x86/hvm/vpmu.c                      | 121 +++++++++++++++++++++++++--
 xen/common/event_channel.c                   |   1 +
 xen/include/asm-x86/hvm/vpmu.h               |   2 +
 xen/include/public/pmu.h                     |   2 +
 xen/include/public/xen.h                     |   1 +
 xen/include/xsm/dummy.h                      |   3 +
 xen/xsm/flask/hooks.c                        |   4 +
 xen/xsm/flask/policy/access_vectors          |   2 +
 15 files changed, 226 insertions(+), 48 deletions(-)

diff --git a/tools/flask/policy/policy/modules/xen/xen.te b/tools/flask/policy/policy/modules/xen/xen.te
index 870ff81..73bbe7b 100644
--- a/tools/flask/policy/policy/modules/xen/xen.te
+++ b/tools/flask/policy/policy/modules/xen/xen.te
@@ -120,6 +120,10 @@ domain_comms(dom0_t, dom0_t)
 # Allow all domains to use (unprivileged parts of) the tmem hypercall
 allow domain_type xen_t:xen tmem_op;
 
+# Allow all domains to use PMU (but not to change its settings --- that's what
+# pmu_ctrl is for)
+allow domain_type xen_t:xen2 pmu_use;
+
 ###############################################################################
 #
 # Domain creation
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 60d9a80..f19087e 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -437,6 +437,8 @@ int vcpu_initialise(struct vcpu *v)
         vmce_init_vcpu(v);
     }
 
+    spin_lock_init(&v->arch.vpmu.vpmu_lock);
+
     if ( has_hvm_container_domain(d) )
     {
         rc = hvm_vcpu_initialise(v);
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 4734d71..07ad171 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -4915,6 +4915,7 @@ static hvm_hypercall_t *const pvh_hypercall64_table[NR_hypercalls] = {
     HYPERCALL(hvm_op),
     HYPERCALL(sysctl),
     HYPERCALL(domctl),
+    HYPERCALL(xenpmu_op),
     [ __HYPERVISOR_arch_1 ] = (hvm_hypercall_t *)paging_domctl_continuation
 };
 
diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index b6e77cd..e523d12 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -1166,7 +1166,9 @@ static int svm_vcpu_initialise(struct vcpu *v)
         return rc;
     }
 
-    vpmu_initialise(v);
+    /* PVH's VPMU is initialized via hypercall */
+    if ( is_hvm_vcpu(v) )
+        vpmu_initialise(v);
 
     svm_guest_osvw_init(v);
 
diff --git a/xen/arch/x86/hvm/svm/vpmu.c b/xen/arch/x86/hvm/svm/vpmu.c
index b60ca40..58a0dc4 100644
--- a/xen/arch/x86/hvm/svm/vpmu.c
+++ b/xen/arch/x86/hvm/svm/vpmu.c
@@ -360,17 +360,19 @@ static void amd_vpmu_destroy(struct vcpu *v)
 {
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
 
-    if ( has_hvm_container_vcpu(v) && is_msr_bitmap_on(vpmu) )
-        amd_vpmu_unset_msr_bitmap(v);
+    if ( has_hvm_container_vcpu(v) )
+    {
+        if ( is_msr_bitmap_on(vpmu) )
+            amd_vpmu_unset_msr_bitmap(v);
 
-    xfree(vpmu->context);
-    vpmu_reset(vpmu, VPMU_CONTEXT_ALLOCATED);
+        if ( is_hvm_vcpu(v) )
+            xfree(vpmu->context);
 
-    if ( vpmu_is_set(vpmu, VPMU_RUNNING) )
-    {
-        vpmu_reset(vpmu, VPMU_RUNNING);
         release_pmu_ownship(PMU_OWNER_HVM);
     }
+
+    vpmu->context = NULL;
+    vpmu_clear(vpmu);
 }
 
 /* VPMU part of the 'q' keyhandler */
@@ -435,15 +437,19 @@ int svm_vpmu_initialise(struct vcpu *v)
     if ( !counters )
         return -EINVAL;
 
-    ctxt = xzalloc_bytes(sizeof(*ctxt) +
-                         2 * sizeof(uint64_t) * num_counters);
-    if ( !ctxt )
+    if ( is_hvm_vcpu(v) )
     {
-        printk(XENLOG_G_WARNING "Insufficient memory for PMU, "
-               " PMU feature is unavailable on domain %d vcpu %d.\n",
-               v->vcpu_id, v->domain->domain_id);
-        return -ENOMEM;
+        ctxt = xzalloc_bytes(sizeof(*ctxt) +
+                             2 * sizeof(uint64_t) * num_counters);
+        if ( !ctxt )
+        {
+            printk(XENLOG_G_WARNING "%pv: Insufficient memory for PMU, "
+                   " PMU feature is unavailable\n", v);
+            return -ENOMEM;
+        }
     }
+    else
+        ctxt = &v->arch.vpmu.xenpmu_data->pmu.c.amd;
 
     ctxt->counters = sizeof(*ctxt);
     ctxt->ctrls = ctxt->counters + sizeof(uint64_t) * num_counters;
@@ -482,6 +488,16 @@ int __init amd_vpmu_init(void)
         return -EINVAL;
     }
 
+    if ( sizeof(struct xen_pmu_data) +
+         2 * sizeof(uint64_t) * num_counters > PAGE_SIZE )
+    {
+        printk(XENLOG_WARNING
+               "VPMU: Register bank does not fit into VPMU shared page\n");
+        counters = ctrls = NULL;
+        num_counters = 0;
+        return -ENOSPC;
+    }
+
     return 0;
 }
 
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index e1c55ce..83b740a 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -116,7 +116,9 @@ static int vmx_vcpu_initialise(struct vcpu *v)
         return rc;
     }
 
-    vpmu_initialise(v);
+    /* PVH's VPMU is initialized via hypercall */
+    if ( is_hvm_vcpu(v) )
+        vpmu_initialise(v);
 
     vmx_install_vlapic_mapping(v);
 
diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c
index 17d1b04..39c3626 100644
--- a/xen/arch/x86/hvm/vmx/vpmu_core2.c
+++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c
@@ -362,24 +362,34 @@ static int core2_vpmu_alloc_resource(struct vcpu *v)
     struct xen_pmu_intel_ctxt *core2_vpmu_cxt = NULL;
     uint64_t *p = NULL;
 
-    if ( !acquire_pmu_ownership(PMU_OWNER_HVM) )
-        return 0;
-
-    wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0);
-    if ( vmx_add_host_load_msr(MSR_CORE_PERF_GLOBAL_CTRL) )
+    p = xzalloc(uint64_t);
+    if ( !p )
         goto out_err;
 
-    if ( vmx_add_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL) )
-        goto out_err;
-    vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, 0);
+    if ( has_hvm_container_vcpu(v) )
+    {
+        if ( is_hvm_vcpu(v) && !acquire_pmu_ownership(PMU_OWNER_HVM) )
+            goto out_err;
+
+        wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0);
+        if ( vmx_add_host_load_msr(MSR_CORE_PERF_GLOBAL_CTRL) )
+            goto out_err_hvm;
+        if ( vmx_add_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL) )
+            goto out_err_hvm;
+        vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, 0);
+    }
 
-    core2_vpmu_cxt = xzalloc_bytes(sizeof(*core2_vpmu_cxt) +
-                                   sizeof(uint64_t) * fixed_pmc_cnt +
-                                   sizeof(struct xen_pmu_cntr_pair) *
-                                   arch_pmc_cnt);
-    p = xzalloc(uint64_t);
-    if ( !core2_vpmu_cxt || !p )
-        goto out_err;
+    if ( is_hvm_vcpu(v) )
+    {
+        core2_vpmu_cxt = xzalloc_bytes(sizeof(*core2_vpmu_cxt) +
+                                       sizeof(uint64_t) * fixed_pmc_cnt +
+                                       sizeof(struct xen_pmu_cntr_pair) *
+                                       arch_pmc_cnt);
+        if ( !core2_vpmu_cxt )
+            goto out_err_hvm;
+    }
+    else
+        core2_vpmu_cxt = &v->arch.vpmu.xenpmu_data->pmu.c.intel;
 
     core2_vpmu_cxt->fixed_counters = sizeof(*core2_vpmu_cxt);
     core2_vpmu_cxt->arch_counters = core2_vpmu_cxt->fixed_counters +
@@ -392,10 +402,12 @@ static int core2_vpmu_alloc_resource(struct vcpu *v)
 
     return 1;
 
-out_err:
-    release_pmu_ownship(PMU_OWNER_HVM);
-
+ out_err_hvm:
     xfree(core2_vpmu_cxt);
+    if ( is_hvm_vcpu(v) )
+        release_pmu_ownship(PMU_OWNER_HVM);
+
+ out_err:
     xfree(p);
 
     printk("Failed to allocate VPMU resources for domain %u vcpu %u\n",
@@ -712,12 +724,20 @@ static void core2_vpmu_destroy(struct vcpu *v)
 {
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
 
-    xfree(vpmu->context);
+    if ( has_hvm_container_vcpu(v) )
+    {
+        if ( cpu_has_vmx_msr_bitmap )
+            core2_vpmu_unset_msr_bitmap(v->arch.hvm_vmx.msr_bitmap);
+
+        if ( is_hvm_vcpu(v) )
+            xfree(vpmu->context);
+
+        release_pmu_ownship(PMU_OWNER_HVM);
+    }
+
     xfree(vpmu->priv_context);
-    if ( has_hvm_container_vcpu(v) && cpu_has_vmx_msr_bitmap )
-        core2_vpmu_unset_msr_bitmap(v->arch.hvm_vmx.msr_bitmap);
-    release_pmu_ownship(PMU_OWNER_HVM);
-    vpmu_reset(vpmu, VPMU_CONTEXT_ALLOCATED);
+    vpmu->context = NULL;
+    vpmu_clear(vpmu);
 }
 
 struct arch_vpmu_ops core2_vpmu_ops = {
@@ -827,6 +847,10 @@ int vmx_vpmu_initialise(struct vcpu *v)
     ds_warned = 1;
  func_out:
 
+    /* PV domains can allocate resources immediately */
+    if ( is_pv_vcpu(v) && !core2_vpmu_alloc_resource(v) )
+        return -EIO;
+
     vpmu->arch_vpmu_ops = &core2_vpmu_ops;
 
     return 0;
@@ -889,5 +913,14 @@ int __init core2_vpmu_init(void)
 
     check_pmc_quirk();
 
+    if ( sizeof(struct xen_pmu_data) + sizeof(uint64_t) * fixed_pmc_cnt +
+         sizeof(struct xen_pmu_cntr_pair) * arch_pmc_cnt > PAGE_SIZE )
+    {
+        printk(XENLOG_WARNING
+               "VPMU: Register bank does not fit into VPMU share page\n");
+        arch_pmc_cnt = fixed_pmc_cnt = 0;
+        return -ENOSPC;
+    }
+
     return 0;
 }
diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c
index 4b1a06a..ea63b6d 100644
--- a/xen/arch/x86/hvm/vpmu.c
+++ b/xen/arch/x86/hvm/vpmu.c
@@ -27,6 +27,7 @@
 #include <asm/types.h>
 #include <asm/msr.h>
 #include <asm/nmi.h>
+#include <asm/p2m.h>
 #include <asm/hvm/support.h>
 #include <asm/hvm/vmx/vmx.h>
 #include <asm/hvm/vmx/vmcs.h>
@@ -263,14 +264,14 @@ void vpmu_initialise(struct vcpu *v)
     BUILD_BUG_ON(sizeof(struct xen_pmu_intel_ctxt) > XENPMU_CTXT_PAD_SZ);
     BUILD_BUG_ON(sizeof(struct xen_pmu_amd_ctxt) > XENPMU_CTXT_PAD_SZ);
 
-    if ( is_pvh_vcpu(v) )
-        return;
-
     ASSERT(!vpmu->flags && !vpmu->context);
 
-    spin_lock(&vpmu_lock);
-    vpmu_count++; /* Prevent vpmu_mode from changing until we are done */
-    spin_unlock(&vpmu_lock);
+    if ( v->domain != hardware_domain )
+    {
+        spin_lock(&vpmu_lock);
+        vpmu_count++; /* Prevent vpmu_mode from changing until we are done */
+        spin_unlock(&vpmu_lock);
+    }
 
     switch ( vendor )
     {
@@ -297,7 +298,8 @@ void vpmu_initialise(struct vcpu *v)
         printk(XENLOG_G_WARNING "VPMU: Initialization failed for %pv\n", v);
 
     /* Intel needs to initialize VPMU ops even if VPMU is not in use */
-    if ( ret || (vpmu_mode == XENPMU_MODE_OFF) )
+    if ( (v->domain != hardware_domain) &&
+         (ret || (vpmu_mode == XENPMU_MODE_OFF)) )
     {
         spin_lock(&vpmu_lock);
         vpmu_count--;
@@ -330,13 +332,104 @@ void vpmu_destroy(struct vcpu *v)
                          vpmu_clear_last, v, 1);
 
     if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->arch_vpmu_destroy )
-        vpmu->arch_vpmu_ops->arch_vpmu_destroy(v);
+    {
+        /* Unload VPMU first. This will stop counters */
+        on_selected_cpus(cpumask_of(vcpu_vpmu(v)->last_pcpu),
+                         vpmu_save_force, v, 1);
+         vpmu->arch_vpmu_ops->arch_vpmu_destroy(v);
+    }
 
     spin_lock(&vpmu_lock);
-    vpmu_count--;
+    if ( v->domain != hardware_domain )
+        vpmu_count--;
     spin_unlock(&vpmu_lock);
 }
 
+static int pvpmu_init(struct domain *d, xen_pmu_params_t *params)
+{
+    struct vcpu *v;
+    struct vpmu_struct *vpmu;
+    struct page_info *page;
+    uint64_t gfn = params->val;
+
+    if ( vpmu_mode == XENPMU_MODE_OFF )
+        return -EINVAL;
+
+    if ( (params->vcpu >= d->max_vcpus) || (d->vcpu == NULL) ||
+         (d->vcpu[params->vcpu] == NULL) )
+        return -EINVAL;
+
+    page = get_page_from_gfn(d, gfn, NULL, P2M_ALLOC);
+    if ( !page )
+        return -EINVAL;
+
+    if ( !get_page_type(page, PGT_writable_page) )
+    {
+        put_page(page);
+        return -EINVAL;
+    }
+
+    v = d->vcpu[params->vcpu];
+    vpmu = vcpu_vpmu(v);
+
+    spin_lock(&vpmu->vpmu_lock);
+
+    if ( v->arch.vpmu.xenpmu_data )
+    {
+        put_page_and_type(page);
+        spin_unlock(&vpmu->vpmu_lock);
+        return -EEXIST;
+    }
+
+    v->arch.vpmu.xenpmu_data = __map_domain_page_global(page);
+    if ( !v->arch.vpmu.xenpmu_data )
+    {
+        put_page_and_type(page);
+        spin_unlock(&vpmu->vpmu_lock);
+        return -ENOMEM;
+    }
+
+    vpmu_initialise(v);
+
+    spin_unlock(&vpmu->vpmu_lock);
+
+    return 0;
+}
+
+static void pvpmu_finish(struct domain *d, xen_pmu_params_t *params)
+{
+    struct vcpu *v;
+    struct vpmu_struct *vpmu;
+    uint64_t mfn;
+
+    if ( (params->vcpu >= d->max_vcpus) || (d->vcpu == NULL) ||
+         (d->vcpu[params->vcpu] == NULL) )
+        return;
+
+    v = d->vcpu[params->vcpu];
+    if ( v != current )
+        vcpu_pause(v);
+
+    vpmu = vcpu_vpmu(v);
+    spin_lock(&vpmu->vpmu_lock);
+
+    vpmu_destroy(v);
+
+    if ( v->arch.vpmu.xenpmu_data )
+    {
+        mfn = domain_page_map_to_mfn(v->arch.vpmu.xenpmu_data);
+        ASSERT(mfn != 0);
+        unmap_domain_page_global(v->arch.vpmu.xenpmu_data);
+        put_page_and_type(mfn_to_page(mfn));
+        v->arch.vpmu.xenpmu_data = NULL;
+    }
+
+    spin_unlock(&vpmu->vpmu_lock);
+
+    if ( v != current )
+        vcpu_unpause(v);
+}
+
 /* Dump some vpmu informations on console. Used in keyhandler dump_domains(). */
 void vpmu_dump(struct vcpu *v)
 {
@@ -363,6 +456,8 @@ long do_xenpmu_op(unsigned int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg)
     {
     case XENPMU_mode_set:
     case XENPMU_feature_set:
+    case XENPMU_init:
+    case XENPMU_finish:
         if ( copy_from_guest(&pmu_params, arg, 1) )
             return -EFAULT;
 
@@ -442,6 +537,14 @@ long do_xenpmu_op(unsigned int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg)
 
         break;
 
+    case XENPMU_init:
+        ret = pvpmu_init(current->domain, &pmu_params);
+        break;
+
+    case XENPMU_finish:
+        pvpmu_finish(current->domain, &pmu_params);
+        break;
+
     default:
         ret = -EINVAL;
     }
diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c
index fae242d..310f590 100644
--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -106,6 +106,7 @@ static int virq_is_global(uint32_t virq)
     case VIRQ_TIMER:
     case VIRQ_DEBUG:
     case VIRQ_XENOPROF:
+    case VIRQ_XENPMU:
         rc = 0;
         break;
     case VIRQ_ARCH_0 ... VIRQ_ARCH_7:
diff --git a/xen/include/asm-x86/hvm/vpmu.h b/xen/include/asm-x86/hvm/vpmu.h
index 96f7666..642a4b7 100644
--- a/xen/include/asm-x86/hvm/vpmu.h
+++ b/xen/include/asm-x86/hvm/vpmu.h
@@ -64,6 +64,8 @@ struct vpmu_struct {
     void *context;      /* May be shared with PV guest */
     void *priv_context; /* hypervisor-only */
     struct arch_vpmu_ops *arch_vpmu_ops;
+    struct xen_pmu_data *xenpmu_data;
+    spinlock_t vpmu_lock;
 };
 
 /* VPMU states */
diff --git a/xen/include/public/pmu.h b/xen/include/public/pmu.h
index 66cc494..afb4ca1 100644
--- a/xen/include/public/pmu.h
+++ b/xen/include/public/pmu.h
@@ -25,6 +25,8 @@
 #define XENPMU_mode_set        1
 #define XENPMU_feature_get     2
 #define XENPMU_feature_set     3
+#define XENPMU_init            4
+#define XENPMU_finish          5
 /* ` } */
 
 /* Parameters structure for HYPERVISOR_xenpmu_op call */
diff --git a/xen/include/public/xen.h b/xen/include/public/xen.h
index 0dd3c97..a6b26fe 100644
--- a/xen/include/public/xen.h
+++ b/xen/include/public/xen.h
@@ -161,6 +161,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_ulong_t);
 #define VIRQ_MEM_EVENT  10 /* G. (DOM0) A memory event has occured           */
 #define VIRQ_XC_RESERVED 11 /* G. Reserved for XenClient                     */
 #define VIRQ_ENOMEM     12 /* G. (DOM0) Low on heap memory       */
+#define VIRQ_XENPMU     13 /* V.  PMC interrupt                              */
 
 /* Architecture-specific VIRQ definitions. */
 #define VIRQ_ARCH_0    16
diff --git a/xen/include/xsm/dummy.h b/xen/include/xsm/dummy.h
index c637454..ae47135 100644
--- a/xen/include/xsm/dummy.h
+++ b/xen/include/xsm/dummy.h
@@ -665,6 +665,9 @@ static XSM_INLINE int xsm_pmu_op (XSM_DEFAULT_ARG struct domain *d, int op)
     case XENPMU_feature_set:
     case XENPMU_feature_get:
         return xsm_default_action(XSM_PRIV, d, current->domain);
+    case XENPMU_init:
+    case XENPMU_finish: 
+        return xsm_default_action(XSM_HOOK, d, current->domain);
     default:
         return -EPERM;
     }
diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
index 982e879..5011cb9 100644
--- a/xen/xsm/flask/hooks.c
+++ b/xen/xsm/flask/hooks.c
@@ -1518,6 +1518,10 @@ static int flask_pmu_op (struct domain *d, unsigned int op)
     case XENPMU_feature_get:
         return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_XEN2,
                             XEN2__PMU_CTRL, NULL);
+    case XENPMU_init:
+    case XENPMU_finish:
+        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_XEN2,
+                            XEN2__PMU_USE, NULL);
     default:
         return -EPERM;
     }
diff --git a/xen/xsm/flask/policy/access_vectors b/xen/xsm/flask/policy/access_vectors
index 626850d..ef5b867 100644
--- a/xen/xsm/flask/policy/access_vectors
+++ b/xen/xsm/flask/policy/access_vectors
@@ -88,6 +88,8 @@ class xen2
     get_symbol
 # PMU control
     pmu_ctrl
+# PMU use (domains, including unprivileged ones, will be using this operation)
+    pmu_use
 }
 
 # Classes domain and domain2 consist of operations that a domain performs on
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v19 08/14] x86/VPMU: Save VPMU state for PV guests during context switch
  2015-03-17 14:53 [PATCH v19 00/14] x86/PMU: Xen PMU PV(H) support Boris Ostrovsky
                   ` (6 preceding siblings ...)
  2015-03-17 14:54 ` [PATCH v19 07/14] x86/VPMU: Initialize PMU for PV(H) guests Boris Ostrovsky
@ 2015-03-17 14:54 ` Boris Ostrovsky
  2015-03-24 14:15   ` Jan Beulich
  2015-03-17 14:54 ` [PATCH v19 09/14] x86/VPMU: When handling MSR accesses, leave fault injection to callers Boris Ostrovsky
                   ` (6 subsequent siblings)
  14 siblings, 1 reply; 35+ messages in thread
From: Boris Ostrovsky @ 2015-03-17 14:54 UTC (permalink / raw)
  To: JBeulich, kevin.tian, suravee.suthikulpanit,
	Aravind.Gopalakrishnan, dietmar.hahn, dgdegra, andrew.cooper3
  Cc: boris.ostrovsky, tim, jun.nakajima, xen-devel

Save VPMU state during context switch for both HVM and PV(H) guests.

A subsequent patch ("x86/VPMU: NMI-based VPMU support") will make it possible
for vpmu_switch_to() to call vmx_vmcs_try_enter()->vcpu_pause() which needs
is_running to be correctly set/cleared. To prepare for that, call context_saved()
before vpmu_switch_to() is executed. (Note that while this change could have
been dalayed until that later patch, the changes are harmless to existing code
and so we do it here)

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
Changes in v19:
* Adjusted for new vpmu_switch_to/from interface

 xen/arch/x86/domain.c | 22 ++++++++++------------
 1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index f19087e..c7f8210 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1533,17 +1533,14 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
     }
 
     if ( prev != next )
-        _update_runstate_area(prev);
-
-    if ( is_hvm_vcpu(prev) )
     {
-        if (prev != next)
-            vpmu_switch_from(prev);
-
-        if ( !list_empty(&prev->arch.hvm_vcpu.tm_list) )
-            pt_save_timer(prev);
+        _update_runstate_area(prev);
+        vpmu_switch_from(prev);
     }
 
+    if ( is_hvm_vcpu(prev) && !list_empty(&prev->arch.hvm_vcpu.tm_list) )
+        pt_save_timer(prev);
+
     local_irq_disable();
 
     set_current(next);
@@ -1581,15 +1578,16 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
                            !is_hardware_domain(next->domain));
     }
 
-    if (is_hvm_vcpu(next) && (prev != next) )
-        /* Must be done with interrupts enabled */
-        vpmu_switch_to(next);
-
     context_saved(prev);
 
     if ( prev != next )
+    {
         _update_runstate_area(next);
 
+        /* Must be done with interrupts enabled */
+        vpmu_switch_to(next);
+    }
+
     /* Ensure that the vcpu has an up-to-date time base. */
     update_vcpu_system_time(next);
 
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v19 09/14] x86/VPMU: When handling MSR accesses, leave fault injection to callers
  2015-03-17 14:53 [PATCH v19 00/14] x86/PMU: Xen PMU PV(H) support Boris Ostrovsky
                   ` (7 preceding siblings ...)
  2015-03-17 14:54 ` [PATCH v19 08/14] x86/VPMU: Save VPMU state for PV guests during context switch Boris Ostrovsky
@ 2015-03-17 14:54 ` Boris Ostrovsky
  2015-03-17 14:54 ` [PATCH v19 10/14] x86/VPMU: Add support for PMU register handling on PV guests Boris Ostrovsky
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 35+ messages in thread
From: Boris Ostrovsky @ 2015-03-17 14:54 UTC (permalink / raw)
  To: JBeulich, kevin.tian, suravee.suthikulpanit,
	Aravind.Gopalakrishnan, dietmar.hahn, dgdegra, andrew.cooper3
  Cc: boris.ostrovsky, tim, jun.nakajima, xen-devel

With this patch return value of 1 of vpmu_do_msr() will now indicate whether an
error was encountered during MSR processing (instead of stating that the access
was to a VPMU register).

As part of this patch we also check for validity of certain MSR accesses right
when we determine which register is being written, as opposed to postponing this
until later.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
Tested-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
---
 xen/arch/x86/hvm/svm/svm.c        |  6 ++-
 xen/arch/x86/hvm/svm/vpmu.c       |  6 +--
 xen/arch/x86/hvm/vmx/vmx.c        | 24 +++++++++---
 xen/arch/x86/hvm/vmx/vpmu_core2.c | 82 ++++++++++++++-------------------------
 4 files changed, 55 insertions(+), 63 deletions(-)

diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index e523d12..4fe36e9 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -1709,7 +1709,8 @@ static int svm_msr_read_intercept(unsigned int msr, uint64_t *msr_content)
     case MSR_AMD_FAM15H_EVNTSEL3:
     case MSR_AMD_FAM15H_EVNTSEL4:
     case MSR_AMD_FAM15H_EVNTSEL5:
-        vpmu_do_rdmsr(msr, msr_content);
+        if ( vpmu_do_rdmsr(msr, msr_content) )
+            goto gpf;
         break;
 
     case MSR_AMD64_DR0_ADDRESS_MASK:
@@ -1860,7 +1861,8 @@ static int svm_msr_write_intercept(unsigned int msr, uint64_t msr_content)
     case MSR_AMD_FAM15H_EVNTSEL3:
     case MSR_AMD_FAM15H_EVNTSEL4:
     case MSR_AMD_FAM15H_EVNTSEL5:
-        vpmu_do_wrmsr(msr, msr_content, 0);
+        if ( vpmu_do_wrmsr(msr, msr_content, 0) )
+            goto gpf;
         break;
 
     case MSR_IA32_MCx_MISC(4): /* Threshold register */
diff --git a/xen/arch/x86/hvm/svm/vpmu.c b/xen/arch/x86/hvm/svm/vpmu.c
index 58a0dc4..474d0db 100644
--- a/xen/arch/x86/hvm/svm/vpmu.c
+++ b/xen/arch/x86/hvm/svm/vpmu.c
@@ -305,7 +305,7 @@ static int amd_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content,
         is_pmu_enabled(msr_content) && !vpmu_is_set(vpmu, VPMU_RUNNING) )
     {
         if ( !acquire_pmu_ownership(PMU_OWNER_HVM) )
-            return 1;
+            return 0;
         vpmu_set(vpmu, VPMU_RUNNING);
 
         if ( has_hvm_container_vcpu(v) && is_msr_bitmap_on(vpmu) )
@@ -335,7 +335,7 @@ static int amd_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content,
 
     /* Write to hw counters */
     wrmsrl(msr, msr_content);
-    return 1;
+    return 0;
 }
 
 static int amd_vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
@@ -353,7 +353,7 @@ static int amd_vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
 
     rdmsrl(msr, *msr_content);
 
-    return 1;
+    return 0;
 }
 
 static void amd_vpmu_destroy(struct vcpu *v)
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 83b740a..206e50d 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -2127,12 +2127,17 @@ static int vmx_msr_read_intercept(unsigned int msr, uint64_t *msr_content)
         *msr_content |= MSR_IA32_MISC_ENABLE_BTS_UNAVAIL |
                        MSR_IA32_MISC_ENABLE_PEBS_UNAVAIL;
         /* Perhaps vpmu will change some bits. */
+        /* FALLTHROUGH */
+    case MSR_P6_PERFCTR(0)...MSR_P6_PERFCTR(7):
+    case MSR_P6_EVNTSEL(0)...MSR_P6_EVNTSEL(3):
+    case MSR_CORE_PERF_FIXED_CTR0...MSR_CORE_PERF_FIXED_CTR2:
+    case MSR_CORE_PERF_FIXED_CTR_CTRL...MSR_CORE_PERF_GLOBAL_OVF_CTRL:
+    case MSR_IA32_PEBS_ENABLE:
+    case MSR_IA32_DS_AREA:
         if ( vpmu_do_rdmsr(msr, msr_content) )
-            goto done;
+            goto gp_fault;
         break;
     default:
-        if ( vpmu_do_rdmsr(msr, msr_content) )
-            break;
         if ( passive_domain_do_rdmsr(msr, msr_content) )
             goto done;
         switch ( long_mode_do_msr_read(msr, msr_content) )
@@ -2308,7 +2313,7 @@ static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content)
         if ( msr_content & ~supported )
         {
             /* Perhaps some other bits are supported in vpmu. */
-            if ( !vpmu_do_wrmsr(msr, msr_content, supported) )
+            if ( vpmu_do_wrmsr(msr, msr_content, supported) )
                 break;
         }
         if ( msr_content & IA32_DEBUGCTLMSR_LBR )
@@ -2336,9 +2341,16 @@ static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content)
         if ( !nvmx_msr_write_intercept(msr, msr_content) )
             goto gp_fault;
         break;
+    case MSR_P6_PERFCTR(0)...MSR_P6_PERFCTR(7):
+    case MSR_P6_EVNTSEL(0)...MSR_P6_EVNTSEL(7):
+    case MSR_CORE_PERF_FIXED_CTR0...MSR_CORE_PERF_FIXED_CTR2:
+    case MSR_CORE_PERF_FIXED_CTR_CTRL...MSR_CORE_PERF_GLOBAL_OVF_CTRL:
+    case MSR_IA32_PEBS_ENABLE:
+    case MSR_IA32_DS_AREA:
+         if ( vpmu_do_wrmsr(msr, msr_content, 0) )
+            goto gp_fault;
+        break;
     default:
-        if ( vpmu_do_wrmsr(msr, msr_content, 0) )
-            return X86EMUL_OKAY;
         if ( passive_domain_do_wrmsr(msr, msr_content) )
             return X86EMUL_OKAY;
 
diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c
index 39c3626..d10e3e7 100644
--- a/xen/arch/x86/hvm/vmx/vpmu_core2.c
+++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c
@@ -463,36 +463,41 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content,
                              IA32_DEBUGCTLMSR_BTS_OFF_USR;
             if ( !(msr_content & ~supported) &&
                  vpmu_is_set(vpmu, VPMU_CPU_HAS_BTS) )
-                return 1;
+                return 0;
             if ( (msr_content & supported) &&
                  !vpmu_is_set(vpmu, VPMU_CPU_HAS_BTS) )
                 printk(XENLOG_G_WARNING
                        "%pv: Debug Store unsupported on this CPU\n",
                        current);
         }
-        return 0;
+        return 1;
     }
 
     ASSERT(!supported);
 
+    if ( type == MSR_TYPE_COUNTER &&
+         (msr_content &
+          ~((1ull << core2_get_bitwidth_fix_count()) - 1)) )
+        /* Writing unsupported bits to a fixed counter */
+        return 1;
+
     core2_vpmu_cxt = vpmu->context;
     enabled_cntrs = vpmu->priv_context;
     switch ( msr )
     {
     case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
         core2_vpmu_cxt->global_status &= ~msr_content;
-        return 1;
+        return 0;
     case MSR_CORE_PERF_GLOBAL_STATUS:
         gdprintk(XENLOG_INFO, "Can not write readonly MSR: "
                  "MSR_PERF_GLOBAL_STATUS(0x38E)!\n");
-        hvm_inject_hw_exception(TRAP_gp_fault, 0);
         return 1;
     case MSR_IA32_PEBS_ENABLE:
         if ( msr_content & 1 )
             gdprintk(XENLOG_WARNING, "Guest is trying to enable PEBS, "
                      "which is not supported.\n");
         core2_vpmu_cxt->pebs_enable = msr_content;
-        return 1;
+        return 0;
     case MSR_IA32_DS_AREA:
         if ( vpmu_is_set(vpmu, VPMU_CPU_HAS_DS) )
         {
@@ -501,18 +506,21 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content,
                 gdprintk(XENLOG_WARNING,
                          "Illegal address for IA32_DS_AREA: %#" PRIx64 "x\n",
                          msr_content);
-                hvm_inject_hw_exception(TRAP_gp_fault, 0);
                 return 1;
             }
             core2_vpmu_cxt->ds_area = msr_content;
             break;
         }
         gdprintk(XENLOG_WARNING, "Guest setting of DTS is ignored.\n");
-        return 1;
+        return 0;
     case MSR_CORE_PERF_GLOBAL_CTRL:
         global_ctrl = msr_content;
         break;
     case MSR_CORE_PERF_FIXED_CTR_CTRL:
+        if ( msr_content &
+             ( ~((1ull << (fixed_pmc_cnt * FIXED_CTR_CTRL_BITS)) - 1)) )
+            return 1;
+
         vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, &global_ctrl);
         *enabled_cntrs &= ~(((1ULL << fixed_pmc_cnt) - 1) << 32);
         if ( msr_content != 0 )
@@ -535,6 +543,9 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content,
             struct xen_pmu_cntr_pair *xen_pmu_cntr_pair =
                 vpmu_reg_pointer(core2_vpmu_cxt, arch_counters);
 
+            if ( msr_content & (~((1ull << 32) - 1)) )
+                return 1;
+
             vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, &global_ctrl);
 
             if ( msr_content & (1ULL << 22) )
@@ -546,45 +557,17 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content,
         }
     }
 
+    if ( type != MSR_TYPE_GLOBAL )
+        wrmsrl(msr, msr_content);
+    else
+        vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, msr_content);
+
     if ( (global_ctrl & *enabled_cntrs) || (core2_vpmu_cxt->ds_area != 0) )
         vpmu_set(vpmu, VPMU_RUNNING);
     else
         vpmu_reset(vpmu, VPMU_RUNNING);
 
-    if ( type != MSR_TYPE_GLOBAL )
-    {
-        u64 mask;
-        int inject_gp = 0;
-        switch ( type )
-        {
-        case MSR_TYPE_ARCH_CTRL:      /* MSR_P6_EVNTSEL[0,...] */
-            mask = ~((1ull << 32) - 1);
-            if (msr_content & mask)
-                inject_gp = 1;
-            break;
-        case MSR_TYPE_CTRL:           /* IA32_FIXED_CTR_CTRL */
-            if  ( msr == MSR_IA32_DS_AREA )
-                break;
-            /* 4 bits per counter, currently 3 fixed counters implemented. */
-            mask = ~((1ull << (fixed_pmc_cnt * FIXED_CTR_CTRL_BITS)) - 1);
-            if (msr_content & mask)
-                inject_gp = 1;
-            break;
-        case MSR_TYPE_COUNTER:        /* IA32_FIXED_CTR[0-2] */
-            mask = ~((1ull << core2_get_bitwidth_fix_count()) - 1);
-            if (msr_content & mask)
-                inject_gp = 1;
-            break;
-        }
-        if (inject_gp)
-            hvm_inject_hw_exception(TRAP_gp_fault, 0);
-        else
-            wrmsrl(msr, msr_content);
-    }
-    else
-        vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, msr_content);
-
-    return 1;
+    return 0;
 }
 
 static int core2_vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
@@ -612,19 +595,14 @@ static int core2_vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
             rdmsrl(msr, *msr_content);
         }
     }
-    else
+    else if ( msr == MSR_IA32_MISC_ENABLE )
     {
         /* Extension for BTS */
-        if ( msr == MSR_IA32_MISC_ENABLE )
-        {
-            if ( vpmu_is_set(vpmu, VPMU_CPU_HAS_BTS) )
-                *msr_content &= ~MSR_IA32_MISC_ENABLE_BTS_UNAVAIL;
-        }
-        else
-            return 0;
+        if ( vpmu_is_set(vpmu, VPMU_CPU_HAS_BTS) )
+            *msr_content &= ~MSR_IA32_MISC_ENABLE_BTS_UNAVAIL;
     }
 
-    return 1;
+    return 0;
 }
 
 static void core2_vpmu_do_cpuid(unsigned int input,
@@ -777,9 +755,9 @@ static int core2_no_vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
 {
     int type = -1, index = -1;
     if ( !is_core2_vpmu_msr(msr, &type, &index) )
-        return 0;
+        return 1;
     *msr_content = 0;
-    return 1;
+    return 0;
 }
 
 /*
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v19 10/14] x86/VPMU: Add support for PMU register handling on PV guests
  2015-03-17 14:53 [PATCH v19 00/14] x86/PMU: Xen PMU PV(H) support Boris Ostrovsky
                   ` (8 preceding siblings ...)
  2015-03-17 14:54 ` [PATCH v19 09/14] x86/VPMU: When handling MSR accesses, leave fault injection to callers Boris Ostrovsky
@ 2015-03-17 14:54 ` Boris Ostrovsky
  2015-03-17 14:54 ` [PATCH v19 11/14] x86/VPMU: Handle PMU interrupts for " Boris Ostrovsky
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 35+ messages in thread
From: Boris Ostrovsky @ 2015-03-17 14:54 UTC (permalink / raw)
  To: JBeulich, kevin.tian, suravee.suthikulpanit,
	Aravind.Gopalakrishnan, dietmar.hahn, dgdegra, andrew.cooper3
  Cc: boris.ostrovsky, tim, jun.nakajima, xen-devel

Intercept accesses to PMU MSRs and process them in VPMU module. If vpmu ops
for VCPU are not initialized (which is the case, for example, for PV guests that
are not "VPMU-enlightened") access to MSRs will return failure.

Dump VPMU state for all domains (HVM and PV) when requested.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
Tested-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
---
 xen/arch/x86/domain.c             |  3 +--
 xen/arch/x86/hvm/vmx/vpmu_core2.c | 49 +++++++++++++++++++++++++++++++------
 xen/arch/x86/hvm/vpmu.c           |  3 +++
 xen/arch/x86/traps.c              | 51 +++++++++++++++++++++++++++++++++++++--
 4 files changed, 95 insertions(+), 11 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index c7f8210..a48d824 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -2065,8 +2065,7 @@ void arch_dump_vcpu_info(struct vcpu *v)
 {
     paging_dump_vcpu_info(v);
 
-    if ( is_hvm_vcpu(v) )
-        vpmu_dump(v);
+    vpmu_dump(v);
 }
 
 void domain_cpuid(
diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c
index d10e3e7..66d7bc0 100644
--- a/xen/arch/x86/hvm/vmx/vpmu_core2.c
+++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c
@@ -27,6 +27,7 @@
 #include <asm/regs.h>
 #include <asm/types.h>
 #include <asm/apic.h>
+#include <asm/traps.h>
 #include <asm/msr.h>
 #include <asm/msr-index.h>
 #include <asm/hvm/support.h>
@@ -299,12 +300,18 @@ static inline void __core2_vpmu_save(struct vcpu *v)
         rdmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, fixed_counters[i]);
     for ( i = 0; i < arch_pmc_cnt; i++ )
         rdmsrl(MSR_IA32_PERFCTR0 + i, xen_pmu_cntr_pair[i].counter);
+
+    if ( !has_hvm_container_vcpu(v) )
+        rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, core2_vpmu_cxt->global_status);
 }
 
 static int core2_vpmu_save(struct vcpu *v)
 {
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
 
+    if ( !has_hvm_container_vcpu(v) )
+        wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0);
+
     if ( !vpmu_are_all_set(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED) )
         return 0;
 
@@ -342,6 +349,13 @@ static inline void __core2_vpmu_load(struct vcpu *v)
     wrmsrl(MSR_CORE_PERF_FIXED_CTR_CTRL, core2_vpmu_cxt->fixed_ctrl);
     wrmsrl(MSR_IA32_DS_AREA, core2_vpmu_cxt->ds_area);
     wrmsrl(MSR_IA32_PEBS_ENABLE, core2_vpmu_cxt->pebs_enable);
+
+    if ( !has_hvm_container_vcpu(v) )
+    {
+        wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, core2_vpmu_cxt->global_ovf_ctrl);
+        core2_vpmu_cxt->global_ovf_ctrl = 0;
+        wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, core2_vpmu_cxt->global_ctrl);
+    }
 }
 
 static void core2_vpmu_load(struct vcpu *v)
@@ -442,7 +456,6 @@ static int core2_vpmu_msr_common_check(u32 msr_index, int *type, int *index)
 static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content,
                                uint64_t supported)
 {
-    u64 global_ctrl;
     int i, tmp;
     int type = -1, index = -1;
     struct vcpu *v = current;
@@ -486,7 +499,12 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content,
     switch ( msr )
     {
     case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
+        if ( msr_content & ~(0xC000000000000000 |
+                             (((1ULL << fixed_pmc_cnt) - 1) << 32) |
+                             ((1ULL << arch_pmc_cnt) - 1)) )
+            return 1;
         core2_vpmu_cxt->global_status &= ~msr_content;
+        wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, msr_content);
         return 0;
     case MSR_CORE_PERF_GLOBAL_STATUS:
         gdprintk(XENLOG_INFO, "Can not write readonly MSR: "
@@ -514,14 +532,18 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content,
         gdprintk(XENLOG_WARNING, "Guest setting of DTS is ignored.\n");
         return 0;
     case MSR_CORE_PERF_GLOBAL_CTRL:
-        global_ctrl = msr_content;
+        core2_vpmu_cxt->global_ctrl = msr_content;
         break;
     case MSR_CORE_PERF_FIXED_CTR_CTRL:
         if ( msr_content &
              ( ~((1ull << (fixed_pmc_cnt * FIXED_CTR_CTRL_BITS)) - 1)) )
             return 1;
 
-        vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, &global_ctrl);
+        if ( has_hvm_container_vcpu(v) )
+            vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL,
+                               &core2_vpmu_cxt->global_ctrl);
+        else
+            rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, core2_vpmu_cxt->global_ctrl);
         *enabled_cntrs &= ~(((1ULL << fixed_pmc_cnt) - 1) << 32);
         if ( msr_content != 0 )
         {
@@ -546,7 +568,11 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content,
             if ( msr_content & (~((1ull << 32) - 1)) )
                 return 1;
 
-            vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, &global_ctrl);
+            if ( has_hvm_container_vcpu(v) )
+                vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL,
+                                   &core2_vpmu_cxt->global_ctrl);
+            else
+                rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, core2_vpmu_cxt->global_ctrl);
 
             if ( msr_content & (1ULL << 22) )
                 *enabled_cntrs |= 1ULL << tmp;
@@ -560,9 +586,15 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content,
     if ( type != MSR_TYPE_GLOBAL )
         wrmsrl(msr, msr_content);
     else
-        vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, msr_content);
+    {
+        if ( has_hvm_container_vcpu(v) )
+            vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, msr_content);
+        else
+            wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, msr_content);
+    }
 
-    if ( (global_ctrl & *enabled_cntrs) || (core2_vpmu_cxt->ds_area != 0) )
+    if ( (core2_vpmu_cxt->global_ctrl & *enabled_cntrs) ||
+         (core2_vpmu_cxt->ds_area != 0) )
         vpmu_set(vpmu, VPMU_RUNNING);
     else
         vpmu_reset(vpmu, VPMU_RUNNING);
@@ -589,7 +621,10 @@ static int core2_vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
             *msr_content = core2_vpmu_cxt->global_status;
             break;
         case MSR_CORE_PERF_GLOBAL_CTRL:
-            vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, msr_content);
+            if ( has_hvm_container_vcpu(v) )
+                vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, msr_content);
+            else
+                rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, *msr_content);
             break;
         default:
             rdmsrl(msr, *msr_content);
diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c
index ea63b6d..26eda34 100644
--- a/xen/arch/x86/hvm/vpmu.c
+++ b/xen/arch/x86/hvm/vpmu.c
@@ -121,6 +121,9 @@ int vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
 
     if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->do_rdmsr )
         return vpmu->arch_vpmu_ops->do_rdmsr(msr, msr_content);
+    else
+        *msr_content = 0;
+
     return 0;
 }
 
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 7a2e2d4..1eb7bb4 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -72,6 +72,7 @@
 #include <asm/apic.h>
 #include <asm/mc146818rtc.h>
 #include <asm/hpet.h>
+#include <asm/hvm/vpmu.h>
 #include <public/arch-x86/cpuid.h>
 #include <xsm/xsm.h>
 
@@ -968,8 +969,10 @@ void pv_cpuid(struct cpu_user_regs *regs)
         __clear_bit(X86_FEATURE_TOPOEXT % 32, &c);
         break;
 
+    case 0x0000000a: /* Architectural Performance Monitor Features (Intel) */
+        break;
+
     case 0x00000005: /* MONITOR/MWAIT */
-    case 0x0000000a: /* Architectural Performance Monitor Features */
     case 0x0000000b: /* Extended Topology Enumeration */
     case 0x8000000a: /* SVM revision and features */
     case 0x8000001b: /* Instruction Based Sampling */
@@ -985,6 +988,9 @@ void pv_cpuid(struct cpu_user_regs *regs)
     }
 
  out:
+    /* VPMU may decide to modify some of the leaves */
+    vpmu_do_cpuid(regs->eax, &a, &b, &c, &d);
+
     regs->eax = a;
     regs->ebx = b;
     regs->ecx = c;
@@ -2009,6 +2015,7 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
     char io_emul_stub[32];
     void (*io_emul)(struct cpu_user_regs *) __attribute__((__regparm__(1)));
     uint64_t val;
+    bool_t vpmu_msr;
 
     if ( !read_descriptor(regs->cs, v, regs,
                           &code_base, &code_limit, &ar,
@@ -2499,7 +2506,7 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
         uint32_t eax = regs->eax;
         uint32_t edx = regs->edx;
         uint64_t msr_content = ((uint64_t)edx << 32) | eax;
-
+        vpmu_msr = 0;
         switch ( regs->_ecx )
         {
         case MSR_FS_BASE:
@@ -2636,6 +2643,22 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
             if ( v->arch.debugreg[7] & DR7_ACTIVE_MASK )
                 wrmsrl(regs->_ecx, msr_content);
             break;
+        case MSR_P6_PERFCTR(0)...MSR_P6_PERFCTR(7):
+        case MSR_P6_EVNTSEL(0)...MSR_P6_EVNTSEL(3):
+        case MSR_CORE_PERF_FIXED_CTR0...MSR_CORE_PERF_FIXED_CTR2:
+        case MSR_CORE_PERF_FIXED_CTR_CTRL...MSR_CORE_PERF_GLOBAL_OVF_CTRL:
+            if ( boot_cpu_data.x86_vendor == X86_VENDOR_INTEL )
+            {
+                vpmu_msr = 1;
+        case MSR_AMD_FAM15H_EVNTSEL0...MSR_AMD_FAM15H_PERFCTR5:
+                if ( vpmu_msr || (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) )
+                {
+                    if ( vpmu_do_wrmsr(regs->ecx, msr_content, 0) )
+                        goto fail;
+                }
+                break;
+            }
+            /*FALLTHROUGH*/
 
         default:
             if ( wrmsr_hypervisor_regs(regs->ecx, msr_content) == 1 )
@@ -2671,6 +2694,7 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
         break;
 
     case 0x32: /* RDMSR */
+        vpmu_msr = 0;
         switch ( regs->_ecx )
         {
         case MSR_FS_BASE:
@@ -2738,6 +2762,29 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
                             [regs->_ecx - MSR_AMD64_DR1_ADDRESS_MASK + 1];
             regs->edx = 0;
             break;
+        case MSR_IA32_PERF_CAPABILITIES:
+            /* No extra capabilities are supported */
+            regs->eax = regs->edx = 0;
+            break;
+        case MSR_P6_PERFCTR(0)...MSR_P6_PERFCTR(7):
+        case MSR_P6_EVNTSEL(0)...MSR_P6_EVNTSEL(3):
+        case MSR_CORE_PERF_FIXED_CTR0...MSR_CORE_PERF_FIXED_CTR2:
+        case MSR_CORE_PERF_FIXED_CTR_CTRL...MSR_CORE_PERF_GLOBAL_OVF_CTRL:
+            if ( boot_cpu_data.x86_vendor == X86_VENDOR_INTEL )
+            {
+                vpmu_msr = 1;
+        case MSR_AMD_FAM15H_EVNTSEL0...MSR_AMD_FAM15H_PERFCTR5:
+                if ( vpmu_msr || (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) )
+                {
+                    if ( vpmu_do_rdmsr(regs->ecx, &val) )
+                        goto fail;
+
+                    regs->eax = (uint32_t)val;
+                    regs->edx = (uint32_t)(val >> 32);
+                }
+                break;
+            }
+            /*FALLTHROUGH*/
 
         default:
             if ( rdmsr_hypervisor_regs(regs->ecx, &val) )
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v19 11/14] x86/VPMU: Handle PMU interrupts for PV guests
  2015-03-17 14:53 [PATCH v19 00/14] x86/PMU: Xen PMU PV(H) support Boris Ostrovsky
                   ` (9 preceding siblings ...)
  2015-03-17 14:54 ` [PATCH v19 10/14] x86/VPMU: Add support for PMU register handling on PV guests Boris Ostrovsky
@ 2015-03-17 14:54 ` Boris Ostrovsky
  2015-03-24 14:28   ` Jan Beulich
  2015-03-17 14:54 ` [PATCH v19 12/14] x86/VPMU: Merge vpmu_rdmsr and vpmu_wrmsr Boris Ostrovsky
                   ` (3 subsequent siblings)
  14 siblings, 1 reply; 35+ messages in thread
From: Boris Ostrovsky @ 2015-03-17 14:54 UTC (permalink / raw)
  To: JBeulich, kevin.tian, suravee.suthikulpanit,
	Aravind.Gopalakrishnan, dietmar.hahn, dgdegra, andrew.cooper3
  Cc: boris.ostrovsky, tim, jun.nakajima, xen-devel

Add support for handling PMU interrupts for PV guests.

VPMU for the interrupted VCPU is unloaded until the guest issues XENPMU_flush
hypercall. This allows the guest to access PMU MSR values that are stored in
VPMU context which is shared between hypervisor and domain, thus avoiding
traps to hypervisor.

Since the interrupt handler may now force VPMU context save (i.e. set
VPMU_CONTEXT_SAVE flag) we need to make changes to amd_vpmu_save() which
until now expected this flag to be set only when the counters were stopped.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
---

Changes in v19:
* Adjusted for new ops interfaces (passing vcpu vs. vpmu)
* Test for domain->max_cpu in choose_hwdom_vcpu() instead of 'domain->vcpu!=NULL'
* Replaced '!(vpmu_mode & (XENPMU_MODE_SELF | XENPMU_MODE_HV))' test with 
  'vpmu_mode == XENPMU_MODE_OFF' in vpmu_rd/wrmsr() (to make more logical diff
  in patch#13)

 xen/arch/x86/hvm/svm/vpmu.c       |  11 +-
 xen/arch/x86/hvm/vpmu.c           | 211 ++++++++++++++++++++++++++++++++++++--
 xen/include/public/arch-x86/pmu.h |   6 ++
 xen/include/public/pmu.h          |   2 +
 xen/include/xsm/dummy.h           |   4 +-
 xen/xsm/flask/hooks.c             |   2 +
 6 files changed, 216 insertions(+), 20 deletions(-)

diff --git a/xen/arch/x86/hvm/svm/vpmu.c b/xen/arch/x86/hvm/svm/vpmu.c
index 474d0db..0997901 100644
--- a/xen/arch/x86/hvm/svm/vpmu.c
+++ b/xen/arch/x86/hvm/svm/vpmu.c
@@ -228,17 +228,12 @@ static int amd_vpmu_save(struct vcpu *v)
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
     unsigned int i;
 
-    /*
-     * Stop the counters. If we came here via vpmu_save_force (i.e.
-     * when VPMU_CONTEXT_SAVE is set) counters are already stopped.
-     */
+    for ( i = 0; i < num_counters; i++ )
+        wrmsrl(ctrls[i], 0);
+
     if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_SAVE) )
     {
         vpmu_set(vpmu, VPMU_FROZEN);
-
-        for ( i = 0; i < num_counters; i++ )
-            wrmsrl(ctrls[i], 0);
-
         return 0;
     }
 
diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c
index 26eda34..c287d8b 100644
--- a/xen/arch/x86/hvm/vpmu.c
+++ b/xen/arch/x86/hvm/vpmu.c
@@ -87,31 +87,57 @@ static void __init parse_vpmu_param(char *s)
 void vpmu_lvtpc_update(uint32_t val)
 {
     struct vpmu_struct *vpmu;
+    struct vcpu *curr;
 
     if ( vpmu_mode == XENPMU_MODE_OFF )
         return;
 
-    vpmu = vcpu_vpmu(current);
+    curr = current;
+    vpmu = vcpu_vpmu(curr);
 
     vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR | (val & APIC_LVT_MASKED);
-    apic_write(APIC_LVTPC, vpmu->hw_lapic_lvtpc);
+
+    /* Postpone APIC updates for PV(H) guests if PMU interrupt is pending */
+    if ( is_hvm_vcpu(curr) || !vpmu->xenpmu_data ||
+         !(vpmu->xenpmu_data->pmu.pmu_flags & PMU_CACHED) )
+        apic_write(APIC_LVTPC, vpmu->hw_lapic_lvtpc);
 }
 
 int vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content, uint64_t supported)
 {
-    struct vpmu_struct *vpmu = vcpu_vpmu(current);
+    struct vcpu *curr = current;
+    struct vpmu_struct *vpmu;
 
     if ( vpmu_mode == XENPMU_MODE_OFF )
         return 0;
 
+    vpmu = vcpu_vpmu(curr);
     if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->do_wrmsr )
-        return vpmu->arch_vpmu_ops->do_wrmsr(msr, msr_content, supported);
+    {
+        int ret = vpmu->arch_vpmu_ops->do_wrmsr(msr, msr_content, supported);
+
+        /*
+         * We may have received a PMU interrupt during WRMSR handling
+         * and since do_wrmsr may load VPMU context we should save
+         * (and unload) it again.
+         */
+        if ( !is_hvm_vcpu(curr) && vpmu->xenpmu_data &&
+             (vpmu->xenpmu_data->pmu.pmu_flags & PMU_CACHED) )
+        {
+            vpmu_set(vpmu, VPMU_CONTEXT_SAVE);
+            vpmu->arch_vpmu_ops->arch_vpmu_save(curr);
+            vpmu_reset(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED);
+        }
+        return ret;
+    }
+
     return 0;
 }
 
 int vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
 {
-    struct vpmu_struct *vpmu = vcpu_vpmu(current);
+    struct vcpu *curr = current;
+    struct vpmu_struct *vpmu;
 
     if ( vpmu_mode == XENPMU_MODE_OFF )
     {
@@ -119,24 +145,163 @@ int vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
         return 0;
     }
 
+    vpmu = vcpu_vpmu(curr);
     if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->do_rdmsr )
-        return vpmu->arch_vpmu_ops->do_rdmsr(msr, msr_content);
+    {
+        int ret = vpmu->arch_vpmu_ops->do_rdmsr(msr, msr_content);
+
+        if ( !is_hvm_vcpu(curr) && vpmu->xenpmu_data &&
+             (vpmu->xenpmu_data->pmu.pmu_flags & PMU_CACHED) )
+        {
+            vpmu_set(vpmu, VPMU_CONTEXT_SAVE);
+            vpmu->arch_vpmu_ops->arch_vpmu_save(curr);
+            vpmu_reset(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED);
+        }
+        return ret;
+    }
     else
         *msr_content = 0;
 
     return 0;
 }
 
+static inline struct vcpu *choose_hwdom_vcpu(void)
+{
+    unsigned idx;
+
+    if ( hardware_domain->max_vcpus == 0 )
+        return NULL;
+
+    idx = smp_processor_id() % hardware_domain->max_vcpus;
+
+    return hardware_domain->vcpu[idx];
+}
+
 void vpmu_do_interrupt(struct cpu_user_regs *regs)
 {
-    struct vcpu *v = current;
-    struct vpmu_struct *vpmu = vcpu_vpmu(v);
+    struct vcpu *sampled = current, *sampling;
+    struct vpmu_struct *vpmu;
+
+    /* dom0 will handle interrupt for special domains (e.g. idle domain) */
+    if ( sampled->domain->domain_id >= DOMID_FIRST_RESERVED )
+    {
+        sampling = choose_hwdom_vcpu();
+        if ( !sampling )
+            return;
+    }
+    else
+        sampling = sampled;
+
+    vpmu = vcpu_vpmu(sampling);
+    if ( !is_hvm_vcpu(sampling) )
+    {
+        /* PV(H) guest */
+        const struct cpu_user_regs *cur_regs;
+        uint64_t *flags = &vpmu->xenpmu_data->pmu.pmu_flags;
+        uint32_t domid = DOMID_SELF;
+
+        if ( !vpmu->xenpmu_data )
+            return;
+
+        if ( is_pvh_vcpu(sampling) &&
+             !vpmu->arch_vpmu_ops->do_interrupt(regs) )
+            return;
+
+        if ( *flags & PMU_CACHED )
+            return;
+
+        /* PV guest will be reading PMU MSRs from xenpmu_data */
+        vpmu_set(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED);
+        vpmu->arch_vpmu_ops->arch_vpmu_save(sampling);
+        vpmu_reset(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED);
+
+        if ( has_hvm_container_vcpu(sampled) )
+            *flags = 0;
+        else
+            *flags = PMU_SAMPLE_PV;
+
+        /* Store appropriate registers in xenpmu_data */
+        /* FIXME: 32-bit PVH should go here as well */
+        if ( is_pv_32bit_vcpu(sampling) )
+        {
+            /*
+             * 32-bit dom0 cannot process Xen's addresses (which are 64 bit)
+             * and therefore we treat it the same way as a non-privileged
+             * PV 32-bit domain.
+             */
+            struct compat_pmu_regs *cmp;
+
+            cur_regs = guest_cpu_user_regs();
+
+            cmp = (void *)&vpmu->xenpmu_data->pmu.r.regs;
+            cmp->ip = cur_regs->rip;
+            cmp->sp = cur_regs->rsp;
+            cmp->flags = cur_regs->eflags;
+            cmp->ss = cur_regs->ss;
+            cmp->cs = cur_regs->cs;
+            if ( (cmp->cs & 3) > 1 )
+                *flags |= PMU_SAMPLE_USER;
+        }
+        else
+        {
+            struct xen_pmu_regs *r = &vpmu->xenpmu_data->pmu.r.regs;
+
+            if ( (vpmu_mode & XENPMU_MODE_SELF) )
+                cur_regs = guest_cpu_user_regs();
+            else if ( !guest_mode(regs) && is_hardware_domain(sampling->domain) )
+            {
+                cur_regs = regs;
+                domid = DOMID_XEN;
+            }
+            else
+                cur_regs = guest_cpu_user_regs();
+
+            r->ip = cur_regs->rip;
+            r->sp = cur_regs->rsp;
+            r->flags = cur_regs->eflags;
+
+            if ( !has_hvm_container_vcpu(sampled) )
+            {
+                r->ss = cur_regs->ss;
+                r->cs = cur_regs->cs;
+                if ( !(sampled->arch.flags & TF_kernel_mode) )
+                    *flags |= PMU_SAMPLE_USER;
+            }
+            else
+            {
+                struct segment_register seg;
+
+                hvm_get_segment_register(sampled, x86_seg_cs, &seg);
+                r->cs = seg.sel;
+                hvm_get_segment_register(sampled, x86_seg_ss, &seg);
+                r->ss = seg.sel;
+                r->cpl = seg.attr.fields.dpl;
+                if ( !(sampled->arch.hvm_vcpu.guest_cr[0] & X86_CR0_PE) )
+                    *flags |= PMU_SAMPLE_REAL;
+            }
+        }
+
+        vpmu->xenpmu_data->domain_id = domid;
+        vpmu->xenpmu_data->vcpu_id = sampled->vcpu_id;
+        vpmu->xenpmu_data->pcpu_id = smp_processor_id();
+
+        vpmu->hw_lapic_lvtpc |= APIC_LVT_MASKED;
+        apic_write(APIC_LVTPC, vpmu->hw_lapic_lvtpc);
+        *flags |= PMU_CACHED;
+
+        send_guest_vcpu_virq(sampling, VIRQ_XENPMU);
+
+        return;
+    }
 
     if ( vpmu->arch_vpmu_ops )
     {
-        struct vlapic *vlapic = vcpu_vlapic(v);
+        struct vlapic *vlapic = vcpu_vlapic(sampling);
         u32 vlapic_lvtpc;
 
+        /* We don't support (yet) HVM dom0 */
+        ASSERT(sampling == sampled);
+
         if ( !vpmu->arch_vpmu_ops->do_interrupt(regs) ||
              !is_vlapic_lvtpc_enabled(vlapic) )
             return;
@@ -149,7 +314,7 @@ void vpmu_do_interrupt(struct cpu_user_regs *regs)
             vlapic_set_irq(vlapic, vlapic_lvtpc & APIC_VECTOR_MASK, 0);
             break;
         case APIC_MODE_NMI:
-            v->nmi_pending = 1;
+            sampling->nmi_pending = 1;
             break;
         }
     }
@@ -247,7 +412,9 @@ void vpmu_load(struct vcpu *v)
     local_irq_enable();
 
     /* Only when PMU is counting, we load PMU context immediately. */
-    if ( !vpmu_is_set(vpmu, VPMU_RUNNING) )
+    if ( !vpmu_is_set(vpmu, VPMU_RUNNING) ||
+         (!is_hvm_vcpu(vpmu_vcpu(vpmu)) &&
+          (vpmu->xenpmu_data->pmu.pmu_flags & PMU_CACHED)) )
         return;
 
     if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->arch_vpmu_load )
@@ -266,6 +433,8 @@ void vpmu_initialise(struct vcpu *v)
 
     BUILD_BUG_ON(sizeof(struct xen_pmu_intel_ctxt) > XENPMU_CTXT_PAD_SZ);
     BUILD_BUG_ON(sizeof(struct xen_pmu_amd_ctxt) > XENPMU_CTXT_PAD_SZ);
+    BUILD_BUG_ON(sizeof(struct xen_pmu_regs) > XENPMU_REGS_PAD_SZ);
+    BUILD_BUG_ON(sizeof(struct compat_pmu_regs) > XENPMU_REGS_PAD_SZ);
 
     ASSERT(!vpmu->flags && !vpmu->context);
 
@@ -445,7 +614,9 @@ void vpmu_dump(struct vcpu *v)
 long do_xenpmu_op(unsigned int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg)
 {
     int ret;
+    struct vcpu *curr;
     struct xen_pmu_params pmu_params = {.val = 0};
+    struct xen_pmu_data *xenpmu_data;
 
     if ( !opt_vpmu_enabled )
         return -EOPNOTSUPP;
@@ -548,6 +719,24 @@ long do_xenpmu_op(unsigned int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg)
         pvpmu_finish(current->domain, &pmu_params);
         break;
 
+    case XENPMU_lvtpc_set:
+        curr = current;
+        xenpmu_data = curr->arch.vpmu.xenpmu_data;
+        if ( xenpmu_data == NULL )
+            return -EINVAL;
+        vpmu_lvtpc_update(xenpmu_data->pmu.l.lapic_lvtpc);
+        break;
+
+    case XENPMU_flush:
+        curr = current;
+        xenpmu_data = curr->arch.vpmu.xenpmu_data;
+        if ( xenpmu_data == NULL )
+            return -EINVAL;
+        xenpmu_data->pmu.pmu_flags &= ~PMU_CACHED;
+        vpmu_lvtpc_update(xenpmu_data->pmu.l.lapic_lvtpc);
+        vpmu_load(curr);
+        break;
+
     default:
         ret = -EINVAL;
     }
diff --git a/xen/include/public/arch-x86/pmu.h b/xen/include/public/arch-x86/pmu.h
index 8bbe341..74a91ac 100644
--- a/xen/include/public/arch-x86/pmu.h
+++ b/xen/include/public/arch-x86/pmu.h
@@ -51,6 +51,12 @@ struct xen_pmu_regs {
 typedef struct xen_pmu_regs xen_pmu_regs_t;
 DEFINE_XEN_GUEST_HANDLE(xen_pmu_regs_t);
 
+/* PMU flags */
+#define PMU_CACHED         (1<<0) /* PMU MSRs are cached in the context */
+#define PMU_SAMPLE_USER    (1<<1) /* Sample is from user or kernel mode */
+#define PMU_SAMPLE_REAL    (1<<2) /* Sample is from realmode */
+#define PMU_SAMPLE_PV      (1<<3) /* Sample from a PV guest */
+
 struct xen_pmu_arch {
     union {
         struct xen_pmu_regs regs;
diff --git a/xen/include/public/pmu.h b/xen/include/public/pmu.h
index afb4ca1..db5321a 100644
--- a/xen/include/public/pmu.h
+++ b/xen/include/public/pmu.h
@@ -27,6 +27,8 @@
 #define XENPMU_feature_set     3
 #define XENPMU_init            4
 #define XENPMU_finish          5
+#define XENPMU_lvtpc_set       6
+#define XENPMU_flush           7 /* Write cached MSR values to HW     */
 /* ` } */
 
 /* Parameters structure for HYPERVISOR_xenpmu_op call */
diff --git a/xen/include/xsm/dummy.h b/xen/include/xsm/dummy.h
index ae47135..1ad4ecc 100644
--- a/xen/include/xsm/dummy.h
+++ b/xen/include/xsm/dummy.h
@@ -666,7 +666,9 @@ static XSM_INLINE int xsm_pmu_op (XSM_DEFAULT_ARG struct domain *d, int op)
     case XENPMU_feature_get:
         return xsm_default_action(XSM_PRIV, d, current->domain);
     case XENPMU_init:
-    case XENPMU_finish: 
+    case XENPMU_finish:
+    case XENPMU_lvtpc_set:
+    case XENPMU_flush:
         return xsm_default_action(XSM_HOOK, d, current->domain);
     default:
         return -EPERM;
diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
index 5011cb9..e47c86b 100644
--- a/xen/xsm/flask/hooks.c
+++ b/xen/xsm/flask/hooks.c
@@ -1520,6 +1520,8 @@ static int flask_pmu_op (struct domain *d, unsigned int op)
                             XEN2__PMU_CTRL, NULL);
     case XENPMU_init:
     case XENPMU_finish:
+    case XENPMU_lvtpc_set:
+    case XENPMU_flush:
         return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_XEN2,
                             XEN2__PMU_USE, NULL);
     default:
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v19 12/14] x86/VPMU: Merge vpmu_rdmsr and vpmu_wrmsr
  2015-03-17 14:53 [PATCH v19 00/14] x86/PMU: Xen PMU PV(H) support Boris Ostrovsky
                   ` (10 preceding siblings ...)
  2015-03-17 14:54 ` [PATCH v19 11/14] x86/VPMU: Handle PMU interrupts for " Boris Ostrovsky
@ 2015-03-17 14:54 ` Boris Ostrovsky
  2015-03-25  8:29   ` Dietmar Hahn
  2015-03-25 16:52   ` Jan Beulich
  2015-03-17 14:54 ` [PATCH v19 13/14] x86/VPMU: Add privileged PMU mode Boris Ostrovsky
                   ` (2 subsequent siblings)
  14 siblings, 2 replies; 35+ messages in thread
From: Boris Ostrovsky @ 2015-03-17 14:54 UTC (permalink / raw)
  To: JBeulich, kevin.tian, suravee.suthikulpanit,
	Aravind.Gopalakrishnan, dietmar.hahn, dgdegra, andrew.cooper3
  Cc: boris.ostrovsky, tim, jun.nakajima, xen-devel

The two routines share most of their logic.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
Changes in v19:
* const-ified arch_vpmu_ops in vpmu_do_wrmsr
* non-changes:
   - kept 'current' as a non-initializer to avoid unnecessary initialization
     in the (common) non-VPMU case
   - kept 'nop' label since there are multiple dissimilar cases that can cause
     a non-emulation of VPMU access

 xen/arch/x86/hvm/vpmu.c        | 76 +++++++++++++++++-------------------------
 xen/include/asm-x86/hvm/vpmu.h | 14 ++++++--
 2 files changed, 42 insertions(+), 48 deletions(-)

diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c
index c287d8b..beed956 100644
--- a/xen/arch/x86/hvm/vpmu.c
+++ b/xen/arch/x86/hvm/vpmu.c
@@ -103,63 +103,47 @@ void vpmu_lvtpc_update(uint32_t val)
         apic_write(APIC_LVTPC, vpmu->hw_lapic_lvtpc);
 }
 
-int vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content, uint64_t supported)
+int vpmu_do_msr(unsigned int msr, uint64_t *msr_content,
+                uint64_t supported, bool_t is_write)
 {
-    struct vcpu *curr = current;
+    struct vcpu *curr;
     struct vpmu_struct *vpmu;
+    const struct arch_vpmu_ops *ops;
+    int ret = 0;
 
     if ( vpmu_mode == XENPMU_MODE_OFF )
-        return 0;
+        goto nop;
 
+    curr = current;
     vpmu = vcpu_vpmu(curr);
-    if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->do_wrmsr )
-    {
-        int ret = vpmu->arch_vpmu_ops->do_wrmsr(msr, msr_content, supported);
-
-        /*
-         * We may have received a PMU interrupt during WRMSR handling
-         * and since do_wrmsr may load VPMU context we should save
-         * (and unload) it again.
-         */
-        if ( !is_hvm_vcpu(curr) && vpmu->xenpmu_data &&
-             (vpmu->xenpmu_data->pmu.pmu_flags & PMU_CACHED) )
-        {
-            vpmu_set(vpmu, VPMU_CONTEXT_SAVE);
-            vpmu->arch_vpmu_ops->arch_vpmu_save(curr);
-            vpmu_reset(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED);
-        }
-        return ret;
-    }
-
-    return 0;
-}
-
-int vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
-{
-    struct vcpu *curr = current;
-    struct vpmu_struct *vpmu;
+    ops = vpmu->arch_vpmu_ops;
+    if ( !ops )
+        goto nop;
+
+    if ( is_write && ops->do_wrmsr )
+        ret = ops->do_wrmsr(msr, *msr_content, supported);
+    else if ( !is_write && ops->do_rdmsr )
+        ret = ops->do_rdmsr(msr, msr_content);
+    else
+        goto nop;
 
-    if ( vpmu_mode == XENPMU_MODE_OFF )
+    /*
+     * We may have received a PMU interrupt while handling MSR access
+     * and since do_wr/rdmsr may load VPMU context we should save
+     * (and unload) it again.
+     */
+    if ( !is_hvm_vcpu(curr) &&
+         vpmu->xenpmu_data && (vpmu->xenpmu_data->pmu.pmu_flags & PMU_CACHED) )
     {
-        *msr_content = 0;
-        return 0;
+        vpmu_set(vpmu, VPMU_CONTEXT_SAVE);
+        ops->arch_vpmu_save(curr);
+        vpmu_reset(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED);
     }
 
-    vpmu = vcpu_vpmu(curr);
-    if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->do_rdmsr )
-    {
-        int ret = vpmu->arch_vpmu_ops->do_rdmsr(msr, msr_content);
+    return ret;
 
-        if ( !is_hvm_vcpu(curr) && vpmu->xenpmu_data &&
-             (vpmu->xenpmu_data->pmu.pmu_flags & PMU_CACHED) )
-        {
-            vpmu_set(vpmu, VPMU_CONTEXT_SAVE);
-            vpmu->arch_vpmu_ops->arch_vpmu_save(curr);
-            vpmu_reset(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED);
-        }
-        return ret;
-    }
-    else
+ nop:
+    if ( !is_write )
         *msr_content = 0;
 
     return 0;
diff --git a/xen/include/asm-x86/hvm/vpmu.h b/xen/include/asm-x86/hvm/vpmu.h
index 642a4b7..63851a7 100644
--- a/xen/include/asm-x86/hvm/vpmu.h
+++ b/xen/include/asm-x86/hvm/vpmu.h
@@ -99,8 +99,8 @@ static inline bool_t vpmu_are_all_set(const struct vpmu_struct *vpmu,
 }
 
 void vpmu_lvtpc_update(uint32_t val);
-int vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content, uint64_t supported);
-int vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content);
+int vpmu_do_msr(unsigned int msr, uint64_t *msr_content,
+                uint64_t supported, bool_t is_write);
 void vpmu_do_interrupt(struct cpu_user_regs *regs);
 void vpmu_do_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx,
                                        unsigned int *ecx, unsigned int *edx);
@@ -110,6 +110,16 @@ void vpmu_save(struct vcpu *v);
 void vpmu_load(struct vcpu *v);
 void vpmu_dump(struct vcpu *v);
 
+static inline int vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content,
+                                uint64_t supported)
+{
+    return vpmu_do_msr(msr, &msr_content, supported, 1);
+}
+static inline int vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
+{
+    return vpmu_do_msr(msr, msr_content, 0, 0);
+}
+
 extern int acquire_pmu_ownership(int pmu_ownership);
 extern void release_pmu_ownership(int pmu_ownership);
 
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v19 13/14] x86/VPMU: Add privileged PMU mode
  2015-03-17 14:53 [PATCH v19 00/14] x86/PMU: Xen PMU PV(H) support Boris Ostrovsky
                   ` (11 preceding siblings ...)
  2015-03-17 14:54 ` [PATCH v19 12/14] x86/VPMU: Merge vpmu_rdmsr and vpmu_wrmsr Boris Ostrovsky
@ 2015-03-17 14:54 ` Boris Ostrovsky
  2015-03-25 12:25   ` Dietmar Hahn
  2015-03-25 16:57   ` Jan Beulich
  2015-03-17 14:54 ` [PATCH v19 14/14] x86/VPMU: Move VPMU files up from hvm/ directory Boris Ostrovsky
  2015-03-19 14:32 ` [PATCH v19 00/14] x86/PMU: Xen PMU PV(H) support Dietmar Hahn
  14 siblings, 2 replies; 35+ messages in thread
From: Boris Ostrovsky @ 2015-03-17 14:54 UTC (permalink / raw)
  To: JBeulich, kevin.tian, suravee.suthikulpanit,
	Aravind.Gopalakrishnan, dietmar.hahn, dgdegra, andrew.cooper3
  Cc: boris.ostrovsky, tim, jun.nakajima, xen-devel

Add support for privileged PMU mode (XENPMU_MODE_ALL) which allows privileged
domain (dom0) profile both itself (and the hypervisor) and the guests. While
this mode is on profiling in guests is disabled.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
Changes in v19:
* Slightly different mode changing logic in xenpmu_op() since we no longer
  allow mode changes while VPMUs are active

 xen/arch/x86/hvm/vpmu.c  | 34 +++++++++++++++++++++++++---------
 xen/arch/x86/traps.c     | 13 +++++++++++++
 xen/include/public/pmu.h |  3 +++
 3 files changed, 41 insertions(+), 9 deletions(-)

diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c
index beed956..71c5063 100644
--- a/xen/arch/x86/hvm/vpmu.c
+++ b/xen/arch/x86/hvm/vpmu.c
@@ -111,7 +111,9 @@ int vpmu_do_msr(unsigned int msr, uint64_t *msr_content,
     const struct arch_vpmu_ops *ops;
     int ret = 0;
 
-    if ( vpmu_mode == XENPMU_MODE_OFF )
+    if ( (vpmu_mode == XENPMU_MODE_OFF) ||
+         ((vpmu_mode & XENPMU_MODE_ALL) &&
+          !is_hardware_domain(current->domain)) )
         goto nop;
 
     curr = current;
@@ -166,8 +168,12 @@ void vpmu_do_interrupt(struct cpu_user_regs *regs)
     struct vcpu *sampled = current, *sampling;
     struct vpmu_struct *vpmu;
 
-    /* dom0 will handle interrupt for special domains (e.g. idle domain) */
-    if ( sampled->domain->domain_id >= DOMID_FIRST_RESERVED )
+    /*
+     * dom0 will handle interrupt for special domains (e.g. idle domain) or,
+     * in XENPMU_MODE_ALL, for everyone.
+     */
+    if ( (vpmu_mode & XENPMU_MODE_ALL) ||
+         (sampled->domain->domain_id >= DOMID_FIRST_RESERVED) )
     {
         sampling = choose_hwdom_vcpu();
         if ( !sampling )
@@ -177,17 +183,18 @@ void vpmu_do_interrupt(struct cpu_user_regs *regs)
         sampling = sampled;
 
     vpmu = vcpu_vpmu(sampling);
-    if ( !is_hvm_vcpu(sampling) )
+    if ( !is_hvm_vcpu(sampling) || (vpmu_mode & XENPMU_MODE_ALL) )
     {
         /* PV(H) guest */
         const struct cpu_user_regs *cur_regs;
         uint64_t *flags = &vpmu->xenpmu_data->pmu.pmu_flags;
-        uint32_t domid = DOMID_SELF;
+        uint32_t domid;
 
         if ( !vpmu->xenpmu_data )
             return;
 
         if ( is_pvh_vcpu(sampling) &&
+             !(vpmu_mode & XENPMU_MODE_ALL) &&
              !vpmu->arch_vpmu_ops->do_interrupt(regs) )
             return;
 
@@ -204,6 +211,11 @@ void vpmu_do_interrupt(struct cpu_user_regs *regs)
         else
             *flags = PMU_SAMPLE_PV;
 
+        if ( sampled == sampling )
+            domid = DOMID_SELF;
+        else
+            domid = sampled->domain->domain_id;
+
         /* Store appropriate registers in xenpmu_data */
         /* FIXME: 32-bit PVH should go here as well */
         if ( is_pv_32bit_vcpu(sampling) )
@@ -232,7 +244,8 @@ void vpmu_do_interrupt(struct cpu_user_regs *regs)
 
             if ( (vpmu_mode & XENPMU_MODE_SELF) )
                 cur_regs = guest_cpu_user_regs();
-            else if ( !guest_mode(regs) && is_hardware_domain(sampling->domain) )
+            else if ( !guest_mode(regs) &&
+                      is_hardware_domain(sampling->domain) )
             {
                 cur_regs = regs;
                 domid = DOMID_XEN;
@@ -508,7 +521,8 @@ static int pvpmu_init(struct domain *d, xen_pmu_params_t *params)
     struct page_info *page;
     uint64_t gfn = params->val;
 
-    if ( vpmu_mode == XENPMU_MODE_OFF )
+    if ( (vpmu_mode == XENPMU_MODE_OFF) ||
+         ((vpmu_mode & XENPMU_MODE_ALL) && !is_hardware_domain(d)) )
         return -EINVAL;
 
     if ( (params->vcpu >= d->max_vcpus) || (d->vcpu == NULL) ||
@@ -627,12 +641,14 @@ long do_xenpmu_op(unsigned int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg)
     {
     case XENPMU_mode_set:
     {
-        if ( (pmu_params.val & ~(XENPMU_MODE_SELF | XENPMU_MODE_HV)) ||
+        if ( (pmu_params.val &
+              ~(XENPMU_MODE_SELF | XENPMU_MODE_HV | XENPMU_MODE_ALL)) ||
              (hweight64(pmu_params.val) > 1) )
             return -EINVAL;
 
         /* 32-bit dom0 can only sample itself. */
-        if ( is_pv_32bit_vcpu(current) && (pmu_params.val & XENPMU_MODE_HV) )
+        if ( is_pv_32bit_vcpu(current) &&
+             (pmu_params.val & (XENPMU_MODE_HV | XENPMU_MODE_ALL)) )
             return -EINVAL;
 
         spin_lock(&vpmu_lock);
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 1eb7bb4..8a40deb 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -2653,6 +2653,10 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
         case MSR_AMD_FAM15H_EVNTSEL0...MSR_AMD_FAM15H_PERFCTR5:
                 if ( vpmu_msr || (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) )
                 {
+                    if ( (vpmu_mode & XENPMU_MODE_ALL) &&
+                         !is_hardware_domain(v->domain) )
+                        break;
+
                     if ( vpmu_do_wrmsr(regs->ecx, msr_content, 0) )
                         goto fail;
                 }
@@ -2776,6 +2780,15 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
         case MSR_AMD_FAM15H_EVNTSEL0...MSR_AMD_FAM15H_PERFCTR5:
                 if ( vpmu_msr || (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) )
                 {
+
+                    if ( (vpmu_mode & XENPMU_MODE_ALL) &&
+                         !is_hardware_domain(v->domain) )
+                    {
+                        /* Don't leak PMU MSRs to unprivileged domains */
+                        regs->eax = regs->edx = 0;
+                        break;
+                    }
+
                     if ( vpmu_do_rdmsr(regs->ecx, &val) )
                         goto fail;
 
diff --git a/xen/include/public/pmu.h b/xen/include/public/pmu.h
index db5321a..a83ae71 100644
--- a/xen/include/public/pmu.h
+++ b/xen/include/public/pmu.h
@@ -52,10 +52,13 @@ DEFINE_XEN_GUEST_HANDLE(xen_pmu_params_t);
  * - XENPMU_MODE_SELF:  Guests can profile themselves
  * - XENPMU_MODE_HV:    Guests can profile themselves, dom0 profiles
  *                      itself and Xen
+ * - XENPMU_MODE_ALL:   Only dom0 has access to VPMU and it profiles
+ *                      everyone: itself, the hypervisor and the guests.
  */
 #define XENPMU_MODE_OFF           0
 #define XENPMU_MODE_SELF          (1<<0)
 #define XENPMU_MODE_HV            (1<<1)
+#define XENPMU_MODE_ALL           (1<<2)
 
 /*
  * PMU features:
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v19 14/14] x86/VPMU: Move VPMU files up from hvm/ directory
  2015-03-17 14:53 [PATCH v19 00/14] x86/PMU: Xen PMU PV(H) support Boris Ostrovsky
                   ` (12 preceding siblings ...)
  2015-03-17 14:54 ` [PATCH v19 13/14] x86/VPMU: Add privileged PMU mode Boris Ostrovsky
@ 2015-03-17 14:54 ` Boris Ostrovsky
  2015-03-19 14:32 ` [PATCH v19 00/14] x86/PMU: Xen PMU PV(H) support Dietmar Hahn
  14 siblings, 0 replies; 35+ messages in thread
From: Boris Ostrovsky @ 2015-03-17 14:54 UTC (permalink / raw)
  To: JBeulich, kevin.tian, suravee.suthikulpanit,
	Aravind.Gopalakrishnan, dietmar.hahn, dgdegra, andrew.cooper3
  Cc: boris.ostrovsky, tim, jun.nakajima, xen-devel

Since PMU is now not HVM specific we can move VPMU-related files up from
arch/x86/hvm/ directory.

Specifically:
    arch/x86/hvm/vpmu.c -> arch/x86/cpu/vpmu.c
    arch/x86/hvm/svm/vpmu.c -> arch/x86/cpu/vpmu_amd.c
    arch/x86/hvm/vmx/vpmu_core2.c -> arch/x86/cpu/vpmu_intel.c
    include/asm-x86/hvm/vpmu.h -> include/asm-x86/vpmu.h

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
Tested-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
---
 xen/arch/x86/cpu/Makefile             |   1 +
 xen/arch/x86/cpu/vpmu.c               | 791 ++++++++++++++++++++++++++++
 xen/arch/x86/cpu/vpmu_amd.c           | 498 ++++++++++++++++++
 xen/arch/x86/cpu/vpmu_intel.c         | 939 ++++++++++++++++++++++++++++++++++
 xen/arch/x86/hvm/Makefile             |   1 -
 xen/arch/x86/hvm/svm/Makefile         |   1 -
 xen/arch/x86/hvm/svm/vpmu.c           | 498 ------------------
 xen/arch/x86/hvm/vlapic.c             |   2 +-
 xen/arch/x86/hvm/vmx/Makefile         |   1 -
 xen/arch/x86/hvm/vmx/vpmu_core2.c     | 939 ----------------------------------
 xen/arch/x86/hvm/vpmu.c               | 791 ----------------------------
 xen/arch/x86/oprofile/op_model_ppro.c |   2 +-
 xen/arch/x86/traps.c                  |   2 +-
 xen/include/asm-x86/hvm/vmx/vmcs.h    |   2 +-
 xen/include/asm-x86/hvm/vpmu.h        | 143 ------
 xen/include/asm-x86/vpmu.h            | 143 ++++++
 16 files changed, 2376 insertions(+), 2378 deletions(-)
 create mode 100644 xen/arch/x86/cpu/vpmu.c
 create mode 100644 xen/arch/x86/cpu/vpmu_amd.c
 create mode 100644 xen/arch/x86/cpu/vpmu_intel.c
 delete mode 100644 xen/arch/x86/hvm/svm/vpmu.c
 delete mode 100644 xen/arch/x86/hvm/vmx/vpmu_core2.c
 delete mode 100644 xen/arch/x86/hvm/vpmu.c
 delete mode 100644 xen/include/asm-x86/hvm/vpmu.h
 create mode 100644 xen/include/asm-x86/vpmu.h

diff --git a/xen/arch/x86/cpu/Makefile b/xen/arch/x86/cpu/Makefile
index d73d93a..74f23ae 100644
--- a/xen/arch/x86/cpu/Makefile
+++ b/xen/arch/x86/cpu/Makefile
@@ -7,3 +7,4 @@ obj-y += common.o
 obj-y += intel.o
 obj-y += intel_cacheinfo.o
 obj-y += mwait-idle.o
+obj-y += vpmu.o vpmu_amd.o vpmu_intel.o
diff --git a/xen/arch/x86/cpu/vpmu.c b/xen/arch/x86/cpu/vpmu.c
new file mode 100644
index 0000000..2412e7b
--- /dev/null
+++ b/xen/arch/x86/cpu/vpmu.c
@@ -0,0 +1,791 @@
+/*
+ * vpmu.c: PMU virtualization for HVM domain.
+ *
+ * Copyright (c) 2007, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
+ * Place - Suite 330, Boston, MA 02111-1307 USA.
+ *
+ * Author: Haitao Shan <haitao.shan@intel.com>
+ */
+#include <xen/config.h>
+#include <xen/sched.h>
+#include <xen/xenoprof.h>
+#include <xen/event.h>
+#include <xen/guest_access.h>
+#include <asm/regs.h>
+#include <asm/types.h>
+#include <asm/msr.h>
+#include <asm/nmi.h>
+#include <asm/p2m.h>
+#include <asm/vpmu.h>
+#include <asm/hvm/support.h>
+#include <asm/hvm/vmx/vmx.h>
+#include <asm/hvm/vmx/vmcs.h>
+#include <asm/hvm/svm/svm.h>
+#include <asm/hvm/svm/vmcb.h>
+#include <asm/apic.h>
+#include <public/pmu.h>
+#include <xsm/xsm.h>
+
+#include <compat/pmu.h>
+CHECK_pmu_params;
+CHECK_pmu_intel_ctxt;
+CHECK_pmu_amd_ctxt;
+CHECK_pmu_cntr_pair;
+CHECK_pmu_regs;
+
+/*
+ * "vpmu" :     vpmu generally enabled
+ * "vpmu=off" : vpmu generally disabled
+ * "vpmu=bts" : vpmu enabled and Intel BTS feature switched on.
+ */
+static unsigned int __read_mostly opt_vpmu_enabled;
+unsigned int __read_mostly vpmu_mode = XENPMU_MODE_OFF;
+unsigned int __read_mostly vpmu_features = 0;
+static void parse_vpmu_param(char *s);
+custom_param("vpmu", parse_vpmu_param);
+
+static DEFINE_SPINLOCK(vpmu_lock);
+static unsigned vpmu_count;
+
+static DEFINE_PER_CPU(struct vcpu *, last_vcpu);
+
+static void __init parse_vpmu_param(char *s)
+{
+    switch ( parse_bool(s) )
+    {
+    case 0:
+        break;
+    default:
+        if ( !strcmp(s, "bts") )
+            vpmu_features |= XENPMU_FEATURE_INTEL_BTS;
+        else if ( *s )
+        {
+            printk("VPMU: unknown flag: %s - vpmu disabled!\n", s);
+            break;
+        }
+        /* fall through */
+    case 1:
+        /* Default VPMU mode */
+        vpmu_mode = XENPMU_MODE_SELF;
+        opt_vpmu_enabled = 1;
+        break;
+    }
+}
+
+void vpmu_lvtpc_update(uint32_t val)
+{
+    struct vpmu_struct *vpmu;
+    struct vcpu *curr;
+
+    if ( vpmu_mode == XENPMU_MODE_OFF )
+        return;
+
+    curr = current;
+    vpmu = vcpu_vpmu(curr);
+
+    vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR | (val & APIC_LVT_MASKED);
+
+    /* Postpone APIC updates for PV(H) guests if PMU interrupt is pending */
+    if ( is_hvm_vcpu(curr) || !vpmu->xenpmu_data ||
+         !(vpmu->xenpmu_data->pmu.pmu_flags & PMU_CACHED) )
+        apic_write(APIC_LVTPC, vpmu->hw_lapic_lvtpc);
+}
+
+int vpmu_do_msr(unsigned int msr, uint64_t *msr_content,
+                uint64_t supported, bool_t is_write)
+{
+    struct vcpu *curr;
+    struct vpmu_struct *vpmu;
+    const struct arch_vpmu_ops *ops;
+    int ret = 0;
+
+    if ( (vpmu_mode == XENPMU_MODE_OFF) ||
+         ((vpmu_mode & XENPMU_MODE_ALL) &&
+          !is_hardware_domain(current->domain)) )
+        goto nop;
+
+    curr = current;
+    vpmu = vcpu_vpmu(curr);
+    ops = vpmu->arch_vpmu_ops;
+    if ( !ops )
+        goto nop;
+
+    if ( is_write && ops->do_wrmsr )
+        ret = ops->do_wrmsr(msr, *msr_content, supported);
+    else if ( !is_write && ops->do_rdmsr )
+        ret = ops->do_rdmsr(msr, msr_content);
+    else
+        goto nop;
+
+    /*
+     * We may have received a PMU interrupt while handling MSR access
+     * and since do_wr/rdmsr may load VPMU context we should save
+     * (and unload) it again.
+     */
+    if ( !is_hvm_vcpu(curr) &&
+         vpmu->xenpmu_data && (vpmu->xenpmu_data->pmu.pmu_flags & PMU_CACHED) )
+    {
+        vpmu_set(vpmu, VPMU_CONTEXT_SAVE);
+        ops->arch_vpmu_save(curr);
+        vpmu_reset(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED);
+    }
+
+    return ret;
+
+ nop:
+    if ( !is_write )
+        *msr_content = 0;
+
+    return 0;
+}
+
+static inline struct vcpu *choose_hwdom_vcpu(void)
+{
+    unsigned idx;
+
+    if ( hardware_domain->max_vcpus == 0 )
+        return NULL;
+
+    idx = smp_processor_id() % hardware_domain->max_vcpus;
+
+    return hardware_domain->vcpu[idx];
+}
+
+void vpmu_do_interrupt(struct cpu_user_regs *regs)
+{
+    struct vcpu *sampled = current, *sampling;
+    struct vpmu_struct *vpmu;
+
+    /*
+     * dom0 will handle interrupt for special domains (e.g. idle domain) or,
+     * in XENPMU_MODE_ALL, for everyone.
+     */
+    if ( (vpmu_mode & XENPMU_MODE_ALL) ||
+         (sampled->domain->domain_id >= DOMID_FIRST_RESERVED) )
+    {
+        sampling = choose_hwdom_vcpu();
+        if ( !sampling )
+            return;
+    }
+    else
+        sampling = sampled;
+
+    vpmu = vcpu_vpmu(sampling);
+    if ( !is_hvm_vcpu(sampling) || (vpmu_mode & XENPMU_MODE_ALL) )
+    {
+        /* PV(H) guest */
+        const struct cpu_user_regs *cur_regs;
+        uint64_t *flags = &vpmu->xenpmu_data->pmu.pmu_flags;
+        uint32_t domid;
+
+        if ( !vpmu->xenpmu_data )
+            return;
+
+        if ( is_pvh_vcpu(sampling) &&
+             !(vpmu_mode & XENPMU_MODE_ALL) &&
+             !vpmu->arch_vpmu_ops->do_interrupt(regs) )
+            return;
+
+        if ( *flags & PMU_CACHED )
+            return;
+
+        /* PV guest will be reading PMU MSRs from xenpmu_data */
+        vpmu_set(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED);
+        vpmu->arch_vpmu_ops->arch_vpmu_save(sampling);
+        vpmu_reset(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED);
+
+        if ( has_hvm_container_vcpu(sampled) )
+            *flags = 0;
+        else
+            *flags = PMU_SAMPLE_PV;
+
+        if ( sampled == sampling )
+            domid = DOMID_SELF;
+        else
+            domid = sampled->domain->domain_id;
+
+        /* Store appropriate registers in xenpmu_data */
+        /* FIXME: 32-bit PVH should go here as well */
+        if ( is_pv_32bit_vcpu(sampling) )
+        {
+            /*
+             * 32-bit dom0 cannot process Xen's addresses (which are 64 bit)
+             * and therefore we treat it the same way as a non-privileged
+             * PV 32-bit domain.
+             */
+            struct compat_pmu_regs *cmp;
+
+            cur_regs = guest_cpu_user_regs();
+
+            cmp = (void *)&vpmu->xenpmu_data->pmu.r.regs;
+            cmp->ip = cur_regs->rip;
+            cmp->sp = cur_regs->rsp;
+            cmp->flags = cur_regs->eflags;
+            cmp->ss = cur_regs->ss;
+            cmp->cs = cur_regs->cs;
+            if ( (cmp->cs & 3) > 1 )
+                *flags |= PMU_SAMPLE_USER;
+        }
+        else
+        {
+            struct xen_pmu_regs *r = &vpmu->xenpmu_data->pmu.r.regs;
+
+            if ( (vpmu_mode & XENPMU_MODE_SELF) )
+                cur_regs = guest_cpu_user_regs();
+            else if ( !guest_mode(regs) &&
+                      is_hardware_domain(sampling->domain) )
+            {
+                cur_regs = regs;
+                domid = DOMID_XEN;
+            }
+            else
+                cur_regs = guest_cpu_user_regs();
+
+            r->ip = cur_regs->rip;
+            r->sp = cur_regs->rsp;
+            r->flags = cur_regs->eflags;
+
+            if ( !has_hvm_container_vcpu(sampled) )
+            {
+                r->ss = cur_regs->ss;
+                r->cs = cur_regs->cs;
+                if ( !(sampled->arch.flags & TF_kernel_mode) )
+                    *flags |= PMU_SAMPLE_USER;
+            }
+            else
+            {
+                struct segment_register seg;
+
+                hvm_get_segment_register(sampled, x86_seg_cs, &seg);
+                r->cs = seg.sel;
+                hvm_get_segment_register(sampled, x86_seg_ss, &seg);
+                r->ss = seg.sel;
+                r->cpl = seg.attr.fields.dpl;
+                if ( !(sampled->arch.hvm_vcpu.guest_cr[0] & X86_CR0_PE) )
+                    *flags |= PMU_SAMPLE_REAL;
+            }
+        }
+
+        vpmu->xenpmu_data->domain_id = domid;
+        vpmu->xenpmu_data->vcpu_id = sampled->vcpu_id;
+        vpmu->xenpmu_data->pcpu_id = smp_processor_id();
+
+        vpmu->hw_lapic_lvtpc |= APIC_LVT_MASKED;
+        apic_write(APIC_LVTPC, vpmu->hw_lapic_lvtpc);
+        *flags |= PMU_CACHED;
+
+        send_guest_vcpu_virq(sampling, VIRQ_XENPMU);
+
+        return;
+    }
+
+    if ( vpmu->arch_vpmu_ops )
+    {
+        struct vlapic *vlapic = vcpu_vlapic(sampling);
+        u32 vlapic_lvtpc;
+
+        /* We don't support (yet) HVM dom0 */
+        ASSERT(sampling == sampled);
+
+        if ( !vpmu->arch_vpmu_ops->do_interrupt(regs) ||
+             !is_vlapic_lvtpc_enabled(vlapic) )
+            return;
+
+        vlapic_lvtpc = vlapic_get_reg(vlapic, APIC_LVTPC);
+
+        switch ( GET_APIC_DELIVERY_MODE(vlapic_lvtpc) )
+        {
+        case APIC_MODE_FIXED:
+            vlapic_set_irq(vlapic, vlapic_lvtpc & APIC_VECTOR_MASK, 0);
+            break;
+        case APIC_MODE_NMI:
+            sampling->nmi_pending = 1;
+            break;
+        }
+    }
+}
+
+void vpmu_do_cpuid(unsigned int input,
+                   unsigned int *eax, unsigned int *ebx,
+                   unsigned int *ecx, unsigned int *edx)
+{
+    struct vpmu_struct *vpmu = vcpu_vpmu(current);
+
+    if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->do_cpuid )
+        vpmu->arch_vpmu_ops->do_cpuid(input, eax, ebx, ecx, edx);
+}
+
+static void vpmu_save_force(void *arg)
+{
+    struct vcpu *v = (struct vcpu *)arg;
+    struct vpmu_struct *vpmu = vcpu_vpmu(v);
+
+    if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) )
+        return;
+
+    vpmu_set(vpmu, VPMU_CONTEXT_SAVE);
+
+    if ( vpmu->arch_vpmu_ops )
+        (void)vpmu->arch_vpmu_ops->arch_vpmu_save(v);
+
+    vpmu_reset(vpmu, VPMU_CONTEXT_SAVE);
+
+    per_cpu(last_vcpu, smp_processor_id()) = NULL;
+}
+
+void vpmu_save(struct vcpu *v)
+{
+    struct vpmu_struct *vpmu = vcpu_vpmu(v);
+    int pcpu = smp_processor_id();
+
+    if ( !vpmu_are_all_set(vpmu, VPMU_CONTEXT_ALLOCATED | VPMU_CONTEXT_LOADED) )
+       return;
+
+    vpmu->last_pcpu = pcpu;
+    per_cpu(last_vcpu, pcpu) = v;
+
+    if ( vpmu->arch_vpmu_ops )
+        if ( vpmu->arch_vpmu_ops->arch_vpmu_save(v) )
+            vpmu_reset(vpmu, VPMU_CONTEXT_LOADED);
+
+    apic_write(APIC_LVTPC, PMU_APIC_VECTOR | APIC_LVT_MASKED);
+}
+
+void vpmu_load(struct vcpu *v)
+{
+    struct vpmu_struct *vpmu = vcpu_vpmu(v);
+    int pcpu = smp_processor_id();
+    struct vcpu *prev = NULL;
+
+    if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
+        return;
+
+    /* First time this VCPU is running here */
+    if ( vpmu->last_pcpu != pcpu )
+    {
+        /*
+         * Get the context from last pcpu that we ran on. Note that if another
+         * VCPU is running there it must have saved this VPCU's context before
+         * startig to run (see below).
+         * There should be no race since remote pcpu will disable interrupts
+         * before saving the context.
+         */
+        if ( vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) )
+        {
+            on_selected_cpus(cpumask_of(vpmu->last_pcpu),
+                             vpmu_save_force, (void *)v, 1);
+            vpmu_reset(vpmu, VPMU_CONTEXT_LOADED);
+        }
+    } 
+
+    /* Prevent forced context save from remote CPU */
+    local_irq_disable();
+
+    prev = per_cpu(last_vcpu, pcpu);
+
+    if ( prev != v && prev )
+    {
+        vpmu = vcpu_vpmu(prev);
+
+        /* Someone ran here before us */
+        vpmu_save_force(prev);
+        vpmu_reset(vpmu, VPMU_CONTEXT_LOADED);
+
+        vpmu = vcpu_vpmu(v);
+    }
+
+    local_irq_enable();
+
+    /* Only when PMU is counting, we load PMU context immediately. */
+    if ( !vpmu_is_set(vpmu, VPMU_RUNNING) ||
+         (!is_hvm_vcpu(vpmu_vcpu(vpmu)) &&
+          (vpmu->xenpmu_data->pmu.pmu_flags & PMU_CACHED)) )
+        return;
+
+    if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->arch_vpmu_load )
+    {
+        apic_write_around(APIC_LVTPC, vpmu->hw_lapic_lvtpc);
+        /* Arch code needs to set VPMU_CONTEXT_LOADED */
+        vpmu->arch_vpmu_ops->arch_vpmu_load(v);
+    }
+}
+
+void vpmu_initialise(struct vcpu *v)
+{
+    struct vpmu_struct *vpmu = vcpu_vpmu(v);
+    uint8_t vendor = current_cpu_data.x86_vendor;
+    int ret;
+
+    BUILD_BUG_ON(sizeof(struct xen_pmu_intel_ctxt) > XENPMU_CTXT_PAD_SZ);
+    BUILD_BUG_ON(sizeof(struct xen_pmu_amd_ctxt) > XENPMU_CTXT_PAD_SZ);
+    BUILD_BUG_ON(sizeof(struct xen_pmu_regs) > XENPMU_REGS_PAD_SZ);
+    BUILD_BUG_ON(sizeof(struct compat_pmu_regs) > XENPMU_REGS_PAD_SZ);
+
+    ASSERT(!vpmu->flags && !vpmu->context);
+
+    if ( v->domain != hardware_domain )
+    {
+        spin_lock(&vpmu_lock);
+        vpmu_count++; /* Prevent vpmu_mode from changing until we are done */
+        spin_unlock(&vpmu_lock);
+    }
+
+    switch ( vendor )
+    {
+    case X86_VENDOR_AMD:
+        ret = svm_vpmu_initialise(v);
+        break;
+
+    case X86_VENDOR_INTEL:
+        ret = vmx_vpmu_initialise(v);
+        break;
+
+    default:
+        if ( vpmu_mode != XENPMU_MODE_OFF )
+        {
+            printk(XENLOG_G_WARNING "VPMU: Unknown CPU vendor %d. "
+                   "Disabling VPMU\n", vendor);
+            opt_vpmu_enabled = 0;
+            vpmu_mode = XENPMU_MODE_OFF;
+        }
+        return; /* Don't bother restoring vpmu_count, VPMU is off forever */
+    }
+
+    if ( ret )
+        printk(XENLOG_G_WARNING "VPMU: Initialization failed for %pv\n", v);
+
+    /* Intel needs to initialize VPMU ops even if VPMU is not in use */
+    if ( (v->domain != hardware_domain) &&
+         (ret || (vpmu_mode == XENPMU_MODE_OFF)) )
+    {
+        spin_lock(&vpmu_lock);
+        vpmu_count--;
+        spin_unlock(&vpmu_lock);
+    }
+}
+
+static void vpmu_clear_last(void *arg)
+{
+    if ( this_cpu(last_vcpu) == arg )
+        this_cpu(last_vcpu) = NULL;
+}
+
+void vpmu_destroy(struct vcpu *v)
+{
+    struct vpmu_struct *vpmu = vcpu_vpmu(v);
+
+    if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
+        return;
+
+    /*
+     * Need to clear last_vcpu in case it points to v.
+     * We can check here non-atomically whether it is 'v' since
+     * last_vcpu can never become 'v' again at this point.
+     * We will test it again in vpmu_clear_last() with interrupts
+     * disabled to make sure we don't clear someone else.
+     */
+    if ( per_cpu(last_vcpu, vpmu->last_pcpu) == v )
+        on_selected_cpus(cpumask_of(vpmu->last_pcpu),
+                         vpmu_clear_last, v, 1);
+
+    if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->arch_vpmu_destroy )
+    {
+        /* Unload VPMU first. This will stop counters */
+        on_selected_cpus(cpumask_of(vcpu_vpmu(v)->last_pcpu),
+                         vpmu_save_force, v, 1);
+         vpmu->arch_vpmu_ops->arch_vpmu_destroy(v);
+    }
+
+    spin_lock(&vpmu_lock);
+    if ( v->domain != hardware_domain )
+        vpmu_count--;
+    spin_unlock(&vpmu_lock);
+}
+
+static int pvpmu_init(struct domain *d, xen_pmu_params_t *params)
+{
+    struct vcpu *v;
+    struct vpmu_struct *vpmu;
+    struct page_info *page;
+    uint64_t gfn = params->val;
+
+    if ( (vpmu_mode == XENPMU_MODE_OFF) ||
+         ((vpmu_mode & XENPMU_MODE_ALL) && !is_hardware_domain(d)) )
+        return -EINVAL;
+
+    if ( (params->vcpu >= d->max_vcpus) || (d->vcpu == NULL) ||
+         (d->vcpu[params->vcpu] == NULL) )
+        return -EINVAL;
+
+    page = get_page_from_gfn(d, gfn, NULL, P2M_ALLOC);
+    if ( !page )
+        return -EINVAL;
+
+    if ( !get_page_type(page, PGT_writable_page) )
+    {
+        put_page(page);
+        return -EINVAL;
+    }
+
+    v = d->vcpu[params->vcpu];
+    vpmu = vcpu_vpmu(v);
+
+    spin_lock(&vpmu->vpmu_lock);
+
+    if ( v->arch.vpmu.xenpmu_data )
+    {
+        put_page_and_type(page);
+        spin_unlock(&vpmu->vpmu_lock);
+        return -EEXIST;
+    }
+
+    v->arch.vpmu.xenpmu_data = __map_domain_page_global(page);
+    if ( !v->arch.vpmu.xenpmu_data )
+    {
+        put_page_and_type(page);
+        spin_unlock(&vpmu->vpmu_lock);
+        return -ENOMEM;
+    }
+
+    vpmu_initialise(v);
+
+    spin_unlock(&vpmu->vpmu_lock);
+
+    return 0;
+}
+
+static void pvpmu_finish(struct domain *d, xen_pmu_params_t *params)
+{
+    struct vcpu *v;
+    struct vpmu_struct *vpmu;
+    uint64_t mfn;
+
+    if ( (params->vcpu >= d->max_vcpus) || (d->vcpu == NULL) ||
+         (d->vcpu[params->vcpu] == NULL) )
+        return;
+
+    v = d->vcpu[params->vcpu];
+    if ( v != current )
+        vcpu_pause(v);
+
+    vpmu = vcpu_vpmu(v);
+    spin_lock(&vpmu->vpmu_lock);
+
+    vpmu_destroy(v);
+
+    if ( v->arch.vpmu.xenpmu_data )
+    {
+        mfn = domain_page_map_to_mfn(v->arch.vpmu.xenpmu_data);
+        ASSERT(mfn != 0);
+        unmap_domain_page_global(v->arch.vpmu.xenpmu_data);
+        put_page_and_type(mfn_to_page(mfn));
+        v->arch.vpmu.xenpmu_data = NULL;
+    }
+
+    spin_unlock(&vpmu->vpmu_lock);
+
+    if ( v != current )
+        vcpu_unpause(v);
+}
+
+/* Dump some vpmu informations on console. Used in keyhandler dump_domains(). */
+void vpmu_dump(struct vcpu *v)
+{
+    struct vpmu_struct *vpmu = vcpu_vpmu(v);
+
+    if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->arch_vpmu_dump )
+        vpmu->arch_vpmu_ops->arch_vpmu_dump(v);
+}
+
+long do_xenpmu_op(unsigned int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg)
+{
+    int ret;
+    struct vcpu *curr;
+    struct xen_pmu_params pmu_params = {.val = 0};
+    struct xen_pmu_data *xenpmu_data;
+
+    if ( !opt_vpmu_enabled )
+        return -EOPNOTSUPP;
+
+    ret = xsm_pmu_op(XSM_OTHER, current->domain, op);
+    if ( ret )
+        return ret;
+
+    /* Check major version when parameters are specified */
+    switch ( op )
+    {
+    case XENPMU_mode_set:
+    case XENPMU_feature_set:
+    case XENPMU_init:
+    case XENPMU_finish:
+        if ( copy_from_guest(&pmu_params, arg, 1) )
+            return -EFAULT;
+
+        if ( pmu_params.version.maj != XENPMU_VER_MAJ )
+            return -EINVAL;
+    }
+
+    switch ( op )
+    {
+    case XENPMU_mode_set:
+    {
+        if ( (pmu_params.val &
+              ~(XENPMU_MODE_SELF | XENPMU_MODE_HV | XENPMU_MODE_ALL)) ||
+             (hweight64(pmu_params.val) > 1) )
+            return -EINVAL;
+
+        /* 32-bit dom0 can only sample itself. */
+        if ( is_pv_32bit_vcpu(current) &&
+             (pmu_params.val & (XENPMU_MODE_HV | XENPMU_MODE_ALL)) )
+            return -EINVAL;
+
+        spin_lock(&vpmu_lock);
+
+        /*
+         * We can always safely switch between XENPMU_MODE_SELF and
+         * XENPMU_MODE_HV while other VPMUs are active.
+         */
+        if ( (vpmu_count == 0) || (vpmu_mode == pmu_params.val) ||
+             ((vpmu_mode ^ pmu_params.val) ==
+              (XENPMU_MODE_SELF | XENPMU_MODE_HV)) )
+            vpmu_mode = pmu_params.val;
+        else
+        {
+            printk(XENLOG_WARNING "VPMU: Cannot change mode while"
+                                  " active VPMUs exist\n");
+            ret = -EBUSY;
+        }
+
+        spin_unlock(&vpmu_lock);
+
+        break;
+    }
+
+    case XENPMU_mode_get:
+        memset(&pmu_params, 0, sizeof(pmu_params));
+        pmu_params.val = vpmu_mode;
+
+        pmu_params.version.maj = XENPMU_VER_MAJ;
+        pmu_params.version.min = XENPMU_VER_MIN;
+
+        if ( copy_to_guest(arg, &pmu_params, 1) )
+            return -EFAULT;
+
+        break;
+
+    case XENPMU_feature_set:
+        if ( pmu_params.val & ~XENPMU_FEATURE_INTEL_BTS )
+            return -EINVAL;
+
+        spin_lock(&vpmu_lock);
+
+        if ( vpmu_count == 0 )
+            vpmu_features = pmu_params.val;
+        else
+        {
+            printk(XENLOG_WARNING "VPMU: Cannot change features while"
+                                  " active VPMUs exist\n");
+            ret = -EBUSY;
+        }
+
+        spin_unlock(&vpmu_lock);
+
+        break;
+
+    case XENPMU_feature_get:
+        pmu_params.val = vpmu_features;
+        if ( copy_field_to_guest(arg, &pmu_params, val) )
+            return -EFAULT;
+
+        break;
+
+    case XENPMU_init:
+        ret = pvpmu_init(current->domain, &pmu_params);
+        break;
+
+    case XENPMU_finish:
+        pvpmu_finish(current->domain, &pmu_params);
+        break;
+
+    case XENPMU_lvtpc_set:
+        curr = current;
+        xenpmu_data = curr->arch.vpmu.xenpmu_data;
+        if ( xenpmu_data == NULL )
+            return -EINVAL;
+        vpmu_lvtpc_update(xenpmu_data->pmu.l.lapic_lvtpc);
+        break;
+
+    case XENPMU_flush:
+        curr = current;
+        xenpmu_data = curr->arch.vpmu.xenpmu_data;
+        if ( xenpmu_data == NULL )
+            return -EINVAL;
+        xenpmu_data->pmu.pmu_flags &= ~PMU_CACHED;
+        vpmu_lvtpc_update(xenpmu_data->pmu.l.lapic_lvtpc);
+        vpmu_load(curr);
+        break;
+
+    default:
+        ret = -EINVAL;
+    }
+
+    return ret;
+}
+
+static int __init vpmu_init(void)
+{
+    int vendor = current_cpu_data.x86_vendor;
+
+    if ( !opt_vpmu_enabled )
+    {
+        printk(XENLOG_INFO "VPMU: disabled\n");
+        return 0;
+    }
+
+    /* NMI watchdog uses LVTPC and HW counter */
+    if ( opt_watchdog && opt_vpmu_enabled )
+    {
+        printk(XENLOG_WARNING "NMI watchdog is enabled. Turning VPMU off.\n");
+        opt_vpmu_enabled = 0;
+        vpmu_mode = XENPMU_MODE_OFF;
+        return 0;
+    }
+
+    switch ( vendor )
+    {
+    case X86_VENDOR_AMD:
+        if ( amd_vpmu_init() )
+           vpmu_mode = XENPMU_MODE_OFF;
+        break;
+    case X86_VENDOR_INTEL:
+        if ( core2_vpmu_init() )
+           vpmu_mode = XENPMU_MODE_OFF;
+        break;
+    default:
+        printk(XENLOG_WARNING "VPMU: Unknown CPU vendor: %d. "
+               "Turning VPMU off.\n", vendor);
+        vpmu_mode = XENPMU_MODE_OFF;
+        break;
+    }
+
+    if ( vpmu_mode != XENPMU_MODE_OFF )
+        printk(XENLOG_INFO "VPMU: version " __stringify(XENPMU_VER_MAJ) "."
+               __stringify(XENPMU_VER_MIN) "\n");
+    else
+        opt_vpmu_enabled = 0;
+
+    return 0;
+}
+__initcall(vpmu_init);
diff --git a/xen/arch/x86/cpu/vpmu_amd.c b/xen/arch/x86/cpu/vpmu_amd.c
new file mode 100644
index 0000000..ebe1970
--- /dev/null
+++ b/xen/arch/x86/cpu/vpmu_amd.c
@@ -0,0 +1,498 @@
+/*
+ * vpmu.c: PMU virtualization for HVM domain.
+ *
+ * Copyright (c) 2010, Advanced Micro Devices, Inc.
+ * Parts of this code are Copyright (c) 2007, Intel Corporation
+ *
+ * Author: Wei Wang <wei.wang2@amd.com>
+ * Tested by: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
+ * Place - Suite 330, Boston, MA 02111-1307 USA.
+ *
+ */
+
+#include <xen/config.h>
+#include <xen/xenoprof.h>
+#include <xen/hvm/save.h>
+#include <xen/sched.h>
+#include <xen/irq.h>
+#include <asm/apic.h>
+#include <asm/vpmu.h>
+#include <asm/hvm/vlapic.h>
+#include <public/pmu.h>
+
+#define MSR_F10H_EVNTSEL_GO_SHIFT   40
+#define MSR_F10H_EVNTSEL_EN_SHIFT   22
+#define MSR_F10H_COUNTER_LENGTH     48
+
+#define is_guest_mode(msr) ((msr) & (1ULL << MSR_F10H_EVNTSEL_GO_SHIFT))
+#define is_pmu_enabled(msr) ((msr) & (1ULL << MSR_F10H_EVNTSEL_EN_SHIFT))
+#define set_guest_mode(msr) (msr |= (1ULL << MSR_F10H_EVNTSEL_GO_SHIFT))
+#define is_overflowed(msr) (!((msr) & (1ULL << (MSR_F10H_COUNTER_LENGTH-1))))
+
+static unsigned int __read_mostly num_counters;
+static const u32 __read_mostly *counters;
+static const u32 __read_mostly *ctrls;
+static bool_t __read_mostly k7_counters_mirrored;
+
+#define F10H_NUM_COUNTERS   4
+#define F15H_NUM_COUNTERS   6
+
+/* PMU Counter MSRs. */
+static const u32 AMD_F10H_COUNTERS[] = {
+    MSR_K7_PERFCTR0,
+    MSR_K7_PERFCTR1,
+    MSR_K7_PERFCTR2,
+    MSR_K7_PERFCTR3
+};
+
+/* PMU Control MSRs. */
+static const u32 AMD_F10H_CTRLS[] = {
+    MSR_K7_EVNTSEL0,
+    MSR_K7_EVNTSEL1,
+    MSR_K7_EVNTSEL2,
+    MSR_K7_EVNTSEL3
+};
+
+static const u32 AMD_F15H_COUNTERS[] = {
+    MSR_AMD_FAM15H_PERFCTR0,
+    MSR_AMD_FAM15H_PERFCTR1,
+    MSR_AMD_FAM15H_PERFCTR2,
+    MSR_AMD_FAM15H_PERFCTR3,
+    MSR_AMD_FAM15H_PERFCTR4,
+    MSR_AMD_FAM15H_PERFCTR5
+};
+
+static const u32 AMD_F15H_CTRLS[] = {
+    MSR_AMD_FAM15H_EVNTSEL0,
+    MSR_AMD_FAM15H_EVNTSEL1,
+    MSR_AMD_FAM15H_EVNTSEL2,
+    MSR_AMD_FAM15H_EVNTSEL3,
+    MSR_AMD_FAM15H_EVNTSEL4,
+    MSR_AMD_FAM15H_EVNTSEL5
+};
+
+/* Use private context as a flag for MSR bitmap */
+#define msr_bitmap_on(vpmu)    do {                                    \
+                                   (vpmu)->priv_context = (void *)-1L; \
+                               } while (0)
+#define msr_bitmap_off(vpmu)   do {                                    \
+                                   (vpmu)->priv_context = NULL;        \
+                               } while (0)
+#define is_msr_bitmap_on(vpmu) ((vpmu)->priv_context != NULL)
+
+static inline int get_pmu_reg_type(u32 addr)
+{
+    if ( (addr >= MSR_K7_EVNTSEL0) && (addr <= MSR_K7_EVNTSEL3) )
+        return MSR_TYPE_CTRL;
+
+    if ( (addr >= MSR_K7_PERFCTR0) && (addr <= MSR_K7_PERFCTR3) )
+        return MSR_TYPE_COUNTER;
+
+    if ( (addr >= MSR_AMD_FAM15H_EVNTSEL0) &&
+         (addr <= MSR_AMD_FAM15H_PERFCTR5 ) )
+    {
+        if (addr & 1)
+            return MSR_TYPE_COUNTER;
+        else
+            return MSR_TYPE_CTRL;
+    }
+
+    /* unsupported registers */
+    return -1;
+}
+
+static inline u32 get_fam15h_addr(u32 addr)
+{
+    switch ( addr )
+    {
+    case MSR_K7_PERFCTR0:
+        return MSR_AMD_FAM15H_PERFCTR0;
+    case MSR_K7_PERFCTR1:
+        return MSR_AMD_FAM15H_PERFCTR1;
+    case MSR_K7_PERFCTR2:
+        return MSR_AMD_FAM15H_PERFCTR2;
+    case MSR_K7_PERFCTR3:
+        return MSR_AMD_FAM15H_PERFCTR3;
+    case MSR_K7_EVNTSEL0:
+        return MSR_AMD_FAM15H_EVNTSEL0;
+    case MSR_K7_EVNTSEL1:
+        return MSR_AMD_FAM15H_EVNTSEL1;
+    case MSR_K7_EVNTSEL2:
+        return MSR_AMD_FAM15H_EVNTSEL2;
+    case MSR_K7_EVNTSEL3:
+        return MSR_AMD_FAM15H_EVNTSEL3;
+    default:
+        break;
+    }
+
+    return addr;
+}
+
+static void amd_vpmu_set_msr_bitmap(struct vcpu *v)
+{
+    unsigned int i;
+    struct vpmu_struct *vpmu = vcpu_vpmu(v);
+
+    for ( i = 0; i < num_counters; i++ )
+    {
+        svm_intercept_msr(v, counters[i], MSR_INTERCEPT_NONE);
+        svm_intercept_msr(v, ctrls[i], MSR_INTERCEPT_WRITE);
+    }
+
+    msr_bitmap_on(vpmu);
+}
+
+static void amd_vpmu_unset_msr_bitmap(struct vcpu *v)
+{
+    unsigned int i;
+    struct vpmu_struct *vpmu = vcpu_vpmu(v);
+
+    for ( i = 0; i < num_counters; i++ )
+    {
+        svm_intercept_msr(v, counters[i], MSR_INTERCEPT_RW);
+        svm_intercept_msr(v, ctrls[i], MSR_INTERCEPT_RW);
+    }
+
+    msr_bitmap_off(vpmu);
+}
+
+static int amd_vpmu_do_interrupt(struct cpu_user_regs *regs)
+{
+    return 1;
+}
+
+static inline void context_load(struct vcpu *v)
+{
+    unsigned int i;
+    struct vpmu_struct *vpmu = vcpu_vpmu(v);
+    struct xen_pmu_amd_ctxt *ctxt = vpmu->context;
+    uint64_t *counter_regs = vpmu_reg_pointer(ctxt, counters);
+    uint64_t *ctrl_regs = vpmu_reg_pointer(ctxt, ctrls);
+
+    for ( i = 0; i < num_counters; i++ )
+    {
+        wrmsrl(counters[i], counter_regs[i]);
+        wrmsrl(ctrls[i], ctrl_regs[i]);
+    }
+}
+
+static void amd_vpmu_load(struct vcpu *v)
+{
+    struct vpmu_struct *vpmu = vcpu_vpmu(v);
+    struct xen_pmu_amd_ctxt *ctxt = vpmu->context;
+    uint64_t *ctrl_regs = vpmu_reg_pointer(ctxt, ctrls);
+
+    vpmu_reset(vpmu, VPMU_FROZEN);
+
+    if ( vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) )
+    {
+        unsigned int i;
+
+        for ( i = 0; i < num_counters; i++ )
+            wrmsrl(ctrls[i], ctrl_regs[i]);
+
+        return;
+    }
+
+    vpmu_set(vpmu, VPMU_CONTEXT_LOADED);
+
+    context_load(v);
+}
+
+static inline void context_save(struct vcpu *v)
+{
+    unsigned int i;
+    struct vpmu_struct *vpmu = vcpu_vpmu(v);
+    struct xen_pmu_amd_ctxt *ctxt = vpmu->context;
+    uint64_t *counter_regs = vpmu_reg_pointer(ctxt, counters);
+
+    /* No need to save controls -- they are saved in amd_vpmu_do_wrmsr */
+    for ( i = 0; i < num_counters; i++ )
+        rdmsrl(counters[i], counter_regs[i]);
+}
+
+static int amd_vpmu_save(struct vcpu *v)
+{
+    struct vpmu_struct *vpmu = vcpu_vpmu(v);
+    unsigned int i;
+
+    for ( i = 0; i < num_counters; i++ )
+        wrmsrl(ctrls[i], 0);
+
+    if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_SAVE) )
+    {
+        vpmu_set(vpmu, VPMU_FROZEN);
+        return 0;
+    }
+
+    if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) )
+        return 0;
+
+    context_save(v);
+
+    if ( !vpmu_is_set(vpmu, VPMU_RUNNING) &&
+         has_hvm_container_vcpu(v) && is_msr_bitmap_on(vpmu) )
+        amd_vpmu_unset_msr_bitmap(v);
+
+    return 1;
+}
+
+static void context_update(unsigned int msr, u64 msr_content)
+{
+    unsigned int i;
+    struct vcpu *v = current;
+    struct vpmu_struct *vpmu = vcpu_vpmu(v);
+    struct xen_pmu_amd_ctxt *ctxt = vpmu->context;
+    uint64_t *counter_regs = vpmu_reg_pointer(ctxt, counters);
+    uint64_t *ctrl_regs = vpmu_reg_pointer(ctxt, ctrls);
+
+    if ( k7_counters_mirrored &&
+        ((msr >= MSR_K7_EVNTSEL0) && (msr <= MSR_K7_PERFCTR3)) )
+    {
+        msr = get_fam15h_addr(msr);
+    }
+
+    for ( i = 0; i < num_counters; i++ )
+    {
+       if ( msr == ctrls[i] )
+       {
+           ctrl_regs[i] = msr_content;
+           return;
+       }
+        else if (msr == counters[i] )
+        {
+            counter_regs[i] = msr_content;
+            return;
+        }
+    }
+}
+
+static int amd_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content,
+                             uint64_t supported)
+{
+    struct vcpu *v = current;
+    struct vpmu_struct *vpmu = vcpu_vpmu(v);
+
+    ASSERT(!supported);
+
+    /* For all counters, enable guest only mode for HVM guest */
+    if ( has_hvm_container_vcpu(v) &&
+         (get_pmu_reg_type(msr) == MSR_TYPE_CTRL) &&
+         !is_guest_mode(msr_content) )
+    {
+        set_guest_mode(msr_content);
+    }
+
+    /* check if the first counter is enabled */
+    if ( (get_pmu_reg_type(msr) == MSR_TYPE_CTRL) &&
+        is_pmu_enabled(msr_content) && !vpmu_is_set(vpmu, VPMU_RUNNING) )
+    {
+        if ( !acquire_pmu_ownership(PMU_OWNER_HVM) )
+            return 0;
+        vpmu_set(vpmu, VPMU_RUNNING);
+
+        if ( has_hvm_container_vcpu(v) && is_msr_bitmap_on(vpmu) )
+             amd_vpmu_set_msr_bitmap(v);
+    }
+
+    /* stop saving & restore if guest stops first counter */
+    if ( (get_pmu_reg_type(msr) == MSR_TYPE_CTRL) &&
+        (is_pmu_enabled(msr_content) == 0) && vpmu_is_set(vpmu, VPMU_RUNNING) )
+    {
+        vpmu_reset(vpmu, VPMU_RUNNING);
+        if ( has_hvm_container_vcpu(v) && is_msr_bitmap_on(vpmu) )
+             amd_vpmu_unset_msr_bitmap(v);
+        release_pmu_ownship(PMU_OWNER_HVM);
+    }
+
+    if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED)
+        || vpmu_is_set(vpmu, VPMU_FROZEN) )
+    {
+        context_load(v);
+        vpmu_set(vpmu, VPMU_CONTEXT_LOADED);
+        vpmu_reset(vpmu, VPMU_FROZEN);
+    }
+
+    /* Update vpmu context immediately */
+    context_update(msr, msr_content);
+
+    /* Write to hw counters */
+    wrmsrl(msr, msr_content);
+    return 0;
+}
+
+static int amd_vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
+{
+    struct vcpu *v = current;
+    struct vpmu_struct *vpmu = vcpu_vpmu(v);
+
+    if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED)
+        || vpmu_is_set(vpmu, VPMU_FROZEN) )
+    {
+        context_load(v);
+        vpmu_set(vpmu, VPMU_CONTEXT_LOADED);
+        vpmu_reset(vpmu, VPMU_FROZEN);
+    }
+
+    rdmsrl(msr, *msr_content);
+
+    return 0;
+}
+
+static void amd_vpmu_destroy(struct vcpu *v)
+{
+    struct vpmu_struct *vpmu = vcpu_vpmu(v);
+
+    if ( has_hvm_container_vcpu(v) )
+    {
+        if ( is_msr_bitmap_on(vpmu) )
+            amd_vpmu_unset_msr_bitmap(v);
+
+        if ( is_hvm_vcpu(v) )
+            xfree(vpmu->context);
+
+        release_pmu_ownship(PMU_OWNER_HVM);
+    }
+
+    vpmu->context = NULL;
+    vpmu_clear(vpmu);
+}
+
+/* VPMU part of the 'q' keyhandler */
+static void amd_vpmu_dump(const struct vcpu *v)
+{
+    const struct vpmu_struct *vpmu = vcpu_vpmu(v);
+    const struct xen_pmu_amd_ctxt *ctxt = vpmu->context;
+    const uint64_t *counter_regs = vpmu_reg_pointer(ctxt, counters);
+    const uint64_t *ctrl_regs = vpmu_reg_pointer(ctxt, ctrls);
+    unsigned int i;
+
+    printk("    VPMU state: 0x%x ", vpmu->flags);
+    if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
+    {
+         printk("\n");
+         return;
+    }
+
+    printk("(");
+    if ( vpmu_is_set(vpmu, VPMU_PASSIVE_DOMAIN_ALLOCATED) )
+        printk("PASSIVE_DOMAIN_ALLOCATED, ");
+    if ( vpmu_is_set(vpmu, VPMU_FROZEN) )
+        printk("FROZEN, ");
+    if ( vpmu_is_set(vpmu, VPMU_CONTEXT_SAVE) )
+        printk("SAVE, ");
+    if ( vpmu_is_set(vpmu, VPMU_RUNNING) )
+        printk("RUNNING, ");
+    if ( vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) )
+        printk("LOADED, ");
+    printk("ALLOCATED)\n");
+
+    for ( i = 0; i < num_counters; i++ )
+    {
+        uint64_t ctrl, cntr;
+
+        rdmsrl(ctrls[i], ctrl);
+        rdmsrl(counters[i], cntr);
+        printk("      %#x: %#lx (%#lx in HW)    %#x: %#lx (%#lx in HW)\n",
+               ctrls[i], ctrl_regs[i], ctrl,
+               counters[i], counter_regs[i], cntr);
+    }
+}
+
+struct arch_vpmu_ops amd_vpmu_ops = {
+    .do_wrmsr = amd_vpmu_do_wrmsr,
+    .do_rdmsr = amd_vpmu_do_rdmsr,
+    .do_interrupt = amd_vpmu_do_interrupt,
+    .arch_vpmu_destroy = amd_vpmu_destroy,
+    .arch_vpmu_save = amd_vpmu_save,
+    .arch_vpmu_load = amd_vpmu_load,
+    .arch_vpmu_dump = amd_vpmu_dump
+};
+
+int svm_vpmu_initialise(struct vcpu *v)
+{
+    struct xen_pmu_amd_ctxt *ctxt;
+    struct vpmu_struct *vpmu = vcpu_vpmu(v);
+
+    if ( vpmu_mode == XENPMU_MODE_OFF )
+        return 0;
+
+    if ( !counters )
+        return -EINVAL;
+
+    if ( is_hvm_vcpu(v) )
+    {
+        ctxt = xzalloc_bytes(sizeof(*ctxt) +
+                             2 * sizeof(uint64_t) * num_counters);
+        if ( !ctxt )
+        {
+            printk(XENLOG_G_WARNING "%pv: Insufficient memory for PMU, "
+                   " PMU feature is unavailable\n", v);
+            return -ENOMEM;
+        }
+    }
+    else
+        ctxt = &v->arch.vpmu.xenpmu_data->pmu.c.amd;
+
+    ctxt->counters = sizeof(*ctxt);
+    ctxt->ctrls = ctxt->counters + sizeof(uint64_t) * num_counters;
+
+    vpmu->context = ctxt;
+    vpmu->priv_context = NULL;
+
+    vpmu->arch_vpmu_ops = &amd_vpmu_ops;
+
+    vpmu_set(vpmu, VPMU_CONTEXT_ALLOCATED);
+    return 0;
+}
+
+int __init amd_vpmu_init(void)
+{
+    switch ( current_cpu_data.x86 )
+    {
+    case 0x15:
+        num_counters = F15H_NUM_COUNTERS;
+        counters = AMD_F15H_COUNTERS;
+        ctrls = AMD_F15H_CTRLS;
+        k7_counters_mirrored = 1;
+        break;
+    case 0x10:
+    case 0x12:
+    case 0x14:
+    case 0x16:
+        num_counters = F10H_NUM_COUNTERS;
+        counters = AMD_F10H_COUNTERS;
+        ctrls = AMD_F10H_CTRLS;
+        k7_counters_mirrored = 0;
+        break;
+    default:
+        printk(XENLOG_WARNING "VPMU: Unsupported CPU family %#x\n",
+               current_cpu_data.x86);
+        return -EINVAL;
+    }
+
+    if ( sizeof(struct xen_pmu_data) +
+         2 * sizeof(uint64_t) * num_counters > PAGE_SIZE )
+    {
+        printk(XENLOG_WARNING
+               "VPMU: Register bank does not fit into VPMU shared page\n");
+        counters = ctrls = NULL;
+        num_counters = 0;
+        return -ENOSPC;
+    }
+
+    return 0;
+}
+
diff --git a/xen/arch/x86/cpu/vpmu_intel.c b/xen/arch/x86/cpu/vpmu_intel.c
new file mode 100644
index 0000000..31dfdca
--- /dev/null
+++ b/xen/arch/x86/cpu/vpmu_intel.c
@@ -0,0 +1,939 @@
+/*
+ * vpmu_core2.c: CORE 2 specific PMU virtualization for HVM domain.
+ *
+ * Copyright (c) 2007, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
+ * Place - Suite 330, Boston, MA 02111-1307 USA.
+ *
+ * Author: Haitao Shan <haitao.shan@intel.com>
+ */
+
+#include <xen/config.h>
+#include <xen/sched.h>
+#include <xen/xenoprof.h>
+#include <xen/irq.h>
+#include <asm/system.h>
+#include <asm/regs.h>
+#include <asm/types.h>
+#include <asm/apic.h>
+#include <asm/traps.h>
+#include <asm/msr.h>
+#include <asm/msr-index.h>
+#include <asm/vpmu.h>
+#include <asm/hvm/support.h>
+#include <asm/hvm/vlapic.h>
+#include <asm/hvm/vmx/vmx.h>
+#include <asm/hvm/vmx/vmcs.h>
+#include <public/sched.h>
+#include <public/hvm/save.h>
+#include <public/pmu.h>
+
+/*
+ * See Intel SDM Vol 2a Instruction Set Reference chapter 3 for CPUID
+ * instruction.
+ * cpuid 0xa - Architectural Performance Monitoring Leaf
+ * Register eax
+ */
+#define PMU_VERSION_SHIFT        0  /* Version ID */
+#define PMU_VERSION_BITS         8  /* 8 bits 0..7 */
+#define PMU_VERSION_MASK         (((1 << PMU_VERSION_BITS) - 1) << PMU_VERSION_SHIFT)
+
+#define PMU_GENERAL_NR_SHIFT     8  /* Number of general pmu registers */
+#define PMU_GENERAL_NR_BITS      8  /* 8 bits 8..15 */
+#define PMU_GENERAL_NR_MASK      (((1 << PMU_GENERAL_NR_BITS) - 1) << PMU_GENERAL_NR_SHIFT)
+
+#define PMU_GENERAL_WIDTH_SHIFT 16  /* Width of general pmu registers */
+#define PMU_GENERAL_WIDTH_BITS   8  /* 8 bits 16..23 */
+#define PMU_GENERAL_WIDTH_MASK  (((1 << PMU_GENERAL_WIDTH_BITS) - 1) << PMU_GENERAL_WIDTH_SHIFT)
+/* Register edx */
+#define PMU_FIXED_NR_SHIFT       0  /* Number of fixed pmu registers */
+#define PMU_FIXED_NR_BITS        5  /* 5 bits 0..4 */
+#define PMU_FIXED_NR_MASK        (((1 << PMU_FIXED_NR_BITS) -1) << PMU_FIXED_NR_SHIFT)
+
+#define PMU_FIXED_WIDTH_SHIFT    5  /* Width of fixed pmu registers */
+#define PMU_FIXED_WIDTH_BITS     8  /* 8 bits 5..12 */
+#define PMU_FIXED_WIDTH_MASK     (((1 << PMU_FIXED_WIDTH_BITS) -1) << PMU_FIXED_WIDTH_SHIFT)
+
+/* Alias registers (0x4c1) for full-width writes to PMCs */
+#define MSR_PMC_ALIAS_MASK       (~(MSR_IA32_PERFCTR0 ^ MSR_IA32_A_PERFCTR0))
+static bool_t __read_mostly full_width_write;
+
+/* Intel-specific VPMU features */
+#define VPMU_CPU_HAS_DS                     0x100 /* Has Debug Store */
+#define VPMU_CPU_HAS_BTS                    0x200 /* Has Branch Trace Store */
+
+/*
+ * MSR_CORE_PERF_FIXED_CTR_CTRL contains the configuration of all fixed
+ * counters. 4 bits for every counter.
+ */
+#define FIXED_CTR_CTRL_BITS 4
+#define FIXED_CTR_CTRL_MASK ((1 << FIXED_CTR_CTRL_BITS) - 1)
+
+/* Number of general-purpose and fixed performance counters */
+static unsigned int __read_mostly arch_pmc_cnt, fixed_pmc_cnt;
+
+/*
+ * QUIRK to workaround an issue on various family 6 cpus.
+ * The issue leads to endless PMC interrupt loops on the processor.
+ * If the interrupt handler is running and a pmc reaches the value 0, this
+ * value remains forever and it triggers immediately a new interrupt after
+ * finishing the handler.
+ * A workaround is to read all flagged counters and if the value is 0 write
+ * 1 (or another value != 0) into it.
+ * There exist no errata and the real cause of this behaviour is unknown.
+ */
+bool_t __read_mostly is_pmc_quirk;
+
+static void check_pmc_quirk(void)
+{
+    if ( current_cpu_data.x86 == 6 )
+        is_pmc_quirk = 1;
+    else
+        is_pmc_quirk = 0;    
+}
+
+static void handle_pmc_quirk(u64 msr_content)
+{
+    int i;
+    u64 val;
+
+    if ( !is_pmc_quirk )
+        return;
+
+    val = msr_content;
+    for ( i = 0; i < arch_pmc_cnt; i++ )
+    {
+        if ( val & 0x1 )
+        {
+            u64 cnt;
+            rdmsrl(MSR_P6_PERFCTR(i), cnt);
+            if ( cnt == 0 )
+                wrmsrl(MSR_P6_PERFCTR(i), 1);
+        }
+        val >>= 1;
+    }
+    val = msr_content >> 32;
+    for ( i = 0; i < fixed_pmc_cnt; i++ )
+    {
+        if ( val & 0x1 )
+        {
+            u64 cnt;
+            rdmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, cnt);
+            if ( cnt == 0 )
+                wrmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, 1);
+        }
+        val >>= 1;
+    }
+}
+
+/*
+ * Read the number of general counters via CPUID.EAX[0xa].EAX[8..15]
+ */
+static int core2_get_arch_pmc_count(void)
+{
+    u32 eax;
+
+    eax = cpuid_eax(0xa);
+    return MASK_EXTR(eax, PMU_GENERAL_NR_MASK);
+}
+
+/*
+ * Read the number of fixed counters via CPUID.EDX[0xa].EDX[0..4]
+ */
+static int core2_get_fixed_pmc_count(void)
+{
+    u32 eax;
+
+    eax = cpuid_eax(0xa);
+    return MASK_EXTR(eax, PMU_FIXED_NR_MASK);
+}
+
+/* edx bits 5-12: Bit width of fixed-function performance counters  */
+static int core2_get_bitwidth_fix_count(void)
+{
+    u32 edx;
+
+    edx = cpuid_edx(0xa);
+    return MASK_EXTR(edx, PMU_FIXED_WIDTH_MASK);
+}
+
+static int is_core2_vpmu_msr(u32 msr_index, int *type, int *index)
+{
+    u32 msr_index_pmc;
+
+    switch ( msr_index )
+    {
+    case MSR_CORE_PERF_FIXED_CTR_CTRL:
+    case MSR_IA32_DS_AREA:
+    case MSR_IA32_PEBS_ENABLE:
+        *type = MSR_TYPE_CTRL;
+        return 1;
+
+    case MSR_CORE_PERF_GLOBAL_CTRL:
+    case MSR_CORE_PERF_GLOBAL_STATUS:
+    case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
+        *type = MSR_TYPE_GLOBAL;
+        return 1;
+
+    default:
+
+        if ( (msr_index >= MSR_CORE_PERF_FIXED_CTR0) &&
+             (msr_index < MSR_CORE_PERF_FIXED_CTR0 + fixed_pmc_cnt) )
+        {
+            *index = msr_index - MSR_CORE_PERF_FIXED_CTR0;
+            *type = MSR_TYPE_COUNTER;
+            return 1;
+        }
+
+        if ( (msr_index >= MSR_P6_EVNTSEL(0)) &&
+             (msr_index < MSR_P6_EVNTSEL(arch_pmc_cnt)) )
+        {
+            *index = msr_index - MSR_P6_EVNTSEL(0);
+            *type = MSR_TYPE_ARCH_CTRL;
+            return 1;
+        }
+
+        msr_index_pmc = msr_index & MSR_PMC_ALIAS_MASK;
+        if ( (msr_index_pmc >= MSR_IA32_PERFCTR0) &&
+             (msr_index_pmc < (MSR_IA32_PERFCTR0 + arch_pmc_cnt)) )
+        {
+            *type = MSR_TYPE_ARCH_COUNTER;
+            *index = msr_index_pmc - MSR_IA32_PERFCTR0;
+            return 1;
+        }
+        return 0;
+    }
+}
+
+static inline int msraddr_to_bitpos(int x)
+{
+    ASSERT(x == (x & 0x1fff));
+    return x;
+}
+
+static void core2_vpmu_set_msr_bitmap(unsigned long *msr_bitmap)
+{
+    int i;
+
+    /* Allow Read/Write PMU Counters MSR Directly. */
+    for ( i = 0; i < fixed_pmc_cnt; i++ )
+    {
+        clear_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR0 + i), msr_bitmap);
+        clear_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR0 + i),
+                  msr_bitmap + 0x800/BYTES_PER_LONG);
+    }
+    for ( i = 0; i < arch_pmc_cnt; i++ )
+    {
+        clear_bit(msraddr_to_bitpos(MSR_IA32_PERFCTR0+i), msr_bitmap);
+        clear_bit(msraddr_to_bitpos(MSR_IA32_PERFCTR0+i),
+                  msr_bitmap + 0x800/BYTES_PER_LONG);
+
+        if ( full_width_write )
+        {
+            clear_bit(msraddr_to_bitpos(MSR_IA32_A_PERFCTR0 + i), msr_bitmap);
+            clear_bit(msraddr_to_bitpos(MSR_IA32_A_PERFCTR0 + i),
+                      msr_bitmap + 0x800/BYTES_PER_LONG);
+        }
+    }
+
+    /* Allow Read PMU Non-global Controls Directly. */
+    for ( i = 0; i < arch_pmc_cnt; i++ )
+         clear_bit(msraddr_to_bitpos(MSR_P6_EVNTSEL(i)), msr_bitmap);
+
+    clear_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR_CTRL), msr_bitmap);
+    clear_bit(msraddr_to_bitpos(MSR_IA32_PEBS_ENABLE), msr_bitmap);
+    clear_bit(msraddr_to_bitpos(MSR_IA32_DS_AREA), msr_bitmap);
+}
+
+static void core2_vpmu_unset_msr_bitmap(unsigned long *msr_bitmap)
+{
+    int i;
+
+    for ( i = 0; i < fixed_pmc_cnt; i++ )
+    {
+        set_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR0 + i), msr_bitmap);
+        set_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR0 + i),
+                msr_bitmap + 0x800/BYTES_PER_LONG);
+    }
+    for ( i = 0; i < arch_pmc_cnt; i++ )
+    {
+        set_bit(msraddr_to_bitpos(MSR_IA32_PERFCTR0 + i), msr_bitmap);
+        set_bit(msraddr_to_bitpos(MSR_IA32_PERFCTR0 + i),
+                msr_bitmap + 0x800/BYTES_PER_LONG);
+
+        if ( full_width_write )
+        {
+            set_bit(msraddr_to_bitpos(MSR_IA32_A_PERFCTR0 + i), msr_bitmap);
+            set_bit(msraddr_to_bitpos(MSR_IA32_A_PERFCTR0 + i),
+                      msr_bitmap + 0x800/BYTES_PER_LONG);
+        }
+    }
+
+    for ( i = 0; i < arch_pmc_cnt; i++ )
+        set_bit(msraddr_to_bitpos(MSR_P6_EVNTSEL(i)), msr_bitmap);
+
+    set_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR_CTRL), msr_bitmap);
+    set_bit(msraddr_to_bitpos(MSR_IA32_PEBS_ENABLE), msr_bitmap);
+    set_bit(msraddr_to_bitpos(MSR_IA32_DS_AREA), msr_bitmap);
+}
+
+static inline void __core2_vpmu_save(struct vcpu *v)
+{
+    int i;
+    struct xen_pmu_intel_ctxt *core2_vpmu_cxt = vcpu_vpmu(v)->context;
+    uint64_t *fixed_counters = vpmu_reg_pointer(core2_vpmu_cxt, fixed_counters);
+    struct xen_pmu_cntr_pair *xen_pmu_cntr_pair =
+        vpmu_reg_pointer(core2_vpmu_cxt, arch_counters);
+
+    for ( i = 0; i < fixed_pmc_cnt; i++ )
+        rdmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, fixed_counters[i]);
+    for ( i = 0; i < arch_pmc_cnt; i++ )
+        rdmsrl(MSR_IA32_PERFCTR0 + i, xen_pmu_cntr_pair[i].counter);
+
+    if ( !has_hvm_container_vcpu(v) )
+        rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, core2_vpmu_cxt->global_status);
+}
+
+static int core2_vpmu_save(struct vcpu *v)
+{
+    struct vpmu_struct *vpmu = vcpu_vpmu(v);
+
+    if ( !has_hvm_container_vcpu(v) )
+        wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0);
+
+    if ( !vpmu_are_all_set(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED) )
+        return 0;
+
+    __core2_vpmu_save(v);
+
+    /* Unset PMU MSR bitmap to trap lazy load. */
+    if ( !vpmu_is_set(vpmu, VPMU_RUNNING) &&
+         has_hvm_container_vcpu(v) && cpu_has_vmx_msr_bitmap )
+        core2_vpmu_unset_msr_bitmap(v->arch.hvm_vmx.msr_bitmap);
+
+    return 1;
+}
+
+static inline void __core2_vpmu_load(struct vcpu *v)
+{
+    unsigned int i, pmc_start;
+    struct xen_pmu_intel_ctxt *core2_vpmu_cxt = vcpu_vpmu(v)->context;
+    uint64_t *fixed_counters = vpmu_reg_pointer(core2_vpmu_cxt, fixed_counters);
+    struct xen_pmu_cntr_pair *xen_pmu_cntr_pair =
+        vpmu_reg_pointer(core2_vpmu_cxt, arch_counters);
+
+    for ( i = 0; i < fixed_pmc_cnt; i++ )
+        wrmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, fixed_counters[i]);
+
+    if ( full_width_write )
+        pmc_start = MSR_IA32_A_PERFCTR0;
+    else
+        pmc_start = MSR_IA32_PERFCTR0;
+    for ( i = 0; i < arch_pmc_cnt; i++ )
+    {
+        wrmsrl(pmc_start + i, xen_pmu_cntr_pair[i].counter);
+        wrmsrl(MSR_P6_EVNTSEL(i), xen_pmu_cntr_pair[i].control);
+    }
+
+    wrmsrl(MSR_CORE_PERF_FIXED_CTR_CTRL, core2_vpmu_cxt->fixed_ctrl);
+    wrmsrl(MSR_IA32_DS_AREA, core2_vpmu_cxt->ds_area);
+    wrmsrl(MSR_IA32_PEBS_ENABLE, core2_vpmu_cxt->pebs_enable);
+
+    if ( !has_hvm_container_vcpu(v) )
+    {
+        wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, core2_vpmu_cxt->global_ovf_ctrl);
+        core2_vpmu_cxt->global_ovf_ctrl = 0;
+        wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, core2_vpmu_cxt->global_ctrl);
+    }
+}
+
+static void core2_vpmu_load(struct vcpu *v)
+{
+    struct vpmu_struct *vpmu = vcpu_vpmu(v);
+
+    if ( vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) )
+        return;
+
+    vpmu_set(vpmu, VPMU_CONTEXT_LOADED);
+
+    __core2_vpmu_load(v);
+}
+
+static int core2_vpmu_alloc_resource(struct vcpu *v)
+{
+    struct vpmu_struct *vpmu = vcpu_vpmu(v);
+    struct xen_pmu_intel_ctxt *core2_vpmu_cxt = NULL;
+    uint64_t *p = NULL;
+
+    p = xzalloc(uint64_t);
+    if ( !p )
+        goto out_err;
+
+    if ( has_hvm_container_vcpu(v) )
+    {
+        if ( is_hvm_vcpu(v) && !acquire_pmu_ownership(PMU_OWNER_HVM) )
+            goto out_err;
+
+        wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0);
+        if ( vmx_add_host_load_msr(MSR_CORE_PERF_GLOBAL_CTRL) )
+            goto out_err_hvm;
+        if ( vmx_add_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL) )
+            goto out_err_hvm;
+        vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, 0);
+    }
+
+    if ( is_hvm_vcpu(v) )
+    {
+        core2_vpmu_cxt = xzalloc_bytes(sizeof(*core2_vpmu_cxt) +
+                                       sizeof(uint64_t) * fixed_pmc_cnt +
+                                       sizeof(struct xen_pmu_cntr_pair) *
+                                       arch_pmc_cnt);
+        if ( !core2_vpmu_cxt )
+            goto out_err_hvm;
+    }
+    else
+        core2_vpmu_cxt = &v->arch.vpmu.xenpmu_data->pmu.c.intel;
+
+    core2_vpmu_cxt->fixed_counters = sizeof(*core2_vpmu_cxt);
+    core2_vpmu_cxt->arch_counters = core2_vpmu_cxt->fixed_counters +
+                                    sizeof(uint64_t) * fixed_pmc_cnt;
+
+    vpmu->context = core2_vpmu_cxt;
+    vpmu->priv_context = p;
+
+    vpmu_set(vpmu, VPMU_CONTEXT_ALLOCATED);
+
+    return 1;
+
+ out_err_hvm:
+    xfree(core2_vpmu_cxt);
+    if ( is_hvm_vcpu(v) )
+        release_pmu_ownship(PMU_OWNER_HVM);
+
+ out_err:
+    xfree(p);
+
+    printk("Failed to allocate VPMU resources for domain %u vcpu %u\n",
+           v->vcpu_id, v->domain->domain_id);
+
+    return 0;
+}
+
+static int core2_vpmu_msr_common_check(u32 msr_index, int *type, int *index)
+{
+    struct vpmu_struct *vpmu = vcpu_vpmu(current);
+
+    if ( !is_core2_vpmu_msr(msr_index, type, index) )
+        return 0;
+
+    if ( unlikely(!vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED)) &&
+         !core2_vpmu_alloc_resource(current) )
+        return 0;
+
+    /* Do the lazy load staff. */
+    if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) )
+    {
+        __core2_vpmu_load(current);
+        vpmu_set(vpmu, VPMU_CONTEXT_LOADED);
+        if ( has_hvm_container_vcpu(current) &&
+             cpu_has_vmx_msr_bitmap )
+            core2_vpmu_set_msr_bitmap(current->arch.hvm_vmx.msr_bitmap);
+    }
+    return 1;
+}
+
+static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content,
+                               uint64_t supported)
+{
+    int i, tmp;
+    int type = -1, index = -1;
+    struct vcpu *v = current;
+    struct vpmu_struct *vpmu = vcpu_vpmu(v);
+    struct xen_pmu_intel_ctxt *core2_vpmu_cxt;
+    uint64_t *enabled_cntrs;
+
+    if ( !core2_vpmu_msr_common_check(msr, &type, &index) )
+    {
+        /* Special handling for BTS */
+        if ( msr == MSR_IA32_DEBUGCTLMSR )
+        {
+            supported |= IA32_DEBUGCTLMSR_TR | IA32_DEBUGCTLMSR_BTS |
+                         IA32_DEBUGCTLMSR_BTINT;
+
+            if ( cpu_has(&current_cpu_data, X86_FEATURE_DSCPL) )
+                supported |= IA32_DEBUGCTLMSR_BTS_OFF_OS |
+                             IA32_DEBUGCTLMSR_BTS_OFF_USR;
+            if ( !(msr_content & ~supported) &&
+                 vpmu_is_set(vpmu, VPMU_CPU_HAS_BTS) )
+                return 0;
+            if ( (msr_content & supported) &&
+                 !vpmu_is_set(vpmu, VPMU_CPU_HAS_BTS) )
+                printk(XENLOG_G_WARNING
+                       "%pv: Debug Store unsupported on this CPU\n",
+                       current);
+        }
+        return 1;
+    }
+
+    ASSERT(!supported);
+
+    if ( type == MSR_TYPE_COUNTER &&
+         (msr_content &
+          ~((1ull << core2_get_bitwidth_fix_count()) - 1)) )
+        /* Writing unsupported bits to a fixed counter */
+        return 1;
+
+    core2_vpmu_cxt = vpmu->context;
+    enabled_cntrs = vpmu->priv_context;
+    switch ( msr )
+    {
+    case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
+        if ( msr_content & ~(0xC000000000000000 |
+                             (((1ULL << fixed_pmc_cnt) - 1) << 32) |
+                             ((1ULL << arch_pmc_cnt) - 1)) )
+            return 1;
+        core2_vpmu_cxt->global_status &= ~msr_content;
+        wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, msr_content);
+        return 0;
+    case MSR_CORE_PERF_GLOBAL_STATUS:
+        gdprintk(XENLOG_INFO, "Can not write readonly MSR: "
+                 "MSR_PERF_GLOBAL_STATUS(0x38E)!\n");
+        return 1;
+    case MSR_IA32_PEBS_ENABLE:
+        if ( msr_content & 1 )
+            gdprintk(XENLOG_WARNING, "Guest is trying to enable PEBS, "
+                     "which is not supported.\n");
+        core2_vpmu_cxt->pebs_enable = msr_content;
+        return 0;
+    case MSR_IA32_DS_AREA:
+        if ( vpmu_is_set(vpmu, VPMU_CPU_HAS_DS) )
+        {
+            if ( !is_canonical_address(msr_content) )
+            {
+                gdprintk(XENLOG_WARNING,
+                         "Illegal address for IA32_DS_AREA: %#" PRIx64 "x\n",
+                         msr_content);
+                return 1;
+            }
+            core2_vpmu_cxt->ds_area = msr_content;
+            break;
+        }
+        gdprintk(XENLOG_WARNING, "Guest setting of DTS is ignored.\n");
+        return 0;
+    case MSR_CORE_PERF_GLOBAL_CTRL:
+        core2_vpmu_cxt->global_ctrl = msr_content;
+        break;
+    case MSR_CORE_PERF_FIXED_CTR_CTRL:
+        if ( msr_content &
+             ( ~((1ull << (fixed_pmc_cnt * FIXED_CTR_CTRL_BITS)) - 1)) )
+            return 1;
+
+        if ( has_hvm_container_vcpu(v) )
+            vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL,
+                               &core2_vpmu_cxt->global_ctrl);
+        else
+            rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, core2_vpmu_cxt->global_ctrl);
+        *enabled_cntrs &= ~(((1ULL << fixed_pmc_cnt) - 1) << 32);
+        if ( msr_content != 0 )
+        {
+            u64 val = msr_content;
+            for ( i = 0; i < fixed_pmc_cnt; i++ )
+            {
+                if ( val & 3 )
+                    *enabled_cntrs |= (1ULL << 32) << i;
+                val >>= FIXED_CTR_CTRL_BITS;
+            }
+        }
+
+        core2_vpmu_cxt->fixed_ctrl = msr_content;
+        break;
+    default:
+        tmp = msr - MSR_P6_EVNTSEL(0);
+        if ( tmp >= 0 && tmp < arch_pmc_cnt )
+        {
+            struct xen_pmu_cntr_pair *xen_pmu_cntr_pair =
+                vpmu_reg_pointer(core2_vpmu_cxt, arch_counters);
+
+            if ( msr_content & (~((1ull << 32) - 1)) )
+                return 1;
+
+            if ( has_hvm_container_vcpu(v) )
+                vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL,
+                                   &core2_vpmu_cxt->global_ctrl);
+            else
+                rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, core2_vpmu_cxt->global_ctrl);
+
+            if ( msr_content & (1ULL << 22) )
+                *enabled_cntrs |= 1ULL << tmp;
+            else
+                *enabled_cntrs &= ~(1ULL << tmp);
+
+            xen_pmu_cntr_pair[tmp].control = msr_content;
+        }
+    }
+
+    if ( type != MSR_TYPE_GLOBAL )
+        wrmsrl(msr, msr_content);
+    else
+    {
+        if ( has_hvm_container_vcpu(v) )
+            vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, msr_content);
+        else
+            wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, msr_content);
+    }
+
+    if ( (core2_vpmu_cxt->global_ctrl & *enabled_cntrs) ||
+         (core2_vpmu_cxt->ds_area != 0) )
+        vpmu_set(vpmu, VPMU_RUNNING);
+    else
+        vpmu_reset(vpmu, VPMU_RUNNING);
+
+    return 0;
+}
+
+static int core2_vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
+{
+    int type = -1, index = -1;
+    struct vcpu *v = current;
+    struct vpmu_struct *vpmu = vcpu_vpmu(v);
+    struct xen_pmu_intel_ctxt *core2_vpmu_cxt;
+
+    if ( core2_vpmu_msr_common_check(msr, &type, &index) )
+    {
+        core2_vpmu_cxt = vpmu->context;
+        switch ( msr )
+        {
+        case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
+            *msr_content = 0;
+            break;
+        case MSR_CORE_PERF_GLOBAL_STATUS:
+            *msr_content = core2_vpmu_cxt->global_status;
+            break;
+        case MSR_CORE_PERF_GLOBAL_CTRL:
+            if ( has_hvm_container_vcpu(v) )
+                vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, msr_content);
+            else
+                rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, *msr_content);
+            break;
+        default:
+            rdmsrl(msr, *msr_content);
+        }
+    }
+    else if ( msr == MSR_IA32_MISC_ENABLE )
+    {
+        /* Extension for BTS */
+        if ( vpmu_is_set(vpmu, VPMU_CPU_HAS_BTS) )
+            *msr_content &= ~MSR_IA32_MISC_ENABLE_BTS_UNAVAIL;
+    }
+
+    return 0;
+}
+
+static void core2_vpmu_do_cpuid(unsigned int input,
+                                unsigned int *eax, unsigned int *ebx,
+                                unsigned int *ecx, unsigned int *edx)
+{
+    if (input == 0x1)
+    {
+        struct vpmu_struct *vpmu = vcpu_vpmu(current);
+
+        if ( vpmu_is_set(vpmu, VPMU_CPU_HAS_DS) )
+        {
+            /* Switch on the 'Debug Store' feature in CPUID.EAX[1]:EDX[21] */
+            *edx |= cpufeat_mask(X86_FEATURE_DS);
+            if ( cpu_has(&current_cpu_data, X86_FEATURE_DTES64) )
+                *ecx |= cpufeat_mask(X86_FEATURE_DTES64);
+            if ( cpu_has(&current_cpu_data, X86_FEATURE_DSCPL) )
+                *ecx |= cpufeat_mask(X86_FEATURE_DSCPL);
+        }
+    }
+}
+
+/* Dump vpmu info on console, called in the context of keyhandler 'q'. */
+static void core2_vpmu_dump(const struct vcpu *v)
+{
+    const struct vpmu_struct *vpmu = vcpu_vpmu(v);
+    unsigned int i;
+    const struct xen_pmu_intel_ctxt *core2_vpmu_cxt = vpmu->context;
+    u64 val;
+    uint64_t *fixed_counters;
+    struct xen_pmu_cntr_pair *cntr_pair;
+
+    if ( !core2_vpmu_cxt || !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
+         return;
+
+    if ( !vpmu_is_set(vpmu, VPMU_RUNNING) )
+    {
+        if ( vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) )
+            printk("    vPMU loaded\n");
+        else
+            printk("    vPMU allocated\n");
+        return;
+    }
+
+    printk("    vPMU running\n");
+
+    cntr_pair = vpmu_reg_pointer(core2_vpmu_cxt, arch_counters);
+    fixed_counters = vpmu_reg_pointer(core2_vpmu_cxt, fixed_counters);
+
+    /* Print the contents of the counter and its configuration msr. */
+    for ( i = 0; i < arch_pmc_cnt; i++ )
+        printk("      general_%d: 0x%016lx ctrl: 0x%016lx\n",
+            i, cntr_pair[i].counter, cntr_pair[i].control);
+
+    /*
+     * The configuration of the fixed counter is 4 bits each in the
+     * MSR_CORE_PERF_FIXED_CTR_CTRL.
+     */
+    val = core2_vpmu_cxt->fixed_ctrl;
+    for ( i = 0; i < fixed_pmc_cnt; i++ )
+    {
+        printk("      fixed_%d:   0x%016lx ctrl: %#lx\n",
+               i, fixed_counters[i],
+               val & FIXED_CTR_CTRL_MASK);
+        val >>= FIXED_CTR_CTRL_BITS;
+    }
+}
+
+static int core2_vpmu_do_interrupt(struct cpu_user_regs *regs)
+{
+    struct vcpu *v = current;
+    u64 msr_content;
+    struct vpmu_struct *vpmu = vcpu_vpmu(v);
+    struct xen_pmu_intel_ctxt *core2_vpmu_cxt = vpmu->context;
+
+    rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, msr_content);
+    if ( msr_content )
+    {
+        if ( is_pmc_quirk )
+            handle_pmc_quirk(msr_content);
+        core2_vpmu_cxt->global_status |= msr_content;
+        msr_content = 0xC000000700000000 | ((1 << arch_pmc_cnt) - 1);
+        wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, msr_content);
+    }
+    else
+    {
+        /* No PMC overflow but perhaps a Trace Message interrupt. */
+        __vmread(GUEST_IA32_DEBUGCTL, &msr_content);
+        if ( !(msr_content & IA32_DEBUGCTLMSR_TR) )
+            return 0;
+    }
+
+    return 1;
+}
+
+static void core2_vpmu_destroy(struct vcpu *v)
+{
+    struct vpmu_struct *vpmu = vcpu_vpmu(v);
+
+    if ( has_hvm_container_vcpu(v) )
+    {
+        if ( cpu_has_vmx_msr_bitmap )
+            core2_vpmu_unset_msr_bitmap(v->arch.hvm_vmx.msr_bitmap);
+
+        if ( is_hvm_vcpu(v) )
+            xfree(vpmu->context);
+
+        release_pmu_ownship(PMU_OWNER_HVM);
+    }
+
+    xfree(vpmu->priv_context);
+    vpmu->context = NULL;
+    vpmu_clear(vpmu);
+}
+
+struct arch_vpmu_ops core2_vpmu_ops = {
+    .do_wrmsr = core2_vpmu_do_wrmsr,
+    .do_rdmsr = core2_vpmu_do_rdmsr,
+    .do_interrupt = core2_vpmu_do_interrupt,
+    .do_cpuid = core2_vpmu_do_cpuid,
+    .arch_vpmu_destroy = core2_vpmu_destroy,
+    .arch_vpmu_save = core2_vpmu_save,
+    .arch_vpmu_load = core2_vpmu_load,
+    .arch_vpmu_dump = core2_vpmu_dump
+};
+
+static void core2_no_vpmu_do_cpuid(unsigned int input,
+                                unsigned int *eax, unsigned int *ebx,
+                                unsigned int *ecx, unsigned int *edx)
+{
+    /*
+     * As in this case the vpmu is not enabled reset some bits in the
+     * architectural performance monitoring related part.
+     */
+    if ( input == 0xa )
+    {
+        *eax &= ~PMU_VERSION_MASK;
+        *eax &= ~PMU_GENERAL_NR_MASK;
+        *eax &= ~PMU_GENERAL_WIDTH_MASK;
+
+        *edx &= ~PMU_FIXED_NR_MASK;
+        *edx &= ~PMU_FIXED_WIDTH_MASK;
+    }
+}
+
+/*
+ * If its a vpmu msr set it to 0.
+ */
+static int core2_no_vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
+{
+    int type = -1, index = -1;
+    if ( !is_core2_vpmu_msr(msr, &type, &index) )
+        return 1;
+    *msr_content = 0;
+    return 0;
+}
+
+/*
+ * These functions are used in case vpmu is not enabled.
+ */
+struct arch_vpmu_ops core2_no_vpmu_ops = {
+    .do_rdmsr = core2_no_vpmu_do_rdmsr,
+    .do_cpuid = core2_no_vpmu_do_cpuid,
+};
+
+int vmx_vpmu_initialise(struct vcpu *v)
+{
+    struct vpmu_struct *vpmu = vcpu_vpmu(v);
+    u64 msr_content;
+    static bool_t ds_warned;
+
+    vpmu->arch_vpmu_ops = &core2_no_vpmu_ops;
+    if ( vpmu_mode == XENPMU_MODE_OFF )
+        return 0;
+
+    if ( (arch_pmc_cnt + fixed_pmc_cnt) == 0 )
+        return -EINVAL;
+
+    if ( !(vpmu_features & XENPMU_FEATURE_INTEL_BTS) )
+        goto func_out;
+    /* Check the 'Debug Store' feature in the CPUID.EAX[1]:EDX[21] */
+    while ( boot_cpu_has(X86_FEATURE_DS) )
+    {
+        if ( !boot_cpu_has(X86_FEATURE_DTES64) )
+        {
+            if ( !ds_warned )
+                printk(XENLOG_G_WARNING "CPU doesn't support 64-bit DS Area"
+                       " - Debug Store disabled for guests\n");
+            break;
+        }
+        vpmu_set(vpmu, VPMU_CPU_HAS_DS);
+        rdmsrl(MSR_IA32_MISC_ENABLE, msr_content);
+        if ( msr_content & MSR_IA32_MISC_ENABLE_BTS_UNAVAIL )
+        {
+            /* If BTS_UNAVAIL is set reset the DS feature. */
+            vpmu_reset(vpmu, VPMU_CPU_HAS_DS);
+            if ( !ds_warned )
+                printk(XENLOG_G_WARNING "CPU has set BTS_UNAVAIL"
+                       " - Debug Store disabled for guests\n");
+            break;
+        }
+
+        vpmu_set(vpmu, VPMU_CPU_HAS_BTS);
+        if ( !ds_warned )
+        {
+            if ( !boot_cpu_has(X86_FEATURE_DSCPL) )
+                printk(XENLOG_G_INFO
+                       "vpmu: CPU doesn't support CPL-Qualified BTS\n");
+            printk("******************************************************\n");
+            printk("** WARNING: Emulation of BTS Feature is switched on **\n");
+            printk("** Using this processor feature in a virtualized    **\n");
+            printk("** environment is not 100%% safe.                    **\n");
+            printk("** Setting the DS buffer address with wrong values  **\n");
+            printk("** may lead to hypervisor hangs or crashes.         **\n");
+            printk("** It is NOT recommended for production use!        **\n");
+            printk("******************************************************\n");
+        }
+        break;
+    }
+    ds_warned = 1;
+ func_out:
+
+    /* PV domains can allocate resources immediately */
+    if ( is_pv_vcpu(v) && !core2_vpmu_alloc_resource(v) )
+        return -EIO;
+
+    vpmu->arch_vpmu_ops = &core2_vpmu_ops;
+
+    return 0;
+}
+
+int __init core2_vpmu_init(void)
+{
+    u64 caps;
+
+    if ( current_cpu_data.x86 != 6 )
+    {
+        printk(XENLOG_WARNING "VPMU: only family 6 is supported\n");
+        return -EINVAL;
+    }
+
+    switch ( current_cpu_data.x86_model )
+    {
+        /* Core2: */
+        case 0x0f: /* original 65 nm celeron/pentium/core2/xeon, "Merom"/"Conroe" */
+        case 0x16: /* single-core 65 nm celeron/core2solo "Merom-L"/"Conroe-L" */
+        case 0x17: /* 45 nm celeron/core2/xeon "Penryn"/"Wolfdale" */
+        case 0x1d: /* six-core 45 nm xeon "Dunnington" */
+
+        case 0x2a: /* SandyBridge */
+        case 0x2d: /* SandyBridge, "Romley-EP" */
+
+        /* Nehalem: */
+        case 0x1a: /* 45 nm nehalem, "Bloomfield" */
+        case 0x1e: /* 45 nm nehalem, "Lynnfield", "Clarksfield", "Jasper Forest" */
+        case 0x2e: /* 45 nm nehalem-ex, "Beckton" */
+
+        /* Westmere: */
+        case 0x25: /* 32 nm nehalem, "Clarkdale", "Arrandale" */
+        case 0x2c: /* 32 nm nehalem, "Gulftown", "Westmere-EP" */
+        case 0x27: /* 32 nm Westmere-EX */
+
+        case 0x3a: /* IvyBridge */
+        case 0x3e: /* IvyBridge EP */
+
+        /* Haswell: */
+        case 0x3c:
+        case 0x3f:
+        case 0x45:
+        case 0x46:
+
+        /* future: */
+        case 0x3d:
+        case 0x4e:
+            break;
+    default:
+        printk(XENLOG_WARNING "VPMU: Unsupported CPU model %#x\n",
+               current_cpu_data.x86_model);
+        return -EINVAL;
+    }
+
+    arch_pmc_cnt = core2_get_arch_pmc_count();
+    fixed_pmc_cnt = core2_get_fixed_pmc_count();
+    rdmsrl(MSR_IA32_PERF_CAPABILITIES, caps);
+    full_width_write = (caps >> 13) & 1;
+
+    check_pmc_quirk();
+
+    if ( sizeof(struct xen_pmu_data) + sizeof(uint64_t) * fixed_pmc_cnt +
+         sizeof(struct xen_pmu_cntr_pair) * arch_pmc_cnt > PAGE_SIZE )
+    {
+        printk(XENLOG_WARNING
+               "VPMU: Register bank does not fit into VPMU share page\n");
+        arch_pmc_cnt = fixed_pmc_cnt = 0;
+        return -ENOSPC;
+    }
+
+    return 0;
+}
diff --git a/xen/arch/x86/hvm/Makefile b/xen/arch/x86/hvm/Makefile
index eea5555..742b83b 100644
--- a/xen/arch/x86/hvm/Makefile
+++ b/xen/arch/x86/hvm/Makefile
@@ -22,4 +22,3 @@ obj-y += vlapic.o
 obj-y += vmsi.o
 obj-y += vpic.o
 obj-y += vpt.o
-obj-y += vpmu.o
\ No newline at end of file
diff --git a/xen/arch/x86/hvm/svm/Makefile b/xen/arch/x86/hvm/svm/Makefile
index a10a55e..760d295 100644
--- a/xen/arch/x86/hvm/svm/Makefile
+++ b/xen/arch/x86/hvm/svm/Makefile
@@ -6,4 +6,3 @@ obj-y += nestedsvm.o
 obj-y += svm.o
 obj-y += svmdebug.o
 obj-y += vmcb.o
-obj-y += vpmu.o
diff --git a/xen/arch/x86/hvm/svm/vpmu.c b/xen/arch/x86/hvm/svm/vpmu.c
deleted file mode 100644
index 0997901..0000000
--- a/xen/arch/x86/hvm/svm/vpmu.c
+++ /dev/null
@@ -1,498 +0,0 @@
-/*
- * vpmu.c: PMU virtualization for HVM domain.
- *
- * Copyright (c) 2010, Advanced Micro Devices, Inc.
- * Parts of this code are Copyright (c) 2007, Intel Corporation
- *
- * Author: Wei Wang <wei.wang2@amd.com>
- * Tested by: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along with
- * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
- * Place - Suite 330, Boston, MA 02111-1307 USA.
- *
- */
-
-#include <xen/config.h>
-#include <xen/xenoprof.h>
-#include <xen/hvm/save.h>
-#include <xen/sched.h>
-#include <xen/irq.h>
-#include <asm/apic.h>
-#include <asm/hvm/vlapic.h>
-#include <asm/hvm/vpmu.h>
-#include <public/pmu.h>
-
-#define MSR_F10H_EVNTSEL_GO_SHIFT   40
-#define MSR_F10H_EVNTSEL_EN_SHIFT   22
-#define MSR_F10H_COUNTER_LENGTH     48
-
-#define is_guest_mode(msr) ((msr) & (1ULL << MSR_F10H_EVNTSEL_GO_SHIFT))
-#define is_pmu_enabled(msr) ((msr) & (1ULL << MSR_F10H_EVNTSEL_EN_SHIFT))
-#define set_guest_mode(msr) (msr |= (1ULL << MSR_F10H_EVNTSEL_GO_SHIFT))
-#define is_overflowed(msr) (!((msr) & (1ULL << (MSR_F10H_COUNTER_LENGTH-1))))
-
-static unsigned int __read_mostly num_counters;
-static const u32 __read_mostly *counters;
-static const u32 __read_mostly *ctrls;
-static bool_t __read_mostly k7_counters_mirrored;
-
-#define F10H_NUM_COUNTERS   4
-#define F15H_NUM_COUNTERS   6
-
-/* PMU Counter MSRs. */
-static const u32 AMD_F10H_COUNTERS[] = {
-    MSR_K7_PERFCTR0,
-    MSR_K7_PERFCTR1,
-    MSR_K7_PERFCTR2,
-    MSR_K7_PERFCTR3
-};
-
-/* PMU Control MSRs. */
-static const u32 AMD_F10H_CTRLS[] = {
-    MSR_K7_EVNTSEL0,
-    MSR_K7_EVNTSEL1,
-    MSR_K7_EVNTSEL2,
-    MSR_K7_EVNTSEL3
-};
-
-static const u32 AMD_F15H_COUNTERS[] = {
-    MSR_AMD_FAM15H_PERFCTR0,
-    MSR_AMD_FAM15H_PERFCTR1,
-    MSR_AMD_FAM15H_PERFCTR2,
-    MSR_AMD_FAM15H_PERFCTR3,
-    MSR_AMD_FAM15H_PERFCTR4,
-    MSR_AMD_FAM15H_PERFCTR5
-};
-
-static const u32 AMD_F15H_CTRLS[] = {
-    MSR_AMD_FAM15H_EVNTSEL0,
-    MSR_AMD_FAM15H_EVNTSEL1,
-    MSR_AMD_FAM15H_EVNTSEL2,
-    MSR_AMD_FAM15H_EVNTSEL3,
-    MSR_AMD_FAM15H_EVNTSEL4,
-    MSR_AMD_FAM15H_EVNTSEL5
-};
-
-/* Use private context as a flag for MSR bitmap */
-#define msr_bitmap_on(vpmu)    do {                                    \
-                                   (vpmu)->priv_context = (void *)-1L; \
-                               } while (0)
-#define msr_bitmap_off(vpmu)   do {                                    \
-                                   (vpmu)->priv_context = NULL;        \
-                               } while (0)
-#define is_msr_bitmap_on(vpmu) ((vpmu)->priv_context != NULL)
-
-static inline int get_pmu_reg_type(u32 addr)
-{
-    if ( (addr >= MSR_K7_EVNTSEL0) && (addr <= MSR_K7_EVNTSEL3) )
-        return MSR_TYPE_CTRL;
-
-    if ( (addr >= MSR_K7_PERFCTR0) && (addr <= MSR_K7_PERFCTR3) )
-        return MSR_TYPE_COUNTER;
-
-    if ( (addr >= MSR_AMD_FAM15H_EVNTSEL0) &&
-         (addr <= MSR_AMD_FAM15H_PERFCTR5 ) )
-    {
-        if (addr & 1)
-            return MSR_TYPE_COUNTER;
-        else
-            return MSR_TYPE_CTRL;
-    }
-
-    /* unsupported registers */
-    return -1;
-}
-
-static inline u32 get_fam15h_addr(u32 addr)
-{
-    switch ( addr )
-    {
-    case MSR_K7_PERFCTR0:
-        return MSR_AMD_FAM15H_PERFCTR0;
-    case MSR_K7_PERFCTR1:
-        return MSR_AMD_FAM15H_PERFCTR1;
-    case MSR_K7_PERFCTR2:
-        return MSR_AMD_FAM15H_PERFCTR2;
-    case MSR_K7_PERFCTR3:
-        return MSR_AMD_FAM15H_PERFCTR3;
-    case MSR_K7_EVNTSEL0:
-        return MSR_AMD_FAM15H_EVNTSEL0;
-    case MSR_K7_EVNTSEL1:
-        return MSR_AMD_FAM15H_EVNTSEL1;
-    case MSR_K7_EVNTSEL2:
-        return MSR_AMD_FAM15H_EVNTSEL2;
-    case MSR_K7_EVNTSEL3:
-        return MSR_AMD_FAM15H_EVNTSEL3;
-    default:
-        break;
-    }
-
-    return addr;
-}
-
-static void amd_vpmu_set_msr_bitmap(struct vcpu *v)
-{
-    unsigned int i;
-    struct vpmu_struct *vpmu = vcpu_vpmu(v);
-
-    for ( i = 0; i < num_counters; i++ )
-    {
-        svm_intercept_msr(v, counters[i], MSR_INTERCEPT_NONE);
-        svm_intercept_msr(v, ctrls[i], MSR_INTERCEPT_WRITE);
-    }
-
-    msr_bitmap_on(vpmu);
-}
-
-static void amd_vpmu_unset_msr_bitmap(struct vcpu *v)
-{
-    unsigned int i;
-    struct vpmu_struct *vpmu = vcpu_vpmu(v);
-
-    for ( i = 0; i < num_counters; i++ )
-    {
-        svm_intercept_msr(v, counters[i], MSR_INTERCEPT_RW);
-        svm_intercept_msr(v, ctrls[i], MSR_INTERCEPT_RW);
-    }
-
-    msr_bitmap_off(vpmu);
-}
-
-static int amd_vpmu_do_interrupt(struct cpu_user_regs *regs)
-{
-    return 1;
-}
-
-static inline void context_load(struct vcpu *v)
-{
-    unsigned int i;
-    struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    struct xen_pmu_amd_ctxt *ctxt = vpmu->context;
-    uint64_t *counter_regs = vpmu_reg_pointer(ctxt, counters);
-    uint64_t *ctrl_regs = vpmu_reg_pointer(ctxt, ctrls);
-
-    for ( i = 0; i < num_counters; i++ )
-    {
-        wrmsrl(counters[i], counter_regs[i]);
-        wrmsrl(ctrls[i], ctrl_regs[i]);
-    }
-}
-
-static void amd_vpmu_load(struct vcpu *v)
-{
-    struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    struct xen_pmu_amd_ctxt *ctxt = vpmu->context;
-    uint64_t *ctrl_regs = vpmu_reg_pointer(ctxt, ctrls);
-
-    vpmu_reset(vpmu, VPMU_FROZEN);
-
-    if ( vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) )
-    {
-        unsigned int i;
-
-        for ( i = 0; i < num_counters; i++ )
-            wrmsrl(ctrls[i], ctrl_regs[i]);
-
-        return;
-    }
-
-    vpmu_set(vpmu, VPMU_CONTEXT_LOADED);
-
-    context_load(v);
-}
-
-static inline void context_save(struct vcpu *v)
-{
-    unsigned int i;
-    struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    struct xen_pmu_amd_ctxt *ctxt = vpmu->context;
-    uint64_t *counter_regs = vpmu_reg_pointer(ctxt, counters);
-
-    /* No need to save controls -- they are saved in amd_vpmu_do_wrmsr */
-    for ( i = 0; i < num_counters; i++ )
-        rdmsrl(counters[i], counter_regs[i]);
-}
-
-static int amd_vpmu_save(struct vcpu *v)
-{
-    struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    unsigned int i;
-
-    for ( i = 0; i < num_counters; i++ )
-        wrmsrl(ctrls[i], 0);
-
-    if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_SAVE) )
-    {
-        vpmu_set(vpmu, VPMU_FROZEN);
-        return 0;
-    }
-
-    if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) )
-        return 0;
-
-    context_save(v);
-
-    if ( !vpmu_is_set(vpmu, VPMU_RUNNING) &&
-         has_hvm_container_vcpu(v) && is_msr_bitmap_on(vpmu) )
-        amd_vpmu_unset_msr_bitmap(v);
-
-    return 1;
-}
-
-static void context_update(unsigned int msr, u64 msr_content)
-{
-    unsigned int i;
-    struct vcpu *v = current;
-    struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    struct xen_pmu_amd_ctxt *ctxt = vpmu->context;
-    uint64_t *counter_regs = vpmu_reg_pointer(ctxt, counters);
-    uint64_t *ctrl_regs = vpmu_reg_pointer(ctxt, ctrls);
-
-    if ( k7_counters_mirrored &&
-        ((msr >= MSR_K7_EVNTSEL0) && (msr <= MSR_K7_PERFCTR3)) )
-    {
-        msr = get_fam15h_addr(msr);
-    }
-
-    for ( i = 0; i < num_counters; i++ )
-    {
-       if ( msr == ctrls[i] )
-       {
-           ctrl_regs[i] = msr_content;
-           return;
-       }
-        else if (msr == counters[i] )
-        {
-            counter_regs[i] = msr_content;
-            return;
-        }
-    }
-}
-
-static int amd_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content,
-                             uint64_t supported)
-{
-    struct vcpu *v = current;
-    struct vpmu_struct *vpmu = vcpu_vpmu(v);
-
-    ASSERT(!supported);
-
-    /* For all counters, enable guest only mode for HVM guest */
-    if ( has_hvm_container_vcpu(v) &&
-         (get_pmu_reg_type(msr) == MSR_TYPE_CTRL) &&
-         !is_guest_mode(msr_content) )
-    {
-        set_guest_mode(msr_content);
-    }
-
-    /* check if the first counter is enabled */
-    if ( (get_pmu_reg_type(msr) == MSR_TYPE_CTRL) &&
-        is_pmu_enabled(msr_content) && !vpmu_is_set(vpmu, VPMU_RUNNING) )
-    {
-        if ( !acquire_pmu_ownership(PMU_OWNER_HVM) )
-            return 0;
-        vpmu_set(vpmu, VPMU_RUNNING);
-
-        if ( has_hvm_container_vcpu(v) && is_msr_bitmap_on(vpmu) )
-             amd_vpmu_set_msr_bitmap(v);
-    }
-
-    /* stop saving & restore if guest stops first counter */
-    if ( (get_pmu_reg_type(msr) == MSR_TYPE_CTRL) &&
-        (is_pmu_enabled(msr_content) == 0) && vpmu_is_set(vpmu, VPMU_RUNNING) )
-    {
-        vpmu_reset(vpmu, VPMU_RUNNING);
-        if ( has_hvm_container_vcpu(v) && is_msr_bitmap_on(vpmu) )
-             amd_vpmu_unset_msr_bitmap(v);
-        release_pmu_ownship(PMU_OWNER_HVM);
-    }
-
-    if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED)
-        || vpmu_is_set(vpmu, VPMU_FROZEN) )
-    {
-        context_load(v);
-        vpmu_set(vpmu, VPMU_CONTEXT_LOADED);
-        vpmu_reset(vpmu, VPMU_FROZEN);
-    }
-
-    /* Update vpmu context immediately */
-    context_update(msr, msr_content);
-
-    /* Write to hw counters */
-    wrmsrl(msr, msr_content);
-    return 0;
-}
-
-static int amd_vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
-{
-    struct vcpu *v = current;
-    struct vpmu_struct *vpmu = vcpu_vpmu(v);
-
-    if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED)
-        || vpmu_is_set(vpmu, VPMU_FROZEN) )
-    {
-        context_load(v);
-        vpmu_set(vpmu, VPMU_CONTEXT_LOADED);
-        vpmu_reset(vpmu, VPMU_FROZEN);
-    }
-
-    rdmsrl(msr, *msr_content);
-
-    return 0;
-}
-
-static void amd_vpmu_destroy(struct vcpu *v)
-{
-    struct vpmu_struct *vpmu = vcpu_vpmu(v);
-
-    if ( has_hvm_container_vcpu(v) )
-    {
-        if ( is_msr_bitmap_on(vpmu) )
-            amd_vpmu_unset_msr_bitmap(v);
-
-        if ( is_hvm_vcpu(v) )
-            xfree(vpmu->context);
-
-        release_pmu_ownship(PMU_OWNER_HVM);
-    }
-
-    vpmu->context = NULL;
-    vpmu_clear(vpmu);
-}
-
-/* VPMU part of the 'q' keyhandler */
-static void amd_vpmu_dump(const struct vcpu *v)
-{
-    const struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    const struct xen_pmu_amd_ctxt *ctxt = vpmu->context;
-    const uint64_t *counter_regs = vpmu_reg_pointer(ctxt, counters);
-    const uint64_t *ctrl_regs = vpmu_reg_pointer(ctxt, ctrls);
-    unsigned int i;
-
-    printk("    VPMU state: 0x%x ", vpmu->flags);
-    if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
-    {
-         printk("\n");
-         return;
-    }
-
-    printk("(");
-    if ( vpmu_is_set(vpmu, VPMU_PASSIVE_DOMAIN_ALLOCATED) )
-        printk("PASSIVE_DOMAIN_ALLOCATED, ");
-    if ( vpmu_is_set(vpmu, VPMU_FROZEN) )
-        printk("FROZEN, ");
-    if ( vpmu_is_set(vpmu, VPMU_CONTEXT_SAVE) )
-        printk("SAVE, ");
-    if ( vpmu_is_set(vpmu, VPMU_RUNNING) )
-        printk("RUNNING, ");
-    if ( vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) )
-        printk("LOADED, ");
-    printk("ALLOCATED)\n");
-
-    for ( i = 0; i < num_counters; i++ )
-    {
-        uint64_t ctrl, cntr;
-
-        rdmsrl(ctrls[i], ctrl);
-        rdmsrl(counters[i], cntr);
-        printk("      %#x: %#lx (%#lx in HW)    %#x: %#lx (%#lx in HW)\n",
-               ctrls[i], ctrl_regs[i], ctrl,
-               counters[i], counter_regs[i], cntr);
-    }
-}
-
-struct arch_vpmu_ops amd_vpmu_ops = {
-    .do_wrmsr = amd_vpmu_do_wrmsr,
-    .do_rdmsr = amd_vpmu_do_rdmsr,
-    .do_interrupt = amd_vpmu_do_interrupt,
-    .arch_vpmu_destroy = amd_vpmu_destroy,
-    .arch_vpmu_save = amd_vpmu_save,
-    .arch_vpmu_load = amd_vpmu_load,
-    .arch_vpmu_dump = amd_vpmu_dump
-};
-
-int svm_vpmu_initialise(struct vcpu *v)
-{
-    struct xen_pmu_amd_ctxt *ctxt;
-    struct vpmu_struct *vpmu = vcpu_vpmu(v);
-
-    if ( vpmu_mode == XENPMU_MODE_OFF )
-        return 0;
-
-    if ( !counters )
-        return -EINVAL;
-
-    if ( is_hvm_vcpu(v) )
-    {
-        ctxt = xzalloc_bytes(sizeof(*ctxt) +
-                             2 * sizeof(uint64_t) * num_counters);
-        if ( !ctxt )
-        {
-            printk(XENLOG_G_WARNING "%pv: Insufficient memory for PMU, "
-                   " PMU feature is unavailable\n", v);
-            return -ENOMEM;
-        }
-    }
-    else
-        ctxt = &v->arch.vpmu.xenpmu_data->pmu.c.amd;
-
-    ctxt->counters = sizeof(*ctxt);
-    ctxt->ctrls = ctxt->counters + sizeof(uint64_t) * num_counters;
-
-    vpmu->context = ctxt;
-    vpmu->priv_context = NULL;
-
-    vpmu->arch_vpmu_ops = &amd_vpmu_ops;
-
-    vpmu_set(vpmu, VPMU_CONTEXT_ALLOCATED);
-    return 0;
-}
-
-int __init amd_vpmu_init(void)
-{
-    switch ( current_cpu_data.x86 )
-    {
-    case 0x15:
-        num_counters = F15H_NUM_COUNTERS;
-        counters = AMD_F15H_COUNTERS;
-        ctrls = AMD_F15H_CTRLS;
-        k7_counters_mirrored = 1;
-        break;
-    case 0x10:
-    case 0x12:
-    case 0x14:
-    case 0x16:
-        num_counters = F10H_NUM_COUNTERS;
-        counters = AMD_F10H_COUNTERS;
-        ctrls = AMD_F10H_CTRLS;
-        k7_counters_mirrored = 0;
-        break;
-    default:
-        printk(XENLOG_WARNING "VPMU: Unsupported CPU family %#x\n",
-               current_cpu_data.x86);
-        return -EINVAL;
-    }
-
-    if ( sizeof(struct xen_pmu_data) +
-         2 * sizeof(uint64_t) * num_counters > PAGE_SIZE )
-    {
-        printk(XENLOG_WARNING
-               "VPMU: Register bank does not fit into VPMU shared page\n");
-        counters = ctrls = NULL;
-        num_counters = 0;
-        return -ENOSPC;
-    }
-
-    return 0;
-}
-
diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
index cee8699..56b168c 100644
--- a/xen/arch/x86/hvm/vlapic.c
+++ b/xen/arch/x86/hvm/vlapic.c
@@ -33,12 +33,12 @@
 #include <asm/page.h>
 #include <asm/apic.h>
 #include <asm/io_apic.h>
+#include <asm/vpmu.h>
 #include <asm/hvm/hvm.h>
 #include <asm/hvm/io.h>
 #include <asm/hvm/support.h>
 #include <asm/hvm/vmx/vmx.h>
 #include <asm/hvm/nestedhvm.h>
-#include <asm/hvm/vpmu.h>
 #include <public/hvm/ioreq.h>
 #include <public/hvm/params.h>
 
diff --git a/xen/arch/x86/hvm/vmx/Makefile b/xen/arch/x86/hvm/vmx/Makefile
index 373b3d9..04a29ce 100644
--- a/xen/arch/x86/hvm/vmx/Makefile
+++ b/xen/arch/x86/hvm/vmx/Makefile
@@ -3,5 +3,4 @@ obj-y += intr.o
 obj-y += realmode.o
 obj-y += vmcs.o
 obj-y += vmx.o
-obj-y += vpmu_core2.o
 obj-y += vvmx.o
diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c
deleted file mode 100644
index 66d7bc0..0000000
--- a/xen/arch/x86/hvm/vmx/vpmu_core2.c
+++ /dev/null
@@ -1,939 +0,0 @@
-/*
- * vpmu_core2.c: CORE 2 specific PMU virtualization for HVM domain.
- *
- * Copyright (c) 2007, Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along with
- * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
- * Place - Suite 330, Boston, MA 02111-1307 USA.
- *
- * Author: Haitao Shan <haitao.shan@intel.com>
- */
-
-#include <xen/config.h>
-#include <xen/sched.h>
-#include <xen/xenoprof.h>
-#include <xen/irq.h>
-#include <asm/system.h>
-#include <asm/regs.h>
-#include <asm/types.h>
-#include <asm/apic.h>
-#include <asm/traps.h>
-#include <asm/msr.h>
-#include <asm/msr-index.h>
-#include <asm/hvm/support.h>
-#include <asm/hvm/vlapic.h>
-#include <asm/hvm/vmx/vmx.h>
-#include <asm/hvm/vmx/vmcs.h>
-#include <public/sched.h>
-#include <public/hvm/save.h>
-#include <public/pmu.h>
-#include <asm/hvm/vpmu.h>
-
-/*
- * See Intel SDM Vol 2a Instruction Set Reference chapter 3 for CPUID
- * instruction.
- * cpuid 0xa - Architectural Performance Monitoring Leaf
- * Register eax
- */
-#define PMU_VERSION_SHIFT        0  /* Version ID */
-#define PMU_VERSION_BITS         8  /* 8 bits 0..7 */
-#define PMU_VERSION_MASK         (((1 << PMU_VERSION_BITS) - 1) << PMU_VERSION_SHIFT)
-
-#define PMU_GENERAL_NR_SHIFT     8  /* Number of general pmu registers */
-#define PMU_GENERAL_NR_BITS      8  /* 8 bits 8..15 */
-#define PMU_GENERAL_NR_MASK      (((1 << PMU_GENERAL_NR_BITS) - 1) << PMU_GENERAL_NR_SHIFT)
-
-#define PMU_GENERAL_WIDTH_SHIFT 16  /* Width of general pmu registers */
-#define PMU_GENERAL_WIDTH_BITS   8  /* 8 bits 16..23 */
-#define PMU_GENERAL_WIDTH_MASK  (((1 << PMU_GENERAL_WIDTH_BITS) - 1) << PMU_GENERAL_WIDTH_SHIFT)
-/* Register edx */
-#define PMU_FIXED_NR_SHIFT       0  /* Number of fixed pmu registers */
-#define PMU_FIXED_NR_BITS        5  /* 5 bits 0..4 */
-#define PMU_FIXED_NR_MASK        (((1 << PMU_FIXED_NR_BITS) -1) << PMU_FIXED_NR_SHIFT)
-
-#define PMU_FIXED_WIDTH_SHIFT    5  /* Width of fixed pmu registers */
-#define PMU_FIXED_WIDTH_BITS     8  /* 8 bits 5..12 */
-#define PMU_FIXED_WIDTH_MASK     (((1 << PMU_FIXED_WIDTH_BITS) -1) << PMU_FIXED_WIDTH_SHIFT)
-
-/* Alias registers (0x4c1) for full-width writes to PMCs */
-#define MSR_PMC_ALIAS_MASK       (~(MSR_IA32_PERFCTR0 ^ MSR_IA32_A_PERFCTR0))
-static bool_t __read_mostly full_width_write;
-
-/* Intel-specific VPMU features */
-#define VPMU_CPU_HAS_DS                     0x100 /* Has Debug Store */
-#define VPMU_CPU_HAS_BTS                    0x200 /* Has Branch Trace Store */
-
-/*
- * MSR_CORE_PERF_FIXED_CTR_CTRL contains the configuration of all fixed
- * counters. 4 bits for every counter.
- */
-#define FIXED_CTR_CTRL_BITS 4
-#define FIXED_CTR_CTRL_MASK ((1 << FIXED_CTR_CTRL_BITS) - 1)
-
-/* Number of general-purpose and fixed performance counters */
-static unsigned int __read_mostly arch_pmc_cnt, fixed_pmc_cnt;
-
-/*
- * QUIRK to workaround an issue on various family 6 cpus.
- * The issue leads to endless PMC interrupt loops on the processor.
- * If the interrupt handler is running and a pmc reaches the value 0, this
- * value remains forever and it triggers immediately a new interrupt after
- * finishing the handler.
- * A workaround is to read all flagged counters and if the value is 0 write
- * 1 (or another value != 0) into it.
- * There exist no errata and the real cause of this behaviour is unknown.
- */
-bool_t __read_mostly is_pmc_quirk;
-
-static void check_pmc_quirk(void)
-{
-    if ( current_cpu_data.x86 == 6 )
-        is_pmc_quirk = 1;
-    else
-        is_pmc_quirk = 0;    
-}
-
-static void handle_pmc_quirk(u64 msr_content)
-{
-    int i;
-    u64 val;
-
-    if ( !is_pmc_quirk )
-        return;
-
-    val = msr_content;
-    for ( i = 0; i < arch_pmc_cnt; i++ )
-    {
-        if ( val & 0x1 )
-        {
-            u64 cnt;
-            rdmsrl(MSR_P6_PERFCTR(i), cnt);
-            if ( cnt == 0 )
-                wrmsrl(MSR_P6_PERFCTR(i), 1);
-        }
-        val >>= 1;
-    }
-    val = msr_content >> 32;
-    for ( i = 0; i < fixed_pmc_cnt; i++ )
-    {
-        if ( val & 0x1 )
-        {
-            u64 cnt;
-            rdmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, cnt);
-            if ( cnt == 0 )
-                wrmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, 1);
-        }
-        val >>= 1;
-    }
-}
-
-/*
- * Read the number of general counters via CPUID.EAX[0xa].EAX[8..15]
- */
-static int core2_get_arch_pmc_count(void)
-{
-    u32 eax;
-
-    eax = cpuid_eax(0xa);
-    return MASK_EXTR(eax, PMU_GENERAL_NR_MASK);
-}
-
-/*
- * Read the number of fixed counters via CPUID.EDX[0xa].EDX[0..4]
- */
-static int core2_get_fixed_pmc_count(void)
-{
-    u32 eax;
-
-    eax = cpuid_eax(0xa);
-    return MASK_EXTR(eax, PMU_FIXED_NR_MASK);
-}
-
-/* edx bits 5-12: Bit width of fixed-function performance counters  */
-static int core2_get_bitwidth_fix_count(void)
-{
-    u32 edx;
-
-    edx = cpuid_edx(0xa);
-    return MASK_EXTR(edx, PMU_FIXED_WIDTH_MASK);
-}
-
-static int is_core2_vpmu_msr(u32 msr_index, int *type, int *index)
-{
-    u32 msr_index_pmc;
-
-    switch ( msr_index )
-    {
-    case MSR_CORE_PERF_FIXED_CTR_CTRL:
-    case MSR_IA32_DS_AREA:
-    case MSR_IA32_PEBS_ENABLE:
-        *type = MSR_TYPE_CTRL;
-        return 1;
-
-    case MSR_CORE_PERF_GLOBAL_CTRL:
-    case MSR_CORE_PERF_GLOBAL_STATUS:
-    case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
-        *type = MSR_TYPE_GLOBAL;
-        return 1;
-
-    default:
-
-        if ( (msr_index >= MSR_CORE_PERF_FIXED_CTR0) &&
-             (msr_index < MSR_CORE_PERF_FIXED_CTR0 + fixed_pmc_cnt) )
-        {
-            *index = msr_index - MSR_CORE_PERF_FIXED_CTR0;
-            *type = MSR_TYPE_COUNTER;
-            return 1;
-        }
-
-        if ( (msr_index >= MSR_P6_EVNTSEL(0)) &&
-             (msr_index < MSR_P6_EVNTSEL(arch_pmc_cnt)) )
-        {
-            *index = msr_index - MSR_P6_EVNTSEL(0);
-            *type = MSR_TYPE_ARCH_CTRL;
-            return 1;
-        }
-
-        msr_index_pmc = msr_index & MSR_PMC_ALIAS_MASK;
-        if ( (msr_index_pmc >= MSR_IA32_PERFCTR0) &&
-             (msr_index_pmc < (MSR_IA32_PERFCTR0 + arch_pmc_cnt)) )
-        {
-            *type = MSR_TYPE_ARCH_COUNTER;
-            *index = msr_index_pmc - MSR_IA32_PERFCTR0;
-            return 1;
-        }
-        return 0;
-    }
-}
-
-static inline int msraddr_to_bitpos(int x)
-{
-    ASSERT(x == (x & 0x1fff));
-    return x;
-}
-
-static void core2_vpmu_set_msr_bitmap(unsigned long *msr_bitmap)
-{
-    int i;
-
-    /* Allow Read/Write PMU Counters MSR Directly. */
-    for ( i = 0; i < fixed_pmc_cnt; i++ )
-    {
-        clear_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR0 + i), msr_bitmap);
-        clear_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR0 + i),
-                  msr_bitmap + 0x800/BYTES_PER_LONG);
-    }
-    for ( i = 0; i < arch_pmc_cnt; i++ )
-    {
-        clear_bit(msraddr_to_bitpos(MSR_IA32_PERFCTR0+i), msr_bitmap);
-        clear_bit(msraddr_to_bitpos(MSR_IA32_PERFCTR0+i),
-                  msr_bitmap + 0x800/BYTES_PER_LONG);
-
-        if ( full_width_write )
-        {
-            clear_bit(msraddr_to_bitpos(MSR_IA32_A_PERFCTR0 + i), msr_bitmap);
-            clear_bit(msraddr_to_bitpos(MSR_IA32_A_PERFCTR0 + i),
-                      msr_bitmap + 0x800/BYTES_PER_LONG);
-        }
-    }
-
-    /* Allow Read PMU Non-global Controls Directly. */
-    for ( i = 0; i < arch_pmc_cnt; i++ )
-         clear_bit(msraddr_to_bitpos(MSR_P6_EVNTSEL(i)), msr_bitmap);
-
-    clear_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR_CTRL), msr_bitmap);
-    clear_bit(msraddr_to_bitpos(MSR_IA32_PEBS_ENABLE), msr_bitmap);
-    clear_bit(msraddr_to_bitpos(MSR_IA32_DS_AREA), msr_bitmap);
-}
-
-static void core2_vpmu_unset_msr_bitmap(unsigned long *msr_bitmap)
-{
-    int i;
-
-    for ( i = 0; i < fixed_pmc_cnt; i++ )
-    {
-        set_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR0 + i), msr_bitmap);
-        set_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR0 + i),
-                msr_bitmap + 0x800/BYTES_PER_LONG);
-    }
-    for ( i = 0; i < arch_pmc_cnt; i++ )
-    {
-        set_bit(msraddr_to_bitpos(MSR_IA32_PERFCTR0 + i), msr_bitmap);
-        set_bit(msraddr_to_bitpos(MSR_IA32_PERFCTR0 + i),
-                msr_bitmap + 0x800/BYTES_PER_LONG);
-
-        if ( full_width_write )
-        {
-            set_bit(msraddr_to_bitpos(MSR_IA32_A_PERFCTR0 + i), msr_bitmap);
-            set_bit(msraddr_to_bitpos(MSR_IA32_A_PERFCTR0 + i),
-                      msr_bitmap + 0x800/BYTES_PER_LONG);
-        }
-    }
-
-    for ( i = 0; i < arch_pmc_cnt; i++ )
-        set_bit(msraddr_to_bitpos(MSR_P6_EVNTSEL(i)), msr_bitmap);
-
-    set_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR_CTRL), msr_bitmap);
-    set_bit(msraddr_to_bitpos(MSR_IA32_PEBS_ENABLE), msr_bitmap);
-    set_bit(msraddr_to_bitpos(MSR_IA32_DS_AREA), msr_bitmap);
-}
-
-static inline void __core2_vpmu_save(struct vcpu *v)
-{
-    int i;
-    struct xen_pmu_intel_ctxt *core2_vpmu_cxt = vcpu_vpmu(v)->context;
-    uint64_t *fixed_counters = vpmu_reg_pointer(core2_vpmu_cxt, fixed_counters);
-    struct xen_pmu_cntr_pair *xen_pmu_cntr_pair =
-        vpmu_reg_pointer(core2_vpmu_cxt, arch_counters);
-
-    for ( i = 0; i < fixed_pmc_cnt; i++ )
-        rdmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, fixed_counters[i]);
-    for ( i = 0; i < arch_pmc_cnt; i++ )
-        rdmsrl(MSR_IA32_PERFCTR0 + i, xen_pmu_cntr_pair[i].counter);
-
-    if ( !has_hvm_container_vcpu(v) )
-        rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, core2_vpmu_cxt->global_status);
-}
-
-static int core2_vpmu_save(struct vcpu *v)
-{
-    struct vpmu_struct *vpmu = vcpu_vpmu(v);
-
-    if ( !has_hvm_container_vcpu(v) )
-        wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0);
-
-    if ( !vpmu_are_all_set(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED) )
-        return 0;
-
-    __core2_vpmu_save(v);
-
-    /* Unset PMU MSR bitmap to trap lazy load. */
-    if ( !vpmu_is_set(vpmu, VPMU_RUNNING) &&
-         has_hvm_container_vcpu(v) && cpu_has_vmx_msr_bitmap )
-        core2_vpmu_unset_msr_bitmap(v->arch.hvm_vmx.msr_bitmap);
-
-    return 1;
-}
-
-static inline void __core2_vpmu_load(struct vcpu *v)
-{
-    unsigned int i, pmc_start;
-    struct xen_pmu_intel_ctxt *core2_vpmu_cxt = vcpu_vpmu(v)->context;
-    uint64_t *fixed_counters = vpmu_reg_pointer(core2_vpmu_cxt, fixed_counters);
-    struct xen_pmu_cntr_pair *xen_pmu_cntr_pair =
-        vpmu_reg_pointer(core2_vpmu_cxt, arch_counters);
-
-    for ( i = 0; i < fixed_pmc_cnt; i++ )
-        wrmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, fixed_counters[i]);
-
-    if ( full_width_write )
-        pmc_start = MSR_IA32_A_PERFCTR0;
-    else
-        pmc_start = MSR_IA32_PERFCTR0;
-    for ( i = 0; i < arch_pmc_cnt; i++ )
-    {
-        wrmsrl(pmc_start + i, xen_pmu_cntr_pair[i].counter);
-        wrmsrl(MSR_P6_EVNTSEL(i), xen_pmu_cntr_pair[i].control);
-    }
-
-    wrmsrl(MSR_CORE_PERF_FIXED_CTR_CTRL, core2_vpmu_cxt->fixed_ctrl);
-    wrmsrl(MSR_IA32_DS_AREA, core2_vpmu_cxt->ds_area);
-    wrmsrl(MSR_IA32_PEBS_ENABLE, core2_vpmu_cxt->pebs_enable);
-
-    if ( !has_hvm_container_vcpu(v) )
-    {
-        wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, core2_vpmu_cxt->global_ovf_ctrl);
-        core2_vpmu_cxt->global_ovf_ctrl = 0;
-        wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, core2_vpmu_cxt->global_ctrl);
-    }
-}
-
-static void core2_vpmu_load(struct vcpu *v)
-{
-    struct vpmu_struct *vpmu = vcpu_vpmu(v);
-
-    if ( vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) )
-        return;
-
-    vpmu_set(vpmu, VPMU_CONTEXT_LOADED);
-
-    __core2_vpmu_load(v);
-}
-
-static int core2_vpmu_alloc_resource(struct vcpu *v)
-{
-    struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    struct xen_pmu_intel_ctxt *core2_vpmu_cxt = NULL;
-    uint64_t *p = NULL;
-
-    p = xzalloc(uint64_t);
-    if ( !p )
-        goto out_err;
-
-    if ( has_hvm_container_vcpu(v) )
-    {
-        if ( is_hvm_vcpu(v) && !acquire_pmu_ownership(PMU_OWNER_HVM) )
-            goto out_err;
-
-        wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0);
-        if ( vmx_add_host_load_msr(MSR_CORE_PERF_GLOBAL_CTRL) )
-            goto out_err_hvm;
-        if ( vmx_add_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL) )
-            goto out_err_hvm;
-        vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, 0);
-    }
-
-    if ( is_hvm_vcpu(v) )
-    {
-        core2_vpmu_cxt = xzalloc_bytes(sizeof(*core2_vpmu_cxt) +
-                                       sizeof(uint64_t) * fixed_pmc_cnt +
-                                       sizeof(struct xen_pmu_cntr_pair) *
-                                       arch_pmc_cnt);
-        if ( !core2_vpmu_cxt )
-            goto out_err_hvm;
-    }
-    else
-        core2_vpmu_cxt = &v->arch.vpmu.xenpmu_data->pmu.c.intel;
-
-    core2_vpmu_cxt->fixed_counters = sizeof(*core2_vpmu_cxt);
-    core2_vpmu_cxt->arch_counters = core2_vpmu_cxt->fixed_counters +
-                                    sizeof(uint64_t) * fixed_pmc_cnt;
-
-    vpmu->context = core2_vpmu_cxt;
-    vpmu->priv_context = p;
-
-    vpmu_set(vpmu, VPMU_CONTEXT_ALLOCATED);
-
-    return 1;
-
- out_err_hvm:
-    xfree(core2_vpmu_cxt);
-    if ( is_hvm_vcpu(v) )
-        release_pmu_ownship(PMU_OWNER_HVM);
-
- out_err:
-    xfree(p);
-
-    printk("Failed to allocate VPMU resources for domain %u vcpu %u\n",
-           v->vcpu_id, v->domain->domain_id);
-
-    return 0;
-}
-
-static int core2_vpmu_msr_common_check(u32 msr_index, int *type, int *index)
-{
-    struct vpmu_struct *vpmu = vcpu_vpmu(current);
-
-    if ( !is_core2_vpmu_msr(msr_index, type, index) )
-        return 0;
-
-    if ( unlikely(!vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED)) &&
-         !core2_vpmu_alloc_resource(current) )
-        return 0;
-
-    /* Do the lazy load staff. */
-    if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) )
-    {
-        __core2_vpmu_load(current);
-        vpmu_set(vpmu, VPMU_CONTEXT_LOADED);
-        if ( has_hvm_container_vcpu(current) &&
-             cpu_has_vmx_msr_bitmap )
-            core2_vpmu_set_msr_bitmap(current->arch.hvm_vmx.msr_bitmap);
-    }
-    return 1;
-}
-
-static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content,
-                               uint64_t supported)
-{
-    int i, tmp;
-    int type = -1, index = -1;
-    struct vcpu *v = current;
-    struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    struct xen_pmu_intel_ctxt *core2_vpmu_cxt;
-    uint64_t *enabled_cntrs;
-
-    if ( !core2_vpmu_msr_common_check(msr, &type, &index) )
-    {
-        /* Special handling for BTS */
-        if ( msr == MSR_IA32_DEBUGCTLMSR )
-        {
-            supported |= IA32_DEBUGCTLMSR_TR | IA32_DEBUGCTLMSR_BTS |
-                         IA32_DEBUGCTLMSR_BTINT;
-
-            if ( cpu_has(&current_cpu_data, X86_FEATURE_DSCPL) )
-                supported |= IA32_DEBUGCTLMSR_BTS_OFF_OS |
-                             IA32_DEBUGCTLMSR_BTS_OFF_USR;
-            if ( !(msr_content & ~supported) &&
-                 vpmu_is_set(vpmu, VPMU_CPU_HAS_BTS) )
-                return 0;
-            if ( (msr_content & supported) &&
-                 !vpmu_is_set(vpmu, VPMU_CPU_HAS_BTS) )
-                printk(XENLOG_G_WARNING
-                       "%pv: Debug Store unsupported on this CPU\n",
-                       current);
-        }
-        return 1;
-    }
-
-    ASSERT(!supported);
-
-    if ( type == MSR_TYPE_COUNTER &&
-         (msr_content &
-          ~((1ull << core2_get_bitwidth_fix_count()) - 1)) )
-        /* Writing unsupported bits to a fixed counter */
-        return 1;
-
-    core2_vpmu_cxt = vpmu->context;
-    enabled_cntrs = vpmu->priv_context;
-    switch ( msr )
-    {
-    case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
-        if ( msr_content & ~(0xC000000000000000 |
-                             (((1ULL << fixed_pmc_cnt) - 1) << 32) |
-                             ((1ULL << arch_pmc_cnt) - 1)) )
-            return 1;
-        core2_vpmu_cxt->global_status &= ~msr_content;
-        wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, msr_content);
-        return 0;
-    case MSR_CORE_PERF_GLOBAL_STATUS:
-        gdprintk(XENLOG_INFO, "Can not write readonly MSR: "
-                 "MSR_PERF_GLOBAL_STATUS(0x38E)!\n");
-        return 1;
-    case MSR_IA32_PEBS_ENABLE:
-        if ( msr_content & 1 )
-            gdprintk(XENLOG_WARNING, "Guest is trying to enable PEBS, "
-                     "which is not supported.\n");
-        core2_vpmu_cxt->pebs_enable = msr_content;
-        return 0;
-    case MSR_IA32_DS_AREA:
-        if ( vpmu_is_set(vpmu, VPMU_CPU_HAS_DS) )
-        {
-            if ( !is_canonical_address(msr_content) )
-            {
-                gdprintk(XENLOG_WARNING,
-                         "Illegal address for IA32_DS_AREA: %#" PRIx64 "x\n",
-                         msr_content);
-                return 1;
-            }
-            core2_vpmu_cxt->ds_area = msr_content;
-            break;
-        }
-        gdprintk(XENLOG_WARNING, "Guest setting of DTS is ignored.\n");
-        return 0;
-    case MSR_CORE_PERF_GLOBAL_CTRL:
-        core2_vpmu_cxt->global_ctrl = msr_content;
-        break;
-    case MSR_CORE_PERF_FIXED_CTR_CTRL:
-        if ( msr_content &
-             ( ~((1ull << (fixed_pmc_cnt * FIXED_CTR_CTRL_BITS)) - 1)) )
-            return 1;
-
-        if ( has_hvm_container_vcpu(v) )
-            vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL,
-                               &core2_vpmu_cxt->global_ctrl);
-        else
-            rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, core2_vpmu_cxt->global_ctrl);
-        *enabled_cntrs &= ~(((1ULL << fixed_pmc_cnt) - 1) << 32);
-        if ( msr_content != 0 )
-        {
-            u64 val = msr_content;
-            for ( i = 0; i < fixed_pmc_cnt; i++ )
-            {
-                if ( val & 3 )
-                    *enabled_cntrs |= (1ULL << 32) << i;
-                val >>= FIXED_CTR_CTRL_BITS;
-            }
-        }
-
-        core2_vpmu_cxt->fixed_ctrl = msr_content;
-        break;
-    default:
-        tmp = msr - MSR_P6_EVNTSEL(0);
-        if ( tmp >= 0 && tmp < arch_pmc_cnt )
-        {
-            struct xen_pmu_cntr_pair *xen_pmu_cntr_pair =
-                vpmu_reg_pointer(core2_vpmu_cxt, arch_counters);
-
-            if ( msr_content & (~((1ull << 32) - 1)) )
-                return 1;
-
-            if ( has_hvm_container_vcpu(v) )
-                vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL,
-                                   &core2_vpmu_cxt->global_ctrl);
-            else
-                rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, core2_vpmu_cxt->global_ctrl);
-
-            if ( msr_content & (1ULL << 22) )
-                *enabled_cntrs |= 1ULL << tmp;
-            else
-                *enabled_cntrs &= ~(1ULL << tmp);
-
-            xen_pmu_cntr_pair[tmp].control = msr_content;
-        }
-    }
-
-    if ( type != MSR_TYPE_GLOBAL )
-        wrmsrl(msr, msr_content);
-    else
-    {
-        if ( has_hvm_container_vcpu(v) )
-            vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, msr_content);
-        else
-            wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, msr_content);
-    }
-
-    if ( (core2_vpmu_cxt->global_ctrl & *enabled_cntrs) ||
-         (core2_vpmu_cxt->ds_area != 0) )
-        vpmu_set(vpmu, VPMU_RUNNING);
-    else
-        vpmu_reset(vpmu, VPMU_RUNNING);
-
-    return 0;
-}
-
-static int core2_vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
-{
-    int type = -1, index = -1;
-    struct vcpu *v = current;
-    struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    struct xen_pmu_intel_ctxt *core2_vpmu_cxt;
-
-    if ( core2_vpmu_msr_common_check(msr, &type, &index) )
-    {
-        core2_vpmu_cxt = vpmu->context;
-        switch ( msr )
-        {
-        case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
-            *msr_content = 0;
-            break;
-        case MSR_CORE_PERF_GLOBAL_STATUS:
-            *msr_content = core2_vpmu_cxt->global_status;
-            break;
-        case MSR_CORE_PERF_GLOBAL_CTRL:
-            if ( has_hvm_container_vcpu(v) )
-                vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, msr_content);
-            else
-                rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, *msr_content);
-            break;
-        default:
-            rdmsrl(msr, *msr_content);
-        }
-    }
-    else if ( msr == MSR_IA32_MISC_ENABLE )
-    {
-        /* Extension for BTS */
-        if ( vpmu_is_set(vpmu, VPMU_CPU_HAS_BTS) )
-            *msr_content &= ~MSR_IA32_MISC_ENABLE_BTS_UNAVAIL;
-    }
-
-    return 0;
-}
-
-static void core2_vpmu_do_cpuid(unsigned int input,
-                                unsigned int *eax, unsigned int *ebx,
-                                unsigned int *ecx, unsigned int *edx)
-{
-    if (input == 0x1)
-    {
-        struct vpmu_struct *vpmu = vcpu_vpmu(current);
-
-        if ( vpmu_is_set(vpmu, VPMU_CPU_HAS_DS) )
-        {
-            /* Switch on the 'Debug Store' feature in CPUID.EAX[1]:EDX[21] */
-            *edx |= cpufeat_mask(X86_FEATURE_DS);
-            if ( cpu_has(&current_cpu_data, X86_FEATURE_DTES64) )
-                *ecx |= cpufeat_mask(X86_FEATURE_DTES64);
-            if ( cpu_has(&current_cpu_data, X86_FEATURE_DSCPL) )
-                *ecx |= cpufeat_mask(X86_FEATURE_DSCPL);
-        }
-    }
-}
-
-/* Dump vpmu info on console, called in the context of keyhandler 'q'. */
-static void core2_vpmu_dump(const struct vcpu *v)
-{
-    const struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    unsigned int i;
-    const struct xen_pmu_intel_ctxt *core2_vpmu_cxt = vpmu->context;
-    u64 val;
-    uint64_t *fixed_counters;
-    struct xen_pmu_cntr_pair *cntr_pair;
-
-    if ( !core2_vpmu_cxt || !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
-         return;
-
-    if ( !vpmu_is_set(vpmu, VPMU_RUNNING) )
-    {
-        if ( vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) )
-            printk("    vPMU loaded\n");
-        else
-            printk("    vPMU allocated\n");
-        return;
-    }
-
-    printk("    vPMU running\n");
-
-    cntr_pair = vpmu_reg_pointer(core2_vpmu_cxt, arch_counters);
-    fixed_counters = vpmu_reg_pointer(core2_vpmu_cxt, fixed_counters);
-
-    /* Print the contents of the counter and its configuration msr. */
-    for ( i = 0; i < arch_pmc_cnt; i++ )
-        printk("      general_%d: 0x%016lx ctrl: 0x%016lx\n",
-            i, cntr_pair[i].counter, cntr_pair[i].control);
-
-    /*
-     * The configuration of the fixed counter is 4 bits each in the
-     * MSR_CORE_PERF_FIXED_CTR_CTRL.
-     */
-    val = core2_vpmu_cxt->fixed_ctrl;
-    for ( i = 0; i < fixed_pmc_cnt; i++ )
-    {
-        printk("      fixed_%d:   0x%016lx ctrl: %#lx\n",
-               i, fixed_counters[i],
-               val & FIXED_CTR_CTRL_MASK);
-        val >>= FIXED_CTR_CTRL_BITS;
-    }
-}
-
-static int core2_vpmu_do_interrupt(struct cpu_user_regs *regs)
-{
-    struct vcpu *v = current;
-    u64 msr_content;
-    struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    struct xen_pmu_intel_ctxt *core2_vpmu_cxt = vpmu->context;
-
-    rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, msr_content);
-    if ( msr_content )
-    {
-        if ( is_pmc_quirk )
-            handle_pmc_quirk(msr_content);
-        core2_vpmu_cxt->global_status |= msr_content;
-        msr_content = 0xC000000700000000 | ((1 << arch_pmc_cnt) - 1);
-        wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, msr_content);
-    }
-    else
-    {
-        /* No PMC overflow but perhaps a Trace Message interrupt. */
-        __vmread(GUEST_IA32_DEBUGCTL, &msr_content);
-        if ( !(msr_content & IA32_DEBUGCTLMSR_TR) )
-            return 0;
-    }
-
-    return 1;
-}
-
-static void core2_vpmu_destroy(struct vcpu *v)
-{
-    struct vpmu_struct *vpmu = vcpu_vpmu(v);
-
-    if ( has_hvm_container_vcpu(v) )
-    {
-        if ( cpu_has_vmx_msr_bitmap )
-            core2_vpmu_unset_msr_bitmap(v->arch.hvm_vmx.msr_bitmap);
-
-        if ( is_hvm_vcpu(v) )
-            xfree(vpmu->context);
-
-        release_pmu_ownship(PMU_OWNER_HVM);
-    }
-
-    xfree(vpmu->priv_context);
-    vpmu->context = NULL;
-    vpmu_clear(vpmu);
-}
-
-struct arch_vpmu_ops core2_vpmu_ops = {
-    .do_wrmsr = core2_vpmu_do_wrmsr,
-    .do_rdmsr = core2_vpmu_do_rdmsr,
-    .do_interrupt = core2_vpmu_do_interrupt,
-    .do_cpuid = core2_vpmu_do_cpuid,
-    .arch_vpmu_destroy = core2_vpmu_destroy,
-    .arch_vpmu_save = core2_vpmu_save,
-    .arch_vpmu_load = core2_vpmu_load,
-    .arch_vpmu_dump = core2_vpmu_dump
-};
-
-static void core2_no_vpmu_do_cpuid(unsigned int input,
-                                unsigned int *eax, unsigned int *ebx,
-                                unsigned int *ecx, unsigned int *edx)
-{
-    /*
-     * As in this case the vpmu is not enabled reset some bits in the
-     * architectural performance monitoring related part.
-     */
-    if ( input == 0xa )
-    {
-        *eax &= ~PMU_VERSION_MASK;
-        *eax &= ~PMU_GENERAL_NR_MASK;
-        *eax &= ~PMU_GENERAL_WIDTH_MASK;
-
-        *edx &= ~PMU_FIXED_NR_MASK;
-        *edx &= ~PMU_FIXED_WIDTH_MASK;
-    }
-}
-
-/*
- * If its a vpmu msr set it to 0.
- */
-static int core2_no_vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
-{
-    int type = -1, index = -1;
-    if ( !is_core2_vpmu_msr(msr, &type, &index) )
-        return 1;
-    *msr_content = 0;
-    return 0;
-}
-
-/*
- * These functions are used in case vpmu is not enabled.
- */
-struct arch_vpmu_ops core2_no_vpmu_ops = {
-    .do_rdmsr = core2_no_vpmu_do_rdmsr,
-    .do_cpuid = core2_no_vpmu_do_cpuid,
-};
-
-int vmx_vpmu_initialise(struct vcpu *v)
-{
-    struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    u64 msr_content;
-    static bool_t ds_warned;
-
-    vpmu->arch_vpmu_ops = &core2_no_vpmu_ops;
-    if ( vpmu_mode == XENPMU_MODE_OFF )
-        return 0;
-
-    if ( (arch_pmc_cnt + fixed_pmc_cnt) == 0 )
-        return -EINVAL;
-
-    if ( !(vpmu_features & XENPMU_FEATURE_INTEL_BTS) )
-        goto func_out;
-    /* Check the 'Debug Store' feature in the CPUID.EAX[1]:EDX[21] */
-    while ( boot_cpu_has(X86_FEATURE_DS) )
-    {
-        if ( !boot_cpu_has(X86_FEATURE_DTES64) )
-        {
-            if ( !ds_warned )
-                printk(XENLOG_G_WARNING "CPU doesn't support 64-bit DS Area"
-                       " - Debug Store disabled for guests\n");
-            break;
-        }
-        vpmu_set(vpmu, VPMU_CPU_HAS_DS);
-        rdmsrl(MSR_IA32_MISC_ENABLE, msr_content);
-        if ( msr_content & MSR_IA32_MISC_ENABLE_BTS_UNAVAIL )
-        {
-            /* If BTS_UNAVAIL is set reset the DS feature. */
-            vpmu_reset(vpmu, VPMU_CPU_HAS_DS);
-            if ( !ds_warned )
-                printk(XENLOG_G_WARNING "CPU has set BTS_UNAVAIL"
-                       " - Debug Store disabled for guests\n");
-            break;
-        }
-
-        vpmu_set(vpmu, VPMU_CPU_HAS_BTS);
-        if ( !ds_warned )
-        {
-            if ( !boot_cpu_has(X86_FEATURE_DSCPL) )
-                printk(XENLOG_G_INFO
-                       "vpmu: CPU doesn't support CPL-Qualified BTS\n");
-            printk("******************************************************\n");
-            printk("** WARNING: Emulation of BTS Feature is switched on **\n");
-            printk("** Using this processor feature in a virtualized    **\n");
-            printk("** environment is not 100%% safe.                    **\n");
-            printk("** Setting the DS buffer address with wrong values  **\n");
-            printk("** may lead to hypervisor hangs or crashes.         **\n");
-            printk("** It is NOT recommended for production use!        **\n");
-            printk("******************************************************\n");
-        }
-        break;
-    }
-    ds_warned = 1;
- func_out:
-
-    /* PV domains can allocate resources immediately */
-    if ( is_pv_vcpu(v) && !core2_vpmu_alloc_resource(v) )
-        return -EIO;
-
-    vpmu->arch_vpmu_ops = &core2_vpmu_ops;
-
-    return 0;
-}
-
-int __init core2_vpmu_init(void)
-{
-    u64 caps;
-
-    if ( current_cpu_data.x86 != 6 )
-    {
-        printk(XENLOG_WARNING "VPMU: only family 6 is supported\n");
-        return -EINVAL;
-    }
-
-    switch ( current_cpu_data.x86_model )
-    {
-        /* Core2: */
-        case 0x0f: /* original 65 nm celeron/pentium/core2/xeon, "Merom"/"Conroe" */
-        case 0x16: /* single-core 65 nm celeron/core2solo "Merom-L"/"Conroe-L" */
-        case 0x17: /* 45 nm celeron/core2/xeon "Penryn"/"Wolfdale" */
-        case 0x1d: /* six-core 45 nm xeon "Dunnington" */
-
-        case 0x2a: /* SandyBridge */
-        case 0x2d: /* SandyBridge, "Romley-EP" */
-
-        /* Nehalem: */
-        case 0x1a: /* 45 nm nehalem, "Bloomfield" */
-        case 0x1e: /* 45 nm nehalem, "Lynnfield", "Clarksfield", "Jasper Forest" */
-        case 0x2e: /* 45 nm nehalem-ex, "Beckton" */
-
-        /* Westmere: */
-        case 0x25: /* 32 nm nehalem, "Clarkdale", "Arrandale" */
-        case 0x2c: /* 32 nm nehalem, "Gulftown", "Westmere-EP" */
-        case 0x27: /* 32 nm Westmere-EX */
-
-        case 0x3a: /* IvyBridge */
-        case 0x3e: /* IvyBridge EP */
-
-        /* Haswell: */
-        case 0x3c:
-        case 0x3f:
-        case 0x45:
-        case 0x46:
-
-        /* future: */
-        case 0x3d:
-        case 0x4e:
-            break;
-    default:
-        printk(XENLOG_WARNING "VPMU: Unsupported CPU model %#x\n",
-               current_cpu_data.x86_model);
-        return -EINVAL;
-    }
-
-    arch_pmc_cnt = core2_get_arch_pmc_count();
-    fixed_pmc_cnt = core2_get_fixed_pmc_count();
-    rdmsrl(MSR_IA32_PERF_CAPABILITIES, caps);
-    full_width_write = (caps >> 13) & 1;
-
-    check_pmc_quirk();
-
-    if ( sizeof(struct xen_pmu_data) + sizeof(uint64_t) * fixed_pmc_cnt +
-         sizeof(struct xen_pmu_cntr_pair) * arch_pmc_cnt > PAGE_SIZE )
-    {
-        printk(XENLOG_WARNING
-               "VPMU: Register bank does not fit into VPMU share page\n");
-        arch_pmc_cnt = fixed_pmc_cnt = 0;
-        return -ENOSPC;
-    }
-
-    return 0;
-}
diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c
deleted file mode 100644
index 71c5063..0000000
--- a/xen/arch/x86/hvm/vpmu.c
+++ /dev/null
@@ -1,791 +0,0 @@
-/*
- * vpmu.c: PMU virtualization for HVM domain.
- *
- * Copyright (c) 2007, Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along with
- * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
- * Place - Suite 330, Boston, MA 02111-1307 USA.
- *
- * Author: Haitao Shan <haitao.shan@intel.com>
- */
-#include <xen/config.h>
-#include <xen/sched.h>
-#include <xen/xenoprof.h>
-#include <xen/event.h>
-#include <xen/guest_access.h>
-#include <asm/regs.h>
-#include <asm/types.h>
-#include <asm/msr.h>
-#include <asm/nmi.h>
-#include <asm/p2m.h>
-#include <asm/hvm/support.h>
-#include <asm/hvm/vmx/vmx.h>
-#include <asm/hvm/vmx/vmcs.h>
-#include <asm/hvm/vpmu.h>
-#include <asm/hvm/svm/svm.h>
-#include <asm/hvm/svm/vmcb.h>
-#include <asm/apic.h>
-#include <public/pmu.h>
-#include <xsm/xsm.h>
-
-#include <compat/pmu.h>
-CHECK_pmu_params;
-CHECK_pmu_intel_ctxt;
-CHECK_pmu_amd_ctxt;
-CHECK_pmu_cntr_pair;
-CHECK_pmu_regs;
-
-/*
- * "vpmu" :     vpmu generally enabled
- * "vpmu=off" : vpmu generally disabled
- * "vpmu=bts" : vpmu enabled and Intel BTS feature switched on.
- */
-static unsigned int __read_mostly opt_vpmu_enabled;
-unsigned int __read_mostly vpmu_mode = XENPMU_MODE_OFF;
-unsigned int __read_mostly vpmu_features = 0;
-static void parse_vpmu_param(char *s);
-custom_param("vpmu", parse_vpmu_param);
-
-static DEFINE_SPINLOCK(vpmu_lock);
-static unsigned vpmu_count;
-
-static DEFINE_PER_CPU(struct vcpu *, last_vcpu);
-
-static void __init parse_vpmu_param(char *s)
-{
-    switch ( parse_bool(s) )
-    {
-    case 0:
-        break;
-    default:
-        if ( !strcmp(s, "bts") )
-            vpmu_features |= XENPMU_FEATURE_INTEL_BTS;
-        else if ( *s )
-        {
-            printk("VPMU: unknown flag: %s - vpmu disabled!\n", s);
-            break;
-        }
-        /* fall through */
-    case 1:
-        /* Default VPMU mode */
-        vpmu_mode = XENPMU_MODE_SELF;
-        opt_vpmu_enabled = 1;
-        break;
-    }
-}
-
-void vpmu_lvtpc_update(uint32_t val)
-{
-    struct vpmu_struct *vpmu;
-    struct vcpu *curr;
-
-    if ( vpmu_mode == XENPMU_MODE_OFF )
-        return;
-
-    curr = current;
-    vpmu = vcpu_vpmu(curr);
-
-    vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR | (val & APIC_LVT_MASKED);
-
-    /* Postpone APIC updates for PV(H) guests if PMU interrupt is pending */
-    if ( is_hvm_vcpu(curr) || !vpmu->xenpmu_data ||
-         !(vpmu->xenpmu_data->pmu.pmu_flags & PMU_CACHED) )
-        apic_write(APIC_LVTPC, vpmu->hw_lapic_lvtpc);
-}
-
-int vpmu_do_msr(unsigned int msr, uint64_t *msr_content,
-                uint64_t supported, bool_t is_write)
-{
-    struct vcpu *curr;
-    struct vpmu_struct *vpmu;
-    const struct arch_vpmu_ops *ops;
-    int ret = 0;
-
-    if ( (vpmu_mode == XENPMU_MODE_OFF) ||
-         ((vpmu_mode & XENPMU_MODE_ALL) &&
-          !is_hardware_domain(current->domain)) )
-        goto nop;
-
-    curr = current;
-    vpmu = vcpu_vpmu(curr);
-    ops = vpmu->arch_vpmu_ops;
-    if ( !ops )
-        goto nop;
-
-    if ( is_write && ops->do_wrmsr )
-        ret = ops->do_wrmsr(msr, *msr_content, supported);
-    else if ( !is_write && ops->do_rdmsr )
-        ret = ops->do_rdmsr(msr, msr_content);
-    else
-        goto nop;
-
-    /*
-     * We may have received a PMU interrupt while handling MSR access
-     * and since do_wr/rdmsr may load VPMU context we should save
-     * (and unload) it again.
-     */
-    if ( !is_hvm_vcpu(curr) &&
-         vpmu->xenpmu_data && (vpmu->xenpmu_data->pmu.pmu_flags & PMU_CACHED) )
-    {
-        vpmu_set(vpmu, VPMU_CONTEXT_SAVE);
-        ops->arch_vpmu_save(curr);
-        vpmu_reset(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED);
-    }
-
-    return ret;
-
- nop:
-    if ( !is_write )
-        *msr_content = 0;
-
-    return 0;
-}
-
-static inline struct vcpu *choose_hwdom_vcpu(void)
-{
-    unsigned idx;
-
-    if ( hardware_domain->max_vcpus == 0 )
-        return NULL;
-
-    idx = smp_processor_id() % hardware_domain->max_vcpus;
-
-    return hardware_domain->vcpu[idx];
-}
-
-void vpmu_do_interrupt(struct cpu_user_regs *regs)
-{
-    struct vcpu *sampled = current, *sampling;
-    struct vpmu_struct *vpmu;
-
-    /*
-     * dom0 will handle interrupt for special domains (e.g. idle domain) or,
-     * in XENPMU_MODE_ALL, for everyone.
-     */
-    if ( (vpmu_mode & XENPMU_MODE_ALL) ||
-         (sampled->domain->domain_id >= DOMID_FIRST_RESERVED) )
-    {
-        sampling = choose_hwdom_vcpu();
-        if ( !sampling )
-            return;
-    }
-    else
-        sampling = sampled;
-
-    vpmu = vcpu_vpmu(sampling);
-    if ( !is_hvm_vcpu(sampling) || (vpmu_mode & XENPMU_MODE_ALL) )
-    {
-        /* PV(H) guest */
-        const struct cpu_user_regs *cur_regs;
-        uint64_t *flags = &vpmu->xenpmu_data->pmu.pmu_flags;
-        uint32_t domid;
-
-        if ( !vpmu->xenpmu_data )
-            return;
-
-        if ( is_pvh_vcpu(sampling) &&
-             !(vpmu_mode & XENPMU_MODE_ALL) &&
-             !vpmu->arch_vpmu_ops->do_interrupt(regs) )
-            return;
-
-        if ( *flags & PMU_CACHED )
-            return;
-
-        /* PV guest will be reading PMU MSRs from xenpmu_data */
-        vpmu_set(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED);
-        vpmu->arch_vpmu_ops->arch_vpmu_save(sampling);
-        vpmu_reset(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED);
-
-        if ( has_hvm_container_vcpu(sampled) )
-            *flags = 0;
-        else
-            *flags = PMU_SAMPLE_PV;
-
-        if ( sampled == sampling )
-            domid = DOMID_SELF;
-        else
-            domid = sampled->domain->domain_id;
-
-        /* Store appropriate registers in xenpmu_data */
-        /* FIXME: 32-bit PVH should go here as well */
-        if ( is_pv_32bit_vcpu(sampling) )
-        {
-            /*
-             * 32-bit dom0 cannot process Xen's addresses (which are 64 bit)
-             * and therefore we treat it the same way as a non-privileged
-             * PV 32-bit domain.
-             */
-            struct compat_pmu_regs *cmp;
-
-            cur_regs = guest_cpu_user_regs();
-
-            cmp = (void *)&vpmu->xenpmu_data->pmu.r.regs;
-            cmp->ip = cur_regs->rip;
-            cmp->sp = cur_regs->rsp;
-            cmp->flags = cur_regs->eflags;
-            cmp->ss = cur_regs->ss;
-            cmp->cs = cur_regs->cs;
-            if ( (cmp->cs & 3) > 1 )
-                *flags |= PMU_SAMPLE_USER;
-        }
-        else
-        {
-            struct xen_pmu_regs *r = &vpmu->xenpmu_data->pmu.r.regs;
-
-            if ( (vpmu_mode & XENPMU_MODE_SELF) )
-                cur_regs = guest_cpu_user_regs();
-            else if ( !guest_mode(regs) &&
-                      is_hardware_domain(sampling->domain) )
-            {
-                cur_regs = regs;
-                domid = DOMID_XEN;
-            }
-            else
-                cur_regs = guest_cpu_user_regs();
-
-            r->ip = cur_regs->rip;
-            r->sp = cur_regs->rsp;
-            r->flags = cur_regs->eflags;
-
-            if ( !has_hvm_container_vcpu(sampled) )
-            {
-                r->ss = cur_regs->ss;
-                r->cs = cur_regs->cs;
-                if ( !(sampled->arch.flags & TF_kernel_mode) )
-                    *flags |= PMU_SAMPLE_USER;
-            }
-            else
-            {
-                struct segment_register seg;
-
-                hvm_get_segment_register(sampled, x86_seg_cs, &seg);
-                r->cs = seg.sel;
-                hvm_get_segment_register(sampled, x86_seg_ss, &seg);
-                r->ss = seg.sel;
-                r->cpl = seg.attr.fields.dpl;
-                if ( !(sampled->arch.hvm_vcpu.guest_cr[0] & X86_CR0_PE) )
-                    *flags |= PMU_SAMPLE_REAL;
-            }
-        }
-
-        vpmu->xenpmu_data->domain_id = domid;
-        vpmu->xenpmu_data->vcpu_id = sampled->vcpu_id;
-        vpmu->xenpmu_data->pcpu_id = smp_processor_id();
-
-        vpmu->hw_lapic_lvtpc |= APIC_LVT_MASKED;
-        apic_write(APIC_LVTPC, vpmu->hw_lapic_lvtpc);
-        *flags |= PMU_CACHED;
-
-        send_guest_vcpu_virq(sampling, VIRQ_XENPMU);
-
-        return;
-    }
-
-    if ( vpmu->arch_vpmu_ops )
-    {
-        struct vlapic *vlapic = vcpu_vlapic(sampling);
-        u32 vlapic_lvtpc;
-
-        /* We don't support (yet) HVM dom0 */
-        ASSERT(sampling == sampled);
-
-        if ( !vpmu->arch_vpmu_ops->do_interrupt(regs) ||
-             !is_vlapic_lvtpc_enabled(vlapic) )
-            return;
-
-        vlapic_lvtpc = vlapic_get_reg(vlapic, APIC_LVTPC);
-
-        switch ( GET_APIC_DELIVERY_MODE(vlapic_lvtpc) )
-        {
-        case APIC_MODE_FIXED:
-            vlapic_set_irq(vlapic, vlapic_lvtpc & APIC_VECTOR_MASK, 0);
-            break;
-        case APIC_MODE_NMI:
-            sampling->nmi_pending = 1;
-            break;
-        }
-    }
-}
-
-void vpmu_do_cpuid(unsigned int input,
-                   unsigned int *eax, unsigned int *ebx,
-                   unsigned int *ecx, unsigned int *edx)
-{
-    struct vpmu_struct *vpmu = vcpu_vpmu(current);
-
-    if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->do_cpuid )
-        vpmu->arch_vpmu_ops->do_cpuid(input, eax, ebx, ecx, edx);
-}
-
-static void vpmu_save_force(void *arg)
-{
-    struct vcpu *v = (struct vcpu *)arg;
-    struct vpmu_struct *vpmu = vcpu_vpmu(v);
-
-    if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) )
-        return;
-
-    vpmu_set(vpmu, VPMU_CONTEXT_SAVE);
-
-    if ( vpmu->arch_vpmu_ops )
-        (void)vpmu->arch_vpmu_ops->arch_vpmu_save(v);
-
-    vpmu_reset(vpmu, VPMU_CONTEXT_SAVE);
-
-    per_cpu(last_vcpu, smp_processor_id()) = NULL;
-}
-
-void vpmu_save(struct vcpu *v)
-{
-    struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    int pcpu = smp_processor_id();
-
-    if ( !vpmu_are_all_set(vpmu, VPMU_CONTEXT_ALLOCATED | VPMU_CONTEXT_LOADED) )
-       return;
-
-    vpmu->last_pcpu = pcpu;
-    per_cpu(last_vcpu, pcpu) = v;
-
-    if ( vpmu->arch_vpmu_ops )
-        if ( vpmu->arch_vpmu_ops->arch_vpmu_save(v) )
-            vpmu_reset(vpmu, VPMU_CONTEXT_LOADED);
-
-    apic_write(APIC_LVTPC, PMU_APIC_VECTOR | APIC_LVT_MASKED);
-}
-
-void vpmu_load(struct vcpu *v)
-{
-    struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    int pcpu = smp_processor_id();
-    struct vcpu *prev = NULL;
-
-    if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
-        return;
-
-    /* First time this VCPU is running here */
-    if ( vpmu->last_pcpu != pcpu )
-    {
-        /*
-         * Get the context from last pcpu that we ran on. Note that if another
-         * VCPU is running there it must have saved this VPCU's context before
-         * startig to run (see below).
-         * There should be no race since remote pcpu will disable interrupts
-         * before saving the context.
-         */
-        if ( vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) )
-        {
-            on_selected_cpus(cpumask_of(vpmu->last_pcpu),
-                             vpmu_save_force, (void *)v, 1);
-            vpmu_reset(vpmu, VPMU_CONTEXT_LOADED);
-        }
-    } 
-
-    /* Prevent forced context save from remote CPU */
-    local_irq_disable();
-
-    prev = per_cpu(last_vcpu, pcpu);
-
-    if ( prev != v && prev )
-    {
-        vpmu = vcpu_vpmu(prev);
-
-        /* Someone ran here before us */
-        vpmu_save_force(prev);
-        vpmu_reset(vpmu, VPMU_CONTEXT_LOADED);
-
-        vpmu = vcpu_vpmu(v);
-    }
-
-    local_irq_enable();
-
-    /* Only when PMU is counting, we load PMU context immediately. */
-    if ( !vpmu_is_set(vpmu, VPMU_RUNNING) ||
-         (!is_hvm_vcpu(vpmu_vcpu(vpmu)) &&
-          (vpmu->xenpmu_data->pmu.pmu_flags & PMU_CACHED)) )
-        return;
-
-    if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->arch_vpmu_load )
-    {
-        apic_write_around(APIC_LVTPC, vpmu->hw_lapic_lvtpc);
-        /* Arch code needs to set VPMU_CONTEXT_LOADED */
-        vpmu->arch_vpmu_ops->arch_vpmu_load(v);
-    }
-}
-
-void vpmu_initialise(struct vcpu *v)
-{
-    struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    uint8_t vendor = current_cpu_data.x86_vendor;
-    int ret;
-
-    BUILD_BUG_ON(sizeof(struct xen_pmu_intel_ctxt) > XENPMU_CTXT_PAD_SZ);
-    BUILD_BUG_ON(sizeof(struct xen_pmu_amd_ctxt) > XENPMU_CTXT_PAD_SZ);
-    BUILD_BUG_ON(sizeof(struct xen_pmu_regs) > XENPMU_REGS_PAD_SZ);
-    BUILD_BUG_ON(sizeof(struct compat_pmu_regs) > XENPMU_REGS_PAD_SZ);
-
-    ASSERT(!vpmu->flags && !vpmu->context);
-
-    if ( v->domain != hardware_domain )
-    {
-        spin_lock(&vpmu_lock);
-        vpmu_count++; /* Prevent vpmu_mode from changing until we are done */
-        spin_unlock(&vpmu_lock);
-    }
-
-    switch ( vendor )
-    {
-    case X86_VENDOR_AMD:
-        ret = svm_vpmu_initialise(v);
-        break;
-
-    case X86_VENDOR_INTEL:
-        ret = vmx_vpmu_initialise(v);
-        break;
-
-    default:
-        if ( vpmu_mode != XENPMU_MODE_OFF )
-        {
-            printk(XENLOG_G_WARNING "VPMU: Unknown CPU vendor %d. "
-                   "Disabling VPMU\n", vendor);
-            opt_vpmu_enabled = 0;
-            vpmu_mode = XENPMU_MODE_OFF;
-        }
-        return; /* Don't bother restoring vpmu_count, VPMU is off forever */
-    }
-
-    if ( ret )
-        printk(XENLOG_G_WARNING "VPMU: Initialization failed for %pv\n", v);
-
-    /* Intel needs to initialize VPMU ops even if VPMU is not in use */
-    if ( (v->domain != hardware_domain) &&
-         (ret || (vpmu_mode == XENPMU_MODE_OFF)) )
-    {
-        spin_lock(&vpmu_lock);
-        vpmu_count--;
-        spin_unlock(&vpmu_lock);
-    }
-}
-
-static void vpmu_clear_last(void *arg)
-{
-    if ( this_cpu(last_vcpu) == arg )
-        this_cpu(last_vcpu) = NULL;
-}
-
-void vpmu_destroy(struct vcpu *v)
-{
-    struct vpmu_struct *vpmu = vcpu_vpmu(v);
-
-    if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
-        return;
-
-    /*
-     * Need to clear last_vcpu in case it points to v.
-     * We can check here non-atomically whether it is 'v' since
-     * last_vcpu can never become 'v' again at this point.
-     * We will test it again in vpmu_clear_last() with interrupts
-     * disabled to make sure we don't clear someone else.
-     */
-    if ( per_cpu(last_vcpu, vpmu->last_pcpu) == v )
-        on_selected_cpus(cpumask_of(vpmu->last_pcpu),
-                         vpmu_clear_last, v, 1);
-
-    if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->arch_vpmu_destroy )
-    {
-        /* Unload VPMU first. This will stop counters */
-        on_selected_cpus(cpumask_of(vcpu_vpmu(v)->last_pcpu),
-                         vpmu_save_force, v, 1);
-         vpmu->arch_vpmu_ops->arch_vpmu_destroy(v);
-    }
-
-    spin_lock(&vpmu_lock);
-    if ( v->domain != hardware_domain )
-        vpmu_count--;
-    spin_unlock(&vpmu_lock);
-}
-
-static int pvpmu_init(struct domain *d, xen_pmu_params_t *params)
-{
-    struct vcpu *v;
-    struct vpmu_struct *vpmu;
-    struct page_info *page;
-    uint64_t gfn = params->val;
-
-    if ( (vpmu_mode == XENPMU_MODE_OFF) ||
-         ((vpmu_mode & XENPMU_MODE_ALL) && !is_hardware_domain(d)) )
-        return -EINVAL;
-
-    if ( (params->vcpu >= d->max_vcpus) || (d->vcpu == NULL) ||
-         (d->vcpu[params->vcpu] == NULL) )
-        return -EINVAL;
-
-    page = get_page_from_gfn(d, gfn, NULL, P2M_ALLOC);
-    if ( !page )
-        return -EINVAL;
-
-    if ( !get_page_type(page, PGT_writable_page) )
-    {
-        put_page(page);
-        return -EINVAL;
-    }
-
-    v = d->vcpu[params->vcpu];
-    vpmu = vcpu_vpmu(v);
-
-    spin_lock(&vpmu->vpmu_lock);
-
-    if ( v->arch.vpmu.xenpmu_data )
-    {
-        put_page_and_type(page);
-        spin_unlock(&vpmu->vpmu_lock);
-        return -EEXIST;
-    }
-
-    v->arch.vpmu.xenpmu_data = __map_domain_page_global(page);
-    if ( !v->arch.vpmu.xenpmu_data )
-    {
-        put_page_and_type(page);
-        spin_unlock(&vpmu->vpmu_lock);
-        return -ENOMEM;
-    }
-
-    vpmu_initialise(v);
-
-    spin_unlock(&vpmu->vpmu_lock);
-
-    return 0;
-}
-
-static void pvpmu_finish(struct domain *d, xen_pmu_params_t *params)
-{
-    struct vcpu *v;
-    struct vpmu_struct *vpmu;
-    uint64_t mfn;
-
-    if ( (params->vcpu >= d->max_vcpus) || (d->vcpu == NULL) ||
-         (d->vcpu[params->vcpu] == NULL) )
-        return;
-
-    v = d->vcpu[params->vcpu];
-    if ( v != current )
-        vcpu_pause(v);
-
-    vpmu = vcpu_vpmu(v);
-    spin_lock(&vpmu->vpmu_lock);
-
-    vpmu_destroy(v);
-
-    if ( v->arch.vpmu.xenpmu_data )
-    {
-        mfn = domain_page_map_to_mfn(v->arch.vpmu.xenpmu_data);
-        ASSERT(mfn != 0);
-        unmap_domain_page_global(v->arch.vpmu.xenpmu_data);
-        put_page_and_type(mfn_to_page(mfn));
-        v->arch.vpmu.xenpmu_data = NULL;
-    }
-
-    spin_unlock(&vpmu->vpmu_lock);
-
-    if ( v != current )
-        vcpu_unpause(v);
-}
-
-/* Dump some vpmu informations on console. Used in keyhandler dump_domains(). */
-void vpmu_dump(struct vcpu *v)
-{
-    struct vpmu_struct *vpmu = vcpu_vpmu(v);
-
-    if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->arch_vpmu_dump )
-        vpmu->arch_vpmu_ops->arch_vpmu_dump(v);
-}
-
-long do_xenpmu_op(unsigned int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg)
-{
-    int ret;
-    struct vcpu *curr;
-    struct xen_pmu_params pmu_params = {.val = 0};
-    struct xen_pmu_data *xenpmu_data;
-
-    if ( !opt_vpmu_enabled )
-        return -EOPNOTSUPP;
-
-    ret = xsm_pmu_op(XSM_OTHER, current->domain, op);
-    if ( ret )
-        return ret;
-
-    /* Check major version when parameters are specified */
-    switch ( op )
-    {
-    case XENPMU_mode_set:
-    case XENPMU_feature_set:
-    case XENPMU_init:
-    case XENPMU_finish:
-        if ( copy_from_guest(&pmu_params, arg, 1) )
-            return -EFAULT;
-
-        if ( pmu_params.version.maj != XENPMU_VER_MAJ )
-            return -EINVAL;
-    }
-
-    switch ( op )
-    {
-    case XENPMU_mode_set:
-    {
-        if ( (pmu_params.val &
-              ~(XENPMU_MODE_SELF | XENPMU_MODE_HV | XENPMU_MODE_ALL)) ||
-             (hweight64(pmu_params.val) > 1) )
-            return -EINVAL;
-
-        /* 32-bit dom0 can only sample itself. */
-        if ( is_pv_32bit_vcpu(current) &&
-             (pmu_params.val & (XENPMU_MODE_HV | XENPMU_MODE_ALL)) )
-            return -EINVAL;
-
-        spin_lock(&vpmu_lock);
-
-        /*
-         * We can always safely switch between XENPMU_MODE_SELF and
-         * XENPMU_MODE_HV while other VPMUs are active.
-         */
-        if ( (vpmu_count == 0) || (vpmu_mode == pmu_params.val) ||
-             ((vpmu_mode ^ pmu_params.val) ==
-              (XENPMU_MODE_SELF | XENPMU_MODE_HV)) )
-            vpmu_mode = pmu_params.val;
-        else
-        {
-            printk(XENLOG_WARNING "VPMU: Cannot change mode while"
-                                  " active VPMUs exist\n");
-            ret = -EBUSY;
-        }
-
-        spin_unlock(&vpmu_lock);
-
-        break;
-    }
-
-    case XENPMU_mode_get:
-        memset(&pmu_params, 0, sizeof(pmu_params));
-        pmu_params.val = vpmu_mode;
-
-        pmu_params.version.maj = XENPMU_VER_MAJ;
-        pmu_params.version.min = XENPMU_VER_MIN;
-
-        if ( copy_to_guest(arg, &pmu_params, 1) )
-            return -EFAULT;
-
-        break;
-
-    case XENPMU_feature_set:
-        if ( pmu_params.val & ~XENPMU_FEATURE_INTEL_BTS )
-            return -EINVAL;
-
-        spin_lock(&vpmu_lock);
-
-        if ( vpmu_count == 0 )
-            vpmu_features = pmu_params.val;
-        else
-        {
-            printk(XENLOG_WARNING "VPMU: Cannot change features while"
-                                  " active VPMUs exist\n");
-            ret = -EBUSY;
-        }
-
-        spin_unlock(&vpmu_lock);
-
-        break;
-
-    case XENPMU_feature_get:
-        pmu_params.val = vpmu_features;
-        if ( copy_field_to_guest(arg, &pmu_params, val) )
-            return -EFAULT;
-
-        break;
-
-    case XENPMU_init:
-        ret = pvpmu_init(current->domain, &pmu_params);
-        break;
-
-    case XENPMU_finish:
-        pvpmu_finish(current->domain, &pmu_params);
-        break;
-
-    case XENPMU_lvtpc_set:
-        curr = current;
-        xenpmu_data = curr->arch.vpmu.xenpmu_data;
-        if ( xenpmu_data == NULL )
-            return -EINVAL;
-        vpmu_lvtpc_update(xenpmu_data->pmu.l.lapic_lvtpc);
-        break;
-
-    case XENPMU_flush:
-        curr = current;
-        xenpmu_data = curr->arch.vpmu.xenpmu_data;
-        if ( xenpmu_data == NULL )
-            return -EINVAL;
-        xenpmu_data->pmu.pmu_flags &= ~PMU_CACHED;
-        vpmu_lvtpc_update(xenpmu_data->pmu.l.lapic_lvtpc);
-        vpmu_load(curr);
-        break;
-
-    default:
-        ret = -EINVAL;
-    }
-
-    return ret;
-}
-
-static int __init vpmu_init(void)
-{
-    int vendor = current_cpu_data.x86_vendor;
-
-    if ( !opt_vpmu_enabled )
-    {
-        printk(XENLOG_INFO "VPMU: disabled\n");
-        return 0;
-    }
-
-    /* NMI watchdog uses LVTPC and HW counter */
-    if ( opt_watchdog && opt_vpmu_enabled )
-    {
-        printk(XENLOG_WARNING "NMI watchdog is enabled. Turning VPMU off.\n");
-        opt_vpmu_enabled = 0;
-        vpmu_mode = XENPMU_MODE_OFF;
-        return 0;
-    }
-
-    switch ( vendor )
-    {
-    case X86_VENDOR_AMD:
-        if ( amd_vpmu_init() )
-           vpmu_mode = XENPMU_MODE_OFF;
-        break;
-    case X86_VENDOR_INTEL:
-        if ( core2_vpmu_init() )
-           vpmu_mode = XENPMU_MODE_OFF;
-        break;
-    default:
-        printk(XENLOG_WARNING "VPMU: Unknown CPU vendor: %d. "
-               "Turning VPMU off.\n", vendor);
-        vpmu_mode = XENPMU_MODE_OFF;
-        break;
-    }
-
-    if ( vpmu_mode != XENPMU_MODE_OFF )
-        printk(XENLOG_INFO "VPMU: version " __stringify(XENPMU_VER_MAJ) "."
-               __stringify(XENPMU_VER_MIN) "\n");
-    else
-        opt_vpmu_enabled = 0;
-
-    return 0;
-}
-__initcall(vpmu_init);
diff --git a/xen/arch/x86/oprofile/op_model_ppro.c b/xen/arch/x86/oprofile/op_model_ppro.c
index ca429a1..89649d0 100644
--- a/xen/arch/x86/oprofile/op_model_ppro.c
+++ b/xen/arch/x86/oprofile/op_model_ppro.c
@@ -19,7 +19,7 @@
 #include <asm/processor.h>
 #include <asm/regs.h>
 #include <asm/current.h>
-#include <asm/hvm/vpmu.h>
+#include <asm/vpmu.h>
 
 #include "op_x86_model.h"
 #include "op_counter.h"
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 8a40deb..f861243 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -72,7 +72,7 @@
 #include <asm/apic.h>
 #include <asm/mc146818rtc.h>
 #include <asm/hpet.h>
-#include <asm/hvm/vpmu.h>
+#include <asm/vpmu.h>
 #include <public/arch-x86/cpuid.h>
 #include <xsm/xsm.h>
 
diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index 6fce6aa..9f2e904 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -19,8 +19,8 @@
 #ifndef __ASM_X86_HVM_VMX_VMCS_H__
 #define __ASM_X86_HVM_VMX_VMCS_H__
 
+#include <asm/vpmu.h>
 #include <asm/hvm/io.h>
-#include <asm/hvm/vpmu.h>
 #include <irq_vectors.h>
 
 extern void vmcs_dump_vcpu(struct vcpu *v);
diff --git a/xen/include/asm-x86/hvm/vpmu.h b/xen/include/asm-x86/hvm/vpmu.h
deleted file mode 100644
index 63851a7..0000000
--- a/xen/include/asm-x86/hvm/vpmu.h
+++ /dev/null
@@ -1,143 +0,0 @@
-/*
- * vpmu.h: PMU virtualization for HVM domain.
- *
- * Copyright (c) 2007, Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along with
- * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
- * Place - Suite 330, Boston, MA 02111-1307 USA.
- *
- * Author: Haitao Shan <haitao.shan@intel.com>
- */
-
-#ifndef __ASM_X86_HVM_VPMU_H_
-#define __ASM_X86_HVM_VPMU_H_
-
-#include <public/pmu.h>
-
-#define vcpu_vpmu(vcpu)   (&(vcpu)->arch.vpmu)
-#define vpmu_vcpu(vpmu)   container_of((vpmu), struct vcpu, arch.vpmu)
-
-#define MSR_TYPE_COUNTER            0
-#define MSR_TYPE_CTRL               1
-#define MSR_TYPE_GLOBAL             2
-#define MSR_TYPE_ARCH_COUNTER       3
-#define MSR_TYPE_ARCH_CTRL          4
-
-/* Start of PMU register bank */
-#define vpmu_reg_pointer(ctxt, offset) ((void *)((uintptr_t)ctxt + \
-                                                 (uintptr_t)ctxt->offset))
-
-/* Arch specific operations shared by all vpmus */
-struct arch_vpmu_ops {
-    int (*do_wrmsr)(unsigned int msr, uint64_t msr_content,
-                    uint64_t supported);
-    int (*do_rdmsr)(unsigned int msr, uint64_t *msr_content);
-    int (*do_interrupt)(struct cpu_user_regs *regs);
-    void (*do_cpuid)(unsigned int input,
-                     unsigned int *eax, unsigned int *ebx,
-                     unsigned int *ecx, unsigned int *edx);
-    void (*arch_vpmu_destroy)(struct vcpu *v);
-    int (*arch_vpmu_save)(struct vcpu *v);
-    void (*arch_vpmu_load)(struct vcpu *v);
-    void (*arch_vpmu_dump)(const struct vcpu *);
-};
-
-int core2_vpmu_init(void);
-int vmx_vpmu_initialise(struct vcpu *);
-int amd_vpmu_init(void);
-int svm_vpmu_initialise(struct vcpu *);
-
-struct vpmu_struct {
-    u32 flags;
-    u32 last_pcpu;
-    u32 hw_lapic_lvtpc;
-    void *context;      /* May be shared with PV guest */
-    void *priv_context; /* hypervisor-only */
-    struct arch_vpmu_ops *arch_vpmu_ops;
-    struct xen_pmu_data *xenpmu_data;
-    spinlock_t vpmu_lock;
-};
-
-/* VPMU states */
-#define VPMU_CONTEXT_ALLOCATED              0x1
-#define VPMU_CONTEXT_LOADED                 0x2
-#define VPMU_RUNNING                        0x4
-#define VPMU_CONTEXT_SAVE                   0x8   /* Force context save */
-#define VPMU_FROZEN                         0x10  /* Stop counters while VCPU is not running */
-#define VPMU_PASSIVE_DOMAIN_ALLOCATED       0x20
-
-static inline void vpmu_set(struct vpmu_struct *vpmu, const u32 mask)
-{
-    vpmu->flags |= mask;
-}
-static inline void vpmu_reset(struct vpmu_struct *vpmu, const u32 mask)
-{
-    vpmu->flags &= ~mask;
-}
-static inline void vpmu_clear(struct vpmu_struct *vpmu)
-{
-    vpmu->flags = 0;
-}
-static inline bool_t vpmu_is_set(const struct vpmu_struct *vpmu, const u32 mask)
-{
-    return !!(vpmu->flags & mask);
-}
-static inline bool_t vpmu_are_all_set(const struct vpmu_struct *vpmu,
-                                      const u32 mask)
-{
-    return !!((vpmu->flags & mask) == mask);
-}
-
-void vpmu_lvtpc_update(uint32_t val);
-int vpmu_do_msr(unsigned int msr, uint64_t *msr_content,
-                uint64_t supported, bool_t is_write);
-void vpmu_do_interrupt(struct cpu_user_regs *regs);
-void vpmu_do_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx,
-                                       unsigned int *ecx, unsigned int *edx);
-void vpmu_initialise(struct vcpu *v);
-void vpmu_destroy(struct vcpu *v);
-void vpmu_save(struct vcpu *v);
-void vpmu_load(struct vcpu *v);
-void vpmu_dump(struct vcpu *v);
-
-static inline int vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content,
-                                uint64_t supported)
-{
-    return vpmu_do_msr(msr, &msr_content, supported, 1);
-}
-static inline int vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
-{
-    return vpmu_do_msr(msr, msr_content, 0, 0);
-}
-
-extern int acquire_pmu_ownership(int pmu_ownership);
-extern void release_pmu_ownership(int pmu_ownership);
-
-extern unsigned int vpmu_mode;
-extern unsigned int vpmu_features;
-
-/* Context switch */
-static inline void vpmu_switch_from(struct vcpu *prev)
-{
-    if ( vpmu_mode & (XENPMU_MODE_SELF | XENPMU_MODE_HV) )
-        vpmu_save(prev);
-}
-
-static inline void vpmu_switch_to(struct vcpu *next)
-{
-    if ( vpmu_mode & (XENPMU_MODE_SELF | XENPMU_MODE_HV) )
-        vpmu_load(next);
-}
-
-#endif /* __ASM_X86_HVM_VPMU_H_*/
-
diff --git a/xen/include/asm-x86/vpmu.h b/xen/include/asm-x86/vpmu.h
new file mode 100644
index 0000000..63851a7
--- /dev/null
+++ b/xen/include/asm-x86/vpmu.h
@@ -0,0 +1,143 @@
+/*
+ * vpmu.h: PMU virtualization for HVM domain.
+ *
+ * Copyright (c) 2007, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
+ * Place - Suite 330, Boston, MA 02111-1307 USA.
+ *
+ * Author: Haitao Shan <haitao.shan@intel.com>
+ */
+
+#ifndef __ASM_X86_HVM_VPMU_H_
+#define __ASM_X86_HVM_VPMU_H_
+
+#include <public/pmu.h>
+
+#define vcpu_vpmu(vcpu)   (&(vcpu)->arch.vpmu)
+#define vpmu_vcpu(vpmu)   container_of((vpmu), struct vcpu, arch.vpmu)
+
+#define MSR_TYPE_COUNTER            0
+#define MSR_TYPE_CTRL               1
+#define MSR_TYPE_GLOBAL             2
+#define MSR_TYPE_ARCH_COUNTER       3
+#define MSR_TYPE_ARCH_CTRL          4
+
+/* Start of PMU register bank */
+#define vpmu_reg_pointer(ctxt, offset) ((void *)((uintptr_t)ctxt + \
+                                                 (uintptr_t)ctxt->offset))
+
+/* Arch specific operations shared by all vpmus */
+struct arch_vpmu_ops {
+    int (*do_wrmsr)(unsigned int msr, uint64_t msr_content,
+                    uint64_t supported);
+    int (*do_rdmsr)(unsigned int msr, uint64_t *msr_content);
+    int (*do_interrupt)(struct cpu_user_regs *regs);
+    void (*do_cpuid)(unsigned int input,
+                     unsigned int *eax, unsigned int *ebx,
+                     unsigned int *ecx, unsigned int *edx);
+    void (*arch_vpmu_destroy)(struct vcpu *v);
+    int (*arch_vpmu_save)(struct vcpu *v);
+    void (*arch_vpmu_load)(struct vcpu *v);
+    void (*arch_vpmu_dump)(const struct vcpu *);
+};
+
+int core2_vpmu_init(void);
+int vmx_vpmu_initialise(struct vcpu *);
+int amd_vpmu_init(void);
+int svm_vpmu_initialise(struct vcpu *);
+
+struct vpmu_struct {
+    u32 flags;
+    u32 last_pcpu;
+    u32 hw_lapic_lvtpc;
+    void *context;      /* May be shared with PV guest */
+    void *priv_context; /* hypervisor-only */
+    struct arch_vpmu_ops *arch_vpmu_ops;
+    struct xen_pmu_data *xenpmu_data;
+    spinlock_t vpmu_lock;
+};
+
+/* VPMU states */
+#define VPMU_CONTEXT_ALLOCATED              0x1
+#define VPMU_CONTEXT_LOADED                 0x2
+#define VPMU_RUNNING                        0x4
+#define VPMU_CONTEXT_SAVE                   0x8   /* Force context save */
+#define VPMU_FROZEN                         0x10  /* Stop counters while VCPU is not running */
+#define VPMU_PASSIVE_DOMAIN_ALLOCATED       0x20
+
+static inline void vpmu_set(struct vpmu_struct *vpmu, const u32 mask)
+{
+    vpmu->flags |= mask;
+}
+static inline void vpmu_reset(struct vpmu_struct *vpmu, const u32 mask)
+{
+    vpmu->flags &= ~mask;
+}
+static inline void vpmu_clear(struct vpmu_struct *vpmu)
+{
+    vpmu->flags = 0;
+}
+static inline bool_t vpmu_is_set(const struct vpmu_struct *vpmu, const u32 mask)
+{
+    return !!(vpmu->flags & mask);
+}
+static inline bool_t vpmu_are_all_set(const struct vpmu_struct *vpmu,
+                                      const u32 mask)
+{
+    return !!((vpmu->flags & mask) == mask);
+}
+
+void vpmu_lvtpc_update(uint32_t val);
+int vpmu_do_msr(unsigned int msr, uint64_t *msr_content,
+                uint64_t supported, bool_t is_write);
+void vpmu_do_interrupt(struct cpu_user_regs *regs);
+void vpmu_do_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx,
+                                       unsigned int *ecx, unsigned int *edx);
+void vpmu_initialise(struct vcpu *v);
+void vpmu_destroy(struct vcpu *v);
+void vpmu_save(struct vcpu *v);
+void vpmu_load(struct vcpu *v);
+void vpmu_dump(struct vcpu *v);
+
+static inline int vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content,
+                                uint64_t supported)
+{
+    return vpmu_do_msr(msr, &msr_content, supported, 1);
+}
+static inline int vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
+{
+    return vpmu_do_msr(msr, msr_content, 0, 0);
+}
+
+extern int acquire_pmu_ownership(int pmu_ownership);
+extern void release_pmu_ownership(int pmu_ownership);
+
+extern unsigned int vpmu_mode;
+extern unsigned int vpmu_features;
+
+/* Context switch */
+static inline void vpmu_switch_from(struct vcpu *prev)
+{
+    if ( vpmu_mode & (XENPMU_MODE_SELF | XENPMU_MODE_HV) )
+        vpmu_save(prev);
+}
+
+static inline void vpmu_switch_to(struct vcpu *next)
+{
+    if ( vpmu_mode & (XENPMU_MODE_SELF | XENPMU_MODE_HV) )
+        vpmu_load(next);
+}
+
+#endif /* __ASM_X86_HVM_VPMU_H_*/
+
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH v19 01/14] x86/VPMU: VPMU should not exist when vpmu_initialise() is called
  2015-03-17 14:53 ` [PATCH v19 01/14] x86/VPMU: VPMU should not exist when vpmu_initialise() is called Boris Ostrovsky
@ 2015-03-19  8:43   ` Dietmar Hahn
  0 siblings, 0 replies; 35+ messages in thread
From: Dietmar Hahn @ 2015-03-19  8:43 UTC (permalink / raw)
  To: xen-devel
  Cc: kevin.tian, JBeulich, jun.nakajima, andrew.cooper3, tim,
	Aravind.Gopalakrishnan, suravee.suthikulpanit, dgdegra,
	Boris Ostrovsky

Am Dienstag 17 März 2015, 10:53:58 schrieb Boris Ostrovsky:
> We don't need to try to destroy it since it can't be already allocated at the
> time we try to initialize it.
> 
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>

Seems to be already commited.

Dietmar.


> ---
> 
> Changes in v19:
> * Removed unnecesary test for VPMU_CONTEXT_ALLOCATED in svm/vpmu.c
> 
>  xen/arch/x86/hvm/svm/vpmu.c | 3 ---
>  xen/arch/x86/hvm/vpmu.c     | 5 +----
>  2 files changed, 1 insertion(+), 7 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/svm/vpmu.c b/xen/arch/x86/hvm/svm/vpmu.c
> index 64dc167..6764070 100644
> --- a/xen/arch/x86/hvm/svm/vpmu.c
> +++ b/xen/arch/x86/hvm/svm/vpmu.c
> @@ -359,9 +359,6 @@ static int amd_vpmu_initialise(struct vcpu *v)
>      struct vpmu_struct *vpmu = vcpu_vpmu(v);
>      uint8_t family = current_cpu_data.x86;
>  
> -    if ( vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
> -        return 0;
> -
>      if ( counters == NULL )
>      {
>           switch ( family )
> diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c
> index 0e6b6c0..c3273ee 100644
> --- a/xen/arch/x86/hvm/vpmu.c
> +++ b/xen/arch/x86/hvm/vpmu.c
> @@ -236,10 +236,7 @@ void vpmu_initialise(struct vcpu *v)
>      if ( is_pvh_vcpu(v) )
>          return;
>  
> -    if ( vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
> -        vpmu_destroy(v);
> -    vpmu_clear(vpmu);
> -    vpmu->context = NULL;
> +    ASSERT(!vpmu->flags && !vpmu->context);
>  
>      switch ( vendor )
>      {
> 

-- 
Company details: http://ts.fujitsu.com/imprint.html

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v19 05/14] x86/VPMU: Interface for setting PMU mode and flags
  2015-03-17 14:54 ` [PATCH v19 05/14] x86/VPMU: Interface for setting PMU mode and flags Boris Ostrovsky
@ 2015-03-19 13:12   ` Dietmar Hahn
  2015-03-19 13:28     ` Jan Beulich
  2015-03-24 13:56   ` Jan Beulich
  1 sibling, 1 reply; 35+ messages in thread
From: Dietmar Hahn @ 2015-03-19 13:12 UTC (permalink / raw)
  To: xen-devel
  Cc: kevin.tian, JBeulich, jun.nakajima, andrew.cooper3, tim,
	Aravind.Gopalakrishnan, suravee.suthikulpanit, dgdegra,
	Boris Ostrovsky

Am Dienstag 17 März 2015, 10:54:02 schrieb Boris Ostrovsky:
> Add runtime interface for setting PMU mode and flags. Three main modes are
> provided:
> * XENPMU_MODE_OFF:  PMU is not virtualized
> * XENPMU_MODE_SELF: Guests can access PMU MSRs and receive PMU interrupts.
> * XENPMU_MODE_HV: Same as XENPMU_MODE_SELF for non-proviledged guests, dom0
>   can profile itself and the hypervisor.
> 
> Note that PMU modes are different from what can be provided at Xen's boot line
> with 'vpmu' argument. An 'off' (or '0') value is equivalent to XENPMU_MODE_OFF.
> Any other value, on the other hand, will cause VPMU mode to be set to
> XENPMU_MODE_SELF during boot.
> 
> For feature flags only Intel's BTS is currently supported.
> 
> Mode and flags are set via HYPERVISOR_xenpmu_op hypercall.
> 
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>

Reviewed-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>

Only two minor comments below.

> ---
> 
> Changes in v19:
> * Keep track of active vpmu count and allow certain mode changes only when the count
>   is zero
> * Drop vpmu_unload routines
> * Revert to to using opt_vpmu_enabled
> * Changes to oprofile code are no longer needed
> * Changes to vmcs.c are no longer needed
> * Simplified vpmu_switch_from/to inlines
> 
>  tools/flask/policy/policy/modules/xen/xen.te |   3 +
>  xen/arch/x86/domain.c                        |   4 +-
>  xen/arch/x86/hvm/svm/vpmu.c                  |   4 +-
>  xen/arch/x86/hvm/vmx/vpmu_core2.c            |  10 +-
>  xen/arch/x86/hvm/vpmu.c                      | 155 +++++++++++++++++++++++++--
>  xen/arch/x86/x86_64/compat/entry.S           |   4 +
>  xen/arch/x86/x86_64/entry.S                  |   4 +
>  xen/include/asm-x86/hvm/vpmu.h               |  27 +++--
>  xen/include/public/pmu.h                     |  45 ++++++++
>  xen/include/public/xen.h                     |   1 +
>  xen/include/xen/hypercall.h                  |   4 +
>  xen/include/xlat.lst                         |   1 +
>  xen/include/xsm/dummy.h                      |  15 +++
>  xen/include/xsm/xsm.h                        |   6 ++
>  xen/xsm/dummy.c                              |   1 +
>  xen/xsm/flask/hooks.c                        |  18 ++++
>  xen/xsm/flask/policy/access_vectors          |   2 +
>  17 files changed, 279 insertions(+), 25 deletions(-)
> 
> diff --git a/tools/flask/policy/policy/modules/xen/xen.te b/tools/flask/policy/policy/modules/xen/xen.te
> index c0128aa..870ff81 100644
> --- a/tools/flask/policy/policy/modules/xen/xen.te
> +++ b/tools/flask/policy/policy/modules/xen/xen.te
> @@ -68,6 +68,9 @@ allow dom0_t xen_t:xen2 {
>      resource_op
>      psr_cmt_op
>  };
> +allow dom0_t xen_t:xen2 {
> +    pmu_ctrl
> +};
>  allow dom0_t xen_t:mmu memorymap;
>  
>  # Allow dom0 to use these domctls on itself. For domctls acting on other
> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
> index 21f0766..60d9a80 100644
> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -1536,7 +1536,7 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
>      if ( is_hvm_vcpu(prev) )
>      {
>          if (prev != next)
> -            vpmu_save(prev);
> +            vpmu_switch_from(prev);
>  
>          if ( !list_empty(&prev->arch.hvm_vcpu.tm_list) )
>              pt_save_timer(prev);
> @@ -1581,7 +1581,7 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
>  
>      if (is_hvm_vcpu(next) && (prev != next) )
>          /* Must be done with interrupts enabled */
> -        vpmu_load(next);
> +        vpmu_switch_to(next);
>  
>      context_saved(prev);
>  
> diff --git a/xen/arch/x86/hvm/svm/vpmu.c b/xen/arch/x86/hvm/svm/vpmu.c
> index a8b79df..481ea7b 100644
> --- a/xen/arch/x86/hvm/svm/vpmu.c
> +++ b/xen/arch/x86/hvm/svm/vpmu.c
> @@ -472,14 +472,14 @@ struct arch_vpmu_ops amd_vpmu_ops = {
>      .arch_vpmu_dump = amd_vpmu_dump
>  };
>  
> -int svm_vpmu_initialise(struct vcpu *v, unsigned int vpmu_flags)
> +int svm_vpmu_initialise(struct vcpu *v)
>  {
>      struct vpmu_struct *vpmu = vcpu_vpmu(v);
>      uint8_t family = current_cpu_data.x86;
>      int ret = 0;
>  
>      /* vpmu enabled? */
> -    if ( !vpmu_flags )
> +    if ( vpmu_mode == XENPMU_MODE_OFF )
>          return 0;
>  
>      switch ( family )
> diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c
> index c2405bf..6280644 100644
> --- a/xen/arch/x86/hvm/vmx/vpmu_core2.c
> +++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c
> @@ -708,13 +708,13 @@ static int core2_vpmu_do_interrupt(struct cpu_user_regs *regs)
>      return 1;
>  }
>  
> -static int core2_vpmu_initialise(struct vcpu *v, unsigned int vpmu_flags)
> +static int core2_vpmu_initialise(struct vcpu *v)
>  {
>      struct vpmu_struct *vpmu = vcpu_vpmu(v);
>      u64 msr_content;
>      static bool_t ds_warned;
>  
> -    if ( !(vpmu_flags & VPMU_BOOT_BTS) )
> +    if ( !(vpmu_features & XENPMU_FEATURE_INTEL_BTS) )
>          goto func_out;
>      /* Check the 'Debug Store' feature in the CPUID.EAX[1]:EDX[21] */
>      while ( boot_cpu_has(X86_FEATURE_DS) )
> @@ -826,7 +826,7 @@ struct arch_vpmu_ops core2_no_vpmu_ops = {
>      .do_cpuid = core2_no_vpmu_do_cpuid,
>  };
>  
> -int vmx_vpmu_initialise(struct vcpu *v, unsigned int vpmu_flags)
> +int vmx_vpmu_initialise(struct vcpu *v)
>  {
>      struct vpmu_struct *vpmu = vcpu_vpmu(v);
>      uint8_t family = current_cpu_data.x86;
> @@ -834,7 +834,7 @@ int vmx_vpmu_initialise(struct vcpu *v, unsigned int vpmu_flags)
>      int ret = 0;
>  
>      vpmu->arch_vpmu_ops = &core2_no_vpmu_ops;
> -    if ( !vpmu_flags )
> +    if ( vpmu_mode == XENPMU_MODE_OFF )
>          return 0;
>  
>      if ( family == 6 )
> @@ -877,7 +877,7 @@ int vmx_vpmu_initialise(struct vcpu *v, unsigned int vpmu_flags)
>          /* future: */
>          case 0x3d:
>          case 0x4e:
> -            ret = core2_vpmu_initialise(v, vpmu_flags);
> +            ret = core2_vpmu_initialise(v);
>              if ( !ret )
>                  vpmu->arch_vpmu_ops = &core2_vpmu_ops;
>              return ret;
> diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c
> index ef88528..164accf 100644
> --- a/xen/arch/x86/hvm/vpmu.c
> +++ b/xen/arch/x86/hvm/vpmu.c
> @@ -21,6 +21,8 @@
>  #include <xen/config.h>
>  #include <xen/sched.h>
>  #include <xen/xenoprof.h>
> +#include <xen/event.h>
> +#include <xen/guest_access.h>
>  #include <asm/regs.h>
>  #include <asm/types.h>
>  #include <asm/msr.h>
> @@ -33,8 +35,10 @@
>  #include <asm/hvm/svm/vmcb.h>
>  #include <asm/apic.h>
>  #include <public/pmu.h>
> +#include <xsm/xsm.h>
>  
>  #include <compat/pmu.h>
> +CHECK_pmu_params;
>  CHECK_pmu_intel_ctxt;
>  CHECK_pmu_amd_ctxt;
>  CHECK_pmu_cntr_pair;
> @@ -46,9 +50,14 @@ CHECK_pmu_regs;
>   * "vpmu=bts" : vpmu enabled and Intel BTS feature switched on.
>   */
>  static unsigned int __read_mostly opt_vpmu_enabled;
> +unsigned int __read_mostly vpmu_mode = XENPMU_MODE_OFF;
> +unsigned int __read_mostly vpmu_features = 0;
>  static void parse_vpmu_param(char *s);
>  custom_param("vpmu", parse_vpmu_param);
>  
> +static DEFINE_SPINLOCK(vpmu_lock);
> +static unsigned vpmu_count;
> +
>  static DEFINE_PER_CPU(struct vcpu *, last_vcpu);
>  
>  static void __init parse_vpmu_param(char *s)
> @@ -59,7 +68,7 @@ static void __init parse_vpmu_param(char *s)
>          break;
>      default:
>          if ( !strcmp(s, "bts") )
> -            opt_vpmu_enabled |= VPMU_BOOT_BTS;
> +            vpmu_features |= XENPMU_FEATURE_INTEL_BTS;
>          else if ( *s )
>          {
>              printk("VPMU: unknown flag: %s - vpmu disabled!\n", s);
> @@ -67,7 +76,9 @@ static void __init parse_vpmu_param(char *s)
>          }
>          /* fall through */
>      case 1:
> -        opt_vpmu_enabled |= VPMU_BOOT_ENABLED;
> +        /* Default VPMU mode */
> +        vpmu_mode = XENPMU_MODE_SELF;
> +        opt_vpmu_enabled = 1;
>          break;
>      }
>  }
> @@ -76,7 +87,7 @@ void vpmu_lvtpc_update(uint32_t val)
>  {
>      struct vpmu_struct *vpmu;
>  
> -    if ( !opt_vpmu_enabled )
> +    if ( vpmu_mode == XENPMU_MODE_OFF )
>          return;
>  
>      vpmu = vcpu_vpmu(current);
> @@ -89,6 +100,9 @@ int vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content, uint64_t supported)
>  {
>      struct vpmu_struct *vpmu = vcpu_vpmu(current);
>  
> +    if ( vpmu_mode == XENPMU_MODE_OFF )
> +        return 0;
> +
>      if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->do_wrmsr )
>          return vpmu->arch_vpmu_ops->do_wrmsr(msr, msr_content, supported);
>      return 0;
> @@ -98,6 +112,12 @@ int vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
>  {
>      struct vpmu_struct *vpmu = vcpu_vpmu(current);
>  
> +    if ( vpmu_mode == XENPMU_MODE_OFF )
> +    {
> +        *msr_content = 0;
> +        return 0;
> +    }
> +
>      if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->do_rdmsr )
>          return vpmu->arch_vpmu_ops->do_rdmsr(msr, msr_content);
>      return 0;
> @@ -248,28 +268,41 @@ void vpmu_initialise(struct vcpu *v)
>  
>      ASSERT(!vpmu->flags && !vpmu->context);
>  
> +    spin_lock(&vpmu_lock);
> +    vpmu_count++; /* Prevent vpmu_mode from changing until we are done */
> +    spin_unlock(&vpmu_lock);
> +
>      switch ( vendor )
>      {
>      case X86_VENDOR_AMD:
> -        ret = svm_vpmu_initialise(v, opt_vpmu_enabled);
> +        ret = svm_vpmu_initialise(v);
>          break;
>  
>      case X86_VENDOR_INTEL:
> -        ret = vmx_vpmu_initialise(v, opt_vpmu_enabled);
> +        ret = vmx_vpmu_initialise(v);
>          break;
>  
>      default:
> -        if ( opt_vpmu_enabled )
> +        if ( vpmu_mode != XENPMU_MODE_OFF )
>          {
>              printk(XENLOG_G_WARNING "VPMU: Unknown CPU vendor %d. "
>                     "Disabling VPMU\n", vendor);
>              opt_vpmu_enabled = 0;
> +            vpmu_mode = XENPMU_MODE_OFF;
>          }
> -        return;
> +        return; /* Don't bother restoring vpmu_count, VPMU is off forever */
>      }
>  
>      if ( ret )
>          printk(XENLOG_G_WARNING "VPMU: Initialization failed for %pv\n", v);
> +
> +    /* Intel needs to initialize VPMU ops even if VPMU is not in use */
> +    if ( ret || (vpmu_mode == XENPMU_MODE_OFF) )
> +    {
> +        spin_lock(&vpmu_lock);
> +        vpmu_count--;
> +        spin_unlock(&vpmu_lock);
> +    }
>  }
>  
>  static void vpmu_clear_last(void *arg)
> @@ -298,6 +331,10 @@ void vpmu_destroy(struct vcpu *v)
>  
>      if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->arch_vpmu_destroy )
>          vpmu->arch_vpmu_ops->arch_vpmu_destroy(v);
> +
> +    spin_lock(&vpmu_lock);
> +    vpmu_count--;
> +    spin_unlock(&vpmu_lock);
>  }
>  
>  /* Dump some vpmu informations on console. Used in keyhandler dump_domains(). */
> @@ -309,6 +346,109 @@ void vpmu_dump(struct vcpu *v)
>          vpmu->arch_vpmu_ops->arch_vpmu_dump(v);
>  }
>  
> +long do_xenpmu_op(unsigned int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg)
> +{
> +    int ret;
> +    struct xen_pmu_params pmu_params = {.val = 0};
> +
> +    if ( !opt_vpmu_enabled )
> +        return -EOPNOTSUPP;
> +
> +    ret = xsm_pmu_op(XSM_OTHER, current->domain, op);
> +    if ( ret )
> +        return ret;
> +
> +    /* Check major version when parameters are specified */
> +    switch ( op )
> +    {
> +    case XENPMU_mode_set:
> +    case XENPMU_feature_set:
> +        if ( copy_from_guest(&pmu_params, arg, 1) )
> +            return -EFAULT;
> +
> +        if ( pmu_params.version.maj != XENPMU_VER_MAJ )
> +            return -EINVAL;
> +    }
> +
> +    switch ( op )
> +    {
> +    case XENPMU_mode_set:
> +    {
> +        if ( (pmu_params.val & ~(XENPMU_MODE_SELF | XENPMU_MODE_HV)) ||
> +             (hweight64(pmu_params.val) > 1) )
> +            return -EINVAL;
> +
> +        /* 32-bit dom0 can only sample itself. */
> +        if ( is_pv_32bit_vcpu(current) && (pmu_params.val & XENPMU_MODE_HV) )
> +            return -EINVAL;
> +
> +        spin_lock(&vpmu_lock);
> +
> +        /*
> +         * We can always safely switch between XENPMU_MODE_SELF and
> +         * XENPMU_MODE_HV while other VPMUs are active.
> +         */
> +        if ( (vpmu_count == 0) || (vpmu_mode == pmu_params.val) ||
> +             ((vpmu_mode ^ pmu_params.val) ==
> +              (XENPMU_MODE_SELF | XENPMU_MODE_HV)) )
> +            vpmu_mode = pmu_params.val;
> +        else
> +        {
> +            printk(XENLOG_WARNING "VPMU: Cannot change mode while"
> +                                  " active VPMUs exist\n");
> +            ret = -EBUSY;
> +        }
> +
> +        spin_unlock(&vpmu_lock);
> +
> +        break;
> +    }
> +
> +    case XENPMU_mode_get:
> +        memset(&pmu_params, 0, sizeof(pmu_params));
> +        pmu_params.val = vpmu_mode;
> +
> +        pmu_params.version.maj = XENPMU_VER_MAJ;
> +        pmu_params.version.min = XENPMU_VER_MIN;
> +
> +        if ( copy_to_guest(arg, &pmu_params, 1) )
> +            return -EFAULT;
> +
> +        break;
> +
> +    case XENPMU_feature_set:
> +        if ( pmu_params.val & ~XENPMU_FEATURE_INTEL_BTS )
> +            return -EINVAL;
> +
> +        spin_lock(&vpmu_lock);
> +
> +        if ( vpmu_count == 0 )
> +            vpmu_features = pmu_params.val;
> +        else
> +        {
> +            printk(XENLOG_WARNING "VPMU: Cannot change features while"
> +                                  " active VPMUs exist\n");
> +            ret = -EBUSY;
> +        }
> +
> +        spin_unlock(&vpmu_lock);
> +
> +        break;
> +
> +    case XENPMU_feature_get:

memset(&pmu_params, 0, sizeof(pmu_params));  ?

> +        pmu_params.val = vpmu_features;
> +        if ( copy_field_to_guest(arg, &pmu_params, val) )
> +            return -EFAULT;
> +
> +        break;
> +
> +    default:
> +        ret = -EINVAL;
> +    }
> +
> +    return ret;
> +}
> +
>  static int __init vpmu_init(void)
>  {
>      /* NMI watchdog uses LVTPC and HW counter */
> @@ -316,6 +456,7 @@ static int __init vpmu_init(void)
>      {
>          printk(XENLOG_WARNING "NMI watchdog is enabled. Turning VPMU off.\n");
>          opt_vpmu_enabled = 0;
> +        vpmu_mode = XENPMU_MODE_OFF;

Maybe not needed because of the static initializing?

Dietmar.

>      }
>  
>      return 0;
> diff --git a/xen/arch/x86/x86_64/compat/entry.S b/xen/arch/x86/x86_64/compat/entry.S
> index 5b0af61..7691a79 100644
> --- a/xen/arch/x86/x86_64/compat/entry.S
> +++ b/xen/arch/x86/x86_64/compat/entry.S
> @@ -417,6 +417,8 @@ ENTRY(compat_hypercall_table)
>          .quad do_domctl
>          .quad compat_kexec_op
>          .quad do_tmem_op
> +        .quad do_ni_hypercall           /* reserved for XenClient */
> +        .quad do_xenpmu_op              /* 40 */
>          .rept __HYPERVISOR_arch_0-((.-compat_hypercall_table)/8)
>          .quad compat_ni_hypercall
>          .endr
> @@ -466,6 +468,8 @@ ENTRY(compat_hypercall_args_table)
>          .byte 1 /* do_domctl                */
>          .byte 2 /* compat_kexec_op          */
>          .byte 1 /* do_tmem_op               */
> +        .byte 0 /* reserved for XenClient   */
> +        .byte 2 /* do_xenpmu_op             */  /* 40 */
>          .rept __HYPERVISOR_arch_0-(.-compat_hypercall_args_table)
>          .byte 0 /* compat_ni_hypercall      */
>          .endr
> diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
> index 2d25d57..403e97c 100644
> --- a/xen/arch/x86/x86_64/entry.S
> +++ b/xen/arch/x86/x86_64/entry.S
> @@ -764,6 +764,8 @@ ENTRY(hypercall_table)
>          .quad do_domctl
>          .quad do_kexec_op
>          .quad do_tmem_op
> +        .quad do_ni_hypercall       /* reserved for XenClient */
> +        .quad do_xenpmu_op          /* 40 */
>          .rept __HYPERVISOR_arch_0-((.-hypercall_table)/8)
>          .quad do_ni_hypercall
>          .endr
> @@ -813,6 +815,8 @@ ENTRY(hypercall_args_table)
>          .byte 1 /* do_domctl            */
>          .byte 2 /* do_kexec             */
>          .byte 1 /* do_tmem_op           */
> +        .byte 0 /* reserved for XenClient */
> +        .byte 2 /* do_xenpmu_op         */  /* 40 */
>          .rept __HYPERVISOR_arch_0-(.-hypercall_args_table)
>          .byte 0 /* do_ni_hypercall      */
>          .endr
> diff --git a/xen/include/asm-x86/hvm/vpmu.h b/xen/include/asm-x86/hvm/vpmu.h
> index 82bfa0e..88ffc19 100644
> --- a/xen/include/asm-x86/hvm/vpmu.h
> +++ b/xen/include/asm-x86/hvm/vpmu.h
> @@ -24,13 +24,6 @@
>  
>  #include <public/pmu.h>
>  
> -/*
> - * Flag bits given as a string on the hypervisor boot parameter 'vpmu'.
> - * See arch/x86/hvm/vpmu.c.
> - */
> -#define VPMU_BOOT_ENABLED 0x1    /* vpmu generally enabled. */
> -#define VPMU_BOOT_BTS     0x2    /* Intel BTS feature wanted. */
> -
>  #define vcpu_vpmu(vcpu)   (&(vcpu)->arch.vpmu)
>  #define vpmu_vcpu(vpmu)   container_of((vpmu), struct vcpu, arch.vpmu)
>  
> @@ -59,8 +52,8 @@ struct arch_vpmu_ops {
>      void (*arch_vpmu_dump)(const struct vcpu *);
>  };
>  
> -int vmx_vpmu_initialise(struct vcpu *, unsigned int flags);
> -int svm_vpmu_initialise(struct vcpu *, unsigned int flags);
> +int vmx_vpmu_initialise(struct vcpu *);
> +int svm_vpmu_initialise(struct vcpu *);
>  
>  struct vpmu_struct {
>      u32 flags;
> @@ -116,5 +109,21 @@ void vpmu_dump(struct vcpu *v);
>  extern int acquire_pmu_ownership(int pmu_ownership);
>  extern void release_pmu_ownership(int pmu_ownership);
>  
> +extern unsigned int vpmu_mode;
> +extern unsigned int vpmu_features;
> +
> +/* Context switch */
> +static inline void vpmu_switch_from(struct vcpu *prev)
> +{
> +    if ( vpmu_mode & (XENPMU_MODE_SELF | XENPMU_MODE_HV) )
> +        vpmu_save(prev);
> +}
> +
> +static inline void vpmu_switch_to(struct vcpu *next)
> +{
> +    if ( vpmu_mode & (XENPMU_MODE_SELF | XENPMU_MODE_HV) )
> +        vpmu_load(next);
> +}
> +
>  #endif /* __ASM_X86_HVM_VPMU_H_*/
>  
> diff --git a/xen/include/public/pmu.h b/xen/include/public/pmu.h
> index f97106d..66cc494 100644
> --- a/xen/include/public/pmu.h
> +++ b/xen/include/public/pmu.h
> @@ -13,6 +13,51 @@
>  #define XENPMU_VER_MAJ    0
>  #define XENPMU_VER_MIN    1
>  
> +/*
> + * ` enum neg_errnoval
> + * ` HYPERVISOR_xenpmu_op(enum xenpmu_op cmd, struct xenpmu_params *args);
> + *
> + * @cmd  == XENPMU_* (PMU operation)
> + * @args == struct xenpmu_params
> + */
> +/* ` enum xenpmu_op { */
> +#define XENPMU_mode_get        0 /* Also used for getting PMU version */
> +#define XENPMU_mode_set        1
> +#define XENPMU_feature_get     2
> +#define XENPMU_feature_set     3
> +/* ` } */
> +
> +/* Parameters structure for HYPERVISOR_xenpmu_op call */
> +struct xen_pmu_params {
> +    /* IN/OUT parameters */
> +    struct {
> +        uint32_t maj;
> +        uint32_t min;
> +    } version;
> +    uint64_t val;
> +
> +    /* IN parameters */
> +    uint32_t vcpu;
> +    uint32_t pad;
> +};
> +typedef struct xen_pmu_params xen_pmu_params_t;
> +DEFINE_XEN_GUEST_HANDLE(xen_pmu_params_t);
> +
> +/* PMU modes:
> + * - XENPMU_MODE_OFF:   No PMU virtualization
> + * - XENPMU_MODE_SELF:  Guests can profile themselves
> + * - XENPMU_MODE_HV:    Guests can profile themselves, dom0 profiles
> + *                      itself and Xen
> + */
> +#define XENPMU_MODE_OFF           0
> +#define XENPMU_MODE_SELF          (1<<0)
> +#define XENPMU_MODE_HV            (1<<1)
> +
> +/*
> + * PMU features:
> + * - XENPMU_FEATURE_INTEL_BTS: Intel BTS support (ignored on AMD)
> + */
> +#define XENPMU_FEATURE_INTEL_BTS  1
>  
>  /* Shared between hypervisor and PV domain */
>  struct xen_pmu_data {
> diff --git a/xen/include/public/xen.h b/xen/include/public/xen.h
> index 3703c39..0dd3c97 100644
> --- a/xen/include/public/xen.h
> +++ b/xen/include/public/xen.h
> @@ -101,6 +101,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_ulong_t);
>  #define __HYPERVISOR_kexec_op             37
>  #define __HYPERVISOR_tmem_op              38
>  #define __HYPERVISOR_xc_reserved_op       39 /* reserved for XenClient */
> +#define __HYPERVISOR_xenpmu_op            40
>  
>  /* Architecture-specific hypercall definitions. */
>  #define __HYPERVISOR_arch_0               48
> diff --git a/xen/include/xen/hypercall.h b/xen/include/xen/hypercall.h
> index eda8a36..ef665db 100644
> --- a/xen/include/xen/hypercall.h
> +++ b/xen/include/xen/hypercall.h
> @@ -14,6 +14,7 @@
>  #include <public/event_channel.h>
>  #include <public/tmem.h>
>  #include <public/version.h>
> +#include <public/pmu.h>
>  #include <asm/hypercall.h>
>  #include <xsm/xsm.h>
>  
> @@ -144,6 +145,9 @@ do_tmem_op(
>  extern long
>  do_xenoprof_op(int op, XEN_GUEST_HANDLE_PARAM(void) arg);
>  
> +extern long
> +do_xenpmu_op(unsigned int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg);
> +
>  #ifdef CONFIG_COMPAT
>  
>  extern int
> diff --git a/xen/include/xlat.lst b/xen/include/xlat.lst
> index 94eedad..19ab5f7 100644
> --- a/xen/include/xlat.lst
> +++ b/xen/include/xlat.lst
> @@ -97,6 +97,7 @@
>  ?	xenpf_pcpuinfo			platform.h
>  ?	xenpf_pcpu_version		platform.h
>  ?	xenpf_resource_entry		platform.h
> +?	pmu_params			pmu.h
>  !	sched_poll			sched.h
>  ?	sched_remote_shutdown		sched.h
>  ?	sched_shutdown			sched.h
> diff --git a/xen/include/xsm/dummy.h b/xen/include/xsm/dummy.h
> index f20e89c..c637454 100644
> --- a/xen/include/xsm/dummy.h
> +++ b/xen/include/xsm/dummy.h
> @@ -655,4 +655,19 @@ static XSM_INLINE int xsm_ioport_mapping(XSM_DEFAULT_ARG struct domain *d, uint3
>      return xsm_default_action(action, current->domain, d);
>  }
>  
> +static XSM_INLINE int xsm_pmu_op (XSM_DEFAULT_ARG struct domain *d, int op)
> +{
> +    XSM_ASSERT_ACTION(XSM_OTHER);
> +    switch ( op )
> +    {
> +    case XENPMU_mode_set:
> +    case XENPMU_mode_get:
> +    case XENPMU_feature_set:
> +    case XENPMU_feature_get:
> +        return xsm_default_action(XSM_PRIV, d, current->domain);
> +    default:
> +        return -EPERM;
> +    }
> +}
> +
>  #endif /* CONFIG_X86 */
> diff --git a/xen/include/xsm/xsm.h b/xen/include/xsm/xsm.h
> index 4ce089f..90edbb1 100644
> --- a/xen/include/xsm/xsm.h
> +++ b/xen/include/xsm/xsm.h
> @@ -173,6 +173,7 @@ struct xsm_operations {
>      int (*unbind_pt_irq) (struct domain *d, struct xen_domctl_bind_pt_irq *bind);
>      int (*ioport_permission) (struct domain *d, uint32_t s, uint32_t e, uint8_t allow);
>      int (*ioport_mapping) (struct domain *d, uint32_t s, uint32_t e, uint8_t allow);
> +    int (*pmu_op) (struct domain *d, unsigned int op);
>  #endif
>  };
>  
> @@ -665,6 +666,11 @@ static inline int xsm_ioport_mapping (xsm_default_t def, struct domain *d, uint3
>      return xsm_ops->ioport_mapping(d, s, e, allow);
>  }
>  
> +static inline int xsm_pmu_op (xsm_default_t def, struct domain *d, int op)
> +{
> +    return xsm_ops->pmu_op(d, op);
> +}
> +
>  #endif /* CONFIG_X86 */
>  
>  #endif /* XSM_NO_WRAPPERS */
> diff --git a/xen/xsm/dummy.c b/xen/xsm/dummy.c
> index 8eb3050..94f1cf0 100644
> --- a/xen/xsm/dummy.c
> +++ b/xen/xsm/dummy.c
> @@ -144,5 +144,6 @@ void xsm_fixup_ops (struct xsm_operations *ops)
>      set_to_dummy_if_null(ops, unbind_pt_irq);
>      set_to_dummy_if_null(ops, ioport_permission);
>      set_to_dummy_if_null(ops, ioport_mapping);
> +    set_to_dummy_if_null(ops, pmu_op);
>  #endif
>  }
> diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
> index c6431b5..982e879 100644
> --- a/xen/xsm/flask/hooks.c
> +++ b/xen/xsm/flask/hooks.c
> @@ -1505,6 +1505,23 @@ static int flask_unbind_pt_irq (struct domain *d, struct xen_domctl_bind_pt_irq
>  {
>      return current_has_perm(d, SECCLASS_RESOURCE, RESOURCE__REMOVE);
>  }
> +
> +static int flask_pmu_op (struct domain *d, unsigned int op)
> +{
> +    u32 dsid = domain_sid(d);
> +
> +    switch ( op )
> +    {
> +    case XENPMU_mode_set:
> +    case XENPMU_mode_get:
> +    case XENPMU_feature_set:
> +    case XENPMU_feature_get:
> +        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_XEN2,
> +                            XEN2__PMU_CTRL, NULL);
> +    default:
> +        return -EPERM;
> +    }
> +}
>  #endif /* CONFIG_X86 */
>  
>  long do_flask_op(XEN_GUEST_HANDLE_PARAM(xsm_op_t) u_flask_op);
> @@ -1627,6 +1644,7 @@ static struct xsm_operations flask_ops = {
>      .unbind_pt_irq = flask_unbind_pt_irq,
>      .ioport_permission = flask_ioport_permission,
>      .ioport_mapping = flask_ioport_mapping,
> +    .pmu_op = flask_pmu_op,
>  #endif
>  };
>  
> diff --git a/xen/xsm/flask/policy/access_vectors b/xen/xsm/flask/policy/access_vectors
> index 3a97577..626850d 100644
> --- a/xen/xsm/flask/policy/access_vectors
> +++ b/xen/xsm/flask/policy/access_vectors
> @@ -86,6 +86,8 @@ class xen2
>      psr_cmt_op
>  # XENPF_get_symbol
>      get_symbol
> +# PMU control
> +    pmu_ctrl
>  }
>  
>  # Classes domain and domain2 consist of operations that a domain performs on
> 

-- 
Company details: http://ts.fujitsu.com/imprint.html

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v19 05/14] x86/VPMU: Interface for setting PMU mode and flags
  2015-03-19 13:12   ` Dietmar Hahn
@ 2015-03-19 13:28     ` Jan Beulich
  2015-03-19 13:49       ` Dietmar Hahn
  0 siblings, 1 reply; 35+ messages in thread
From: Jan Beulich @ 2015-03-19 13:28 UTC (permalink / raw)
  To: Dietmar Hahn
  Cc: kevin.tian, suravee.suthikulpanit, andrew.cooper3, tim,
	xen-devel, Aravind.Gopalakrishnan, jun.nakajima, Boris Ostrovsky,
	dgdegra

>>> On 19.03.15 at 14:12, <dietmar.hahn@ts.fujitsu.com> wrote:
> Am Dienstag 17 März 2015, 10:54:02 schrieb Boris Ostrovsky:
>> +    case XENPMU_feature_get:
> 
> memset(&pmu_params, 0, sizeof(pmu_params));  ?
> 
>> +        pmu_params.val = vpmu_features;
>> +        if ( copy_field_to_guest(arg, &pmu_params, val) )
>> +            return -EFAULT;

Why? Notice the _field_ in the used function's name.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v19 05/14] x86/VPMU: Interface for setting PMU mode and flags
  2015-03-19 13:28     ` Jan Beulich
@ 2015-03-19 13:49       ` Dietmar Hahn
  0 siblings, 0 replies; 35+ messages in thread
From: Dietmar Hahn @ 2015-03-19 13:49 UTC (permalink / raw)
  To: Jan Beulich
  Cc: kevin.tian, suravee.suthikulpanit, andrew.cooper3, tim,
	xen-devel, Aravind.Gopalakrishnan, jun.nakajima, Boris Ostrovsky,
	dgdegra

Am Donnerstag 19 März 2015, 13:28:56 schrieb Jan Beulich:
> >>> On 19.03.15 at 14:12, <dietmar.hahn@ts.fujitsu.com> wrote:
> > Am Dienstag 17 März 2015, 10:54:02 schrieb Boris Ostrovsky:
> >> +    case XENPMU_feature_get:
> > 
> > memset(&pmu_params, 0, sizeof(pmu_params));  ?
> > 
> >> +        pmu_params.val = vpmu_features;
> >> +        if ( copy_field_to_guest(arg, &pmu_params, val) )
> >> +            return -EFAULT;
> 
> Why? Notice the _field_ in the used function's name.
> 
> Jan

Ah, yes, thanks! Sorry for noise.

Dietmar.

> 



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v19 00/14] x86/PMU: Xen PMU PV(H) support
  2015-03-17 14:53 [PATCH v19 00/14] x86/PMU: Xen PMU PV(H) support Boris Ostrovsky
                   ` (13 preceding siblings ...)
  2015-03-17 14:54 ` [PATCH v19 14/14] x86/VPMU: Move VPMU files up from hvm/ directory Boris Ostrovsky
@ 2015-03-19 14:32 ` Dietmar Hahn
  2015-03-19 16:03   ` Boris Ostrovsky
  14 siblings, 1 reply; 35+ messages in thread
From: Dietmar Hahn @ 2015-03-19 14:32 UTC (permalink / raw)
  To: xen-devel
  Cc: kevin.tian, JBeulich, jun.nakajima, andrew.cooper3, tim,
	Aravind.Gopalakrishnan, suravee.suthikulpanit, dgdegra,
	Boris Ostrovsky

Am Dienstag 17 März 2015, 10:53:57 schrieb Boris Ostrovsky:
> 
> Changes in v19:
> 
> * Do not allow changing mode to/from OFF/ALL while guests are
>   running. This significantly simplifies code due
>   to large number of corner cases that I had to deal with. Most of the
>   changes are in patch#5. This also makes patch 4 from last version
>   unnecessary
> * Defer NMI support (drop patch#14 from last version)
> * Make patch#15 from last series be patch#1 (vpmu init cleanup)
> * Other changes are listed per patch

Hi,

is anybody of you working on PEBS support for x86_64?

Dietmar.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v19 00/14] x86/PMU: Xen PMU PV(H) support
  2015-03-19 14:32 ` [PATCH v19 00/14] x86/PMU: Xen PMU PV(H) support Dietmar Hahn
@ 2015-03-19 16:03   ` Boris Ostrovsky
  0 siblings, 0 replies; 35+ messages in thread
From: Boris Ostrovsky @ 2015-03-19 16:03 UTC (permalink / raw)
  To: Dietmar Hahn, xen-devel
  Cc: kevin.tian, JBeulich, jun.nakajima, andrew.cooper3, tim,
	Aravind.Gopalakrishnan, suravee.suthikulpanit, dgdegra

On 03/19/2015 10:32 AM, Dietmar Hahn wrote:
> Am Dienstag 17 März 2015, 10:53:57 schrieb Boris Ostrovsky:
>> Changes in v19:
>>
>> * Do not allow changing mode to/from OFF/ALL while guests are
>>    running. This significantly simplifies code due
>>    to large number of corner cases that I had to deal with. Most of the
>>    changes are in patch#5. This also makes patch 4 from last version
>>    unnecessary
>> * Defer NMI support (drop patch#14 from last version)
>> * Make patch#15 from last series be patch#1 (vpmu init cleanup)
>> * Other changes are listed per patch
> Hi,
>
> is anybody of you working on PEBS support for x86_64?
>

I am not and don't see myself doing it in the immediate future.

-boris

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v19 05/14] x86/VPMU: Interface for setting PMU mode and flags
  2015-03-17 14:54 ` [PATCH v19 05/14] x86/VPMU: Interface for setting PMU mode and flags Boris Ostrovsky
  2015-03-19 13:12   ` Dietmar Hahn
@ 2015-03-24 13:56   ` Jan Beulich
  1 sibling, 0 replies; 35+ messages in thread
From: Jan Beulich @ 2015-03-24 13:56 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: kevin.tian, suravee.suthikulpanit, andrew.cooper3, tim,
	dietmar.hahn, xen-devel, Aravind.Gopalakrishnan, jun.nakajima,
	dgdegra

>>> On 17.03.15 at 15:54, <boris.ostrovsky@oracle.com> wrote:
> Add runtime interface for setting PMU mode and flags. Three main modes are
> provided:
> * XENPMU_MODE_OFF:  PMU is not virtualized
> * XENPMU_MODE_SELF: Guests can access PMU MSRs and receive PMU interrupts.
> * XENPMU_MODE_HV: Same as XENPMU_MODE_SELF for non-proviledged guests, dom0
>   can profile itself and the hypervisor.
> 
> Note that PMU modes are different from what can be provided at Xen's boot 
> line
> with 'vpmu' argument. An 'off' (or '0') value is equivalent to 
> XENPMU_MODE_OFF.
> Any other value, on the other hand, will cause VPMU mode to be set to
> XENPMU_MODE_SELF during boot.
> 
> For feature flags only Intel's BTS is currently supported.
> 
> Mode and flags are set via HYPERVISOR_xenpmu_op hypercall.
> 
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>

Acked-by: Jan Beulich <jbeulich@suse.com>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v19 07/14] x86/VPMU: Initialize PMU for PV(H) guests
  2015-03-17 14:54 ` [PATCH v19 07/14] x86/VPMU: Initialize PMU for PV(H) guests Boris Ostrovsky
@ 2015-03-24 14:08   ` Jan Beulich
  2015-03-24 15:06     ` Boris Ostrovsky
  0 siblings, 1 reply; 35+ messages in thread
From: Jan Beulich @ 2015-03-24 14:08 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: kevin.tian, suravee.suthikulpanit, andrew.cooper3, tim,
	dietmar.hahn, xen-devel, Aravind.Gopalakrishnan, jun.nakajima,
	dgdegra

>>> On 17.03.15 at 15:54, <boris.ostrovsky@oracle.com> wrote:
> @@ -263,14 +264,14 @@ void vpmu_initialise(struct vcpu *v)
>      BUILD_BUG_ON(sizeof(struct xen_pmu_intel_ctxt) > XENPMU_CTXT_PAD_SZ);
>      BUILD_BUG_ON(sizeof(struct xen_pmu_amd_ctxt) > XENPMU_CTXT_PAD_SZ);
>  
> -    if ( is_pvh_vcpu(v) )
> -        return;
> -
>      ASSERT(!vpmu->flags && !vpmu->context);
>  
> -    spin_lock(&vpmu_lock);
> -    vpmu_count++; /* Prevent vpmu_mode from changing until we are done */
> -    spin_unlock(&vpmu_lock);
> +    if ( v->domain != hardware_domain )
> +    {
> +        spin_lock(&vpmu_lock);
> +        vpmu_count++; /* Prevent vpmu_mode from changing until we are done */
> +        spin_unlock(&vpmu_lock);
> +    }

So why is this being made conditional here? A patch this size would
certainly benefit from a slightly more verbose description...

And then if the conditional is indeed needed - why don't you use
is_hardware_domain()?

> +static int pvpmu_init(struct domain *d, xen_pmu_params_t *params)
> +{
> +    struct vcpu *v;
> +    struct vpmu_struct *vpmu;
> +    struct page_info *page;
> +    uint64_t gfn = params->val;
> +
> +    if ( vpmu_mode == XENPMU_MODE_OFF )
> +        return -EINVAL;
> +
> +    if ( (params->vcpu >= d->max_vcpus) || (d->vcpu == NULL) ||
> +         (d->vcpu[params->vcpu] == NULL) )
> +        return -EINVAL;
> +
> +    page = get_page_from_gfn(d, gfn, NULL, P2M_ALLOC);
> +    if ( !page )
> +        return -EINVAL;
> +
> +    if ( !get_page_type(page, PGT_writable_page) )
> +    {
> +        put_page(page);
> +        return -EINVAL;
> +    }
> +
> +    v = d->vcpu[params->vcpu];
> +    vpmu = vcpu_vpmu(v);
> +
> +    spin_lock(&vpmu->vpmu_lock);
> +
> +    if ( v->arch.vpmu.xenpmu_data )
> +    {
> +        put_page_and_type(page);
> +        spin_unlock(&vpmu->vpmu_lock);

These two lines should be switched.

> +        return -EEXIST;
> +    }
> +
> +    v->arch.vpmu.xenpmu_data = __map_domain_page_global(page);
> +    if ( !v->arch.vpmu.xenpmu_data )
> +    {
> +        put_page_and_type(page);
> +        spin_unlock(&vpmu->vpmu_lock);

Same here.

> +static void pvpmu_finish(struct domain *d, xen_pmu_params_t *params)
> +{
> +    struct vcpu *v;
> +    struct vpmu_struct *vpmu;
> +    uint64_t mfn;
> +
> +    if ( (params->vcpu >= d->max_vcpus) || (d->vcpu == NULL) ||
> +         (d->vcpu[params->vcpu] == NULL) )
> +        return;
> +
> +    v = d->vcpu[params->vcpu];
> +    if ( v != current )
> +        vcpu_pause(v);
> +
> +    vpmu = vcpu_vpmu(v);
> +    spin_lock(&vpmu->vpmu_lock);
> +
> +    vpmu_destroy(v);
> +
> +    if ( v->arch.vpmu.xenpmu_data )
> +    {
> +        mfn = domain_page_map_to_mfn(v->arch.vpmu.xenpmu_data);
> +        ASSERT(mfn != 0);
> +        unmap_domain_page_global(v->arch.vpmu.xenpmu_data);
> +        put_page_and_type(mfn_to_page(mfn));
> +        v->arch.vpmu.xenpmu_data = NULL;
> +    }
> +
> +    spin_unlock(&vpmu->vpmu_lock);

Similarly here the put should be moved out of the locked region.

Jan

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v19 08/14] x86/VPMU: Save VPMU state for PV guests during context switch
  2015-03-17 14:54 ` [PATCH v19 08/14] x86/VPMU: Save VPMU state for PV guests during context switch Boris Ostrovsky
@ 2015-03-24 14:15   ` Jan Beulich
  0 siblings, 0 replies; 35+ messages in thread
From: Jan Beulich @ 2015-03-24 14:15 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: kevin.tian, suravee.suthikulpanit, andrew.cooper3, tim,
	dietmar.hahn, xen-devel, Aravind.Gopalakrishnan, jun.nakajima,
	dgdegra

>>> On 17.03.15 at 15:54, <boris.ostrovsky@oracle.com> wrote:
> Save VPMU state during context switch for both HVM and PV(H) guests.
> 
> A subsequent patch ("x86/VPMU: NMI-based VPMU support") will make it possible
> for vpmu_switch_to() to call vmx_vmcs_try_enter()->vcpu_pause() which needs
> is_running to be correctly set/cleared. To prepare for that, call 
> context_saved()
> before vpmu_switch_to() is executed. (Note that while this change could have
> been dalayed until that later patch, the changes are harmless to existing 
> code
> and so we do it here)
> 
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v19 11/14] x86/VPMU: Handle PMU interrupts for PV guests
  2015-03-17 14:54 ` [PATCH v19 11/14] x86/VPMU: Handle PMU interrupts for " Boris Ostrovsky
@ 2015-03-24 14:28   ` Jan Beulich
  2015-03-24 15:13     ` Boris Ostrovsky
  0 siblings, 1 reply; 35+ messages in thread
From: Jan Beulich @ 2015-03-24 14:28 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: kevin.tian, suravee.suthikulpanit, andrew.cooper3, tim,
	dietmar.hahn, xen-devel, Aravind.Gopalakrishnan, jun.nakajima,
	dgdegra

>>> On 17.03.15 at 15:54, <boris.ostrovsky@oracle.com> wrote:
> Changes in v19:
> * Adjusted for new ops interfaces (passing vcpu vs. vpmu)
> * Test for domain->max_cpu in choose_hwdom_vcpu() instead of 
> 'domain->vcpu!=NULL'

I suppose that's something that then should also be done in patch 7?

> --- a/xen/arch/x86/hvm/vpmu.c
> +++ b/xen/arch/x86/hvm/vpmu.c
> @@ -87,31 +87,57 @@ static void __init parse_vpmu_param(char *s)
>  void vpmu_lvtpc_update(uint32_t val)
>  {
>      struct vpmu_struct *vpmu;
> +    struct vcpu *curr;
>  
>      if ( vpmu_mode == XENPMU_MODE_OFF )
>          return;
>  
> -    vpmu = vcpu_vpmu(current);
> +    curr = current;

Please make this the initializer of the variable, to be consistent with
other code (including yours a few lines down).

> +    vpmu = vcpu_vpmu(curr);
>  
>      vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR | (val & APIC_LVT_MASKED);
> -    apic_write(APIC_LVTPC, vpmu->hw_lapic_lvtpc);
> +
> +    /* Postpone APIC updates for PV(H) guests if PMU interrupt is pending */
> +    if ( is_hvm_vcpu(curr) || !vpmu->xenpmu_data ||

Here and below you use is_hvm_vcpu() - is the title wrongly saying
just PV when PV/PVH is meant?

> @@ -548,6 +719,24 @@ long do_xenpmu_op(unsigned int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg)
>          pvpmu_finish(current->domain, &pmu_params);
>          break;
>  
> +    case XENPMU_lvtpc_set:
> +        curr = current;
> +        xenpmu_data = curr->arch.vpmu.xenpmu_data;
> +        if ( xenpmu_data == NULL )
> +            return -EINVAL;
> +        vpmu_lvtpc_update(xenpmu_data->pmu.l.lapic_lvtpc);
> +        break;

You don't use curr more than once here, hence I don't see the point
of latching current into a local variable.

These aside,
Acked-by: Jan Beulich <jbeulich@suse.com>

Jan

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v19 07/14] x86/VPMU: Initialize PMU for PV(H) guests
  2015-03-24 14:08   ` Jan Beulich
@ 2015-03-24 15:06     ` Boris Ostrovsky
  0 siblings, 0 replies; 35+ messages in thread
From: Boris Ostrovsky @ 2015-03-24 15:06 UTC (permalink / raw)
  To: Jan Beulich
  Cc: kevin.tian, suravee.suthikulpanit, andrew.cooper3, tim,
	dietmar.hahn, xen-devel, Aravind.Gopalakrishnan, jun.nakajima,
	dgdegra

On 03/24/2015 10:08 AM, Jan Beulich wrote:
>>>> On 17.03.15 at 15:54, <boris.ostrovsky@oracle.com> wrote:
>> @@ -263,14 +264,14 @@ void vpmu_initialise(struct vcpu *v)
>>       BUILD_BUG_ON(sizeof(struct xen_pmu_intel_ctxt) > XENPMU_CTXT_PAD_SZ);
>>       BUILD_BUG_ON(sizeof(struct xen_pmu_amd_ctxt) > XENPMU_CTXT_PAD_SZ);
>>   
>> -    if ( is_pvh_vcpu(v) )
>> -        return;
>> -
>>       ASSERT(!vpmu->flags && !vpmu->context);
>>   
>> -    spin_lock(&vpmu_lock);
>> -    vpmu_count++; /* Prevent vpmu_mode from changing until we are done */
>> -    spin_unlock(&vpmu_lock);
>> +    if ( v->domain != hardware_domain )
>> +    {
>> +        spin_lock(&vpmu_lock);
>> +        vpmu_count++; /* Prevent vpmu_mode from changing until we are done */
>> +        spin_unlock(&vpmu_lock);
>> +    }
> So why is this being made conditional here? A patch this size would
> certainly benefit from a slightly more verbose description...

I don't want to include dom0's VPMU since the count is used for deciding 
whether or not we can switch the mode. If other guests have active VPMUs 
we sometimes can't do the switch. dom0's VPMUs are always there and we 
should be able to switch the mode with those VPMUs on, which is why 
there is no reason to count them.

I suppose I could check whether the count is smaller than number of 
dom0's CPUs but that's not particularly convenient since we could have 
had errors during initialization (so not all dom0's VCPUs have VPMUs), 
we may have to guard against hotplug events, etc.

But yes, an explanation would be useful. I'll add something.

>
> And then if the conditional is indeed needed - why don't you use
> is_hardware_domain()?

No specific reason. I'll switch to is_hardware_domain()

-boris

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v19 11/14] x86/VPMU: Handle PMU interrupts for PV guests
  2015-03-24 14:28   ` Jan Beulich
@ 2015-03-24 15:13     ` Boris Ostrovsky
  2015-03-24 15:36       ` Jan Beulich
  0 siblings, 1 reply; 35+ messages in thread
From: Boris Ostrovsky @ 2015-03-24 15:13 UTC (permalink / raw)
  To: Jan Beulich
  Cc: kevin.tian, suravee.suthikulpanit, andrew.cooper3, tim,
	dietmar.hahn, xen-devel, Aravind.Gopalakrishnan, jun.nakajima,
	dgdegra

On 03/24/2015 10:28 AM, Jan Beulich wrote:
>>>> On 17.03.15 at 15:54, <boris.ostrovsky@oracle.com> wrote:
>> Changes in v19:
>> * Adjusted for new ops interfaces (passing vcpu vs. vpmu)
>> * Test for domain->max_cpu in choose_hwdom_vcpu() instead of
>> 'domain->vcpu!=NULL'
> I suppose that's something that then should also be done in patch 7?

We only need this routine during interrupt handling (for PV(H)) and this 
is the patch that introduces this functionality.

And if you are asking about the test specifically --- this is also the 
first patch where we refer to hardware_domain->vcpu[], which is what the 
test is really for.

Or is it something else that you had in mind?

>> +    vpmu = vcpu_vpmu(curr);
>>   
>>       vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR | (val & APIC_LVT_MASKED);
>> -    apic_write(APIC_LVTPC, vpmu->hw_lapic_lvtpc);
>> +
>> +    /* Postpone APIC updates for PV(H) guests if PMU interrupt is pending */
>> +    if ( is_hvm_vcpu(curr) || !vpmu->xenpmu_data ||
> Here and below you use is_hvm_vcpu() - is the title wrongly saying
> just PV when PV/PVH is meant?

Yes, it should include both PV and PVH.

-boris

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v19 11/14] x86/VPMU: Handle PMU interrupts for PV guests
  2015-03-24 15:13     ` Boris Ostrovsky
@ 2015-03-24 15:36       ` Jan Beulich
  2015-03-24 15:47         ` Boris Ostrovsky
  0 siblings, 1 reply; 35+ messages in thread
From: Jan Beulich @ 2015-03-24 15:36 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: kevin.tian, suravee.suthikulpanit, andrew.cooper3, tim,
	dietmar.hahn, xen-devel, Aravind.Gopalakrishnan, jun.nakajima,
	dgdegra

>>> On 24.03.15 at 16:13, <boris.ostrovsky@oracle.com> wrote:
> On 03/24/2015 10:28 AM, Jan Beulich wrote:
>>>>> On 17.03.15 at 15:54, <boris.ostrovsky@oracle.com> wrote:
>>> Changes in v19:
>>> * Adjusted for new ops interfaces (passing vcpu vs. vpmu)
>>> * Test for domain->max_cpu in choose_hwdom_vcpu() instead of
>>> 'domain->vcpu!=NULL'
>> I suppose that's something that then should also be done in patch 7?
> 
> We only need this routine during interrupt handling (for PV(H)) and this 
> is the patch that introduces this functionality.
> 
> And if you are asking about the test specifically --- this is also the 
> first patch where we refer to hardware_domain->vcpu[], which is what the 
> test is really for.
> 
> Or is it something else that you had in mind?

Yes - following Andrew's cleanup I believe the d->vcpu != NULL
check is redundant with having (perhaps indirectly)
checked d->max_vcpus > 0.

Jan

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v19 11/14] x86/VPMU: Handle PMU interrupts for PV guests
  2015-03-24 15:36       ` Jan Beulich
@ 2015-03-24 15:47         ` Boris Ostrovsky
  2015-03-24 16:03           ` Jan Beulich
  0 siblings, 1 reply; 35+ messages in thread
From: Boris Ostrovsky @ 2015-03-24 15:47 UTC (permalink / raw)
  To: Jan Beulich
  Cc: kevin.tian, suravee.suthikulpanit, andrew.cooper3, tim,
	dietmar.hahn, xen-devel, Aravind.Gopalakrishnan, jun.nakajima,
	dgdegra

On 03/24/2015 11:36 AM, Jan Beulich wrote:
>>>> On 24.03.15 at 16:13, <boris.ostrovsky@oracle.com> wrote:
>> On 03/24/2015 10:28 AM, Jan Beulich wrote:
>>>>>> On 17.03.15 at 15:54, <boris.ostrovsky@oracle.com> wrote:
>>>> Changes in v19:
>>>> * Adjusted for new ops interfaces (passing vcpu vs. vpmu)
>>>> * Test for domain->max_cpu in choose_hwdom_vcpu() instead of
>>>> 'domain->vcpu!=NULL'
>>> I suppose that's something that then should also be done in patch 7?
>> We only need this routine during interrupt handling (for PV(H)) and this
>> is the patch that introduces this functionality.
>>
>> And if you are asking about the test specifically --- this is also the
>> first patch where we refer to hardware_domain->vcpu[], which is what the
>> test is really for.
>>
>> Or is it something else that you had in mind?
> Yes - following Andrew's cleanup I believe the d->vcpu != NULL
> check is redundant with having (perhaps indirectly)
> checked d->max_vcpus > 0.

Then I am not sure I understand what you are asking me to do for patch 
7, sorry.

-boris

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v19 11/14] x86/VPMU: Handle PMU interrupts for PV guests
  2015-03-24 15:47         ` Boris Ostrovsky
@ 2015-03-24 16:03           ` Jan Beulich
  0 siblings, 0 replies; 35+ messages in thread
From: Jan Beulich @ 2015-03-24 16:03 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: kevin.tian, suravee.suthikulpanit, andrew.cooper3, tim,
	dietmar.hahn, xen-devel, Aravind.Gopalakrishnan, jun.nakajima,
	dgdegra

>>> On 24.03.15 at 16:47, <boris.ostrovsky@oracle.com> wrote:
> On 03/24/2015 11:36 AM, Jan Beulich wrote:
>>>>> On 24.03.15 at 16:13, <boris.ostrovsky@oracle.com> wrote:
>>> On 03/24/2015 10:28 AM, Jan Beulich wrote:
>>>>>>> On 17.03.15 at 15:54, <boris.ostrovsky@oracle.com> wrote:
>>>>> Changes in v19:
>>>>> * Adjusted for new ops interfaces (passing vcpu vs. vpmu)
>>>>> * Test for domain->max_cpu in choose_hwdom_vcpu() instead of
>>>>> 'domain->vcpu!=NULL'
>>>> I suppose that's something that then should also be done in patch 7?
>>> We only need this routine during interrupt handling (for PV(H)) and this
>>> is the patch that introduces this functionality.
>>>
>>> And if you are asking about the test specifically --- this is also the
>>> first patch where we refer to hardware_domain->vcpu[], which is what the
>>> test is really for.
>>>
>>> Or is it something else that you had in mind?
>> Yes - following Andrew's cleanup I believe the d->vcpu != NULL
>> check is redundant with having (perhaps indirectly)
>> checked d->max_vcpus > 0.
> 
> Then I am not sure I understand what you are asking me to do for patch 
> 7, sorry.

There you have

+static int pvpmu_init(struct domain *d, xen_pmu_params_t *params)
+{
+    struct vcpu *v;
+    struct vpmu_struct *vpmu;
+    struct page_info *page;
+    uint64_t gfn = params->val;
+
+    if ( vpmu_mode == XENPMU_MODE_OFF )
+        return -EINVAL;
+
+    if ( (params->vcpu >= d->max_vcpus) || (d->vcpu == NULL) ||
+         (d->vcpu[params->vcpu] == NULL) )
+        return -EINVAL;

This conditional checks both ->max_vcpus and ->vcpu, when
the former check passes you already know ->max_vcpus > 0.

Jan

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v19 12/14] x86/VPMU: Merge vpmu_rdmsr and vpmu_wrmsr
  2015-03-17 14:54 ` [PATCH v19 12/14] x86/VPMU: Merge vpmu_rdmsr and vpmu_wrmsr Boris Ostrovsky
@ 2015-03-25  8:29   ` Dietmar Hahn
  2015-03-25 12:23     ` Dietmar Hahn
  2015-03-25 16:52   ` Jan Beulich
  1 sibling, 1 reply; 35+ messages in thread
From: Dietmar Hahn @ 2015-03-25  8:29 UTC (permalink / raw)
  To: xen-devel
  Cc: kevin.tian, JBeulich, jun.nakajima, andrew.cooper3, tim,
	Aravind.Gopalakrishnan, suravee.suthikulpanit, dgdegra,
	Boris Ostrovsky

Am Dienstag 17 März 2015, 10:54:09 schrieb Boris Ostrovsky:
> The two routines share most of their logic.
> 
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> ---
> Changes in v19:
> * const-ified arch_vpmu_ops in vpmu_do_wrmsr
> * non-changes:
>    - kept 'current' as a non-initializer to avoid unnecessary initialization
>      in the (common) non-VPMU case
>    - kept 'nop' label since there are multiple dissimilar cases that can cause
>      a non-emulation of VPMU access
> 
>  xen/arch/x86/hvm/vpmu.c        | 76 +++++++++++++++++-------------------------
>  xen/include/asm-x86/hvm/vpmu.h | 14 ++++++--
>  2 files changed, 42 insertions(+), 48 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c
> index c287d8b..beed956 100644
> --- a/xen/arch/x86/hvm/vpmu.c
> +++ b/xen/arch/x86/hvm/vpmu.c
> @@ -103,63 +103,47 @@ void vpmu_lvtpc_update(uint32_t val)
>          apic_write(APIC_LVTPC, vpmu->hw_lapic_lvtpc);
>  }
>  
> -int vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content, uint64_t supported)
> +int vpmu_do_msr(unsigned int msr, uint64_t *msr_content,
> +                uint64_t supported, bool_t is_write)
>  {
> -    struct vcpu *curr = current;
> +    struct vcpu *curr;
>      struct vpmu_struct *vpmu;
> +    const struct arch_vpmu_ops *ops;

What is this const useful for in this small function?

Reviewed-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>

Dietmar.

> +    int ret = 0;
>  
>      if ( vpmu_mode == XENPMU_MODE_OFF )
> -        return 0;
> +        goto nop;
>  
> +    curr = current;
>      vpmu = vcpu_vpmu(curr);
> -    if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->do_wrmsr )
> -    {
> -        int ret = vpmu->arch_vpmu_ops->do_wrmsr(msr, msr_content, supported);
> -
> -        /*
> -         * We may have received a PMU interrupt during WRMSR handling
> -         * and since do_wrmsr may load VPMU context we should save
> -         * (and unload) it again.
> -         */
> -        if ( !is_hvm_vcpu(curr) && vpmu->xenpmu_data &&
> -             (vpmu->xenpmu_data->pmu.pmu_flags & PMU_CACHED) )
> -        {
> -            vpmu_set(vpmu, VPMU_CONTEXT_SAVE);
> -            vpmu->arch_vpmu_ops->arch_vpmu_save(curr);
> -            vpmu_reset(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED);
> -        }
> -        return ret;
> -    }
> -
> -    return 0;
> -}
> -
> -int vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
> -{
> -    struct vcpu *curr = current;
> -    struct vpmu_struct *vpmu;
> +    ops = vpmu->arch_vpmu_ops;
> +    if ( !ops )
> +        goto nop;
> +
> +    if ( is_write && ops->do_wrmsr )
> +        ret = ops->do_wrmsr(msr, *msr_content, supported);
> +    else if ( !is_write && ops->do_rdmsr )
> +        ret = ops->do_rdmsr(msr, msr_content);
> +    else
> +        goto nop;
>  
> -    if ( vpmu_mode == XENPMU_MODE_OFF )
> +    /*
> +     * We may have received a PMU interrupt while handling MSR access
> +     * and since do_wr/rdmsr may load VPMU context we should save
> +     * (and unload) it again.
> +     */
> +    if ( !is_hvm_vcpu(curr) &&
> +         vpmu->xenpmu_data && (vpmu->xenpmu_data->pmu.pmu_flags & PMU_CACHED) )
>      {
> -        *msr_content = 0;
> -        return 0;
> +        vpmu_set(vpmu, VPMU_CONTEXT_SAVE);
> +        ops->arch_vpmu_save(curr);
> +        vpmu_reset(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED);
>      }
>  
> -    vpmu = vcpu_vpmu(curr);
> -    if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->do_rdmsr )
> -    {
> -        int ret = vpmu->arch_vpmu_ops->do_rdmsr(msr, msr_content);
> +    return ret;
>  
> -        if ( !is_hvm_vcpu(curr) && vpmu->xenpmu_data &&
> -             (vpmu->xenpmu_data->pmu.pmu_flags & PMU_CACHED) )
> -        {
> -            vpmu_set(vpmu, VPMU_CONTEXT_SAVE);
> -            vpmu->arch_vpmu_ops->arch_vpmu_save(curr);
> -            vpmu_reset(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED);
> -        }
> -        return ret;
> -    }
> -    else
> + nop:
> +    if ( !is_write )
>          *msr_content = 0;
>  
>      return 0;
> diff --git a/xen/include/asm-x86/hvm/vpmu.h b/xen/include/asm-x86/hvm/vpmu.h
> index 642a4b7..63851a7 100644
> --- a/xen/include/asm-x86/hvm/vpmu.h
> +++ b/xen/include/asm-x86/hvm/vpmu.h
> @@ -99,8 +99,8 @@ static inline bool_t vpmu_are_all_set(const struct vpmu_struct *vpmu,
>  }
>  
>  void vpmu_lvtpc_update(uint32_t val);
> -int vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content, uint64_t supported);
> -int vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content);
> +int vpmu_do_msr(unsigned int msr, uint64_t *msr_content,
> +                uint64_t supported, bool_t is_write);
>  void vpmu_do_interrupt(struct cpu_user_regs *regs);
>  void vpmu_do_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx,
>                                         unsigned int *ecx, unsigned int *edx);
> @@ -110,6 +110,16 @@ void vpmu_save(struct vcpu *v);
>  void vpmu_load(struct vcpu *v);
>  void vpmu_dump(struct vcpu *v);
>  
> +static inline int vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content,
> +                                uint64_t supported)
> +{
> +    return vpmu_do_msr(msr, &msr_content, supported, 1);
> +}
> +static inline int vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
> +{
> +    return vpmu_do_msr(msr, msr_content, 0, 0);
> +}
> +
>  extern int acquire_pmu_ownership(int pmu_ownership);
>  extern void release_pmu_ownership(int pmu_ownership);
>  
> 

-- 
Company details: http://ts.fujitsu.com/imprint.html

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v19 12/14] x86/VPMU: Merge vpmu_rdmsr and vpmu_wrmsr
  2015-03-25  8:29   ` Dietmar Hahn
@ 2015-03-25 12:23     ` Dietmar Hahn
  0 siblings, 0 replies; 35+ messages in thread
From: Dietmar Hahn @ 2015-03-25 12:23 UTC (permalink / raw)
  To: xen-devel
  Cc: kevin.tian, JBeulich, Boris Ostrovsky, andrew.cooper3, tim,
	Aravind.Gopalakrishnan, jun.nakajima, dgdegra,
	suravee.suthikulpanit

Am Mittwoch 25 März 2015, 09:29:14 schrieb Dietmar Hahn:
> Am Dienstag 17 März 2015, 10:54:09 schrieb Boris Ostrovsky:
> > The two routines share most of their logic.
> > 
> > Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> > ---
> > Changes in v19:
> > * const-ified arch_vpmu_ops in vpmu_do_wrmsr
> > * non-changes:
> >    - kept 'current' as a non-initializer to avoid unnecessary initialization
> >      in the (common) non-VPMU case
> >    - kept 'nop' label since there are multiple dissimilar cases that can cause
> >      a non-emulation of VPMU access
> > 
> >  xen/arch/x86/hvm/vpmu.c        | 76 +++++++++++++++++-------------------------
> >  xen/include/asm-x86/hvm/vpmu.h | 14 ++++++--
> >  2 files changed, 42 insertions(+), 48 deletions(-)
> > 
> > diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c
> > index c287d8b..beed956 100644
> > --- a/xen/arch/x86/hvm/vpmu.c
> > +++ b/xen/arch/x86/hvm/vpmu.c
> > @@ -103,63 +103,47 @@ void vpmu_lvtpc_update(uint32_t val)
> >          apic_write(APIC_LVTPC, vpmu->hw_lapic_lvtpc);
> >  }
> >  
> > -int vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content, uint64_t supported)
> > +int vpmu_do_msr(unsigned int msr, uint64_t *msr_content,
> > +                uint64_t supported, bool_t is_write)
> >  {
> > -    struct vcpu *curr = current;
> > +    struct vcpu *curr;
> >      struct vpmu_struct *vpmu;
> > +    const struct arch_vpmu_ops *ops;
> 
> What is this const useful for in this small function?

Sorry for the question and for the noise I already got it!

> 
> Reviewed-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
> 
> Dietmar.
> 
> > +    int ret = 0;
> >  
> >      if ( vpmu_mode == XENPMU_MODE_OFF )
> > -        return 0;
> > +        goto nop;
> >  
> > +    curr = current;
> >      vpmu = vcpu_vpmu(curr);
> > -    if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->do_wrmsr )
> > -    {
> > -        int ret = vpmu->arch_vpmu_ops->do_wrmsr(msr, msr_content, supported);
> > -
> > -        /*
> > -         * We may have received a PMU interrupt during WRMSR handling
> > -         * and since do_wrmsr may load VPMU context we should save
> > -         * (and unload) it again.
> > -         */
> > -        if ( !is_hvm_vcpu(curr) && vpmu->xenpmu_data &&
> > -             (vpmu->xenpmu_data->pmu.pmu_flags & PMU_CACHED) )
> > -        {
> > -            vpmu_set(vpmu, VPMU_CONTEXT_SAVE);
> > -            vpmu->arch_vpmu_ops->arch_vpmu_save(curr);
> > -            vpmu_reset(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED);
> > -        }
> > -        return ret;
> > -    }
> > -
> > -    return 0;
> > -}
> > -
> > -int vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
> > -{
> > -    struct vcpu *curr = current;
> > -    struct vpmu_struct *vpmu;
> > +    ops = vpmu->arch_vpmu_ops;
> > +    if ( !ops )
> > +        goto nop;
> > +
> > +    if ( is_write && ops->do_wrmsr )
> > +        ret = ops->do_wrmsr(msr, *msr_content, supported);
> > +    else if ( !is_write && ops->do_rdmsr )
> > +        ret = ops->do_rdmsr(msr, msr_content);
> > +    else
> > +        goto nop;
> >  
> > -    if ( vpmu_mode == XENPMU_MODE_OFF )
> > +    /*
> > +     * We may have received a PMU interrupt while handling MSR access
> > +     * and since do_wr/rdmsr may load VPMU context we should save
> > +     * (and unload) it again.
> > +     */
> > +    if ( !is_hvm_vcpu(curr) &&
> > +         vpmu->xenpmu_data && (vpmu->xenpmu_data->pmu.pmu_flags & PMU_CACHED) )
> >      {
> > -        *msr_content = 0;
> > -        return 0;
> > +        vpmu_set(vpmu, VPMU_CONTEXT_SAVE);
> > +        ops->arch_vpmu_save(curr);
> > +        vpmu_reset(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED);
> >      }
> >  
> > -    vpmu = vcpu_vpmu(curr);
> > -    if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->do_rdmsr )
> > -    {
> > -        int ret = vpmu->arch_vpmu_ops->do_rdmsr(msr, msr_content);
> > +    return ret;
> >  
> > -        if ( !is_hvm_vcpu(curr) && vpmu->xenpmu_data &&
> > -             (vpmu->xenpmu_data->pmu.pmu_flags & PMU_CACHED) )
> > -        {
> > -            vpmu_set(vpmu, VPMU_CONTEXT_SAVE);
> > -            vpmu->arch_vpmu_ops->arch_vpmu_save(curr);
> > -            vpmu_reset(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED);
> > -        }
> > -        return ret;
> > -    }
> > -    else
> > + nop:
> > +    if ( !is_write )
> >          *msr_content = 0;
> >  
> >      return 0;
> > diff --git a/xen/include/asm-x86/hvm/vpmu.h b/xen/include/asm-x86/hvm/vpmu.h
> > index 642a4b7..63851a7 100644
> > --- a/xen/include/asm-x86/hvm/vpmu.h
> > +++ b/xen/include/asm-x86/hvm/vpmu.h
> > @@ -99,8 +99,8 @@ static inline bool_t vpmu_are_all_set(const struct vpmu_struct *vpmu,
> >  }
> >  
> >  void vpmu_lvtpc_update(uint32_t val);
> > -int vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content, uint64_t supported);
> > -int vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content);
> > +int vpmu_do_msr(unsigned int msr, uint64_t *msr_content,
> > +                uint64_t supported, bool_t is_write);
> >  void vpmu_do_interrupt(struct cpu_user_regs *regs);
> >  void vpmu_do_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx,
> >                                         unsigned int *ecx, unsigned int *edx);
> > @@ -110,6 +110,16 @@ void vpmu_save(struct vcpu *v);
> >  void vpmu_load(struct vcpu *v);
> >  void vpmu_dump(struct vcpu *v);
> >  
> > +static inline int vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content,
> > +                                uint64_t supported)
> > +{
> > +    return vpmu_do_msr(msr, &msr_content, supported, 1);
> > +}
> > +static inline int vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
> > +{
> > +    return vpmu_do_msr(msr, msr_content, 0, 0);
> > +}
> > +
> >  extern int acquire_pmu_ownership(int pmu_ownership);
> >  extern void release_pmu_ownership(int pmu_ownership);
> >  
> > 
> 
> 

-- 
Company details: http://ts.fujitsu.com/imprint.html

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v19 13/14] x86/VPMU: Add privileged PMU mode
  2015-03-17 14:54 ` [PATCH v19 13/14] x86/VPMU: Add privileged PMU mode Boris Ostrovsky
@ 2015-03-25 12:25   ` Dietmar Hahn
  2015-03-25 16:57   ` Jan Beulich
  1 sibling, 0 replies; 35+ messages in thread
From: Dietmar Hahn @ 2015-03-25 12:25 UTC (permalink / raw)
  To: xen-devel
  Cc: kevin.tian, JBeulich, jun.nakajima, andrew.cooper3, tim,
	Aravind.Gopalakrishnan, suravee.suthikulpanit, dgdegra,
	Boris Ostrovsky

Am Dienstag 17 März 2015, 10:54:10 schrieb Boris Ostrovsky:
> Add support for privileged PMU mode (XENPMU_MODE_ALL) which allows privileged
> domain (dom0) profile both itself (and the hypervisor) and the guests. While
> this mode is on profiling in guests is disabled.
> 
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

Reviewed-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>

Dietmar.

> ---
> Changes in v19:
> * Slightly different mode changing logic in xenpmu_op() since we no longer
>   allow mode changes while VPMUs are active
> 
>  xen/arch/x86/hvm/vpmu.c  | 34 +++++++++++++++++++++++++---------
>  xen/arch/x86/traps.c     | 13 +++++++++++++
>  xen/include/public/pmu.h |  3 +++
>  3 files changed, 41 insertions(+), 9 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c
> index beed956..71c5063 100644
> --- a/xen/arch/x86/hvm/vpmu.c
> +++ b/xen/arch/x86/hvm/vpmu.c
> @@ -111,7 +111,9 @@ int vpmu_do_msr(unsigned int msr, uint64_t *msr_content,
>      const struct arch_vpmu_ops *ops;
>      int ret = 0;
>  
> -    if ( vpmu_mode == XENPMU_MODE_OFF )
> +    if ( (vpmu_mode == XENPMU_MODE_OFF) ||
> +         ((vpmu_mode & XENPMU_MODE_ALL) &&
> +          !is_hardware_domain(current->domain)) )
>          goto nop;
>  
>      curr = current;
> @@ -166,8 +168,12 @@ void vpmu_do_interrupt(struct cpu_user_regs *regs)
>      struct vcpu *sampled = current, *sampling;
>      struct vpmu_struct *vpmu;
>  
> -    /* dom0 will handle interrupt for special domains (e.g. idle domain) */
> -    if ( sampled->domain->domain_id >= DOMID_FIRST_RESERVED )
> +    /*
> +     * dom0 will handle interrupt for special domains (e.g. idle domain) or,
> +     * in XENPMU_MODE_ALL, for everyone.
> +     */
> +    if ( (vpmu_mode & XENPMU_MODE_ALL) ||
> +         (sampled->domain->domain_id >= DOMID_FIRST_RESERVED) )
>      {
>          sampling = choose_hwdom_vcpu();
>          if ( !sampling )
> @@ -177,17 +183,18 @@ void vpmu_do_interrupt(struct cpu_user_regs *regs)
>          sampling = sampled;
>  
>      vpmu = vcpu_vpmu(sampling);
> -    if ( !is_hvm_vcpu(sampling) )
> +    if ( !is_hvm_vcpu(sampling) || (vpmu_mode & XENPMU_MODE_ALL) )
>      {
>          /* PV(H) guest */
>          const struct cpu_user_regs *cur_regs;
>          uint64_t *flags = &vpmu->xenpmu_data->pmu.pmu_flags;
> -        uint32_t domid = DOMID_SELF;
> +        uint32_t domid;
>  
>          if ( !vpmu->xenpmu_data )
>              return;
>  
>          if ( is_pvh_vcpu(sampling) &&
> +             !(vpmu_mode & XENPMU_MODE_ALL) &&
>               !vpmu->arch_vpmu_ops->do_interrupt(regs) )
>              return;
>  
> @@ -204,6 +211,11 @@ void vpmu_do_interrupt(struct cpu_user_regs *regs)
>          else
>              *flags = PMU_SAMPLE_PV;
>  
> +        if ( sampled == sampling )
> +            domid = DOMID_SELF;
> +        else
> +            domid = sampled->domain->domain_id;
> +
>          /* Store appropriate registers in xenpmu_data */
>          /* FIXME: 32-bit PVH should go here as well */
>          if ( is_pv_32bit_vcpu(sampling) )
> @@ -232,7 +244,8 @@ void vpmu_do_interrupt(struct cpu_user_regs *regs)
>  
>              if ( (vpmu_mode & XENPMU_MODE_SELF) )
>                  cur_regs = guest_cpu_user_regs();
> -            else if ( !guest_mode(regs) && is_hardware_domain(sampling->domain) )
> +            else if ( !guest_mode(regs) &&
> +                      is_hardware_domain(sampling->domain) )
>              {
>                  cur_regs = regs;
>                  domid = DOMID_XEN;
> @@ -508,7 +521,8 @@ static int pvpmu_init(struct domain *d, xen_pmu_params_t *params)
>      struct page_info *page;
>      uint64_t gfn = params->val;
>  
> -    if ( vpmu_mode == XENPMU_MODE_OFF )
> +    if ( (vpmu_mode == XENPMU_MODE_OFF) ||
> +         ((vpmu_mode & XENPMU_MODE_ALL) && !is_hardware_domain(d)) )
>          return -EINVAL;
>  
>      if ( (params->vcpu >= d->max_vcpus) || (d->vcpu == NULL) ||
> @@ -627,12 +641,14 @@ long do_xenpmu_op(unsigned int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg)
>      {
>      case XENPMU_mode_set:
>      {
> -        if ( (pmu_params.val & ~(XENPMU_MODE_SELF | XENPMU_MODE_HV)) ||
> +        if ( (pmu_params.val &
> +              ~(XENPMU_MODE_SELF | XENPMU_MODE_HV | XENPMU_MODE_ALL)) ||
>               (hweight64(pmu_params.val) > 1) )
>              return -EINVAL;
>  
>          /* 32-bit dom0 can only sample itself. */
> -        if ( is_pv_32bit_vcpu(current) && (pmu_params.val & XENPMU_MODE_HV) )
> +        if ( is_pv_32bit_vcpu(current) &&
> +             (pmu_params.val & (XENPMU_MODE_HV | XENPMU_MODE_ALL)) )
>              return -EINVAL;
>  
>          spin_lock(&vpmu_lock);
> diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
> index 1eb7bb4..8a40deb 100644
> --- a/xen/arch/x86/traps.c
> +++ b/xen/arch/x86/traps.c
> @@ -2653,6 +2653,10 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
>          case MSR_AMD_FAM15H_EVNTSEL0...MSR_AMD_FAM15H_PERFCTR5:
>                  if ( vpmu_msr || (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) )
>                  {
> +                    if ( (vpmu_mode & XENPMU_MODE_ALL) &&
> +                         !is_hardware_domain(v->domain) )
> +                        break;
> +
>                      if ( vpmu_do_wrmsr(regs->ecx, msr_content, 0) )
>                          goto fail;
>                  }
> @@ -2776,6 +2780,15 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
>          case MSR_AMD_FAM15H_EVNTSEL0...MSR_AMD_FAM15H_PERFCTR5:
>                  if ( vpmu_msr || (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) )
>                  {
> +
> +                    if ( (vpmu_mode & XENPMU_MODE_ALL) &&
> +                         !is_hardware_domain(v->domain) )
> +                    {
> +                        /* Don't leak PMU MSRs to unprivileged domains */
> +                        regs->eax = regs->edx = 0;
> +                        break;
> +                    }
> +
>                      if ( vpmu_do_rdmsr(regs->ecx, &val) )
>                          goto fail;
>  
> diff --git a/xen/include/public/pmu.h b/xen/include/public/pmu.h
> index db5321a..a83ae71 100644
> --- a/xen/include/public/pmu.h
> +++ b/xen/include/public/pmu.h
> @@ -52,10 +52,13 @@ DEFINE_XEN_GUEST_HANDLE(xen_pmu_params_t);
>   * - XENPMU_MODE_SELF:  Guests can profile themselves
>   * - XENPMU_MODE_HV:    Guests can profile themselves, dom0 profiles
>   *                      itself and Xen
> + * - XENPMU_MODE_ALL:   Only dom0 has access to VPMU and it profiles
> + *                      everyone: itself, the hypervisor and the guests.
>   */
>  #define XENPMU_MODE_OFF           0
>  #define XENPMU_MODE_SELF          (1<<0)
>  #define XENPMU_MODE_HV            (1<<1)
> +#define XENPMU_MODE_ALL           (1<<2)
>  
>  /*
>   * PMU features:
> 

-- 
Company details: http://ts.fujitsu.com/imprint.html

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v19 12/14] x86/VPMU: Merge vpmu_rdmsr and vpmu_wrmsr
  2015-03-17 14:54 ` [PATCH v19 12/14] x86/VPMU: Merge vpmu_rdmsr and vpmu_wrmsr Boris Ostrovsky
  2015-03-25  8:29   ` Dietmar Hahn
@ 2015-03-25 16:52   ` Jan Beulich
  1 sibling, 0 replies; 35+ messages in thread
From: Jan Beulich @ 2015-03-25 16:52 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: kevin.tian, suravee.suthikulpanit, andrew.cooper3, tim,
	dietmar.hahn, xen-devel, Aravind.Gopalakrishnan, jun.nakajima,
	dgdegra

>>> On 17.03.15 at 15:54, <boris.ostrovsky@oracle.com> wrote:
> ---
> Changes in v19:
> * const-ified arch_vpmu_ops in vpmu_do_wrmsr
> * non-changes:
>    - kept 'current' as a non-initializer to avoid unnecessary initialization
>      in the (common) non-VPMU case

Afaict there's nothing at all keeping the compiler to still schedule part
or all of the involved insns ahead of that first goto. And if it was an
initializer, there would equally be nothing keeping the compiler from
scheduling part or all of those insns after the if(). So I don't see why
not making it the initializer is better (other than increasing line count
and patch size). If you really wanted to do something to help the
compiler, flagging the if() condition with likely() would perhaps be the
thing to do.

Jan

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v19 13/14] x86/VPMU: Add privileged PMU mode
  2015-03-17 14:54 ` [PATCH v19 13/14] x86/VPMU: Add privileged PMU mode Boris Ostrovsky
  2015-03-25 12:25   ` Dietmar Hahn
@ 2015-03-25 16:57   ` Jan Beulich
  1 sibling, 0 replies; 35+ messages in thread
From: Jan Beulich @ 2015-03-25 16:57 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: kevin.tian, suravee.suthikulpanit, andrew.cooper3, tim,
	dietmar.hahn, xen-devel, Aravind.Gopalakrishnan, jun.nakajima,
	dgdegra

>>> On 17.03.15 at 15:54, <boris.ostrovsky@oracle.com> wrote:
> Add support for privileged PMU mode (XENPMU_MODE_ALL) which allows privileged
> domain (dom0) profile both itself (and the hypervisor) and the guests. While
> this mode is on profiling in guests is disabled.
> 
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

Acked-by: Jan Beulich <jbeulich@suse.com>
but ...

> @@ -177,17 +183,18 @@ void vpmu_do_interrupt(struct cpu_user_regs *regs)
>          sampling = sampled;
>  
>      vpmu = vcpu_vpmu(sampling);
> -    if ( !is_hvm_vcpu(sampling) )
> +    if ( !is_hvm_vcpu(sampling) || (vpmu_mode & XENPMU_MODE_ALL) )
>      {
>          /* PV(H) guest */
>          const struct cpu_user_regs *cur_regs;
>          uint64_t *flags = &vpmu->xenpmu_data->pmu.pmu_flags;
> -        uint32_t domid = DOMID_SELF;
> +        uint32_t domid;

... why is this not domid_t? Probably needs adjustment in an earlier
patch.

Jan

^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2015-03-25 16:57 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-03-17 14:53 [PATCH v19 00/14] x86/PMU: Xen PMU PV(H) support Boris Ostrovsky
2015-03-17 14:53 ` [PATCH v19 01/14] x86/VPMU: VPMU should not exist when vpmu_initialise() is called Boris Ostrovsky
2015-03-19  8:43   ` Dietmar Hahn
2015-03-17 14:53 ` [PATCH v19 02/14] common/symbols: Export hypervisor symbols to privileged guest Boris Ostrovsky
2015-03-17 14:54 ` [PATCH v19 03/14] x86/VPMU: Add public xenpmu.h Boris Ostrovsky
2015-03-17 14:54 ` [PATCH v19 04/14] x86/VPMU: Make vpmu not HVM-specific Boris Ostrovsky
2015-03-17 14:54 ` [PATCH v19 05/14] x86/VPMU: Interface for setting PMU mode and flags Boris Ostrovsky
2015-03-19 13:12   ` Dietmar Hahn
2015-03-19 13:28     ` Jan Beulich
2015-03-19 13:49       ` Dietmar Hahn
2015-03-24 13:56   ` Jan Beulich
2015-03-17 14:54 ` [PATCH v19 06/14] x86/VPMU: Initialize VPMUs with __initcall Boris Ostrovsky
2015-03-17 14:54 ` [PATCH v19 07/14] x86/VPMU: Initialize PMU for PV(H) guests Boris Ostrovsky
2015-03-24 14:08   ` Jan Beulich
2015-03-24 15:06     ` Boris Ostrovsky
2015-03-17 14:54 ` [PATCH v19 08/14] x86/VPMU: Save VPMU state for PV guests during context switch Boris Ostrovsky
2015-03-24 14:15   ` Jan Beulich
2015-03-17 14:54 ` [PATCH v19 09/14] x86/VPMU: When handling MSR accesses, leave fault injection to callers Boris Ostrovsky
2015-03-17 14:54 ` [PATCH v19 10/14] x86/VPMU: Add support for PMU register handling on PV guests Boris Ostrovsky
2015-03-17 14:54 ` [PATCH v19 11/14] x86/VPMU: Handle PMU interrupts for " Boris Ostrovsky
2015-03-24 14:28   ` Jan Beulich
2015-03-24 15:13     ` Boris Ostrovsky
2015-03-24 15:36       ` Jan Beulich
2015-03-24 15:47         ` Boris Ostrovsky
2015-03-24 16:03           ` Jan Beulich
2015-03-17 14:54 ` [PATCH v19 12/14] x86/VPMU: Merge vpmu_rdmsr and vpmu_wrmsr Boris Ostrovsky
2015-03-25  8:29   ` Dietmar Hahn
2015-03-25 12:23     ` Dietmar Hahn
2015-03-25 16:52   ` Jan Beulich
2015-03-17 14:54 ` [PATCH v19 13/14] x86/VPMU: Add privileged PMU mode Boris Ostrovsky
2015-03-25 12:25   ` Dietmar Hahn
2015-03-25 16:57   ` Jan Beulich
2015-03-17 14:54 ` [PATCH v19 14/14] x86/VPMU: Move VPMU files up from hvm/ directory Boris Ostrovsky
2015-03-19 14:32 ` [PATCH v19 00/14] x86/PMU: Xen PMU PV(H) support Dietmar Hahn
2015-03-19 16:03   ` Boris Ostrovsky

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.