* [PATCH v12 for-xen-4.5 00/20] x86/PMU: Xen PMU PV(H) support
From: Boris Ostrovsky @ 2014-09-25 19:28 UTC
  To: jbeulich, kevin.tian, suravee.suthikulpanit,
	Aravind.Gopalakrishnan, dietmar.hahn, dgdegra
  Cc: keir, andrew.cooper3, tim, xen-devel, jun.nakajima, boris.ostrovsky

Here is the twelfth version of the PV(H) PMU patches.

Changes in v12:

* Added XSM support
* Added a validity check before writing MSR_CORE_PERF_GLOBAL_OVF_CTRL
* Updated documentation for 'vpmu=nmi' option
* Added more text to a bunch of commit messages (per Konrad's request)

Changes in v11:

* Replaced cpu_user_regs with new xen_pmu_regs (IP, SP, CS) in xen_pmu_arch.
  - as part of this rework, noticed that CS registers were set in a later patch
    than needed. Moved those changes to the appropriate place
* Added new VPMU mode (XENPMU_MODE_HV). Now XENPMU_MODE_SELF provides dom0
  with its own samples only (i.e. no hypervisor data), and XENPMU_MODE_HV is
  what XENPMU_MODE_SELF used to be.
* Kept vmx_add_guest_msr()/vmx_add_host_load_msr() as wrappers around vmx_add_msr()
* Cleaned up VPMU context switch macros (moved 'if(prev!=next)' back to context_switch())
* Dropped hypercall continuation from vpmu_force_context_switch() and replaced it with
  an -EAGAIN error if hypercall_preempt_check() is true after 2ms.
* Kept vpmu_do_rdmsr()/vpmu_do_wrmsr() as wrappers for vpmu_do_msr()
* Moved context switching patch (#13) earlier in the series (for proper bisection support)
* Various comment updates and cleanups
* Dropped a bunch of Reviewed-by and all Tested-by tags

Changes in v10:

* Swapped address and name fields of xenpf_symdata (to make it smaller on 32-bit)
* Dropped vmx_rm_guest_msr() as it requires refcounting, which makes the code more complicated.
* Cleaned up vlapic_reg_write()
* Call vpmu_destroy() for both HVM and PVH VCPUs
* Verify that (xen_pmu_data+PMU register bank) fit into a page
* Return error codes from arch-specific VPMU init code
* Moved VPMU-related context switch logic into inlines
* vpmu_force_context_switch() changes:
  o Avoid greater than page-sized allocations
  o Prevent another VCPU from starting VPMU sync while the first sync is in progress
* Avoid stack leak in do_xenpmu_op()
* Checked validity of Intel VPMU MSR values before they are committed
* Fixed MSR handling in traps.c (avoid potential accesses to Intel MSRs on AMD)
* Fixed VCPU selection in interrupt handler for 32-bit dom0 (sampled => sampling)
* Clarified commit messages (patches 2, 13, 18) 
* Various cleanups

Changes in v9:

* Restore VPMU context after context_saved() is called in
  context_switch(). This is needed because vpmu_load() may end up
  calling vmx_vmcs_try_enter()->vcpu_pause() and that needs is_running
  to be correctly set/cleared. (patch 18, dropped review acks)
* Added patch 2 to properly manage VPMU_CONTEXT_LOADED
* Addressed most of Jan's comments.
  o Keep track of time in vpmu_force_context_switch() to properly break
    out of a loop when using hypercall continuations
  o Fixed logic in calling vpmu_do_msr() in emulate_privileged_op()
  o Cleaned up vpmu_interrupt() wrt vcpu variable names to (hopefully)
    make it more clear which vcpu we are using
  o Cleaned up vpmu_do_wrmsr()
  o Did *not* replace sizeof(uint64_t) with sizeof(variable) in
    amd_vpmu_initialise(): throughout the code registers are declared as
    uint64_t and if we are to add a new type (e.g. reg_t) this should be
    done in a separate patch, unrelated to this series.
  o Various more minor cleanups and code style fixes
  
Changes in v8:

* Cleaned up definitions of struct xenpf_symdata and xen_pmu_params a bit
* Added compat checks for vpmu structures
* Converted vpmu flag manipulation macros to inline routines
* Reimplemented vpmu_unload_all() to avoid long loops
* Reworked PMU fault generation and handling (new patch #12)
* Added checks for domain->vcpu[] non-NULLness
* Added more comments, renamed some routines and macros, code style cleanup


Changes in v7:

* When reading hypervisor symbols, make the caller pass the buffer length
  (as opposed to having this length be part of the API). Make the
  hypervisor buffer static, make xensyms_read() return zero-length
  string on end-of-symbols. Make 'type' field of xenpf_symdata a char,
  drop compat_pf_symdata definition.
* Spread PVH support across patches as opposed to lumping it into a
  separate patch
* Rename vpmu_is_set_all() to vpmu_are_all_set()
* Split VPMU cleanup patch in two
* Use memmove when copying VMX guest and host MSRs
* Make padding of xen_arch_pmu's context union a constant that does not
  depend on arch context size.
* Set interface version to 0.1
* Check pointer validity in pvpmu_init/destroy()
* Fixed crash in core2_vpmu_dump()
* Fixed crash in vmx_add_msr()
* Break handling of Intel and AMD MSRs in traps.c into separate cases
* Pass full CS selector to guests
* Add lock in pvpmu init code to prevent potential race


Changes in v6:

* Two new patches:
  o Merge VMX MSR add/remove routines in vmcs.c (patch 5)
  o Merge VPMU read/write MSR routines in vpmu.c (patch 14)
* Check for pending NMI softirq after saving VPMU context to prevent a newly-scheduled
  guest from overwriting sampled_vcpu written by the de-scheduled VCPU.
* Keep track of enabled counters on Intel. This was removed in earlier patches and
  was a mistake. As a result of this change, struct vpmu will have a pointer to private
  context data (i.e. data that is not exposed to a PV(H) guest). Use this private pointer
  on SVM as well for storing MSR bitmap status (it was unnecessarily exposed to PV guests
  earlier).
  Dropped Reviewed-by: and Tested-by: tags from patch 4 since it needs to be reviewed
  again (core2_vpmu_do_wrmsr() routine, mostly)
* Replaced references to dom0 with hardware_domain (and is_control_domain with
  is_hardware_domain for consistency)
* Prevent non-privileged domains from reading PMU MSRs in VPMU_PRIV_MODE
* Reverted unnecessary changes in vpmu_initialise()'s switch statement
* Fixed comment in vpmu_do_interrupt


Changes in v5:

* Dropped patch number 2 ("Stop AMD counters when called from vpmu_save_force()")
  as no longer needed
* Added patch number 2 that marks context as loaded before PMU registers are
  loaded. This prevents a situation where a PMU interrupt may occur while context
  is still viewed as not loaded. (This is really a bug fix for existing VPMU
  code)
* Renamed xenpmu.h files to pmu.h
* More careful use of is_pv_domain(), is_hvm_domain(), is_pvh_domain() and
  has_hvm_container_domain(). Also explicitly disabled support for PVH until
  patch 16 to make the distinction between usages of the above macros more clear.
* Added support for disabling VPMU support during runtime.
* Disable VPMUs for non-privileged domains when switching to privileged
  profiling mode
* Added ARM stub for xen_arch_pmu_t
* Separated vpmu_mode from vpmu_features
* Moved CS register query to make sure we use the appropriate query mechanism
  for various guest types.
* LVTPC is now set from value in shared area, not copied from dom0
* Various code and comments cleanup as suggested by Jan.

Changes in v4:

* Added support for PVH guests:
  o changes in pvpmu_init() to accommodate both PV and PVH guests, still in patch 10
  o more careful use of is_hvm_domain
  o Additional patch (16)
* Moved HVM interrupt handling out of vpmu_do_interrupt() for NMI-safe handling
* Fixed dom0's VCPU selection in privileged mode
* Added a cast to cpu_user_regs_t in the register copy for 32-bit PV guests in
  vpmu_do_interrupt() (don't want to expose compat_cpu_user_regs in a public header)
* Renamed public structures by prefixing them with "xen_"
* Added an entry for xenpf_symdata in xlat.lst
* Fixed pv_cpuid check for vpmu-specific cpuid adjustments
* Various code style fixes
* Eliminated anonymous unions
* Added more verbiage to NMI patch description


Changes in v3:

* Moved PMU MSR banks out from architectural context data structures to allow
for future expansion without protocol changes
* PMU interrupts can be either NMIs or regular vector interrupts (the latter
is the default)
* Context is now marked as PMU_CACHED by the hypervisor code to avoid certain
race conditions with the guest
* Fixed races with PV guest in MSR access handlers
* More Intel VPMU cleanup
* Moved NMI-unsafe code from NMI handler
* Dropped changes to vcpu->is_running
* Added LVTPC apic handling (cached for PV guests)
* Separated privileged profiling mode into a standalone patch
* Separated NMI handling into a standalone patch


Changes in v2:

* Xen symbols are exported as a data structure (as opposed to a set of formatted
strings in v1). Even though one symbol per hypercall is returned, performance
appears to be acceptable: reading the whole file from dom0 userland takes on
average about twice as long as reading /proc/kallsyms
* More cleanup of Intel VPMU code to simplify publicly exported structures
* There are architecture-independent and x86-specific public include files (ARM
has a stub)
* General cleanup of public include files to make them more presentable (and
to make auto doc generation better)
* Setting of vcpu->is_running is now done on ARM in schedule_tail as well (making
changes to common/schedule.c architecture-independent). Note that this is not
tested since I don't have access to ARM hardware.
* PCPU ID of interrupted processor is now passed to PV guest


The following patch series adds PMU support in Xen for PV(H)
guests. There is a companion patchset for the Linux kernel. In addition,
another set of changes will be provided (later) for userland perf
code.

This version has the following limitations:
* For accurate profiling of dom0/Xen, dom0 VCPUs should be pinned.
* Hypervisor code is only profiled on processors that have running dom0 VCPUs
on them.
* No backtrace support.

A few notes that may help reviewing:

* A shared data structure (xenpmu_data_t) between each PV VCPU and hypervisor
CPU is used for passing registers' values as well as PMU state at the time of
the PMU interrupt.
* PMU interrupts are taken by the hypervisor either as NMIs or regular vector
interrupts for both HVM and PV(H). The interrupts are sent as NMIs to HVM guests
and as virtual interrupts to PV(H) guests.
* A PV guest's interrupt handler does not read/write PMU MSRs directly. Instead, it
accesses xenpmu_data_t and flushes it to hardware before returning.
* PMU mode is controlled at runtime via /sys/hypervisor/pmu/pmu/{pmu_mode,pmu_flags}
in addition to the 'vpmu' boot option (which is preserved for backward
compatibility); a usage sketch follows this list.
The following modes are provided:
  * disable: VPMU is off
  * enable: VPMU is on. Guests can profile themselves, dom0 profiles itself and Xen
  * priv_enable: dom0 only profiling. dom0 collects samples for everyone. Sampling
    in guests is suspended.
* The /proc/xen/xensyms file exports the hypervisor's symbols to dom0 (similar to
/proc/kallsyms)
* VPMU infrastructure is now used for HVM, PV and PVH and has therefore been moved
up from the hvm subtree
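
As a usage illustration for the runtime control interface above, here is a
minimal dom0 userland sketch. It assumes the pmu_mode sysfs node accepts the
mode names listed above as plain strings; treat the exact token format as an
assumption, not documented ABI:

    #include <stdio.h>

    /* Sketch: switch to dom0-only profiling through the pmu_mode node.
     * The token "priv_enable" is assumed from the mode list above. */
    int main(void)
    {
        FILE *f = fopen("/sys/hypervisor/pmu/pmu/pmu_mode", "w");

        if ( f == NULL )
            return 1;
        fputs("priv_enable", f);   /* dom0 collects samples for everyone */
        return fclose(f) ? 1 : 0;
    }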



Boris Ostrovsky (20):
  common/symbols: Export hypervisor symbols to privileged guest
  x86/VPMU: Manage VPMU_CONTEXT_SAVE flag in vpmu_save_force()
  x86/VPMU: Set MSR bitmaps only for HVM/PVH guests
  x86/VPMU: Make vpmu macros a bit more efficient
  intel/VPMU: Clean up Intel VPMU code
  vmx: Merge MSR management routines
  x86/VPMU: Handle APIC_LVTPC accesses
  intel/VPMU: MSR_CORE_PERF_GLOBAL_CTRL should be initialized to zero
  x86/VPMU: Add public xenpmu.h
  x86/VPMU: Make vpmu not HVM-specific
  x86/VPMU: Interface for setting PMU mode and flags
  x86/VPMU: Initialize PMU for PV(H) guests
  x86/VPMU: Save VPMU state for PV guests during context switch
  x86/VPMU: When handling MSR accesses, leave fault injection to callers
  x86/VPMU: Add support for PMU register handling on PV guests
  x86/VPMU: Handle PMU interrupts for PV guests
  x86/VPMU: Merge vpmu_rdmsr and vpmu_wrmsr
  x86/VPMU: Add privileged PMU mode
  x86/VPMU: NMI-based VPMU support
  x86/VPMU: Move VPMU files up from hvm/ directory

 docs/misc/xen-command-line.markdown                |   8 +-
 tools/flask/policy/policy/modules/xen/xen.te       |   7 +
 xen/arch/x86/Makefile                              |   1 +
 xen/arch/x86/domain.c                              |  23 +-
 xen/arch/x86/hvm/Makefile                          |   1 -
 xen/arch/x86/hvm/hvm.c                             |   3 +-
 xen/arch/x86/hvm/svm/Makefile                      |   1 -
 xen/arch/x86/hvm/svm/svm.c                         |  10 +-
 xen/arch/x86/hvm/vlapic.c                          |   3 +
 xen/arch/x86/hvm/vmx/Makefile                      |   1 -
 xen/arch/x86/hvm/vmx/vmcs.c                        |  84 +--
 xen/arch/x86/hvm/vmx/vmx.c                         |  28 +-
 xen/arch/x86/hvm/vpmu.c                            | 265 -------
 xen/arch/x86/oprofile/op_model_ppro.c              |   8 +-
 xen/arch/x86/platform_hypercall.c                  |  33 +
 xen/arch/x86/traps.c                               |  60 +-
 xen/arch/x86/vpmu.c                                | 826 +++++++++++++++++++++
 xen/arch/x86/{hvm/svm/vpmu.c => vpmu_amd.c}        | 158 ++--
 .../x86/{hvm/vmx/vpmu_core2.c => vpmu_intel.c}     | 639 ++++++++--------
 xen/arch/x86/x86_64/compat/entry.S                 |   4 +
 xen/arch/x86/x86_64/entry.S                        |   4 +
 xen/common/event_channel.c                         |   1 +
 xen/common/symbols.c                               |  54 ++
 xen/include/Makefile                               |   2 +
 xen/include/asm-x86/domain.h                       |   2 +
 xen/include/asm-x86/hvm/vcpu.h                     |   3 -
 xen/include/asm-x86/hvm/vmx/vmcs.h                 |  18 +-
 xen/include/asm-x86/hvm/vmx/vpmu_core2.h           |  51 --
 xen/include/asm-x86/{hvm => }/vpmu.h               |  94 ++-
 xen/include/public/arch-arm.h                      |   3 +
 xen/include/public/arch-x86/pmu.h                  |  77 ++
 xen/include/public/arch-x86/xen-x86_32.h           |   8 +
 xen/include/public/arch-x86/xen-x86_64.h           |   8 +
 xen/include/public/platform.h                      |  19 +
 xen/include/public/pmu.h                           |  95 +++
 xen/include/public/xen.h                           |   2 +
 xen/include/xen/hypercall.h                        |   4 +
 xen/include/xen/softirq.h                          |   1 +
 xen/include/xen/symbols.h                          |   3 +
 xen/include/xlat.lst                               |   5 +
 xen/include/xsm/dummy.h                            |  20 +
 xen/include/xsm/xsm.h                              |   6 +
 xen/xsm/dummy.c                                    |   1 +
 xen/xsm/flask/hooks.c                              |  28 +
 xen/xsm/flask/policy/access_vectors                |  18 +-
 xen/xsm/flask/policy/security_classes              |   1 +
 46 files changed, 1872 insertions(+), 819 deletions(-)
 delete mode 100644 xen/arch/x86/hvm/vpmu.c
 create mode 100644 xen/arch/x86/vpmu.c
 rename xen/arch/x86/{hvm/svm/vpmu.c => vpmu_amd.c} (74%)
 rename xen/arch/x86/{hvm/vmx/vpmu_core2.c => vpmu_intel.c} (60%)
 delete mode 100644 xen/include/asm-x86/hvm/vmx/vpmu_core2.h
 rename xen/include/asm-x86/{hvm => }/vpmu.h (55%)
 create mode 100644 xen/include/public/arch-x86/pmu.h
 create mode 100644 xen/include/public/pmu.h

-- 
1.8.1.4


* [PATCH v12 for-xen-4.5 01/20] common/symbols: Export hypervisor symbols to privileged guest
From: Boris Ostrovsky @ 2014-09-25 19:28 UTC
  To: jbeulich, kevin.tian, suravee.suthikulpanit,
	Aravind.Gopalakrishnan, dietmar.hahn, dgdegra
  Cc: keir, andrew.cooper3, tim, xen-devel, jun.nakajima, boris.ostrovsky

Export Xen's symbols as {<address><type><name>} triplets via the new
XENPF_get_symbol hypercall.
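
As an illustration of the intended calling convention (not part of this
patch), here is a minimal sketch of a guest-side enumeration loop.
do_platform_op() stands in for whatever platform-hypercall wrapper the
caller has; the buffer size is arbitrary and error handling is trimmed:

    /*
     * Hypothetical enumeration loop (sketch only). symnum is IN/OUT and
     * is advanced by the hypervisor; a zero-length name marks the end
     * of the symbol table.
     */
    struct xen_platform_op op = {
        .cmd = XENPF_get_symbol,
        .interface_version = XENPF_INTERFACE_VERSION,
    };
    char name[128];                     /* arbitrary buffer size */

    set_xen_guest_handle(op.u.symdata.name, name);
    op.u.symdata.symnum = 0;

    for ( ;; )
    {
        op.u.symdata.namelen = sizeof(name) - 1;
        if ( do_platform_op(&op) != 0 )
            break;                      /* hypercall error */
        name[sizeof(name) - 1] = '\0';  /* guard against truncation */
        if ( name[0] == '\0' )
            break;                      /* end of symbols */
        printf("%016"PRIx64" %c %s\n",
               op.u.symdata.address, op.u.symdata.type, name);
    }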

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
Tested-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
---
 xen/arch/x86/platform_hypercall.c     | 33 +++++++++++++++++++++
 xen/common/symbols.c                  | 54 +++++++++++++++++++++++++++++++++++
 xen/include/public/platform.h         | 19 ++++++++++++
 xen/include/xen/symbols.h             |  3 ++
 xen/include/xlat.lst                  |  1 +
 xen/xsm/flask/hooks.c                 |  4 +++
 xen/xsm/flask/policy/access_vectors   | 14 +++++++--
 xen/xsm/flask/policy/security_classes |  1 +
 8 files changed, 126 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/platform_hypercall.c b/xen/arch/x86/platform_hypercall.c
index 2162811..68bc6d9 100644
--- a/xen/arch/x86/platform_hypercall.c
+++ b/xen/arch/x86/platform_hypercall.c
@@ -23,6 +23,7 @@
 #include <xen/cpu.h>
 #include <xen/pmstat.h>
 #include <xen/irq.h>
+#include <xen/symbols.h>
 #include <asm/current.h>
 #include <public/platform.h>
 #include <acpi/cpufreq/processor_perf.h>
@@ -601,6 +602,38 @@ ret_t do_platform_op(XEN_GUEST_HANDLE_PARAM(xen_platform_op_t) u_xenpf_op)
     }
     break;
 
+    case XENPF_get_symbol:
+    {
+        static char name[KSYM_NAME_LEN + 1]; /* protected by xenpf_lock */
+        XEN_GUEST_HANDLE(char) nameh;
+        uint32_t namelen, copylen;
+
+        guest_from_compat_handle(nameh, op->u.symdata.name);
+
+        ret = xensyms_read(&op->u.symdata.symnum, &op->u.symdata.type,
+                           &op->u.symdata.address, name);
+
+        namelen = strlen(name) + 1;
+
+        if ( namelen > op->u.symdata.namelen )
+        {
+            /* Caller's buffer is too small for the whole string */
+            if ( op->u.symdata.namelen )
+                name[op->u.symdata.namelen] = '\0';
+            copylen = op->u.symdata.namelen;
+        }
+        else
+            copylen = namelen;
+
+        op->u.symdata.namelen = namelen;
+
+        if ( !ret && copy_to_guest(nameh, name, copylen) )
+            ret = -EFAULT;
+        if ( !ret && __copy_field_to_guest(u_xenpf_op, op, u.symdata) )
+            ret = -EFAULT;
+    }
+    break;
+
     default:
         ret = -ENOSYS;
         break;
diff --git a/xen/common/symbols.c b/xen/common/symbols.c
index bc2fde6..2c0942d 100644
--- a/xen/common/symbols.c
+++ b/xen/common/symbols.c
@@ -17,6 +17,8 @@
 #include <xen/lib.h>
 #include <xen/string.h>
 #include <xen/spinlock.h>
+#include <public/platform.h>
+#include <xen/guest_access.h>
 
 #ifdef SYMBOLS_ORIGIN
 extern const unsigned int symbols_offsets[1];
@@ -148,3 +150,55 @@ const char *symbols_lookup(unsigned long addr,
     *offset = addr - symbols_address(low);
     return namebuf;
 }
+
+/*
+ * Get symbol type information. This is encoded as a single char at the
+ * beginning of the symbol name.
+ */
+static char symbols_get_symbol_type(unsigned int off)
+{
+    /*
+     * Get just the first code, look it up in the token table,
+     * and return the first char from this token.
+     */
+    return symbols_token_table[symbols_token_index[symbols_names[off + 1]]];
+}
+
+int xensyms_read(uint32_t *symnum, char *type,
+                 uint64_t *address, char *name)
+{
+    /*
+     * Symbols are most likely accessed sequentially so we remember position
+     * from previous read. This can help us avoid the extra call to
+     * get_symbol_offset().
+     */
+    static uint64_t next_symbol, next_offset;
+    static DEFINE_SPINLOCK(symbols_mutex);
+
+    if ( *symnum > symbols_num_syms )
+        return -ERANGE;
+    if ( *symnum == symbols_num_syms )
+    {
+        /* No more symbols */
+        name[0] = '\0';
+        return 0;
+    }
+
+    spin_lock(&symbols_mutex);
+
+    if ( *symnum == 0 )
+        next_offset = next_symbol = 0;
+    if ( next_symbol != *symnum )
+        /* Non-sequential access */
+        next_offset = get_symbol_offset(*symnum);
+
+    *type = symbols_get_symbol_type(next_offset);
+    next_offset = symbols_expand_symbol(next_offset, name);
+    *address = symbols_offsets[*symnum] + SYMBOLS_ORIGIN;
+
+    next_symbol = ++*symnum;
+
+    spin_unlock(&symbols_mutex);
+
+    return 0;
+}
diff --git a/xen/include/public/platform.h b/xen/include/public/platform.h
index 053b9fa..4f21b17 100644
--- a/xen/include/public/platform.h
+++ b/xen/include/public/platform.h
@@ -527,6 +527,24 @@ struct xenpf_core_parking {
 typedef struct xenpf_core_parking xenpf_core_parking_t;
 DEFINE_XEN_GUEST_HANDLE(xenpf_core_parking_t);
 
+#define XENPF_get_symbol   61
+struct xenpf_symdata {
+    /* IN/OUT variables */
+    uint32_t namelen; /* IN:  size of name buffer                       */
+                      /* OUT: strlen(name) of hypervisor symbol (may be */
+                      /*      larger than what's been copied to guest)  */
+    uint32_t symnum;  /* IN:  Symbol to read                            */
+                      /* OUT: Next available symbol. If same as IN then */
+                      /*      we reached the end                        */
+
+    /* OUT variables */
+    char type;
+    XEN_GUEST_HANDLE(char) name;
+    uint64_t address;
+};
+typedef struct xenpf_symdata xenpf_symdata_t;
+DEFINE_XEN_GUEST_HANDLE(xenpf_symdata_t);
+
 /*
  * ` enum neg_errnoval
  * ` HYPERVISOR_platform_op(const struct xen_platform_op*);
@@ -553,6 +571,7 @@ struct xen_platform_op {
         struct xenpf_cpu_hotadd        cpu_add;
         struct xenpf_mem_hotadd        mem_add;
         struct xenpf_core_parking      core_parking;
+        struct xenpf_symdata           symdata;
         uint8_t                        pad[128];
     } u;
 };
diff --git a/xen/include/xen/symbols.h b/xen/include/xen/symbols.h
index 87cd77d..1fa0537 100644
--- a/xen/include/xen/symbols.h
+++ b/xen/include/xen/symbols.h
@@ -11,4 +11,7 @@ const char *symbols_lookup(unsigned long addr,
                            unsigned long *offset,
                            char *namebuf);
 
+int xensyms_read(uint32_t *symnum, char *type,
+                 uint64_t *address, char *name);
+
 #endif /*_XEN_SYMBOLS_H*/
diff --git a/xen/include/xlat.lst b/xen/include/xlat.lst
index 9a35dd7..c8fafef 100644
--- a/xen/include/xlat.lst
+++ b/xen/include/xlat.lst
@@ -86,6 +86,7 @@
 ?	processor_px			platform.h
 !	psd_package			platform.h
 ?	xenpf_enter_acpi_sleep		platform.h
+!	xenpf_symdata			platform.h
 ?	xenpf_pcpuinfo			platform.h
 ?	xenpf_pcpu_version		platform.h
 !	sched_poll			sched.h
diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
index df05566..5afc1d7 100644
--- a/xen/xsm/flask/hooks.c
+++ b/xen/xsm/flask/hooks.c
@@ -1391,6 +1391,10 @@ static int flask_platform_op(uint32_t op)
     case XENPF_get_cpuinfo:
         return domain_has_xen(current->domain, XEN__GETCPUINFO);
 
+    case XENPF_get_symbol:
+        return avc_has_perm(domain_sid(current->domain), SECINITSID_XEN,
+                            SECCLASS_XEN2, XEN2__GET_SYMBOL, NULL);
+
     default:
         printk("flask_platform_op: Unknown op %d\n", op);
         return -EPERM;
diff --git a/xen/xsm/flask/policy/access_vectors b/xen/xsm/flask/policy/access_vectors
index d279841..2ddbeba 100644
--- a/xen/xsm/flask/policy/access_vectors
+++ b/xen/xsm/flask/policy/access_vectors
@@ -3,9 +3,9 @@
 #
 # class class_name { permission_name ... }
 
-# Class xen consists of dom0-only operations dealing with the hypervisor itself.
-# Unless otherwise specified, the source is the domain executing the hypercall,
-# and the target is the xen initial sid (type xen_t).
+# Classes xen and xen2 consist of dom0-only operations dealing with the
+# hypervisor itself. Unless otherwise specified, the source is the domain
+# executing the hypercall, and the target is the xen initial sid (type xen_t).
 class xen
 {
 # XENPF_settime
@@ -75,6 +75,14 @@ class xen
     setscheduler
 }
 
+# This is a continuation of class xen, since only 32 permissions can be
+# defined per class
+class xen2
+{
+# XENPF_get_symbol
+    get_symbol
+}
+
 # Classes domain and domain2 consist of operations that a domain performs on
 # another domain or on itself.  Unless otherwise specified, the source is the
 # domain executing the hypercall, and the target is the domain being operated on
diff --git a/xen/xsm/flask/policy/security_classes b/xen/xsm/flask/policy/security_classes
index ef134a7..ca191db 100644
--- a/xen/xsm/flask/policy/security_classes
+++ b/xen/xsm/flask/policy/security_classes
@@ -8,6 +8,7 @@
 # for userspace object managers
 
 class xen
+class xen2
 class domain
 class domain2
 class hvm
-- 
1.8.1.4


* [PATCH v12 for-xen-4.5 02/20] x86/VPMU: Manage VPMU_CONTEXT_SAVE flag in vpmu_save_force()
From: Boris Ostrovsky @ 2014-09-25 19:28 UTC
  To: jbeulich, kevin.tian, suravee.suthikulpanit,
	Aravind.Gopalakrishnan, dietmar.hahn, dgdegra
  Cc: keir, andrew.cooper3, tim, xen-devel, jun.nakajima, boris.ostrovsky

There is a possibility that we set VPMU_CONTEXT_SAVE on the VPMU context in
vpmu_load() and never clear it (because vpmu_save_force() will see the
VPMU_CONTEXT_LOADED bit clear, which is possible on AMD processors).

The problem is that amd_vpmu_save() assumes that if VPMU_CONTEXT_SAVE is set
then (1) we need to save counters and (2) we don't need to "stop" control
registers since they must have been stopped earlier. The latter may cause all
sorts of problems (such as counters still running in the wrong guest and the
hypervisor sending that guest unexpected PMU interrupts).

Since setting this flag is currently always done prior to calling
vpmu_save_force(), let's both set and clear it there.
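
For clarity, a simplified sketch of vpmu_save_force() as it looks after this
change. The trailing vpmu_reset() is assumed from the pre-existing function
body outside this diff's context lines; it is shown only to illustrate the
resulting set/clear pairing:

    static void vpmu_save_force(void *arg)
    {
        struct vcpu *v = arg;
        struct vpmu_struct *vpmu = vcpu_vpmu(v);

        if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) )
            return;                             /* flag is never set, so never dangling */

        vpmu_set(vpmu, VPMU_CONTEXT_SAVE);      /* added by this patch */

        if ( vpmu->arch_vpmu_ops )
            (void)vpmu->arch_vpmu_ops->arch_vpmu_save(v);

        vpmu_reset(vpmu, VPMU_CONTEXT_SAVE);    /* pre-existing clear */

        /* remainder of function unchanged */
    }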

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
 xen/arch/x86/hvm/vpmu.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c
index 15d5b6f..451b346 100644
--- a/xen/arch/x86/hvm/vpmu.c
+++ b/xen/arch/x86/hvm/vpmu.c
@@ -130,6 +130,8 @@ static void vpmu_save_force(void *arg)
     if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) )
         return;
 
+    vpmu_set(vpmu, VPMU_CONTEXT_SAVE);
+
     if ( vpmu->arch_vpmu_ops )
         (void)vpmu->arch_vpmu_ops->arch_vpmu_save(v);
 
@@ -178,7 +180,6 @@ void vpmu_load(struct vcpu *v)
          */
         if ( vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) )
         {
-            vpmu_set(vpmu, VPMU_CONTEXT_SAVE);
             on_selected_cpus(cpumask_of(vpmu->last_pcpu),
                              vpmu_save_force, (void *)v, 1);
             vpmu_reset(vpmu, VPMU_CONTEXT_LOADED);
@@ -195,7 +196,6 @@ void vpmu_load(struct vcpu *v)
         vpmu = vcpu_vpmu(prev);
 
         /* Someone ran here before us */
-        vpmu_set(vpmu, VPMU_CONTEXT_SAVE);
         vpmu_save_force(prev);
         vpmu_reset(vpmu, VPMU_CONTEXT_LOADED);
 
-- 
1.8.1.4


* [PATCH v12 for-xen-4.5 03/20] x86/VPMU: Set MSR bitmaps only for HVM/PVH guests
From: Boris Ostrovsky @ 2014-09-25 19:28 UTC
  To: jbeulich, kevin.tian, suravee.suthikulpanit,
	Aravind.Gopalakrishnan, dietmar.hahn, dgdegra
  Cc: keir, andrew.cooper3, tim, xen-devel, jun.nakajima, boris.ostrovsky

In preparation for making VPMU code shared with PV, make sure that we update
MSR bitmaps only for HVM/PVH guests.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
---
 xen/arch/x86/hvm/svm/vpmu.c       | 21 +++++++++++++--------
 xen/arch/x86/hvm/vmx/vpmu_core2.c |  8 +++++---
 2 files changed, 18 insertions(+), 11 deletions(-)

diff --git a/xen/arch/x86/hvm/svm/vpmu.c b/xen/arch/x86/hvm/svm/vpmu.c
index 8e07a98..c7e0946 100644
--- a/xen/arch/x86/hvm/svm/vpmu.c
+++ b/xen/arch/x86/hvm/svm/vpmu.c
@@ -244,7 +244,8 @@ static int amd_vpmu_save(struct vcpu *v)
 
     context_save(v);
 
-    if ( !vpmu_is_set(vpmu, VPMU_RUNNING) && ctx->msr_bitmap_set )
+    if ( !vpmu_is_set(vpmu, VPMU_RUNNING) &&
+         has_hvm_container_domain(v->domain) && ctx->msr_bitmap_set )
         amd_vpmu_unset_msr_bitmap(v);
 
     return 1;
@@ -287,8 +288,9 @@ static int amd_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content,
     ASSERT(!supported);
 
     /* For all counters, enable guest only mode for HVM guest */
-    if ( (get_pmu_reg_type(msr) == MSR_TYPE_CTRL) &&
-        !(is_guest_mode(msr_content)) )
+    if ( has_hvm_container_domain(v->domain) &&
+         (get_pmu_reg_type(msr) == MSR_TYPE_CTRL) &&
+         !is_guest_mode(msr_content) )
     {
         set_guest_mode(msr_content);
     }
@@ -303,8 +305,9 @@ static int amd_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content,
         apic_write(APIC_LVTPC, PMU_APIC_VECTOR);
         vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR;
 
-        if ( !((struct amd_vpmu_context *)vpmu->context)->msr_bitmap_set )
-            amd_vpmu_set_msr_bitmap(v);
+        if ( has_hvm_container_domain(v->domain) &&
+             !((struct amd_vpmu_context *)vpmu->context)->msr_bitmap_set )
+             amd_vpmu_set_msr_bitmap(v);
     }
 
     /* stop saving & restore if guest stops first counter */
@@ -314,8 +317,9 @@ static int amd_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content,
         apic_write(APIC_LVTPC, PMU_APIC_VECTOR | APIC_LVT_MASKED);
         vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR | APIC_LVT_MASKED;
         vpmu_reset(vpmu, VPMU_RUNNING);
-        if ( ((struct amd_vpmu_context *)vpmu->context)->msr_bitmap_set )
-            amd_vpmu_unset_msr_bitmap(v);
+        if ( has_hvm_container_domain(v->domain) &&
+             ((struct amd_vpmu_context *)vpmu->context)->msr_bitmap_set )
+             amd_vpmu_unset_msr_bitmap(v);
         release_pmu_ownship(PMU_OWNER_HVM);
     }
 
@@ -406,7 +410,8 @@ static void amd_vpmu_destroy(struct vcpu *v)
     if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
         return;
 
-    if ( ((struct amd_vpmu_context *)vpmu->context)->msr_bitmap_set )
+    if ( has_hvm_container_domain(v->domain) &&
+         ((struct amd_vpmu_context *)vpmu->context)->msr_bitmap_set )
         amd_vpmu_unset_msr_bitmap(v);
 
     xfree(vpmu->context);
diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c
index 68b6272..c9f6ae4 100644
--- a/xen/arch/x86/hvm/vmx/vpmu_core2.c
+++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c
@@ -335,7 +335,8 @@ static int core2_vpmu_save(struct vcpu *v)
     __core2_vpmu_save(v);
 
     /* Unset PMU MSR bitmap to trap lazy load. */
-    if ( !vpmu_is_set(vpmu, VPMU_RUNNING) && cpu_has_vmx_msr_bitmap )
+    if ( !vpmu_is_set(vpmu, VPMU_RUNNING) &&
+         has_hvm_container_domain(v->domain) && cpu_has_vmx_msr_bitmap )
         core2_vpmu_unset_msr_bitmap(v->arch.hvm_vmx.msr_bitmap);
 
     return 1;
@@ -448,7 +449,8 @@ static int core2_vpmu_msr_common_check(u32 msr_index, int *type, int *index)
     {
         __core2_vpmu_load(current);
         vpmu_set(vpmu, VPMU_CONTEXT_LOADED);
-        if ( cpu_has_vmx_msr_bitmap )
+        if ( has_hvm_container_domain(current->domain) &&
+             cpu_has_vmx_msr_bitmap )
             core2_vpmu_set_msr_bitmap(current->arch.hvm_vmx.msr_bitmap);
     }
     return 1;
@@ -822,7 +824,7 @@ static void core2_vpmu_destroy(struct vcpu *v)
         return;
     xfree(core2_vpmu_cxt->pmu_enable);
     xfree(vpmu->context);
-    if ( cpu_has_vmx_msr_bitmap )
+    if ( has_hvm_container_domain(v->domain) && cpu_has_vmx_msr_bitmap )
         core2_vpmu_unset_msr_bitmap(v->arch.hvm_vmx.msr_bitmap);
     release_pmu_ownship(PMU_OWNER_HVM);
     vpmu_reset(vpmu, VPMU_CONTEXT_ALLOCATED);
-- 
1.8.1.4


* [PATCH v12 for-xen-4.5 04/20] x86/VPMU: Make vpmu macros a bit more efficient
From: Boris Ostrovsky @ 2014-09-25 19:28 UTC
  To: jbeulich, kevin.tian, suravee.suthikulpanit,
	Aravind.Gopalakrishnan, dietmar.hahn, dgdegra
  Cc: keir, andrew.cooper3, tim, xen-devel, jun.nakajima, boris.ostrovsky

Introduce vpmu_are_all_set(), which allows testing multiple bits at once. Convert
macros into inlines for better compiler checking.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
---
 xen/arch/x86/hvm/vmx/vpmu_core2.c |  5 +----
 xen/arch/x86/hvm/vpmu.c           |  3 +--
 xen/include/asm-x86/hvm/vpmu.h    | 25 +++++++++++++++++++++----
 3 files changed, 23 insertions(+), 10 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c
index c9f6ae4..9d5d8eb 100644
--- a/xen/arch/x86/hvm/vmx/vpmu_core2.c
+++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c
@@ -326,10 +326,7 @@ static int core2_vpmu_save(struct vcpu *v)
 {
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
 
-    if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_SAVE) )
-        return 0;
-
-    if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) ) 
+    if ( !vpmu_are_all_set(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED) )
         return 0;
 
     __core2_vpmu_save(v);
diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c
index 451b346..7929290 100644
--- a/xen/arch/x86/hvm/vpmu.c
+++ b/xen/arch/x86/hvm/vpmu.c
@@ -145,8 +145,7 @@ void vpmu_save(struct vcpu *v)
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
     int pcpu = smp_processor_id();
 
-    if ( !(vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) &&
-           vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED)) )
+    if ( !vpmu_are_all_set(vpmu, VPMU_CONTEXT_ALLOCATED | VPMU_CONTEXT_LOADED) )
        return;
 
     vpmu->last_pcpu = pcpu;
diff --git a/xen/include/asm-x86/hvm/vpmu.h b/xen/include/asm-x86/hvm/vpmu.h
index 9a5ac01..40a6e57 100644
--- a/xen/include/asm-x86/hvm/vpmu.h
+++ b/xen/include/asm-x86/hvm/vpmu.h
@@ -82,10 +82,27 @@ struct vpmu_struct {
 #define VPMU_CPU_HAS_BTS                    0x200 /* Has Branch Trace Store */
 
 
-#define vpmu_set(_vpmu, _x)    ((_vpmu)->flags |= (_x))
-#define vpmu_reset(_vpmu, _x)  ((_vpmu)->flags &= ~(_x))
-#define vpmu_is_set(_vpmu, _x) ((_vpmu)->flags & (_x))
-#define vpmu_clear(_vpmu)      ((_vpmu)->flags = 0)
+static inline void vpmu_set(struct vpmu_struct *vpmu, const u32 mask)
+{
+    vpmu->flags |= mask;
+}
+static inline void vpmu_reset(struct vpmu_struct *vpmu, const u32 mask)
+{
+    vpmu->flags &= ~mask;
+}
+static inline void vpmu_clear(struct vpmu_struct *vpmu)
+{
+    vpmu->flags = 0;
+}
+static inline bool_t vpmu_is_set(const struct vpmu_struct *vpmu, const u32 mask)
+{
+    return !!(vpmu->flags & mask);
+}
+static inline bool_t vpmu_are_all_set(const struct vpmu_struct *vpmu,
+                                      const u32 mask)
+{
+    return !!((vpmu->flags & mask) == mask);
+}
 
 int vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content, uint64_t supported);
 int vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content);
-- 
1.8.1.4


* [PATCH v12 for-xen-4.5 05/20] intel/VPMU: Clean up Intel VPMU code
From: Boris Ostrovsky @ 2014-09-25 19:28 UTC
  To: jbeulich, kevin.tian, suravee.suthikulpanit,
	Aravind.Gopalakrishnan, dietmar.hahn, dgdegra
  Cc: keir, andrew.cooper3, tim, xen-devel, jun.nakajima, boris.ostrovsky

Remove struct pmumsr and core2_pmu_enable. Replace static MSR structures with
fields in core2_vpmu_context.

Call core2_get_pmc_count() once, during initialization.

Properly clean up when core2_vpmu_alloc_resource() fails.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
---
 xen/arch/x86/hvm/vmx/vpmu_core2.c        | 381 ++++++++++++++-----------------
 xen/include/asm-x86/hvm/vmx/vpmu_core2.h |  19 --
 2 files changed, 172 insertions(+), 228 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c
index 9d5d8eb..dd0d5e9 100644
--- a/xen/arch/x86/hvm/vmx/vpmu_core2.c
+++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c
@@ -69,6 +69,27 @@
 static bool_t __read_mostly full_width_write;
 
 /*
+ * MSR_CORE_PERF_FIXED_CTR_CTRL contains the configuration of all fixed
+ * counters. 4 bits for every counter.
+ */
+#define FIXED_CTR_CTRL_BITS 4
+#define FIXED_CTR_CTRL_MASK ((1 << FIXED_CTR_CTRL_BITS) - 1)
+
+#define VPMU_CORE2_MAX_FIXED_PMCS     4
+struct core2_vpmu_context {
+    u64 fixed_ctrl;
+    u64 ds_area;
+    u64 pebs_enable;
+    u64 global_ovf_status;
+    u64 enabled_cntrs;  /* Follows PERF_GLOBAL_CTRL MSR format */
+    u64 fix_counters[VPMU_CORE2_MAX_FIXED_PMCS];
+    struct arch_msr_pair arch_msr_pair[1];
+};
+
+/* Number of general-purpose and fixed performance counters */
+static unsigned int __read_mostly arch_pmc_cnt, fixed_pmc_cnt;
+
+/*
  * QUIRK to workaround an issue on various family 6 cpus.
  * The issue leads to endless PMC interrupt loops on the processor.
  * If the interrupt handler is running and a pmc reaches the value 0, this
@@ -88,11 +109,8 @@ static void check_pmc_quirk(void)
         is_pmc_quirk = 0;    
 }
 
-static int core2_get_pmc_count(void);
 static void handle_pmc_quirk(u64 msr_content)
 {
-    int num_gen_pmc = core2_get_pmc_count();
-    int num_fix_pmc  = 3;
     int i;
     u64 val;
 
@@ -100,7 +118,7 @@ static void handle_pmc_quirk(u64 msr_content)
         return;
 
     val = msr_content;
-    for ( i = 0; i < num_gen_pmc; i++ )
+    for ( i = 0; i < arch_pmc_cnt; i++ )
     {
         if ( val & 0x1 )
         {
@@ -112,7 +130,7 @@ static void handle_pmc_quirk(u64 msr_content)
         val >>= 1;
     }
     val = msr_content >> 32;
-    for ( i = 0; i < num_fix_pmc; i++ )
+    for ( i = 0; i < fixed_pmc_cnt; i++ )
     {
         if ( val & 0x1 )
         {
@@ -125,128 +143,91 @@ static void handle_pmc_quirk(u64 msr_content)
     }
 }
 
-static const u32 core2_fix_counters_msr[] = {
-    MSR_CORE_PERF_FIXED_CTR0,
-    MSR_CORE_PERF_FIXED_CTR1,
-    MSR_CORE_PERF_FIXED_CTR2
-};
-
 /*
- * MSR_CORE_PERF_FIXED_CTR_CTRL contains the configuration of all fixed
- * counters. 4 bits for every counter.
+ * Read the number of general counters via CPUID.EAX[0xa].EAX[8..15]
  */
-#define FIXED_CTR_CTRL_BITS 4
-#define FIXED_CTR_CTRL_MASK ((1 << FIXED_CTR_CTRL_BITS) - 1)
-
-/* The index into the core2_ctrls_msr[] of this MSR used in core2_vpmu_dump() */
-#define MSR_CORE_PERF_FIXED_CTR_CTRL_IDX 0
-
-/* Core 2 Non-architectual Performance Control MSRs. */
-static const u32 core2_ctrls_msr[] = {
-    MSR_CORE_PERF_FIXED_CTR_CTRL,
-    MSR_IA32_PEBS_ENABLE,
-    MSR_IA32_DS_AREA
-};
-
-struct pmumsr {
-    unsigned int num;
-    const u32 *msr;
-};
-
-static const struct pmumsr core2_fix_counters = {
-    VPMU_CORE2_NUM_FIXED,
-    core2_fix_counters_msr
-};
+static int core2_get_arch_pmc_count(void)
+{
+    u32 eax;
 
-static const struct pmumsr core2_ctrls = {
-    VPMU_CORE2_NUM_CTRLS,
-    core2_ctrls_msr
-};
-static int arch_pmc_cnt;
+    eax = cpuid_eax(0xa);
+    return MASK_EXTR(eax, PMU_GENERAL_NR_MASK);
+}
 
 /*
- * Read the number of general counters via CPUID.EAX[0xa].EAX[8..15]
+ * Read the number of fixed counters via CPUID.EDX[0xa].EDX[0..4]
  */
-static int core2_get_pmc_count(void)
+static int core2_get_fixed_pmc_count(void)
 {
-    u32 eax, ebx, ecx, edx;
+    u32 eax;
 
-    if ( arch_pmc_cnt == 0 )
-    {
-        cpuid(0xa, &eax, &ebx, &ecx, &edx);
-        arch_pmc_cnt = (eax & PMU_GENERAL_NR_MASK) >> PMU_GENERAL_NR_SHIFT;
-    }
-
-    return arch_pmc_cnt;
+    eax = cpuid_eax(0xa);
+    return MASK_EXTR(eax, PMU_FIXED_NR_MASK);
 }
 
 static u64 core2_calc_intial_glb_ctrl_msr(void)
 {
-    int arch_pmc_bits = (1 << core2_get_pmc_count()) - 1;
-    u64 fix_pmc_bits  = (1 << 3) - 1;
-    return ((fix_pmc_bits << 32) | arch_pmc_bits);
+    int arch_pmc_bits = (1 << arch_pmc_cnt) - 1;
+    u64 fix_pmc_bits  = (1 << fixed_pmc_cnt) - 1;
+
+    return (fix_pmc_bits << 32) | arch_pmc_bits;
 }
 
 /* edx bits 5-12: Bit width of fixed-function performance counters  */
 static int core2_get_bitwidth_fix_count(void)
 {
-    u32 eax, ebx, ecx, edx;
+    u32 edx;
 
-    cpuid(0xa, &eax, &ebx, &ecx, &edx);
-    return ((edx & PMU_FIXED_WIDTH_MASK) >> PMU_FIXED_WIDTH_SHIFT);
+    edx = cpuid_edx(0xa);
+    return MASK_EXTR(edx, PMU_FIXED_WIDTH_MASK);
 }
 
 static int is_core2_vpmu_msr(u32 msr_index, int *type, int *index)
 {
-    int i;
     u32 msr_index_pmc;
 
-    for ( i = 0; i < core2_fix_counters.num; i++ )
+    switch ( msr_index )
     {
-        if ( core2_fix_counters.msr[i] == msr_index )
+    case MSR_CORE_PERF_FIXED_CTR_CTRL:
+    case MSR_IA32_DS_AREA:
+    case MSR_IA32_PEBS_ENABLE:
+        *type = MSR_TYPE_CTRL;
+        return 1;
+
+    case MSR_CORE_PERF_GLOBAL_CTRL:
+    case MSR_CORE_PERF_GLOBAL_STATUS:
+    case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
+        *type = MSR_TYPE_GLOBAL;
+        return 1;
+
+    default:
+
+        if ( (msr_index >= MSR_CORE_PERF_FIXED_CTR0) &&
+             (msr_index < MSR_CORE_PERF_FIXED_CTR0 + fixed_pmc_cnt) )
         {
+            *index = msr_index - MSR_CORE_PERF_FIXED_CTR0;
             *type = MSR_TYPE_COUNTER;
-            *index = i;
             return 1;
         }
-    }
 
-    for ( i = 0; i < core2_ctrls.num; i++ )
-    {
-        if ( core2_ctrls.msr[i] == msr_index )
+        if ( (msr_index >= MSR_P6_EVNTSEL(0)) &&
+             (msr_index < MSR_P6_EVNTSEL(arch_pmc_cnt)) )
         {
-            *type = MSR_TYPE_CTRL;
-            *index = i;
+            *index = msr_index - MSR_P6_EVNTSEL(0);
+            *type = MSR_TYPE_ARCH_CTRL;
             return 1;
         }
-    }
-
-    if ( (msr_index == MSR_CORE_PERF_GLOBAL_CTRL) ||
-         (msr_index == MSR_CORE_PERF_GLOBAL_STATUS) ||
-         (msr_index == MSR_CORE_PERF_GLOBAL_OVF_CTRL) )
-    {
-        *type = MSR_TYPE_GLOBAL;
-        return 1;
-    }
-
-    msr_index_pmc = msr_index & MSR_PMC_ALIAS_MASK;
-    if ( (msr_index_pmc >= MSR_IA32_PERFCTR0) &&
-         (msr_index_pmc < (MSR_IA32_PERFCTR0 + core2_get_pmc_count())) )
-    {
-        *type = MSR_TYPE_ARCH_COUNTER;
-        *index = msr_index_pmc - MSR_IA32_PERFCTR0;
-        return 1;
-    }
 
-    if ( (msr_index >= MSR_P6_EVNTSEL(0)) &&
-         (msr_index < (MSR_P6_EVNTSEL(core2_get_pmc_count()))) )
-    {
-        *type = MSR_TYPE_ARCH_CTRL;
-        *index = msr_index - MSR_P6_EVNTSEL(0);
-        return 1;
+        msr_index_pmc = msr_index & MSR_PMC_ALIAS_MASK;
+        if ( (msr_index_pmc >= MSR_IA32_PERFCTR0) &&
+             (msr_index_pmc < (MSR_IA32_PERFCTR0 + arch_pmc_cnt)) )
+        {
+            *type = MSR_TYPE_ARCH_COUNTER;
+            *index = msr_index_pmc - MSR_IA32_PERFCTR0;
+            return 1;
+        }
+        return 0;
     }
-
-    return 0;
 }
 
 static void core2_vpmu_set_msr_bitmap(unsigned long *msr_bitmap)
@@ -254,13 +235,13 @@ static void core2_vpmu_set_msr_bitmap(unsigned long *msr_bitmap)
     int i;
 
     /* Allow Read/Write PMU Counters MSR Directly. */
-    for ( i = 0; i < core2_fix_counters.num; i++ )
+    for ( i = 0; i < fixed_pmc_cnt; i++ )
     {
-        clear_bit(msraddr_to_bitpos(core2_fix_counters.msr[i]), msr_bitmap);
-        clear_bit(msraddr_to_bitpos(core2_fix_counters.msr[i]),
+        clear_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR0 + i), msr_bitmap);
+        clear_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR0 + i),
                   msr_bitmap + 0x800/BYTES_PER_LONG);
     }
-    for ( i = 0; i < core2_get_pmc_count(); i++ )
+    for ( i = 0; i < arch_pmc_cnt; i++ )
     {
         clear_bit(msraddr_to_bitpos(MSR_IA32_PERFCTR0+i), msr_bitmap);
         clear_bit(msraddr_to_bitpos(MSR_IA32_PERFCTR0+i),
@@ -275,26 +256,28 @@ static void core2_vpmu_set_msr_bitmap(unsigned long *msr_bitmap)
     }
 
     /* Allow Read PMU Non-global Controls Directly. */
-    for ( i = 0; i < core2_ctrls.num; i++ )
-        clear_bit(msraddr_to_bitpos(core2_ctrls.msr[i]), msr_bitmap);
-    for ( i = 0; i < core2_get_pmc_count(); i++ )
-        clear_bit(msraddr_to_bitpos(MSR_P6_EVNTSEL(i)), msr_bitmap);
+    for ( i = 0; i < arch_pmc_cnt; i++ )
+         clear_bit(msraddr_to_bitpos(MSR_P6_EVNTSEL(i)), msr_bitmap);
+
+    clear_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR_CTRL), msr_bitmap);
+    clear_bit(msraddr_to_bitpos(MSR_IA32_PEBS_ENABLE), msr_bitmap);
+    clear_bit(msraddr_to_bitpos(MSR_IA32_DS_AREA), msr_bitmap);
 }
 
 static void core2_vpmu_unset_msr_bitmap(unsigned long *msr_bitmap)
 {
     int i;
 
-    for ( i = 0; i < core2_fix_counters.num; i++ )
+    for ( i = 0; i < fixed_pmc_cnt; i++ )
     {
-        set_bit(msraddr_to_bitpos(core2_fix_counters.msr[i]), msr_bitmap);
-        set_bit(msraddr_to_bitpos(core2_fix_counters.msr[i]),
+        set_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR0 + i), msr_bitmap);
+        set_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR0 + i),
                 msr_bitmap + 0x800/BYTES_PER_LONG);
     }
-    for ( i = 0; i < core2_get_pmc_count(); i++ )
+    for ( i = 0; i < arch_pmc_cnt; i++ )
     {
-        set_bit(msraddr_to_bitpos(MSR_IA32_PERFCTR0+i), msr_bitmap);
-        set_bit(msraddr_to_bitpos(MSR_IA32_PERFCTR0+i),
+        set_bit(msraddr_to_bitpos(MSR_IA32_PERFCTR0 + i), msr_bitmap);
+        set_bit(msraddr_to_bitpos(MSR_IA32_PERFCTR0 + i),
                 msr_bitmap + 0x800/BYTES_PER_LONG);
 
         if ( full_width_write )
@@ -305,10 +288,12 @@ static void core2_vpmu_unset_msr_bitmap(unsigned long *msr_bitmap)
         }
     }
 
-    for ( i = 0; i < core2_ctrls.num; i++ )
-        set_bit(msraddr_to_bitpos(core2_ctrls.msr[i]), msr_bitmap);
-    for ( i = 0; i < core2_get_pmc_count(); i++ )
+    for ( i = 0; i < arch_pmc_cnt; i++ )
         set_bit(msraddr_to_bitpos(MSR_P6_EVNTSEL(i)), msr_bitmap);
+
+    set_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR_CTRL), msr_bitmap);
+    set_bit(msraddr_to_bitpos(MSR_IA32_PEBS_ENABLE), msr_bitmap);
+    set_bit(msraddr_to_bitpos(MSR_IA32_DS_AREA), msr_bitmap);
 }
 
 static inline void __core2_vpmu_save(struct vcpu *v)
@@ -316,10 +301,10 @@ static inline void __core2_vpmu_save(struct vcpu *v)
     int i;
     struct core2_vpmu_context *core2_vpmu_cxt = vcpu_vpmu(v)->context;
 
-    for ( i = 0; i < core2_fix_counters.num; i++ )
-        rdmsrl(core2_fix_counters.msr[i], core2_vpmu_cxt->fix_counters[i]);
-    for ( i = 0; i < core2_get_pmc_count(); i++ )
-        rdmsrl(MSR_IA32_PERFCTR0+i, core2_vpmu_cxt->arch_msr_pair[i].counter);
+    for ( i = 0; i < fixed_pmc_cnt; i++ )
+        rdmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, core2_vpmu_cxt->fix_counters[i]);
+    for ( i = 0; i < arch_pmc_cnt; i++ )
+        rdmsrl(MSR_IA32_PERFCTR0 + i, core2_vpmu_cxt->arch_msr_pair[i].counter);
 }
 
 static int core2_vpmu_save(struct vcpu *v)
@@ -344,20 +329,22 @@ static inline void __core2_vpmu_load(struct vcpu *v)
     unsigned int i, pmc_start;
     struct core2_vpmu_context *core2_vpmu_cxt = vcpu_vpmu(v)->context;
 
-    for ( i = 0; i < core2_fix_counters.num; i++ )
-        wrmsrl(core2_fix_counters.msr[i], core2_vpmu_cxt->fix_counters[i]);
+    for ( i = 0; i < fixed_pmc_cnt; i++ )
+        wrmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, core2_vpmu_cxt->fix_counters[i]);
 
     if ( full_width_write )
         pmc_start = MSR_IA32_A_PERFCTR0;
     else
         pmc_start = MSR_IA32_PERFCTR0;
-    for ( i = 0; i < core2_get_pmc_count(); i++ )
+    for ( i = 0; i < arch_pmc_cnt; i++ )
+    {
         wrmsrl(pmc_start + i, core2_vpmu_cxt->arch_msr_pair[i].counter);
-
-    for ( i = 0; i < core2_ctrls.num; i++ )
-        wrmsrl(core2_ctrls.msr[i], core2_vpmu_cxt->ctrls[i]);
-    for ( i = 0; i < core2_get_pmc_count(); i++ )
         wrmsrl(MSR_P6_EVNTSEL(i), core2_vpmu_cxt->arch_msr_pair[i].control);
+    }
+
+    wrmsrl(MSR_CORE_PERF_FIXED_CTR_CTRL, core2_vpmu_cxt->fixed_ctrl);
+    wrmsrl(MSR_IA32_DS_AREA, core2_vpmu_cxt->ds_area);
+    wrmsrl(MSR_IA32_PEBS_ENABLE, core2_vpmu_cxt->pebs_enable);
 }
 
 static void core2_vpmu_load(struct vcpu *v)
@@ -376,56 +363,37 @@ static int core2_vpmu_alloc_resource(struct vcpu *v)
 {
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
     struct core2_vpmu_context *core2_vpmu_cxt;
-    struct core2_pmu_enable *pmu_enable;
 
     if ( !acquire_pmu_ownership(PMU_OWNER_HVM) )
         return 0;
 
     wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0);
     if ( vmx_add_host_load_msr(MSR_CORE_PERF_GLOBAL_CTRL) )
-        return 0;
+        goto out_err;
 
     if ( vmx_add_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL) )
-        return 0;
+        goto out_err;
     vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL,
                  core2_calc_intial_glb_ctrl_msr());
 
-    pmu_enable = xzalloc_bytes(sizeof(struct core2_pmu_enable) +
-                               core2_get_pmc_count() - 1);
-    if ( !pmu_enable )
-        goto out1;
-
     core2_vpmu_cxt = xzalloc_bytes(sizeof(struct core2_vpmu_context) +
-                    (core2_get_pmc_count()-1)*sizeof(struct arch_msr_pair));
+                    (arch_pmc_cnt-1)*sizeof(struct arch_msr_pair));
     if ( !core2_vpmu_cxt )
-        goto out2;
-    core2_vpmu_cxt->pmu_enable = pmu_enable;
+        goto out_err;
+
     vpmu->context = (void *)core2_vpmu_cxt;
 
+    vpmu_set(vpmu, VPMU_CONTEXT_ALLOCATED);
+
     return 1;
- out2:
-    xfree(pmu_enable);
- out1:
-    gdprintk(XENLOG_WARNING, "Insufficient memory for PMU, PMU feature is "
-             "unavailable on domain %d vcpu %d.\n",
-             v->vcpu_id, v->domain->domain_id);
-    return 0;
-}
 
-static void core2_vpmu_save_msr_context(struct vcpu *v, int type,
-                                       int index, u64 msr_data)
-{
-    struct core2_vpmu_context *core2_vpmu_cxt = vcpu_vpmu(v)->context;
+out_err:
+    release_pmu_ownship(PMU_OWNER_HVM);
 
-    switch ( type )
-    {
-    case MSR_TYPE_CTRL:
-        core2_vpmu_cxt->ctrls[index] = msr_data;
-        break;
-    case MSR_TYPE_ARCH_CTRL:
-        core2_vpmu_cxt->arch_msr_pair[index].control = msr_data;
-        break;
-    }
+    printk("Failed to allocate VPMU resources for domain %u vcpu %u\n",
+           v->vcpu_id, v->domain->domain_id);
+
+    return 0;
 }
 
 static int core2_vpmu_msr_common_check(u32 msr_index, int *type, int *index)
@@ -436,10 +404,8 @@ static int core2_vpmu_msr_common_check(u32 msr_index, int *type, int *index)
         return 0;
 
     if ( unlikely(!vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED)) &&
-	 (vpmu->context != NULL ||
-	  !core2_vpmu_alloc_resource(current)) )
+         !core2_vpmu_alloc_resource(current) )
         return 0;
-    vpmu_set(vpmu, VPMU_CONTEXT_ALLOCATED);
 
     /* Do the lazy load staff. */
     if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) )
@@ -456,8 +422,7 @@ static int core2_vpmu_msr_common_check(u32 msr_index, int *type, int *index)
 static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content,
                                uint64_t supported)
 {
-    u64 global_ctrl, non_global_ctrl;
-    char pmu_enable = 0;
+    u64 global_ctrl;
     int i, tmp;
     int type = -1, index = -1;
     struct vcpu *v = current;
@@ -504,6 +469,7 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content,
         if ( msr_content & 1 )
             gdprintk(XENLOG_WARNING, "Guest is trying to enable PEBS, "
                      "which is not supported.\n");
+        core2_vpmu_cxt->pebs_enable = msr_content;
         return 1;
     case MSR_IA32_DS_AREA:
         if ( vpmu_is_set(vpmu, VPMU_CPU_HAS_DS) )
@@ -516,57 +482,48 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content,
                 hvm_inject_hw_exception(TRAP_gp_fault, 0);
                 return 1;
             }
-            core2_vpmu_cxt->pmu_enable->ds_area_enable = msr_content ? 1 : 0;
+            core2_vpmu_cxt->ds_area = msr_content;
             break;
         }
         gdprintk(XENLOG_WARNING, "Guest setting of DTS is ignored.\n");
         return 1;
     case MSR_CORE_PERF_GLOBAL_CTRL:
         global_ctrl = msr_content;
-        for ( i = 0; i < core2_get_pmc_count(); i++ )
-        {
-            rdmsrl(MSR_P6_EVNTSEL(i), non_global_ctrl);
-            core2_vpmu_cxt->pmu_enable->arch_pmc_enable[i] =
-                    global_ctrl & (non_global_ctrl >> 22) & 1;
-            global_ctrl >>= 1;
-        }
-
-        rdmsrl(MSR_CORE_PERF_FIXED_CTR_CTRL, non_global_ctrl);
-        global_ctrl = msr_content >> 32;
-        for ( i = 0; i < core2_fix_counters.num; i++ )
-        {
-            core2_vpmu_cxt->pmu_enable->fixed_ctr_enable[i] =
-                (global_ctrl & 1) & ((non_global_ctrl & 0x3)? 1: 0);
-            non_global_ctrl >>= FIXED_CTR_CTRL_BITS;
-            global_ctrl >>= 1;
-        }
         break;
     case MSR_CORE_PERF_FIXED_CTR_CTRL:
-        non_global_ctrl = msr_content;
         vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, &global_ctrl);
-        global_ctrl >>= 32;
-        for ( i = 0; i < core2_fix_counters.num; i++ )
+        core2_vpmu_cxt->enabled_cntrs &=
+                ~(((1ULL << VPMU_CORE2_MAX_FIXED_PMCS) - 1) << 32);
+        if ( msr_content != 0 )
         {
-            core2_vpmu_cxt->pmu_enable->fixed_ctr_enable[i] =
-                (global_ctrl & 1) & ((non_global_ctrl & 0x3)? 1: 0);
-            non_global_ctrl >>= 4;
-            global_ctrl >>= 1;
+            u64 val = msr_content;
+            for ( i = 0; i < fixed_pmc_cnt; i++ )
+            {
+                if ( val & 3 )
+                    core2_vpmu_cxt->enabled_cntrs |= (1ULL << 32) << i;
+                val >>= FIXED_CTR_CTRL_BITS;
+            }
         }
+
+        core2_vpmu_cxt->fixed_ctrl = msr_content;
         break;
     default:
         tmp = msr - MSR_P6_EVNTSEL(0);
-        vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, &global_ctrl);
-        if ( tmp >= 0 && tmp < core2_get_pmc_count() )
-            core2_vpmu_cxt->pmu_enable->arch_pmc_enable[tmp] =
-                (global_ctrl >> tmp) & (msr_content >> 22) & 1;
+        if ( tmp >= 0 && tmp < arch_pmc_cnt )
+        {
+            vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, &global_ctrl);
+
+            if ( msr_content & (1ULL << 22) )
+                core2_vpmu_cxt->enabled_cntrs |= 1ULL << tmp;
+            else
+                core2_vpmu_cxt->enabled_cntrs &= ~(1ULL << tmp);
+
+            core2_vpmu_cxt->arch_msr_pair[tmp].control = msr_content;
+        }
     }
 
-    for ( i = 0; i < core2_fix_counters.num; i++ )
-        pmu_enable |= core2_vpmu_cxt->pmu_enable->fixed_ctr_enable[i];
-    for ( i = 0; i < core2_get_pmc_count(); i++ )
-        pmu_enable |= core2_vpmu_cxt->pmu_enable->arch_pmc_enable[i];
-    pmu_enable |= core2_vpmu_cxt->pmu_enable->ds_area_enable;
-    if ( pmu_enable )
+    if ( (global_ctrl & core2_vpmu_cxt->enabled_cntrs) ||
+         (core2_vpmu_cxt->ds_area != 0)  )
         vpmu_set(vpmu, VPMU_RUNNING);
     else
         vpmu_reset(vpmu, VPMU_RUNNING);
@@ -584,7 +541,6 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content,
         vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR | APIC_LVT_MASKED;
     }
 
-    core2_vpmu_save_msr_context(v, type, index, msr_content);
     if ( type != MSR_TYPE_GLOBAL )
     {
         u64 mask;
@@ -600,7 +556,7 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content,
             if  ( msr == MSR_IA32_DS_AREA )
                 break;
             /* 4 bits per counter, currently 3 fixed counters implemented. */
-            mask = ~((1ull << (VPMU_CORE2_NUM_FIXED * FIXED_CTR_CTRL_BITS)) - 1);
+            mask = ~((1ull << (fixed_pmc_cnt * FIXED_CTR_CTRL_BITS)) - 1);
             if (msr_content & mask)
                 inject_gp = 1;
             break;
@@ -685,7 +641,7 @@ static void core2_vpmu_do_cpuid(unsigned int input,
 static void core2_vpmu_dump(const struct vcpu *v)
 {
     const struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    int i, num;
+    unsigned int i;
     const struct core2_vpmu_context *core2_vpmu_cxt = NULL;
     u64 val;
 
@@ -703,27 +659,25 @@ static void core2_vpmu_dump(const struct vcpu *v)
 
     printk("    vPMU running\n");
     core2_vpmu_cxt = vpmu->context;
-    num = core2_get_pmc_count();
+
     /* Print the contents of the counter and its configuration msr. */
-    for ( i = 0; i < num; i++ )
+    for ( i = 0; i < arch_pmc_cnt; i++ )
     {
         const struct arch_msr_pair *msr_pair = core2_vpmu_cxt->arch_msr_pair;
 
-        if ( core2_vpmu_cxt->pmu_enable->arch_pmc_enable[i] )
-            printk("      general_%d: 0x%016lx ctrl: 0x%016lx\n",
-                   i, msr_pair[i].counter, msr_pair[i].control);
+        printk("      general_%d: 0x%016lx ctrl: 0x%016lx\n",
+               i, msr_pair[i].counter, msr_pair[i].control);
     }
     /*
      * The configuration of the fixed counter is 4 bits each in the
      * MSR_CORE_PERF_FIXED_CTR_CTRL.
      */
-    val = core2_vpmu_cxt->ctrls[MSR_CORE_PERF_FIXED_CTR_CTRL_IDX];
-    for ( i = 0; i < core2_fix_counters.num; i++ )
+    val = core2_vpmu_cxt->fixed_ctrl;
+    for ( i = 0; i < fixed_pmc_cnt; i++ )
     {
-        if ( core2_vpmu_cxt->pmu_enable->fixed_ctr_enable[i] )
-            printk("      fixed_%d:   0x%016lx ctrl: %#lx\n",
-                   i, core2_vpmu_cxt->fix_counters[i],
-                   val & FIXED_CTR_CTRL_MASK);
+        printk("      fixed_%d:   0x%016lx ctrl: %#lx\n",
+               i, core2_vpmu_cxt->fix_counters[i],
+               val & FIXED_CTR_CTRL_MASK);
         val >>= FIXED_CTR_CTRL_BITS;
     }
 }
@@ -741,7 +695,7 @@ static int core2_vpmu_do_interrupt(struct cpu_user_regs *regs)
         if ( is_pmc_quirk )
             handle_pmc_quirk(msr_content);
         core2_vpmu_cxt->global_ovf_status |= msr_content;
-        msr_content = 0xC000000700000000 | ((1 << core2_get_pmc_count()) - 1);
+        msr_content = 0xC000000700000000 | ((1 << arch_pmc_cnt) - 1);
         wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, msr_content);
     }
     else
@@ -808,6 +762,16 @@ static int core2_vpmu_initialise(struct vcpu *v, unsigned int vpmu_flags)
     }
     ds_warned = 1;
  func_out:
+
+    arch_pmc_cnt = core2_get_arch_pmc_count();
+    fixed_pmc_cnt = core2_get_fixed_pmc_count();
+    if ( fixed_pmc_cnt > VPMU_CORE2_MAX_FIXED_PMCS )
+    {
+        fixed_pmc_cnt = VPMU_CORE2_MAX_FIXED_PMCS;
+        printk(XENLOG_G_WARNING "Limiting number of fixed counters to %d\n",
+               fixed_pmc_cnt);
+    }
+
     check_pmc_quirk();
     return 0;
 }
@@ -815,11 +779,10 @@ static int core2_vpmu_initialise(struct vcpu *v, unsigned int vpmu_flags)
 static void core2_vpmu_destroy(struct vcpu *v)
 {
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    struct core2_vpmu_context *core2_vpmu_cxt = vpmu->context;
 
     if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
         return;
-    xfree(core2_vpmu_cxt->pmu_enable);
+
     xfree(vpmu->context);
     if ( has_hvm_container_domain(v->domain) && cpu_has_vmx_msr_bitmap )
         core2_vpmu_unset_msr_bitmap(v->arch.hvm_vmx.msr_bitmap);
diff --git a/xen/include/asm-x86/hvm/vmx/vpmu_core2.h b/xen/include/asm-x86/hvm/vmx/vpmu_core2.h
index 60b05fd..410372d 100644
--- a/xen/include/asm-x86/hvm/vmx/vpmu_core2.h
+++ b/xen/include/asm-x86/hvm/vmx/vpmu_core2.h
@@ -23,29 +23,10 @@
 #ifndef __ASM_X86_HVM_VPMU_CORE_H_
 #define __ASM_X86_HVM_VPMU_CORE_H_
 
-/* Currently only 3 fixed counters are supported. */
-#define VPMU_CORE2_NUM_FIXED 3
-/* Currently only 3 Non-architectual Performance Control MSRs */
-#define VPMU_CORE2_NUM_CTRLS 3
-
 struct arch_msr_pair {
     u64 counter;
     u64 control;
 };
 
-struct core2_pmu_enable {
-    char ds_area_enable;
-    char fixed_ctr_enable[VPMU_CORE2_NUM_FIXED];
-    char arch_pmc_enable[1];
-};
-
-struct core2_vpmu_context {
-    struct core2_pmu_enable *pmu_enable;
-    u64 fix_counters[VPMU_CORE2_NUM_FIXED];
-    u64 ctrls[VPMU_CORE2_NUM_CTRLS];
-    u64 global_ovf_status;
-    struct arch_msr_pair arch_msr_pair[1];
-};
-
 #endif /* __ASM_X86_HVM_VPMU_CORE_H_ */
 
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 92+ messages in thread
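
The cleanup in the hunks above folds the old per-counter enable arrays into a
single 64-bit enabled_cntrs field that mirrors the MSR_CORE_PERF_GLOBAL_CTRL
layout. A minimal sketch of that encoding (illustrative only, not part of the
patch; names follow the code above):

    /* Arch counter i is tracked in bit i, fixed counter j in bit 32 + j. */
    static int vpmu_would_run(uint64_t global_ctrl, uint64_t enabled_cntrs,
                              uint64_t ds_area)
    {
        /* Running iff a counter is enabled both locally (EVNTSEL bit 22,
         * or a non-zero 2-bit field in FIXED_CTR_CTRL) and globally. */
        return (global_ctrl & enabled_cntrs) || (ds_area != 0);
    }

    /* e.g. a guest that enabled arch counter 0 and fixed counter 1: */
    /*     enabled_cntrs == (1ULL << 0) | ((1ULL << 32) << 1)        */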

* [PATCH v12 for-xen-4.5 06/20] vmx: Merge MSR management routines
  2014-09-25 19:28 [PATCH v12 for-xen-4.5 00/20] x86/PMU: Xen PMU PV(H) support Boris Ostrovsky
                   ` (4 preceding siblings ...)
  2014-09-25 19:28 ` [PATCH v12 for-xen-4.5 05/20] intel/VPMU: Clean up Intel VPMU code Boris Ostrovsky
@ 2014-09-25 19:28 ` Boris Ostrovsky
  2014-09-26 20:48   ` Tian, Kevin
  2014-09-25 19:28 ` [PATCH v12 for-xen-4.5 07/20] x86/VPMU: Handle APIC_LVTPC accesses Boris Ostrovsky
                   ` (15 subsequent siblings)
  21 siblings, 1 reply; 92+ messages in thread
From: Boris Ostrovsky @ 2014-09-25 19:28 UTC (permalink / raw)
  To: jbeulich, kevin.tian, suravee.suthikulpanit,
	Aravind.Gopalakrishnan, dietmar.hahn, dgdegra
  Cc: keir, andrew.cooper3, tim, xen-devel, jun.nakajima, boris.ostrovsky

vmx_add_host_load_msr() and vmx_add_guest_msr() share a fair amount of code.
Merge them to simplify code maintenance.
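
As a quick reference, a hedged sketch of a call through the merged path
(error values taken from the code below):

    /* Both old entry points become thin wrappers around vmx_add_msr(): */
    int rc = vmx_add_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL);
           /* == vmx_add_msr(MSR_CORE_PERF_GLOBAL_CTRL, VMX_GUEST_MSR) */
    if ( rc )
        return rc;  /* -ENOMEM if the MSR page allocation failed,
                       -ENOSPC if the page is already full */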

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 xen/arch/x86/hvm/vmx/vmcs.c        | 84 +++++++++++++++++++-------------------
 xen/include/asm-x86/hvm/vmx/vmcs.h | 16 +++++++-
 2 files changed, 55 insertions(+), 45 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index fc1f882..6649837 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -1201,64 +1201,62 @@ int vmx_write_guest_msr(u32 msr, u64 val)
     return -ESRCH;
 }
 
-int vmx_add_guest_msr(u32 msr)
+int vmx_add_msr(u32 msr, int type)
 {
     struct vcpu *curr = current;
-    unsigned int i, msr_count = curr->arch.hvm_vmx.msr_count;
-    struct vmx_msr_entry *msr_area = curr->arch.hvm_vmx.msr_area;
+    unsigned int idx, *msr_count;
+    struct vmx_msr_entry **msr_area, *msr_area_elem;
+
+    if ( type == VMX_GUEST_MSR )
+    {
+        msr_count = &curr->arch.hvm_vmx.msr_count;
+        msr_area = &curr->arch.hvm_vmx.msr_area;
+    }
+    else
+    {
+        ASSERT(type == VMX_HOST_MSR);
+        msr_count = &curr->arch.hvm_vmx.host_msr_count;
+        msr_area = &curr->arch.hvm_vmx.host_msr_area;
+    }
 
-    if ( msr_area == NULL )
+    if ( *msr_area == NULL )
     {
-        if ( (msr_area = alloc_xenheap_page()) == NULL )
+        if ( (*msr_area = alloc_xenheap_page()) == NULL )
             return -ENOMEM;
-        curr->arch.hvm_vmx.msr_area = msr_area;
-        __vmwrite(VM_EXIT_MSR_STORE_ADDR, virt_to_maddr(msr_area));
-        __vmwrite(VM_ENTRY_MSR_LOAD_ADDR, virt_to_maddr(msr_area));
+
+        if ( type == VMX_GUEST_MSR )
+        {
+            __vmwrite(VM_EXIT_MSR_STORE_ADDR, virt_to_maddr(*msr_area));
+            __vmwrite(VM_ENTRY_MSR_LOAD_ADDR, virt_to_maddr(*msr_area));
+        }
+        else
+            __vmwrite(VM_EXIT_MSR_LOAD_ADDR, virt_to_maddr(*msr_area));
     }
 
-    for ( i = 0; i < msr_count; i++ )
-        if ( msr_area[i].index == msr )
+    for ( idx = 0; idx < *msr_count; idx++ )
+        if ( (*msr_area)[idx].index == msr )
             return 0;
 
-    if ( msr_count == (PAGE_SIZE / sizeof(struct vmx_msr_entry)) )
+    if ( *msr_count == (PAGE_SIZE / sizeof(struct vmx_msr_entry)) )
         return -ENOSPC;
 
-    msr_area[msr_count].index = msr;
-    msr_area[msr_count].mbz   = 0;
-    msr_area[msr_count].data  = 0;
-    curr->arch.hvm_vmx.msr_count = ++msr_count;
-    __vmwrite(VM_EXIT_MSR_STORE_COUNT, msr_count);
-    __vmwrite(VM_ENTRY_MSR_LOAD_COUNT, msr_count);
+    msr_area_elem = *msr_area + *msr_count;
+    msr_area_elem->index = msr;
+    msr_area_elem->mbz = 0;
 
-    return 0;
-}
+    ++*msr_count;
 
-int vmx_add_host_load_msr(u32 msr)
-{
-    struct vcpu *curr = current;
-    unsigned int i, msr_count = curr->arch.hvm_vmx.host_msr_count;
-    struct vmx_msr_entry *msr_area = curr->arch.hvm_vmx.host_msr_area;
-
-    if ( msr_area == NULL )
+    if ( type == VMX_GUEST_MSR )
     {
-        if ( (msr_area = alloc_xenheap_page()) == NULL )
-            return -ENOMEM;
-        curr->arch.hvm_vmx.host_msr_area = msr_area;
-        __vmwrite(VM_EXIT_MSR_LOAD_ADDR, virt_to_maddr(msr_area));
+        msr_area_elem->data = 0;
+        __vmwrite(VM_EXIT_MSR_STORE_COUNT, *msr_count);
+        __vmwrite(VM_ENTRY_MSR_LOAD_COUNT, *msr_count);
+    }
+    else
+    {
+        rdmsrl(msr, msr_area_elem->data);
+        __vmwrite(VM_EXIT_MSR_LOAD_COUNT, *msr_count);
     }
-
-    for ( i = 0; i < msr_count; i++ )
-        if ( msr_area[i].index == msr )
-            return 0;
-
-    if ( msr_count == (PAGE_SIZE / sizeof(struct vmx_msr_entry)) )
-        return -ENOSPC;
-
-    msr_area[msr_count].index = msr;
-    msr_area[msr_count].mbz   = 0;
-    rdmsrl(msr, msr_area[msr_count].data);
-    curr->arch.hvm_vmx.host_msr_count = ++msr_count;
-    __vmwrite(VM_EXIT_MSR_LOAD_COUNT, msr_count);
 
     return 0;
 }
diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index 6a99dca..949884b 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -482,12 +482,15 @@ extern const unsigned int vmx_introspection_force_enabled_msrs_size;
 
 #define MSR_TYPE_R 1
 #define MSR_TYPE_W 2
+
+#define VMX_GUEST_MSR 0
+#define VMX_HOST_MSR  1
+
 void vmx_disable_intercept_for_msr(struct vcpu *v, u32 msr, int type);
 void vmx_enable_intercept_for_msr(struct vcpu *v, u32 msr, int type);
 int vmx_read_guest_msr(u32 msr, u64 *val);
 int vmx_write_guest_msr(u32 msr, u64 val);
-int vmx_add_guest_msr(u32 msr);
-int vmx_add_host_load_msr(u32 msr);
+int vmx_add_msr(u32 msr, int type);
 void vmx_vmcs_switch(struct vmcs_struct *from, struct vmcs_struct *to);
 void vmx_set_eoi_exit_bitmap(struct vcpu *v, u8 vector);
 void vmx_clear_eoi_exit_bitmap(struct vcpu *v, u8 vector);
@@ -497,6 +500,15 @@ void virtual_vmcs_exit(void *vvmcs);
 u64 virtual_vmcs_vmread(void *vvmcs, u32 vmcs_encoding);
 void virtual_vmcs_vmwrite(void *vvmcs, u32 vmcs_encoding, u64 val);
 
+static inline int vmx_add_guest_msr(u32 msr)
+{
+    return vmx_add_msr(msr, VMX_GUEST_MSR);
+}
+static inline int vmx_add_host_load_msr(u32 msr)
+{
+    return vmx_add_msr(msr, VMX_HOST_MSR);
+}
+
 DECLARE_PER_CPU(bool_t, vmxon);
 
 #endif /* ASM_X86_HVM_VMX_VMCS_H__ */
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 92+ messages in thread

* [PATCH v12 for-xen-4.5 07/20] x86/VPMU: Handle APIC_LVTPC accesses
  2014-09-25 19:28 [PATCH v12 for-xen-4.5 00/20] x86/PMU: Xen PMU PV(H) support Boris Ostrovsky
                   ` (5 preceding siblings ...)
  2014-09-25 19:28 ` [PATCH v12 for-xen-4.5 06/20] vmx: Merge MSR management routines Boris Ostrovsky
@ 2014-09-25 19:28 ` Boris Ostrovsky
  2014-09-25 19:28 ` [PATCH v12 for-xen-4.5 08/20] intel/VPMU: MSR_CORE_PERF_GLOBAL_CTRL should be initialized to zero Boris Ostrovsky
                   ` (14 subsequent siblings)
  21 siblings, 0 replies; 92+ messages in thread
From: Boris Ostrovsky @ 2014-09-25 19:28 UTC (permalink / raw)
  To: jbeulich, kevin.tian, suravee.suthikulpanit,
	Aravind.Gopalakrishnan, dietmar.hahn, dgdegra
  Cc: keir, andrew.cooper3, tim, xen-devel, jun.nakajima, boris.ostrovsky

Update the APIC_LVTPC vector when an HVM guest writes to it.
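
The key lines (from vpmu_lvtpc_update() added below): only the guest's mask
bit is honoured, while the vector itself remains Xen's PMU vector:

    /* Guest controls only APIC_LVT_MASKED; the vector stays PMU_APIC_VECTOR. */
    vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR | (val & APIC_LVT_MASKED);
    apic_write(APIC_LVTPC, vpmu->hw_lapic_lvtpc);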

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
---
 xen/arch/x86/hvm/svm/vpmu.c       |  4 ----
 xen/arch/x86/hvm/vlapic.c         |  3 +++
 xen/arch/x86/hvm/vmx/vpmu_core2.c | 17 -----------------
 xen/arch/x86/hvm/vpmu.c           |  8 ++++++++
 xen/include/asm-x86/hvm/vpmu.h    |  1 +
 5 files changed, 12 insertions(+), 21 deletions(-)

diff --git a/xen/arch/x86/hvm/svm/vpmu.c b/xen/arch/x86/hvm/svm/vpmu.c
index c7e0946..11e9484 100644
--- a/xen/arch/x86/hvm/svm/vpmu.c
+++ b/xen/arch/x86/hvm/svm/vpmu.c
@@ -302,8 +302,6 @@ static int amd_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content,
         if ( !acquire_pmu_ownership(PMU_OWNER_HVM) )
             return 1;
         vpmu_set(vpmu, VPMU_RUNNING);
-        apic_write(APIC_LVTPC, PMU_APIC_VECTOR);
-        vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR;
 
         if ( has_hvm_container_domain(v->domain) &&
              !((struct amd_vpmu_context *)vpmu->context)->msr_bitmap_set )
@@ -314,8 +312,6 @@ static int amd_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content,
     if ( (get_pmu_reg_type(msr) == MSR_TYPE_CTRL) &&
         (is_pmu_enabled(msr_content) == 0) && vpmu_is_set(vpmu, VPMU_RUNNING) )
     {
-        apic_write(APIC_LVTPC, PMU_APIC_VECTOR | APIC_LVT_MASKED);
-        vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR | APIC_LVT_MASKED;
         vpmu_reset(vpmu, VPMU_RUNNING);
         if ( has_hvm_container_domain(v->domain) &&
              ((struct amd_vpmu_context *)vpmu->context)->msr_bitmap_set )
diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
index 47c4eaa..f8cdc9b 100644
--- a/xen/arch/x86/hvm/vlapic.c
+++ b/xen/arch/x86/hvm/vlapic.c
@@ -38,6 +38,7 @@
 #include <asm/hvm/support.h>
 #include <asm/hvm/vmx/vmx.h>
 #include <asm/hvm/nestedhvm.h>
+#include <asm/hvm/vpmu.h>
 #include <public/hvm/ioreq.h>
 #include <public/hvm/params.h>
 
@@ -762,6 +763,8 @@ static int vlapic_reg_write(struct vcpu *v,
         }
         if ( (offset == APIC_LVTT) && !(val & APIC_LVT_MASKED) )
             pt_may_unmask_irq(NULL, &vlapic->pt);
+        if ( offset == APIC_LVTPC )
+            vpmu_lvtpc_update(val);
         break;
 
     case APIC_TMICT:
diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c
index dd0d5e9..a3a2905 100644
--- a/xen/arch/x86/hvm/vmx/vpmu_core2.c
+++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c
@@ -528,19 +528,6 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content,
     else
         vpmu_reset(vpmu, VPMU_RUNNING);
 
-    /* Setup LVTPC in local apic */
-    if ( vpmu_is_set(vpmu, VPMU_RUNNING) &&
-         is_vlapic_lvtpc_enabled(vcpu_vlapic(v)) )
-    {
-        apic_write_around(APIC_LVTPC, PMU_APIC_VECTOR);
-        vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR;
-    }
-    else
-    {
-        apic_write_around(APIC_LVTPC, PMU_APIC_VECTOR | APIC_LVT_MASKED);
-        vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR | APIC_LVT_MASKED;
-    }
-
     if ( type != MSR_TYPE_GLOBAL )
     {
         u64 mask;
@@ -706,10 +693,6 @@ static int core2_vpmu_do_interrupt(struct cpu_user_regs *regs)
             return 0;
     }
 
-    /* HW sets the MASK bit when performance counter interrupt occurs*/
-    vpmu->hw_lapic_lvtpc = apic_read(APIC_LVTPC) & ~APIC_LVT_MASKED;
-    apic_write_around(APIC_LVTPC, vpmu->hw_lapic_lvtpc);
-
     return 1;
 }
 
diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c
index 7929290..0210284 100644
--- a/xen/arch/x86/hvm/vpmu.c
+++ b/xen/arch/x86/hvm/vpmu.c
@@ -64,6 +64,14 @@ static void __init parse_vpmu_param(char *s)
     }
 }
 
+void vpmu_lvtpc_update(uint32_t val)
+{
+    struct vpmu_struct *vpmu = vcpu_vpmu(current);
+
+    vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR | (val & APIC_LVT_MASKED);
+    apic_write(APIC_LVTPC, vpmu->hw_lapic_lvtpc);
+}
+
 int vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content, uint64_t supported)
 {
     struct vpmu_struct *vpmu = vcpu_vpmu(current);
diff --git a/xen/include/asm-x86/hvm/vpmu.h b/xen/include/asm-x86/hvm/vpmu.h
index 40a6e57..761c556 100644
--- a/xen/include/asm-x86/hvm/vpmu.h
+++ b/xen/include/asm-x86/hvm/vpmu.h
@@ -104,6 +104,7 @@ static inline bool_t vpmu_are_all_set(const struct vpmu_struct *vpmu,
     return !!((vpmu->flags & mask) == mask);
 }
 
+void vpmu_lvtpc_update(uint32_t val);
 int vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content, uint64_t supported);
 int vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content);
 int vpmu_do_interrupt(struct cpu_user_regs *regs);
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 92+ messages in thread

* [PATCH v12 for-xen-4.5 08/20] intel/VPMU: MSR_CORE_PERF_GLOBAL_CTRL should be initialized to zero
  2014-09-25 19:28 [PATCH v12 for-xen-4.5 00/20] x86/PMU: Xen PMU PV(H) support Boris Ostrovsky
                   ` (6 preceding siblings ...)
  2014-09-25 19:28 ` [PATCH v12 for-xen-4.5 07/20] x86/VPMU: Handle APIC_LVTPC accesses Boris Ostrovsky
@ 2014-09-25 19:28 ` Boris Ostrovsky
  2014-09-25 19:28 ` [PATCH v12 for-xen-4.5 09/20] x86/VPMU: Add public xenpmu.h Boris Ostrovsky
                   ` (13 subsequent siblings)
  21 siblings, 0 replies; 92+ messages in thread
From: Boris Ostrovsky @ 2014-09-25 19:28 UTC (permalink / raw)
  To: jbeulich, kevin.tian, suravee.suthikulpanit,
	Aravind.Gopalakrishnan, dietmar.hahn, dgdegra
  Cc: keir, andrew.cooper3, tim, xen-devel, jun.nakajima, boris.ostrovsky

The MSR_CORE_PERF_GLOBAL_CTRL register should be set to zero initially. It is
up to the guest to set it so that counters are enabled.
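
Guest-side view, as an illustrative sketch (bit layout per the removed
core2_calc_intial_glb_ctrl_msr() helper: low bits enable the general-purpose
counters, bits 32 and up the fixed ones):

    /* Hypothetical guest code: enable arch PMC0 and fixed counter 0.
     * Until such a write, all counters now stay disabled. */
    wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, (1ULL << 32) | (1ULL << 0));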

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
---
 xen/arch/x86/hvm/vmx/vpmu_core2.c | 11 +----------
 1 file changed, 1 insertion(+), 10 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c
index a3a2905..79a82a3 100644
--- a/xen/arch/x86/hvm/vmx/vpmu_core2.c
+++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c
@@ -165,14 +165,6 @@ static int core2_get_fixed_pmc_count(void)
     return MASK_EXTR(eax, PMU_FIXED_NR_MASK);
 }
 
-static u64 core2_calc_intial_glb_ctrl_msr(void)
-{
-    int arch_pmc_bits = (1 << arch_pmc_cnt) - 1;
-    u64 fix_pmc_bits  = (1 << fixed_pmc_cnt) - 1;
-
-    return (fix_pmc_bits << 32) | arch_pmc_bits;
-}
-
 /* edx bits 5-12: Bit width of fixed-function performance counters  */
 static int core2_get_bitwidth_fix_count(void)
 {
@@ -373,8 +365,7 @@ static int core2_vpmu_alloc_resource(struct vcpu *v)
 
     if ( vmx_add_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL) )
         goto out_err;
-    vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL,
-                 core2_calc_intial_glb_ctrl_msr());
+    vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, 0);
 
     core2_vpmu_cxt = xzalloc_bytes(sizeof(struct core2_vpmu_context) +
                     (arch_pmc_cnt-1)*sizeof(struct arch_msr_pair));
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 92+ messages in thread

* [PATCH v12 for-xen-4.5 09/20] x86/VPMU: Add public xenpmu.h
  2014-09-25 19:28 [PATCH v12 for-xen-4.5 00/20] x86/PMU: Xen PMU PV(H) support Boris Ostrovsky
                   ` (7 preceding siblings ...)
  2014-09-25 19:28 ` [PATCH v12 for-xen-4.5 08/20] intel/VPMU: MSR_CORE_PERF_GLOBAL_CTRL should be initialized to zero Boris Ostrovsky
@ 2014-09-25 19:28 ` Boris Ostrovsky
  2014-09-26 20:49   ` Tian, Kevin
  2014-09-29 14:17   ` Jan Beulich
  2014-09-25 19:28 ` [PATCH v12 for-xen-4.5 10/20] x86/VPMU: Make vpmu not HVM-specific Boris Ostrovsky
                   ` (12 subsequent siblings)
  21 siblings, 2 replies; 92+ messages in thread
From: Boris Ostrovsky @ 2014-09-25 19:28 UTC (permalink / raw)
  To: jbeulich, kevin.tian, suravee.suthikulpanit,
	Aravind.Gopalakrishnan, dietmar.hahn, dgdegra
  Cc: keir, andrew.cooper3, tim, xen-devel, jun.nakajima, boris.ostrovsky

Add pmu.h header files and move to them various macros and structures that
will be shared between the hypervisor and PV guests.

Move MSR banks out of architectural PMU structures to allow for larger sizes
in the future. The banks are allocated immediately after the context, and the
PMU structures store offsets to them.
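
A sketch of the resulting layout and offset arithmetic (AMD case; the
vpmu_reg_pointer() macro in the hunks below performs the same computation):

    /*
     * After allocation (illustrative):
     *
     *   +--------------------------+ <- vpmu->context
     *   | struct xen_pmu_amd_ctxt  |    .counters = sizeof(*ctxt)
     *   +--------------------------+    .ctrls    = .counters +
     *   | uint64_t counters[N]     |                8 * AMD_MAX_COUNTERS
     *   +--------------------------+
     *   | uint64_t ctrls[N]        |
     *   +--------------------------+
     *
     * Banks are reached via the byte offsets stored in the context:
     */
    uint64_t *counter_regs = (void *)((uintptr_t)ctxt + ctxt->counters);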

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
 xen/arch/x86/hvm/svm/vpmu.c              |  84 ++++++++++++----------
 xen/arch/x86/hvm/vmx/vpmu_core2.c        | 118 +++++++++++++++++--------------
 xen/arch/x86/hvm/vpmu.c                  |   6 ++
 xen/arch/x86/oprofile/op_model_ppro.c    |   6 +-
 xen/include/asm-x86/hvm/vmx/vpmu_core2.h |  32 ---------
 xen/include/asm-x86/hvm/vpmu.h           |  16 ++---
 xen/include/public/arch-arm.h            |   3 +
 xen/include/public/arch-x86/pmu.h        |  77 ++++++++++++++++++++
 xen/include/public/arch-x86/xen-x86_32.h |   8 +++
 xen/include/public/arch-x86/xen-x86_64.h |   8 +++
 xen/include/public/pmu.h                 |  38 ++++++++++
 11 files changed, 263 insertions(+), 133 deletions(-)
 delete mode 100644 xen/include/asm-x86/hvm/vmx/vpmu_core2.h
 create mode 100644 xen/include/public/arch-x86/pmu.h
 create mode 100644 xen/include/public/pmu.h

diff --git a/xen/arch/x86/hvm/svm/vpmu.c b/xen/arch/x86/hvm/svm/vpmu.c
index 11e9484..124b147 100644
--- a/xen/arch/x86/hvm/svm/vpmu.c
+++ b/xen/arch/x86/hvm/svm/vpmu.c
@@ -30,10 +30,7 @@
 #include <asm/apic.h>
 #include <asm/hvm/vlapic.h>
 #include <asm/hvm/vpmu.h>
-
-#define F10H_NUM_COUNTERS 4
-#define F15H_NUM_COUNTERS 6
-#define MAX_NUM_COUNTERS F15H_NUM_COUNTERS
+#include <public/pmu.h>
 
 #define MSR_F10H_EVNTSEL_GO_SHIFT   40
 #define MSR_F10H_EVNTSEL_EN_SHIFT   22
@@ -49,6 +46,10 @@ static const u32 __read_mostly *counters;
 static const u32 __read_mostly *ctrls;
 static bool_t __read_mostly k7_counters_mirrored;
 
+#define F10H_NUM_COUNTERS   4
+#define F15H_NUM_COUNTERS   6
+#define AMD_MAX_COUNTERS    6
+
 /* PMU Counter MSRs. */
 static const u32 AMD_F10H_COUNTERS[] = {
     MSR_K7_PERFCTR0,
@@ -83,12 +84,14 @@ static const u32 AMD_F15H_CTRLS[] = {
     MSR_AMD_FAM15H_EVNTSEL5
 };
 
-/* storage for context switching */
-struct amd_vpmu_context {
-    u64 counters[MAX_NUM_COUNTERS];
-    u64 ctrls[MAX_NUM_COUNTERS];
-    bool_t msr_bitmap_set;
-};
+/* Use private context as a flag for MSR bitmap */
+#define msr_bitmap_on(vpmu)    do {                                    \
+                                   (vpmu)->priv_context = (void *)-1L; \
+                               } while (0)
+#define msr_bitmap_off(vpmu)   do {                                    \
+                                   (vpmu)->priv_context = NULL;        \
+                               } while (0)
+#define is_msr_bitmap_on(vpmu) ((vpmu)->priv_context != NULL)
 
 static inline int get_pmu_reg_type(u32 addr)
 {
@@ -142,7 +145,6 @@ static void amd_vpmu_set_msr_bitmap(struct vcpu *v)
 {
     unsigned int i;
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    struct amd_vpmu_context *ctxt = vpmu->context;
 
     for ( i = 0; i < num_counters; i++ )
     {
@@ -150,14 +152,13 @@ static void amd_vpmu_set_msr_bitmap(struct vcpu *v)
         svm_intercept_msr(v, ctrls[i], MSR_INTERCEPT_WRITE);
     }
 
-    ctxt->msr_bitmap_set = 1;
+    msr_bitmap_on(vpmu);
 }
 
 static void amd_vpmu_unset_msr_bitmap(struct vcpu *v)
 {
     unsigned int i;
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    struct amd_vpmu_context *ctxt = vpmu->context;
 
     for ( i = 0; i < num_counters; i++ )
     {
@@ -165,7 +166,7 @@ static void amd_vpmu_unset_msr_bitmap(struct vcpu *v)
         svm_intercept_msr(v, ctrls[i], MSR_INTERCEPT_RW);
     }
 
-    ctxt->msr_bitmap_set = 0;
+    msr_bitmap_off(vpmu);
 }
 
 static int amd_vpmu_do_interrupt(struct cpu_user_regs *regs)
@@ -177,19 +178,22 @@ static inline void context_load(struct vcpu *v)
 {
     unsigned int i;
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    struct amd_vpmu_context *ctxt = vpmu->context;
+    struct xen_pmu_amd_ctxt *ctxt = vpmu->context;
+    uint64_t *counter_regs = vpmu_reg_pointer(ctxt, counters);
+    uint64_t *ctrl_regs = vpmu_reg_pointer(ctxt, ctrls);
 
     for ( i = 0; i < num_counters; i++ )
     {
-        wrmsrl(counters[i], ctxt->counters[i]);
-        wrmsrl(ctrls[i], ctxt->ctrls[i]);
+        wrmsrl(counters[i], counter_regs[i]);
+        wrmsrl(ctrls[i], ctrl_regs[i]);
     }
 }
 
 static void amd_vpmu_load(struct vcpu *v)
 {
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    struct amd_vpmu_context *ctxt = vpmu->context;
+    struct xen_pmu_amd_ctxt *ctxt = vpmu->context;
+    uint64_t *ctrl_regs = vpmu_reg_pointer(ctxt, ctrls);
 
     vpmu_reset(vpmu, VPMU_FROZEN);
 
@@ -198,7 +202,7 @@ static void amd_vpmu_load(struct vcpu *v)
         unsigned int i;
 
         for ( i = 0; i < num_counters; i++ )
-            wrmsrl(ctrls[i], ctxt->ctrls[i]);
+            wrmsrl(ctrls[i], ctrl_regs[i]);
 
         return;
     }
@@ -212,17 +216,17 @@ static inline void context_save(struct vcpu *v)
 {
     unsigned int i;
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    struct amd_vpmu_context *ctxt = vpmu->context;
+    struct xen_pmu_amd_ctxt *ctxt = vpmu->context;
+    uint64_t *counter_regs = vpmu_reg_pointer(ctxt, counters);
 
     /* No need to save controls -- they are saved in amd_vpmu_do_wrmsr */
     for ( i = 0; i < num_counters; i++ )
-        rdmsrl(counters[i], ctxt->counters[i]);
+        rdmsrl(counters[i], counter_regs[i]);
 }
 
 static int amd_vpmu_save(struct vcpu *v)
 {
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    struct amd_vpmu_context *ctx = vpmu->context;
     unsigned int i;
 
     /*
@@ -245,7 +249,7 @@ static int amd_vpmu_save(struct vcpu *v)
     context_save(v);
 
     if ( !vpmu_is_set(vpmu, VPMU_RUNNING) &&
-         has_hvm_container_domain(v->domain) && ctx->msr_bitmap_set )
+         has_hvm_container_domain(v->domain) && is_msr_bitmap_on(vpmu) )
         amd_vpmu_unset_msr_bitmap(v);
 
     return 1;
@@ -256,7 +260,9 @@ static void context_update(unsigned int msr, u64 msr_content)
     unsigned int i;
     struct vcpu *v = current;
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    struct amd_vpmu_context *ctxt = vpmu->context;
+    struct xen_pmu_amd_ctxt *ctxt = vpmu->context;
+    uint64_t *counter_regs = vpmu_reg_pointer(ctxt, counters);
+    uint64_t *ctrl_regs = vpmu_reg_pointer(ctxt, ctrls);
 
     if ( k7_counters_mirrored &&
         ((msr >= MSR_K7_EVNTSEL0) && (msr <= MSR_K7_PERFCTR3)) )
@@ -268,12 +274,12 @@ static void context_update(unsigned int msr, u64 msr_content)
     {
        if ( msr == ctrls[i] )
        {
-           ctxt->ctrls[i] = msr_content;
+           ctrl_regs[i] = msr_content;
            return;
        }
         else if (msr == counters[i] )
         {
-            ctxt->counters[i] = msr_content;
+            counter_regs[i] = msr_content;
             return;
         }
     }
@@ -303,8 +309,7 @@ static int amd_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content,
             return 1;
         vpmu_set(vpmu, VPMU_RUNNING);
 
-        if ( has_hvm_container_domain(v->domain) &&
-             !((struct amd_vpmu_context *)vpmu->context)->msr_bitmap_set )
+        if ( has_hvm_container_domain(v->domain) && is_msr_bitmap_on(vpmu) )
              amd_vpmu_set_msr_bitmap(v);
     }
 
@@ -313,8 +318,7 @@ static int amd_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content,
         (is_pmu_enabled(msr_content) == 0) && vpmu_is_set(vpmu, VPMU_RUNNING) )
     {
         vpmu_reset(vpmu, VPMU_RUNNING);
-        if ( has_hvm_container_domain(v->domain) &&
-             ((struct amd_vpmu_context *)vpmu->context)->msr_bitmap_set )
+        if ( has_hvm_container_domain(v->domain) && is_msr_bitmap_on(vpmu) )
              amd_vpmu_unset_msr_bitmap(v);
         release_pmu_ownship(PMU_OWNER_HVM);
     }
@@ -355,7 +359,7 @@ static int amd_vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
 
 static int amd_vpmu_initialise(struct vcpu *v)
 {
-    struct amd_vpmu_context *ctxt;
+    struct xen_pmu_amd_ctxt *ctxt;
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
     uint8_t family = current_cpu_data.x86;
 
@@ -385,7 +389,8 @@ static int amd_vpmu_initialise(struct vcpu *v)
 	 }
     }
 
-    ctxt = xzalloc(struct amd_vpmu_context);
+    ctxt = xzalloc_bytes(sizeof(struct xen_pmu_amd_ctxt) +
+                         2 * sizeof(uint64_t) * AMD_MAX_COUNTERS);
     if ( !ctxt )
     {
         gdprintk(XENLOG_WARNING, "Insufficient memory for PMU, "
@@ -394,7 +399,11 @@ static int amd_vpmu_initialise(struct vcpu *v)
         return -ENOMEM;
     }
 
+    ctxt->counters = sizeof(struct xen_pmu_amd_ctxt);
+    ctxt->ctrls = ctxt->counters + sizeof(uint64_t) * AMD_MAX_COUNTERS;
+
     vpmu->context = ctxt;
+    vpmu->priv_context = NULL;
     vpmu_set(vpmu, VPMU_CONTEXT_ALLOCATED);
     return 0;
 }
@@ -406,8 +415,7 @@ static void amd_vpmu_destroy(struct vcpu *v)
     if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
         return;
 
-    if ( has_hvm_container_domain(v->domain) &&
-         ((struct amd_vpmu_context *)vpmu->context)->msr_bitmap_set )
+    if ( has_hvm_container_domain(v->domain) && is_msr_bitmap_on(vpmu) )
         amd_vpmu_unset_msr_bitmap(v);
 
     xfree(vpmu->context);
@@ -424,7 +432,9 @@ static void amd_vpmu_destroy(struct vcpu *v)
 static void amd_vpmu_dump(const struct vcpu *v)
 {
     const struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    const struct amd_vpmu_context *ctxt = vpmu->context;
+    const struct xen_pmu_amd_ctxt *ctxt = vpmu->context;
+    const uint64_t *counter_regs = vpmu_reg_pointer(ctxt, counters);
+    const uint64_t *ctrl_regs = vpmu_reg_pointer(ctxt, ctrls);
     unsigned int i;
 
     printk("    VPMU state: 0x%x ", vpmu->flags);
@@ -454,8 +464,8 @@ static void amd_vpmu_dump(const struct vcpu *v)
         rdmsrl(ctrls[i], ctrl);
         rdmsrl(counters[i], cntr);
         printk("      %#x: %#lx (%#lx in HW)    %#x: %#lx (%#lx in HW)\n",
-               ctrls[i], ctxt->ctrls[i], ctrl,
-               counters[i], ctxt->counters[i], cntr);
+               ctrls[i], ctrl_regs[i], ctrl,
+               counters[i], counter_regs[i], cntr);
     }
 }
 
diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c
index 79a82a3..beff5c3 100644
--- a/xen/arch/x86/hvm/vmx/vpmu_core2.c
+++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c
@@ -35,8 +35,8 @@
 #include <asm/hvm/vmx/vmcs.h>
 #include <public/sched.h>
 #include <public/hvm/save.h>
+#include <public/pmu.h>
 #include <asm/hvm/vpmu.h>
-#include <asm/hvm/vmx/vpmu_core2.h>
 
 /*
  * See Intel SDM Vol 2a Instruction Set Reference chapter 3 for CPUID
@@ -68,6 +68,10 @@
 #define MSR_PMC_ALIAS_MASK       (~(MSR_IA32_PERFCTR0 ^ MSR_IA32_A_PERFCTR0))
 static bool_t __read_mostly full_width_write;
 
+/* Intel-specific VPMU features */
+#define VPMU_CPU_HAS_DS                     0x100 /* Has Debug Store */
+#define VPMU_CPU_HAS_BTS                    0x200 /* Has Branch Trace Store */
+
 /*
  * MSR_CORE_PERF_FIXED_CTR_CTRL contains the configuration of all fixed
  * counters. 4 bits for every counter.
@@ -75,17 +79,6 @@ static bool_t __read_mostly full_width_write;
 #define FIXED_CTR_CTRL_BITS 4
 #define FIXED_CTR_CTRL_MASK ((1 << FIXED_CTR_CTRL_BITS) - 1)
 
-#define VPMU_CORE2_MAX_FIXED_PMCS     4
-struct core2_vpmu_context {
-    u64 fixed_ctrl;
-    u64 ds_area;
-    u64 pebs_enable;
-    u64 global_ovf_status;
-    u64 enabled_cntrs;  /* Follows PERF_GLOBAL_CTRL MSR format */
-    u64 fix_counters[VPMU_CORE2_MAX_FIXED_PMCS];
-    struct arch_msr_pair arch_msr_pair[1];
-};
-
 /* Number of general-purpose and fixed performance counters */
 static unsigned int __read_mostly arch_pmc_cnt, fixed_pmc_cnt;
 
@@ -222,6 +215,7 @@ static int is_core2_vpmu_msr(u32 msr_index, int *type, int *index)
     }
 }
 
+#define msraddr_to_bitpos(x) (((x)&0xffff) + ((x)>>31)*0x2000)
 static void core2_vpmu_set_msr_bitmap(unsigned long *msr_bitmap)
 {
     int i;
@@ -291,12 +285,15 @@ static void core2_vpmu_unset_msr_bitmap(unsigned long *msr_bitmap)
 static inline void __core2_vpmu_save(struct vcpu *v)
 {
     int i;
-    struct core2_vpmu_context *core2_vpmu_cxt = vcpu_vpmu(v)->context;
+    struct xen_pmu_intel_ctxt *core2_vpmu_cxt = vcpu_vpmu(v)->context;
+    uint64_t *fixed_counters = vpmu_reg_pointer(core2_vpmu_cxt, fixed_counters);
+    struct xen_pmu_cntr_pair *xen_pmu_cntr_pair =
+        vpmu_reg_pointer(core2_vpmu_cxt, arch_counters);
 
     for ( i = 0; i < fixed_pmc_cnt; i++ )
-        rdmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, core2_vpmu_cxt->fix_counters[i]);
+        rdmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, fixed_counters[i]);
     for ( i = 0; i < arch_pmc_cnt; i++ )
-        rdmsrl(MSR_IA32_PERFCTR0 + i, core2_vpmu_cxt->arch_msr_pair[i].counter);
+        rdmsrl(MSR_IA32_PERFCTR0 + i, xen_pmu_cntr_pair[i].counter);
 }
 
 static int core2_vpmu_save(struct vcpu *v)
@@ -319,10 +316,13 @@ static int core2_vpmu_save(struct vcpu *v)
 static inline void __core2_vpmu_load(struct vcpu *v)
 {
     unsigned int i, pmc_start;
-    struct core2_vpmu_context *core2_vpmu_cxt = vcpu_vpmu(v)->context;
+    struct xen_pmu_intel_ctxt *core2_vpmu_cxt = vcpu_vpmu(v)->context;
+    uint64_t *fixed_counters = vpmu_reg_pointer(core2_vpmu_cxt, fixed_counters);
+    struct xen_pmu_cntr_pair *xen_pmu_cntr_pair =
+        vpmu_reg_pointer(core2_vpmu_cxt, arch_counters);
 
     for ( i = 0; i < fixed_pmc_cnt; i++ )
-        wrmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, core2_vpmu_cxt->fix_counters[i]);
+        wrmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, fixed_counters[i]);
 
     if ( full_width_write )
         pmc_start = MSR_IA32_A_PERFCTR0;
@@ -330,8 +330,8 @@ static inline void __core2_vpmu_load(struct vcpu *v)
         pmc_start = MSR_IA32_PERFCTR0;
     for ( i = 0; i < arch_pmc_cnt; i++ )
     {
-        wrmsrl(pmc_start + i, core2_vpmu_cxt->arch_msr_pair[i].counter);
-        wrmsrl(MSR_P6_EVNTSEL(i), core2_vpmu_cxt->arch_msr_pair[i].control);
+        wrmsrl(pmc_start + i, xen_pmu_cntr_pair[i].counter);
+        wrmsrl(MSR_P6_EVNTSEL(i), xen_pmu_cntr_pair[i].control);
     }
 
     wrmsrl(MSR_CORE_PERF_FIXED_CTR_CTRL, core2_vpmu_cxt->fixed_ctrl);
@@ -354,7 +354,8 @@ static void core2_vpmu_load(struct vcpu *v)
 static int core2_vpmu_alloc_resource(struct vcpu *v)
 {
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    struct core2_vpmu_context *core2_vpmu_cxt;
+    struct xen_pmu_intel_ctxt *core2_vpmu_cxt = NULL;
+    uint64_t *p = NULL;
 
     if ( !acquire_pmu_ownership(PMU_OWNER_HVM) )
         return 0;
@@ -367,12 +368,20 @@ static int core2_vpmu_alloc_resource(struct vcpu *v)
         goto out_err;
     vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, 0);
 
-    core2_vpmu_cxt = xzalloc_bytes(sizeof(struct core2_vpmu_context) +
-                    (arch_pmc_cnt-1)*sizeof(struct arch_msr_pair));
-    if ( !core2_vpmu_cxt )
+    core2_vpmu_cxt = xzalloc_bytes(sizeof(struct xen_pmu_intel_ctxt) +
+                                   sizeof(uint64_t) * fixed_pmc_cnt +
+                                   sizeof(struct xen_pmu_cntr_pair) *
+                                   arch_pmc_cnt);
+    p = xzalloc(uint64_t);
+    if ( !core2_vpmu_cxt || !p )
         goto out_err;
 
-    vpmu->context = (void *)core2_vpmu_cxt;
+    core2_vpmu_cxt->fixed_counters = sizeof(struct xen_pmu_intel_ctxt);
+    core2_vpmu_cxt->arch_counters = core2_vpmu_cxt->fixed_counters +
+                                    sizeof(uint64_t) * fixed_pmc_cnt;
+
+    vpmu->context = core2_vpmu_cxt;
+    vpmu->priv_context = p;
 
     vpmu_set(vpmu, VPMU_CONTEXT_ALLOCATED);
 
@@ -381,6 +390,9 @@ static int core2_vpmu_alloc_resource(struct vcpu *v)
 out_err:
     release_pmu_ownship(PMU_OWNER_HVM);
 
+    xfree(core2_vpmu_cxt);
+    xfree(p);
+
     printk("Failed to allocate VPMU resources for domain %u vcpu %u\n",
            v->vcpu_id, v->domain->domain_id);
 
@@ -418,7 +430,8 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content,
     int type = -1, index = -1;
     struct vcpu *v = current;
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    struct core2_vpmu_context *core2_vpmu_cxt = NULL;
+    struct xen_pmu_intel_ctxt *core2_vpmu_cxt = NULL;
+    uint64_t *enabled_cntrs;
 
     if ( !core2_vpmu_msr_common_check(msr, &type, &index) )
     {
@@ -446,10 +459,11 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content,
     ASSERT(!supported);
 
     core2_vpmu_cxt = vpmu->context;
+    enabled_cntrs = vpmu->priv_context;
     switch ( msr )
     {
     case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
-        core2_vpmu_cxt->global_ovf_status &= ~msr_content;
+        core2_vpmu_cxt->global_status &= ~msr_content;
         return 1;
     case MSR_CORE_PERF_GLOBAL_STATUS:
         gdprintk(XENLOG_INFO, "Can not write readonly MSR: "
@@ -483,15 +497,14 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content,
         break;
     case MSR_CORE_PERF_FIXED_CTR_CTRL:
         vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, &global_ctrl);
-        core2_vpmu_cxt->enabled_cntrs &=
-                ~(((1ULL << VPMU_CORE2_MAX_FIXED_PMCS) - 1) << 32);
+        *enabled_cntrs &= ~(((1ULL << fixed_pmc_cnt) - 1) << 32);
         if ( msr_content != 0 )
         {
             u64 val = msr_content;
             for ( i = 0; i < fixed_pmc_cnt; i++ )
             {
                 if ( val & 3 )
-                    core2_vpmu_cxt->enabled_cntrs |= (1ULL << 32) << i;
+                    *enabled_cntrs |= (1ULL << 32) << i;
                 val >>= FIXED_CTR_CTRL_BITS;
             }
         }
@@ -502,19 +515,21 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content,
         tmp = msr - MSR_P6_EVNTSEL(0);
         if ( tmp >= 0 && tmp < arch_pmc_cnt )
         {
+            struct xen_pmu_cntr_pair *xen_pmu_cntr_pair =
+                vpmu_reg_pointer(core2_vpmu_cxt, arch_counters);
+
             vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, &global_ctrl);
 
             if ( msr_content & (1ULL << 22) )
-                core2_vpmu_cxt->enabled_cntrs |= 1ULL << tmp;
+                *enabled_cntrs |= 1ULL << tmp;
             else
-                core2_vpmu_cxt->enabled_cntrs &= ~(1ULL << tmp);
+                *enabled_cntrs &= ~(1ULL << tmp);
 
-            core2_vpmu_cxt->arch_msr_pair[tmp].control = msr_content;
+            xen_pmu_cntr_pair[tmp].control = msr_content;
         }
     }
 
-    if ( (global_ctrl & core2_vpmu_cxt->enabled_cntrs) ||
-         (core2_vpmu_cxt->ds_area != 0)  )
+    if ( (global_ctrl & *enabled_cntrs) || (core2_vpmu_cxt->ds_area != 0) )
         vpmu_set(vpmu, VPMU_RUNNING);
     else
         vpmu_reset(vpmu, VPMU_RUNNING);
@@ -560,7 +575,7 @@ static int core2_vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
     int type = -1, index = -1;
     struct vcpu *v = current;
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    struct core2_vpmu_context *core2_vpmu_cxt = NULL;
+    struct xen_pmu_intel_ctxt *core2_vpmu_cxt = NULL;
 
     if ( core2_vpmu_msr_common_check(msr, &type, &index) )
     {
@@ -571,7 +586,7 @@ static int core2_vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
             *msr_content = 0;
             break;
         case MSR_CORE_PERF_GLOBAL_STATUS:
-            *msr_content = core2_vpmu_cxt->global_ovf_status;
+            *msr_content = core2_vpmu_cxt->global_status;
             break;
         case MSR_CORE_PERF_GLOBAL_CTRL:
             vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, msr_content);
@@ -620,10 +635,12 @@ static void core2_vpmu_dump(const struct vcpu *v)
 {
     const struct vpmu_struct *vpmu = vcpu_vpmu(v);
     unsigned int i;
-    const struct core2_vpmu_context *core2_vpmu_cxt = NULL;
+    const struct xen_pmu_intel_ctxt *core2_vpmu_cxt = vpmu->context;
     u64 val;
+    uint64_t *fixed_counters;
+    struct xen_pmu_cntr_pair *cntr_pair;
 
-    if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
+    if ( !core2_vpmu_cxt || !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
          return;
 
     if ( !vpmu_is_set(vpmu, VPMU_RUNNING) )
@@ -636,16 +653,15 @@ static void core2_vpmu_dump(const struct vcpu *v)
     }
 
     printk("    vPMU running\n");
-    core2_vpmu_cxt = vpmu->context;
+
+    cntr_pair = vpmu_reg_pointer(core2_vpmu_cxt, arch_counters);
+    fixed_counters = vpmu_reg_pointer(core2_vpmu_cxt, fixed_counters);
 
     /* Print the contents of the counter and its configuration msr. */
     for ( i = 0; i < arch_pmc_cnt; i++ )
-    {
-        const struct arch_msr_pair *msr_pair = core2_vpmu_cxt->arch_msr_pair;
-
         printk("      general_%d: 0x%016lx ctrl: 0x%016lx\n",
-               i, msr_pair[i].counter, msr_pair[i].control);
-    }
+            i, cntr_pair[i].counter, cntr_pair[i].control);
+
     /*
      * The configuration of the fixed counter is 4 bits each in the
      * MSR_CORE_PERF_FIXED_CTR_CTRL.
@@ -654,7 +670,7 @@ static void core2_vpmu_dump(const struct vcpu *v)
     for ( i = 0; i < fixed_pmc_cnt; i++ )
     {
         printk("      fixed_%d:   0x%016lx ctrl: %#lx\n",
-               i, core2_vpmu_cxt->fix_counters[i],
+               i, fixed_counters[i],
                val & FIXED_CTR_CTRL_MASK);
         val >>= FIXED_CTR_CTRL_BITS;
     }
@@ -665,14 +681,14 @@ static int core2_vpmu_do_interrupt(struct cpu_user_regs *regs)
     struct vcpu *v = current;
     u64 msr_content;
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    struct core2_vpmu_context *core2_vpmu_cxt = vpmu->context;
+    struct xen_pmu_intel_ctxt *core2_vpmu_cxt = vpmu->context;
 
     rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, msr_content);
     if ( msr_content )
     {
         if ( is_pmc_quirk )
             handle_pmc_quirk(msr_content);
-        core2_vpmu_cxt->global_ovf_status |= msr_content;
+        core2_vpmu_cxt->global_status |= msr_content;
         msr_content = 0xC000000700000000 | ((1 << arch_pmc_cnt) - 1);
         wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, msr_content);
     }
@@ -739,13 +755,6 @@ static int core2_vpmu_initialise(struct vcpu *v, unsigned int vpmu_flags)
 
     arch_pmc_cnt = core2_get_arch_pmc_count();
     fixed_pmc_cnt = core2_get_fixed_pmc_count();
-    if ( fixed_pmc_cnt > VPMU_CORE2_MAX_FIXED_PMCS )
-    {
-        fixed_pmc_cnt = VPMU_CORE2_MAX_FIXED_PMCS;
-        printk(XENLOG_G_WARNING "Limiting number of fixed counters to %d\n",
-               fixed_pmc_cnt);
-    }
-
     check_pmc_quirk();
     return 0;
 }
@@ -758,6 +767,7 @@ static void core2_vpmu_destroy(struct vcpu *v)
         return;
 
     xfree(vpmu->context);
+    xfree(vpmu->priv_context);
     if ( has_hvm_container_domain(v->domain) && cpu_has_vmx_msr_bitmap )
         core2_vpmu_unset_msr_bitmap(v->arch.hvm_vmx.msr_bitmap);
     release_pmu_ownship(PMU_OWNER_HVM);
diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c
index 0210284..071b869 100644
--- a/xen/arch/x86/hvm/vpmu.c
+++ b/xen/arch/x86/hvm/vpmu.c
@@ -31,6 +31,7 @@
 #include <asm/hvm/svm/svm.h>
 #include <asm/hvm/svm/vmcb.h>
 #include <asm/apic.h>
+#include <public/pmu.h>
 
 /*
  * "vpmu" :     vpmu generally enabled
@@ -228,6 +229,11 @@ void vpmu_initialise(struct vcpu *v)
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
     uint8_t vendor = current_cpu_data.x86_vendor;
 
+    BUILD_BUG_ON(sizeof(struct xen_pmu_intel_ctxt) > XENPMU_CTXT_PAD_SZ);
+    BUILD_BUG_ON(sizeof(struct xen_pmu_amd_ctxt) > XENPMU_CTXT_PAD_SZ);
+    BUILD_BUG_ON(sizeof(struct xen_pmu_regs) > XENPMU_REGS_PAD_SZ);
+    BUILD_BUG_ON(sizeof(struct compat_pmu_regs) > XENPMU_REGS_PAD_SZ);
+
     if ( vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
         vpmu_destroy(v);
     vpmu_clear(vpmu);
diff --git a/xen/arch/x86/oprofile/op_model_ppro.c b/xen/arch/x86/oprofile/op_model_ppro.c
index aa99e4d..ca429a1 100644
--- a/xen/arch/x86/oprofile/op_model_ppro.c
+++ b/xen/arch/x86/oprofile/op_model_ppro.c
@@ -20,11 +20,15 @@
 #include <asm/regs.h>
 #include <asm/current.h>
 #include <asm/hvm/vpmu.h>
-#include <asm/hvm/vmx/vpmu_core2.h>
 
 #include "op_x86_model.h"
 #include "op_counter.h"
 
+struct arch_msr_pair {
+    u64 counter;
+    u64 control;
+};
+
 /*
  * Intel "Architectural Performance Monitoring" CPUID
  * detection/enumeration details:
diff --git a/xen/include/asm-x86/hvm/vmx/vpmu_core2.h b/xen/include/asm-x86/hvm/vmx/vpmu_core2.h
deleted file mode 100644
index 410372d..0000000
--- a/xen/include/asm-x86/hvm/vmx/vpmu_core2.h
+++ /dev/null
@@ -1,32 +0,0 @@
-
-/*
- * vpmu_core2.h: CORE 2 specific PMU virtualization for HVM domain.
- *
- * Copyright (c) 2007, Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along with
- * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
- * Place - Suite 330, Boston, MA 02111-1307 USA.
- *
- * Author: Haitao Shan <haitao.shan@intel.com>
- */
-
-#ifndef __ASM_X86_HVM_VPMU_CORE_H_
-#define __ASM_X86_HVM_VPMU_CORE_H_
-
-struct arch_msr_pair {
-    u64 counter;
-    u64 control;
-};
-
-#endif /* __ASM_X86_HVM_VPMU_CORE_H_ */
-
diff --git a/xen/include/asm-x86/hvm/vpmu.h b/xen/include/asm-x86/hvm/vpmu.h
index 761c556..f0b2686 100644
--- a/xen/include/asm-x86/hvm/vpmu.h
+++ b/xen/include/asm-x86/hvm/vpmu.h
@@ -22,6 +22,8 @@
 #ifndef __ASM_X86_HVM_VPMU_H_
 #define __ASM_X86_HVM_VPMU_H_
 
+#include <public/pmu.h>
+
 /*
  * Flag bits given as a string on the hypervisor boot parameter 'vpmu'.
  * See arch/x86/hvm/vpmu.c.
@@ -29,12 +31,9 @@
 #define VPMU_BOOT_ENABLED 0x1    /* vpmu generally enabled. */
 #define VPMU_BOOT_BTS     0x2    /* Intel BTS feature wanted. */
 
-
-#define msraddr_to_bitpos(x) (((x)&0xffff) + ((x)>>31)*0x2000)
 #define vcpu_vpmu(vcpu)   (&((vcpu)->arch.hvm_vcpu.vpmu))
 #define vpmu_vcpu(vpmu)   (container_of((vpmu), struct vcpu, \
                                           arch.hvm_vcpu.vpmu))
-#define vpmu_domain(vpmu) (vpmu_vcpu(vpmu)->domain)
 
 #define MSR_TYPE_COUNTER            0
 #define MSR_TYPE_CTRL               1
@@ -42,6 +41,9 @@
 #define MSR_TYPE_ARCH_COUNTER       3
 #define MSR_TYPE_ARCH_CTRL          4
 
+/* Start of PMU register bank */
+#define vpmu_reg_pointer(ctxt, offset) ((void *)((uintptr_t)ctxt + \
+                                                 (uintptr_t)ctxt->offset))
 
 /* Arch specific operations shared by all vpmus */
 struct arch_vpmu_ops {
@@ -65,7 +67,8 @@ struct vpmu_struct {
     u32 flags;
     u32 last_pcpu;
     u32 hw_lapic_lvtpc;
-    void *context;
+    void *context;      /* May be shared with PV guest */
+    void *priv_context; /* hypervisor-only */
     struct arch_vpmu_ops *arch_vpmu_ops;
 };
 
@@ -77,11 +80,6 @@ struct vpmu_struct {
 #define VPMU_FROZEN                         0x10  /* Stop counters while VCPU is not running */
 #define VPMU_PASSIVE_DOMAIN_ALLOCATED       0x20
 
-/* VPMU features */
-#define VPMU_CPU_HAS_DS                     0x100 /* Has Debug Store */
-#define VPMU_CPU_HAS_BTS                    0x200 /* Has Branch Trace Store */
-
-
 static inline void vpmu_set(struct vpmu_struct *vpmu, const u32 mask)
 {
     vpmu->flags |= mask;
diff --git a/xen/include/public/arch-arm.h b/xen/include/public/arch-arm.h
index ac54cd6..9de6d66 100644
--- a/xen/include/public/arch-arm.h
+++ b/xen/include/public/arch-arm.h
@@ -407,6 +407,9 @@ typedef uint64_t xen_callback_t;
 
 #endif
 
+/* Stub definition of PMU structure */
+typedef struct xen_pmu_arch {} xen_pmu_arch_t;
+
 #endif /*  __XEN_PUBLIC_ARCH_ARM_H__ */
 
 /*
diff --git a/xen/include/public/arch-x86/pmu.h b/xen/include/public/arch-x86/pmu.h
new file mode 100644
index 0000000..c0cfc6c
--- /dev/null
+++ b/xen/include/public/arch-x86/pmu.h
@@ -0,0 +1,77 @@
+#ifndef __XEN_PUBLIC_ARCH_X86_PMU_H__
+#define __XEN_PUBLIC_ARCH_X86_PMU_H__
+
+/* x86-specific PMU definitions */
+
+/* AMD PMU registers and structures */
+struct xen_pmu_amd_ctxt {
+    /* Offsets to counter and control MSRs (relative to xen_pmu_arch.c.amd) */
+    uint32_t counters;
+    uint32_t ctrls;
+};
+typedef struct xen_pmu_amd_ctxt xen_pmu_amd_ctxt_t;
+DEFINE_XEN_GUEST_HANDLE(xen_pmu_amd_ctxt_t);
+
+/* Intel PMU registers and structures */
+struct xen_pmu_cntr_pair {
+    uint64_t counter;
+    uint64_t control;
+};
+typedef struct xen_pmu_cntr_pair xen_pmu_cntr_pair_t;
+DEFINE_XEN_GUEST_HANDLE(xen_pmu_cntr_pair_t);
+
+struct xen_pmu_intel_ctxt {
+    uint64_t global_ctrl;
+    uint64_t global_ovf_ctrl;
+    uint64_t global_status;
+    uint64_t fixed_ctrl;
+    uint64_t ds_area;
+    uint64_t pebs_enable;
+    uint64_t debugctl;
+    /*
+     * Offsets to fixed and architectural counter MSRs (relative to
+     * xen_pmu_arch.c.intel)
+     */
+    uint32_t fixed_counters;
+    uint32_t arch_counters;
+};
+typedef struct xen_pmu_intel_ctxt xen_pmu_intel_ctxt_t;
+DEFINE_XEN_GUEST_HANDLE(xen_pmu_intel_ctxt_t);
+
+struct xen_pmu_arch {
+    union {
+        struct xen_pmu_regs regs;
+        /* Padding for adding new registers to xen_pmu_regs in the future */
+#define XENPMU_REGS_PAD_SZ  64
+        uint8_t pad[XENPMU_REGS_PAD_SZ];
+    } r;
+    union {
+        uint32_t lapic_lvtpc;
+        uint64_t pad;
+    } l;
+    union {
+        struct xen_pmu_amd_ctxt amd;
+        struct xen_pmu_intel_ctxt intel;
+
+        /*
+         * Padding for contexts (fixed parts only, does not include MSR banks
+         * that are specified by offsets
+         */
+#define XENPMU_CTXT_PAD_SZ  128
+        uint8_t pad[XENPMU_CTXT_PAD_SZ];
+    } c;
+};
+typedef struct xen_pmu_arch xen_pmu_arch_t;
+DEFINE_XEN_GUEST_HANDLE(xen_pmu_arch_t);
+
+#endif /* __XEN_PUBLIC_ARCH_X86_PMU_H__ */
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
+
diff --git a/xen/include/public/arch-x86/xen-x86_32.h b/xen/include/public/arch-x86/xen-x86_32.h
index 1504191..5b437cf 100644
--- a/xen/include/public/arch-x86/xen-x86_32.h
+++ b/xen/include/public/arch-x86/xen-x86_32.h
@@ -136,6 +136,14 @@ struct cpu_user_regs {
 typedef struct cpu_user_regs cpu_user_regs_t;
 DEFINE_XEN_GUEST_HANDLE(cpu_user_regs_t);
 
+struct xen_pmu_regs {
+    uint32_t eip;
+    uint32_t esp;
+    uint16_t cs;
+};
+typedef struct xen_pmu_regs xen_pmu_regs_t;
+DEFINE_XEN_GUEST_HANDLE(xen_pmu_regs_t);
+
 /*
  * Page-directory addresses above 4GB do not fit into architectural %cr3.
  * When accessing %cr3, or equivalent field in vcpu_guest_context, guests
diff --git a/xen/include/public/arch-x86/xen-x86_64.h b/xen/include/public/arch-x86/xen-x86_64.h
index 1c4e159..86b6844 100644
--- a/xen/include/public/arch-x86/xen-x86_64.h
+++ b/xen/include/public/arch-x86/xen-x86_64.h
@@ -174,6 +174,14 @@ struct cpu_user_regs {
 typedef struct cpu_user_regs cpu_user_regs_t;
 DEFINE_XEN_GUEST_HANDLE(cpu_user_regs_t);
 
+struct xen_pmu_regs {
+    __DECL_REG(ip);
+    __DECL_REG(sp);
+    uint16_t cs;
+};
+typedef struct xen_pmu_regs xen_pmu_regs_t;
+DEFINE_XEN_GUEST_HANDLE(xen_pmu_regs_t);
+
 #undef __DECL_REG
 
 #define xen_pfn_to_cr3(pfn) ((unsigned long)(pfn) << 12)
diff --git a/xen/include/public/pmu.h b/xen/include/public/pmu.h
new file mode 100644
index 0000000..e6f45ee
--- /dev/null
+++ b/xen/include/public/pmu.h
@@ -0,0 +1,38 @@
+#ifndef __XEN_PUBLIC_PMU_H__
+#define __XEN_PUBLIC_PMU_H__
+
+#include "xen.h"
+#if defined(__i386__) || defined(__x86_64__)
+#include "arch-x86/pmu.h"
+#elif defined (__arm__) || defined (__aarch64__)
+#include "arch-arm.h"
+#else
+#error "Unsupported architecture"
+#endif
+
+#define XENPMU_VER_MAJ    0
+#define XENPMU_VER_MIN    1
+
+
+/* Shared between hypervisor and PV domain */
+struct xen_pmu_data {
+    uint32_t domain_id;
+    uint32_t vcpu_id;
+    uint32_t pcpu_id;
+    uint32_t pmu_flags;
+
+    xen_pmu_arch_t pmu;
+};
+typedef struct xen_pmu_data xen_pmu_data_t;
+
+#endif /* __XEN_PUBLIC_PMU_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 92+ messages in thread

* [PATCH v12 for-xen-4.5 10/20] x86/VPMU: Make vpmu not HVM-specific
  2014-09-25 19:28 [PATCH v12 for-xen-4.5 00/20] x86/PMU: Xen PMU PV(H) support Boris Ostrovsky
                   ` (8 preceding siblings ...)
  2014-09-25 19:28 ` [PATCH v12 for-xen-4.5 09/20] x86/VPMU: Add public xenpmu.h Boris Ostrovsky
@ 2014-09-25 19:28 ` Boris Ostrovsky
  2014-09-25 19:28 ` [PATCH v12 for-xen-4.5 11/20] x86/VPMU: Interface for setting PMU mode and flags Boris Ostrovsky
                   ` (11 subsequent siblings)
  21 siblings, 0 replies; 92+ messages in thread
From: Boris Ostrovsky @ 2014-09-25 19:28 UTC (permalink / raw)
  To: jbeulich, kevin.tian, suravee.suthikulpanit,
	Aravind.Gopalakrishnan, dietmar.hahn, dgdegra
  Cc: keir, andrew.cooper3, tim, xen-devel, jun.nakajima, boris.ostrovsky

The vpmu structure will be used for both HVM and PV guests. Move it from
hvm_vcpu to arch_vcpu.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
---
 xen/include/asm-x86/domain.h   | 2 ++
 xen/include/asm-x86/hvm/vcpu.h | 3 ---
 xen/include/asm-x86/hvm/vpmu.h | 5 ++---
 3 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index 7abe1b3..40a44b5 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -410,6 +410,8 @@ struct arch_vcpu
     void (*ctxt_switch_from) (struct vcpu *);
     void (*ctxt_switch_to) (struct vcpu *);
 
+    struct vpmu_struct vpmu;
+
     /* Virtual Machine Extensions */
     union {
         struct pv_vcpu pv_vcpu;
diff --git a/xen/include/asm-x86/hvm/vcpu.h b/xen/include/asm-x86/hvm/vcpu.h
index 01e0665..71a5b15 100644
--- a/xen/include/asm-x86/hvm/vcpu.h
+++ b/xen/include/asm-x86/hvm/vcpu.h
@@ -151,9 +151,6 @@ struct hvm_vcpu {
     u32                 msr_tsc_aux;
     u64                 msr_tsc_adjust;
 
-    /* VPMU */
-    struct vpmu_struct  vpmu;
-
     union {
         struct arch_vmx_struct vmx;
         struct arch_svm_struct svm;
diff --git a/xen/include/asm-x86/hvm/vpmu.h b/xen/include/asm-x86/hvm/vpmu.h
index f0b2686..6fa0def 100644
--- a/xen/include/asm-x86/hvm/vpmu.h
+++ b/xen/include/asm-x86/hvm/vpmu.h
@@ -31,9 +31,8 @@
 #define VPMU_BOOT_ENABLED 0x1    /* vpmu generally enabled. */
 #define VPMU_BOOT_BTS     0x2    /* Intel BTS feature wanted. */
 
-#define vcpu_vpmu(vcpu)   (&((vcpu)->arch.hvm_vcpu.vpmu))
-#define vpmu_vcpu(vpmu)   (container_of((vpmu), struct vcpu, \
-                                          arch.hvm_vcpu.vpmu))
+#define vcpu_vpmu(vcpu)   (&(vcpu)->arch.vpmu)
+#define vpmu_vcpu(vpmu)   container_of((vpmu), struct vcpu, arch.vpmu)
 
 #define MSR_TYPE_COUNTER            0
 #define MSR_TYPE_CTRL               1
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 92+ messages in thread

* [PATCH v12 for-xen-4.5 11/20] x86/VPMU: Interface for setting PMU mode and flags
  2014-09-25 19:28 [PATCH v12 for-xen-4.5 00/20] x86/PMU: Xen PMU PV(H) support Boris Ostrovsky
                   ` (9 preceding siblings ...)
  2014-09-25 19:28 ` [PATCH v12 for-xen-4.5 10/20] x86/VPMU: Make vpmu not HVM-specific Boris Ostrovsky
@ 2014-09-25 19:28 ` Boris Ostrovsky
  2014-09-26 21:04   ` Tian, Kevin
                     ` (4 more replies)
  2014-09-25 19:28 ` [PATCH v12 for-xen-4.5 12/20] x86/VPMU: Initialize PMU for PV(H) guests Boris Ostrovsky
                   ` (10 subsequent siblings)
  21 siblings, 5 replies; 92+ messages in thread
From: Boris Ostrovsky @ 2014-09-25 19:28 UTC (permalink / raw)
  To: jbeulich, kevin.tian, suravee.suthikulpanit,
	Aravind.Gopalakrishnan, dietmar.hahn, dgdegra
  Cc: keir, andrew.cooper3, tim, xen-devel, jun.nakajima, boris.ostrovsky

Add a runtime interface for setting the PMU mode and flags. Three main
modes are provided:
* XENPMU_MODE_OFF:  PMU is not virtualized
* XENPMU_MODE_SELF: Guests can access PMU MSRs and receive PMU interrupts.
* XENPMU_MODE_HV:   Same as XENPMU_MODE_SELF for non-privileged guests;
  dom0 can also profile itself and the hypervisor.

Note that these PMU modes are different from what can be set with the
'vpmu' argument on Xen's boot line. There, an 'off' (or '0') value is
equivalent to XENPMU_MODE_OFF, while any other value causes the VPMU mode
to be set to XENPMU_MODE_SELF during boot.

For feature flags, only Intel's BTS is currently supported.

Mode and flags are set via the HYPERVISOR_xenpmu_op hypercall.
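
As a sketch of how the interface is used (not part of the patch; the
guest-side hypercall wrapper is an assumption), a privileged guest could
switch the mode like this:

    /* Sketch only: enable self-profiling from a privileged guest.
     * Assumes a HYPERVISOR_xenpmu_op() wrapper exists in the guest. */
    static int enable_self_profiling(void)
    {
        xen_pmu_params_t p = {
            .version.maj = XENPMU_VER_MAJ,
            .version.min = XENPMU_VER_MIN,
            .val = XENPMU_MODE_SELF,
        };

        /* XENPMU_mode_set returns -EAGAIN if another mode change is in
         * progress or the CPU sync was preempted; retrying is left to
         * the caller. */
        return HYPERVISOR_xenpmu_op(XENPMU_mode_set, &p);
    }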

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 tools/flask/policy/policy/modules/xen/xen.te |   3 +
 xen/arch/x86/domain.c                        |   6 +-
 xen/arch/x86/hvm/svm/vpmu.c                  |   4 +-
 xen/arch/x86/hvm/vmx/vpmu_core2.c            |  10 +-
 xen/arch/x86/hvm/vpmu.c                      | 206 +++++++++++++++++++++++++--
 xen/arch/x86/x86_64/compat/entry.S           |   4 +
 xen/arch/x86/x86_64/entry.S                  |   4 +
 xen/include/Makefile                         |   2 +
 xen/include/asm-x86/hvm/vpmu.h               |  27 ++--
 xen/include/public/pmu.h                     |  44 ++++++
 xen/include/public/xen.h                     |   1 +
 xen/include/xen/hypercall.h                  |   4 +
 xen/include/xlat.lst                         |   4 +
 xen/include/xsm/dummy.h                      |  15 ++
 xen/include/xsm/xsm.h                        |   6 +
 xen/xsm/dummy.c                              |   1 +
 xen/xsm/flask/hooks.c                        |  18 +++
 xen/xsm/flask/policy/access_vectors          |   2 +
 18 files changed, 334 insertions(+), 27 deletions(-)

diff --git a/tools/flask/policy/policy/modules/xen/xen.te b/tools/flask/policy/policy/modules/xen/xen.te
index 1937883..fb761cd 100644
--- a/tools/flask/policy/policy/modules/xen/xen.te
+++ b/tools/flask/policy/policy/modules/xen/xen.te
@@ -64,6 +64,9 @@ allow dom0_t xen_t:xen {
 	getidle debug getcpuinfo heap pm_op mca_op lockprof cpupool_op tmem_op
 	tmem_control getscheduler setscheduler
 };
+allow dom0_t xen_t:xen2 {
+    pmu_ctrl
+};
 allow dom0_t xen_t:mmu memorymap;
 
 # Allow dom0 to use these domctls on itself. For domctls acting on other
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 7b1dfe6..6a07737 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1503,7 +1503,7 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
     if ( is_hvm_vcpu(prev) )
     {
         if (prev != next)
-            vpmu_save(prev);
+            vpmu_switch_from(prev, next);
 
         if ( !list_empty(&prev->arch.hvm_vcpu.tm_list) )
             pt_save_timer(prev);
@@ -1546,9 +1546,9 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
                            !is_hardware_domain(next->domain));
     }
 
-    if (is_hvm_vcpu(next) && (prev != next) )
+    if ( is_hvm_vcpu(prev) && (prev != next) )
         /* Must be done with interrupts enabled */
-        vpmu_load(next);
+        vpmu_switch_to(prev, next);
 
     context_saved(prev);
 
diff --git a/xen/arch/x86/hvm/svm/vpmu.c b/xen/arch/x86/hvm/svm/vpmu.c
index 124b147..37d8228 100644
--- a/xen/arch/x86/hvm/svm/vpmu.c
+++ b/xen/arch/x86/hvm/svm/vpmu.c
@@ -479,14 +479,14 @@ struct arch_vpmu_ops amd_vpmu_ops = {
     .arch_vpmu_dump = amd_vpmu_dump
 };
 
-int svm_vpmu_initialise(struct vcpu *v, unsigned int vpmu_flags)
+int svm_vpmu_initialise(struct vcpu *v)
 {
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
     uint8_t family = current_cpu_data.x86;
     int ret = 0;
 
     /* vpmu enabled? */
-    if ( !vpmu_flags )
+    if ( vpmu_mode == XENPMU_MODE_OFF )
         return 0;
 
     switch ( family )
diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c
index beff5c3..c0a45cd 100644
--- a/xen/arch/x86/hvm/vmx/vpmu_core2.c
+++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c
@@ -703,13 +703,13 @@ static int core2_vpmu_do_interrupt(struct cpu_user_regs *regs)
     return 1;
 }
 
-static int core2_vpmu_initialise(struct vcpu *v, unsigned int vpmu_flags)
+static int core2_vpmu_initialise(struct vcpu *v)
 {
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
     u64 msr_content;
     static bool_t ds_warned;
 
-    if ( !(vpmu_flags & VPMU_BOOT_BTS) )
+    if ( !(vpmu_features & XENPMU_FEATURE_INTEL_BTS) )
         goto func_out;
     /* Check the 'Debug Store' feature in the CPUID.EAX[1]:EDX[21] */
     while ( boot_cpu_has(X86_FEATURE_DS) )
@@ -824,7 +824,7 @@ struct arch_vpmu_ops core2_no_vpmu_ops = {
     .do_cpuid = core2_no_vpmu_do_cpuid,
 };
 
-int vmx_vpmu_initialise(struct vcpu *v, unsigned int vpmu_flags)
+int vmx_vpmu_initialise(struct vcpu *v)
 {
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
     uint8_t family = current_cpu_data.x86;
@@ -832,7 +832,7 @@ int vmx_vpmu_initialise(struct vcpu *v, unsigned int vpmu_flags)
     int ret = 0;
 
     vpmu->arch_vpmu_ops = &core2_no_vpmu_ops;
-    if ( !vpmu_flags )
+    if ( vpmu_mode == XENPMU_MODE_OFF )
         return 0;
 
     if ( family == 6 )
@@ -875,7 +875,7 @@ int vmx_vpmu_initialise(struct vcpu *v, unsigned int vpmu_flags)
         /* future: */
         case 0x3d:
         case 0x4e:
-            ret = core2_vpmu_initialise(v, vpmu_flags);
+            ret = core2_vpmu_initialise(v);
             if ( !ret )
                 vpmu->arch_vpmu_ops = &core2_vpmu_ops;
             return ret;
diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c
index 071b869..5fcee0e 100644
--- a/xen/arch/x86/hvm/vpmu.c
+++ b/xen/arch/x86/hvm/vpmu.c
@@ -21,6 +21,8 @@
 #include <xen/config.h>
 #include <xen/sched.h>
 #include <xen/xenoprof.h>
+#include <xen/event.h>
+#include <xen/guest_access.h>
 #include <asm/regs.h>
 #include <asm/types.h>
 #include <asm/msr.h>
@@ -32,13 +34,22 @@
 #include <asm/hvm/svm/vmcb.h>
 #include <asm/apic.h>
 #include <public/pmu.h>
+#include <xen/tasklet.h>
+#include <xsm/xsm.h>
+
+#include <compat/pmu.h>
+CHECK_pmu_params;
+CHECK_pmu_intel_ctxt;
+CHECK_pmu_amd_ctxt;
+CHECK_pmu_cntr_pair;
 
 /*
  * "vpmu" :     vpmu generally enabled
  * "vpmu=off" : vpmu generally disabled
  * "vpmu=bts" : vpmu enabled and Intel BTS feature switched on.
  */
-static unsigned int __read_mostly opt_vpmu_enabled;
+uint64_t __read_mostly vpmu_mode = XENPMU_MODE_OFF;
+uint64_t __read_mostly vpmu_features = 0;
 static void parse_vpmu_param(char *s);
 custom_param("vpmu", parse_vpmu_param);
 
@@ -52,7 +63,7 @@ static void __init parse_vpmu_param(char *s)
         break;
     default:
         if ( !strcmp(s, "bts") )
-            opt_vpmu_enabled |= VPMU_BOOT_BTS;
+            vpmu_features |= XENPMU_FEATURE_INTEL_BTS;
         else if ( *s )
         {
             printk("VPMU: unknown flag: %s - vpmu disabled!\n", s);
@@ -60,7 +71,8 @@ static void __init parse_vpmu_param(char *s)
         }
         /* fall through */
     case 1:
-        opt_vpmu_enabled |= VPMU_BOOT_ENABLED;
+        /* Default VPMU mode */
+        vpmu_mode = XENPMU_MODE_SELF;
         break;
     }
 }
@@ -77,6 +89,9 @@ int vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content, uint64_t supported)
 {
     struct vpmu_struct *vpmu = vcpu_vpmu(current);
 
+    if ( !(vpmu_mode & (XENPMU_MODE_SELF | XENPMU_MODE_HV)) )
+        return 0;
+
     if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->do_wrmsr )
         return vpmu->arch_vpmu_ops->do_wrmsr(msr, msr_content, supported);
     return 0;
@@ -86,6 +101,9 @@ int vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
 {
     struct vpmu_struct *vpmu = vcpu_vpmu(current);
 
+    if ( !(vpmu_mode & (XENPMU_MODE_SELF | XENPMU_MODE_HV)) )
+        return 0;
+
     if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->do_rdmsr )
         return vpmu->arch_vpmu_ops->do_rdmsr(msr, msr_content);
     return 0;
@@ -242,19 +260,19 @@ void vpmu_initialise(struct vcpu *v)
     switch ( vendor )
     {
     case X86_VENDOR_AMD:
-        if ( svm_vpmu_initialise(v, opt_vpmu_enabled) != 0 )
-            opt_vpmu_enabled = 0;
+        if ( svm_vpmu_initialise(v) != 0 )
+            vpmu_mode = XENPMU_MODE_OFF;
         break;
 
     case X86_VENDOR_INTEL:
-        if ( vmx_vpmu_initialise(v, opt_vpmu_enabled) != 0 )
-            opt_vpmu_enabled = 0;
+        if ( vmx_vpmu_initialise(v) != 0 )
+            vpmu_mode = XENPMU_MODE_OFF;
         break;
 
     default:
         printk("VPMU: Initialization failed. "
                "Unknown CPU vendor %d\n", vendor);
-        opt_vpmu_enabled = 0;
+        vpmu_mode = XENPMU_MODE_OFF;
         break;
     }
 }
@@ -276,3 +294,175 @@ void vpmu_dump(struct vcpu *v)
         vpmu->arch_vpmu_ops->arch_vpmu_dump(v);
 }
 
+static atomic_t vpmu_sched_counter;
+
+static void vpmu_sched_checkin(unsigned long unused)
+{
+    atomic_inc(&vpmu_sched_counter);
+}
+
+static int vpmu_force_context_switch(void)
+{
+    unsigned i, j, allbutself_num, mycpu;
+    static s_time_t start, now;
+    struct tasklet **sync_task;
+    struct vcpu *curr_vcpu = current;
+    int ret = 0;
+
+    allbutself_num = num_online_cpus() - 1;
+
+    sync_task = xzalloc_array(struct tasklet *, allbutself_num);
+    if ( !sync_task )
+    {
+        printk(XENLOG_WARNING "vpmu_force_context_switch: out of memory\n");
+        return -ENOMEM;
+    }
+
+    for ( i = 0; i < allbutself_num; i++ )
+    {
+        sync_task[i] = xmalloc(struct tasklet);
+        if ( sync_task[i] == NULL )
+        {
+            printk(XENLOG_WARNING "vpmu_force_context_switch: out of memory\n");
+            ret = -ENOMEM;
+            goto out;
+        }
+        tasklet_init(sync_task[i], vpmu_sched_checkin, 0);
+    }
+
+    atomic_set(&vpmu_sched_counter, 0);
+
+    j = 0;
+    mycpu = smp_processor_id();
+    for_each_online_cpu( i )
+    {
+        if ( i != mycpu )
+            tasklet_schedule_on_cpu(sync_task[j++], i);
+    }
+
+    vpmu_save(curr_vcpu);
+
+    start = NOW();
+
+    /*
+     * Note that we may fail here if a CPU is hot-plugged while we are
+     * waiting. We will then time out.
+     */
+    while ( atomic_read(&vpmu_sched_counter) != allbutself_num )
+    {
+        cpu_relax();
+
+        now = NOW();
+
+        /* Give up after 5 seconds */
+        if ( now > start + SECONDS(5) )
+        {
+            printk(XENLOG_WARNING
+                   "vpmu_force_context_switch: failed to sync\n");
+            ret = -EBUSY;
+            break;
+        }
+
+        /* Or after 2 milliseconds if need to be preempted */
+        if ( (now > start + MILLISECS(2)) && hypercall_preempt_check() )
+        {
+            ret = -EAGAIN;
+            break;
+        }
+    }
+
+ out:
+    for ( i = 0; i < allbutself_num; i++ )
+    {
+        if ( sync_task[i] )
+        {
+            tasklet_kill(sync_task[i]);
+            xfree(sync_task[i]);
+        }
+    }
+    xfree(sync_task);
+
+    return ret;
+}
+
+long do_xenpmu_op(int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg)
+{
+    int ret;
+    xen_pmu_params_t pmu_params;
+
+    ret = xsm_pmu_op(XSM_OTHER, current->domain, op);
+    if ( ret )
+        return ret;
+
+    switch ( op )
+    {
+    case XENPMU_mode_set:
+    {
+        static DEFINE_SPINLOCK(xenpmu_mode_lock);
+        uint32_t current_mode;
+
+        if ( copy_from_guest(&pmu_params, arg, 1) )
+            return -EFAULT;
+
+        if ( pmu_params.val & ~(XENPMU_MODE_SELF | XENPMU_MODE_HV) )
+            return -EINVAL;
+
+        /*
+         * Return an error if someone else is in the middle of changing
+         * the mode; this most likely indicates two system administrators
+         * working against each other.
+         */
+        if ( !spin_trylock(&xenpmu_mode_lock) )
+            return -EAGAIN;
+
+        current_mode = vpmu_mode;
+        vpmu_mode = pmu_params.val;
+
+        if ( vpmu_mode == XENPMU_MODE_OFF )
+        {
+            /*
+             * Make sure all (non-dom0) VCPUs have unloaded their VPMUs. This
+             * can be achieved by having all physical processors go through
+             * context_switch().
+             */
+            ret = vpmu_force_context_switch();
+            if ( ret )
+                vpmu_mode = current_mode;
+        }
+
+        spin_unlock(&xenpmu_mode_lock);
+        break;
+    }
+
+    case XENPMU_mode_get:
+        memset(&pmu_params, 0, sizeof(pmu_params));
+        pmu_params.val = vpmu_mode;
+        pmu_params.version.maj = XENPMU_VER_MAJ;
+        pmu_params.version.min = XENPMU_VER_MIN;
+        if ( copy_to_guest(arg, &pmu_params, 1) )
+            return -EFAULT;
+        break;
+
+    case XENPMU_feature_set:
+        if ( copy_from_guest(&pmu_params, arg, 1) )
+            return -EFAULT;
+
+        if ( pmu_params.val & ~XENPMU_FEATURE_INTEL_BTS )
+            return -EINVAL;
+
+        vpmu_features = pmu_params.val;
+        break;
+
+    case XENPMU_feature_get:
+        memset(&pmu_params, 0, sizeof(pmu_params));
+        pmu_params.val = vpmu_features;
+        if ( copy_to_guest(arg, &pmu_params, 1) )
+            return -EFAULT;
+        break;
+
+    default:
+        ret = -EINVAL;
+    }
+
+    return ret;
+}
diff --git a/xen/arch/x86/x86_64/compat/entry.S b/xen/arch/x86/x86_64/compat/entry.S
index ac594c9..8587c46 100644
--- a/xen/arch/x86/x86_64/compat/entry.S
+++ b/xen/arch/x86/x86_64/compat/entry.S
@@ -417,6 +417,8 @@ ENTRY(compat_hypercall_table)
         .quad do_domctl
         .quad compat_kexec_op
         .quad do_tmem_op
+        .quad do_ni_hypercall           /* reserved for XenClient */
+        .quad do_xenpmu_op              /* 40 */
         .rept __HYPERVISOR_arch_0-((.-compat_hypercall_table)/8)
         .quad compat_ni_hypercall
         .endr
@@ -465,6 +467,8 @@ ENTRY(compat_hypercall_args_table)
         .byte 1 /* do_domctl                */
         .byte 2 /* compat_kexec_op          */
         .byte 1 /* do_tmem_op               */
+        .byte 0 /* reserved for XenClient   */
+        .byte 2 /* do_xenpmu_op             */  /* 40 */
         .rept __HYPERVISOR_arch_0-(.-compat_hypercall_args_table)
         .byte 0 /* compat_ni_hypercall      */
         .endr
diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
index ade555b..7f5dedf 100644
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -772,6 +772,8 @@ ENTRY(hypercall_table)
         .quad do_domctl
         .quad do_kexec_op
         .quad do_tmem_op
+        .quad do_ni_hypercall       /* reserved for XenClient */
+        .quad do_xenpmu_op          /* 40 */
         .rept __HYPERVISOR_arch_0-((.-hypercall_table)/8)
         .quad do_ni_hypercall
         .endr
@@ -820,6 +822,8 @@ ENTRY(hypercall_args_table)
         .byte 1 /* do_domctl            */
         .byte 2 /* do_kexec             */
         .byte 1 /* do_tmem_op           */
+        .byte 0 /* reserved for XenClient */
+        .byte 2 /* do_xenpmu_op         */  /* 40 */
         .rept __HYPERVISOR_arch_0-(.-hypercall_args_table)
         .byte 0 /* do_ni_hypercall      */
         .endr
diff --git a/xen/include/Makefile b/xen/include/Makefile
index f7ccbc9..f97733a 100644
--- a/xen/include/Makefile
+++ b/xen/include/Makefile
@@ -26,7 +26,9 @@ headers-y := \
 headers-$(CONFIG_X86)     += compat/arch-x86/xen-mca.h
 headers-$(CONFIG_X86)     += compat/arch-x86/xen.h
 headers-$(CONFIG_X86)     += compat/arch-x86/xen-$(compat-arch-y).h
+headers-$(CONFIG_X86)     += compat/arch-x86/pmu.h
 headers-y                 += compat/arch-$(compat-arch-y).h compat/xlat.h
+headers-y                 += compat/pmu.h
 headers-$(FLASK_ENABLE)   += compat/xsm/flask_op.h
 
 cppflags-y                := -include public/xen-compat.h
diff --git a/xen/include/asm-x86/hvm/vpmu.h b/xen/include/asm-x86/hvm/vpmu.h
index 6fa0def..c612e1a 100644
--- a/xen/include/asm-x86/hvm/vpmu.h
+++ b/xen/include/asm-x86/hvm/vpmu.h
@@ -24,13 +24,6 @@
 
 #include <public/pmu.h>
 
-/*
- * Flag bits given as a string on the hypervisor boot parameter 'vpmu'.
- * See arch/x86/hvm/vpmu.c.
- */
-#define VPMU_BOOT_ENABLED 0x1    /* vpmu generally enabled. */
-#define VPMU_BOOT_BTS     0x2    /* Intel BTS feature wanted. */
-
 #define vcpu_vpmu(vcpu)   (&(vcpu)->arch.vpmu)
 #define vpmu_vcpu(vpmu)   container_of((vpmu), struct vcpu, arch.vpmu)
 
@@ -59,8 +52,8 @@ struct arch_vpmu_ops {
     void (*arch_vpmu_dump)(const struct vcpu *);
 };
 
-int vmx_vpmu_initialise(struct vcpu *, unsigned int flags);
-int svm_vpmu_initialise(struct vcpu *, unsigned int flags);
+int vmx_vpmu_initialise(struct vcpu *);
+int svm_vpmu_initialise(struct vcpu *);
 
 struct vpmu_struct {
     u32 flags;
@@ -116,5 +109,21 @@ void vpmu_dump(struct vcpu *v);
 extern int acquire_pmu_ownership(int pmu_ownership);
 extern void release_pmu_ownership(int pmu_ownership);
 
+extern uint64_t vpmu_mode;
+extern uint64_t vpmu_features;
+
+/* Context switch */
+inline void vpmu_switch_from(struct vcpu *prev, struct vcpu *next)
+{
+    if ( vpmu_mode & (XENPMU_MODE_SELF | XENPMU_MODE_HV) )
+        vpmu_save(prev);
+}
+
+inline void vpmu_switch_to(struct vcpu *prev, struct vcpu *next)
+{
+    if ( vpmu_mode & (XENPMU_MODE_SELF | XENPMU_MODE_HV) )
+        vpmu_load(next);
+}
+
 #endif /* __ASM_X86_HVM_VPMU_H_*/
 
diff --git a/xen/include/public/pmu.h b/xen/include/public/pmu.h
index e6f45ee..c2293be 100644
--- a/xen/include/public/pmu.h
+++ b/xen/include/public/pmu.h
@@ -13,6 +13,50 @@
 #define XENPMU_VER_MAJ    0
 #define XENPMU_VER_MIN    1
 
+/*
+ * ` enum neg_errnoval
+ * ` HYPERVISOR_xenpmu_op(enum xenpmu_op cmd, struct xenpmu_params *args);
+ *
+ * @cmd  == XENPMU_* (PMU operation)
+ * @args == struct xenpmu_params
+ */
+/* ` enum xenpmu_op { */
+#define XENPMU_mode_get        0 /* Also used for getting PMU version */
+#define XENPMU_mode_set        1
+#define XENPMU_feature_get     2
+#define XENPMU_feature_set     3
+/* ` } */
+
+/* Parameters structure for HYPERVISOR_xenpmu_op call */
+struct xen_pmu_params {
+    /* IN/OUT parameters */
+    struct {
+        uint32_t maj;
+        uint32_t min;
+    } version;
+    uint64_t val;
+
+    /* IN parameters */
+    uint64_t vcpu;
+};
+typedef struct xen_pmu_params xen_pmu_params_t;
+DEFINE_XEN_GUEST_HANDLE(xen_pmu_params_t);
+
+/* PMU modes:
+ * - XENPMU_MODE_OFF:   No PMU virtualization
+ * - XENPMU_MODE_SELF:  Guests can profile themselves
+ * - XENPMU_MODE_HV:    Guests can profile themselves, dom0 profiles
+ *                      itself and Xen
+ */
+#define XENPMU_MODE_OFF           0
+#define XENPMU_MODE_SELF          (1<<0)
+#define XENPMU_MODE_HV            (1<<1)
+
+/*
+ * PMU features:
+ * - XENPMU_FEATURE_INTEL_BTS: Intel BTS support (ignored on AMD)
+ */
+#define XENPMU_FEATURE_INTEL_BTS  1
 
 /* Shared between hypervisor and PV domain */
 struct xen_pmu_data {
diff --git a/xen/include/public/xen.h b/xen/include/public/xen.h
index a6a2092..0766790 100644
--- a/xen/include/public/xen.h
+++ b/xen/include/public/xen.h
@@ -101,6 +101,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_ulong_t);
 #define __HYPERVISOR_kexec_op             37
 #define __HYPERVISOR_tmem_op              38
 #define __HYPERVISOR_xc_reserved_op       39 /* reserved for XenClient */
+#define __HYPERVISOR_xenpmu_op            40
 
 /* Architecture-specific hypercall definitions. */
 #define __HYPERVISOR_arch_0               48
diff --git a/xen/include/xen/hypercall.h b/xen/include/xen/hypercall.h
index a9e5229..cf34547 100644
--- a/xen/include/xen/hypercall.h
+++ b/xen/include/xen/hypercall.h
@@ -14,6 +14,7 @@
 #include <public/event_channel.h>
 #include <public/tmem.h>
 #include <public/version.h>
+#include <public/pmu.h>
 #include <asm/hypercall.h>
 #include <xsm/xsm.h>
 
@@ -139,6 +140,9 @@ do_tmem_op(
 extern long
 do_xenoprof_op(int op, XEN_GUEST_HANDLE_PARAM(void) arg);
 
+extern long
+do_xenpmu_op(int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg);
+
 #ifdef CONFIG_COMPAT
 
 extern int
diff --git a/xen/include/xlat.lst b/xen/include/xlat.lst
index c8fafef..5809c60 100644
--- a/xen/include/xlat.lst
+++ b/xen/include/xlat.lst
@@ -101,6 +101,10 @@
 !	vcpu_set_singleshot_timer	vcpu.h
 ?	xenoprof_init			xenoprof.h
 ?	xenoprof_passive		xenoprof.h
+?	pmu_params			pmu.h
+?	pmu_intel_ctxt			arch-x86/pmu.h
+?	pmu_amd_ctxt			arch-x86/pmu.h
+?	pmu_cntr_pair			arch-x86/pmu.h
 ?	flask_access			xsm/flask_op.h
 !	flask_boolean			xsm/flask_op.h
 ?	flask_cache_stats		xsm/flask_op.h
diff --git a/xen/include/xsm/dummy.h b/xen/include/xsm/dummy.h
index df55e70..d423c1c 100644
--- a/xen/include/xsm/dummy.h
+++ b/xen/include/xsm/dummy.h
@@ -653,4 +653,19 @@ static XSM_INLINE int xsm_ioport_mapping(XSM_DEFAULT_ARG struct domain *d, uint3
     return xsm_default_action(action, current->domain, d);
 }
 
+static XSM_INLINE int xsm_pmu_op (XSM_DEFAULT_ARG struct domain *d, int op)
+{
+    XSM_ASSERT_ACTION(XSM_OTHER);
+    switch ( op )
+    {
+    case XENPMU_mode_set:
+    case XENPMU_mode_get:
+    case XENPMU_feature_set:
+    case XENPMU_feature_get:
+        return xsm_default_action(XSM_PRIV, d, current->domain);
+    default:
+        return -EPERM;
+    }
+}
+
 #endif /* CONFIG_X86 */
diff --git a/xen/include/xsm/xsm.h b/xen/include/xsm/xsm.h
index 6c1c079..635f7df 100644
--- a/xen/include/xsm/xsm.h
+++ b/xen/include/xsm/xsm.h
@@ -170,6 +170,7 @@ struct xsm_operations {
     int (*unbind_pt_irq) (struct domain *d, struct xen_domctl_bind_pt_irq *bind);
     int (*ioport_permission) (struct domain *d, uint32_t s, uint32_t e, uint8_t allow);
     int (*ioport_mapping) (struct domain *d, uint32_t s, uint32_t e, uint8_t allow);
+    int (*pmu_op) (struct domain *d, int op);
 #endif
 };
 
@@ -660,6 +661,11 @@ static inline int xsm_ioport_mapping (xsm_default_t def, struct domain *d, uint3
     return xsm_ops->ioport_mapping(d, s, e, allow);
 }
 
+static inline int xsm_pmu_op (xsm_default_t def, struct domain *d, int op)
+{
+    return xsm_ops->pmu_op(d, op);
+}
+
 #endif /* CONFIG_X86 */
 
 #endif /* XSM_NO_WRAPPERS */
diff --git a/xen/xsm/dummy.c b/xen/xsm/dummy.c
index 0826a8b..3638bd9 100644
--- a/xen/xsm/dummy.c
+++ b/xen/xsm/dummy.c
@@ -141,5 +141,6 @@ void xsm_fixup_ops (struct xsm_operations *ops)
     set_to_dummy_if_null(ops, unbind_pt_irq);
     set_to_dummy_if_null(ops, ioport_permission);
     set_to_dummy_if_null(ops, ioport_mapping);
+    set_to_dummy_if_null(ops, pmu_op);
 #endif
 }
diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
index 5afc1d7..b437a24 100644
--- a/xen/xsm/flask/hooks.c
+++ b/xen/xsm/flask/hooks.c
@@ -1485,6 +1485,23 @@ static int flask_unbind_pt_irq (struct domain *d, struct xen_domctl_bind_pt_irq
 {
     return current_has_perm(d, SECCLASS_RESOURCE, RESOURCE__REMOVE);
 }
+
+static int flask_pmu_op (struct domain *d, int op)
+{
+    u32 dsid = domain_sid(d);
+
+    switch ( op )
+    {
+    case XENPMU_mode_set:
+    case XENPMU_mode_get:
+    case XENPMU_feature_set:
+    case XENPMU_feature_get:
+        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_XEN2,
+                            XEN2__PMU_CTRL, NULL);
+    default:
+        return -EPERM;
+    }
+}
 #endif /* CONFIG_X86 */
 
 long do_flask_op(XEN_GUEST_HANDLE_PARAM(xsm_op_t) u_flask_op);
@@ -1604,6 +1621,7 @@ static struct xsm_operations flask_ops = {
     .unbind_pt_irq = flask_unbind_pt_irq,
     .ioport_permission = flask_ioport_permission,
     .ioport_mapping = flask_ioport_mapping,
+    .pmu_op = flask_pmu_op,
 #endif
 };
 
diff --git a/xen/xsm/flask/policy/access_vectors b/xen/xsm/flask/policy/access_vectors
index 2ddbeba..64c7378 100644
--- a/xen/xsm/flask/policy/access_vectors
+++ b/xen/xsm/flask/policy/access_vectors
@@ -81,6 +81,8 @@ class xen2
 {
 # XENPF_get_symbol
     get_symbol
+# PMU control
+    pmu_ctrl
 }
 
 # Classes domain and domain2 consist of operations that a domain performs on
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 92+ messages in thread

* [PATCH v12 for-xen-4.5 12/20] x86/VPMU: Initialize PMU for PV(H) guests
  2014-09-25 19:28 [PATCH v12 for-xen-4.5 00/20] x86/PMU: Xen PMU PV(H) support Boris Ostrovsky
                   ` (10 preceding siblings ...)
  2014-09-25 19:28 ` [PATCH v12 for-xen-4.5 11/20] x86/VPMU: Interface for setting PMU mode and flags Boris Ostrovsky
@ 2014-09-25 19:28 ` Boris Ostrovsky
  2014-09-26 22:16   ` Daniel De Graaf
                     ` (2 more replies)
  2014-09-25 19:28 ` [PATCH v12 for-xen-4.5 13/20] x86/VPMU: Save VPMU state for PV guests during context switch Boris Ostrovsky
                   ` (9 subsequent siblings)
  21 siblings, 3 replies; 92+ messages in thread
From: Boris Ostrovsky @ 2014-09-25 19:28 UTC (permalink / raw)
  To: jbeulich, kevin.tian, suravee.suthikulpanit,
	Aravind.Gopalakrishnan, dietmar.hahn, dgdegra
  Cc: keir, andrew.cooper3, tim, xen-devel, jun.nakajima, boris.ostrovsky

Add code for initializing and tearing down the PMU for PV guests.
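
To illustrate the intended flow (a sketch, not part of the patch; the
page-allocation helper is hypothetical), a PV guest registers one shared
page per VCPU before using the PMU:

    /* Sketch only: PV guest side of XENPMU_init.  Assumes
     * get_xenpmu_page_gfn() returns the gfn of a page the guest set
     * aside for the shared xen_pmu_data structure. */
    static int pvpmu_register_vcpu(unsigned int cpu)
    {
        xen_pmu_params_t p = {
            .val  = get_xenpmu_page_gfn(cpu),  /* gfn of the shared page */
            .vcpu = cpu,
        };

        return HYPERVISOR_xenpmu_op(XENPMU_init, &p);
    }

XENPMU_finish takes the same structure and tears the mapping down again.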

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 tools/flask/policy/policy/modules/xen/xen.te |  4 ++
 xen/arch/x86/hvm/hvm.c                       |  3 +-
 xen/arch/x86/hvm/svm/svm.c                   |  4 +-
 xen/arch/x86/hvm/svm/vpmu.c                  | 43 +++++++++-----
 xen/arch/x86/hvm/vmx/vmx.c                   |  4 +-
 xen/arch/x86/hvm/vmx/vpmu_core2.c            | 83 ++++++++++++++++++++--------
 xen/arch/x86/hvm/vpmu.c                      | 81 +++++++++++++++++++++++++++
 xen/common/event_channel.c                   |  1 +
 xen/include/asm-x86/hvm/vpmu.h               |  1 +
 xen/include/public/pmu.h                     |  2 +
 xen/include/public/xen.h                     |  1 +
 xen/include/xsm/dummy.h                      |  3 +
 xen/xsm/flask/hooks.c                        |  4 ++
 xen/xsm/flask/policy/access_vectors          |  2 +
 14 files changed, 195 insertions(+), 41 deletions(-)

diff --git a/tools/flask/policy/policy/modules/xen/xen.te b/tools/flask/policy/policy/modules/xen/xen.te
index fb761cd..6744c36 100644
--- a/tools/flask/policy/policy/modules/xen/xen.te
+++ b/tools/flask/policy/policy/modules/xen/xen.te
@@ -116,6 +116,10 @@ domain_comms(dom0_t, dom0_t)
 # Allow all domains to use (unprivileged parts of) the tmem hypercall
 allow domain_type xen_t:xen tmem_op;
 
+# Allow all domains to use the PMU (but not to change its settings; that
+# is what pmu_ctrl is for)
+allow domain_type xen_t:xen2 pmu_use;
+
 ###############################################################################
 #
 # Domain creation
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index bb45593..ec4a021 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -4832,7 +4832,8 @@ static hvm_hypercall_t *const pvh_hypercall64_table[NR_hypercalls] = {
     [ __HYPERVISOR_physdev_op ]      = (hvm_hypercall_t *)hvm_physdev_op,
     HYPERCALL(hvm_op),
     HYPERCALL(sysctl),
-    HYPERCALL(domctl)
+    HYPERCALL(domctl),
+    HYPERCALL(xenpmu_op)
 };
 
 int hvm_do_hypercall(struct cpu_user_regs *regs)
diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index 5d404ce..319e5da 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -1157,7 +1157,9 @@ static int svm_vcpu_initialise(struct vcpu *v)
         return rc;
     }
 
-    vpmu_initialise(v);
+    /* PVH's VPMU is initialized via hypercall */
+    if ( is_hvm_domain(v->domain) )
+        vpmu_initialise(v);
 
     svm_guest_osvw_init(v);
 
diff --git a/xen/arch/x86/hvm/svm/vpmu.c b/xen/arch/x86/hvm/svm/vpmu.c
index 37d8228..be3ab27 100644
--- a/xen/arch/x86/hvm/svm/vpmu.c
+++ b/xen/arch/x86/hvm/svm/vpmu.c
@@ -362,6 +362,7 @@ static int amd_vpmu_initialise(struct vcpu *v)
     struct xen_pmu_amd_ctxt *ctxt;
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
     uint8_t family = current_cpu_data.x86;
+    unsigned int regs_size;
 
     if ( vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
         return 0;
@@ -389,14 +390,26 @@ static int amd_vpmu_initialise(struct vcpu *v)
 	 }
     }
 
-    ctxt = xzalloc_bytes(sizeof(struct xen_pmu_amd_ctxt) +
-                         2 * sizeof(uint64_t) * AMD_MAX_COUNTERS);
-    if ( !ctxt )
+    regs_size = 2 * sizeof(uint64_t) * AMD_MAX_COUNTERS;
+    if ( is_hvm_domain(v->domain) )
     {
-        gdprintk(XENLOG_WARNING, "Insufficient memory for PMU, "
-            " PMU feature is unavailable on domain %d vcpu %d.\n",
-            v->vcpu_id, v->domain->domain_id);
-        return -ENOMEM;
+        ctxt = xzalloc_bytes(sizeof(struct xen_pmu_amd_ctxt) + regs_size);
+        if ( !ctxt )
+        {
+            gdprintk(XENLOG_WARNING, "Insufficient memory for PMU, "
+                "PMU feature is unavailable\n");
+            return -ENOMEM;
+        }
+    }
+    else
+    {
+        if ( sizeof(struct xen_pmu_data) + regs_size > PAGE_SIZE )
+        {
+            gdprintk(XENLOG_WARNING,
+                    "Register bank does not fit into VPMU shared page\n");
+            return -ENOSPC;
+        }
+        ctxt = &v->arch.vpmu.xenpmu_data->pmu.c.amd;
     }
 
     ctxt->counters = sizeof(struct xen_pmu_amd_ctxt);
@@ -415,17 +428,19 @@ static void amd_vpmu_destroy(struct vcpu *v)
     if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
         return;
 
-    if ( has_hvm_container_domain(v->domain) && is_msr_bitmap_on(vpmu) )
-        amd_vpmu_unset_msr_bitmap(v);
+    if ( has_hvm_container_domain(v->domain) )
+    {
+        if ( is_msr_bitmap_on(vpmu) )
+            amd_vpmu_unset_msr_bitmap(v);
 
-    xfree(vpmu->context);
-    vpmu_reset(vpmu, VPMU_CONTEXT_ALLOCATED);
+        if ( is_hvm_domain(v->domain) )
+            xfree(vpmu->context);
 
-    if ( vpmu_is_set(vpmu, VPMU_RUNNING) )
-    {
-        vpmu_reset(vpmu, VPMU_RUNNING);
         release_pmu_ownship(PMU_OWNER_HVM);
     }
+
+    vpmu->context = NULL;
+    vpmu_clear(vpmu);
 }
 
 /* VPMU part of the 'q' keyhandler */
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 84119ed..bebe879 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -116,7 +116,9 @@ static int vmx_vcpu_initialise(struct vcpu *v)
         return rc;
     }
 
-    vpmu_initialise(v);
+    /* PVH's VPMU is initialized via hypercall */
+    if ( is_hvm_domain(v->domain) )
+        vpmu_initialise(v);
 
     vmx_install_vlapic_mapping(v);
 
diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c
index c0a45cd..5c0f99a 100644
--- a/xen/arch/x86/hvm/vmx/vpmu_core2.c
+++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c
@@ -356,25 +356,45 @@ static int core2_vpmu_alloc_resource(struct vcpu *v)
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
     struct xen_pmu_intel_ctxt *core2_vpmu_cxt = NULL;
     uint64_t *p = NULL;
+    unsigned int regs_size;
 
-    if ( !acquire_pmu_ownership(PMU_OWNER_HVM) )
-        return 0;
-
-    wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0);
-    if ( vmx_add_host_load_msr(MSR_CORE_PERF_GLOBAL_CTRL) )
+    p = xzalloc_bytes(sizeof(uint64_t));
+    if ( !p )
         goto out_err;
 
-    if ( vmx_add_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL) )
-        goto out_err;
-    vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, 0);
-
-    core2_vpmu_cxt = xzalloc_bytes(sizeof(struct xen_pmu_intel_ctxt) +
-                                   sizeof(uint64_t) * fixed_pmc_cnt +
-                                   sizeof(struct xen_pmu_cntr_pair) *
-                                   arch_pmc_cnt);
-    p = xzalloc(uint64_t);
-    if ( !core2_vpmu_cxt || !p )
-        goto out_err;
+    if ( has_hvm_container_domain(v->domain) )
+    {
+        if ( is_hvm_domain(v->domain) && !acquire_pmu_ownership(PMU_OWNER_HVM) )
+            goto out_err;
+
+        wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0);
+        if ( vmx_add_host_load_msr(MSR_CORE_PERF_GLOBAL_CTRL) )
+            goto out_err_hvm;
+        if ( vmx_add_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL) )
+            goto out_err_hvm;
+        vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, 0);
+    }
+
+    regs_size = sizeof(uint64_t) * fixed_pmc_cnt +
+                sizeof(struct xen_pmu_cntr_pair) * arch_pmc_cnt;
+    if ( is_hvm_domain(v->domain) )
+    {
+        core2_vpmu_cxt = xzalloc_bytes(sizeof(struct xen_pmu_intel_ctxt) +
+                                       regs_size);
+        if ( !core2_vpmu_cxt )
+            goto out_err_hvm;
+    }
+    else
+    {
+        if ( sizeof(struct xen_pmu_data) + regs_size > PAGE_SIZE )
+        {
+            printk(XENLOG_WARNING
+                   "Register bank does not fit into VPMU shared page\n");
+            goto out_err_hvm;
+        }
+
+        core2_vpmu_cxt = &v->arch.vpmu.xenpmu_data->pmu.c.intel;
+    }
 
     core2_vpmu_cxt->fixed_counters = sizeof(struct xen_pmu_intel_ctxt);
     core2_vpmu_cxt->arch_counters = core2_vpmu_cxt->fixed_counters +
@@ -387,10 +407,12 @@ static int core2_vpmu_alloc_resource(struct vcpu *v)
 
     return 1;
 
-out_err:
-    release_pmu_ownship(PMU_OWNER_HVM);
-
+ out_err_hvm:
     xfree(core2_vpmu_cxt);
+    if ( is_hvm_domain(v->domain) )
+        release_pmu_ownship(PMU_OWNER_HVM);
+
+ out_err:
     xfree(p);
 
     printk("Failed to allocate VPMU resources for domain %u vcpu %u\n",
@@ -756,6 +778,11 @@ static int core2_vpmu_initialise(struct vcpu *v)
     arch_pmc_cnt = core2_get_arch_pmc_count();
     fixed_pmc_cnt = core2_get_fixed_pmc_count();
     check_pmc_quirk();
+
+    /* PV domains can allocate resources immediately */
+    if ( is_pv_domain(v->domain) && !core2_vpmu_alloc_resource(v) )
+        return -EIO;
+
     return 0;
 }
 
@@ -766,12 +793,20 @@ static void core2_vpmu_destroy(struct vcpu *v)
     if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
         return;
 
-    xfree(vpmu->context);
+    if ( has_hvm_container_domain(v->domain) )
+    {
+        if ( cpu_has_vmx_msr_bitmap )
+            core2_vpmu_unset_msr_bitmap(v->arch.hvm_vmx.msr_bitmap);
+
+        if ( is_hvm_domain(v->domain) )
+            xfree(vpmu->context);
+
+        release_pmu_ownship(PMU_OWNER_HVM);
+    }
+
     xfree(vpmu->priv_context);
-    if ( has_hvm_container_domain(v->domain) && cpu_has_vmx_msr_bitmap )
-        core2_vpmu_unset_msr_bitmap(v->arch.hvm_vmx.msr_bitmap);
-    release_pmu_ownship(PMU_OWNER_HVM);
-    vpmu_reset(vpmu, VPMU_CONTEXT_ALLOCATED);
+    vpmu->context = NULL;
+    vpmu_clear(vpmu);
 }
 
 struct arch_vpmu_ops core2_vpmu_ops = {
diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c
index 5fcee0e..dde3367 100644
--- a/xen/arch/x86/hvm/vpmu.c
+++ b/xen/arch/x86/hvm/vpmu.c
@@ -26,6 +26,7 @@
 #include <asm/regs.h>
 #include <asm/types.h>
 #include <asm/msr.h>
+#include <asm/p2m.h>
 #include <asm/hvm/support.h>
 #include <asm/hvm/vmx/vmx.h>
 #include <asm/hvm/vmx/vmcs.h>
@@ -256,6 +257,7 @@ void vpmu_initialise(struct vcpu *v)
         vpmu_destroy(v);
     vpmu_clear(vpmu);
     vpmu->context = NULL;
+    vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR | APIC_LVT_MASKED;
 
     switch ( vendor )
     {
@@ -282,7 +284,74 @@ void vpmu_destroy(struct vcpu *v)
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
 
     if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->arch_vpmu_destroy )
+    {
+        /* Unload VPMU first. This will stop counters */
+        on_selected_cpus(cpumask_of(vcpu_vpmu(v)->last_pcpu),
+                         vpmu_save_force, v, 1);
+
         vpmu->arch_vpmu_ops->arch_vpmu_destroy(v);
+    }
+}
+
+static int pvpmu_init(struct domain *d, xen_pmu_params_t *params)
+{
+    struct vcpu *v;
+    struct page_info *page;
+    uint64_t gfn = params->val;
+
+    if ( (params->vcpu >= d->max_vcpus) || (d->vcpu == NULL) ||
+         (d->vcpu[params->vcpu] == NULL) )
+        return -EINVAL;
+
+    page = get_page_from_gfn(d, gfn, NULL, P2M_ALLOC);
+    if ( !page )
+        return -EINVAL;
+
+    if ( !get_page_type(page, PGT_writable_page) )
+    {
+        put_page(page);
+        return -EINVAL;
+    }
+
+    v = d->vcpu[params->vcpu];
+    v->arch.vpmu.xenpmu_data = __map_domain_page_global(page);
+    if ( !v->arch.vpmu.xenpmu_data )
+    {
+        put_page_and_type(page);
+        return -EINVAL;
+    }
+
+    vpmu_initialise(v);
+
+    return 0;
+}
+
+static void pvpmu_finish(struct domain *d, xen_pmu_params_t *params)
+{
+    struct vcpu *v;
+    uint64_t mfn;
+
+    if ( (params->vcpu >= d->max_vcpus) || (d->vcpu == NULL) ||
+         (d->vcpu[params->vcpu] == NULL) )
+        return;
+
+    v = d->vcpu[params->vcpu];
+    if ( v != current )
+        vcpu_pause(v);
+
+    if ( v->arch.vpmu.xenpmu_data )
+    {
+        mfn = domain_page_map_to_mfn(v->arch.vpmu.xenpmu_data);
+        if ( mfn_valid(mfn) )
+        {
+            unmap_domain_page_global(v->arch.vpmu.xenpmu_data);
+            put_page_and_type(mfn_to_page(mfn));
+        }
+    }
+    vpmu_destroy(v);
+
+    if ( v != current )
+        vcpu_unpause(v);
 }
 
 /* Dump some vpmu informations on console. Used in keyhandler dump_domains(). */
@@ -460,6 +529,18 @@ long do_xenpmu_op(int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg)
             return -EFAULT;
         break;
 
+    case XENPMU_init:
+        if ( copy_from_guest(&pmu_params, arg, 1) )
+            return -EFAULT;
+        ret = pvpmu_init(current->domain, &pmu_params);
+        break;
+
+    case XENPMU_finish:
+        if ( copy_from_guest(&pmu_params, arg, 1) )
+            return -EFAULT;
+        pvpmu_finish(current->domain, &pmu_params);
+        break;
+
     default:
         ret = -EINVAL;
     }
diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c
index 7d6de54..a991b2d 100644
--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -108,6 +108,7 @@ static int virq_is_global(uint32_t virq)
     case VIRQ_TIMER:
     case VIRQ_DEBUG:
     case VIRQ_XENOPROF:
+    case VIRQ_XENPMU:
         rc = 0;
         break;
     case VIRQ_ARCH_0 ... VIRQ_ARCH_7:
diff --git a/xen/include/asm-x86/hvm/vpmu.h b/xen/include/asm-x86/hvm/vpmu.h
index c612e1a..93f1fc2 100644
--- a/xen/include/asm-x86/hvm/vpmu.h
+++ b/xen/include/asm-x86/hvm/vpmu.h
@@ -62,6 +62,7 @@ struct vpmu_struct {
     void *context;      /* May be shared with PV guest */
     void *priv_context; /* hypervisor-only */
     struct arch_vpmu_ops *arch_vpmu_ops;
+    xen_pmu_data_t *xenpmu_data;
 };
 
 /* VPMU states */
diff --git a/xen/include/public/pmu.h b/xen/include/public/pmu.h
index c2293be..b8c5682 100644
--- a/xen/include/public/pmu.h
+++ b/xen/include/public/pmu.h
@@ -25,6 +25,8 @@
 #define XENPMU_mode_set        1
 #define XENPMU_feature_get     2
 #define XENPMU_feature_set     3
+#define XENPMU_init            4
+#define XENPMU_finish          5
 /* ` } */
 
 /* Parameters structure for HYPERVISOR_xenpmu_op call */
diff --git a/xen/include/public/xen.h b/xen/include/public/xen.h
index 0766790..e4d0b79 100644
--- a/xen/include/public/xen.h
+++ b/xen/include/public/xen.h
@@ -161,6 +161,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_ulong_t);
 #define VIRQ_MEM_EVENT  10 /* G. (DOM0) A memory event has occured           */
 #define VIRQ_XC_RESERVED 11 /* G. Reserved for XenClient                     */
 #define VIRQ_ENOMEM     12 /* G. (DOM0) Low on heap memory       */
+#define VIRQ_XENPMU     13 /* V.  PMC interrupt                              */
 
 /* Architecture-specific VIRQ definitions. */
 #define VIRQ_ARCH_0    16
diff --git a/xen/include/xsm/dummy.h b/xen/include/xsm/dummy.h
index d423c1c..29dae2e 100644
--- a/xen/include/xsm/dummy.h
+++ b/xen/include/xsm/dummy.h
@@ -663,6 +663,9 @@ static XSM_INLINE int xsm_pmu_op (XSM_DEFAULT_ARG struct domain *d, int op)
     case XENPMU_feature_set:
     case XENPMU_feature_get:
         return xsm_default_action(XSM_PRIV, d, current->domain);
+    case XENPMU_init:
+    case XENPMU_finish: 
+        return xsm_default_action(XSM_HOOK, d, current->domain);
     default:
         return -EPERM;
     }
diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
index b437a24..8bd4a3d 100644
--- a/xen/xsm/flask/hooks.c
+++ b/xen/xsm/flask/hooks.c
@@ -1498,6 +1498,10 @@ static int flask_pmu_op (struct domain *d, int op)
     case XENPMU_feature_get:
         return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_XEN2,
                             XEN2__PMU_CTRL, NULL);
+    case XENPMU_init:
+    case XENPMU_finish:
+        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_XEN2,
+                            XEN2__PMU_USE, NULL);
     default:
         return -EPERM;
     }
diff --git a/xen/xsm/flask/policy/access_vectors b/xen/xsm/flask/policy/access_vectors
index 64c7378..36b69c6 100644
--- a/xen/xsm/flask/policy/access_vectors
+++ b/xen/xsm/flask/policy/access_vectors
@@ -83,6 +83,8 @@ class xen2
     get_symbol
 # PMU control
     pmu_ctrl
+# PMU use (anyone has access)
+    pmu_use
 }
 
 # Classes domain and domain2 consist of operations that a domain performs on
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 92+ messages in thread

* [PATCH v12 for-xen-4.5 13/20] x86/VPMU: Save VPMU state for PV guests during context switch
  2014-09-25 19:28 [PATCH v12 for-xen-4.5 00/20] x86/PMU: Xen PMU PV(H) support Boris Ostrovsky
                   ` (11 preceding siblings ...)
  2014-09-25 19:28 ` [PATCH v12 for-xen-4.5 12/20] x86/VPMU: Initialize PMU for PV(H) guests Boris Ostrovsky
@ 2014-09-25 19:28 ` Boris Ostrovsky
  2014-09-29 15:52   ` Jan Beulich
  2014-09-25 19:28 ` [PATCH v12 for-xen-4.5 14/20] x86/VPMU: When handling MSR accesses, leave fault injection to callers Boris Ostrovsky
                   ` (8 subsequent siblings)
  21 siblings, 1 reply; 92+ messages in thread
From: Boris Ostrovsky @ 2014-09-25 19:28 UTC (permalink / raw)
  To: jbeulich, kevin.tian, suravee.suthikulpanit,
	Aravind.Gopalakrishnan, dietmar.hahn, dgdegra
  Cc: keir, andrew.cooper3, tim, xen-devel, jun.nakajima, boris.ostrovsky

Save VPMU state during context switch for both HVM and PV(H) guests.

A subsequent patch ("x86/VPMU: NMI-based VPMU support") will make it possible
for vpmu_switch_to() to call vmx_vmcs_try_enter()->vcpu_pause(), which
needs is_running to be correctly set/cleared. To prepare for that, call
context_saved() before vpmu_switch_to() is executed. (Note that while this
change could have been delayed until that later patch, it is harmless to
existing code and so we do it here.)
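
Condensed for illustration (not a literal quote of the resulting code),
the ordering in context_switch() after this patch is:

    if ( prev != next )
    {
        _update_runstate_area(prev);
        vpmu_switch_from(prev, next);  /* save VPMU, HVM and PV(H) alike */
    }
    /* ... the actual switch, with interrupts disabled ... */
    context_saved(prev);               /* is_running now correct */
    if ( prev != next )
    {
        _update_runstate_area(next);
        vpmu_switch_to(prev, next);    /* load VPMU; interrupts enabled */
    }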

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 xen/arch/x86/domain.c | 20 +++++++++-----------
 1 file changed, 9 insertions(+), 11 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 6a07737..57b3c80 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1498,16 +1498,13 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
     }
 
     if ( prev != next )
-        _update_runstate_area(prev);
-
-    if ( is_hvm_vcpu(prev) )
     {
-        if (prev != next)
-            vpmu_switch_from(prev, next);
+        _update_runstate_area(prev);
+        vpmu_switch_from(prev, next);
+    }
 
-        if ( !list_empty(&prev->arch.hvm_vcpu.tm_list) )
+    if ( is_hvm_vcpu(prev) && !list_empty(&prev->arch.hvm_vcpu.tm_list) )
             pt_save_timer(prev);
-    }
 
     local_irq_disable();
 
@@ -1546,15 +1543,16 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
                            !is_hardware_domain(next->domain));
     }
 
-    if ( is_hvm_vcpu(prev) && (prev != next) )
-        /* Must be done with interrupts enabled */
-        vpmu_switch_to(prev, next);
-
     context_saved(prev);
 
     if ( prev != next )
+    {
         _update_runstate_area(next);
 
+        /* Must be done with interrupts enabled */
+        vpmu_switch_to(prev, next);
+    }
+
     /* Ensure that the vcpu has an up-to-date time base. */
     update_vcpu_system_time(next);
 
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 92+ messages in thread

* [PATCH v12 for-xen-4.5 14/20] x86/VPMU: When handling MSR accesses, leave fault injection to callers
  2014-09-25 19:28 [PATCH v12 for-xen-4.5 00/20] x86/PMU: Xen PMU PV(H) support Boris Ostrovsky
                   ` (12 preceding siblings ...)
  2014-09-25 19:28 ` [PATCH v12 for-xen-4.5 13/20] x86/VPMU: Save VPMU state for PV guests during context switch Boris Ostrovsky
@ 2014-09-25 19:28 ` Boris Ostrovsky
  2014-09-25 19:28 ` [PATCH v12 for-xen-4.5 15/20] x86/VPMU: Add support for PMU register handling on PV guests Boris Ostrovsky
                   ` (7 subsequent siblings)
  21 siblings, 0 replies; 92+ messages in thread
From: Boris Ostrovsky @ 2014-09-25 19:28 UTC (permalink / raw)
  To: jbeulich, kevin.tian, suravee.suthikulpanit,
	Aravind.Gopalakrishnan, dietmar.hahn, dgdegra
  Cc: keir, andrew.cooper3, tim, xen-devel, jun.nakajima, boris.ostrovsky

With this patch, a return value of 1 from vpmu_do_msr() indicates that an
error was encountered during MSR processing (instead of indicating that
the access was to a VPMU register).

As part of this patch we also check the validity of certain MSR accesses
right when we determine which register is being written, as opposed to
postponing the checks until later.
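
At the call sites this yields the pattern below (drawn from the hunks
that follow):

    /* vpmu_do_{rd,wr}msr() now returns 0 on success and 1 on error, so
     * fault injection becomes the caller's decision. */
    if ( vpmu_do_wrmsr(msr, msr_content, 0) )
        goto gpf;  /* inject #GP(0) at the intercept, not inside VPMU */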

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
---
 xen/arch/x86/hvm/svm/svm.c        |  6 ++-
 xen/arch/x86/hvm/svm/vpmu.c       |  6 +--
 xen/arch/x86/hvm/vmx/vmx.c        | 24 +++++++++---
 xen/arch/x86/hvm/vmx/vpmu_core2.c | 78 ++++++++++++++-------------------------
 4 files changed, 53 insertions(+), 61 deletions(-)

diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index 319e5da..d6278f3 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -1653,7 +1653,8 @@ static int svm_msr_read_intercept(unsigned int msr, uint64_t *msr_content)
     case MSR_AMD_FAM15H_EVNTSEL3:
     case MSR_AMD_FAM15H_EVNTSEL4:
     case MSR_AMD_FAM15H_EVNTSEL5:
-        vpmu_do_rdmsr(msr, msr_content);
+        if ( vpmu_do_rdmsr(msr, msr_content) )
+            goto gpf;
         break;
 
     case MSR_AMD64_DR0_ADDRESS_MASK:
@@ -1804,7 +1805,8 @@ static int svm_msr_write_intercept(unsigned int msr, uint64_t msr_content)
     case MSR_AMD_FAM15H_EVNTSEL3:
     case MSR_AMD_FAM15H_EVNTSEL4:
     case MSR_AMD_FAM15H_EVNTSEL5:
-        vpmu_do_wrmsr(msr, msr_content, 0);
+        if ( vpmu_do_wrmsr(msr, msr_content, 0) )
+            goto gpf;
         break;
 
     case MSR_IA32_MCx_MISC(4): /* Threshold register */
diff --git a/xen/arch/x86/hvm/svm/vpmu.c b/xen/arch/x86/hvm/svm/vpmu.c
index be3ab27..63c099c 100644
--- a/xen/arch/x86/hvm/svm/vpmu.c
+++ b/xen/arch/x86/hvm/svm/vpmu.c
@@ -306,7 +306,7 @@ static int amd_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content,
         is_pmu_enabled(msr_content) && !vpmu_is_set(vpmu, VPMU_RUNNING) )
     {
         if ( !acquire_pmu_ownership(PMU_OWNER_HVM) )
-            return 1;
+            return 0;
         vpmu_set(vpmu, VPMU_RUNNING);
 
         if ( has_hvm_container_domain(v->domain) && is_msr_bitmap_on(vpmu) )
@@ -336,7 +336,7 @@ static int amd_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content,
 
     /* Write to hw counters */
     wrmsrl(msr, msr_content);
-    return 1;
+    return 0;
 }
 
 static int amd_vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
@@ -354,7 +354,7 @@ static int amd_vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
 
     rdmsrl(msr, *msr_content);
 
-    return 1;
+    return 0;
 }
 
 static int amd_vpmu_initialise(struct vcpu *v)
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index bebe879..da497e4 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -2088,12 +2088,17 @@ static int vmx_msr_read_intercept(unsigned int msr, uint64_t *msr_content)
         *msr_content |= MSR_IA32_MISC_ENABLE_BTS_UNAVAIL |
                        MSR_IA32_MISC_ENABLE_PEBS_UNAVAIL;
         /* Perhaps vpmu will change some bits. */
+        /* FALLTHROUGH */
+    case MSR_P6_PERFCTR(0)...MSR_P6_PERFCTR(7):
+    case MSR_P6_EVNTSEL(0)...MSR_P6_EVNTSEL(3):
+    case MSR_CORE_PERF_FIXED_CTR0...MSR_CORE_PERF_FIXED_CTR2:
+    case MSR_CORE_PERF_FIXED_CTR_CTRL...MSR_CORE_PERF_GLOBAL_OVF_CTRL:
+    case MSR_IA32_PEBS_ENABLE:
+    case MSR_IA32_DS_AREA:
         if ( vpmu_do_rdmsr(msr, msr_content) )
-            goto done;
+            goto gp_fault;
         break;
     default:
-        if ( vpmu_do_rdmsr(msr, msr_content) )
-            break;
         if ( passive_domain_do_rdmsr(msr, msr_content) )
             goto done;
         switch ( long_mode_do_msr_read(msr, msr_content) )
@@ -2265,7 +2270,7 @@ static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content)
         if ( msr_content & ~supported )
         {
             /* Perhaps some other bits are supported in vpmu. */
-            if ( !vpmu_do_wrmsr(msr, msr_content, supported) )
+            if ( vpmu_do_wrmsr(msr, msr_content, supported) )
                 break;
         }
         if ( msr_content & IA32_DEBUGCTLMSR_LBR )
@@ -2293,9 +2298,16 @@ static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content)
         if ( !nvmx_msr_write_intercept(msr, msr_content) )
             goto gp_fault;
         break;
+    case MSR_P6_PERFCTR(0)...MSR_P6_PERFCTR(7):
+    case MSR_P6_EVNTSEL(0)...MSR_P6_EVNTSEL(7):
+    case MSR_CORE_PERF_FIXED_CTR0...MSR_CORE_PERF_FIXED_CTR2:
+    case MSR_CORE_PERF_FIXED_CTR_CTRL...MSR_CORE_PERF_GLOBAL_OVF_CTRL:
+    case MSR_IA32_PEBS_ENABLE:
+    case MSR_IA32_DS_AREA:
+         if ( vpmu_do_wrmsr(msr, msr_content, 0) )
+            goto gp_fault;
+        break;
     default:
-        if ( vpmu_do_wrmsr(msr, msr_content, 0) )
-            return X86EMUL_OKAY;
         if ( passive_domain_do_wrmsr(msr, msr_content) )
             return X86EMUL_OKAY;
 
diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c
index 5c0f99a..1f21297 100644
--- a/xen/arch/x86/hvm/vmx/vpmu_core2.c
+++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c
@@ -468,36 +468,41 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content,
                              IA32_DEBUGCTLMSR_BTS_OFF_USR;
             if ( !(msr_content & ~supported) &&
                  vpmu_is_set(vpmu, VPMU_CPU_HAS_BTS) )
-                return 1;
+                return 0;
             if ( (msr_content & supported) &&
                  !vpmu_is_set(vpmu, VPMU_CPU_HAS_BTS) )
                 printk(XENLOG_G_WARNING
                        "%pv: Debug Store unsupported on this CPU\n",
                        current);
         }
-        return 0;
+        return 1;
     }
 
     ASSERT(!supported);
 
+    if ( type == MSR_TYPE_COUNTER &&
+         (msr_content &
+          ~((1ull << core2_get_bitwidth_fix_count()) - 1)) )
+        /* Writing unsupported bits to a fixed counter */
+        return 1;
+
     core2_vpmu_cxt = vpmu->context;
     enabled_cntrs = vpmu->priv_context;
     switch ( msr )
     {
     case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
         core2_vpmu_cxt->global_status &= ~msr_content;
-        return 1;
+        return 0;
     case MSR_CORE_PERF_GLOBAL_STATUS:
         gdprintk(XENLOG_INFO, "Can not write readonly MSR: "
                  "MSR_PERF_GLOBAL_STATUS(0x38E)!\n");
-        hvm_inject_hw_exception(TRAP_gp_fault, 0);
         return 1;
     case MSR_IA32_PEBS_ENABLE:
         if ( msr_content & 1 )
             gdprintk(XENLOG_WARNING, "Guest is trying to enable PEBS, "
                      "which is not supported.\n");
         core2_vpmu_cxt->pebs_enable = msr_content;
-        return 1;
+        return 0;
     case MSR_IA32_DS_AREA:
         if ( vpmu_is_set(vpmu, VPMU_CPU_HAS_DS) )
         {
@@ -506,18 +511,21 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content,
                 gdprintk(XENLOG_WARNING,
                          "Illegal address for IA32_DS_AREA: %#" PRIx64 "x\n",
                          msr_content);
-                hvm_inject_hw_exception(TRAP_gp_fault, 0);
                 return 1;
             }
             core2_vpmu_cxt->ds_area = msr_content;
             break;
         }
         gdprintk(XENLOG_WARNING, "Guest setting of DTS is ignored.\n");
-        return 1;
+        return 0;
     case MSR_CORE_PERF_GLOBAL_CTRL:
         global_ctrl = msr_content;
         break;
     case MSR_CORE_PERF_FIXED_CTR_CTRL:
+        if ( msr_content &
+             ( ~((1ull << (fixed_pmc_cnt * FIXED_CTR_CTRL_BITS)) - 1)) )
+            return 1;
+
         vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, &global_ctrl);
         *enabled_cntrs &= ~(((1ULL << fixed_pmc_cnt) - 1) << 32);
         if ( msr_content != 0 )
@@ -540,6 +548,9 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content,
             struct xen_pmu_cntr_pair *xen_pmu_cntr_pair =
                 vpmu_reg_pointer(core2_vpmu_cxt, arch_counters);
 
+            if ( msr_content & (~((1ull << 32) - 1)) )
+                return 1;
+
             vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, &global_ctrl);
 
             if ( msr_content & (1ULL << 22) )
@@ -551,45 +562,17 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content,
         }
     }
 
+    if ( type != MSR_TYPE_GLOBAL )
+        wrmsrl(msr, msr_content);
+    else
+        vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, msr_content);
+
     if ( (global_ctrl & *enabled_cntrs) || (core2_vpmu_cxt->ds_area != 0) )
         vpmu_set(vpmu, VPMU_RUNNING);
     else
         vpmu_reset(vpmu, VPMU_RUNNING);
 
-    if ( type != MSR_TYPE_GLOBAL )
-    {
-        u64 mask;
-        int inject_gp = 0;
-        switch ( type )
-        {
-        case MSR_TYPE_ARCH_CTRL:      /* MSR_P6_EVNTSEL[0,...] */
-            mask = ~((1ull << 32) - 1);
-            if (msr_content & mask)
-                inject_gp = 1;
-            break;
-        case MSR_TYPE_CTRL:           /* IA32_FIXED_CTR_CTRL */
-            if  ( msr == MSR_IA32_DS_AREA )
-                break;
-            /* 4 bits per counter, currently 3 fixed counters implemented. */
-            mask = ~((1ull << (fixed_pmc_cnt * FIXED_CTR_CTRL_BITS)) - 1);
-            if (msr_content & mask)
-                inject_gp = 1;
-            break;
-        case MSR_TYPE_COUNTER:        /* IA32_FIXED_CTR[0-2] */
-            mask = ~((1ull << core2_get_bitwidth_fix_count()) - 1);
-            if (msr_content & mask)
-                inject_gp = 1;
-            break;
-        }
-        if (inject_gp)
-            hvm_inject_hw_exception(TRAP_gp_fault, 0);
-        else
-            wrmsrl(msr, msr_content);
-    }
-    else
-        vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, msr_content);
-
-    return 1;
+    return 0;
 }
 
 static int core2_vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
@@ -617,19 +600,14 @@ static int core2_vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
             rdmsrl(msr, *msr_content);
         }
     }
-    else
+    else if ( msr == MSR_IA32_MISC_ENABLE )
     {
         /* Extension for BTS */
-        if ( msr == MSR_IA32_MISC_ENABLE )
-        {
-            if ( vpmu_is_set(vpmu, VPMU_CPU_HAS_BTS) )
-                *msr_content &= ~MSR_IA32_MISC_ENABLE_BTS_UNAVAIL;
-        }
-        else
-            return 0;
+        if ( vpmu_is_set(vpmu, VPMU_CPU_HAS_BTS) )
+            *msr_content &= ~MSR_IA32_MISC_ENABLE_BTS_UNAVAIL;
     }
 
-    return 1;
+    return 0;
 }
 
 static void core2_vpmu_do_cpuid(unsigned int input,
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 92+ messages in thread

* [PATCH v12 for-xen-4.5 15/20] x86/VPMU: Add support for PMU register handling on PV guests
  2014-09-25 19:28 [PATCH v12 for-xen-4.5 00/20] x86/PMU: Xen PMU PV(H) support Boris Ostrovsky
                   ` (13 preceding siblings ...)
  2014-09-25 19:28 ` [PATCH v12 for-xen-4.5 14/20] x86/VPMU: When handling MSR accesses, leave fault injection to callers Boris Ostrovsky
@ 2014-09-25 19:28 ` Boris Ostrovsky
  2014-09-26 16:34   ` Konrad Rzeszutek Wilk
                     ` (2 more replies)
  2014-09-25 19:28 ` [PATCH v12 for-xen-4.5 16/20] x86/VPMU: Handle PMU interrupts for " Boris Ostrovsky
                   ` (6 subsequent siblings)
  21 siblings, 3 replies; 92+ messages in thread
From: Boris Ostrovsky @ 2014-09-25 19:28 UTC (permalink / raw)
  To: jbeulich, kevin.tian, suravee.suthikulpanit,
	Aravind.Gopalakrishnan, dietmar.hahn, dgdegra
  Cc: keir, andrew.cooper3, tim, xen-devel, jun.nakajima, boris.ostrovsky

Intercept accesses to PMU MSRs and process them in the VPMU module.

Dump VPMU state for all domains (HVM and PV) when requested.
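
For the PV side (hooked into traps.c, per the diffstat below), the shape
of the forwarding can be sketched with a hypothetical helper:

    /* Sketch only: forward a PV guest's PMU MSR write to the VPMU and
     * let the caller inject #GP(0) on failure, mirroring the HVM path. */
    static int pv_pmu_wrmsr(unsigned int msr, uint64_t val)
    {
        return vpmu_do_wrmsr(msr, val, 0) ? -EINVAL : 0;
    }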

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
 xen/arch/x86/domain.c             |  3 +--
 xen/arch/x86/hvm/vmx/vpmu_core2.c | 49 ++++++++++++++++++++++++++++++++------
 xen/arch/x86/hvm/vpmu.c           |  7 ++++++
 xen/arch/x86/traps.c              | 50 +++++++++++++++++++++++++++++++++++++--
 xen/include/public/pmu.h          |  1 +
 5 files changed, 99 insertions(+), 11 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 57b3c80..0388913 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -2030,8 +2030,7 @@ void arch_dump_vcpu_info(struct vcpu *v)
 {
     paging_dump_vcpu_info(v);
 
-    if ( is_hvm_vcpu(v) )
-        vpmu_dump(v);
+    vpmu_dump(v);
 }
 
 void domain_cpuid(
diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c
index 1f21297..0f605bd 100644
--- a/xen/arch/x86/hvm/vmx/vpmu_core2.c
+++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c
@@ -27,6 +27,7 @@
 #include <asm/regs.h>
 #include <asm/types.h>
 #include <asm/apic.h>
+#include <asm/traps.h>
 #include <asm/msr.h>
 #include <asm/msr-index.h>
 #include <asm/hvm/support.h>
@@ -294,12 +295,18 @@ static inline void __core2_vpmu_save(struct vcpu *v)
         rdmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, fixed_counters[i]);
     for ( i = 0; i < arch_pmc_cnt; i++ )
         rdmsrl(MSR_IA32_PERFCTR0 + i, xen_pmu_cntr_pair[i].counter);
+
+    if ( !has_hvm_container_domain(v->domain) )
+        rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, core2_vpmu_cxt->global_status);
 }
 
 static int core2_vpmu_save(struct vcpu *v)
 {
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
 
+    if ( !has_hvm_container_domain(v->domain) )
+        wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0);
+
     if ( !vpmu_are_all_set(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED) )
         return 0;
 
@@ -337,6 +344,13 @@ static inline void __core2_vpmu_load(struct vcpu *v)
     wrmsrl(MSR_CORE_PERF_FIXED_CTR_CTRL, core2_vpmu_cxt->fixed_ctrl);
     wrmsrl(MSR_IA32_DS_AREA, core2_vpmu_cxt->ds_area);
     wrmsrl(MSR_IA32_PEBS_ENABLE, core2_vpmu_cxt->pebs_enable);
+
+    if ( !has_hvm_container_domain(v->domain) )
+    {
+        wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, core2_vpmu_cxt->global_ovf_ctrl);
+        core2_vpmu_cxt->global_ovf_ctrl = 0;
+        wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, core2_vpmu_cxt->global_ctrl);
+    }
 }
 
 static void core2_vpmu_load(struct vcpu *v)
@@ -447,7 +461,6 @@ static int core2_vpmu_msr_common_check(u32 msr_index, int *type, int *index)
 static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content,
                                uint64_t supported)
 {
-    u64 global_ctrl;
     int i, tmp;
     int type = -1, index = -1;
     struct vcpu *v = current;
@@ -491,7 +504,12 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content,
     switch ( msr )
     {
     case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
+        if ( msr_content & ~(0xC000000000000000 |
+                             (((1ULL << fixed_pmc_cnt) - 1) << 32) |
+                             ((1ULL << arch_pmc_cnt) - 1)) )
+            return 1;
         core2_vpmu_cxt->global_status &= ~msr_content;
+        wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, msr_content);
         return 0;
     case MSR_CORE_PERF_GLOBAL_STATUS:
         gdprintk(XENLOG_INFO, "Can not write readonly MSR: "
@@ -519,14 +537,18 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content,
         gdprintk(XENLOG_WARNING, "Guest setting of DTS is ignored.\n");
         return 0;
     case MSR_CORE_PERF_GLOBAL_CTRL:
-        global_ctrl = msr_content;
+        core2_vpmu_cxt->global_ctrl = msr_content;
         break;
     case MSR_CORE_PERF_FIXED_CTR_CTRL:
         if ( msr_content &
              ( ~((1ull << (fixed_pmc_cnt * FIXED_CTR_CTRL_BITS)) - 1)) )
             return 1;
 
-        vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, &global_ctrl);
+        if ( has_hvm_container_domain(v->domain) )
+            vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL,
+                               &core2_vpmu_cxt->global_ctrl);
+        else
+            rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, core2_vpmu_cxt->global_ctrl);
         *enabled_cntrs &= ~(((1ULL << fixed_pmc_cnt) - 1) << 32);
         if ( msr_content != 0 )
         {
@@ -551,7 +573,11 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content,
             if ( msr_content & (~((1ull << 32) - 1)) )
                 return 1;
 
-            vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, &global_ctrl);
+            if ( has_hvm_container_domain(v->domain) )
+                vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL,
+                                   &core2_vpmu_cxt->global_ctrl);
+            else
+                rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, core2_vpmu_cxt->global_ctrl);
 
             if ( msr_content & (1ULL << 22) )
                 *enabled_cntrs |= 1ULL << tmp;
@@ -565,9 +591,15 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content,
     if ( type != MSR_TYPE_GLOBAL )
         wrmsrl(msr, msr_content);
     else
-        vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, msr_content);
+    {
+        if ( has_hvm_container_domain(v->domain) )
+            vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, msr_content);
+        else
+            wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, msr_content);
+    }
 
-    if ( (global_ctrl & *enabled_cntrs) || (core2_vpmu_cxt->ds_area != 0) )
+    if ( (core2_vpmu_cxt->global_ctrl & *enabled_cntrs) ||
+         (core2_vpmu_cxt->ds_area != 0) )
         vpmu_set(vpmu, VPMU_RUNNING);
     else
         vpmu_reset(vpmu, VPMU_RUNNING);
@@ -594,7 +626,10 @@ static int core2_vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
             *msr_content = core2_vpmu_cxt->global_status;
             break;
         case MSR_CORE_PERF_GLOBAL_CTRL:
-            vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, msr_content);
+            if ( has_hvm_container_domain(v->domain) )
+                vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, msr_content);
+            else
+                rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, *msr_content);
             break;
         default:
             rdmsrl(msr, *msr_content);
diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c
index dde3367..542e23e 100644
--- a/xen/arch/x86/hvm/vpmu.c
+++ b/xen/arch/x86/hvm/vpmu.c
@@ -501,6 +501,13 @@ long do_xenpmu_op(int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg)
 
         spin_unlock(&xenpmu_mode_lock);
         break;
+
+    case XENPMU_lvtpc_set:
+        if ( current->arch.vpmu.xenpmu_data == NULL )
+            return -EINVAL;
+        vpmu_lvtpc_update(current->arch.vpmu.xenpmu_data->pmu.l.lapic_lvtpc);
+        ret = 0;
+        break;
     }
 
     case XENPMU_mode_get:
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 10fc2ca..cc70514 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -72,6 +72,7 @@
 #include <asm/apic.h>
 #include <asm/mc146818rtc.h>
 #include <asm/hpet.h>
+#include <asm/hvm/vpmu.h>
 #include <public/arch-x86/cpuid.h>
 #include <xsm/xsm.h>
 
@@ -896,8 +897,10 @@ void pv_cpuid(struct cpu_user_regs *regs)
         __clear_bit(X86_FEATURE_TOPOEXT % 32, &c);
         break;
 
+    case 0x0000000a: /* Architectural Performance Monitor Features (Intel) */
+        break;
+
     case 0x00000005: /* MONITOR/MWAIT */
-    case 0x0000000a: /* Architectural Performance Monitor Features */
     case 0x0000000b: /* Extended Topology Enumeration */
     case 0x8000000a: /* SVM revision and features */
     case 0x8000001b: /* Instruction Based Sampling */
@@ -913,6 +916,9 @@ void pv_cpuid(struct cpu_user_regs *regs)
     }
 
  out:
+    /* VPMU may decide to modify some of the leaves */
+    vpmu_do_cpuid(regs->eax, &a, &b, &c, &d);
+
     regs->eax = a;
     regs->ebx = b;
     regs->ecx = c;
@@ -1935,6 +1941,7 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
     char io_emul_stub[32];
     void (*io_emul)(struct cpu_user_regs *) __attribute__((__regparm__(1)));
     uint64_t val, msr_content;
+    bool_t vpmu_msr;
 
     if ( !read_descriptor(regs->cs, v, regs,
                           &code_base, &code_limit, &ar,
@@ -2425,6 +2432,7 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
         uint32_t eax = regs->eax;
         uint32_t edx = regs->edx;
         msr_content = ((uint64_t)edx << 32) | eax;
+        vpmu_msr = 0;
         switch ( (u32)regs->ecx )
         {
         case MSR_FS_BASE:
@@ -2561,7 +2569,22 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
             if ( v->arch.debugreg[7] & DR7_ACTIVE_MASK )
                 wrmsrl(regs->_ecx, msr_content);
             break;
-
+        case MSR_P6_PERFCTR(0)...MSR_P6_PERFCTR(7):
+        case MSR_P6_EVNTSEL(0)...MSR_P6_EVNTSEL(3):
+        case MSR_CORE_PERF_FIXED_CTR0...MSR_CORE_PERF_FIXED_CTR2:
+        case MSR_CORE_PERF_FIXED_CTR_CTRL...MSR_CORE_PERF_GLOBAL_OVF_CTRL:
+            if ( boot_cpu_data.x86_vendor == X86_VENDOR_INTEL )
+            {
+                vpmu_msr = 1;
+        case MSR_AMD_FAM15H_EVNTSEL0...MSR_AMD_FAM15H_PERFCTR5:
+                if ( vpmu_msr || (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) )
+                {
+                    if ( vpmu_do_wrmsr(regs->ecx, msr_content, 0) )
+                        goto fail;
+                }
+                break;
+            }
+            /*FALLTHROUGH*/
         default:
             if ( wrmsr_hypervisor_regs(regs->ecx, msr_content) == 1 )
                 break;
@@ -2593,6 +2616,7 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
         break;
 
     case 0x32: /* RDMSR */
+        vpmu_msr = 0;
         switch ( (u32)regs->ecx )
         {
         case MSR_FS_BASE:
@@ -2663,7 +2687,29 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
                             [regs->_ecx - MSR_AMD64_DR1_ADDRESS_MASK + 1];
             regs->edx = 0;
             break;
+        case MSR_IA32_PERF_CAPABILITIES:
+            /* No extra capabilities are supported */
+            regs->eax = regs->edx = 0;
+            break;
+        case MSR_P6_PERFCTR(0)...MSR_P6_PERFCTR(7):
+        case MSR_P6_EVNTSEL(0)...MSR_P6_EVNTSEL(3):
+        case MSR_CORE_PERF_FIXED_CTR0...MSR_CORE_PERF_FIXED_CTR2:
+        case MSR_CORE_PERF_FIXED_CTR_CTRL...MSR_CORE_PERF_GLOBAL_OVF_CTRL:
+            if ( boot_cpu_data.x86_vendor == X86_VENDOR_INTEL )
+            {
+                vpmu_msr = 1;
+        case MSR_AMD_FAM15H_EVNTSEL0...MSR_AMD_FAM15H_PERFCTR5:
+                if ( vpmu_msr || (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) )
+                {
+                    if ( vpmu_do_rdmsr(regs->ecx, &msr_content) )
+                        goto fail;
 
+                    regs->eax = (uint32_t)msr_content;
+                    regs->edx = (uint32_t)(msr_content >> 32);
+                }
+                break;
+            }
+            /*FALLTHROUGH*/
         default:
             if ( rdmsr_hypervisor_regs(regs->ecx, &val) )
             {
diff --git a/xen/include/public/pmu.h b/xen/include/public/pmu.h
index b8c5682..68a5fb8 100644
--- a/xen/include/public/pmu.h
+++ b/xen/include/public/pmu.h
@@ -27,6 +27,7 @@
 #define XENPMU_feature_set     3
 #define XENPMU_init            4
 #define XENPMU_finish          5
+#define XENPMU_lvtpc_set       6
 /* ` } */
 
 /* Parameters structure for HYPERVISOR_xenpmu_op call */
-- 
1.8.1.4


* [PATCH v12 for-xen-4.5 16/20] x86/VPMU: Handle PMU interrupts for PV guests
  2014-09-25 19:28 [PATCH v12 for-xen-4.5 00/20] x86/PMU: Xen PMU PV(H) support Boris Ostrovsky
                   ` (14 preceding siblings ...)
  2014-09-25 19:28 ` [PATCH v12 for-xen-4.5 15/20] x86/VPMU: Add support for PMU register handling on PV guests Boris Ostrovsky
@ 2014-09-25 19:28 ` Boris Ostrovsky
  2014-09-26 22:09   ` Daniel De Graaf
  2014-09-30  8:11   ` Jan Beulich
  2014-09-25 19:28 ` [PATCH v12 for-xen-4.5 17/20] x86/VPMU: Merge vpmu_rdmsr and vpmu_wrmsr Boris Ostrovsky
                   ` (5 subsequent siblings)
  21 siblings, 2 replies; 92+ messages in thread
From: Boris Ostrovsky @ 2014-09-25 19:28 UTC (permalink / raw)
  To: jbeulich, kevin.tian, suravee.suthikulpanit,
	Aravind.Gopalakrishnan, dietmar.hahn, dgdegra
  Cc: keir, andrew.cooper3, tim, xen-devel, jun.nakajima, boris.ostrovsky

Add support for handling PMU interrupts for PV guests.

The VPMU for the interrupted VCPU is unloaded until the guest issues the
XENPMU_flush hypercall. This allows the guest to access PMU MSR values that
are stored in the VPMU context, which is shared between the hypervisor and the
domain, thus avoiding traps to the hypervisor.
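
As an illustration of the intended protocol, a PV guest's VIRQ_XENPMU handler
could look roughly like the sketch below. The guest-side names
(HYPERVISOR_xenpmu_op(), this_cpu_pmu_page, record_sample()) are hypothetical;
only XENPMU_flush, the PMU_CACHED flag and the shared xen_pmu_data layout come
from this series:

    /* Illustrative PV-guest VIRQ_XENPMU handler; helper names are made up. */
    static void xenpmu_virq_handler(void)
    {
        /* Page shared with Xen via XENPMU_init. */
        struct xen_pmu_data *pmu = this_cpu_pmu_page;

        /*
         * While PMU_CACHED is set Xen keeps the VPMU unloaded, so the
         * sampled registers and PMU MSR state can be read from the
         * shared page without trapping into the hypervisor.
         */
        record_sample(pmu->domain_id, pmu->vcpu_id, &pmu->pmu.r.regs);

        /* Clear PMU_CACHED and have Xen reload the VPMU context. */
        HYPERVISOR_xenpmu_op(XENPMU_flush, NULL);
    }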

Since the interrupt handler may now force a VPMU context save (i.e. set the
VPMU_CONTEXT_SAVE flag), we need to change amd_vpmu_save(), which until now
expected this flag to be set only when the counters were already stopped.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
 xen/arch/x86/hvm/svm/vpmu.c |  11 +--
 xen/arch/x86/hvm/vpmu.c     | 197 +++++++++++++++++++++++++++++++++++++++-----
 xen/include/public/pmu.h    |   7 ++
 xen/include/xsm/dummy.h     |   4 +-
 xen/xsm/flask/hooks.c       |   2 +
 5 files changed, 192 insertions(+), 29 deletions(-)

diff --git a/xen/arch/x86/hvm/svm/vpmu.c b/xen/arch/x86/hvm/svm/vpmu.c
index 63c099c..055b21c 100644
--- a/xen/arch/x86/hvm/svm/vpmu.c
+++ b/xen/arch/x86/hvm/svm/vpmu.c
@@ -229,17 +229,12 @@ static int amd_vpmu_save(struct vcpu *v)
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
     unsigned int i;
 
-    /*
-     * Stop the counters. If we came here via vpmu_save_force (i.e.
-     * when VPMU_CONTEXT_SAVE is set) counters are already stopped.
-     */
+    for ( i = 0; i < num_counters; i++ )
+        wrmsrl(ctrls[i], 0);
+
     if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_SAVE) )
     {
         vpmu_set(vpmu, VPMU_FROZEN);
-
-        for ( i = 0; i < num_counters; i++ )
-            wrmsrl(ctrls[i], 0);
-
         return 0;
     }
 
diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c
index 542e23e..6a28729 100644
--- a/xen/arch/x86/hvm/vpmu.c
+++ b/xen/arch/x86/hvm/vpmu.c
@@ -80,44 +80,191 @@ static void __init parse_vpmu_param(char *s)
 
 void vpmu_lvtpc_update(uint32_t val)
 {
-    struct vpmu_struct *vpmu = vcpu_vpmu(current);
+    struct vcpu *curr = current;
+    struct vpmu_struct *vpmu = vcpu_vpmu(curr);
 
     vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR | (val & APIC_LVT_MASKED);
-    apic_write(APIC_LVTPC, vpmu->hw_lapic_lvtpc);
+
+    /* Postpone APIC updates for PV(H) guests if PMU interrupt is pending */
+    if ( is_hvm_domain(curr->domain) ||
+         !(vpmu->xenpmu_data && (vpmu->xenpmu_data->pmu_flags & PMU_CACHED)) )
+        apic_write(APIC_LVTPC, vpmu->hw_lapic_lvtpc);
 }
 
 int vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content, uint64_t supported)
 {
-    struct vpmu_struct *vpmu = vcpu_vpmu(current);
+    struct vcpu *curr = current;
+    struct vpmu_struct *vpmu = vcpu_vpmu(curr);
 
     if ( !(vpmu_mode & (XENPMU_MODE_SELF | XENPMU_MODE_HV)) )
         return 0;
 
     if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->do_wrmsr )
-        return vpmu->arch_vpmu_ops->do_wrmsr(msr, msr_content, supported);
+    {
+        int ret = vpmu->arch_vpmu_ops->do_wrmsr(msr, msr_content, supported);
+
+        /*
+         * We may have received a PMU interrupt during WRMSR handling
+         * and since do_wrmsr may load VPMU context we should save
+         * (and unload) it again.
+         */
+        if ( !is_hvm_domain(curr->domain) &&
+             vpmu->xenpmu_data && (vpmu->xenpmu_data->pmu_flags & PMU_CACHED) )
+        {
+            vpmu_set(vpmu, VPMU_CONTEXT_SAVE);
+            vpmu->arch_vpmu_ops->arch_vpmu_save(curr);
+            vpmu_reset(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED);
+        }
+        return ret;
+    }
     return 0;
 }
 
 int vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
 {
-    struct vpmu_struct *vpmu = vcpu_vpmu(current);
+    struct vcpu *curr = current;
+    struct vpmu_struct *vpmu = vcpu_vpmu(curr);
 
     if ( !(vpmu_mode & (XENPMU_MODE_SELF | XENPMU_MODE_HV)) )
         return 0;
 
     if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->do_rdmsr )
-        return vpmu->arch_vpmu_ops->do_rdmsr(msr, msr_content);
+    {
+        int ret = vpmu->arch_vpmu_ops->do_rdmsr(msr, msr_content);
+
+        if ( !is_hvm_domain(curr->domain) &&
+             vpmu->xenpmu_data && (vpmu->xenpmu_data->pmu_flags & PMU_CACHED) )
+        {
+            vpmu_set(vpmu, VPMU_CONTEXT_SAVE);
+            vpmu->arch_vpmu_ops->arch_vpmu_save(curr);
+            vpmu_reset(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED);
+        }
+        return ret;
+    }
     return 0;
 }
 
+static struct vcpu *choose_hwdom_vcpu(void)
+{
+    struct vcpu *v;
+    unsigned idx = smp_processor_id() % hardware_domain->max_vcpus;
+
+    if ( hardware_domain->vcpu == NULL )
+        return NULL;
+
+    v = hardware_domain->vcpu[idx];
+
+    /*
+     * If index is not populated search downwards the vcpu array until
+     * a valid vcpu can be found
+     */
+    while ( !v && idx-- )
+        v = hardware_domain->vcpu[idx];
+
+    return v;
+}
+
 int vpmu_do_interrupt(struct cpu_user_regs *regs)
 {
-    struct vcpu *v = current;
-    struct vpmu_struct *vpmu = vcpu_vpmu(v);
+    struct vcpu *sampled = current, *sampling;
+    struct vpmu_struct *vpmu;
+
+    /* dom0 will handle interrupt for special domains (e.g. idle domain) */
+    if ( sampled->domain->domain_id >= DOMID_FIRST_RESERVED )
+    {
+        sampling = choose_hwdom_vcpu();
+        if ( !sampling )
+            return 0;
+    }
+    else
+        sampling = sampled;
+
+    vpmu = vcpu_vpmu(sampling);
+    if ( !is_hvm_domain(sampling->domain) )
+    {
+        /* PV(H) guest */
+        const struct cpu_user_regs *cur_regs;
+
+        if ( !vpmu->xenpmu_data )
+            return 0;
+
+        if ( vpmu->xenpmu_data->pmu_flags & PMU_CACHED )
+            return 1;
+
+        if ( is_pvh_domain(sampled->domain) &&
+             !vpmu->arch_vpmu_ops->do_interrupt(regs) )
+            return 0;
+
+        /* PV guest will be reading PMU MSRs from xenpmu_data */
+        vpmu_set(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED);
+        vpmu->arch_vpmu_ops->arch_vpmu_save(sampling);
+        vpmu_reset(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED);
+
+        /* Store appropriate registers in xenpmu_data */
+        if ( is_pv_32bit_domain(sampling->domain) )
+        {
+            /*
+             * 32-bit dom0 cannot process Xen's addresses (which are 64 bit)
+             * and therefore we treat it the same way as a non-privileged
+             * PV 32-bit domain.
+             */
+            struct compat_pmu_regs *cmp;
+
+            cur_regs = guest_cpu_user_regs();
+
+            cmp = (void *)&vpmu->xenpmu_data->pmu.r.regs;
+            cmp->eip = cur_regs->rip;
+            cmp->esp = cur_regs->rsp;
+            cmp->cs = cur_regs->cs;
+            if ( (cmp->cs & 3) == 1 )
+                cmp->cs &= ~3;
+        }
+        else
+        {
+            struct xen_pmu_regs *r = &vpmu->xenpmu_data->pmu.r.regs;
+
+            /* Non-privileged domains are always in XENPMU_MODE_SELF mode */
+            if ( (vpmu_mode & XENPMU_MODE_SELF) ||
+                 (!is_hardware_domain(sampled->domain) &&
+                  !is_idle_vcpu(sampled)) )
+                cur_regs = guest_cpu_user_regs();
+            else
+                cur_regs = regs;
+
+            r->rip = cur_regs->rip;
+            r->rsp = cur_regs->rsp;
+
+            if ( !is_pvh_domain(sampled->domain) )
+            {
+                r->cs = cur_regs->cs;
+                if ( sampled->arch.flags & TF_kernel_mode )
+                    r->cs &= ~3;
+            }
+            else
+            {
+                struct segment_register seg_cs;
+
+                hvm_get_segment_register(sampled, x86_seg_cs, &seg_cs);
+                r->cs = seg_cs.sel;
+            }
+        }
+
+        vpmu->xenpmu_data->domain_id = DOMID_SELF;
+        vpmu->xenpmu_data->vcpu_id = sampled->vcpu_id;
+        vpmu->xenpmu_data->pcpu_id = smp_processor_id();
+
+        vpmu->xenpmu_data->pmu_flags |= PMU_CACHED;
+        vpmu->hw_lapic_lvtpc |= APIC_LVT_MASKED;
+        apic_write(APIC_LVTPC, vpmu->hw_lapic_lvtpc);
+
+        send_guest_vcpu_virq(sampling, VIRQ_XENPMU);
+
+        return 1;
+    }
 
     if ( vpmu->arch_vpmu_ops )
     {
-        struct vlapic *vlapic = vcpu_vlapic(v);
+        struct vlapic *vlapic = vcpu_vlapic(sampling);
         u32 vlapic_lvtpc;
         unsigned char int_vec;
 
@@ -131,9 +278,9 @@ int vpmu_do_interrupt(struct cpu_user_regs *regs)
         int_vec = vlapic_lvtpc & APIC_VECTOR_MASK;
 
         if ( GET_APIC_DELIVERY_MODE(vlapic_lvtpc) == APIC_MODE_FIXED )
-            vlapic_set_irq(vcpu_vlapic(v), int_vec, 0);
+            vlapic_set_irq(vcpu_vlapic(sampling), int_vec, 0);
         else
-            v->nmi_pending = 1;
+            sampling->nmi_pending = 1;
         return 1;
     }
 
@@ -232,7 +379,9 @@ void vpmu_load(struct vcpu *v)
     local_irq_enable();
 
     /* Only when PMU is counting, we load PMU context immediately. */
-    if ( !vpmu_is_set(vpmu, VPMU_RUNNING) )
+    if ( !vpmu_is_set(vpmu, VPMU_RUNNING) ||
+         (!is_hvm_domain(v->domain) &&
+          (vpmu->xenpmu_data->pmu_flags & PMU_CACHED)) )
         return;
 
     if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->arch_vpmu_load )
@@ -457,6 +606,7 @@ static int vpmu_force_context_switch(void)
 long do_xenpmu_op(int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg)
 {
     int ret;
+    struct vcpu *curr;
     xen_pmu_params_t pmu_params;
 
     ret = xsm_pmu_op(XSM_OTHER, current->domain, op);
@@ -502,14 +652,6 @@ long do_xenpmu_op(int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg)
         spin_unlock(&xenpmu_mode_lock);
         break;
 
-    case XENPMU_lvtpc_set:
-        if ( current->arch.vpmu.xenpmu_data == NULL )
-            return -EINVAL;
-        vpmu_lvtpc_update(current->arch.vpmu.xenpmu_data->pmu.l.lapic_lvtpc);
-        ret = 0;
-        break;
-    }
-
     case XENPMU_mode_get:
         memset(&pmu_params, 0, sizeof(pmu_params));
         pmu_params.val = vpmu_mode;
@@ -548,6 +690,21 @@ long do_xenpmu_op(int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg)
         pvpmu_finish(current->domain, &pmu_params);
         break;
 
+    case XENPMU_lvtpc_set:
+        curr = current;
+        if ( curr->arch.vpmu.xenpmu_data == NULL )
+            return -EINVAL;
+        vpmu_lvtpc_update(curr->arch.vpmu.xenpmu_data->pmu.l.lapic_lvtpc);
+        break;
+
+    case XENPMU_flush:
+        curr = current;
+        curr->arch.vpmu.xenpmu_data->pmu_flags &= ~PMU_CACHED;
+        vpmu_lvtpc_update(curr->arch.vpmu.xenpmu_data->pmu.l.lapic_lvtpc);
+        vpmu_load(curr);
+        break;
+    }
+
     default:
         ret = -EINVAL;
     }
diff --git a/xen/include/public/pmu.h b/xen/include/public/pmu.h
index 68a5fb8..a1886a5 100644
--- a/xen/include/public/pmu.h
+++ b/xen/include/public/pmu.h
@@ -28,6 +28,7 @@
 #define XENPMU_init            4
 #define XENPMU_finish          5
 #define XENPMU_lvtpc_set       6
+#define XENPMU_flush           7 /* Write cached MSR values to HW     */
 /* ` } */
 
 /* Parameters structure for HYPERVISOR_xenpmu_op call */
@@ -61,6 +62,12 @@ DEFINE_XEN_GUEST_HANDLE(xen_pmu_params_t);
  */
 #define XENPMU_FEATURE_INTEL_BTS  1
 
+/*
+ * PMU MSRs are cached in the context so the PV guest doesn't need to trap to
+ * the hypervisor
+ */
+#define PMU_CACHED 1
+
 /* Shared between hypervisor and PV domain */
 struct xen_pmu_data {
     uint32_t domain_id;
diff --git a/xen/include/xsm/dummy.h b/xen/include/xsm/dummy.h
index 29dae2e..69e9f63 100644
--- a/xen/include/xsm/dummy.h
+++ b/xen/include/xsm/dummy.h
@@ -664,7 +664,9 @@ static XSM_INLINE int xsm_pmu_op (XSM_DEFAULT_ARG struct domain *d, int op)
     case XENPMU_feature_get:
         return xsm_default_action(XSM_PRIV, d, current->domain);
     case XENPMU_init:
-    case XENPMU_finish: 
+    case XENPMU_finish:
+    case XENPMU_lvtpc_set:
+    case XENPMU_flush:
         return xsm_default_action(XSM_HOOK, d, current->domain);
     default:
         return -EPERM;
diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
index 8bd4a3d..9e33b4d 100644
--- a/xen/xsm/flask/hooks.c
+++ b/xen/xsm/flask/hooks.c
@@ -1500,6 +1500,8 @@ static int flask_pmu_op (struct domain *d, int op)
                             XEN2__PMU_CTRL, NULL);
     case XENPMU_init:
     case XENPMU_finish:
+    case XENPMU_lvtpc_set:
+    case XENPMU_flush:
         return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_XEN2,
                             XEN2__PMU_USE, NULL);
     default:
-- 
1.8.1.4


* [PATCH v12 for-xen-4.5 17/20] x86/VPMU: Merge vpmu_rdmsr and vpmu_wrmsr
  2014-09-25 19:28 [PATCH v12 for-xen-4.5 00/20] x86/PMU: Xen PMU PV(H) support Boris Ostrovsky
                   ` (15 preceding siblings ...)
  2014-09-25 19:28 ` [PATCH v12 for-xen-4.5 16/20] x86/VPMU: Handle PMU interrupts for " Boris Ostrovsky
@ 2014-09-25 19:28 ` Boris Ostrovsky
  2014-09-30  8:13   ` Jan Beulich
  2014-09-25 19:28 ` [PATCH v12 for-xen-4.5 18/20] x86/VPMU: Add privileged PMU mode Boris Ostrovsky
                   ` (4 subsequent siblings)
  21 siblings, 1 reply; 92+ messages in thread
From: Boris Ostrovsky @ 2014-09-25 19:28 UTC (permalink / raw)
  To: jbeulich, kevin.tian, suravee.suthikulpanit,
	Aravind.Gopalakrishnan, dietmar.hahn, dgdegra
  Cc: keir, andrew.cooper3, tim, xen-devel, jun.nakajima, boris.ostrovsky

The two routines share most of their logic.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
---
 xen/arch/x86/hvm/vpmu.c        | 69 +++++++++++++++++-------------------------
 xen/include/asm-x86/hvm/vpmu.h | 15 +++++++--
 2 files changed, 40 insertions(+), 44 deletions(-)

diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c
index 6a28729..ab00ad4 100644
--- a/xen/arch/x86/hvm/vpmu.c
+++ b/xen/arch/x86/hvm/vpmu.c
@@ -91,57 +91,42 @@ void vpmu_lvtpc_update(uint32_t val)
         apic_write(APIC_LVTPC, vpmu->hw_lapic_lvtpc);
 }
 
-int vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content, uint64_t supported)
+int vpmu_do_msr(unsigned int msr, uint64_t *msr_content,
+                uint64_t supported, bool_t is_write)
 {
-    struct vcpu *curr = current;
-    struct vpmu_struct *vpmu = vcpu_vpmu(curr);
+    struct vcpu *curr;
+    struct vpmu_struct *vpmu;
+    struct arch_vpmu_ops *ops;
+    int ret = 0;
 
     if ( !(vpmu_mode & (XENPMU_MODE_SELF | XENPMU_MODE_HV)) )
         return 0;
 
-    if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->do_wrmsr )
-    {
-        int ret = vpmu->arch_vpmu_ops->do_wrmsr(msr, msr_content, supported);
-
-        /*
-         * We may have received a PMU interrupt during WRMSR handling
-         * and since do_wrmsr may load VPMU context we should save
-         * (and unload) it again.
-         */
-        if ( !is_hvm_domain(curr->domain) &&
-             vpmu->xenpmu_data && (vpmu->xenpmu_data->pmu_flags & PMU_CACHED) )
-        {
-            vpmu_set(vpmu, VPMU_CONTEXT_SAVE);
-            vpmu->arch_vpmu_ops->arch_vpmu_save(curr);
-            vpmu_reset(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED);
-        }
-        return ret;
-    }
-    return 0;
-}
-
-int vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
-{
-    struct vcpu *curr = current;
-    struct vpmu_struct *vpmu = vcpu_vpmu(curr);
-
-    if ( !(vpmu_mode & (XENPMU_MODE_SELF | XENPMU_MODE_HV)) )
+    curr = current;
+    vpmu = vcpu_vpmu(curr);
+    ops = vpmu->arch_vpmu_ops;
+    if ( !ops )
         return 0;
 
-    if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->do_rdmsr )
-    {
-        int ret = vpmu->arch_vpmu_ops->do_rdmsr(msr, msr_content);
+    if ( is_write && ops->do_wrmsr )
+        ret = ops->do_wrmsr(msr, *msr_content, supported);
+    else if ( !is_write && ops->do_rdmsr )
+        ret = ops->do_rdmsr(msr, msr_content);
 
-        if ( !is_hvm_domain(curr->domain) &&
-             vpmu->xenpmu_data && (vpmu->xenpmu_data->pmu_flags & PMU_CACHED) )
-        {
-            vpmu_set(vpmu, VPMU_CONTEXT_SAVE);
-            vpmu->arch_vpmu_ops->arch_vpmu_save(curr);
-            vpmu_reset(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED);
-        }
-        return ret;
+    /*
+     * We may have received a PMU interrupt while handling MSR access
+     * and since do_wr/rdmsr may load VPMU context we should save
+     * (and unload) it again.
+     */
+    if ( !is_hvm_domain(curr->domain) &&
+         vpmu->xenpmu_data && (vpmu->xenpmu_data->pmu_flags & PMU_CACHED) )
+    {
+        vpmu_set(vpmu, VPMU_CONTEXT_SAVE);
+        ops->arch_vpmu_save(curr);
+        vpmu_reset(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED);
     }
-    return 0;
+
+    return ret;
 }
 
 static struct vcpu *choose_hwdom_vcpu(void)
diff --git a/xen/include/asm-x86/hvm/vpmu.h b/xen/include/asm-x86/hvm/vpmu.h
index 93f1fc2..a6e7934 100644
--- a/xen/include/asm-x86/hvm/vpmu.h
+++ b/xen/include/asm-x86/hvm/vpmu.h
@@ -96,8 +96,8 @@ static inline bool_t vpmu_are_all_set(const struct vpmu_struct *vpmu,
 }
 
 void vpmu_lvtpc_update(uint32_t val);
-int vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content, uint64_t supported);
-int vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content);
+int vpmu_do_msr(unsigned int msr, uint64_t *msr_content,
+                uint64_t supported, bool_t is_write);
 int vpmu_do_interrupt(struct cpu_user_regs *regs);
 void vpmu_do_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx,
                                        unsigned int *ecx, unsigned int *edx);
@@ -107,6 +107,17 @@ void vpmu_save(struct vcpu *v);
 void vpmu_load(struct vcpu *v);
 void vpmu_dump(struct vcpu *v);
 
+static inline int vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content,
+                                uint64_t supported)
+{
+    uint64_t val = msr_content;
+    return vpmu_do_msr(msr, &val, supported, 1);
+}
+static inline int vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
+{
+    return vpmu_do_msr(msr, msr_content, 0, 0);
+}
+
 extern int acquire_pmu_ownership(int pmu_ownership);
 extern void release_pmu_ownership(int pmu_ownership);
 
-- 
1.8.1.4


* [PATCH v12 for-xen-4.5 18/20] x86/VPMU: Add privileged PMU mode
  2014-09-25 19:28 [PATCH v12 for-xen-4.5 00/20] x86/PMU: Xen PMU PV(H) support Boris Ostrovsky
                   ` (16 preceding siblings ...)
  2014-09-25 19:28 ` [PATCH v12 for-xen-4.5 17/20] x86/VPMU: Merge vpmu_rdmsr and vpmu_wrmsr Boris Ostrovsky
@ 2014-09-25 19:28 ` Boris Ostrovsky
  2014-09-30  8:18   ` Jan Beulich
  2014-09-25 19:28 ` [PATCH v12 for-xen-4.5 19/20] x86/VPMU: NMI-based VPMU support Boris Ostrovsky
                   ` (3 subsequent siblings)
  21 siblings, 1 reply; 92+ messages in thread
From: Boris Ostrovsky @ 2014-09-25 19:28 UTC (permalink / raw)
  To: jbeulich, kevin.tian, suravee.suthikulpanit,
	Aravind.Gopalakrishnan, dietmar.hahn, dgdegra
  Cc: keir, andrew.cooper3, tim, xen-devel, jun.nakajima, boris.ostrovsky

Add support for a privileged PMU mode (XENPMU_MODE_ALL) which allows the
privileged domain (dom0) to profile both itself (and the hypervisor) and the
guests. While this mode is on, profiling in the guests is disabled.
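
For example, a dom0 toolstack could switch the system into this mode through
the existing XENPMU_mode_set operation; HYPERVISOR_xenpmu_op() is an assumed
guest-side wrapper name and is illustrative only:

    xen_pmu_params_t p = { .val = XENPMU_MODE_ALL };

    /* Unknown mode bits are rejected by do_xenpmu_op() with -EINVAL. */
    if ( HYPERVISOR_xenpmu_op(XENPMU_mode_set, &p) )
        ; /* e.g. retry: other VCPUs may not have unloaded their VPMUs yet */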

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 xen/arch/x86/hvm/vpmu.c  | 26 ++++++++++++++++++--------
 xen/arch/x86/traps.c     | 12 ++++++++++++
 xen/include/public/pmu.h |  3 +++
 3 files changed, 33 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c
index ab00ad4..fae778e 100644
--- a/xen/arch/x86/hvm/vpmu.c
+++ b/xen/arch/x86/hvm/vpmu.c
@@ -99,7 +99,9 @@ int vpmu_do_msr(unsigned int msr, uint64_t *msr_content,
     struct arch_vpmu_ops *ops;
     int ret = 0;
 
-    if ( !(vpmu_mode & (XENPMU_MODE_SELF | XENPMU_MODE_HV)) )
+    if ( (vpmu_mode == XENPMU_MODE_OFF) ||
+         ((vpmu_mode & XENPMU_MODE_ALL) &&
+          !is_hardware_domain(current->domain)) )
         return 0;
 
     curr = current;
@@ -154,8 +156,12 @@ int vpmu_do_interrupt(struct cpu_user_regs *regs)
     struct vcpu *sampled = current, *sampling;
     struct vpmu_struct *vpmu;
 
-    /* dom0 will handle interrupt for special domains (e.g. idle domain) */
-    if ( sampled->domain->domain_id >= DOMID_FIRST_RESERVED )
+    /*
+     * dom0 will handle interrupt for special domains (e.g. idle domain) or,
+     * in XENPMU_MODE_ALL, for everyone.
+     */
+    if ( (vpmu_mode & XENPMU_MODE_ALL) ||
+         (sampled->domain->domain_id >= DOMID_FIRST_RESERVED) )
     {
         sampling = choose_hwdom_vcpu();
         if ( !sampling )
@@ -165,7 +171,7 @@ int vpmu_do_interrupt(struct cpu_user_regs *regs)
         sampling = sampled;
 
     vpmu = vcpu_vpmu(sampling);
-    if ( !is_hvm_domain(sampling->domain) )
+    if ( !is_hvm_domain(sampling->domain) || (vpmu_mode & XENPMU_MODE_ALL) )
     {
         /* PV(H) guest */
         const struct cpu_user_regs *cur_regs;
@@ -177,6 +183,7 @@ int vpmu_do_interrupt(struct cpu_user_regs *regs)
             return 1;
 
         if ( is_pvh_domain(sampled->domain) &&
+             !(vpmu_mode & XENPMU_MODE_ALL) &&
              !vpmu->arch_vpmu_ops->do_interrupt(regs) )
             return 0;
 
@@ -219,7 +226,7 @@ int vpmu_do_interrupt(struct cpu_user_regs *regs)
             r->rip = cur_regs->rip;
             r->rsp = cur_regs->rsp;
 
-            if ( !is_pvh_domain(sampled->domain) )
+            if ( !has_hvm_container_domain(sampled->domain) )
             {
                 r->cs = cur_regs->cs;
                 if ( sampled->arch.flags & TF_kernel_mode )
@@ -234,7 +241,9 @@ int vpmu_do_interrupt(struct cpu_user_regs *regs)
             }
         }
 
-        vpmu->xenpmu_data->domain_id = DOMID_SELF;
+        vpmu->xenpmu_data->domain_id = (sampled == sampling) ?
+                                       DOMID_SELF :
+                                       sampled->domain->domain_id;
         vpmu->xenpmu_data->vcpu_id = sampled->vcpu_id;
         vpmu->xenpmu_data->pcpu_id = smp_processor_id();
 
@@ -608,7 +617,8 @@ long do_xenpmu_op(int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg)
         if ( copy_from_guest(&pmu_params, arg, 1) )
             return -EFAULT;
 
-        if ( pmu_params.val & ~(XENPMU_MODE_SELF | XENPMU_MODE_HV) )
+        if ( pmu_params.val & ~(XENPMU_MODE_SELF | XENPMU_MODE_HV |
+                                XENPMU_MODE_ALL) )
             return -EINVAL;
 
         /*
@@ -622,7 +632,7 @@ long do_xenpmu_op(int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg)
         current_mode = vpmu_mode;
         vpmu_mode = pmu_params.val;
 
-        if ( vpmu_mode == XENPMU_MODE_OFF )
+        if ( (vpmu_mode == XENPMU_MODE_OFF) || (vpmu_mode == XENPMU_MODE_ALL) )
         {
             /*
              * Make sure all (non-dom0) VCPUs have unloaded their VPMUs. This
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index cc70514..bbecbd0 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -2579,6 +2579,10 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
         case MSR_AMD_FAM15H_EVNTSEL0...MSR_AMD_FAM15H_PERFCTR5:
                 if ( vpmu_msr || (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) )
                 {
+                    if ( (vpmu_mode & XENPMU_MODE_ALL) &&
+                         !is_hardware_domain(v->domain) )
+                        break;
+
                     if ( vpmu_do_wrmsr(regs->ecx, msr_content, 0) )
                         goto fail;
                 }
@@ -2701,6 +2705,14 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
         case MSR_AMD_FAM15H_EVNTSEL0...MSR_AMD_FAM15H_PERFCTR5:
                 if ( vpmu_msr || (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) )
                 {
+                    if ( (vpmu_mode & XENPMU_MODE_ALL) &&
+                         !is_hardware_domain(v->domain) )
+                    {
+                        /* Don't leak PMU MSRs to unprivileged domains */
+                        regs->eax = regs->edx = 0;
+                        break;
+                    }
+
                     if ( vpmu_do_rdmsr(regs->ecx, &msr_content) )
                         goto fail;
 
diff --git a/xen/include/public/pmu.h b/xen/include/public/pmu.h
index a1886a5..f900f90 100644
--- a/xen/include/public/pmu.h
+++ b/xen/include/public/pmu.h
@@ -51,10 +51,13 @@ DEFINE_XEN_GUEST_HANDLE(xen_pmu_params_t);
  * - XENPMU_MODE_SELF:  Guests can profile themselves
  * - XENPMU_MODE_HV:    Guests can profile themselves, dom0 profiles
  *                      itself and Xen
+ * - XENPMU_MODE_ALL:   Only dom0 has access to VPMU and it profiles
+ *                      everyone: itself, the hypervisor and the guests.
  */
 #define XENPMU_MODE_OFF           0
 #define XENPMU_MODE_SELF          (1<<0)
 #define XENPMU_MODE_HV            (1<<1)
+#define XENPMU_MODE_ALL           (1<<2)
 
 /*
  * PMU features:
-- 
1.8.1.4


* [PATCH v12 for-xen-4.5 19/20] x86/VPMU: NMI-based VPMU support
  2014-09-25 19:28 [PATCH v12 for-xen-4.5 00/20] x86/PMU: Xen PMU PV(H) support Boris Ostrovsky
                   ` (17 preceding siblings ...)
  2014-09-25 19:28 ` [PATCH v12 for-xen-4.5 18/20] x86/VPMU: Add privileged PMU mode Boris Ostrovsky
@ 2014-09-25 19:28 ` Boris Ostrovsky
  2014-09-30  8:37   ` Jan Beulich
  2014-09-25 19:28 ` [PATCH v12 for-xen-4.5 20/20] x86/VPMU: Move VPMU files up from hvm/ directory Boris Ostrovsky
                   ` (2 subsequent siblings)
  21 siblings, 1 reply; 92+ messages in thread
From: Boris Ostrovsky @ 2014-09-25 19:28 UTC (permalink / raw)
  To: jbeulich, kevin.tian, suravee.suthikulpanit,
	Aravind.Gopalakrishnan, dietmar.hahn, dgdegra
  Cc: keir, andrew.cooper3, tim, xen-devel, jun.nakajima, boris.ostrovsky

Add support for using NMIs as PMU interrupts to allow profiling the hypervisor
while interrupts are disabled.

Most of the processing is still performed by vpmu_do_interrupt(). However,
since certain operations are not NMI-safe we defer them to a softirq that
vpmu_do_interrupt() will schedule:
* For PV guests that is send_guest_vcpu_virq()
* For HVM guests it is the VLAPIC accesses and hvm_get_segment_register() (the
  latter can be called in privileged profiling mode when the interrupted guest
  is an HVM one).

With send_guest_vcpu_virq() and hvm_get_segment_register() for PV(H) and the
vlapic accesses for HVM moved to the softirq, the only routines/macros that
vpmu_do_interrupt() calls in NMI context are:
* memcpy()
* querying domain type (is_XX_domain())
* guest_cpu_user_regs()
* XLAT_cpu_user_regs()
* raise_softirq()
* vcpu_vpmu()
* vpmu_ops->arch_vpmu_save()
* vpmu_ops->do_interrupt()

The latter two only access PMU MSRs with {rd,wr}msrl() (not the _safe
variants, which would not be NMI-safe).
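
With the command line parser extended to accept a comma-separated list, both
features can be enabled together, e.g. (assuming a hypervisor built with this
series) on the Xen command line:

    vpmu=nmi,bts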

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
 docs/misc/xen-command-line.markdown |   8 +-
 xen/arch/x86/hvm/svm/vpmu.c         |   3 +-
 xen/arch/x86/hvm/vmx/vpmu_core2.c   |   3 +-
 xen/arch/x86/hvm/vpmu.c             | 188 +++++++++++++++++++++++++++++-------
 xen/include/asm-x86/hvm/vpmu.h      |   4 +-
 xen/include/xen/softirq.h           |   1 +
 6 files changed, 166 insertions(+), 41 deletions(-)

diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
index af93e17..7c608dd 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -1227,11 +1227,11 @@ Use Virtual Processor ID support if available.  This prevents the need for TLB
 flushes on VM entry and exit, increasing performance.
 
 ### vpmu
-> `= ( bts )`
+> `= ( [nmi,][bts] )`
 
 > Default: `off`
 
-Switch on the virtualized performance monitoring unit for HVM guests.
+Switch on the virtualized performance monitoring unit.
 
 If the current cpu isn't supported a message like  
 'VPMU: Initialization failed. ...'  
@@ -1243,6 +1243,10 @@ wrong behaviour (see handle\_pmc\_quirk()).
 If 'vpmu=bts' is specified the virtualisation of the Branch Trace Store (BTS)
 feature is switched on on Intel processors supporting this feature.
 
+If 'vpmu=nmi' is specified the PMU interrupt will cause an NMI instead of a
+regular vector interrupt (which is the default). This can be useful for sampling
+hypervisor code that is executed with interrupts disabled.
+
 *Warning:*
 As the BTS virtualisation is not 100% safe and because of the nehalem quirk
 don't use the vpmu flag on production systems with Intel cpus!
diff --git a/xen/arch/x86/hvm/svm/vpmu.c b/xen/arch/x86/hvm/svm/vpmu.c
index 055b21c..9db0559 100644
--- a/xen/arch/x86/hvm/svm/vpmu.c
+++ b/xen/arch/x86/hvm/svm/vpmu.c
@@ -169,7 +169,7 @@ static void amd_vpmu_unset_msr_bitmap(struct vcpu *v)
     msr_bitmap_off(vpmu);
 }
 
-static int amd_vpmu_do_interrupt(struct cpu_user_regs *regs)
+static int amd_vpmu_do_interrupt(const struct cpu_user_regs *regs)
 {
     return 1;
 }
@@ -224,6 +224,7 @@ static inline void context_save(struct vcpu *v)
         rdmsrl(counters[i], counter_regs[i]);
 }
 
+/* Must be NMI-safe */
 static int amd_vpmu_save(struct vcpu *v)
 {
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c
index 0f605bd..2c84194 100644
--- a/xen/arch/x86/hvm/vmx/vpmu_core2.c
+++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c
@@ -300,6 +300,7 @@ static inline void __core2_vpmu_save(struct vcpu *v)
         rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, core2_vpmu_cxt->global_status);
 }
 
+/* Must be NMI-safe */
 static int core2_vpmu_save(struct vcpu *v)
 {
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
@@ -711,7 +712,7 @@ static void core2_vpmu_dump(const struct vcpu *v)
     }
 }
 
-static int core2_vpmu_do_interrupt(struct cpu_user_regs *regs)
+static int core2_vpmu_do_interrupt(const struct cpu_user_regs *regs)
 {
     struct vcpu *v = current;
     u64 msr_content;
diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c
index fae778e..4b34109 100644
--- a/xen/arch/x86/hvm/vpmu.c
+++ b/xen/arch/x86/hvm/vpmu.c
@@ -34,6 +34,7 @@
 #include <asm/hvm/svm/svm.h>
 #include <asm/hvm/svm/vmcb.h>
 #include <asm/apic.h>
+#include <asm/nmi.h>
 #include <public/pmu.h>
 #include <xen/tasklet.h>
 #include <xsm/xsm.h>
@@ -54,36 +55,57 @@ uint64_t __read_mostly vpmu_features = 0;
 static void parse_vpmu_param(char *s);
 custom_param("vpmu", parse_vpmu_param);
 
+static void pmu_softnmi(void);
+
 static DEFINE_PER_CPU(struct vcpu *, last_vcpu);
+static DEFINE_PER_CPU(struct vcpu *, sampled_vcpu);
+
+static uint32_t __read_mostly vpmu_interrupt_type = PMU_APIC_VECTOR;
 
 static void __init parse_vpmu_param(char *s)
 {
-    switch ( parse_bool(s) )
-    {
-    case 0:
-        break;
-    default:
-        if ( !strcmp(s, "bts") )
-            vpmu_features |= XENPMU_FEATURE_INTEL_BTS;
-        else if ( *s )
+    char *ss;
+
+    vpmu_mode = XENPMU_MODE_SELF;
+    if (*s == '\0')
+        return;
+
+    do {
+        ss = strchr(s, ',');
+        if ( ss )
+            *ss = '\0';
+
+        switch  ( parse_bool(s) )
         {
-            printk("VPMU: unknown flag: %s - vpmu disabled!\n", s);
-            break;
+        case 0:
+            vpmu_mode = XENPMU_MODE_OFF;
+            /* FALLTHROUGH */
+        case 1:
+            return;
+        default:
+            if ( !strcmp(s, "nmi") )
+                vpmu_interrupt_type = APIC_DM_NMI;
+            else if ( !strcmp(s, "bts") )
+                vpmu_features |= XENPMU_FEATURE_INTEL_BTS;
+            else
+            {
+                printk("VPMU: unknown flag: %s - vpmu disabled!\n", s);
+                vpmu_mode = XENPMU_MODE_OFF;
+                return;
+            }
         }
-        /* fall through */
-    case 1:
-        /* Default VPMU mode */
-        vpmu_mode = XENPMU_MODE_SELF;
-        break;
-    }
+
+        s = ss + 1;
+    } while ( ss );
 }
 
+
 void vpmu_lvtpc_update(uint32_t val)
 {
     struct vcpu *curr = current;
     struct vpmu_struct *vpmu = vcpu_vpmu(curr);
 
-    vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR | (val & APIC_LVT_MASKED);
+    vpmu->hw_lapic_lvtpc = vpmu_interrupt_type | (val & APIC_LVT_MASKED);
 
     /* Postpone APIC updates for PV(H) guests if PMU interrupt is pending */
     if ( is_hvm_domain(curr->domain) ||
@@ -91,6 +113,24 @@ void vpmu_lvtpc_update(uint32_t val)
         apic_write(APIC_LVTPC, vpmu->hw_lapic_lvtpc);
 }
 
+static void vpmu_send_interrupt(struct vcpu *v)
+{
+    struct vlapic *vlapic;
+    u32 vlapic_lvtpc;
+
+    ASSERT( is_hvm_vcpu(v) );
+
+    vlapic = vcpu_vlapic(v);
+    if ( !is_vlapic_lvtpc_enabled(vlapic) )
+        return;
+
+    vlapic_lvtpc = vlapic_get_reg(vlapic, APIC_LVTPC);
+    if ( GET_APIC_DELIVERY_MODE(vlapic_lvtpc) == APIC_MODE_FIXED )
+        vlapic_set_irq(vcpu_vlapic(v), vlapic_lvtpc & APIC_VECTOR_MASK, 0);
+    else
+        v->nmi_pending = 1;
+}
+
 int vpmu_do_msr(unsigned int msr, uint64_t *msr_content,
                 uint64_t supported, bool_t is_write)
 {
@@ -151,7 +191,8 @@ static struct vcpu *choose_hwdom_vcpu(void)
     return v;
 }
 
-int vpmu_do_interrupt(struct cpu_user_regs *regs)
+/* This routine may be called in NMI context */
+int vpmu_do_interrupt(const struct cpu_user_regs *regs)
 {
     struct vcpu *sampled = current, *sampling;
     struct vpmu_struct *vpmu;
@@ -232,8 +273,9 @@ int vpmu_do_interrupt(struct cpu_user_regs *regs)
                 if ( sampled->arch.flags & TF_kernel_mode )
                     r->cs &= ~3;
             }
-            else
+            else if ( !(vpmu_interrupt_type & APIC_DM_NMI) )
             {
+                /* Unsafe in NMI context, defer to softint later */
                 struct segment_register seg_cs;
 
                 hvm_get_segment_register(sampled, x86_seg_cs, &seg_cs);
@@ -251,30 +293,30 @@ int vpmu_do_interrupt(struct cpu_user_regs *regs)
         vpmu->hw_lapic_lvtpc |= APIC_LVT_MASKED;
         apic_write(APIC_LVTPC, vpmu->hw_lapic_lvtpc);
 
-        send_guest_vcpu_virq(sampling, VIRQ_XENPMU);
+        if ( vpmu_interrupt_type & APIC_DM_NMI )
+        {
+            this_cpu(sampled_vcpu) = sampled;
+            raise_softirq(PMU_SOFTIRQ);
+        }
+        else
+            send_guest_vcpu_virq(sampling, VIRQ_XENPMU);
 
         return 1;
     }
 
     if ( vpmu->arch_vpmu_ops )
     {
-        struct vlapic *vlapic = vcpu_vlapic(sampling);
-        u32 vlapic_lvtpc;
-        unsigned char int_vec;
-
         if ( !vpmu->arch_vpmu_ops->do_interrupt(regs) )
             return 0;
 
-        if ( !is_vlapic_lvtpc_enabled(vlapic) )
-            return 1;
-
-        vlapic_lvtpc = vlapic_get_reg(vlapic, APIC_LVTPC);
-        int_vec = vlapic_lvtpc & APIC_VECTOR_MASK;
-
-        if ( GET_APIC_DELIVERY_MODE(vlapic_lvtpc) == APIC_MODE_FIXED )
-            vlapic_set_irq(vcpu_vlapic(sampling), int_vec, 0);
+        if ( vpmu_interrupt_type & APIC_DM_NMI )
+        {
+            this_cpu(sampled_vcpu) = sampled;
+            raise_softirq(PMU_SOFTIRQ);
+        }
         else
-            sampling->nmi_pending = 1;
+            vpmu_send_interrupt(sampling);
+
         return 1;
     }
 
@@ -307,6 +349,9 @@ static void vpmu_save_force(void *arg)
     vpmu_reset(vpmu, VPMU_CONTEXT_SAVE);
 
     per_cpu(last_vcpu, smp_processor_id()) = NULL;
+
+    /* Make sure there are no outstanding PMU NMIs */
+    pmu_softnmi();
 }
 
 void vpmu_save(struct vcpu *v)
@@ -324,7 +369,10 @@ void vpmu_save(struct vcpu *v)
         if ( vpmu->arch_vpmu_ops->arch_vpmu_save(v) )
             vpmu_reset(vpmu, VPMU_CONTEXT_LOADED);
 
-    apic_write(APIC_LVTPC, PMU_APIC_VECTOR | APIC_LVT_MASKED);
+    apic_write(APIC_LVTPC, vpmu_interrupt_type | APIC_LVT_MASKED);
+
+    /* Make sure there are no outstanding PMU NMIs */
+    pmu_softnmi();
 }
 
 void vpmu_load(struct vcpu *v)
@@ -378,6 +426,9 @@ void vpmu_load(struct vcpu *v)
           (vpmu->xenpmu_data->pmu_flags & PMU_CACHED)) )
         return;
 
+    /* Make sure there are no outstanding PMU NMIs from previous vcpu */
+    pmu_softnmi();
+
     if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->arch_vpmu_load )
     {
         apic_write_around(APIC_LVTPC, vpmu->hw_lapic_lvtpc);
@@ -400,7 +451,7 @@ void vpmu_initialise(struct vcpu *v)
         vpmu_destroy(v);
     vpmu_clear(vpmu);
     vpmu->context = NULL;
-    vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR | APIC_LVT_MASKED;
+    vpmu->hw_lapic_lvtpc = vpmu_interrupt_type | APIC_LVT_MASKED;
 
     switch ( vendor )
     {
@@ -436,11 +487,57 @@ void vpmu_destroy(struct vcpu *v)
     }
 }
 
+/* Process the softirq set by PMU NMI handler */
+static void pmu_softnmi(void)
+{
+    unsigned int cpu = smp_processor_id();
+    struct vcpu *v, *sampled = per_cpu(sampled_vcpu, cpu);
+
+    if ( sampled == NULL )
+        return;
+
+    per_cpu(sampled_vcpu, cpu) = NULL;
+
+    if ( (vpmu_mode & XENPMU_MODE_ALL) ||
+         (sampled->domain->domain_id >= DOMID_FIRST_RESERVED) )
+    {
+            v = choose_hwdom_vcpu();
+            if ( !v )
+                return;
+    }
+    else
+    {
+        if ( is_hvm_domain(sampled->domain) )
+        {
+            vpmu_send_interrupt(sampled);
+            return;
+        }
+        v = sampled;
+    }
+
+    if ( has_hvm_container_domain(sampled->domain) )
+    {
+        struct segment_register seg_cs;
+
+        hvm_get_segment_register(sampled, x86_seg_cs, &seg_cs);
+        v->arch.vpmu.xenpmu_data->pmu.r.regs.cs = seg_cs.sel;
+    }
+
+    send_guest_vcpu_virq(v, VIRQ_XENPMU);
+}
+
+int pmu_nmi_interrupt(const struct cpu_user_regs *regs, int cpu)
+{
+    return vpmu_do_interrupt(regs);
+}
+
 static int pvpmu_init(struct domain *d, xen_pmu_params_t *params)
 {
     struct vcpu *v;
     struct page_info *page;
     uint64_t gfn = params->val;
+    static bool_t __read_mostly pvpmu_init_done;
+    static DEFINE_SPINLOCK(init_lock);
 
     if ( (params->vcpu >= d->max_vcpus) || (d->vcpu == NULL) ||
          (d->vcpu[params->vcpu] == NULL) )
@@ -464,6 +561,27 @@ static int pvpmu_init(struct domain *d, xen_pmu_params_t *params)
         return -EINVAL;
     }
 
+    spin_lock(&init_lock);
+
+    if ( !pvpmu_init_done )
+    {
+        if ( reserve_lapic_nmi() != 0 )
+        {
+            spin_unlock(&init_lock);
+            printk(XENLOG_G_ERR "Failed to reserve PMU NMI\n");
+            put_page(page);
+            return -EBUSY;
+        }
+
+        set_nmi_callback(pmu_nmi_interrupt);
+
+        open_softirq(PMU_SOFTIRQ, pmu_softnmi);
+
+        pvpmu_init_done = 1;
+    }
+
+    spin_unlock(&init_lock);
+
     vpmu_initialise(v);
 
     return 0;
diff --git a/xen/include/asm-x86/hvm/vpmu.h b/xen/include/asm-x86/hvm/vpmu.h
index a6e7934..7d86f64 100644
--- a/xen/include/asm-x86/hvm/vpmu.h
+++ b/xen/include/asm-x86/hvm/vpmu.h
@@ -42,7 +42,7 @@ struct arch_vpmu_ops {
     int (*do_wrmsr)(unsigned int msr, uint64_t msr_content,
                     uint64_t supported);
     int (*do_rdmsr)(unsigned int msr, uint64_t *msr_content);
-    int (*do_interrupt)(struct cpu_user_regs *regs);
+    int (*do_interrupt)(const struct cpu_user_regs *regs);
     void (*do_cpuid)(unsigned int input,
                      unsigned int *eax, unsigned int *ebx,
                      unsigned int *ecx, unsigned int *edx);
@@ -98,7 +98,7 @@ static inline bool_t vpmu_are_all_set(const struct vpmu_struct *vpmu,
 void vpmu_lvtpc_update(uint32_t val);
 int vpmu_do_msr(unsigned int msr, uint64_t *msr_content,
                 uint64_t supported, bool_t is_write);
-int vpmu_do_interrupt(struct cpu_user_regs *regs);
+int vpmu_do_interrupt(const struct cpu_user_regs *regs);
 void vpmu_do_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx,
                                        unsigned int *ecx, unsigned int *edx);
 void vpmu_initialise(struct vcpu *v);
diff --git a/xen/include/xen/softirq.h b/xen/include/xen/softirq.h
index 0895a16..9bada14 100644
--- a/xen/include/xen/softirq.h
+++ b/xen/include/xen/softirq.h
@@ -8,6 +8,7 @@ enum {
     NEW_TLBFLUSH_CLOCK_PERIOD_SOFTIRQ,
     RCU_SOFTIRQ,
     TASKLET_SOFTIRQ,
+    PMU_SOFTIRQ,
     NR_COMMON_SOFTIRQS
 };
 
-- 
1.8.1.4


* [PATCH v12 for-xen-4.5 20/20] x86/VPMU: Move VPMU files up from hvm/ directory
  2014-09-25 19:28 [PATCH v12 for-xen-4.5 00/20] x86/PMU: Xen PMU PV(H) support Boris Ostrovsky
                   ` (18 preceding siblings ...)
  2014-09-25 19:28 ` [PATCH v12 for-xen-4.5 19/20] x86/VPMU: NMI-based VPMU support Boris Ostrovsky
@ 2014-09-25 19:28 ` Boris Ostrovsky
  2014-09-30  8:40   ` Jan Beulich
  2014-09-26 17:03 ` [PATCH v12 for-xen-4.5 00/20] x86/PMU: Xen PMU PV(H) support Konrad Rzeszutek Wilk
  2014-09-29 13:28 ` Dietmar Hahn
  21 siblings, 1 reply; 92+ messages in thread
From: Boris Ostrovsky @ 2014-09-25 19:28 UTC (permalink / raw)
  To: jbeulich, kevin.tian, suravee.suthikulpanit,
	Aravind.Gopalakrishnan, dietmar.hahn, dgdegra
  Cc: keir, andrew.cooper3, tim, xen-devel, jun.nakajima, boris.ostrovsky

Since the PMU is no longer HVM-specific we can move the VPMU-related files up
from the arch/x86/hvm/ directory.

Specifically:
    arch/x86/hvm/vpmu.c -> arch/x86/vpmu.c
    arch/x86/hvm/svm/vpmu.c -> arch/x86/vpmu_amd.c
    arch/x86/hvm/vmx/vpmu_core2.c -> arch/x86/vpmu_intel.c
    include/asm-x86/hvm/vpmu.h -> include/asm-x86/vpmu.h

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 xen/arch/x86/Makefile                               | 1 +
 xen/arch/x86/hvm/Makefile                           | 1 -
 xen/arch/x86/hvm/svm/Makefile                       | 1 -
 xen/arch/x86/hvm/vlapic.c                           | 2 +-
 xen/arch/x86/hvm/vmx/Makefile                       | 1 -
 xen/arch/x86/oprofile/op_model_ppro.c               | 2 +-
 xen/arch/x86/traps.c                                | 2 +-
 xen/arch/x86/{hvm => }/vpmu.c                       | 2 +-
 xen/arch/x86/{hvm/svm/vpmu.c => vpmu_amd.c}         | 2 +-
 xen/arch/x86/{hvm/vmx/vpmu_core2.c => vpmu_intel.c} | 2 +-
 xen/include/asm-x86/hvm/vmx/vmcs.h                  | 2 +-
 xen/include/asm-x86/{hvm => }/vpmu.h                | 0
 12 files changed, 8 insertions(+), 10 deletions(-)
 rename xen/arch/x86/{hvm => }/vpmu.c (99%)
 rename xen/arch/x86/{hvm/svm/vpmu.c => vpmu_amd.c} (99%)
 rename xen/arch/x86/{hvm/vmx/vpmu_core2.c => vpmu_intel.c} (99%)
 rename xen/include/asm-x86/{hvm => }/vpmu.h (100%)

diff --git a/xen/arch/x86/Makefile b/xen/arch/x86/Makefile
index c1e244d..e985cfc 100644
--- a/xen/arch/x86/Makefile
+++ b/xen/arch/x86/Makefile
@@ -59,6 +59,7 @@ obj-y += crash.o
 obj-y += tboot.o
 obj-y += hpet.o
 obj-y += xstate.o
+obj-y += vpmu.o vpmu_amd.o vpmu_intel.o
 
 obj-$(crash_debug) += gdbstub.o
 
diff --git a/xen/arch/x86/hvm/Makefile b/xen/arch/x86/hvm/Makefile
index eea5555..742b83b 100644
--- a/xen/arch/x86/hvm/Makefile
+++ b/xen/arch/x86/hvm/Makefile
@@ -22,4 +22,3 @@ obj-y += vlapic.o
 obj-y += vmsi.o
 obj-y += vpic.o
 obj-y += vpt.o
-obj-y += vpmu.o
\ No newline at end of file
diff --git a/xen/arch/x86/hvm/svm/Makefile b/xen/arch/x86/hvm/svm/Makefile
index a10a55e..760d295 100644
--- a/xen/arch/x86/hvm/svm/Makefile
+++ b/xen/arch/x86/hvm/svm/Makefile
@@ -6,4 +6,3 @@ obj-y += nestedsvm.o
 obj-y += svm.o
 obj-y += svmdebug.o
 obj-y += vmcb.o
-obj-y += vpmu.o
diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
index f8cdc9b..daecc6b 100644
--- a/xen/arch/x86/hvm/vlapic.c
+++ b/xen/arch/x86/hvm/vlapic.c
@@ -38,7 +38,7 @@
 #include <asm/hvm/support.h>
 #include <asm/hvm/vmx/vmx.h>
 #include <asm/hvm/nestedhvm.h>
-#include <asm/hvm/vpmu.h>
+#include <asm/vpmu.h>
 #include <public/hvm/ioreq.h>
 #include <public/hvm/params.h>
 
diff --git a/xen/arch/x86/hvm/vmx/Makefile b/xen/arch/x86/hvm/vmx/Makefile
index 373b3d9..04a29ce 100644
--- a/xen/arch/x86/hvm/vmx/Makefile
+++ b/xen/arch/x86/hvm/vmx/Makefile
@@ -3,5 +3,4 @@ obj-y += intr.o
 obj-y += realmode.o
 obj-y += vmcs.o
 obj-y += vmx.o
-obj-y += vpmu_core2.o
 obj-y += vvmx.o
diff --git a/xen/arch/x86/oprofile/op_model_ppro.c b/xen/arch/x86/oprofile/op_model_ppro.c
index ca429a1..89649d0 100644
--- a/xen/arch/x86/oprofile/op_model_ppro.c
+++ b/xen/arch/x86/oprofile/op_model_ppro.c
@@ -19,7 +19,7 @@
 #include <asm/processor.h>
 #include <asm/regs.h>
 #include <asm/current.h>
-#include <asm/hvm/vpmu.h>
+#include <asm/vpmu.h>
 
 #include "op_x86_model.h"
 #include "op_counter.h"
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index bbecbd0..aac1091 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -72,7 +72,7 @@
 #include <asm/apic.h>
 #include <asm/mc146818rtc.h>
 #include <asm/hpet.h>
-#include <asm/hvm/vpmu.h>
+#include <asm/vpmu.h>
 #include <public/arch-x86/cpuid.h>
 #include <xsm/xsm.h>
 
diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/vpmu.c
similarity index 99%
rename from xen/arch/x86/hvm/vpmu.c
rename to xen/arch/x86/vpmu.c
index 4b34109..8f56bea 100644
--- a/xen/arch/x86/hvm/vpmu.c
+++ b/xen/arch/x86/vpmu.c
@@ -27,10 +27,10 @@
 #include <asm/types.h>
 #include <asm/msr.h>
 #include <asm/p2m.h>
+#include <asm/vpmu.h>
 #include <asm/hvm/support.h>
 #include <asm/hvm/vmx/vmx.h>
 #include <asm/hvm/vmx/vmcs.h>
-#include <asm/hvm/vpmu.h>
 #include <asm/hvm/svm/svm.h>
 #include <asm/hvm/svm/vmcb.h>
 #include <asm/apic.h>
diff --git a/xen/arch/x86/hvm/svm/vpmu.c b/xen/arch/x86/vpmu_amd.c
similarity index 99%
rename from xen/arch/x86/hvm/svm/vpmu.c
rename to xen/arch/x86/vpmu_amd.c
index 9db0559..bae7a53 100644
--- a/xen/arch/x86/hvm/svm/vpmu.c
+++ b/xen/arch/x86/vpmu_amd.c
@@ -28,8 +28,8 @@
 #include <xen/sched.h>
 #include <xen/irq.h>
 #include <asm/apic.h>
+#include <asm/vpmu.h>
 #include <asm/hvm/vlapic.h>
-#include <asm/hvm/vpmu.h>
 #include <public/pmu.h>
 
 #define MSR_F10H_EVNTSEL_GO_SHIFT   40
diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/vpmu_intel.c
similarity index 99%
rename from xen/arch/x86/hvm/vmx/vpmu_core2.c
rename to xen/arch/x86/vpmu_intel.c
index 2c84194..e9f68a4 100644
--- a/xen/arch/x86/hvm/vmx/vpmu_core2.c
+++ b/xen/arch/x86/vpmu_intel.c
@@ -30,6 +30,7 @@
 #include <asm/traps.h>
 #include <asm/msr.h>
 #include <asm/msr-index.h>
+#include <asm/vpmu.h>
 #include <asm/hvm/support.h>
 #include <asm/hvm/vlapic.h>
 #include <asm/hvm/vmx/vmx.h>
@@ -37,7 +38,6 @@
 #include <public/sched.h>
 #include <public/hvm/save.h>
 #include <public/pmu.h>
-#include <asm/hvm/vpmu.h>
 
 /*
  * See Intel SDM Vol 2a Instruction Set Reference chapter 3 for CPUID
diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index 949884b..dcf2d31 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -20,7 +20,7 @@
 #define __ASM_X86_HVM_VMX_VMCS_H__
 
 #include <asm/hvm/io.h>
-#include <asm/hvm/vpmu.h>
+#include <asm/vpmu.h>
 #include <irq_vectors.h>
 
 extern void vmcs_dump_vcpu(struct vcpu *v);
diff --git a/xen/include/asm-x86/hvm/vpmu.h b/xen/include/asm-x86/vpmu.h
similarity index 100%
rename from xen/include/asm-x86/hvm/vpmu.h
rename to xen/include/asm-x86/vpmu.h
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 92+ messages in thread

* Re: [PATCH v12 for-xen-4.5 02/20] x86/VPMU: Manage VPMU_CONTEXT_SAVE flag in vpmu_save_force()
  2014-09-25 19:28 ` [PATCH v12 for-xen-4.5 02/20] x86/VPMU: Manage VPMU_CONTEXT_SAVE flag in vpmu_save_force() Boris Ostrovsky
@ 2014-09-26 14:49   ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 92+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-09-26 14:49 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: kevin.tian, keir, jbeulich, jun.nakajima, andrew.cooper3, tim,
	dietmar.hahn, xen-devel, Aravind.Gopalakrishnan,
	suravee.suthikulpanit, dgdegra

On Thu, Sep 25, 2014 at 03:28:38PM -0400, Boris Ostrovsky wrote:
> There is a possibility that we set VPMU_CONTEXT_SAVE on VPMU context in
> vpmu_load() and never clear it (because vpmu_save_force() will see the
> VPMU_CONTEXT_LOADED bit clear, which is possible on AMD processors).
> 
> The problem is that amd_vpmu_save() assumes that if VPMU_CONTEXT_SAVE is set
> then (1) we need to save counters and (2) we don't need to "stop" control
> registers since they must have been stopped earlier. The latter may cause all
> sorts of problems (like counters still running in a wrong guest and the
> hypervisor sending unexpected PMU interrupts to that guest).
> 
> Since setting this flag is currently always done prior to calling
> vpmu_save_force() let's both set and clear it there.
> 
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> ---
>  xen/arch/x86/hvm/vpmu.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c
> index 15d5b6f..451b346 100644
> --- a/xen/arch/x86/hvm/vpmu.c
> +++ b/xen/arch/x86/hvm/vpmu.c
> @@ -130,6 +130,8 @@ static void vpmu_save_force(void *arg)
>      if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) )
>          return;
>  
> +    vpmu_set(vpmu, VPMU_CONTEXT_SAVE);
> +
>      if ( vpmu->arch_vpmu_ops )
>          (void)vpmu->arch_vpmu_ops->arch_vpmu_save(v);
>  
> @@ -178,7 +180,6 @@ void vpmu_load(struct vcpu *v)
>           */
>          if ( vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) )
>          {
> -            vpmu_set(vpmu, VPMU_CONTEXT_SAVE);
>              on_selected_cpus(cpumask_of(vpmu->last_pcpu),
>                               vpmu_save_force, (void *)v, 1);
>              vpmu_reset(vpmu, VPMU_CONTEXT_LOADED);
> @@ -195,7 +196,6 @@ void vpmu_load(struct vcpu *v)
>          vpmu = vcpu_vpmu(prev);
>  
>          /* Someone ran here before us */
> -        vpmu_set(vpmu, VPMU_CONTEXT_SAVE);
>          vpmu_save_force(prev);
>          vpmu_reset(vpmu, VPMU_CONTEXT_LOADED);
>  
> -- 
> 1.8.1.4
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
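
[Editor's note: a minimal sketch of vpmu_save_force() as it reads after this
patch, reconstructed from the hunks above. The final clearing of
VPMU_CONTEXT_SAVE sits outside the visible diff context and is inferred from
the commit message ("let's both set and clear it there"), so treat it as an
assumption rather than a quote of the tree.]

    static void vpmu_save_force(void *arg)
    {
        struct vcpu *v = arg;
        struct vpmu_struct *vpmu = vcpu_vpmu(v);

        if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) )
            return;

        vpmu_set(vpmu, VPMU_CONTEXT_SAVE);        /* now set here ... */

        if ( vpmu->arch_vpmu_ops )
            (void)vpmu->arch_vpmu_ops->arch_vpmu_save(v);

        vpmu_reset(vpmu, VPMU_CONTEXT_SAVE);      /* ... and cleared here (inferred) */
    }

With both call sites in vpmu_load() no longer touching the flag,
amd_vpmu_save() can rely on VPMU_CONTEXT_SAVE being set only while a forced
save is actually in progress.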

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v12 for-xen-4.5 01/20] common/symbols: Export hypervisor symbols to privileged guest
  2014-09-25 19:28 ` [PATCH v12 for-xen-4.5 01/20] common/symbols: Export hypervisor symbols to privileged guest Boris Ostrovsky
@ 2014-09-26 14:58   ` Konrad Rzeszutek Wilk
  2014-09-26 15:10     ` Jan Beulich
  2014-09-26 21:43   ` Daniel De Graaf
  1 sibling, 1 reply; 92+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-09-26 14:58 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: kevin.tian, keir, jbeulich, jun.nakajima, andrew.cooper3, tim,
	dietmar.hahn, xen-devel, Aravind.Gopalakrishnan,
	suravee.suthikulpanit, dgdegra

> diff --git a/xen/include/public/platform.h b/xen/include/public/platform.h
> index 053b9fa..4f21b17 100644
> --- a/xen/include/public/platform.h
> +++ b/xen/include/public/platform.h
> @@ -527,6 +527,24 @@ struct xenpf_core_parking {
>  typedef struct xenpf_core_parking xenpf_core_parking_t;
>  DEFINE_XEN_GUEST_HANDLE(xenpf_core_parking_t);
>  
> +#define XENPF_get_symbol   61
> +struct xenpf_symdata {
> +    /* IN/OUT variables */
> +    uint32_t namelen; /* IN:  size of name buffer                       */
> +                      /* OUT: strlen(name) of hypervisor symbol (may be */
> +                      /*      larger than what's been copied to guest)  */
> +    uint32_t symnum;  /* IN:  Symbol to read                            */
> +                      /* OUT: Next available symbol. If same as IN then */
> +                      /*      we reached the end                        */
> +
> +    /* OUT variables */
> +    char type;
> +    XEN_GUEST_HANDLE(char) name;
> +    uint64_t address;
> +};
> +typedef struct xenpf_symdata xenpf_symdata_t;

This is what 'pahole' says:

struct xenpf_symdata {                                                          
    uint32_t                   namelen;              /*     0     4 */          
    uint32_t                   symnum;               /*     4     4 */          
    char                       type;                 /*     8     1 */          
                                                                                
    /* XXX 7 bytes hole, try to pack */                                         
                                                                                
    __guest_handle_char        name;                 /*    16     8 */          
    uint64_t                   address;              /*    24     8 */          
                                                                                
    /* size: 32, cachelines: 1, members: 5 */                                   
    /* sum members: 25, holes: 1, sum holes: 7 */                               
    /* last cacheline: 32 bytes */                                              
};                                            

If I move them just a bit:


diff --git a/xen/include/public/platform.h b/xen/include/public/platform.h
index 4f21b17..b97e476 100644
--- a/xen/include/public/platform.h
+++ b/xen/include/public/platform.h
@@ -538,9 +538,9 @@ struct xenpf_symdata {
                       /*      we reached the end                        */
 
     /* OUT variables */
-    char type;
-    XEN_GUEST_HANDLE(char) name;
     uint64_t address;
+    XEN_GUEST_HANDLE(char) name;
+    char type;
 };
 typedef struct xenpf_symdata xenpf_symdata_t;
 DEFINE_XEN_GUEST_HANDLE(xenpf_symdata_t);


'pahole' is satisfied:

struct xenpf_symdata {                                                          
    uint32_t                   namelen;              /*     0     4 */          
    uint32_t                   symnum;               /*     4     4 */          
    uint64_t                   address;              /*     8     8 */          
    __guest_handle_char        name;                 /*    16     8 */          
    char                       type;                 /*    24     1 */          
                                                                                
    /* size: 32, cachelines: 1, members: 5 */                                   
    /* padding: 7 */                                                            
    /* last cacheline: 32 bytes */                                              
};                                    


With that change, Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
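
[Editor's note: the IN/OUT convention documented in xenpf_symdata above
implies the following dom0-side iteration pattern. This is an illustrative
sketch, not code from the series; get_symbol() and consume_symbol() stand in
for whatever wrapper around the XENPF_get_symbol platform op the caller has.]

    struct xenpf_symdata sd;
    char name[128];
    uint32_t cur, next = 0;

    do {
        cur = next;
        sd.namelen = sizeof(name);          /* IN: size of our name buffer */
        sd.symnum  = cur;                   /* IN: symbol to read */
        set_xen_guest_handle(sd.name, name);
        if ( get_symbol(&sd) )              /* hypothetical platform-op wrapper */
            break;
        /* OUT: sd.namelen is strlen() of the full symbol and may be larger
         * than what was copied into our buffer, so clamp before use. */
        consume_symbol(sd.type, sd.address, name);  /* hypothetical consumer */
        next = sd.symnum;                   /* OUT: next available symbol */
    } while ( next != cur );                /* OUT == IN means end reached */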

^ permalink raw reply related	[flat|nested] 92+ messages in thread

* Re: [PATCH v12 for-xen-4.5 03/20] x86/VPMU: Set MSR bitmaps only for HVM/PVH guests
  2014-09-25 19:28 ` [PATCH v12 for-xen-4.5 03/20] x86/VPMU: Set MSR bitmaps only for HVM/PVH guests Boris Ostrovsky
@ 2014-09-26 14:59   ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 92+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-09-26 14:59 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: kevin.tian, keir, jbeulich, jun.nakajima, andrew.cooper3, tim,
	dietmar.hahn, xen-devel, Aravind.Gopalakrishnan,
	suravee.suthikulpanit, dgdegra

On Thu, Sep 25, 2014 at 03:28:39PM -0400, Boris Ostrovsky wrote:
> In preparation for making VPMU code shared with PV, make sure that we update
> MSR bitmaps only for HVM/PVH guests.
> 
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Acked-by: Kevin Tian <kevin.tian@intel.com>

Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> ---
>  xen/arch/x86/hvm/svm/vpmu.c       | 21 +++++++++++++--------
>  xen/arch/x86/hvm/vmx/vpmu_core2.c |  8 +++++---
>  2 files changed, 18 insertions(+), 11 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/svm/vpmu.c b/xen/arch/x86/hvm/svm/vpmu.c
> index 8e07a98..c7e0946 100644
> --- a/xen/arch/x86/hvm/svm/vpmu.c
> +++ b/xen/arch/x86/hvm/svm/vpmu.c
> @@ -244,7 +244,8 @@ static int amd_vpmu_save(struct vcpu *v)
>  
>      context_save(v);
>  
> -    if ( !vpmu_is_set(vpmu, VPMU_RUNNING) && ctx->msr_bitmap_set )
> +    if ( !vpmu_is_set(vpmu, VPMU_RUNNING) &&
> +         has_hvm_container_domain(v->domain) && ctx->msr_bitmap_set )
>          amd_vpmu_unset_msr_bitmap(v);
>  
>      return 1;
> @@ -287,8 +288,9 @@ static int amd_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content,
>      ASSERT(!supported);
>  
>      /* For all counters, enable guest only mode for HVM guest */
> -    if ( (get_pmu_reg_type(msr) == MSR_TYPE_CTRL) &&
> -        !(is_guest_mode(msr_content)) )
> +    if ( has_hvm_container_domain(v->domain) &&
> +         (get_pmu_reg_type(msr) == MSR_TYPE_CTRL) &&
> +         !is_guest_mode(msr_content) )
>      {
>          set_guest_mode(msr_content);
>      }
> @@ -303,8 +305,9 @@ static int amd_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content,
>          apic_write(APIC_LVTPC, PMU_APIC_VECTOR);
>          vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR;
>  
> -        if ( !((struct amd_vpmu_context *)vpmu->context)->msr_bitmap_set )
> -            amd_vpmu_set_msr_bitmap(v);
> +        if ( has_hvm_container_domain(v->domain) &&
> +             !((struct amd_vpmu_context *)vpmu->context)->msr_bitmap_set )
> +             amd_vpmu_set_msr_bitmap(v);
>      }
>  
>      /* stop saving & restore if guest stops first counter */
> @@ -314,8 +317,9 @@ static int amd_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content,
>          apic_write(APIC_LVTPC, PMU_APIC_VECTOR | APIC_LVT_MASKED);
>          vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR | APIC_LVT_MASKED;
>          vpmu_reset(vpmu, VPMU_RUNNING);
> -        if ( ((struct amd_vpmu_context *)vpmu->context)->msr_bitmap_set )
> -            amd_vpmu_unset_msr_bitmap(v);
> +        if ( has_hvm_container_domain(v->domain) &&
> +             ((struct amd_vpmu_context *)vpmu->context)->msr_bitmap_set )
> +             amd_vpmu_unset_msr_bitmap(v);
>          release_pmu_ownship(PMU_OWNER_HVM);
>      }
>  
> @@ -406,7 +410,8 @@ static void amd_vpmu_destroy(struct vcpu *v)
>      if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
>          return;
>  
> -    if ( ((struct amd_vpmu_context *)vpmu->context)->msr_bitmap_set )
> +    if ( has_hvm_container_domain(v->domain) &&
> +         ((struct amd_vpmu_context *)vpmu->context)->msr_bitmap_set )
>          amd_vpmu_unset_msr_bitmap(v);
>  
>      xfree(vpmu->context);
> diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c
> index 68b6272..c9f6ae4 100644
> --- a/xen/arch/x86/hvm/vmx/vpmu_core2.c
> +++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c
> @@ -335,7 +335,8 @@ static int core2_vpmu_save(struct vcpu *v)
>      __core2_vpmu_save(v);
>  
>      /* Unset PMU MSR bitmap to trap lazy load. */
> -    if ( !vpmu_is_set(vpmu, VPMU_RUNNING) && cpu_has_vmx_msr_bitmap )
> +    if ( !vpmu_is_set(vpmu, VPMU_RUNNING) &&
> +         has_hvm_container_domain(v->domain) && cpu_has_vmx_msr_bitmap )
>          core2_vpmu_unset_msr_bitmap(v->arch.hvm_vmx.msr_bitmap);
>  
>      return 1;
> @@ -448,7 +449,8 @@ static int core2_vpmu_msr_common_check(u32 msr_index, int *type, int *index)
>      {
>          __core2_vpmu_load(current);
>          vpmu_set(vpmu, VPMU_CONTEXT_LOADED);
> -        if ( cpu_has_vmx_msr_bitmap )
> +        if ( has_hvm_container_domain(current->domain) &&
> +             cpu_has_vmx_msr_bitmap )
>              core2_vpmu_set_msr_bitmap(current->arch.hvm_vmx.msr_bitmap);
>      }
>      return 1;
> @@ -822,7 +824,7 @@ static void core2_vpmu_destroy(struct vcpu *v)
>          return;
>      xfree(core2_vpmu_cxt->pmu_enable);
>      xfree(vpmu->context);
> -    if ( cpu_has_vmx_msr_bitmap )
> +    if ( has_hvm_container_domain(v->domain) && cpu_has_vmx_msr_bitmap )
>          core2_vpmu_unset_msr_bitmap(v->arch.hvm_vmx.msr_bitmap);
>      release_pmu_ownship(PMU_OWNER_HVM);
>      vpmu_reset(vpmu, VPMU_CONTEXT_ALLOCATED);
> -- 
> 1.8.1.4
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v12 for-xen-4.5 01/20] common/symbols: Export hypervisor symbols to privileged guest
  2014-09-26 14:58   ` Konrad Rzeszutek Wilk
@ 2014-09-26 15:10     ` Jan Beulich
  2014-09-26 16:49       ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 92+ messages in thread
From: Jan Beulich @ 2014-09-26 15:10 UTC (permalink / raw)
  To: Boris Ostrovsky, Konrad Rzeszutek Wilk
  Cc: kevin.tian, keir, suravee.suthikulpanit, andrew.cooper3, tim,
	dietmar.hahn, xen-devel, Aravind.Gopalakrishnan, jun.nakajima,
	dgdegra

>>> On 26.09.14 at 16:58, <konrad.wilk@oracle.com> wrote:
> If I move them just a bit:
> 
> 
> diff --git a/xen/include/public/platform.h b/xen/include/public/platform.h
> index 4f21b17..b97e476 100644
> --- a/xen/include/public/platform.h
> +++ b/xen/include/public/platform.h
> @@ -538,9 +538,9 @@ struct xenpf_symdata {
>                        /*      we reached the end                        */
>  
>      /* OUT variables */
> -    char type;
> -    XEN_GUEST_HANDLE(char) name;
>      uint64_t address;
> +    XEN_GUEST_HANDLE(char) name;
> +    char type;
>  };
>  typedef struct xenpf_symdata xenpf_symdata_t;
>  DEFINE_XEN_GUEST_HANDLE(xenpf_symdata_t);
> 
> 
> 'pahole' is satisfied:
> 
> struct xenpf_symdata {                                                       
>     uint32_t                   namelen;              /*     0     4 */       
>     uint32_t                   symnum;               /*     4     4 */       
>     uint64_t                   address;              /*     8     8 */       
>     __guest_handle_char        name;                 /*    16     8 */       
>     char                       type;                 /*    24     1 */       
>                                                                              
>     /* size: 32, cachelines: 1, members: 5 */                                
>     /* padding: 7 */                                                         
>     /* last cacheline: 32 bytes */                                           
> };                                    
> 
> 
> With that change, Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

This change buys us exactly nothing: Structure size doesn't change,
and 7 bytes of padding are still there.

Jan
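
[Editor's note: Jan's point can be made concrete at build time - per both
pahole dumps above, the structure is 32 bytes on 64-bit either way; only the
position of the 7 padding bytes moves. Illustrative only, e.g. dropped into
any function compiled for x86-64:]

    BUILD_BUG_ON(sizeof(struct xenpf_symdata) != 32);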

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v12 for-xen-4.5 15/20] x86/VPMU: Add support for PMU register handling on PV guests
  2014-09-25 19:28 ` [PATCH v12 for-xen-4.5 15/20] x86/VPMU: Add support for PMU register handling on PV guests Boris Ostrovsky
@ 2014-09-26 16:34   ` Konrad Rzeszutek Wilk
  2014-09-26 16:44     ` Boris Ostrovsky
  2014-09-29 16:04   ` Jan Beulich
  2014-10-01  0:17   ` Tian, Kevin
  2 siblings, 1 reply; 92+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-09-26 16:34 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: kevin.tian, keir, jbeulich, jun.nakajima, andrew.cooper3, tim,
	dietmar.hahn, xen-devel, Aravind.Gopalakrishnan,
	suravee.suthikulpanit, dgdegra

> diff --git a/xen/include/public/pmu.h b/xen/include/public/pmu.h
> index b8c5682..68a5fb8 100644
> --- a/xen/include/public/pmu.h
> +++ b/xen/include/public/pmu.h
> @@ -27,6 +27,7 @@
>  #define XENPMU_feature_set     3
>  #define XENPMU_init            4
>  #define XENPMU_finish          5
> +#define XENPMU_lvtpc_set       6

You also need this:



diff --git a/xen/include/xsm/dummy.h b/xen/include/xsm/dummy.h
index 29dae2e..d98256c 100644
--- a/xen/include/xsm/dummy.h
+++ b/xen/include/xsm/dummy.h
@@ -664,7 +664,8 @@ static XSM_INLINE int xsm_pmu_op (XSM_DEFAULT_ARG struct domain *d, int op)
     case XENPMU_feature_get:
         return xsm_default_action(XSM_PRIV, d, current->domain);
     case XENPMU_init:
-    case XENPMU_finish: 
+    case XENPMU_finish:
+    case XENPMU_lvtpc_set:
         return xsm_default_action(XSM_HOOK, d, current->domain);
     default:
         return -EPERM;
diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
index 8bd4a3d..d89a857 100644
--- a/xen/xsm/flask/hooks.c
+++ b/xen/xsm/flask/hooks.c
@@ -1500,6 +1500,7 @@ static int flask_pmu_op (struct domain *d, int op)
                             XEN2__PMU_CTRL, NULL);
     case XENPMU_init:
     case XENPMU_finish:
+    case XENPMU_lvtpc_set:
         return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_XEN2,
                             XEN2__PMU_USE, NULL);
     default:

^ permalink raw reply related	[flat|nested] 92+ messages in thread

* Re: [PATCH v12 for-xen-4.5 15/20] x86/VPMU: Add support for PMU register handling on PV guests
  2014-09-26 16:34   ` Konrad Rzeszutek Wilk
@ 2014-09-26 16:44     ` Boris Ostrovsky
  2014-09-26 16:49       ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 92+ messages in thread
From: Boris Ostrovsky @ 2014-09-26 16:44 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: kevin.tian, keir, jbeulich, jun.nakajima, andrew.cooper3, tim,
	dietmar.hahn, xen-devel, Aravind.Gopalakrishnan,
	suravee.suthikulpanit, dgdegra

On 09/26/2014 12:34 PM, Konrad Rzeszutek Wilk wrote:
>> diff --git a/xen/include/public/pmu.h b/xen/include/public/pmu.h
>> index b8c5682..68a5fb8 100644
>> --- a/xen/include/public/pmu.h
>> +++ b/xen/include/public/pmu.h
>> @@ -27,6 +27,7 @@
>>   #define XENPMU_feature_set     3
>>   #define XENPMU_init            4
>>   #define XENPMU_finish          5
>> +#define XENPMU_lvtpc_set       6
> You also need this:

Right, this slipped into the next patch (#16).

-boris

>
>
>
> diff --git a/xen/include/xsm/dummy.h b/xen/include/xsm/dummy.h
> index 29dae2e..d98256c 100644
> --- a/xen/include/xsm/dummy.h
> +++ b/xen/include/xsm/dummy.h
> @@ -664,7 +664,8 @@ static XSM_INLINE int xsm_pmu_op (XSM_DEFAULT_ARG struct domain *d, int op)
>       case XENPMU_feature_get:
>           return xsm_default_action(XSM_PRIV, d, current->domain);
>       case XENPMU_init:
> -    case XENPMU_finish:
> +    case XENPMU_finish:
> +    case XENPMU_lvtpc_set:
>           return xsm_default_action(XSM_HOOK, d, current->domain);
>       default:
>           return -EPERM;
> diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
> index 8bd4a3d..d89a857 100644
> --- a/xen/xsm/flask/hooks.c
> +++ b/xen/xsm/flask/hooks.c
> @@ -1500,6 +1500,7 @@ static int flask_pmu_op (struct domain *d, int op)
>                               XEN2__PMU_CTRL, NULL);
>       case XENPMU_init:
>       case XENPMU_finish:
> +    case XENPMU_lvtpc_set:
>           return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_XEN2,
>                               XEN2__PMU_USE, NULL);
>       default:

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v12 for-xen-4.5 01/20] common/symbols: Export hypervisor symbols to privileged guest
  2014-09-26 15:10     ` Jan Beulich
@ 2014-09-26 16:49       ` Konrad Rzeszutek Wilk
  2014-09-29  6:43         ` Jan Beulich
  0 siblings, 1 reply; 92+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-09-26 16:49 UTC (permalink / raw)
  To: Jan Beulich
  Cc: kevin.tian, keir, suravee.suthikulpanit, andrew.cooper3, tim,
	dietmar.hahn, xen-devel, Aravind.Gopalakrishnan, jun.nakajima,
	Boris Ostrovsky, dgdegra

On Fri, Sep 26, 2014 at 04:10:09PM +0100, Jan Beulich wrote:
> >>> On 26.09.14 at 16:58, <konrad.wilk@oracle.com> wrote:
> > If I move them just a bit:
> > 
> > 
> > diff --git a/xen/include/public/platform.h b/xen/include/public/platform.h
> > index 4f21b17..b97e476 100644
> > --- a/xen/include/public/platform.h
> > +++ b/xen/include/public/platform.h
> > @@ -538,9 +538,9 @@ struct xenpf_symdata {
> >                        /*      we reached the end                        */
> >  
> >      /* OUT variables */
> > -    char type;
> > -    XEN_GUEST_HANDLE(char) name;
> >      uint64_t address;
> > +    XEN_GUEST_HANDLE(char) name;
> > +    char type;
> >  };
> >  typedef struct xenpf_symdata xenpf_symdata_t;
> >  DEFINE_XEN_GUEST_HANDLE(xenpf_symdata_t);
> > 
> > 
> > 'pahole' is satisfied:
> > 
> > struct xenpf_symdata {                                                       
> >     uint32_t                   namelen;              /*     0     4 */       
> >     uint32_t                   symnum;               /*     4     4 */       
> >     uint64_t                   address;              /*     8     8 */       
> >     __guest_handle_char        name;                 /*    16     8 */       
> >     char                       type;                 /*    24     1 */       
> >                                                                              
> >     /* size: 32, cachelines: 1, members: 5 */                                
> >     /* padding: 7 */                                                         
> >     /* last cacheline: 32 bytes */                                           
> > };                                    
> > 
> > 
> > With that change, Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> 
> This change buys us exactly nothing: Structure size doesn't change,
> and 7 bytes of padding are still there.

It does allow us to put more parameters (if we want to) at the end of the
structure instead of fitting them in between.

But that is more of a personal preference in terms of extending the structures.

> 
> Jan
> 

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v12 for-xen-4.5 15/20] x86/VPMU: Add support for PMU register handling on PV guests
  2014-09-26 16:44     ` Boris Ostrovsky
@ 2014-09-26 16:49       ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 92+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-09-26 16:49 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: kevin.tian, keir, jbeulich, jun.nakajima, andrew.cooper3, tim,
	dietmar.hahn, xen-devel, Aravind.Gopalakrishnan,
	suravee.suthikulpanit, dgdegra

On Fri, Sep 26, 2014 at 12:44:04PM -0400, Boris Ostrovsky wrote:
> On 09/26/2014 12:34 PM, Konrad Rzeszutek Wilk wrote:
> >>diff --git a/xen/include/public/pmu.h b/xen/include/public/pmu.h
> >>index b8c5682..68a5fb8 100644
> >>--- a/xen/include/public/pmu.h
> >>+++ b/xen/include/public/pmu.h
> >>@@ -27,6 +27,7 @@
> >>  #define XENPMU_feature_set     3
> >>  #define XENPMU_init            4
> >>  #define XENPMU_finish          5
> >>+#define XENPMU_lvtpc_set       6
> >You also need this:
> 
> Right, this slipped into the next patch (#16).

Yup.

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v12 for-xen-4.5 00/20] x86/PMU: Xen PMU PV(H) support
  2014-09-25 19:28 [PATCH v12 for-xen-4.5 00/20] x86/PMU: Xen PMU PV(H) support Boris Ostrovsky
                   ` (19 preceding siblings ...)
  2014-09-25 19:28 ` [PATCH v12 for-xen-4.5 20/20] x86/VPMU: Move VPMU files up from hvm/ directory Boris Ostrovsky
@ 2014-09-26 17:03 ` Konrad Rzeszutek Wilk
  2014-09-29 13:28 ` Dietmar Hahn
  21 siblings, 0 replies; 92+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-09-26 17:03 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: kevin.tian, keir, jbeulich, jun.nakajima, andrew.cooper3, tim,
	dietmar.hahn, xen-devel, Aravind.Gopalakrishnan,
	suravee.suthikulpanit, dgdegra

On Thu, Sep 25, 2014 at 03:28:36PM -0400, Boris Ostrovsky wrote:
> Here is the twelfth version of PV(H) PMU patches.

Daniel,

Would you be OK taking a look at patch #1, #11, and #16 (#15
has the new hypercall, but the patch for XSM made it in there)
to see if the XSM changes are correct?

Dietmar,

I took a look at the patches with the SDM and with
https://software.intel.com/sites/default/files/76/87/30320
in hand, and it all looked correct. But having an expert review it
would be fantastic!

Jan,
I believe patch #9 has the changes to the structure that
would be satisfactory to you?


Boris,
I only had one change (the one related to 'pahole') and in the grand
scheme of things it does not matter.

All,

Looking at the patches from the perspective of making an exception
for Xen 4.5, I would like to explain my thinking a bit.

The PMU functionality is only active if the system admin boots with 'pmu=1'
or (with this new patchset) enables it through a hypercall that toggles it
on/off and between different modes.

That means that in the default use case this functionality is off, and it
should introduce no regressions whatsoever. Digging into the code does
demonstrate that, so I am OK with it.

As for the use case of somebody using 'pmu=1' and expecting it to
continue working - that is a smaller subset of users, but important nonetheless.
I plan on testing it next week with just that test case in mind, making sure
it does not break.


From an 'awesome release' perspective I believe this patchset is important.
It will allow us to further enhance Xen, figure out where it has problems,
and work on fixing them - be that in the field or on developers' machines.

Barring any major changes above, and if the maintainers are OK with the
patchset (and could provide the Reviewed-by or Acked-by tags), I believe this
patchset warrants an exception.
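
[Editor's note: the "off by default" argument rests on the early-exit pattern
visible throughout the series' diffs; a minimal sketch, with the function name
invented for illustration:]

    void vpmu_hot_path(struct vcpu *v)
    {
        struct vpmu_struct *vpmu = vcpu_vpmu(v);

        /* In a default configuration no VPMU context is ever allocated,
         * so hot paths bail out before touching any PMU state. */
        if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
            return;

        /* ... profiling work happens only past this point ... */
    }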

> 
> Changes in v12:
> 
> * Added XSM support
> * Made a valifity check before writing MSR_CORE_PERF_GLOBAL_OVF_CTRL
> * Updated documentation for 'vpmu=nmi' option
> * Added more text to a bunch of commit messages (per Konrad's request)
> 
> Changes in v11:
> 
> * Replaced cpu_user_regs with new xen_pmu_regs (IP, SP, CS) in xen_pmu_arch.
>   - as part of this re-work noticed that CS registers were set in later patch then
>     needed. Moved those changes to appropriate place
> * Added new VPMU mode (XENPMU_MODE_HV). Now XENPMU_MODE_SELF will only provide dom0
>   with its own samples only (i.e. no hypervisor data) and XENPMU_MODE_HV will be what
>   XENPMU_MODE_SELF used to be.
> * Kept  vmx_add_guest_msr()/vmx_add_host_load_msr() as wrappers around vmx_add_msr()
> * Cleaned up VPMU context switch macros (moved  'if(prev!=next)' back to context_switch())
> * Dropped hypercall continuation from vpmu_force_context_switch() and replaced it with
>   -EAGAIN error if hypercall_preempt_check() is true after 2ms.
> * Kept vpmu_do_rdmsr()/vpmu_do_wrmsr as wrapperd for vpmu_do_msr()
> * Move context switching patch (#13) earlier in the series (for proper bisection support)
> * Various comment updates and cleanups
> * Dropped a bunch of Reviewed-by and all Tested-by tags
> 
> Changes in v10:
> 
> * Swapped address and name fields of xenpf_symdata (to make it smaller on 32-bit)
> * Dropped vmx_rm_guest_msr() as it requires refcountig which makes code more complicated.
> * Cleaned up vlapic_reg_write()
> * Call vpmu_destroy() for both HVM and PVH VCPUs
> * Verify that (xen_pmu_data+PMU register bank) fit into a page
> * Return error codes from arch-specific VPMU init code
> * Moved VPMU-related context switch logic into inlines
> * vpmu_force_context_switch() changes:
>   o Avoid greater than page-sized allocations
>   o Prevent another VCPU from starting VPMU sync while the first sync is in progress
> * Avoid stack leak in do_xenpmu_op()
> * Checked validity of Intel VPMU MSR values before they are committed
> * Fixed MSR handling in traps.c (avoid potential accesses to Intel MSRs on AMD)
> * Fixed VCPU selection in interrupt handler for 32-bit dom0 (sampled => sampling)
> * Clarified commit messages (patches 2, 13, 18) 
> * Various cleanups
> 
> Changes in v9:
> 
> * Restore VPMU context after context_saved() is called in
>   context_switch(). This is needed because vpmu_load() may end up
>   calling vmx_vmcs_try_enter()->vcpu_pause() and that needs is_running
>   to be correctly set/cleared. (patch 18, dropped review acks)
> * Added patch 2 to properly manage VPMU_CONTEXT_LOADED
> * Addressed most of Jan's comments.
>   o Keep track of time in vpmu_force_context_switch() to properly break
>     out of a loop when using hypercall continuations
>   o Fixed logic in calling vpmu_do_msr() in emulate_privileged_op()
>   o Cleaned up vpmu_interrupt() wrt vcpu variable names to (hopefully)
>     make it more clear which vcpu we are using
>   o Cleaned up vpmu_do_wrmsr()
>   o Did *not* replace sizeof(uint64_t) with sizeof(variable) in
>     amd_vpmu_initialise(): throughout the code registers are declared as
>     uint64_t and if we are to add a new type (e.g. reg_t) this should be
>     done in a separate patch, unrelated to this series.
>   o Various more minor cleanups and code style fixes
>   
> Changes in v8:
> 
> * Cleaned up a bit definitions of struct xenpf_symdata and xen_pmu_params
> * Added compat checks for vpmu structures
> * Converted vpmu flag manipulation macros to inline routines
> * Reimplemented vpmu_unload_all() to avoid long loops
> * Reworked PMU fault generation and handling (new patch #12)
> * Added checks for domain->vcpu[] non-NULLness
> * Added more comments, renamed some routines and macros, code style cleanup
> 
> 
> Changes in v7:
> 
> * When reading hypervisor symbols make the caller pass buffer length
>   (as opposed to having this length be part of the API). Make the
>   hypervisor buffer static, make xensyms_read() return zero-length
>   string on end-of-symbols. Make 'type' field of xenpf_symdata a char,
>   drop compat_pf_symdata definition.
> * Spread PVH support across patches as opposed to lumping it into a
>   separate patch
> * Rename vpmu_is_set_all() to vpmu_are_all_set()
> * Split VPMU cleanup patch in two
> * Use memmove when copying VMX guest and host MSRs
> * Make padding of xen_arch_pmu's context union a constant that does not
>   depend on arch context size.
> * Set interface version to 0.1
> * Check pointer validity in pvpmu_init/destroy()
> * Fixed crash in core2_vpmu_dump()
> * Fixed crash in vmx_add_msr()
> * Break handling of Intel and AMD MSRs in traps.c into separate cases
> * Pass full CS selector to guests
> * Add lock in pvpmu init code to prevent potential race
> 
> 
> Changes in v6:
> 
> * Two new patches:
>   o Merge VMX MSR add/remove routines in vmcs.c (patch 5)
>   o Merge VPMU read/write MSR routines in vpmu.c (patch 14)
> * Check for pending NMI softirq after saving VPMU context to prevent a newly-scheduled
>   guest from overwriting sampled_vcpu written by a de-scheduled VCPU.
> * Keep track of enabled counters on Intel. This was removed in earlier patches and
>   was a mistake. As result of this change struct vpmu will have a pointer to private
>   context data (i.e. data that is not exposed to a PV(H) guest). Use this private pointer
>   on SVM as well for storing MSR bitmap status (it was unnecessarily exposed to PV guests
>   earlier).
>   Dropped Reviewed-by: and Tested-by: tags from patch 4 since it needs to be reviewed
>   agan (core2_vpmu_do_wrmsr() routine, mostly)
> * Replaced references to dom0 with hardware_domain (and is_control_domain with
>   is_hardware_domain for consistency)
> * Prevent non-privileged domains from reading PMU MSRs in VPMU_PRIV_MODE
> * Reverted unnecessary changes in vpmu_initialise()'s switch statement
> * Fixed comment in vpmu_do_interrupt
> 
> 
> Changes in v5:
> 
> * Dropped patch number 2 ("Stop AMD counters when called from vpmu_save_force()")
>   as no longer needed
> * Added patch number 2 that marks context as loaded before PMU registers are
>   loaded. This prevents situation where a PMU interrupt may occur while context
>   is still viewed as not loaded. (This is really a bug fix for existing VPMU
>   code)
> * Renamed xenpmu.h files to pmu.h
> * More careful use of is_pv_domain(), is_hvm_domain(, is_pvh_domain and
>   has_hvm_container_domain(). Also explicitly disabled support for PVH until
>   patch 16 to make distinction between usage of the above macros more clear.
> * Added support for disabling VPMU support during runtime.
> * Disable VPMUs for non-privileged domains when switching to privileged
>   profiling mode
> * Added ARM stub for xen_arch_pmu_t
> * Separated vpmu_mode from vpmu_features
> * Moved CS register query to make sure we use appropriate query mechanism for
>   various guest types.
> * LVTPC is now set from value in shared area, not copied from dom0
> * Various code and comments cleanup as suggested by Jan.
> 
> Changes in v4:
> 
> * Added support for PVH guests:
>   o changes in pvpmu_init() to accommodate both PV and PVH guests, still in patch 10
>   o more careful use of is_hvm_domain
>   o Additional patch (16)
> * Moved HVM interrupt handling out of vpmu_do_interrupt() for NMI-safe handling
> * Fixed dom0's VCPU selection in privileged mode
> * Added a cast in register copy for 32-bit PV guests cpu_user_regs_t in vpmu_do_interrupt.
>   (don't want to expose compat_cpu_user_regs in a public header)
> * Renamed public structures by prefixing them with "xen_"
> * Added an entry for xenpf_symdata in xlat.lst
> * Fixed pv_cpuid check for vpmu-specific cpuid adjustments
> * Various code style fixes
> * Eliminated anonymous unions
> * Added more verbiage to NMI patch description
> 
> 
> Changes in v3:
> 
> * Moved PMU MSR banks out from architectural context data structures to allow
> for future expansion without protocol changes
> * PMU interrupts can be either NMIs or regular vector interrupts (the latter
> is the default)
> * Context is now marked as PMU_CACHED by the hypervisor code to avoid certain
> race conditions with the guest
> * Fixed races with PV guest in MSR access handlers
> * More Intel VPMU cleanup
> * Moved NMI-unsafe code from NMI handler
> * Dropped changes to vcpu->is_running
> * Added LVTPC apic handling (cached for PV guests)
> * Separated privileged profiling mode into a standalone patch
> * Separated NMI handling into a standalone patch
> 
> 
> Changes in v2:
> 
> * Xen symbols are exported as a data structure (as opposed to a set of formatted
> strings in v1). Even though one symbol per hypercall is returned, performance
> appears to be acceptable: reading the whole file from dom0 userland takes on average
> about twice as long as reading /proc/kallsyms
> * More cleanup of Intel VPMU code to simplify publicly exported structures
> * There is an architecture-independent and x86-specific public include files (ARM
> has a stub)
> * General cleanup of public include files to make them more presentable (and
> to make auto doc generation better)
> * Setting of vcpu->is_running is now done on ARM in schedule_tail as well (making
> changes to common/schedule.c architecture-independent). Note that this is not
> tested since I don't have access to ARM hardware.
> * PCPU ID of interrupted processor is now passed to PV guest
> 
> 
> The following patch series adds PMU support in Xen for PV(H)
> guests. There is a companion patchset for Linux kernel. In addition,
> another set of changes will be provided (later) for userland perf
> code.
> 
> This version has the following limitations:
> * For accurate profiling of dom0/Xen, dom0 VCPUs should be pinned.
> * Hypervisor code is only profiled on processors that have running dom0 VCPUs
> on them.
> * No backtrace support.
> 
> A few notes that may help reviewing:
> 
> * A shared data structure (xenpmu_data_t) between each PV VCPU and hypervisor
> CPU is used for passing registers' values as well as PMU state at the time of
> PMU interrupt.
> * PMU interrupts are taken by hypervisor either as NMIs or regular vector
> interrupts for both HVM and PV(H). The interrupts are sent as NMIs to HVM guests
> and as virtual interrupts to PV(H) guests
> * PV guest's interrupt handler does not read/write PMU MSRs directly. Instead, it
> accesses xenpmu_data_t and flushes it to HW before returning.
> * PMU mode is controlled at runtime via /sys/hypervisor/pmu/pmu/{pmu_mode,pmu_flags}
> in addition to the 'vpmu' boot option (which is preserved for backward compatibility).
> The following modes are provided:
>   * disable: VPMU is off
>   * enable: VPMU is on. Guests can profile themselves, dom0 profiles itself and Xen
>   * priv_enable: dom0 only profiling. dom0 collects samples for everyone. Sampling
>     in guests is suspended.
> * /proc/xen/xensyms file exports hypervisor's symbols to dom0 (similar to
> /proc/kallsyms)
> * VPMU infrastructure is now used for HVM, PV and PVH and therefore has been moved
> up from hvm subtree
> 
> 
> 
> Boris Ostrovsky (20):
>   common/symbols: Export hypervisor symbols to privileged guest
>   x86/VPMU: Manage VPMU_CONTEXT_SAVE flag in vpmu_save_force()
>   x86/VPMU: Set MSR bitmaps only for HVM/PVH guests
>   x86/VPMU: Make vpmu macros a bit more efficient
>   intel/VPMU: Clean up Intel VPMU code
>   vmx: Merge MSR management routines
>   x86/VPMU: Handle APIC_LVTPC accesses
>   intel/VPMU: MSR_CORE_PERF_GLOBAL_CTRL should be initialized to zero
>   x86/VPMU: Add public xenpmu.h
>   x86/VPMU: Make vpmu not HVM-specific
>   x86/VPMU: Interface for setting PMU mode and flags
>   x86/VPMU: Initialize PMU for PV(H) guests
>   x86/VPMU: Save VPMU state for PV guests during context switch
>   x86/VPMU: When handling MSR accesses, leave fault injection to callers
>   x86/VPMU: Add support for PMU register handling on PV guests
>   x86/VPMU: Handle PMU interrupts for PV guests
>   x86/VPMU: Merge vpmu_rdmsr and vpmu_wrmsr
>   x86/VPMU: Add privileged PMU mode
>   x86/VPMU: NMI-based VPMU support
>   x86/VPMU: Move VPMU files up from hvm/ directory
> 
>  docs/misc/xen-command-line.markdown                |   8 +-
>  tools/flask/policy/policy/modules/xen/xen.te       |   7 +
>  xen/arch/x86/Makefile                              |   1 +
>  xen/arch/x86/domain.c                              |  23 +-
>  xen/arch/x86/hvm/Makefile                          |   1 -
>  xen/arch/x86/hvm/hvm.c                             |   3 +-
>  xen/arch/x86/hvm/svm/Makefile                      |   1 -
>  xen/arch/x86/hvm/svm/svm.c                         |  10 +-
>  xen/arch/x86/hvm/vlapic.c                          |   3 +
>  xen/arch/x86/hvm/vmx/Makefile                      |   1 -
>  xen/arch/x86/hvm/vmx/vmcs.c                        |  84 +--
>  xen/arch/x86/hvm/vmx/vmx.c                         |  28 +-
>  xen/arch/x86/hvm/vpmu.c                            | 265 -------
>  xen/arch/x86/oprofile/op_model_ppro.c              |   8 +-
>  xen/arch/x86/platform_hypercall.c                  |  33 +
>  xen/arch/x86/traps.c                               |  60 +-
>  xen/arch/x86/vpmu.c                                | 826 +++++++++++++++++++++
>  xen/arch/x86/{hvm/svm/vpmu.c => vpmu_amd.c}        | 158 ++--
>  .../x86/{hvm/vmx/vpmu_core2.c => vpmu_intel.c}     | 639 ++++++++--------
>  xen/arch/x86/x86_64/compat/entry.S                 |   4 +
>  xen/arch/x86/x86_64/entry.S                        |   4 +
>  xen/common/event_channel.c                         |   1 +
>  xen/common/symbols.c                               |  54 ++
>  xen/include/Makefile                               |   2 +
>  xen/include/asm-x86/domain.h                       |   2 +
>  xen/include/asm-x86/hvm/vcpu.h                     |   3 -
>  xen/include/asm-x86/hvm/vmx/vmcs.h                 |  18 +-
>  xen/include/asm-x86/hvm/vmx/vpmu_core2.h           |  51 --
>  xen/include/asm-x86/{hvm => }/vpmu.h               |  94 ++-
>  xen/include/public/arch-arm.h                      |   3 +
>  xen/include/public/arch-x86/pmu.h                  |  77 ++
>  xen/include/public/arch-x86/xen-x86_32.h           |   8 +
>  xen/include/public/arch-x86/xen-x86_64.h           |   8 +
>  xen/include/public/platform.h                      |  19 +
>  xen/include/public/pmu.h                           |  95 +++
>  xen/include/public/xen.h                           |   2 +
>  xen/include/xen/hypercall.h                        |   4 +
>  xen/include/xen/softirq.h                          |   1 +
>  xen/include/xen/symbols.h                          |   3 +
>  xen/include/xlat.lst                               |   5 +
>  xen/include/xsm/dummy.h                            |  20 +
>  xen/include/xsm/xsm.h                              |   6 +
>  xen/xsm/dummy.c                                    |   1 +
>  xen/xsm/flask/hooks.c                              |  28 +
>  xen/xsm/flask/policy/access_vectors                |  18 +-
>  xen/xsm/flask/policy/security_classes              |   1 +
>  46 files changed, 1872 insertions(+), 819 deletions(-)
>  delete mode 100644 xen/arch/x86/hvm/vpmu.c
>  create mode 100644 xen/arch/x86/vpmu.c
>  rename xen/arch/x86/{hvm/svm/vpmu.c => vpmu_amd.c} (74%)
>  rename xen/arch/x86/{hvm/vmx/vpmu_core2.c => vpmu_intel.c} (60%)
>  delete mode 100644 xen/include/asm-x86/hvm/vmx/vpmu_core2.h
>  rename xen/include/asm-x86/{hvm => }/vpmu.h (55%)
>  create mode 100644 xen/include/public/arch-x86/pmu.h
>  create mode 100644 xen/include/public/pmu.h
> 
> -- 
> 1.8.1.4
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
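
[Editor's note: a hypothetical dom0 userspace sketch of reading the runtime
control node named in the cover letter above; the sysfs path is quoted from
the text, everything else is illustrative.]

    #include <stdio.h>

    int main(void)
    {
        char mode[32];
        FILE *f = fopen("/sys/hypervisor/pmu/pmu/pmu_mode", "r");

        if ( f && fgets(mode, sizeof(mode), f) )
            printf("vpmu mode: %s", mode);   /* e.g. "enable" */
        if ( f )
            fclose(f);
        return 0;
    }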

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v12 for-xen-4.5 06/20] vmx: Merge MSR management routines
  2014-09-25 19:28 ` [PATCH v12 for-xen-4.5 06/20] vmx: Merge MSR management routines Boris Ostrovsky
@ 2014-09-26 20:48   ` Tian, Kevin
  0 siblings, 0 replies; 92+ messages in thread
From: Tian, Kevin @ 2014-09-26 20:48 UTC (permalink / raw)
  To: Boris Ostrovsky, jbeulich, suravee.suthikulpanit,
	Aravind.Gopalakrishnan, dietmar.hahn, dgdegra
  Cc: andrew.cooper3, xen-devel, keir, Nakajima, Jun, tim

> From: Boris Ostrovsky [mailto:boris.ostrovsky@oracle.com]
> Sent: Thursday, September 25, 2014 12:29 PM
> 
> vmx_add_host_load_msr() and vmx_add_guest_msr() share a fair amount of
> code. Merge them to simplify code maintenance.
> 
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Acked-by: Kevin Tian <kevin.tian@intel.com>

> ---
>  xen/arch/x86/hvm/vmx/vmcs.c        | 84 +++++++++++++++++++-------------------
>  xen/include/asm-x86/hvm/vmx/vmcs.h | 16 +++++++-
>  2 files changed, 55 insertions(+), 45 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
> index fc1f882..6649837 100644
> --- a/xen/arch/x86/hvm/vmx/vmcs.c
> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
> @@ -1201,64 +1201,62 @@ int vmx_write_guest_msr(u32 msr, u64 val)
>      return -ESRCH;
>  }
> 
> -int vmx_add_guest_msr(u32 msr)
> +int vmx_add_msr(u32 msr, int type)
>  {
>      struct vcpu *curr = current;
> -    unsigned int i, msr_count = curr->arch.hvm_vmx.msr_count;
> -    struct vmx_msr_entry *msr_area = curr->arch.hvm_vmx.msr_area;
> +    unsigned int idx, *msr_count;
> +    struct vmx_msr_entry **msr_area, *msr_area_elem;
> +
> +    if ( type == VMX_GUEST_MSR )
> +    {
> +        msr_count = &curr->arch.hvm_vmx.msr_count;
> +        msr_area = &curr->arch.hvm_vmx.msr_area;
> +    }
> +    else
> +    {
> +        ASSERT(type == VMX_HOST_MSR);
> +        msr_count = &curr->arch.hvm_vmx.host_msr_count;
> +        msr_area = &curr->arch.hvm_vmx.host_msr_area;
> +    }
> 
> -    if ( msr_area == NULL )
> +    if ( *msr_area == NULL )
>      {
> -        if ( (msr_area = alloc_xenheap_page()) == NULL )
> +        if ( (*msr_area = alloc_xenheap_page()) == NULL )
>              return -ENOMEM;
> -        curr->arch.hvm_vmx.msr_area = msr_area;
> -        __vmwrite(VM_EXIT_MSR_STORE_ADDR, virt_to_maddr(msr_area));
> -        __vmwrite(VM_ENTRY_MSR_LOAD_ADDR, virt_to_maddr(msr_area));
> +
> +        if ( type == VMX_GUEST_MSR )
> +        {
> +            __vmwrite(VM_EXIT_MSR_STORE_ADDR, virt_to_maddr(*msr_area));
> +            __vmwrite(VM_ENTRY_MSR_LOAD_ADDR, virt_to_maddr(*msr_area));
> +        }
> +        else
> +            __vmwrite(VM_EXIT_MSR_LOAD_ADDR, virt_to_maddr(*msr_area));
>      }
> 
> -    for ( i = 0; i < msr_count; i++ )
> -        if ( msr_area[i].index == msr )
> +    for ( idx = 0; idx < *msr_count; idx++ )
> +        if ( (*msr_area)[idx].index == msr )
>              return 0;
> 
> -    if ( msr_count == (PAGE_SIZE / sizeof(struct vmx_msr_entry)) )
> +    if ( *msr_count == (PAGE_SIZE / sizeof(struct vmx_msr_entry)) )
>          return -ENOSPC;
> 
> -    msr_area[msr_count].index = msr;
> -    msr_area[msr_count].mbz   = 0;
> -    msr_area[msr_count].data  = 0;
> -    curr->arch.hvm_vmx.msr_count = ++msr_count;
> -    __vmwrite(VM_EXIT_MSR_STORE_COUNT, msr_count);
> -    __vmwrite(VM_ENTRY_MSR_LOAD_COUNT, msr_count);
> +    msr_area_elem = *msr_area + *msr_count;
> +    msr_area_elem->index = msr;
> +    msr_area_elem->mbz = 0;
> 
> -    return 0;
> -}
> +    ++*msr_count;
> 
> -int vmx_add_host_load_msr(u32 msr)
> -{
> -    struct vcpu *curr = current;
> -    unsigned int i, msr_count = curr->arch.hvm_vmx.host_msr_count;
> -    struct vmx_msr_entry *msr_area = curr->arch.hvm_vmx.host_msr_area;
> -
> -    if ( msr_area == NULL )
> +    if ( type == VMX_GUEST_MSR )
>      {
> -        if ( (msr_area = alloc_xenheap_page()) == NULL )
> -            return -ENOMEM;
> -        curr->arch.hvm_vmx.host_msr_area = msr_area;
> -        __vmwrite(VM_EXIT_MSR_LOAD_ADDR, virt_to_maddr(msr_area));
> +        msr_area_elem->data = 0;
> +        __vmwrite(VM_EXIT_MSR_STORE_COUNT, *msr_count);
> +        __vmwrite(VM_ENTRY_MSR_LOAD_COUNT, *msr_count);
> +    }
> +    else
> +    {
> +        rdmsrl(msr, msr_area_elem->data);
> +        __vmwrite(VM_EXIT_MSR_LOAD_COUNT, *msr_count);
>      }
> -
> -    for ( i = 0; i < msr_count; i++ )
> -        if ( msr_area[i].index == msr )
> -            return 0;
> -
> -    if ( msr_count == (PAGE_SIZE / sizeof(struct vmx_msr_entry)) )
> -        return -ENOSPC;
> -
> -    msr_area[msr_count].index = msr;
> -    msr_area[msr_count].mbz   = 0;
> -    rdmsrl(msr, msr_area[msr_count].data);
> -    curr->arch.hvm_vmx.host_msr_count = ++msr_count;
> -    __vmwrite(VM_EXIT_MSR_LOAD_COUNT, msr_count);
> 
>      return 0;
>  }
> diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
> index 6a99dca..949884b 100644
> --- a/xen/include/asm-x86/hvm/vmx/vmcs.h
> +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
> @@ -482,12 +482,15 @@ extern const unsigned int
> vmx_introspection_force_enabled_msrs_size;
> 
>  #define MSR_TYPE_R 1
>  #define MSR_TYPE_W 2
> +
> +#define VMX_GUEST_MSR 0
> +#define VMX_HOST_MSR  1
> +
>  void vmx_disable_intercept_for_msr(struct vcpu *v, u32 msr, int type);
>  void vmx_enable_intercept_for_msr(struct vcpu *v, u32 msr, int type);
>  int vmx_read_guest_msr(u32 msr, u64 *val);
>  int vmx_write_guest_msr(u32 msr, u64 val);
> -int vmx_add_guest_msr(u32 msr);
> -int vmx_add_host_load_msr(u32 msr);
> +int vmx_add_msr(u32 msr, int type);
>  void vmx_vmcs_switch(struct vmcs_struct *from, struct vmcs_struct *to);
>  void vmx_set_eoi_exit_bitmap(struct vcpu *v, u8 vector);
>  void vmx_clear_eoi_exit_bitmap(struct vcpu *v, u8 vector);
> @@ -497,6 +500,15 @@ void virtual_vmcs_exit(void *vvmcs);
>  u64 virtual_vmcs_vmread(void *vvmcs, u32 vmcs_encoding);
>  void virtual_vmcs_vmwrite(void *vvmcs, u32 vmcs_encoding, u64 val);
> 
> +static inline int vmx_add_guest_msr(u32 msr)
> +{
> +    return vmx_add_msr(msr, VMX_GUEST_MSR);
> +}
> +static inline int vmx_add_host_load_msr(u32 msr)
> +{
> +    return vmx_add_msr(msr, VMX_HOST_MSR);
> +}
> +
>  DECLARE_PER_CPU(bool_t, vmxon);
> 
>  #endif /* ASM_X86_HVM_VMX_VMCS_H__ */
> --
> 1.8.1.4
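
[Editor's note: after this merge, existing callers keep the old names via the
new inline wrappers. A minimal usage sketch based on the return values visible
in the diff (0, -ENOMEM, -ENOSPC); the MSR chosen is only an example.]

    /* Track a guest MSR: stored on VM exit, loaded on VM entry. */
    int rc = vmx_add_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL);

    if ( rc == -ENOMEM )
        /* allocating the page backing the MSR area failed */;
    else if ( rc == -ENOSPC )
        /* the single page backing the MSR area is already full */;

    /* Track a host MSR: vmx_add_msr() snapshots the current hardware
     * value via rdmsrl(), and it is restored on VM exit. */
    rc = vmx_add_host_load_msr(MSR_CORE_PERF_GLOBAL_CTRL);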

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v12 for-xen-4.5 09/20] x86/VPMU: Add public xenpmu.h
  2014-09-25 19:28 ` [PATCH v12 for-xen-4.5 09/20] x86/VPMU: Add public xenpmu.h Boris Ostrovsky
@ 2014-09-26 20:49   ` Tian, Kevin
  2014-09-29 14:17   ` Jan Beulich
  1 sibling, 0 replies; 92+ messages in thread
From: Tian, Kevin @ 2014-09-26 20:49 UTC (permalink / raw)
  To: Boris Ostrovsky, jbeulich, suravee.suthikulpanit,
	Aravind.Gopalakrishnan, dietmar.hahn, dgdegra
  Cc: andrew.cooper3, xen-devel, keir, Nakajima, Jun, tim

> From: Boris Ostrovsky [mailto:boris.ostrovsky@oracle.com]
> Sent: Thursday, September 25, 2014 12:29 PM
> 
> Add pmu.h header files and move various macros and structures that will be
> shared between the hypervisor and PV guests into them.
> 
> Move MSR banks out of architectural PMU structures to allow for larger sizes
> in the future. The banks are allocated immediately after the context and
> PMU structures store offsets to them.
> 
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

Acked-by: Kevin Tian <kevin.tian@intel.com>

> ---
>  xen/arch/x86/hvm/svm/vpmu.c              |  84 ++++++++++----------
>  xen/arch/x86/hvm/vmx/vpmu_core2.c        | 118 +++++++++++++--------
>  xen/arch/x86/hvm/vpmu.c                  |   6 ++
>  xen/arch/x86/oprofile/op_model_ppro.c    |   6 +-
>  xen/include/asm-x86/hvm/vmx/vpmu_core2.h |  32 ---------
>  xen/include/asm-x86/hvm/vpmu.h           |  16 ++---
>  xen/include/public/arch-arm.h            |   3 +
>  xen/include/public/arch-x86/pmu.h        |  77 ++++++++++++++++++++
>  xen/include/public/arch-x86/xen-x86_32.h |   8 +++
>  xen/include/public/arch-x86/xen-x86_64.h |   8 +++
>  xen/include/public/pmu.h                 |  38 ++++++++++
>  11 files changed, 263 insertions(+), 133 deletions(-)
>  delete mode 100644 xen/include/asm-x86/hvm/vmx/vpmu_core2.h
>  create mode 100644 xen/include/public/arch-x86/pmu.h
>  create mode 100644 xen/include/public/pmu.h
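
[Editor's note: the offset scheme described in the commit message can be
spelled out. The two assignments below are taken from the
amd_vpmu_initialise() hunk further down; vpmu_reg_pointer() itself is not
part of this quoted patch, so the definition shown is an assumption
consistent with how the macro is used.]

    struct xen_pmu_amd_ctxt *ctxt = vpmu->context;    /* allocation elided */

    /* MSR banks live immediately after the fixed-size context ... */
    ctxt->counters = sizeof(struct xen_pmu_amd_ctxt);
    ctxt->ctrls    = ctxt->counters + sizeof(uint64_t) * AMD_MAX_COUNTERS;

    /* ... and a stored offset is turned back into a pointer on use
     * (presumed definition): */
    #define vpmu_reg_pointer(ctxt, field) \
        ((uint64_t *)((uintptr_t)(ctxt) + (ctxt)->field))

    uint64_t *counter_regs = vpmu_reg_pointer(ctxt, counters);
    uint64_t *ctrl_regs    = vpmu_reg_pointer(ctxt, ctrls);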
> 
> diff --git a/xen/arch/x86/hvm/svm/vpmu.c b/xen/arch/x86/hvm/svm/vpmu.c
> index 11e9484..124b147 100644
> --- a/xen/arch/x86/hvm/svm/vpmu.c
> +++ b/xen/arch/x86/hvm/svm/vpmu.c
> @@ -30,10 +30,7 @@
>  #include <asm/apic.h>
>  #include <asm/hvm/vlapic.h>
>  #include <asm/hvm/vpmu.h>
> -
> -#define F10H_NUM_COUNTERS 4
> -#define F15H_NUM_COUNTERS 6
> -#define MAX_NUM_COUNTERS F15H_NUM_COUNTERS
> +#include <public/pmu.h>
> 
>  #define MSR_F10H_EVNTSEL_GO_SHIFT   40
>  #define MSR_F10H_EVNTSEL_EN_SHIFT   22
> @@ -49,6 +46,10 @@ static const u32 __read_mostly *counters;
>  static const u32 __read_mostly *ctrls;
>  static bool_t __read_mostly k7_counters_mirrored;
> 
> +#define F10H_NUM_COUNTERS   4
> +#define F15H_NUM_COUNTERS   6
> +#define AMD_MAX_COUNTERS    6
> +
>  /* PMU Counter MSRs. */
>  static const u32 AMD_F10H_COUNTERS[] = {
>      MSR_K7_PERFCTR0,
> @@ -83,12 +84,14 @@ static const u32 AMD_F15H_CTRLS[] = {
>      MSR_AMD_FAM15H_EVNTSEL5
>  };
> 
> -/* storage for context switching */
> -struct amd_vpmu_context {
> -    u64 counters[MAX_NUM_COUNTERS];
> -    u64 ctrls[MAX_NUM_COUNTERS];
> -    bool_t msr_bitmap_set;
> -};
> +/* Use private context as a flag for MSR bitmap */
> +#define msr_bitmap_on(vpmu)    do {                                    \
> +                                   (vpmu)->priv_context = (void *)-1L;  \
> +                               } while (0)
> +#define msr_bitmap_off(vpmu)   do {                                    \
> +                                   (vpmu)->priv_context = NULL;         \
> +                               } while (0)
> +#define is_msr_bitmap_on(vpmu) ((vpmu)->priv_context != NULL)
> 
>  static inline int get_pmu_reg_type(u32 addr)
>  {
> @@ -142,7 +145,6 @@ static void amd_vpmu_set_msr_bitmap(struct vcpu *v)
>  {
>      unsigned int i;
>      struct vpmu_struct *vpmu = vcpu_vpmu(v);
> -    struct amd_vpmu_context *ctxt = vpmu->context;
> 
>      for ( i = 0; i < num_counters; i++ )
>      {
> @@ -150,14 +152,13 @@ static void amd_vpmu_set_msr_bitmap(struct vcpu *v)
>          svm_intercept_msr(v, ctrls[i], MSR_INTERCEPT_WRITE);
>      }
> 
> -    ctxt->msr_bitmap_set = 1;
> +    msr_bitmap_on(vpmu);
>  }
> 
>  static void amd_vpmu_unset_msr_bitmap(struct vcpu *v)
>  {
>      unsigned int i;
>      struct vpmu_struct *vpmu = vcpu_vpmu(v);
> -    struct amd_vpmu_context *ctxt = vpmu->context;
> 
>      for ( i = 0; i < num_counters; i++ )
>      {
> @@ -165,7 +166,7 @@ static void amd_vpmu_unset_msr_bitmap(struct vcpu *v)
>          svm_intercept_msr(v, ctrls[i], MSR_INTERCEPT_RW);
>      }
> 
> -    ctxt->msr_bitmap_set = 0;
> +    msr_bitmap_off(vpmu);
>  }
> 
>  static int amd_vpmu_do_interrupt(struct cpu_user_regs *regs)
> @@ -177,19 +178,22 @@ static inline void context_load(struct vcpu *v)
>  {
>      unsigned int i;
>      struct vpmu_struct *vpmu = vcpu_vpmu(v);
> -    struct amd_vpmu_context *ctxt = vpmu->context;
> +    struct xen_pmu_amd_ctxt *ctxt = vpmu->context;
> +    uint64_t *counter_regs = vpmu_reg_pointer(ctxt, counters);
> +    uint64_t *ctrl_regs = vpmu_reg_pointer(ctxt, ctrls);
> 
>      for ( i = 0; i < num_counters; i++ )
>      {
> -        wrmsrl(counters[i], ctxt->counters[i]);
> -        wrmsrl(ctrls[i], ctxt->ctrls[i]);
> +        wrmsrl(counters[i], counter_regs[i]);
> +        wrmsrl(ctrls[i], ctrl_regs[i]);
>      }
>  }
> 
>  static void amd_vpmu_load(struct vcpu *v)
>  {
>      struct vpmu_struct *vpmu = vcpu_vpmu(v);
> -    struct amd_vpmu_context *ctxt = vpmu->context;
> +    struct xen_pmu_amd_ctxt *ctxt = vpmu->context;
> +    uint64_t *ctrl_regs = vpmu_reg_pointer(ctxt, ctrls);
> 
>      vpmu_reset(vpmu, VPMU_FROZEN);
> 
> @@ -198,7 +202,7 @@ static void amd_vpmu_load(struct vcpu *v)
>          unsigned int i;
> 
>          for ( i = 0; i < num_counters; i++ )
> -            wrmsrl(ctrls[i], ctxt->ctrls[i]);
> +            wrmsrl(ctrls[i], ctrl_regs[i]);
> 
>          return;
>      }
> @@ -212,17 +216,17 @@ static inline void context_save(struct vcpu *v)
>  {
>      unsigned int i;
>      struct vpmu_struct *vpmu = vcpu_vpmu(v);
> -    struct amd_vpmu_context *ctxt = vpmu->context;
> +    struct xen_pmu_amd_ctxt *ctxt = vpmu->context;
> +    uint64_t *counter_regs = vpmu_reg_pointer(ctxt, counters);
> 
>      /* No need to save controls -- they are saved in amd_vpmu_do_wrmsr */
>      for ( i = 0; i < num_counters; i++ )
> -        rdmsrl(counters[i], ctxt->counters[i]);
> +        rdmsrl(counters[i], counter_regs[i]);
>  }
> 
>  static int amd_vpmu_save(struct vcpu *v)
>  {
>      struct vpmu_struct *vpmu = vcpu_vpmu(v);
> -    struct amd_vpmu_context *ctx = vpmu->context;
>      unsigned int i;
> 
>      /*
> @@ -245,7 +249,7 @@ static int amd_vpmu_save(struct vcpu *v)
>      context_save(v);
> 
>      if ( !vpmu_is_set(vpmu, VPMU_RUNNING) &&
> -         has_hvm_container_domain(v->domain) && ctx->msr_bitmap_set )
> +         has_hvm_container_domain(v->domain) && is_msr_bitmap_on(vpmu) )
>          amd_vpmu_unset_msr_bitmap(v);
> 
>      return 1;
> @@ -256,7 +260,9 @@ static void context_update(unsigned int msr, u64 msr_content)
>      unsigned int i;
>      struct vcpu *v = current;
>      struct vpmu_struct *vpmu = vcpu_vpmu(v);
> -    struct amd_vpmu_context *ctxt = vpmu->context;
> +    struct xen_pmu_amd_ctxt *ctxt = vpmu->context;
> +    uint64_t *counter_regs = vpmu_reg_pointer(ctxt, counters);
> +    uint64_t *ctrl_regs = vpmu_reg_pointer(ctxt, ctrls);
> 
>      if ( k7_counters_mirrored &&
>          ((msr >= MSR_K7_EVNTSEL0) && (msr <= MSR_K7_PERFCTR3)) )
> @@ -268,12 +274,12 @@ static void context_update(unsigned int msr, u64 msr_content)
>      {
>         if ( msr == ctrls[i] )
>         {
> -           ctxt->ctrls[i] = msr_content;
> +           ctrl_regs[i] = msr_content;
>             return;
>         }
>          else if (msr == counters[i] )
>          {
> -            ctxt->counters[i] = msr_content;
> +            counter_regs[i] = msr_content;
>              return;
>          }
>      }
> @@ -303,8 +309,7 @@ static int amd_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content,
>              return 1;
>          vpmu_set(vpmu, VPMU_RUNNING);
> 
> -        if ( has_hvm_container_domain(v->domain) &&
> -             !((struct amd_vpmu_context *)vpmu->context)->msr_bitmap_set )
> +        if ( has_hvm_container_domain(v->domain) && !is_msr_bitmap_on(vpmu) )
>               amd_vpmu_set_msr_bitmap(v);
>      }
> 
> @@ -313,8 +318,7 @@ static int amd_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content,
>          (is_pmu_enabled(msr_content) == 0) && vpmu_is_set(vpmu,
> VPMU_RUNNING) )
>      {
>          vpmu_reset(vpmu, VPMU_RUNNING);
> -        if ( has_hvm_container_domain(v->domain) &&
> -             ((struct amd_vpmu_context *)vpmu->context)->msr_bitmap_set )
> +        if ( has_hvm_container_domain(v->domain) && is_msr_bitmap_on(vpmu) )
>               amd_vpmu_unset_msr_bitmap(v);
>          release_pmu_ownship(PMU_OWNER_HVM);
>      }
> @@ -355,7 +359,7 @@ static int amd_vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
> 
>  static int amd_vpmu_initialise(struct vcpu *v)
>  {
> -    struct amd_vpmu_context *ctxt;
> +    struct xen_pmu_amd_ctxt *ctxt;
>      struct vpmu_struct *vpmu = vcpu_vpmu(v);
>      uint8_t family = current_cpu_data.x86;
> 
> @@ -385,7 +389,8 @@ static int amd_vpmu_initialise(struct vcpu *v)
>  	 }
>      }
> 
> -    ctxt = xzalloc(struct amd_vpmu_context);
> +    ctxt = xzalloc_bytes(sizeof(struct xen_pmu_amd_ctxt) +
> +                         2 * sizeof(uint64_t) * AMD_MAX_COUNTERS);
>      if ( !ctxt )
>      {
>          gdprintk(XENLOG_WARNING, "Insufficient memory for PMU, "
> @@ -394,7 +399,11 @@ static int amd_vpmu_initialise(struct vcpu *v)
>          return -ENOMEM;
>      }
> 
> +    ctxt->counters = sizeof(struct xen_pmu_amd_ctxt);
> +    ctxt->ctrls = ctxt->counters + sizeof(uint64_t) * AMD_MAX_COUNTERS;
> +
>      vpmu->context = ctxt;
> +    vpmu->priv_context = NULL;
>      vpmu_set(vpmu, VPMU_CONTEXT_ALLOCATED);
>      return 0;
>  }
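To make the new layout concrete: the two offsets stored just above turn the
single xzalloc_bytes() region into a fixed header followed by two MSR banks,
and the vpmu_reg_pointer() macro (added to vpmu.h by this same patch) turns
them back into pointers. A sketch restating only what the diff itself
establishes:

    /*
     * Allocation made in amd_vpmu_initialise():
     *
     *   [ struct xen_pmu_amd_ctxt | counters bank | ctrls bank ]
     *
     *   ctxt->counters = sizeof(struct xen_pmu_amd_ctxt)
     *   ctxt->ctrls    = ctxt->counters + sizeof(uint64_t) * AMD_MAX_COUNTERS
     */
    uint64_t *counter_regs = vpmu_reg_pointer(ctxt, counters);
    uint64_t *ctrl_regs    = vpmu_reg_pointer(ctxt, ctrls);
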
> @@ -406,8 +415,7 @@ static void amd_vpmu_destroy(struct vcpu *v)
>      if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
>          return;
> 
> -    if ( has_hvm_container_domain(v->domain) &&
> -         ((struct amd_vpmu_context *)vpmu->context)->msr_bitmap_set )
> +    if ( has_hvm_container_domain(v->domain) && is_msr_bitmap_on(vpmu) )
>          amd_vpmu_unset_msr_bitmap(v);
> 
>      xfree(vpmu->context);
> @@ -424,7 +432,9 @@ static void amd_vpmu_destroy(struct vcpu *v)
>  static void amd_vpmu_dump(const struct vcpu *v)
>  {
>      const struct vpmu_struct *vpmu = vcpu_vpmu(v);
> -    const struct amd_vpmu_context *ctxt = vpmu->context;
> +    const struct xen_pmu_amd_ctxt *ctxt = vpmu->context;
> +    const uint64_t *counter_regs = vpmu_reg_pointer(ctxt, counters);
> +    const uint64_t *ctrl_regs = vpmu_reg_pointer(ctxt, ctrls);
>      unsigned int i;
> 
>      printk("    VPMU state: 0x%x ", vpmu->flags);
> @@ -454,8 +464,8 @@ static void amd_vpmu_dump(const struct vcpu *v)
>          rdmsrl(ctrls[i], ctrl);
>          rdmsrl(counters[i], cntr);
>          printk("      %#x: %#lx (%#lx in HW)    %#x: %#lx (%#lx in
> HW)\n",
> -               ctrls[i], ctxt->ctrls[i], ctrl,
> -               counters[i], ctxt->counters[i], cntr);
> +               ctrls[i], ctrl_regs[i], ctrl,
> +               counters[i], counter_regs[i], cntr);
>      }
>  }
> 
> diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c
> index 79a82a3..beff5c3 100644
> --- a/xen/arch/x86/hvm/vmx/vpmu_core2.c
> +++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c
> @@ -35,8 +35,8 @@
>  #include <asm/hvm/vmx/vmcs.h>
>  #include <public/sched.h>
>  #include <public/hvm/save.h>
> +#include <public/pmu.h>
>  #include <asm/hvm/vpmu.h>
> -#include <asm/hvm/vmx/vpmu_core2.h>
> 
>  /*
>   * See Intel SDM Vol 2a Instruction Set Reference chapter 3 for CPUID
> @@ -68,6 +68,10 @@
>  #define MSR_PMC_ALIAS_MASK       (~(MSR_IA32_PERFCTR0 ^ MSR_IA32_A_PERFCTR0))
>  static bool_t __read_mostly full_width_write;
> 
> +/* Intel-specific VPMU features */
> +#define VPMU_CPU_HAS_DS                     0x100 /* Has Debug Store */
> +#define VPMU_CPU_HAS_BTS                    0x200 /* Has Branch Trace Store */
> +
>  /*
>   * MSR_CORE_PERF_FIXED_CTR_CTRL contains the configuration of all fixed
>   * counters. 4 bits for every counter.
> @@ -75,17 +79,6 @@ static bool_t __read_mostly full_width_write;
>  #define FIXED_CTR_CTRL_BITS 4
>  #define FIXED_CTR_CTRL_MASK ((1 << FIXED_CTR_CTRL_BITS) - 1)
> 
> -#define VPMU_CORE2_MAX_FIXED_PMCS     4
> -struct core2_vpmu_context {
> -    u64 fixed_ctrl;
> -    u64 ds_area;
> -    u64 pebs_enable;
> -    u64 global_ovf_status;
> -    u64 enabled_cntrs;  /* Follows PERF_GLOBAL_CTRL MSR format */
> -    u64 fix_counters[VPMU_CORE2_MAX_FIXED_PMCS];
> -    struct arch_msr_pair arch_msr_pair[1];
> -};
> -
>  /* Number of general-purpose and fixed performance counters */
>  static unsigned int __read_mostly arch_pmc_cnt, fixed_pmc_cnt;
> 
> @@ -222,6 +215,7 @@ static int is_core2_vpmu_msr(u32 msr_index, int *type, int *index)
>      }
>  }
> 
> +#define msraddr_to_bitpos(x) (((x)&0xffff) + ((x)>>31)*0x2000)
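Unpacking the macro just added: a VMX MSR bitmap covers the low MSR range
(0x0 - 0x1fff) in its first half and the high range (0xC0000000 - 0xC0001fff)
0x2000 bits further in, with bit 31 of the address selecting the half. Two
worked examples (assuming the standard VMX bitmap layout):

    /* msraddr_to_bitpos(0x00000186) = 0x186            (low-range MSR)   */
    /* msraddr_to_bitpos(0xC0000080) = 0x80 + 0x2000    (high-range MSR;
     *                                 (x) >> 31 == 1 selects the upper half) */
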
>  static void core2_vpmu_set_msr_bitmap(unsigned long *msr_bitmap)
>  {
>      int i;
> @@ -291,12 +285,15 @@ static void core2_vpmu_unset_msr_bitmap(unsigned long *msr_bitmap)
>  static inline void __core2_vpmu_save(struct vcpu *v)
>  {
>      int i;
> -    struct core2_vpmu_context *core2_vpmu_cxt = vcpu_vpmu(v)->context;
> +    struct xen_pmu_intel_ctxt *core2_vpmu_cxt = vcpu_vpmu(v)->context;
> +    uint64_t *fixed_counters = vpmu_reg_pointer(core2_vpmu_cxt, fixed_counters);
> +    struct xen_pmu_cntr_pair *xen_pmu_cntr_pair =
> +        vpmu_reg_pointer(core2_vpmu_cxt, arch_counters);
> 
>      for ( i = 0; i < fixed_pmc_cnt; i++ )
> -        rdmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, core2_vpmu_cxt->fix_counters[i]);
> +        rdmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, fixed_counters[i]);
>      for ( i = 0; i < arch_pmc_cnt; i++ )
> -        rdmsrl(MSR_IA32_PERFCTR0 + i, core2_vpmu_cxt->arch_msr_pair[i].counter);
> +        rdmsrl(MSR_IA32_PERFCTR0 + i, xen_pmu_cntr_pair[i].counter);
>  }
> 
>  static int core2_vpmu_save(struct vcpu *v)
> @@ -319,10 +316,13 @@ static int core2_vpmu_save(struct vcpu *v)
>  static inline void __core2_vpmu_load(struct vcpu *v)
>  {
>      unsigned int i, pmc_start;
> -    struct core2_vpmu_context *core2_vpmu_cxt = vcpu_vpmu(v)->context;
> +    struct xen_pmu_intel_ctxt *core2_vpmu_cxt = vcpu_vpmu(v)->context;
> +    uint64_t *fixed_counters = vpmu_reg_pointer(core2_vpmu_cxt, fixed_counters);
> +    struct xen_pmu_cntr_pair *xen_pmu_cntr_pair =
> +        vpmu_reg_pointer(core2_vpmu_cxt, arch_counters);
> 
>      for ( i = 0; i < fixed_pmc_cnt; i++ )
> -        wrmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, core2_vpmu_cxt->fix_counters[i]);
> +        wrmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, fixed_counters[i]);
> 
>      if ( full_width_write )
>          pmc_start = MSR_IA32_A_PERFCTR0;
> @@ -330,8 +330,8 @@ static inline void __core2_vpmu_load(struct vcpu *v)
>          pmc_start = MSR_IA32_PERFCTR0;
>      for ( i = 0; i < arch_pmc_cnt; i++ )
>      {
> -        wrmsrl(pmc_start + i, core2_vpmu_cxt->arch_msr_pair[i].counter);
> -        wrmsrl(MSR_P6_EVNTSEL(i), core2_vpmu_cxt->arch_msr_pair[i].control);
> +        wrmsrl(pmc_start + i, xen_pmu_cntr_pair[i].counter);
> +        wrmsrl(MSR_P6_EVNTSEL(i), xen_pmu_cntr_pair[i].control);
>      }
> 
>      wrmsrl(MSR_CORE_PERF_FIXED_CTR_CTRL, core2_vpmu_cxt->fixed_ctrl);
> @@ -354,7 +354,8 @@ static void core2_vpmu_load(struct vcpu *v)
>  static int core2_vpmu_alloc_resource(struct vcpu *v)
>  {
>      struct vpmu_struct *vpmu = vcpu_vpmu(v);
> -    struct core2_vpmu_context *core2_vpmu_cxt;
> +    struct xen_pmu_intel_ctxt *core2_vpmu_cxt = NULL;
> +    uint64_t *p = NULL;
> 
>      if ( !acquire_pmu_ownership(PMU_OWNER_HVM) )
>          return 0;
> @@ -367,12 +368,20 @@ static int core2_vpmu_alloc_resource(struct vcpu *v)
>          goto out_err;
>      vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, 0);
> 
> -    core2_vpmu_cxt = xzalloc_bytes(sizeof(struct core2_vpmu_context) +
> -                    (arch_pmc_cnt-1)*sizeof(struct arch_msr_pair));
> -    if ( !core2_vpmu_cxt )
> +    core2_vpmu_cxt = xzalloc_bytes(sizeof(struct xen_pmu_intel_ctxt) +
> +                                   sizeof(uint64_t) * fixed_pmc_cnt +
> +                                   sizeof(struct xen_pmu_cntr_pair) *
> +                                   arch_pmc_cnt);
> +    p = xzalloc(uint64_t);
> +    if ( !core2_vpmu_cxt || !p )
>          goto out_err;
> 
> -    vpmu->context = (void *)core2_vpmu_cxt;
> +    core2_vpmu_cxt->fixed_counters = sizeof(struct xen_pmu_intel_ctxt);
> +    core2_vpmu_cxt->arch_counters = core2_vpmu_cxt->fixed_counters +
> +                                    sizeof(uint64_t) * fixed_pmc_cnt;
> +
> +    vpmu->context = core2_vpmu_cxt;
> +    vpmu->priv_context = p;
> 
>      vpmu_set(vpmu, VPMU_CONTEXT_ALLOCATED);
> 
> @@ -381,6 +390,9 @@ static int core2_vpmu_alloc_resource(struct vcpu *v)
>  out_err:
>      release_pmu_ownship(PMU_OWNER_HVM);
> 
> +    xfree(core2_vpmu_cxt);
> +    xfree(p);
> +
>      printk("Failed to allocate VPMU resources for domain %u vcpu %u\n",
>             v->vcpu_id, v->domain->domain_id);
> 
> @@ -418,7 +430,8 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content,
>      int type = -1, index = -1;
>      struct vcpu *v = current;
>      struct vpmu_struct *vpmu = vcpu_vpmu(v);
> -    struct core2_vpmu_context *core2_vpmu_cxt = NULL;
> +    struct xen_pmu_intel_ctxt *core2_vpmu_cxt = NULL;
> +    uint64_t *enabled_cntrs;
> 
>      if ( !core2_vpmu_msr_common_check(msr, &type, &index) )
>      {
> @@ -446,10 +459,11 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content,
>      ASSERT(!supported);
> 
>      core2_vpmu_cxt = vpmu->context;
> +    enabled_cntrs = vpmu->priv_context;
>      switch ( msr )
>      {
>      case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
> -        core2_vpmu_cxt->global_ovf_status &= ~msr_content;
> +        core2_vpmu_cxt->global_status &= ~msr_content;
>          return 1;
>      case MSR_CORE_PERF_GLOBAL_STATUS:
>          gdprintk(XENLOG_INFO, "Can not write readonly MSR: "
> @@ -483,15 +497,14 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content,
>          break;
>      case MSR_CORE_PERF_FIXED_CTR_CTRL:
>          vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, &global_ctrl);
> -        core2_vpmu_cxt->enabled_cntrs &=
> -                ~(((1ULL << VPMU_CORE2_MAX_FIXED_PMCS) - 1) << 32);
> +        *enabled_cntrs &= ~(((1ULL << fixed_pmc_cnt) - 1) << 32);
>          if ( msr_content != 0 )
>          {
>              u64 val = msr_content;
>              for ( i = 0; i < fixed_pmc_cnt; i++ )
>              {
>                  if ( val & 3 )
> -                    core2_vpmu_cxt->enabled_cntrs |= (1ULL << 32) << i;
> +                    *enabled_cntrs |= (1ULL << 32) << i;
>                  val >>= FIXED_CTR_CTRL_BITS;
>              }
>          }
> @@ -502,19 +515,21 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content,
>          tmp = msr - MSR_P6_EVNTSEL(0);
>          if ( tmp >= 0 && tmp < arch_pmc_cnt )
>          {
> +            struct xen_pmu_cntr_pair *xen_pmu_cntr_pair =
> +                vpmu_reg_pointer(core2_vpmu_cxt, arch_counters);
> +
>              vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, &global_ctrl);
> 
>              if ( msr_content & (1ULL << 22) )
> -                core2_vpmu_cxt->enabled_cntrs |= 1ULL << tmp;
> +                *enabled_cntrs |= 1ULL << tmp;
>              else
> -                core2_vpmu_cxt->enabled_cntrs &= ~(1ULL << tmp);
> +                *enabled_cntrs &= ~(1ULL << tmp);
> 
> -            core2_vpmu_cxt->arch_msr_pair[tmp].control = msr_content;
> +            xen_pmu_cntr_pair[tmp].control = msr_content;
>          }
>      }
> 
> -    if ( (global_ctrl & core2_vpmu_cxt->enabled_cntrs) ||
> -         (core2_vpmu_cxt->ds_area != 0)  )
> +    if ( (global_ctrl & *enabled_cntrs) || (core2_vpmu_cxt->ds_area != 0) )
>          vpmu_set(vpmu, VPMU_RUNNING);
>      else
>          vpmu_reset(vpmu, VPMU_RUNNING);
> @@ -560,7 +575,7 @@ static int core2_vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
>      int type = -1, index = -1;
>      struct vcpu *v = current;
>      struct vpmu_struct *vpmu = vcpu_vpmu(v);
> -    struct core2_vpmu_context *core2_vpmu_cxt = NULL;
> +    struct xen_pmu_intel_ctxt *core2_vpmu_cxt = NULL;
> 
>      if ( core2_vpmu_msr_common_check(msr, &type, &index) )
>      {
> @@ -571,7 +586,7 @@ static int core2_vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
>              *msr_content = 0;
>              break;
>          case MSR_CORE_PERF_GLOBAL_STATUS:
> -            *msr_content = core2_vpmu_cxt->global_ovf_status;
> +            *msr_content = core2_vpmu_cxt->global_status;
>              break;
>          case MSR_CORE_PERF_GLOBAL_CTRL:
>              vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, msr_content);
> @@ -620,10 +635,12 @@ static void core2_vpmu_dump(const struct vcpu *v)
>  {
>      const struct vpmu_struct *vpmu = vcpu_vpmu(v);
>      unsigned int i;
> -    const struct core2_vpmu_context *core2_vpmu_cxt = NULL;
> +    const struct xen_pmu_intel_ctxt *core2_vpmu_cxt = vpmu->context;
>      u64 val;
> +    uint64_t *fixed_counters;
> +    struct xen_pmu_cntr_pair *cntr_pair;
> 
> -    if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
> +    if ( !core2_vpmu_cxt || !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
>           return;
> 
>      if ( !vpmu_is_set(vpmu, VPMU_RUNNING) )
> @@ -636,16 +653,15 @@ static void core2_vpmu_dump(const struct vcpu *v)
>      }
> 
>      printk("    vPMU running\n");
> -    core2_vpmu_cxt = vpmu->context;
> +
> +    cntr_pair = vpmu_reg_pointer(core2_vpmu_cxt, arch_counters);
> +    fixed_counters = vpmu_reg_pointer(core2_vpmu_cxt, fixed_counters);
> 
>      /* Print the contents of the counter and its configuration msr. */
>      for ( i = 0; i < arch_pmc_cnt; i++ )
> -    {
> -        const struct arch_msr_pair *msr_pair = core2_vpmu_cxt->arch_msr_pair;
> -
>          printk("      general_%d: 0x%016lx ctrl: 0x%016lx\n",
> -               i, msr_pair[i].counter, msr_pair[i].control);
> -    }
> +            i, cntr_pair[i].counter, cntr_pair[i].control);
> +
>      /*
>       * The configuration of the fixed counter is 4 bits each in the
>       * MSR_CORE_PERF_FIXED_CTR_CTRL.
> @@ -654,7 +670,7 @@ static void core2_vpmu_dump(const struct vcpu *v)
>      for ( i = 0; i < fixed_pmc_cnt; i++ )
>      {
>          printk("      fixed_%d:   0x%016lx ctrl: %#lx\n",
> -               i, core2_vpmu_cxt->fix_counters[i],
> +               i, fixed_counters[i],
>                 val & FIXED_CTR_CTRL_MASK);
>          val >>= FIXED_CTR_CTRL_BITS;
>      }
> @@ -665,14 +681,14 @@ static int core2_vpmu_do_interrupt(struct cpu_user_regs *regs)
>      struct vcpu *v = current;
>      u64 msr_content;
>      struct vpmu_struct *vpmu = vcpu_vpmu(v);
> -    struct core2_vpmu_context *core2_vpmu_cxt = vpmu->context;
> +    struct xen_pmu_intel_ctxt *core2_vpmu_cxt = vpmu->context;
> 
>      rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, msr_content);
>      if ( msr_content )
>      {
>          if ( is_pmc_quirk )
>              handle_pmc_quirk(msr_content);
> -        core2_vpmu_cxt->global_ovf_status |= msr_content;
> +        core2_vpmu_cxt->global_status |= msr_content;
>          msr_content = 0xC000000700000000 | ((1 << arch_pmc_cnt) - 1);
>          wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, msr_content);
>      }
> @@ -739,13 +755,6 @@ static int core2_vpmu_initialise(struct vcpu *v, unsigned int vpmu_flags)
> 
>      arch_pmc_cnt = core2_get_arch_pmc_count();
>      fixed_pmc_cnt = core2_get_fixed_pmc_count();
> -    if ( fixed_pmc_cnt > VPMU_CORE2_MAX_FIXED_PMCS )
> -    {
> -        fixed_pmc_cnt = VPMU_CORE2_MAX_FIXED_PMCS;
> -        printk(XENLOG_G_WARNING "Limiting number of fixed counters to %d\n",
> -               fixed_pmc_cnt);
> -    }
> -
>      check_pmc_quirk();
>      return 0;
>  }
> @@ -758,6 +767,7 @@ static void core2_vpmu_destroy(struct vcpu *v)
>          return;
> 
>      xfree(vpmu->context);
> +    xfree(vpmu->priv_context);
>      if ( has_hvm_container_domain(v->domain) && cpu_has_vmx_msr_bitmap )
>          core2_vpmu_unset_msr_bitmap(v->arch.hvm_vmx.msr_bitmap);
>      release_pmu_ownship(PMU_OWNER_HVM);
> diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c
> index 0210284..071b869 100644
> --- a/xen/arch/x86/hvm/vpmu.c
> +++ b/xen/arch/x86/hvm/vpmu.c
> @@ -31,6 +31,7 @@
>  #include <asm/hvm/svm/svm.h>
>  #include <asm/hvm/svm/vmcb.h>
>  #include <asm/apic.h>
> +#include <public/pmu.h>
> 
>  /*
>   * "vpmu" :     vpmu generally enabled
> @@ -228,6 +229,11 @@ void vpmu_initialise(struct vcpu *v)
>      struct vpmu_struct *vpmu = vcpu_vpmu(v);
>      uint8_t vendor = current_cpu_data.x86_vendor;
> 
> +    BUILD_BUG_ON(sizeof(struct xen_pmu_intel_ctxt) > XENPMU_CTXT_PAD_SZ);
> +    BUILD_BUG_ON(sizeof(struct xen_pmu_amd_ctxt) > XENPMU_CTXT_PAD_SZ);
> +    BUILD_BUG_ON(sizeof(struct xen_pmu_regs) > XENPMU_REGS_PAD_SZ);
> +    BUILD_BUG_ON(sizeof(struct compat_pmu_regs) > XENPMU_REGS_PAD_SZ);
> +
>      if ( vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
>          vpmu_destroy(v);
>      vpmu_clear(vpmu);
> diff --git a/xen/arch/x86/oprofile/op_model_ppro.c b/xen/arch/x86/oprofile/op_model_ppro.c
> index aa99e4d..ca429a1 100644
> --- a/xen/arch/x86/oprofile/op_model_ppro.c
> +++ b/xen/arch/x86/oprofile/op_model_ppro.c
> @@ -20,11 +20,15 @@
>  #include <asm/regs.h>
>  #include <asm/current.h>
>  #include <asm/hvm/vpmu.h>
> -#include <asm/hvm/vmx/vpmu_core2.h>
> 
>  #include "op_x86_model.h"
>  #include "op_counter.h"
> 
> +struct arch_msr_pair {
> +    u64 counter;
> +    u64 control;
> +};
> +
>  /*
>   * Intel "Architectural Performance Monitoring" CPUID
>   * detection/enumeration details:
> diff --git a/xen/include/asm-x86/hvm/vmx/vpmu_core2.h b/xen/include/asm-x86/hvm/vmx/vpmu_core2.h
> deleted file mode 100644
> index 410372d..0000000
> --- a/xen/include/asm-x86/hvm/vmx/vpmu_core2.h
> +++ /dev/null
> @@ -1,32 +0,0 @@
> -
> -/*
> - * vpmu_core2.h: CORE 2 specific PMU virtualization for HVM domain.
> - *
> - * Copyright (c) 2007, Intel Corporation.
> - *
> - * This program is free software; you can redistribute it and/or modify it
> - * under the terms and conditions of the GNU General Public License,
> - * version 2, as published by the Free Software Foundation.
> - *
> - * This program is distributed in the hope it will be useful, but WITHOUT
> - * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> - * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> - * more details.
> - *
> - * You should have received a copy of the GNU General Public License along with
> - * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
> - * Place - Suite 330, Boston, MA 02111-1307 USA.
> - *
> - * Author: Haitao Shan <haitao.shan@intel.com>
> - */
> -
> -#ifndef __ASM_X86_HVM_VPMU_CORE_H_
> -#define __ASM_X86_HVM_VPMU_CORE_H_
> -
> -struct arch_msr_pair {
> -    u64 counter;
> -    u64 control;
> -};
> -
> -#endif /* __ASM_X86_HVM_VPMU_CORE_H_ */
> -
> diff --git a/xen/include/asm-x86/hvm/vpmu.h b/xen/include/asm-x86/hvm/vpmu.h
> index 761c556..f0b2686 100644
> --- a/xen/include/asm-x86/hvm/vpmu.h
> +++ b/xen/include/asm-x86/hvm/vpmu.h
> @@ -22,6 +22,8 @@
>  #ifndef __ASM_X86_HVM_VPMU_H_
>  #define __ASM_X86_HVM_VPMU_H_
> 
> +#include <public/pmu.h>
> +
>  /*
>   * Flag bits given as a string on the hypervisor boot parameter 'vpmu'.
>   * See arch/x86/hvm/vpmu.c.
> @@ -29,12 +31,9 @@
>  #define VPMU_BOOT_ENABLED 0x1    /* vpmu generally enabled. */
>  #define VPMU_BOOT_BTS     0x2    /* Intel BTS feature wanted. */
> 
> -
> -#define msraddr_to_bitpos(x) (((x)&0xffff) + ((x)>>31)*0x2000)
>  #define vcpu_vpmu(vcpu)   (&((vcpu)->arch.hvm_vcpu.vpmu))
>  #define vpmu_vcpu(vpmu)   (container_of((vpmu), struct vcpu, \
>                                            arch.hvm_vcpu.vpmu))
> -#define vpmu_domain(vpmu) (vpmu_vcpu(vpmu)->domain)
> 
>  #define MSR_TYPE_COUNTER            0
>  #define MSR_TYPE_CTRL               1
> @@ -42,6 +41,9 @@
>  #define MSR_TYPE_ARCH_COUNTER       3
>  #define MSR_TYPE_ARCH_CTRL          4
> 
> +/* Start of PMU register bank */
> +#define vpmu_reg_pointer(ctxt, offset) ((void *)((uintptr_t)ctxt + \
> +                                                 (uintptr_t)ctxt->offset))
> 
>  /* Arch specific operations shared by all vpmus */
>  struct arch_vpmu_ops {
> @@ -65,7 +67,8 @@ struct vpmu_struct {
>      u32 flags;
>      u32 last_pcpu;
>      u32 hw_lapic_lvtpc;
> -    void *context;
> +    void *context;      /* May be shared with PV guest */
> +    void *priv_context; /* hypervisor-only */
>      struct arch_vpmu_ops *arch_vpmu_ops;
>  };
> 
> @@ -77,11 +80,6 @@ struct vpmu_struct {
>  #define VPMU_FROZEN                         0x10  /* Stop
> counters while VCPU is not running */
>  #define VPMU_PASSIVE_DOMAIN_ALLOCATED       0x20
> 
> -/* VPMU features */
> -#define VPMU_CPU_HAS_DS                     0x100 /* Has Debug Store */
> -#define VPMU_CPU_HAS_BTS                    0x200 /* Has Branch Trace Store */
> -
> -
>  static inline void vpmu_set(struct vpmu_struct *vpmu, const u32 mask)
>  {
>      vpmu->flags |= mask;
> diff --git a/xen/include/public/arch-arm.h b/xen/include/public/arch-arm.h
> index ac54cd6..9de6d66 100644
> --- a/xen/include/public/arch-arm.h
> +++ b/xen/include/public/arch-arm.h
> @@ -407,6 +407,9 @@ typedef uint64_t xen_callback_t;
> 
>  #endif
> 
> +/* Stub definition of PMU structure */
> +typedef struct xen_pmu_arch {} xen_pmu_arch_t;
> +
>  #endif /*  __XEN_PUBLIC_ARCH_ARM_H__ */
> 
>  /*
> diff --git a/xen/include/public/arch-x86/pmu.h b/xen/include/public/arch-x86/pmu.h
> new file mode 100644
> index 0000000..c0cfc6c
> --- /dev/null
> +++ b/xen/include/public/arch-x86/pmu.h
> @@ -0,0 +1,77 @@
> +#ifndef __XEN_PUBLIC_ARCH_X86_PMU_H__
> +#define __XEN_PUBLIC_ARCH_X86_PMU_H__
> +
> +/* x86-specific PMU definitions */
> +
> +/* AMD PMU registers and structures */
> +struct xen_pmu_amd_ctxt {
> +    /* Offsets to counter and control MSRs (relative to xen_pmu_arch.c.amd) */
> +    uint32_t counters;
> +    uint32_t ctrls;
> +};
> +typedef struct xen_pmu_amd_ctxt xen_pmu_amd_ctxt_t;
> +DEFINE_XEN_GUEST_HANDLE(xen_pmu_amd_ctxt_t);
> +
> +/* Intel PMU registers and structures */
> +struct xen_pmu_cntr_pair {
> +    uint64_t counter;
> +    uint64_t control;
> +};
> +typedef struct xen_pmu_cntr_pair xen_pmu_cntr_pair_t;
> +DEFINE_XEN_GUEST_HANDLE(xen_pmu_cntr_pair_t);
> +
> +struct xen_pmu_intel_ctxt {
> +    uint64_t global_ctrl;
> +    uint64_t global_ovf_ctrl;
> +    uint64_t global_status;
> +    uint64_t fixed_ctrl;
> +    uint64_t ds_area;
> +    uint64_t pebs_enable;
> +    uint64_t debugctl;
> +    /*
> +     * Offsets to fixed and architectural counter MSRs (relative to
> +     * xen_pmu_arch.c.intel)
> +     */
> +    uint32_t fixed_counters;
> +    uint32_t arch_counters;
> +};
> +typedef struct xen_pmu_intel_ctxt xen_pmu_intel_ctxt_t;
> +DEFINE_XEN_GUEST_HANDLE(xen_pmu_intel_ctxt_t);
> +
> +struct xen_pmu_arch {
> +    union {
> +        struct xen_pmu_regs regs;
> +        /* Padding for adding new registers to xen_pmu_regs in the future */
> +#define XENPMU_REGS_PAD_SZ  64
> +        uint8_t pad[XENPMU_REGS_PAD_SZ];
> +    } r;
> +    union {
> +        uint32_t lapic_lvtpc;
> +        uint64_t pad;
> +    } l;
> +    union {
> +        struct xen_pmu_amd_ctxt amd;
> +        struct xen_pmu_intel_ctxt intel;
> +
> +        /*
> +         * Padding for contexts (fixed parts only, does not include MSR banks
> +         * that are specified by offsets)
> +         */
> +#define XENPMU_CTXT_PAD_SZ  128
> +        uint8_t pad[XENPMU_CTXT_PAD_SZ];
> +    } c;
> +};
> +typedef struct xen_pmu_arch xen_pmu_arch_t;
> +DEFINE_XEN_GUEST_HANDLE(xen_pmu_arch_t);
> +
> +#endif /* __XEN_PUBLIC_ARCH_X86_PMU_H__ */
> +/*
> + * Local variables:
> + * mode: C
> + * c-file-style: "BSD"
> + * c-basic-offset: 4
> + * tab-width: 4
> + * indent-tabs-mode: nil
> + * End:
> + */
> +
> diff --git a/xen/include/public/arch-x86/xen-x86_32.h b/xen/include/public/arch-x86/xen-x86_32.h
> index 1504191..5b437cf 100644
> --- a/xen/include/public/arch-x86/xen-x86_32.h
> +++ b/xen/include/public/arch-x86/xen-x86_32.h
> @@ -136,6 +136,14 @@ struct cpu_user_regs {
>  typedef struct cpu_user_regs cpu_user_regs_t;
>  DEFINE_XEN_GUEST_HANDLE(cpu_user_regs_t);
> 
> +struct xen_pmu_regs {
> +    uint32_t eip;
> +    uint32_t esp;
> +    uint16_t cs;
> +};
> +typedef struct xen_pmu_regs xen_pmu_regs_t;
> +DEFINE_XEN_GUEST_HANDLE(xen_pmu_regs_t);
> +
>  /*
>   * Page-directory addresses above 4GB do not fit into architectural %cr3.
>   * When accessing %cr3, or equivalent field in vcpu_guest_context, guests
> diff --git a/xen/include/public/arch-x86/xen-x86_64.h b/xen/include/public/arch-x86/xen-x86_64.h
> index 1c4e159..86b6844 100644
> --- a/xen/include/public/arch-x86/xen-x86_64.h
> +++ b/xen/include/public/arch-x86/xen-x86_64.h
> @@ -174,6 +174,14 @@ struct cpu_user_regs {
>  typedef struct cpu_user_regs cpu_user_regs_t;
>  DEFINE_XEN_GUEST_HANDLE(cpu_user_regs_t);
> 
> +struct xen_pmu_regs {
> +    __DECL_REG(ip);
> +    __DECL_REG(sp);
> +    uint16_t cs;
> +};
> +typedef struct xen_pmu_regs xen_pmu_regs_t;
> +DEFINE_XEN_GUEST_HANDLE(xen_pmu_regs_t);
> +
>  #undef __DECL_REG
> 
>  #define xen_pfn_to_cr3(pfn) ((unsigned long)(pfn) << 12)
> diff --git a/xen/include/public/pmu.h b/xen/include/public/pmu.h
> new file mode 100644
> index 0000000..e6f45ee
> --- /dev/null
> +++ b/xen/include/public/pmu.h
> @@ -0,0 +1,38 @@
> +#ifndef __XEN_PUBLIC_PMU_H__
> +#define __XEN_PUBLIC_PMU_H__
> +
> +#include "xen.h"
> +#if defined(__i386__) || defined(__x86_64__)
> +#include "arch-x86/pmu.h"
> +#elif defined (__arm__) || defined (__aarch64__)
> +#include "arch-arm.h"
> +#else
> +#error "Unsupported architecture"
> +#endif
> +
> +#define XENPMU_VER_MAJ    0
> +#define XENPMU_VER_MIN    1
> +
> +
> +/* Shared between hypervisor and PV domain */
> +struct xen_pmu_data {
> +    uint32_t domain_id;
> +    uint32_t vcpu_id;
> +    uint32_t pcpu_id;
> +    uint32_t pmu_flags;
> +
> +    xen_pmu_arch_t pmu;
> +};
> +typedef struct xen_pmu_data xen_pmu_data_t;
> +
> +#endif /* __XEN_PUBLIC_PMU_H__ */
> +
> +/*
> + * Local variables:
> + * mode: C
> + * c-file-style: "BSD"
> + * c-basic-offset: 4
> + * tab-width: 4
> + * indent-tabs-mode: nil
> + * End:
> + */
> --
> 1.8.1.4

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v12 for-xen-4.5 11/20] x86/VPMU: Interface for setting PMU mode and flags
  2014-09-25 19:28 ` [PATCH v12 for-xen-4.5 11/20] x86/VPMU: Interface for setting PMU mode and flags Boris Ostrovsky
@ 2014-09-26 21:04   ` Tian, Kevin
  2014-09-26 21:24     ` Boris Ostrovsky
  2014-09-26 22:00   ` Daniel De Graaf
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 92+ messages in thread
From: Tian, Kevin @ 2014-09-26 21:04 UTC (permalink / raw)
  To: Boris Ostrovsky, jbeulich, suravee.suthikulpanit,
	Aravind.Gopalakrishnan, dietmar.hahn, dgdegra
  Cc: andrew.cooper3, xen-devel, keir, Nakajima, Jun, tim

> From: Boris Ostrovsky [mailto:boris.ostrovsky@oracle.com]
> Sent: Thursday, September 25, 2014 12:29 PM
> 
> Add runtime interface for setting PMU mode and flags. Three main modes are
> provided:
> * XENPMU_MODE_OFF:  PMU is not virtualized
> * XENPMU_MODE_SELF: Guests can access PMU MSRs and receive PMU interrupts.
> * XENPMU_MODE_HV: Same as XENPMU_MODE_SELF for non-privileged guests, dom0
>   can profile itself and the hypervisor.
> 
> Note that PMU modes are different from what can be provided at Xen's boot line
> with 'vpmu' argument. An 'off' (or '0') value is equivalent to XENPMU_MODE_OFF.
> Any other value, on the other hand, will cause VPMU mode to be set to
> XENPMU_MODE_SELF during boot.
> 
> For feature flags only Intel's BTS is currently supported.
> 
> Mode and flags are set via HYPERVISOR_xenpmu_op hypercall.
> 
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
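For readers following along: the runtime switch described in the commit
message amounts to a single hypercall from dom0. A minimal sketch, assuming
the xen_pmu_params layout this patch adds to public/pmu.h (the version
fields and op names follow the series; none of them appear in the excerpt
above, so treat them as assumptions):

    struct xen_pmu_params p = {
        .version.maj = XENPMU_VER_MAJ,
        .version.min = XENPMU_VER_MIN,
        .val = XENPMU_MODE_SELF,   /* or XENPMU_MODE_HV / XENPMU_MODE_OFF */
    };

    if ( HYPERVISOR_xenpmu_op(XENPMU_mode_set, &p) )
        /* fails with e.g. -EINVAL for an unknown mode, or an XSM denial */;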

I'll need focused time to review this one and several of the later big
patches, which I can't afford this week. Will do that next week.

One immediate comment though: it looks like the comment below, which you
said is for XENPMU_MODE_ALL, is still not fixed:

>
> +        if ( vpmu_mode == XENPMU_MODE_OFF )
> +        {
> +            /*
> +             * Make sure all (non-dom0) VCPUs have unloaded their VPMUs. This
> +             * can be achieved by having all physical processors go through
> +             * context_switch().
> +             */
> +            ret = vpmu_force_context_switch();
> +            if ( ret )
> +                vpmu_mode = current_mode;
> +        }
> +
> +        spin_unlock(&xenpmu_mode_lock);
> +        break;
> +    }
> +

Thanks
Kevin

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v12 for-xen-4.5 11/20] x86/VPMU: Interface for setting PMU mode and flags
  2014-09-26 21:04   ` Tian, Kevin
@ 2014-09-26 21:24     ` Boris Ostrovsky
  0 siblings, 0 replies; 92+ messages in thread
From: Boris Ostrovsky @ 2014-09-26 21:24 UTC (permalink / raw)
  To: Tian, Kevin, jbeulich, suravee.suthikulpanit,
	Aravind.Gopalakrishnan, dietmar.hahn, dgdegra
  Cc: andrew.cooper3, xen-devel, keir, Nakajima, Jun, tim

On 09/26/2014 05:04 PM, Tian, Kevin wrote:
>> From: Boris Ostrovsky [mailto:boris.ostrovsky@oracle.com]
>> Sent: Thursday, September 25, 2014 12:29 PM
>>
>> Add runtime interface for setting PMU mode and flags. Three main modes are
>> provided:
>> * XENPMU_MODE_OFF:  PMU is not virtualized
>> * XENPMU_MODE_SELF: Guests can access PMU MSRs and receive PMU interrupts.
>> * XENPMU_MODE_HV: Same as XENPMU_MODE_SELF for non-privileged guests, dom0
>>    can profile itself and the hypervisor.
>>
>> Note that PMU modes are different from what can be provided at Xen's boot line
>> with 'vpmu' argument. An 'off' (or '0') value is equivalent to XENPMU_MODE_OFF.
>> Any other value, on the other hand, will cause VPMU mode to be set to
>> XENPMU_MODE_SELF during boot.
>>
>> For feature flags only Intel's BTS is currently supported.
>>
>> Mode and flags are set via HYPERVISOR_xenpmu_op hypercall.
>>
>> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> I'll need focused time to review this one and several latter big patches, which
> can't be afforded this week. Will do that next week.

Thank you.

>
> One immediate comment though. Looks below comment is not fixed
> which you said is for XENPMU_MODE_ALL:


XENPMU_MODE_ALL is defined later, in patch 18, and the 'if' below is 
updated there.

(And I think I was wrong when I said earlier that the comment was incorrect.
I re-read it and don't see anything wrong with it. I may have been thinking
of the 'if' statement not being correct, without realizing that it is
updated by the later patch.)


-boris

>
>> +        if ( vpmu_mode == XENPMU_MODE_OFF )
>> +        {
>> +            /*
>> +             * Make sure all (non-dom0) VCPUs have unloaded their VPMUs. This
>> +             * can be achieved by having all physical processors go through
>> +             * context_switch().
>> +             */
>> +            ret = vpmu_force_context_switch();
>> +            if ( ret )
>> +                vpmu_mode = current_mode;
>> +        }
>> +
>> +        spin_unlock(&xenpmu_mode_lock);
>> +        break;
>> +    }
>> +
> Thanks
> Kevin

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v12 for-xen-4.5 01/20] common/symbols: Export hypervisor symbols to privileged guest
  2014-09-25 19:28 ` [PATCH v12 for-xen-4.5 01/20] common/symbols: Export hypervisor symbols to privileged guest Boris Ostrovsky
  2014-09-26 14:58   ` Konrad Rzeszutek Wilk
@ 2014-09-26 21:43   ` Daniel De Graaf
  2014-09-26 22:12     ` Boris Ostrovsky
  1 sibling, 1 reply; 92+ messages in thread
From: Daniel De Graaf @ 2014-09-26 21:43 UTC (permalink / raw)
  To: Boris Ostrovsky, jbeulich, kevin.tian, suravee.suthikulpanit,
	Aravind.Gopalakrishnan, dietmar.hahn
  Cc: andrew.cooper3, xen-devel, keir, jun.nakajima, tim

On 09/25/2014 03:28 PM, Boris Ostrovsky wrote:
> Export Xen's symbols as {<address><type><name>} triplet via new XENPF_get_symbol
> hypercall
>
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> Reviewed-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
> Tested-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
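To illustrate the consumer side, a dom0 loop that walks the whole symbol
table could look roughly like this (hypothetical sketch: the xenpf_symdata
fields are the ones this patch introduces, HYPERVISOR_platform_op() stands
in for whatever hypercall wrapper the caller has, and -- per the handler
code quoted below -- symnum is an in/out field that Xen advances to the
next symbol):

    static char name[KSYM_NAME_LEN + 1];
    struct xen_platform_op op = { .cmd = XENPF_get_symbol };

    set_xen_guest_handle(op.u.symdata.name, name);
    op.u.symdata.symnum = 0;

    for ( ; ; )
    {
        op.u.symdata.namelen = sizeof(name);
        if ( HYPERVISOR_platform_op(&op) )
            break;          /* error, or we walked past the last symbol */
        printf("%016lx %c %s\n", (unsigned long)op.u.symdata.address,
               op.u.symdata.type, name);
    }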

For the XSM parts:
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>

One comment on the patch in general (outside the XSM parts):
[...]
> diff --git a/xen/arch/x86/platform_hypercall.c b/xen/arch/x86/platform_hypercall.c
> index 2162811..68bc6d9 100644
> --- a/xen/arch/x86/platform_hypercall.c
> +++ b/xen/arch/x86/platform_hypercall.c
> @@ -23,6 +23,7 @@
>   #include <xen/cpu.h>
>   #include <xen/pmstat.h>
>   #include <xen/irq.h>
> +#include <xen/symbols.h>
>   #include <asm/current.h>
>   #include <public/platform.h>
>   #include <acpi/cpufreq/processor_perf.h>
> @@ -601,6 +602,38 @@ ret_t do_platform_op(XEN_GUEST_HANDLE_PARAM(xen_platform_op_t) u_xenpf_op)
>       }
>       break;
>
> +    case XENPF_get_symbol:
> +    {
> +        static char name[KSYM_NAME_LEN + 1]; /* protected by xenpf_lock */
> +        XEN_GUEST_HANDLE(char) nameh;
> +        uint32_t namelen, copylen;
> +
> +        guest_from_compat_handle(nameh, op->u.symdata.name);
> +
> +        ret = xensyms_read(&op->u.symdata.symnum, &op->u.symdata.type,
> +                           &op->u.symdata.address, name);
> +
> +        namelen = strlen(name) + 1;
> +
> +        if ( namelen > op->u.symdata.namelen )
> +        {
> +            /* Caller's buffer is too small for the whole string */
> +            if ( op->u.symdata.namelen )
> +                name[op->u.symdata.namelen] = '\0';

I don't think this assignment is needed at all: name[copylen] is never
copied to the guest and the buffer is not reused internally.

> +            copylen = op->u.symdata.namelen;
> +        }
> +        else
> +            copylen = namelen;
> +
> +        op->u.symdata.namelen = namelen;
> +
> +        if ( !ret && copy_to_guest(nameh, name, copylen) )
> +            ret = -EFAULT;
> +        if ( !ret && __copy_field_to_guest(u_xenpf_op, op, u.symdata) )
> +            ret = -EFAULT;
> +    }
> +    break;
> +
>       default:
>           ret = -ENOSYS;
>           break;
[...]

-- 
Daniel De Graaf
National Security Agency

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v12 for-xen-4.5 11/20] x86/VPMU: Interface for setting PMU mode and flags
  2014-09-25 19:28 ` [PATCH v12 for-xen-4.5 11/20] x86/VPMU: Interface for setting PMU mode and flags Boris Ostrovsky
  2014-09-26 21:04   ` Tian, Kevin
@ 2014-09-26 22:00   ` Daniel De Graaf
  2014-09-26 22:26     ` Boris Ostrovsky
  2014-09-29 13:25   ` Dietmar Hahn
                     ` (2 subsequent siblings)
  4 siblings, 1 reply; 92+ messages in thread
From: Daniel De Graaf @ 2014-09-26 22:00 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: kevin.tian, keir, jbeulich, jun.nakajima, andrew.cooper3, tim,
	dietmar.hahn, xen-devel, Aravind.Gopalakrishnan,
	suravee.suthikulpanit

On 09/25/2014 03:28 PM, Boris Ostrovsky wrote:
> Add runtime interface for setting PMU mode and flags. Three main modes are
> provided:
> * XENPMU_MODE_OFF:  PMU is not virtualized
> * XENPMU_MODE_SELF: Guests can access PMU MSRs and receive PMU interrupts.
> * XENPMU_MODE_HV: Same as XENPMU_MODE_SELF for non-privileged guests, dom0
>    can profile itself and the hypervisor.
>
> Note that PMU modes are different from what can be provided at Xen's boot line
> with 'vpmu' argument. An 'off' (or '0') value is equivalent to XENPMU_MODE_OFF.
> Any other value, on the other hand, will cause VPMU mode to be set to
> XENPMU_MODE_SELF during boot.
>
> For feature flags only Intel's BTS is currently supported.
>
> Mode and flags are set via HYPERVISOR_xenpmu_op hypercall.
>
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Do you think it would be useful for some service domain in a disaggregated
system to be able to query but not modify these PMU settings (i.e. only use
XENPMU_mode_get and XENPMU_feature_get)?  If this might be useful, then
splitting up the permission checks to use two bits (pmu_get, pmu_set) is
preferred.  However, I don't want to suggest this split if it will never be
useful - and I think that's the case here.  If you agree, then:

Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
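Concretely, the split being floated would look something like this in
access_vectors (purely hypothetical; it was never part of the series):

    # PMU querying (XENPMU_mode_get / XENPMU_feature_get)
        pmu_get
    # PMU control (XENPMU_mode_set / XENPMU_feature_set)
        pmu_set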

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v12 for-xen-4.5 16/20] x86/VPMU: Handle PMU interrupts for PV guests
  2014-09-25 19:28 ` [PATCH v12 for-xen-4.5 16/20] x86/VPMU: Handle PMU interrupts for " Boris Ostrovsky
@ 2014-09-26 22:09   ` Daniel De Graaf
  2014-09-30  8:11   ` Jan Beulich
  1 sibling, 0 replies; 92+ messages in thread
From: Daniel De Graaf @ 2014-09-26 22:09 UTC (permalink / raw)
  To: Boris Ostrovsky, jbeulich, kevin.tian, suravee.suthikulpanit,
	Aravind.Gopalakrishnan, dietmar.hahn
  Cc: andrew.cooper3, xen-devel, keir, jun.nakajima, tim

On 09/25/2014 03:28 PM, Boris Ostrovsky wrote:
> Add support for handling PMU interrupts for PV guests.
>
> VPMU for the interrupted VCPU is unloaded until the guest issues XENPMU_flush
> hypercall. This allows the guest to access PMU MSR values that are stored in
> VPMU context which is shared between hypervisor and domain, thus avoiding
> traps to hypervisor.
>
> Since the interrupt handler may now force VPMU context save (i.e. set
> VPMU_CONTEXT_SAVE flag) we need to make changes to amd_vpmu_save() which
> until now expected this flag to be set only when the counters were stopped.
>
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
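The guest-side sequence this implies, as a rough sketch (XENPMU_flush and
the shared xen_pmu_data page come from this series; the handler and helper
names are hypothetical stand-ins for guest kernel code):

    /* PV guest PMU interrupt handler, in outline: */
    static void pmu_interrupt_body(void)
    {
        struct xen_pmu_data *pd = this_cpu_pmu_shared_page;   /* stand-in */

        /* Xen has unloaded the VPMU; the MSR state sits in pd->pmu.c.*,
         * so the sample is read from shared memory without trapping. */
        consume_sample(pd);                                   /* stand-in */

        /* Tell Xen we are done so it can re-load the VPMU. */
        HYPERVISOR_xenpmu_op(XENPMU_flush, NULL);
    }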

For the XSM parts:
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v12 for-xen-4.5 01/20] common/symbols: Export hypervisor symbols to privileged guest
  2014-09-26 21:43   ` Daniel De Graaf
@ 2014-09-26 22:12     ` Boris Ostrovsky
  0 siblings, 0 replies; 92+ messages in thread
From: Boris Ostrovsky @ 2014-09-26 22:12 UTC (permalink / raw)
  To: Daniel De Graaf, jbeulich, kevin.tian, suravee.suthikulpanit,
	Aravind.Gopalakrishnan, dietmar.hahn
  Cc: andrew.cooper3, xen-devel, keir, jun.nakajima, tim

On 09/26/2014 05:43 PM, Daniel De Graaf wrote:
>
>> @@ -601,6 +602,38 @@ ret_t do_platform_op(XEN_GUEST_HANDLE_PARAM(xen_platform_op_t) u_xenpf_op)
>>       }
>>       break;
>>
>> +    case XENPF_get_symbol:
>> +    {
>> +        static char name[KSYM_NAME_LEN + 1]; /* protected by xenpf_lock */
>> +        XEN_GUEST_HANDLE(char) nameh;
>> +        uint32_t namelen, copylen;
>> +
>> +        guest_from_compat_handle(nameh, op->u.symdata.name);
>> +
>> +        ret = xensyms_read(&op->u.symdata.symnum, &op->u.symdata.type,
>> +                           &op->u.symdata.address, name);
>> +
>> +        namelen = strlen(name) + 1;
>> +
>> +        if ( namelen > op->u.symdata.namelen )
>> +        {
>> +            /* Caller's buffer is too small for the whole string */
>> +            if ( op->u.symdata.namelen )
>> +                name[op->u.symdata.namelen] = '\0';
>
> I don't think this assignment is needed at all: name[copylen] is never
> copied to the guest and the buffer is not reused internally.


True, this is not needed.

Thanks.
-boris


>
>> +            copylen = op->u.symdata.namelen;
>> +        }
>> +        else
>> +            copylen = namelen;
>> +
>> +        op->u.symdata.namelen = namelen;
>> +
>> +        if ( !ret && copy_to_guest(nameh, name, copylen) )
>> +            ret = -EFAULT;
>> +        if ( !ret && __copy_field_to_guest(u_xenpf_op, op, u.symdata) )
>> +            ret = -EFAULT;
>> +    }
>> +    break;
>> +
>>       default:
>>           ret = -ENOSYS;
>>           break;
> [...]
>

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v12 for-xen-4.5 12/20] x86/VPMU: Initialize PMU for PV(H) guests
  2014-09-25 19:28 ` [PATCH v12 for-xen-4.5 12/20] x86/VPMU: Initialize PMU for PV(H) guests Boris Ostrovsky
@ 2014-09-26 22:16   ` Daniel De Graaf
  2014-09-26 22:23     ` Boris Ostrovsky
  2014-09-29 15:25   ` Jan Beulich
  2014-10-01  0:16   ` Tian, Kevin
  2 siblings, 1 reply; 92+ messages in thread
From: Daniel De Graaf @ 2014-09-26 22:16 UTC (permalink / raw)
  To: Boris Ostrovsky, jbeulich, kevin.tian, suravee.suthikulpanit,
	Aravind.Gopalakrishnan, dietmar.hahn
  Cc: andrew.cooper3, xen-devel, keir, jun.nakajima, tim

On 09/25/2014 03:28 PM, Boris Ostrovsky wrote:
> Code for initializing/tearing down PMU for PV guests
>
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

With one minor comment tweak (below):
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>

[...]
> diff --git a/xen/xsm/flask/policy/access_vectors b/xen/xsm/flask/policy/access_vectors
> index 64c7378..36b69c6 100644
> --- a/xen/xsm/flask/policy/access_vectors
> +++ b/xen/xsm/flask/policy/access_vectors
> @@ -83,6 +83,8 @@ class xen2
>       get_symbol
>   # PMU control
>       pmu_ctrl
> +# PMU use (anyone has access)
> +    pmu_use

This comment should refer to what the operation does (lets a domain use
PMU - unprivileged operations only / operation on self only), not what
the default policy is.  An administrator may decide not to let certain
guests use PMU (because they are less trusted, or because they are stub
domains that don't support it anyway), and in that case this comment
would be misleading.

>   }
>
>   # Classes domain and domain2 consist of operations that a domain performs on
>
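A rewording along the lines Daniel asks for might be (illustrative only,
not the wording that was eventually committed):

    # PMU use (unprivileged operations, acting on the domain itself only)
        pmu_use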


-- 
Daniel De Graaf
National Security Agency

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v12 for-xen-4.5 12/20] x86/VPMU: Initialize PMU for PV(H) guests
  2014-09-26 22:16   ` Daniel De Graaf
@ 2014-09-26 22:23     ` Boris Ostrovsky
  0 siblings, 0 replies; 92+ messages in thread
From: Boris Ostrovsky @ 2014-09-26 22:23 UTC (permalink / raw)
  To: Daniel De Graaf, jbeulich, kevin.tian, suravee.suthikulpanit,
	Aravind.Gopalakrishnan, dietmar.hahn
  Cc: andrew.cooper3, xen-devel, keir, jun.nakajima, tim

On 09/26/2014 06:16 PM, Daniel De Graaf wrote:
> On 09/25/2014 03:28 PM, Boris Ostrovsky wrote:
>> Code for initializing/tearing down PMU for PV guests
>>
>> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>
> With one minor comment tweak (below):
> Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
>
> [...]
>> diff --git a/xen/xsm/flask/policy/access_vectors b/xen/xsm/flask/policy/access_vectors
>> index 64c7378..36b69c6 100644
>> --- a/xen/xsm/flask/policy/access_vectors
>> +++ b/xen/xsm/flask/policy/access_vectors
>> @@ -83,6 +83,8 @@ class xen2
>>       get_symbol
>>   # PMU control
>>       pmu_ctrl
>> +# PMU use (anyone has access)
>> +    pmu_use
>
> This comment should refer to what the operation does (lets a domain use
> PMU - unprivileged operations only / operation on self only), not what
> the default policy is.  An administrator may decide not to let certain
> guests use PMU (because they are less trusted, or because they are stub
> domains that don't support it anyway), and in that case this comment
> would be misleading.

Right, the comment was meant to describe who will be using this operation.

Thanks.
-boris

>
>>   }
>>
>>   # Classes domain and domain2 consist of operations that a domain performs on
>>
>
>

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v12 for-xen-4.5 11/20] x86/VPMU: Interface for setting PMU mode and flags
  2014-09-26 22:00   ` Daniel De Graaf
@ 2014-09-26 22:26     ` Boris Ostrovsky
  0 siblings, 0 replies; 92+ messages in thread
From: Boris Ostrovsky @ 2014-09-26 22:26 UTC (permalink / raw)
  To: Daniel De Graaf
  Cc: kevin.tian, keir, jbeulich, jun.nakajima, andrew.cooper3, tim,
	dietmar.hahn, xen-devel, Aravind.Gopalakrishnan,
	suravee.suthikulpanit

On 09/26/2014 06:00 PM, Daniel De Graaf wrote:
> On 09/25/2014 03:28 PM, Boris Ostrovsky wrote:
>> Add runtime interface for setting PMU mode and flags. Three main modes are
>> provided:
>> * XENPMU_MODE_OFF:  PMU is not virtualized
>> * XENPMU_MODE_SELF: Guests can access PMU MSRs and receive PMU interrupts.
>> * XENPMU_MODE_HV: Same as XENPMU_MODE_SELF for non-privileged guests, dom0
>>    can profile itself and the hypervisor.
>>
>> Note that PMU modes are different from what can be provided at Xen's boot line
>> with 'vpmu' argument. An 'off' (or '0') value is equivalent to XENPMU_MODE_OFF.
>> Any other value, on the other hand, will cause VPMU mode to be set to
>> XENPMU_MODE_SELF during boot.
>>
>> For feature flags only Intel's BTS is currently supported.
>>
>> Mode and flags are set via HYPERVISOR_xenpmu_op hypercall.
>>
>> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>
> Do you think it would be useful for some service domain in a disaggregated
> system to be able to query but not modify these PMU settings (i.e. only use
> XENPMU_mode_get and XENPMU_feature_get)?  If this might be useful, then
> splitting up the permission checks to use two bits (pmu_get, pmu_set) is
> preferred.  However, I don't want to suggest this split if it will never be
> useful - and I think that's the case here.  If you agree, then:
>
> Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>


I can't think of a need to use PMU in such a scenario, to be honest.

At least not now. Eventually, when we extend its use to profile the full
system, that might be useful.


-boris

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v12 for-xen-4.5 01/20] common/symbols: Export hypervisor symbols to privileged guest
  2014-09-26 16:49       ` Konrad Rzeszutek Wilk
@ 2014-09-29  6:43         ` Jan Beulich
  2014-09-29 13:29           ` Boris Ostrovsky
  0 siblings, 1 reply; 92+ messages in thread
From: Jan Beulich @ 2014-09-29  6:43 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: kevin.tian, keir, suravee.suthikulpanit, andrew.cooper3, tim,
	dietmar.hahn, xen-devel, Aravind.Gopalakrishnan, jun.nakajima,
	Boris Ostrovsky, dgdegra

>>> On 26.09.14 at 18:49, <konrad.wilk@oracle.com> wrote:
> On Fri, Sep 26, 2014 at 04:10:09PM +0100, Jan Beulich wrote:
>> >>> On 26.09.14 at 16:58, <konrad.wilk@oracle.com> wrote:
>> > If I move them just a bit:
>> > 
>> > 
>> > diff --git a/xen/include/public/platform.h b/xen/include/public/platform.h
>> > index 4f21b17..b97e476 100644
>> > --- a/xen/include/public/platform.h
>> > +++ b/xen/include/public/platform.h
>> > @@ -538,9 +538,9 @@ struct xenpf_symdata {
>> >                        /*      we reached the end                        */
>> >  
>> >      /* OUT variables */
>> > -    char type;
>> > -    XEN_GUEST_HANDLE(char) name;
>> >      uint64_t address;
>> > +    XEN_GUEST_HANDLE(char) name;
>> > +    char type;
>> >  };
>> >  typedef struct xenpf_symdata xenpf_symdata_t;
>> >  DEFINE_XEN_GUEST_HANDLE(xenpf_symdata_t);
>> > 
>> > 
>> > 'pahole' is satisfied:
>> > 
>> > struct xenpf_symdata {
>> >     uint32_t                   namelen;              /*     0     4 */
>> >     uint32_t                   symnum;               /*     4     4 */
>> >     uint64_t                   address;              /*     8     8 */
>> >     __guest_handle_char        name;                 /*    16     8 */
>> >     char                       type;                 /*    24     1 */
>> >
>> >     /* size: 32, cachelines: 1, members: 5 */
>> >     /* padding: 7 */
>> >     /* last cacheline: 32 bytes */
>> > };
>> > 
>> > 
>> > With that change, Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>> 
>> This change buys us exactly nothing: Structure size doesn't change,
>> and 7 bytes of padding are still there.
> 
> It does allow us to put more parameters (if we want to) at the end of the
> structure instead of fitting them in between.

Regardless of where the gap is, adding further fields in the future
would work only if the code now checked that this field is zero
(which first of all would require it being given a name). I keep
pointing out that this should be done for all padding fields, but I'm
afraid I may have missed doing so on this occasion.
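In code terms, the request is a named pad that the handler verifies is
zero, e.g. (hypothetical adaptation of the layout in the pahole dump
above):

    struct xenpf_symdata {
        uint32_t namelen;
        uint32_t symnum;
        uint64_t address;
        XEN_GUEST_HANDLE(char) name;
        char type;
        uint8_t pad[7];      /* must be zero; reserved for future fields */
    };

    /* ... and in the XENPF_get_symbol handler: */
    static const uint8_t zero[7];
    if ( memcmp(op->u.symdata.pad, zero, sizeof(zero)) )
        ret = -EINVAL;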

Jan

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v12 for-xen-4.5 11/20] x86/VPMU: Interface for setting PMU mode and flags
  2014-09-25 19:28 ` [PATCH v12 for-xen-4.5 11/20] x86/VPMU: Interface for setting PMU mode and flags Boris Ostrovsky
  2014-09-26 21:04   ` Tian, Kevin
  2014-09-26 22:00   ` Daniel De Graaf
@ 2014-09-29 13:25   ` Dietmar Hahn
  2014-09-29 13:56     ` Boris Ostrovsky
  2014-09-29 13:59     ` Jan Beulich
  2014-09-29 15:14   ` Jan Beulich
  2014-10-01  0:48   ` Tian, Kevin
  4 siblings, 2 replies; 92+ messages in thread
From: Dietmar Hahn @ 2014-09-29 13:25 UTC (permalink / raw)
  To: xen-devel
  Cc: kevin.tian, keir, jbeulich, jun.nakajima, andrew.cooper3, tim,
	Aravind.Gopalakrishnan, suravee.suthikulpanit, dgdegra,
	Boris Ostrovsky

Only a minor note below.

On Thursday, 25 September 2014, 15:28:47, Boris Ostrovsky wrote:
> Add runtime interface for setting PMU mode and flags. Three main modes are
> provided:
> * XENPMU_MODE_OFF:  PMU is not virtualized
> * XENPMU_MODE_SELF: Guests can access PMU MSRs and receive PMU interrupts.
> * XENPMU_MODE_HV: Same as XENPMU_MODE_SELF for non-privileged guests, dom0
>   can profile itself and the hypervisor.
> 
> Note that PMU modes are different from what can be provided at Xen's boot line
> with 'vpmu' argument. An 'off' (or '0') value is equivalent to XENPMU_MODE_OFF.
> Any other value, on the other hand, will cause VPMU mode to be set to
> XENPMU_MODE_SELF during boot.
> 
> For feature flags only Intel's BTS is currently supported.
> 
> Mode and flags are set via HYPERVISOR_xenpmu_op hypercall.
> 
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> ---
>  tools/flask/policy/policy/modules/xen/xen.te |   3 +
>  xen/arch/x86/domain.c                        |   6 +-
>  xen/arch/x86/hvm/svm/vpmu.c                  |   4 +-
>  xen/arch/x86/hvm/vmx/vpmu_core2.c            |  10 +-
>  xen/arch/x86/hvm/vpmu.c                      | 206 +++++++++++++++++++++++++--
>  xen/arch/x86/x86_64/compat/entry.S           |   4 +
>  xen/arch/x86/x86_64/entry.S                  |   4 +
>  xen/include/Makefile                         |   2 +
>  xen/include/asm-x86/hvm/vpmu.h               |  27 ++--
>  xen/include/public/pmu.h                     |  44 ++++++
>  xen/include/public/xen.h                     |   1 +
>  xen/include/xen/hypercall.h                  |   4 +
>  xen/include/xlat.lst                         |   4 +
>  xen/include/xsm/dummy.h                      |  15 ++
>  xen/include/xsm/xsm.h                        |   6 +
>  xen/xsm/dummy.c                              |   1 +
>  xen/xsm/flask/hooks.c                        |  18 +++
>  xen/xsm/flask/policy/access_vectors          |   2 +
>  18 files changed, 334 insertions(+), 27 deletions(-)
> 
> diff --git a/tools/flask/policy/policy/modules/xen/xen.te b/tools/flask/policy/policy/modules/xen/xen.te
> index 1937883..fb761cd 100644
> --- a/tools/flask/policy/policy/modules/xen/xen.te
> +++ b/tools/flask/policy/policy/modules/xen/xen.te
> @@ -64,6 +64,9 @@ allow dom0_t xen_t:xen {
>  	getidle debug getcpuinfo heap pm_op mca_op lockprof cpupool_op tmem_op
>  	tmem_control getscheduler setscheduler
>  };
> +allow dom0_t xen_t:xen2 {
> +    pmu_ctrl
> +};
>  allow dom0_t xen_t:mmu memorymap;
>  
>  # Allow dom0 to use these domctls on itself. For domctls acting on other
> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
> index 7b1dfe6..6a07737 100644
> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -1503,7 +1503,7 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
>      if ( is_hvm_vcpu(prev) )
>      {
>          if (prev != next)
> -            vpmu_save(prev);
> +            vpmu_switch_from(prev, next);
>  
>          if ( !list_empty(&prev->arch.hvm_vcpu.tm_list) )
>              pt_save_timer(prev);
> @@ -1546,9 +1546,9 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
>                             !is_hardware_domain(next->domain));
>      }
>  
> -    if (is_hvm_vcpu(next) && (prev != next) )
> +    if ( is_hvm_vcpu(prev) && (prev != next) )
>          /* Must be done with interrupts enabled */
> -        vpmu_load(next);
> +        vpmu_switch_to(prev, next);
>  
>      context_saved(prev);
>  
> diff --git a/xen/arch/x86/hvm/svm/vpmu.c b/xen/arch/x86/hvm/svm/vpmu.c
> index 124b147..37d8228 100644
> --- a/xen/arch/x86/hvm/svm/vpmu.c
> +++ b/xen/arch/x86/hvm/svm/vpmu.c
> @@ -479,14 +479,14 @@ struct arch_vpmu_ops amd_vpmu_ops = {
>      .arch_vpmu_dump = amd_vpmu_dump
>  };
>  
> -int svm_vpmu_initialise(struct vcpu *v, unsigned int vpmu_flags)
> +int svm_vpmu_initialise(struct vcpu *v)
>  {
>      struct vpmu_struct *vpmu = vcpu_vpmu(v);
>      uint8_t family = current_cpu_data.x86;
>      int ret = 0;
>  
>      /* vpmu enabled? */
> -    if ( !vpmu_flags )
> +    if ( vpmu_mode == XENPMU_MODE_OFF )
>          return 0;
>  
>      switch ( family )
> diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c
> index beff5c3..c0a45cd 100644
> --- a/xen/arch/x86/hvm/vmx/vpmu_core2.c
> +++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c
> @@ -703,13 +703,13 @@ static int core2_vpmu_do_interrupt(struct cpu_user_regs *regs)
>      return 1;
>  }
>  
> -static int core2_vpmu_initialise(struct vcpu *v, unsigned int vpmu_flags)
> +static int core2_vpmu_initialise(struct vcpu *v)
>  {
>      struct vpmu_struct *vpmu = vcpu_vpmu(v);
>      u64 msr_content;
>      static bool_t ds_warned;
>  
> -    if ( !(vpmu_flags & VPMU_BOOT_BTS) )
> +    if ( !(vpmu_features & XENPMU_FEATURE_INTEL_BTS) )
>          goto func_out;
>      /* Check the 'Debug Store' feature in the CPUID.EAX[1]:EDX[21] */
>      while ( boot_cpu_has(X86_FEATURE_DS) )
> @@ -824,7 +824,7 @@ struct arch_vpmu_ops core2_no_vpmu_ops = {
>      .do_cpuid = core2_no_vpmu_do_cpuid,
>  };
>  
> -int vmx_vpmu_initialise(struct vcpu *v, unsigned int vpmu_flags)
> +int vmx_vpmu_initialise(struct vcpu *v)
>  {
>      struct vpmu_struct *vpmu = vcpu_vpmu(v);
>      uint8_t family = current_cpu_data.x86;
> @@ -832,7 +832,7 @@ int vmx_vpmu_initialise(struct vcpu *v, unsigned int vpmu_flags)
>      int ret = 0;
>  
>      vpmu->arch_vpmu_ops = &core2_no_vpmu_ops;
> -    if ( !vpmu_flags )
> +    if ( vpmu_mode == XENPMU_MODE_OFF )
>          return 0;
>  
>      if ( family == 6 )
> @@ -875,7 +875,7 @@ int vmx_vpmu_initialise(struct vcpu *v, unsigned int vpmu_flags)
>          /* future: */
>          case 0x3d:
>          case 0x4e:
> -            ret = core2_vpmu_initialise(v, vpmu_flags);
> +            ret = core2_vpmu_initialise(v);
>              if ( !ret )
>                  vpmu->arch_vpmu_ops = &core2_vpmu_ops;
>              return ret;
> diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c
> index 071b869..5fcee0e 100644
> --- a/xen/arch/x86/hvm/vpmu.c
> +++ b/xen/arch/x86/hvm/vpmu.c
> @@ -21,6 +21,8 @@
>  #include <xen/config.h>
>  #include <xen/sched.h>
>  #include <xen/xenoprof.h>
> +#include <xen/event.h>
> +#include <xen/guest_access.h>
>  #include <asm/regs.h>
>  #include <asm/types.h>
>  #include <asm/msr.h>
> @@ -32,13 +34,22 @@
>  #include <asm/hvm/svm/vmcb.h>
>  #include <asm/apic.h>
>  #include <public/pmu.h>
> +#include <xen/tasklet.h>
> +#include <xsm/xsm.h>
> +
> +#include <compat/pmu.h>
> +CHECK_pmu_params;
> +CHECK_pmu_intel_ctxt;
> +CHECK_pmu_amd_ctxt;
> +CHECK_pmu_cntr_pair;
>  
>  /*
>   * "vpmu" :     vpmu generally enabled
>   * "vpmu=off" : vpmu generally disabled
>   * "vpmu=bts" : vpmu enabled and Intel BTS feature switched on.
>   */
> -static unsigned int __read_mostly opt_vpmu_enabled;
> +uint64_t __read_mostly vpmu_mode = XENPMU_MODE_OFF;
> +uint64_t __read_mostly vpmu_features = 0;
>  static void parse_vpmu_param(char *s);
>  custom_param("vpmu", parse_vpmu_param);
>  
> @@ -52,7 +63,7 @@ static void __init parse_vpmu_param(char *s)
>          break;
>      default:
>          if ( !strcmp(s, "bts") )
> -            opt_vpmu_enabled |= VPMU_BOOT_BTS;
> +            vpmu_features |= XENPMU_FEATURE_INTEL_BTS;
>          else if ( *s )
>          {
>              printk("VPMU: unknown flag: %s - vpmu disabled!\n", s);
> @@ -60,7 +71,8 @@ static void __init parse_vpmu_param(char *s)
>          }
>          /* fall through */
>      case 1:
> -        opt_vpmu_enabled |= VPMU_BOOT_ENABLED;
> +        /* Default VPMU mode */
> +        vpmu_mode = XENPMU_MODE_SELF;
>          break;
>      }
>  }
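
(As a quick reference for the parser above, the boot-line spellings it
accepts and their effect -- an illustrative summary matching the code, not
part of the patch:)

     vpmu          # default: vpmu_mode = XENPMU_MODE_SELF
     vpmu=off      # vpmu_mode stays XENPMU_MODE_OFF
     vpmu=bts      # sets XENPMU_FEATURE_INTEL_BTS, then falls through
                   # to the default XENPMU_MODE_SELF
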
> @@ -77,6 +89,9 @@ int vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content, uint64_t supported)
>  {
>      struct vpmu_struct *vpmu = vcpu_vpmu(current);
>  
> +    if ( !(vpmu_mode & (XENPMU_MODE_SELF | XENPMU_MODE_HV)) )
> +        return 0;
> +
>      if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->do_wrmsr )
>          return vpmu->arch_vpmu_ops->do_wrmsr(msr, msr_content, supported);
>      return 0;
> @@ -86,6 +101,9 @@ int vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
>  {
>      struct vpmu_struct *vpmu = vcpu_vpmu(current);
>  
> +    if ( !(vpmu_mode & (XENPMU_MODE_SELF | XENPMU_MODE_HV)) )
> +        return 0;
> +
>      if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->do_rdmsr )
>          return vpmu->arch_vpmu_ops->do_rdmsr(msr, msr_content);
>      return 0;
> @@ -242,19 +260,19 @@ void vpmu_initialise(struct vcpu *v)
>      switch ( vendor )
>      {
>      case X86_VENDOR_AMD:
> -        if ( svm_vpmu_initialise(v, opt_vpmu_enabled) != 0 )
> -            opt_vpmu_enabled = 0;
> +        if ( svm_vpmu_initialise(v) != 0 )
> +            vpmu_mode = XENPMU_MODE_OFF;
>          break;
>  
>      case X86_VENDOR_INTEL:
> -        if ( vmx_vpmu_initialise(v, opt_vpmu_enabled) != 0 )
> -            opt_vpmu_enabled = 0;
> +        if ( vmx_vpmu_initialise(v) != 0 )
> +            vpmu_mode = XENPMU_MODE_OFF;
>          break;
>  
>      default:
>          printk("VPMU: Initialization failed. "
>                 "Unknown CPU vendor %d\n", vendor);
> -        opt_vpmu_enabled = 0;
> +        vpmu_mode = XENPMU_MODE_OFF;
>          break;
>      }
>  }
> @@ -276,3 +294,175 @@ void vpmu_dump(struct vcpu *v)
>          vpmu->arch_vpmu_ops->arch_vpmu_dump(v);
>  }
>  
> +static atomic_t vpmu_sched_counter;
> +
> +static void vpmu_sched_checkin(unsigned long unused)
> +{
> +    atomic_inc(&vpmu_sched_counter);
> +}
> +
> +static int vpmu_force_context_switch(void)
> +{
> +    unsigned i, j, allbutself_num, mycpu;
> +    static s_time_t start, now;
> +    struct tasklet **sync_task;
> +    struct vcpu *curr_vcpu = current;
> +    int ret = 0;
> +
> +    allbutself_num = num_online_cpus() - 1;
> +
> +    sync_task = xzalloc_array(struct tasklet *, allbutself_num);
> +    if ( !sync_task )
> +    {
> +        printk(XENLOG_WARNING "vpmu_force_context_switch: out of memory\n");
> +        return -ENOMEM;
> +    }
> +
> +    for ( i = 0; i < allbutself_num; i++ )
> +    {
> +        sync_task[i] = xmalloc(struct tasklet);
> +        if ( sync_task[i] == NULL )
> +        {
> +            printk(XENLOG_WARNING "vpmu_force_context_switch: out of memory\n");
> +            ret = -ENOMEM;
> +            goto out;
> +        }
> +        tasklet_init(sync_task[i], vpmu_sched_checkin, 0);

Just a question of understanding:
is there a special reason not to use a single memory allocation,
apart from memory fragmentation on systems with a large number of CPUs?

     struct tasklet *sync_task;
     sync_task = xmalloc(sizeof(struct tasklet) * allbutself_num);


> +    }
> +
> +    atomic_set(&vpmu_sched_counter, 0);
> +
> +    j = 0;
> +    mycpu = smp_processor_id();
> +    for_each_online_cpu( i )
> +    {
> +        if ( i != mycpu )
> +            tasklet_schedule_on_cpu(sync_task[j++], i);
> +    }
> +
> +    vpmu_save(curr_vcpu);
> +
> +    start = NOW();
> +
> +    /*
> +     * Note that we may fail here if a CPU is hot-plugged while we are
> +     * waiting. We will then time out.
> +     */
> +    while ( atomic_read(&vpmu_sched_counter) != allbutself_num )
> +    {
> +        cpu_relax();
> +
> +        now = NOW();
> +
> +        /* Give up after 5 seconds */
> +        if ( now > start + SECONDS(5) )
> +        {
> +            printk(XENLOG_WARNING
> +                   "vpmu_force_context_switch: failed to sync\n");
> +            ret = -EBUSY;
> +            break;
> +        }
> +
> +        /* Or after 2 milliseconds if need to be preempted */
> +        if ( (now > start + MILLISECS(2)) && hypercall_preempt_check() )
> +        {
> +            ret = -EAGAIN;
> +            break;
> +        }
> +    }
> +
> + out:
> +    for ( i = 0; i < allbutself_num; i++ )
> +    {
> +        if ( sync_task[i] )
> +        {
> +            tasklet_kill(sync_task[i]);
> +            xfree(sync_task[i]);
> +        }
> +    }
> +    xfree(sync_task);
> +
> +    return ret;
> +}
> +
> +long do_xenpmu_op(int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg)
> +{
> +    int ret;
> +    xen_pmu_params_t pmu_params;
> +
> +    ret = xsm_pmu_op(XSM_OTHER, current->domain, op);
> +    if ( ret )
> +        return ret;
> +
> +    switch ( op )
> +    {
> +    case XENPMU_mode_set:
> +    {
> +        static DEFINE_SPINLOCK(xenpmu_mode_lock);
> +        uint32_t current_mode;
> +
> +        if ( copy_from_guest(&pmu_params, arg, 1) )
> +            return -EFAULT;
> +
> +        if ( pmu_params.val & ~(XENPMU_MODE_SELF | XENPMU_MODE_HV) )
> +            return -EINVAL;
> +
> +        /*
> +         * Return error is someone else is in the middle of changing mode ---

s/is/if ?

Dietmar.


> +         * this is most likely indication of two system administrators
> +         * working against each other
> +         */
> +        if ( !spin_trylock(&xenpmu_mode_lock) )
> +            return -EAGAIN;
> +
> +        current_mode = vpmu_mode;
> +        vpmu_mode = pmu_params.val;
> +
> +        if ( vpmu_mode == XENPMU_MODE_OFF )
> +        {
> +            /*
> +             * Make sure all (non-dom0) VCPUs have unloaded their VPMUs. This
> +             * can be achieved by having all physical processors go through
> +             * context_switch().
> +             */
> +            ret = vpmu_force_context_switch();
> +            if ( ret )
> +                vpmu_mode = current_mode;
> +        }
> +
> +        spin_unlock(&xenpmu_mode_lock);
> +        break;
> +    }
> +
> +    case XENPMU_mode_get:
> +        memset(&pmu_params, 0, sizeof(pmu_params));
> +        pmu_params.val = vpmu_mode;
> +        pmu_params.version.maj = XENPMU_VER_MAJ;
> +        pmu_params.version.min = XENPMU_VER_MIN;
> +        if ( copy_to_guest(arg, &pmu_params, 1) )
> +            return -EFAULT;
> +        break;
> +
> +    case XENPMU_feature_set:
> +        if ( copy_from_guest(&pmu_params, arg, 1) )
> +            return -EFAULT;
> +
> +        if ( pmu_params.val & ~XENPMU_FEATURE_INTEL_BTS )
> +            return -EINVAL;
> +
> +        vpmu_features = pmu_params.val;
> +        break;
> +
> +    case XENPMU_feature_get:
> +        memset(&pmu_params, 0, sizeof(pmu_params));
> +        pmu_params.val = vpmu_features;
> +        if ( copy_to_guest(arg, &pmu_params, 1) )
> +            return -EFAULT;
> +        break;
> +
> +    default:
> +        ret = -EINVAL;
> +    }
> +
> +    return ret;
> +}
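
(For reviewers, a rough sketch of how a privileged guest would drive this
new interface. HYPERVISOR_xenpmu_op stands here for the guest-side
hypercall wrapper from the companion Linux series, so this is illustrative
only, not part of the patch:)

     #include <public/pmu.h>

     /* Ask Xen to switch VPMU into self-profiling mode. */
     static int enable_self_profiling(void)
     {
         xen_pmu_params_t p = {
             .version.maj = XENPMU_VER_MAJ,
             .version.min = XENPMU_VER_MIN,
             .val = XENPMU_MODE_SELF,
         };

         /* -EAGAIN means another mode change is in flight; retry later. */
         return HYPERVISOR_xenpmu_op(XENPMU_mode_set, &p);
     }
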
> diff --git a/xen/arch/x86/x86_64/compat/entry.S b/xen/arch/x86/x86_64/compat/entry.S
> index ac594c9..8587c46 100644
> --- a/xen/arch/x86/x86_64/compat/entry.S
> +++ b/xen/arch/x86/x86_64/compat/entry.S
> @@ -417,6 +417,8 @@ ENTRY(compat_hypercall_table)
>          .quad do_domctl
>          .quad compat_kexec_op
>          .quad do_tmem_op
> +        .quad do_ni_hypercall           /* reserved for XenClient */
> +        .quad do_xenpmu_op              /* 40 */
>          .rept __HYPERVISOR_arch_0-((.-compat_hypercall_table)/8)
>          .quad compat_ni_hypercall
>          .endr
> @@ -465,6 +467,8 @@ ENTRY(compat_hypercall_args_table)
>          .byte 1 /* do_domctl                */
>          .byte 2 /* compat_kexec_op          */
>          .byte 1 /* do_tmem_op               */
> +        .byte 0 /* reserved for XenClient   */
> +        .byte 2 /* do_xenpmu_op             */  /* 40 */
>          .rept __HYPERVISOR_arch_0-(.-compat_hypercall_args_table)
>          .byte 0 /* compat_ni_hypercall      */
>          .endr
> diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
> index ade555b..7f5dedf 100644
> --- a/xen/arch/x86/x86_64/entry.S
> +++ b/xen/arch/x86/x86_64/entry.S
> @@ -772,6 +772,8 @@ ENTRY(hypercall_table)
>          .quad do_domctl
>          .quad do_kexec_op
>          .quad do_tmem_op
> +        .quad do_ni_hypercall       /* reserved for XenClient */
> +        .quad do_xenpmu_op          /* 40 */
>          .rept __HYPERVISOR_arch_0-((.-hypercall_table)/8)
>          .quad do_ni_hypercall
>          .endr
> @@ -820,6 +822,8 @@ ENTRY(hypercall_args_table)
>          .byte 1 /* do_domctl            */
>          .byte 2 /* do_kexec             */
>          .byte 1 /* do_tmem_op           */
> +        .byte 0 /* reserved for XenClient */
> +        .byte 2 /* do_xenpmu_op         */  /* 40 */
>          .rept __HYPERVISOR_arch_0-(.-hypercall_args_table)
>          .byte 0 /* do_ni_hypercall      */
>          .endr
> diff --git a/xen/include/Makefile b/xen/include/Makefile
> index f7ccbc9..f97733a 100644
> --- a/xen/include/Makefile
> +++ b/xen/include/Makefile
> @@ -26,7 +26,9 @@ headers-y := \
>  headers-$(CONFIG_X86)     += compat/arch-x86/xen-mca.h
>  headers-$(CONFIG_X86)     += compat/arch-x86/xen.h
>  headers-$(CONFIG_X86)     += compat/arch-x86/xen-$(compat-arch-y).h
> +headers-$(CONFIG_X86)     += compat/arch-x86/pmu.h
>  headers-y                 += compat/arch-$(compat-arch-y).h compat/xlat.h
> +headers-y                 += compat/pmu.h
>  headers-$(FLASK_ENABLE)   += compat/xsm/flask_op.h
>  
>  cppflags-y                := -include public/xen-compat.h
> diff --git a/xen/include/asm-x86/hvm/vpmu.h b/xen/include/asm-x86/hvm/vpmu.h
> index 6fa0def..c612e1a 100644
> --- a/xen/include/asm-x86/hvm/vpmu.h
> +++ b/xen/include/asm-x86/hvm/vpmu.h
> @@ -24,13 +24,6 @@
>  
>  #include <public/pmu.h>
>  
> -/*
> - * Flag bits given as a string on the hypervisor boot parameter 'vpmu'.
> - * See arch/x86/hvm/vpmu.c.
> - */
> -#define VPMU_BOOT_ENABLED 0x1    /* vpmu generally enabled. */
> -#define VPMU_BOOT_BTS     0x2    /* Intel BTS feature wanted. */
> -
>  #define vcpu_vpmu(vcpu)   (&(vcpu)->arch.vpmu)
>  #define vpmu_vcpu(vpmu)   container_of((vpmu), struct vcpu, arch.vpmu)
>  
> @@ -59,8 +52,8 @@ struct arch_vpmu_ops {
>      void (*arch_vpmu_dump)(const struct vcpu *);
>  };
>  
> -int vmx_vpmu_initialise(struct vcpu *, unsigned int flags);
> -int svm_vpmu_initialise(struct vcpu *, unsigned int flags);
> +int vmx_vpmu_initialise(struct vcpu *);
> +int svm_vpmu_initialise(struct vcpu *);
>  
>  struct vpmu_struct {
>      u32 flags;
> @@ -116,5 +109,21 @@ void vpmu_dump(struct vcpu *v);
>  extern int acquire_pmu_ownership(int pmu_ownership);
>  extern void release_pmu_ownership(int pmu_ownership);
>  
> +extern uint64_t vpmu_mode;
> +extern uint64_t vpmu_features;
> +
> +/* Context switch */
> +inline void vpmu_switch_from(struct vcpu *prev, struct vcpu *next)
> +{
> +    if ( vpmu_mode & (XENPMU_MODE_SELF | XENPMU_MODE_HV) )
> +        vpmu_save(prev);
> +}
> +
> +inline void vpmu_switch_to(struct vcpu *prev, struct vcpu *next)
> +{
> +    if ( vpmu_mode & (XENPMU_MODE_SELF | XENPMU_MODE_HV) )
> +        vpmu_load(next);
> +}
> +
>  #endif /* __ASM_X86_HVM_VPMU_H_*/
>  
> diff --git a/xen/include/public/pmu.h b/xen/include/public/pmu.h
> index e6f45ee..c2293be 100644
> --- a/xen/include/public/pmu.h
> +++ b/xen/include/public/pmu.h
> @@ -13,6 +13,50 @@
>  #define XENPMU_VER_MAJ    0
>  #define XENPMU_VER_MIN    1
>  
> +/*
> + * ` enum neg_errnoval
> + * ` HYPERVISOR_xenpmu_op(enum xenpmu_op cmd, struct xenpmu_params *args);
> + *
> + * @cmd  == XENPMU_* (PMU operation)
> + * @args == struct xenpmu_params
> + */
> +/* ` enum xenpmu_op { */
> +#define XENPMU_mode_get        0 /* Also used for getting PMU version */
> +#define XENPMU_mode_set        1
> +#define XENPMU_feature_get     2
> +#define XENPMU_feature_set     3
> +/* ` } */
> +
> +/* Parameters structure for HYPERVISOR_xenpmu_op call */
> +struct xen_pmu_params {
> +    /* IN/OUT parameters */
> +    struct {
> +        uint32_t maj;
> +        uint32_t min;
> +    } version;
> +    uint64_t val;
> +
> +    /* IN parameters */
> +    uint64_t vcpu;
> +};
> +typedef struct xen_pmu_params xen_pmu_params_t;
> +DEFINE_XEN_GUEST_HANDLE(xen_pmu_params_t);
> +
> +/* PMU modes:
> + * - XENPMU_MODE_OFF:   No PMU virtualization
> + * - XENPMU_MODE_SELF:  Guests can profile themselves
> + * - XENPMU_MODE_HV:    Guests can profile themselves, dom0 profiles
> + *                      itself and Xen
> + */
> +#define XENPMU_MODE_OFF           0
> +#define XENPMU_MODE_SELF          (1<<0)
> +#define XENPMU_MODE_HV            (1<<1)
> +
> +/*
> + * PMU features:
> + * - XENPMU_FEATURE_INTEL_BTS: Intel BTS support (ignored on AMD)
> + */
> +#define XENPMU_FEATURE_INTEL_BTS  1
>  
>  /* Shared between hypervisor and PV domain */
>  struct xen_pmu_data {
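
(The matching query side, per XENPMU_mode_get in the header above -- same
assumed guest-side wrapper, illustrative only:)

     xen_pmu_params_t p;

     memset(&p, 0, sizeof(p));
     if ( HYPERVISOR_xenpmu_op(XENPMU_mode_get, &p) == 0 &&
          p.version.maj == XENPMU_VER_MAJ )
         printk("VPMU mode %#llx, interface version %u.%u\n",
                (unsigned long long)p.val, p.version.maj, p.version.min);
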
> diff --git a/xen/include/public/xen.h b/xen/include/public/xen.h
> index a6a2092..0766790 100644
> --- a/xen/include/public/xen.h
> +++ b/xen/include/public/xen.h
> @@ -101,6 +101,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_ulong_t);
>  #define __HYPERVISOR_kexec_op             37
>  #define __HYPERVISOR_tmem_op              38
>  #define __HYPERVISOR_xc_reserved_op       39 /* reserved for XenClient */
> +#define __HYPERVISOR_xenpmu_op            40
>  
>  /* Architecture-specific hypercall definitions. */
>  #define __HYPERVISOR_arch_0               48
> diff --git a/xen/include/xen/hypercall.h b/xen/include/xen/hypercall.h
> index a9e5229..cf34547 100644
> --- a/xen/include/xen/hypercall.h
> +++ b/xen/include/xen/hypercall.h
> @@ -14,6 +14,7 @@
>  #include <public/event_channel.h>
>  #include <public/tmem.h>
>  #include <public/version.h>
> +#include <public/pmu.h>
>  #include <asm/hypercall.h>
>  #include <xsm/xsm.h>
>  
> @@ -139,6 +140,9 @@ do_tmem_op(
>  extern long
>  do_xenoprof_op(int op, XEN_GUEST_HANDLE_PARAM(void) arg);
>  
> +extern long
> +do_xenpmu_op(int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg);
> +
>  #ifdef CONFIG_COMPAT
>  
>  extern int
> diff --git a/xen/include/xlat.lst b/xen/include/xlat.lst
> index c8fafef..5809c60 100644
> --- a/xen/include/xlat.lst
> +++ b/xen/include/xlat.lst
> @@ -101,6 +101,10 @@
>  !	vcpu_set_singleshot_timer	vcpu.h
>  ?	xenoprof_init			xenoprof.h
>  ?	xenoprof_passive		xenoprof.h
> +?	pmu_params			pmu.h
> +?	pmu_intel_ctxt			arch-x86/pmu.h
> +?	pmu_amd_ctxt			arch-x86/pmu.h
> +?	pmu_cntr_pair			arch-x86/pmu.h
>  ?	flask_access			xsm/flask_op.h
>  !	flask_boolean			xsm/flask_op.h
>  ?	flask_cache_stats		xsm/flask_op.h
> diff --git a/xen/include/xsm/dummy.h b/xen/include/xsm/dummy.h
> index df55e70..d423c1c 100644
> --- a/xen/include/xsm/dummy.h
> +++ b/xen/include/xsm/dummy.h
> @@ -653,4 +653,19 @@ static XSM_INLINE int xsm_ioport_mapping(XSM_DEFAULT_ARG struct domain *d, uint3
>      return xsm_default_action(action, current->domain, d);
>  }
>  
> +static XSM_INLINE int xsm_pmu_op (XSM_DEFAULT_ARG struct domain *d, int op)
> +{
> +    XSM_ASSERT_ACTION(XSM_OTHER);
> +    switch ( op )
> +    {
> +    case XENPMU_mode_set:
> +    case XENPMU_mode_get:
> +    case XENPMU_feature_set:
> +    case XENPMU_feature_get:
> +        return xsm_default_action(XSM_PRIV, d, current->domain);
> +    default:
> +        return -EPERM;
> +    }
> +}
> +
>  #endif /* CONFIG_X86 */
> diff --git a/xen/include/xsm/xsm.h b/xen/include/xsm/xsm.h
> index 6c1c079..635f7df 100644
> --- a/xen/include/xsm/xsm.h
> +++ b/xen/include/xsm/xsm.h
> @@ -170,6 +170,7 @@ struct xsm_operations {
>      int (*unbind_pt_irq) (struct domain *d, struct xen_domctl_bind_pt_irq *bind);
>      int (*ioport_permission) (struct domain *d, uint32_t s, uint32_t e, uint8_t allow);
>      int (*ioport_mapping) (struct domain *d, uint32_t s, uint32_t e, uint8_t allow);
> +    int (*pmu_op) (struct domain *d, int op);
>  #endif
>  };
>  
> @@ -660,6 +661,11 @@ static inline int xsm_ioport_mapping (xsm_default_t def, struct domain *d, uint3
>      return xsm_ops->ioport_mapping(d, s, e, allow);
>  }
>  
> +static inline int xsm_pmu_op (xsm_default_t def, struct domain *d, int op)
> +{
> +    return xsm_ops->pmu_op(d, op);
> +}
> +
>  #endif /* CONFIG_X86 */
>  
>  #endif /* XSM_NO_WRAPPERS */
> diff --git a/xen/xsm/dummy.c b/xen/xsm/dummy.c
> index 0826a8b..3638bd9 100644
> --- a/xen/xsm/dummy.c
> +++ b/xen/xsm/dummy.c
> @@ -141,5 +141,6 @@ void xsm_fixup_ops (struct xsm_operations *ops)
>      set_to_dummy_if_null(ops, unbind_pt_irq);
>      set_to_dummy_if_null(ops, ioport_permission);
>      set_to_dummy_if_null(ops, ioport_mapping);
> +    set_to_dummy_if_null(ops, pmu_op);
>  #endif
>  }
> diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
> index 5afc1d7..b437a24 100644
> --- a/xen/xsm/flask/hooks.c
> +++ b/xen/xsm/flask/hooks.c
> @@ -1485,6 +1485,23 @@ static int flask_unbind_pt_irq (struct domain *d, struct xen_domctl_bind_pt_irq
>  {
>      return current_has_perm(d, SECCLASS_RESOURCE, RESOURCE__REMOVE);
>  }
> +
> +static int flask_pmu_op (struct domain *d, int op)
> +{
> +    u32 dsid = domain_sid(d);
> +
> +    switch ( op )
> +    {
> +    case XENPMU_mode_set:
> +    case XENPMU_mode_get:
> +    case XENPMU_feature_set:
> +    case XENPMU_feature_get:
> +        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_XEN2,
> +                            XEN2__PMU_CTRL, NULL);
> +    default:
> +        return -EPERM;
> +    }
> +}
>  #endif /* CONFIG_X86 */
>  
>  long do_flask_op(XEN_GUEST_HANDLE_PARAM(xsm_op_t) u_flask_op);
> @@ -1604,6 +1621,7 @@ static struct xsm_operations flask_ops = {
>      .unbind_pt_irq = flask_unbind_pt_irq,
>      .ioport_permission = flask_ioport_permission,
>      .ioport_mapping = flask_ioport_mapping,
> +    .pmu_op = flask_pmu_op,
>  #endif
>  };
>  
> diff --git a/xen/xsm/flask/policy/access_vectors b/xen/xsm/flask/policy/access_vectors
> index 2ddbeba..64c7378 100644
> --- a/xen/xsm/flask/policy/access_vectors
> +++ b/xen/xsm/flask/policy/access_vectors
> @@ -81,6 +81,8 @@ class xen2
>  {
>  # XENPF_get_symbol
>      get_symbol
> +# PMU control
> +    pmu_ctrl
>  }
>  
>  # Classes domain and domain2 consist of operations that a domain performs on
> 

-- 
Company details: http://ts.fujitsu.com/imprint.html

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v12 for-xen-4.5 00/20] x86/PMU: Xen PMU PV(H) support
  2014-09-25 19:28 [PATCH v12 for-xen-4.5 00/20] x86/PMU: Xen PMU PV(H) support Boris Ostrovsky
                   ` (20 preceding siblings ...)
  2014-09-26 17:03 ` [PATCH v12 for-xen-4.5 00/20] x86/PMU: Xen PMU PV(H) support Konrad Rzeszutek Wilk
@ 2014-09-29 13:28 ` Dietmar Hahn
  21 siblings, 0 replies; 92+ messages in thread
From: Dietmar Hahn @ 2014-09-29 13:28 UTC (permalink / raw)
  To: xen-devel
  Cc: kevin.tian, keir, jbeulich, jun.nakajima, andrew.cooper3, tim,
	Aravind.Gopalakrishnan, suravee.suthikulpanit, dgdegra,
	Boris Ostrovsky

On Thursday 25 September 2014 at 15:28:36, Boris Ostrovsky wrote:
> Here is the twelfth version of PV(H) PMU patches.

For the complete series

Reviewed-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>

I'm going to do some tests tomorrow.
Many thanks.

Dietmar.


> 
> Changes in v12:
> 
> * Added XSM support
> * Made a validity check before writing MSR_CORE_PERF_GLOBAL_OVF_CTRL
> * Updated documentation for 'vpmu=nmi' option
> * Added more text to a bunch of commit messages (per Konrad's request)
> 
> Changes in v11:
> 
> * Replaced cpu_user_regs with new xen_pmu_regs (IP, SP, CS) in xen_pmu_arch.
>   - as part of this re-work noticed that CS registers were set in a later patch than
>     needed. Moved those changes to the appropriate place
> * Added new VPMU mode (XENPMU_MODE_HV). Now XENPMU_MODE_SELF will only provide dom0
>   with its own samples only (i.e. no hypervisor data) and XENPMU_MODE_HV will be what
>   XENPMU_MODE_SELF used to be.
> * Kept  vmx_add_guest_msr()/vmx_add_host_load_msr() as wrappers around vmx_add_msr()
> * Cleaned up VPMU context switch macros (moved  'if(prev!=next)' back to context_switch())
> * Dropped hypercall continuation from vpmu_force_context_switch() and replaced it with
>   -EAGAIN error if hypercall_preempt_check() is true after 2ms.
> * Kept vpmu_do_rdmsr()/vpmu_do_wrmsr() as wrappers for vpmu_do_msr()
> * Moved context switching patch (#13) earlier in the series (for proper bisection support)
> * Various comment updates and cleanups
> * Dropped a bunch of Reviewed-by and all Tested-by tags
> 
> Changes in v10:
> 
> * Swapped address and name fields of xenpf_symdata (to make it smaller on 32-bit)
> * Dropped vmx_rm_guest_msr() as it requires refcounting which makes the code more complicated.
> * Cleaned up vlapic_reg_write()
> * Call vpmu_destroy() for both HVM and PVH VCPUs
> * Verify that (xen_pmu_data+PMU register bank) fit into a page
> * Return error codes from arch-specific VPMU init code
> * Moved VPMU-related context switch logic into inlines
> * vpmu_force_context_switch() changes:
>   o Avoid greater than page-sized allocations
>   o Prevent another VCPU from starting VPMU sync while the first sync is in progress
> * Avoid stack leak in do_xenpmu_op()
> * Checked validity of Intel VPMU MSR values before they are committed
> * Fixed MSR handling in traps.c (avoid potential accesses to Intel MSRs on AMD)
> * Fixed VCPU selection in interrupt handler for 32-bit dom0 (sampled => sampling)
> * Clarified commit messages (patches 2, 13, 18) 
> * Various cleanups
> 
> Changes in v9:
> 
> * Restore VPMU context after context_saved() is called in
>   context_switch(). This is needed because vpmu_load() may end up
>   calling vmx_vmcs_try_enter()->vcpu_pause() and that needs is_running
>   to be correctly set/cleared. (patch 18, dropped review acks)
> * Added patch 2 to properly manage VPMU_CONTEXT_LOADED
> * Addressed most of Jan's comments.
>   o Keep track of time in vpmu_force_context_switch() to properly break
>     out of a loop when using hypercall continuations
>   o Fixed logic in calling vpmu_do_msr() in emulate_privileged_op()
>   o Cleaned up vpmu_interrupt() wrt vcpu variable names to (hopefully)
>     make it more clear which vcpu we are using
>   o Cleaned up vpmu_do_wrmsr()
>   o Did *not* replace sizeof(uint64_t) with sizeof(variable) in
>     amd_vpmu_initialise(): throughout the code registers are declared as
>     uint64_t and if we are to add a new type (e.g. reg_t) this should be
>     done in a separate patch, unrelated to this series.
>   o Various more minor cleanups and code style fixes
>   
> Changes in v8:
> 
> * Cleaned up a bit definitions of struct xenpf_symdata and xen_pmu_params
> * Added compat checks for vpmu structures
> * Converted vpmu flag manipulation macros to inline routines
> * Reimplemented vpmu_unload_all() to avoid long loops
> * Reworked PMU fault generation and handling (new patch #12)
> * Added checks for domain->vcpu[] non-NULLness
> * Added more comments, renamed some routines and macros, code style cleanup
> 
> 
> Changes in v7:
> 
> * When reading hypervisor symbols make the caller pass buffer length
>   (as opposed to having this length be part of the API). Make the
>   hypervisor buffer static, make xensyms_read() return zero-length
>   string on end-of-symbols. Make 'type' field of xenpf_symdata a char,
>   drop compat_pf_symdata definition.
> * Spread PVH support across patches as opposed to lumping it into a
>   separate patch
> * Rename vpmu_is_set_all() to vpmu_are_all_set()
> * Split VPMU cleanup patch in two
> * Use memmove when copying VMX guest and host MSRs
> * Make padding of xen_arch_pmu's context union a constant that does not
>   depend on arch context size.
> * Set interface version to 0.1
> * Check pointer validity in pvpmu_init/destroy()
> * Fixed crash in core2_vpmu_dump()
> * Fixed crash in vmx_add_msr()
> * Break handling of Intel and AMD MSRs in traps.c into separate cases
> * Pass full CS selector to guests
> * Add lock in pvpmu init code to prevent potential race
> 
> 
> Changes in v6:
> 
> * Two new patches:
>   o Merge VMX MSR add/remove routines in vmcs.c (patch 5)
>   o Merge VPMU read/write MSR routines in vpmu.c (patch 14)
> * Check for pending NMI softirq after saving VPMU context to prevent a newly-scheduled
>   guest from overwriting sampled_vcpu written by de-scheduled VCPU.
> * Keep track of enabled counters on Intel. This was removed in earlier patches and
>   was a mistake. As result of this change struct vpmu will have a pointer to private
>   context data (i.e. data that is not exposed to a PV(H) guest). Use this private pointer
>   on SVM as well for storing MSR bitmap status (it was unnecessarily exposed to PV guests
>   earlier).
>   Dropped Reviewed-by: and Tested-by: tags from patch 4 since it needs to be reviewed
>   again (core2_vpmu_do_wrmsr() routine, mostly)
> * Replaced references to dom0 with hardware_domain (and is_control_domain with
>   is_hardware_domain for consistency)
> * Prevent non-privileged domains from reading PMU MSRs in VPMU_PRIV_MODE
> * Reverted unnecessary changes in vpmu_initialise()'s switch statement
> * Fixed comment in vpmu_do_interrupt
> 
> 
> Changes in v5:
> 
> * Dropped patch number 2 ("Stop AMD counters when called from vpmu_save_force()")
>   as no longer needed
> * Added patch number 2 that marks context as loaded before PMU registers are
>   loaded. This prevents situation where a PMU interrupt may occur while context
>   is still viewed as not loaded. (This is really a bug fix for existing VPMU
>   code)
> * Renamed xenpmu.h files to pmu.h
> * More careful use of is_pv_domain(), is_hvm_domain(), is_pvh_domain() and
>   has_hvm_container_domain(). Also explicitly disabled support for PVH until
>   patch 16 to make the distinction between the usage of the above macros clearer.
> * Added support for disabling VPMU support during runtime.
> * Disable VPMUs for non-privileged domains when switching to privileged
>   profiling mode
> * Added ARM stub for xen_arch_pmu_t
> * Separated vpmu_mode from vpmu_features
> * Moved CS register query to make sure we use appropriate query mechanism for
>   various guest types.
> * LVTPC is now set from value in shared area, not copied from dom0
> * Various code and comments cleanup as suggested by Jan.
> 
> Changes in v4:
> 
> * Added support for PVH guests:
>   o changes in pvpmu_init() to accommodate both PV and PVH guests, still in patch 10
>   o more careful use of is_hvm_domain
>   o Additional patch (16)
> * Moved HVM interrupt handling out of vpmu_do_interrupt() for NMI-safe handling
> * Fixed dom0's VCPU selection in privileged mode
> * Added a cast in the register copy for 32-bit PV guests' cpu_user_regs_t in vpmu_do_interrupt.
>   (don't want to expose compat_cpu_user_regs in a public header)
> * Renamed public structures by prefixing them with "xen_"
> * Added an entry for xenpf_symdata in xlat.lst
> * Fixed pv_cpuid check for vpmu-specific cpuid adjustments
> * Various code style fixes
> * Eliminated anonymous unions
> * Added more verbiage to NMI patch description
> 
> 
> Changes in v3:
> 
> * Moved PMU MSR banks out from architectural context data structures to allow
> for future expansion without protocol changes
> * PMU interrupts can be either NMIs or regular vector interrupts (the latter
> is the default)
> * Context is now marked as PMU_CACHED by the hypervisor code to avoid certain
> race conditions with the guest
> * Fixed races with PV guest in MSR access handlers
> * More Intel VPMU cleanup
> * Moved NMI-unsafe code from NMI handler
> * Dropped changes to vcpu->is_running
> * Added LVTPC apic handling (cached for PV guests)
> * Separated privileged profiling mode into a standalone patch
> * Separated NMI handling into a standalone patch
> 
> 
> Changes in v2:
> 
> * Xen symbols are exported as data structure (as opposed to a set of formatted
> strings in v1). Even though one symbol per hypercall is returned, performance
> appears to be acceptable: reading the whole file from dom0 userland takes on average
> about twice as long as reading /proc/kallsyms
> * More cleanup of Intel VPMU code to simplify publicly exported structures
> * There are architecture-independent and x86-specific public include files (ARM
> has a stub)
> * General cleanup of public include files to make them more presentable (and
> to make auto doc generation better)
> * Setting of vcpu->is_running is now done on ARM in schedule_tail as well (making
> changes to common/schedule.c architecture-independent). Note that this is not
> tested since I don't have access to ARM hardware.
> * PCPU ID of interrupted processor is now passed to PV guest
> 
> 
> The following patch series adds PMU support in Xen for PV(H)
> guests. There is a companion patchset for Linux kernel. In addition,
> another set of changes will be provided (later) for userland perf
> code.
> 
> This version has following limitations:
> * For accurate profiling of dom0/Xen, dom0 VCPUs should be pinned.
> * Hypervisor code is only profiled on processors that have running dom0 VCPUs
> on them.
> * No backtrace support.
> 
> A few notes that may help reviewing:
> 
> * A shared data structure (xenpmu_data_t) between each PV VCPU and hypervisor
> CPU is used for passing registers' values as well as PMU state at the time of
> PMU interrupt.
> * PMU interrupts are taken by hypervisor either as NMIs or regular vector
> interrupts for both HVM and PV(H). The interrupts are sent as NMIs to HVM guests
> and as virtual interrupts to PV(H) guests
> * PV guest's interrupt handler does not read/write PMU MSRs directly. Instead, it
> accesses xenpmu_data_t and flushes it to HW before returning.
> * PMU mode is controlled at runtime via /sys/hypervisor/pmu/pmu/{pmu_mode,pmu_flags}
> in addition to the 'vpmu' boot option (which is preserved for backward
> compatibility); a runtime example is sketched after this list.
> The following modes are provided:
>   * disable: VPMU is off
>   * enable: VPMU is on. Guests can profile themselves, dom0 profiles itself and Xen
>   * priv_enable: dom0 only profiling. dom0 collects samples for everyone. Sampling
>     in guests is suspended.
> * /proc/xen/xensyms file exports hypervisor's symbols to dom0 (similar to
> /proc/kallsyms)
> * VPMU infrastructure is now used for HVM, PV and PVH and therefore has been moved
> up from hvm subtree
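
(A rough runtime example for the sysfs interface described above -- paths
and mode names follow the list, but the semantics live in the companion
Linux patches, so treat this as illustrative only:)

     echo enable > /sys/hypervisor/pmu/pmu/pmu_mode     # guests, dom0 and Xen profiled
     head /proc/xen/xensyms                             # hypervisor symbols, for dom0
     echo disable > /sys/hypervisor/pmu/pmu/pmu_mode    # VPMU off again
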
> 
> 
> 
> Boris Ostrovsky (20):
>   common/symbols: Export hypervisor symbols to privileged guest
>   x86/VPMU: Manage VPMU_CONTEXT_SAVE flag in vpmu_save_force()
>   x86/VPMU: Set MSR bitmaps only for HVM/PVH guests
>   x86/VPMU: Make vpmu macros a bit more efficient
>   intel/VPMU: Clean up Intel VPMU code
>   vmx: Merge MSR management routines
>   x86/VPMU: Handle APIC_LVTPC accesses
>   intel/VPMU: MSR_CORE_PERF_GLOBAL_CTRL should be initialized to zero
>   x86/VPMU: Add public xenpmu.h
>   x86/VPMU: Make vpmu not HVM-specific
>   x86/VPMU: Interface for setting PMU mode and flags
>   x86/VPMU: Initialize PMU for PV(H) guests
>   x86/VPMU: Save VPMU state for PV guests during context switch
>   x86/VPMU: When handling MSR accesses, leave fault injection to callers
>   x86/VPMU: Add support for PMU register handling on PV guests
>   x86/VPMU: Handle PMU interrupts for PV guests
>   x86/VPMU: Merge vpmu_rdmsr and vpmu_wrmsr
>   x86/VPMU: Add privileged PMU mode
>   x86/VPMU: NMI-based VPMU support
>   x86/VPMU: Move VPMU files up from hvm/ directory
> 
>  docs/misc/xen-command-line.markdown                |   8 +-
>  tools/flask/policy/policy/modules/xen/xen.te       |   7 +
>  xen/arch/x86/Makefile                              |   1 +
>  xen/arch/x86/domain.c                              |  23 +-
>  xen/arch/x86/hvm/Makefile                          |   1 -
>  xen/arch/x86/hvm/hvm.c                             |   3 +-
>  xen/arch/x86/hvm/svm/Makefile                      |   1 -
>  xen/arch/x86/hvm/svm/svm.c                         |  10 +-
>  xen/arch/x86/hvm/vlapic.c                          |   3 +
>  xen/arch/x86/hvm/vmx/Makefile                      |   1 -
>  xen/arch/x86/hvm/vmx/vmcs.c                        |  84 +--
>  xen/arch/x86/hvm/vmx/vmx.c                         |  28 +-
>  xen/arch/x86/hvm/vpmu.c                            | 265 -------
>  xen/arch/x86/oprofile/op_model_ppro.c              |   8 +-
>  xen/arch/x86/platform_hypercall.c                  |  33 +
>  xen/arch/x86/traps.c                               |  60 +-
>  xen/arch/x86/vpmu.c                                | 826 +++++++++++++++++++++
>  xen/arch/x86/{hvm/svm/vpmu.c => vpmu_amd.c}        | 158 ++--
>  .../x86/{hvm/vmx/vpmu_core2.c => vpmu_intel.c}     | 639 ++++++++--------
>  xen/arch/x86/x86_64/compat/entry.S                 |   4 +
>  xen/arch/x86/x86_64/entry.S                        |   4 +
>  xen/common/event_channel.c                         |   1 +
>  xen/common/symbols.c                               |  54 ++
>  xen/include/Makefile                               |   2 +
>  xen/include/asm-x86/domain.h                       |   2 +
>  xen/include/asm-x86/hvm/vcpu.h                     |   3 -
>  xen/include/asm-x86/hvm/vmx/vmcs.h                 |  18 +-
>  xen/include/asm-x86/hvm/vmx/vpmu_core2.h           |  51 --
>  xen/include/asm-x86/{hvm => }/vpmu.h               |  94 ++-
>  xen/include/public/arch-arm.h                      |   3 +
>  xen/include/public/arch-x86/pmu.h                  |  77 ++
>  xen/include/public/arch-x86/xen-x86_32.h           |   8 +
>  xen/include/public/arch-x86/xen-x86_64.h           |   8 +
>  xen/include/public/platform.h                      |  19 +
>  xen/include/public/pmu.h                           |  95 +++
>  xen/include/public/xen.h                           |   2 +
>  xen/include/xen/hypercall.h                        |   4 +
>  xen/include/xen/softirq.h                          |   1 +
>  xen/include/xen/symbols.h                          |   3 +
>  xen/include/xlat.lst                               |   5 +
>  xen/include/xsm/dummy.h                            |  20 +
>  xen/include/xsm/xsm.h                              |   6 +
>  xen/xsm/dummy.c                                    |   1 +
>  xen/xsm/flask/hooks.c                              |  28 +
>  xen/xsm/flask/policy/access_vectors                |  18 +-
>  xen/xsm/flask/policy/security_classes              |   1 +
>  46 files changed, 1872 insertions(+), 819 deletions(-)
>  delete mode 100644 xen/arch/x86/hvm/vpmu.c
>  create mode 100644 xen/arch/x86/vpmu.c
>  rename xen/arch/x86/{hvm/svm/vpmu.c => vpmu_amd.c} (74%)
>  rename xen/arch/x86/{hvm/vmx/vpmu_core2.c => vpmu_intel.c} (60%)
>  delete mode 100644 xen/include/asm-x86/hvm/vmx/vpmu_core2.h
>  rename xen/include/asm-x86/{hvm => }/vpmu.h (55%)
>  create mode 100644 xen/include/public/arch-x86/pmu.h
>  create mode 100644 xen/include/public/pmu.h
> 
> 

-- 
Company details: http://ts.fujitsu.com/imprint.html

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v12 for-xen-4.5 01/20] common/symbols: Export hypervisor symbols to privileged guest
  2014-09-29  6:43         ` Jan Beulich
@ 2014-09-29 13:29           ` Boris Ostrovsky
  2014-09-29 13:47             ` Jan Beulich
  0 siblings, 1 reply; 92+ messages in thread
From: Boris Ostrovsky @ 2014-09-29 13:29 UTC (permalink / raw)
  To: Jan Beulich, Konrad Rzeszutek Wilk
  Cc: kevin.tian, keir, suravee.suthikulpanit, andrew.cooper3, tim,
	dietmar.hahn, xen-devel, Aravind.Gopalakrishnan, jun.nakajima,
	dgdegra

On 09/29/2014 02:43 AM, Jan Beulich wrote:
>>>> On 26.09.14 at 18:49, <konrad.wilk@oracle.com> wrote:
>> On Fri, Sep 26, 2014 at 04:10:09PM +0100, Jan Beulich wrote:
>>>>>> On 26.09.14 at 16:58, <konrad.wilk@oracle.com> wrote:
>>>> If I move them just a bit:
>>>>
>>>>
>>>> diff --git a/xen/include/public/platform.h b/xen/include/public/platform.h
>>>> index 4f21b17..b97e476 100644
>>>> --- a/xen/include/public/platform.h
>>>> +++ b/xen/include/public/platform.h
>>>> @@ -538,9 +538,9 @@ struct xenpf_symdata {
>>>>                         /*      we reached the end                        */
>>>>   
>>>>       /* OUT variables */
>>>> -    char type;
>>>> -    XEN_GUEST_HANDLE(char) name;
>>>>       uint64_t address;
>>>> +    XEN_GUEST_HANDLE(char) name;
>>>> +    char type;
>>>>   };
>>>>   typedef struct xenpf_symdata xenpf_symdata_t;
>>>>   DEFINE_XEN_GUEST_HANDLE(xenpf_symdata_t);
>>>>
>>>>
>>>> 'pahole' is satisfied:
>>>>
>>>> struct xenpf_symdata {
>>>>      uint32_t                   namelen;              /*     0     4 */
>>>>      uint32_t                   symnum;               /*     4     4 */
>>>>      uint64_t                   address;              /*     8     8 */
>>>>      __guest_handle_char        name;                 /*    16     8 */
>>>>      char                       type;                 /*    24     1 */
>>>>
>>>>      /* size: 32, cachelines: 1, members: 5 */
>>>>      /* padding: 7 */
>>>>      /* last cacheline: 32 bytes */
>>>> };
>>>>
>>>>
>>>> With that change, Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>>> This change buys us exactly nothing: Structure size doesn't change,
>>> and 7 bytes of padding are still there.
>> It does allow us to put more parameters (if we want to) at the end of the
>> structure instead of fitting them in between.
> Regardless of where the gap is, adding further fields in the future
> would work only if the code now checked that this field is zero
> (which first of all would require it being given a name). I keep
> pointing out that this should be done for all padding fields, but I'm
> afraid I may have missed doing so on this occasion.

I am not sure I understand how setting fields to zero would help with
figuring out whether a new field has been added. I can see how it can
in some cases but not in general.

-boris

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v12 for-xen-4.5 01/20] common/symbols: Export hypervisor symbols to privileged guest
  2014-09-29 13:29           ` Boris Ostrovsky
@ 2014-09-29 13:47             ` Jan Beulich
  2014-09-29 14:16               ` Boris Ostrovsky
  0 siblings, 1 reply; 92+ messages in thread
From: Jan Beulich @ 2014-09-29 13:47 UTC (permalink / raw)
  To: Boris Ostrovsky, Konrad Rzeszutek Wilk
  Cc: kevin.tian, keir, suravee.suthikulpanit, andrew.cooper3, tim,
	dietmar.hahn, xen-devel, Aravind.Gopalakrishnan, jun.nakajima,
	dgdegra

>>> On 29.09.14 at 15:29, <boris.ostrovsky@oracle.com> wrote:
> On 09/29/2014 02:43 AM, Jan Beulich wrote:
>>>>> On 26.09.14 at 18:49, <konrad.wilk@oracle.com> wrote:
>>> On Fri, Sep 26, 2014 at 04:10:09PM +0100, Jan Beulich wrote:
>>>>>>> On 26.09.14 at 16:58, <konrad.wilk@oracle.com> wrote:
>>>>> [...]
>>>> This change buys us exactly nothing: Structure size doesn't change,
>>>> and 7 bytes of padding are still there.
>>> It does allow us to put more parameters (if we want to) at the end of the
>>> structure instead of fitting them in between.
>> Regardless of where the gap is, adding further fields in the future
>> would work only if the code now checked that this field is zero
>> (which first of all would require it being given a name). I keep
>> pointing out that this should be done for all padding fields, but I'm
>> afraid I may have missed doing so on this occasion.
> 
> I am not sure I understand how setting fields to zero would help with
> figuring out whether a new field has been added. I can see how it can
> in some cases but not in general.

If you check that padding fields are zero now, meaning can be
assigned to them later on, while if you allow them to be uninitialized,
that's not an option.
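
(To make this concrete -- an illustrative sketch only; the "pad" field
name and the helper are invented here:)

    struct xenpf_symdata {
        uint32_t namelen;
        uint32_t symnum;
        uint64_t address;
        XEN_GUEST_HANDLE(char) name;
        char type;
        uint8_t pad[7];            /* named, so it can gain meaning later */
    };

    /* Hypervisor side: insist the padding is zero today ... */
    static int check_symdata_pad(const struct xenpf_symdata *sym)
    {
        unsigned int i;

        for ( i = 0; i < ARRAY_SIZE(sym->pad); i++ )
            if ( sym->pad[i] )
                return -EINVAL;
        /* ... so a future version can assign meaning to these bytes. */
        return 0;
    }
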

Jan

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v12 for-xen-4.5 11/20] x86/VPMU: Interface for setting PMU mode and flags
  2014-09-29 13:25   ` Dietmar Hahn
@ 2014-09-29 13:56     ` Boris Ostrovsky
  2014-09-29 14:03       ` Dietmar Hahn
  2014-09-29 13:59     ` Jan Beulich
  1 sibling, 1 reply; 92+ messages in thread
From: Boris Ostrovsky @ 2014-09-29 13:56 UTC (permalink / raw)
  To: Dietmar Hahn, xen-devel
  Cc: kevin.tian, keir, jbeulich, jun.nakajima, andrew.cooper3, tim,
	Aravind.Gopalakrishnan, suravee.suthikulpanit, dgdegra

On 09/29/2014 09:25 AM, Dietmar Hahn wrote:
> +static int vpmu_force_context_switch(void)
> +{
> +    unsigned i, j, allbutself_num, mycpu;
> +    static s_time_t start, now;
> +    struct tasklet **sync_task;
> +    struct vcpu *curr_vcpu = current;
> +    int ret = 0;
> +
> +    allbutself_num = num_online_cpus() - 1;
> +
> +    sync_task = xzalloc_array(struct tasklet *, allbutself_num);
> +    if ( !sync_task )
> +    {
> +        printk(XENLOG_WARNING "vpmu_force_context_switch: out of memory\n");
> +        return -ENOMEM;
> +    }
> +
> +    for ( i = 0; i < allbutself_num; i++ )
> +    {
> +        sync_task[i] = xmalloc(struct tasklet);
> +        if ( sync_task[i] == NULL )
> +        {
> +            printk(XENLOG_WARNING "vpmu_force_context_switch: out of memory\n");
> +            ret = -ENOMEM;
> +            goto out;
> +        }
> +        tasklet_init(sync_task[i], vpmu_sched_checkin, 0);
> Just a question of understanding:
> is there a special reason not to use a single memory allocation,
> apart from memory fragmentation on systems with a large number of CPUs?
>
>       struct tasklet *sync_task;
>       sync_task = xmalloc(sizeof(struct tasklet) * allbutself_num);


Exactly because of fragmentation -- this will avoid asking for more than 
a page during runtime. I, in fact, originally had it allocated as a 
single chunk, just as you suggested above, but Jan asked this to be 
split into smaller, sub-page pieces.

-boris

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v12 for-xen-4.5 11/20] x86/VPMU: Interface for setting PMU mode and flags
  2014-09-29 13:25   ` Dietmar Hahn
  2014-09-29 13:56     ` Boris Ostrovsky
@ 2014-09-29 13:59     ` Jan Beulich
  2014-09-29 14:05       ` Dietmar Hahn
  1 sibling, 1 reply; 92+ messages in thread
From: Jan Beulich @ 2014-09-29 13:59 UTC (permalink / raw)
  To: xen-devel, Dietmar Hahn
  Cc: kevin.tian, keir, suravee.suthikulpanit, andrew.cooper3, tim,
	Aravind.Gopalakrishnan, jun.nakajima, Boris Ostrovsky, dgdegra

>>> On 29.09.14 at 15:25, <dietmar.hahn@ts.fujitsu.com> wrote:
> Only a minor note below.
> 
> On Thursday 25 September 2014 at 15:28:47, Boris Ostrovsky wrote:
>> Add runtime interface for setting PMU mode and flags. Three main modes are
>> provided:
>> * XENPMU_MODE_OFF:  PMU is not virtualized
>> * XENPMU_MODE_SELF: Guests can access PMU MSRs and receive PMU interrupts.
>> * XENPMU_MODE_HV: Same as XENPMU_MODE_SELF for non-privileged guests, dom0
>>   can profile itself and the hypervisor.
>> 
>> Note that PMU modes are different from what can be provided at Xen's boot
>> line with 'vpmu' argument. An 'off' (or '0') value is equivalent to
>> XENPMU_MODE_OFF. Any other value, on the other hand, will cause VPMU mode
>> to be set to XENPMU_MODE_SELF during boot.
>> 
>> For feature flags only Intel's BTS is currently supported.
>> 
>> Mode and flags are set via HYPERVISOR_xenpmu_op hypercall.
>> 
>> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>> ---
>> [...]
>> 
>> +static atomic_t vpmu_sched_counter;
>> +
>> +static void vpmu_sched_checkin(unsigned long unused)
>> +{
>> +    atomic_inc(&vpmu_sched_counter);
>> +}
>> +
>> +static int vpmu_force_context_switch(void)
>> +{
>> +    unsigned i, j, allbutself_num, mycpu;
>> +    static s_time_t start, now;
>> +    struct tasklet **sync_task;
>> +    struct vcpu *curr_vcpu = current;
>> +    int ret = 0;
>> +
>> +    allbutself_num = num_online_cpus() - 1;
>> +
>> +    sync_task = xzalloc_array(struct tasklet *, allbutself_num);
>> +    if ( !sync_task )
>> +    {
>> +        printk(XENLOG_WARNING "vpmu_force_context_switch: out of memory\n");
>> +        return -ENOMEM;
>> +    }
>> +
>> +    for ( i = 0; i < allbutself_num; i++ )
>> +    {
>> +        sync_task[i] = xmalloc(struct tasklet);
>> +        if ( sync_task[i] == NULL )
>> +        {
>> +            printk(XENLOG_WARNING "vpmu_force_context_switch: out of memory\n");
>> +            ret = -ENOMEM;
>> +            goto out;
>> +        }
>> +        tasklet_init(sync_task[i], vpmu_sched_checkin, 0);
> 
> Only a question of understanding.
> Is there a special reason not to use a single memory allocation
> except for memory fragmentation on systems with a large number of cpus?
> 
>      struct tasklet *sync_task;
>      sync_task = xmalloc(sizeof(struct tasklet) * allbutself_num);

Apart from this then needing to be xmalloc_array() - yes, the
reason here is to avoid non-order-zero runtime allocations. I.e.
the alternative would be to provide something vmalloc()-like to
be used here (or open code it as we do in a couple of other
places).
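As a rough illustration of the pattern (a minimal userspace sketch, not
Xen code, with a stand-in tasklet layout), splitting the allocation
keeps every individual request order-zero:

    #include <stdlib.h>

    struct tasklet { long pad[8]; };   /* stand-in; real layout differs */

    static struct tasklet **alloc_split(unsigned int n)
    {
        struct tasklet **v = calloc(n, sizeof(*v));
        unsigned int i;

        if ( !v )
            return NULL;
        for ( i = 0; i < n; i++ )
        {
            /* Each element is a separate small allocation. */
            if ( (v[i] = malloc(sizeof(**v))) == NULL )
            {
                while ( i-- )
                    free(v[i]);
                free(v);
                return NULL;
            }
        }
        return v;
    }

whereas a single n * sizeof(struct tasklet) request grows with the CPU
count and eventually needs physically contiguous multi-page memory.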

Jan

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v12 for-xen-4.5 11/20] x86/VPMU: Interface for setting PMU mode and flags
  2014-09-29 13:56     ` Boris Ostrovsky
@ 2014-09-29 14:03       ` Dietmar Hahn
  0 siblings, 0 replies; 92+ messages in thread
From: Dietmar Hahn @ 2014-09-29 14:03 UTC (permalink / raw)
  To: xen-devel
  Cc: kevin.tian, keir, jbeulich, andrew.cooper3, tim,
	Aravind.Gopalakrishnan, jun.nakajima, dgdegra, Boris Ostrovsky,
	suravee.suthikulpanit

Am Montag 29 September 2014, 09:56:50 schrieb Boris Ostrovsky:
> On 09/29/2014 09:25 AM, Dietmar Hahn wrote:
> > +static int vpmu_force_context_switch(void)
> > +{
> > +    unsigned i, j, allbutself_num, mycpu;
> > +    static s_time_t start, now;
> > +    struct tasklet **sync_task;
> > +    struct vcpu *curr_vcpu = current;
> > +    int ret = 0;
> > +
> > +    allbutself_num = num_online_cpus() - 1;
> > +
> > +    sync_task = xzalloc_array(struct tasklet *, allbutself_num);
> > +    if ( !sync_task )
> > +    {
> > +        printk(XENLOG_WARNING "vpmu_force_context_switch: out of memory\n");
> > +        return -ENOMEM;
> > +    }
> > +
> > +    for ( i = 0; i < allbutself_num; i++ )
> > +    {
> > +        sync_task[i] = xmalloc(struct tasklet);
> > +        if ( sync_task[i] == NULL )
> > +        {
> > +            printk(XENLOG_WARNING "vpmu_force_context_switch: out of memory\n");
> > +            ret = -ENOMEM;
> > +            goto out;
> > +        }
> > +        tasklet_init(sync_task[i], vpmu_sched_checkin, 0);
> > Only a question of understanding.
> > Is there a special reason not to use a single memory allocation
> > except for memory fragmentation on systems with a large number of cpus?
> >
> >       struct tasklet *sync_task;
> >       sync_task = xmalloc(sizeof(struct tasklet) * allbutself_num);
> 
> 
> Exactly because of fragmentation -- this will avoid asking for more than 
> a page during runtime. I, in fact, originally had it allocated as a 
> single chunk, just as you suggested above, but Jan asked this to be 
> split into smaller, sub-page pieces.
> 
> -boris

OK, then I overlooked this :-(
Sorry for the noise!
Dietmar.


^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v12 for-xen-4.5 11/20] x86/VPMU: Interface for setting PMU mode and flags
  2014-09-29 13:59     ` Jan Beulich
@ 2014-09-29 14:05       ` Dietmar Hahn
  0 siblings, 0 replies; 92+ messages in thread
From: Dietmar Hahn @ 2014-09-29 14:05 UTC (permalink / raw)
  To: xen-devel
  Cc: kevin.tian, keir, Jan Beulich, jun.nakajima, andrew.cooper3, tim,
	Aravind.Gopalakrishnan, suravee.suthikulpanit, Boris Ostrovsky,
	dgdegra

Am Montag 29 September 2014, 14:59:43 schrieb Jan Beulich:
> >>> On 29.09.14 at 15:25, <dietmar.hahn@ts.fujitsu.com> wrote:
> > Only a minor note below.
> > 
> > Am Donnerstag 25 September 2014, 15:28:47 schrieb Boris Ostrovsky:
> >> Add runtime interface for setting PMU mode and flags. Three main modes are
> >> provided:
> >> * XENPMU_MODE_OFF:  PMU is not virtualized
> >> * XENPMU_MODE_SELF: Guests can access PMU MSRs and receive PMU interrupts.
> >> * XENPMU_MODE_HV: Same as XENPMU_MODE_SELF for non-proviledged guests, dom0
> >>   can profile itself and the hypervisor.
> >> 
> >> Note that PMU modes are different from what can be provided at Xen's boot 
> > line
> >> with 'vpmu' argument. An 'off' (or '0') value is equivalent to 
> > XENPMU_MODE_OFF.
> >> Any other value, on the other hand, will cause VPMU mode to be set to
> >> XENPMU_MODE_SELF during boot.
> >> 
> >> For feature flags only Intel's BTS is currently supported.
> >> 
> >> Mode and flags are set via HYPERVISOR_xenpmu_op hypercall.
> >> 
> >> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> >> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> >> ---
> >>  tools/flask/policy/policy/modules/xen/xen.te |   3 +
> >>  xen/arch/x86/domain.c                        |   6 +-
> >>  xen/arch/x86/hvm/svm/vpmu.c                  |   4 +-
> >>  xen/arch/x86/hvm/vmx/vpmu_core2.c            |  10 +-
> >>  xen/arch/x86/hvm/vpmu.c                      | 206 +++++++++++++++++++++++++--
> >>  xen/arch/x86/x86_64/compat/entry.S           |   4 +
> >>  xen/arch/x86/x86_64/entry.S                  |   4 +
> >>  xen/include/Makefile                         |   2 +
> >>  xen/include/asm-x86/hvm/vpmu.h               |  27 ++--
> >>  xen/include/public/pmu.h                     |  44 ++++++
> >>  xen/include/public/xen.h                     |   1 +
> >>  xen/include/xen/hypercall.h                  |   4 +
> >>  xen/include/xlat.lst                         |   4 +
> >>  xen/include/xsm/dummy.h                      |  15 ++
> >>  xen/include/xsm/xsm.h                        |   6 +
> >>  xen/xsm/dummy.c                              |   1 +
> >>  xen/xsm/flask/hooks.c                        |  18 +++
> >>  xen/xsm/flask/policy/access_vectors          |   2 +
> >>  18 files changed, 334 insertions(+), 27 deletions(-)
> >> 
> >> diff --git a/tools/flask/policy/policy/modules/xen/xen.te b/tools/flask/policy/policy/modules/xen/xen.te
> >> index 1937883..fb761cd 100644
> >> --- a/tools/flask/policy/policy/modules/xen/xen.te
> >> +++ b/tools/flask/policy/policy/modules/xen/xen.te
> >> @@ -64,6 +64,9 @@ allow dom0_t xen_t:xen {
> >>  	getidle debug getcpuinfo heap pm_op mca_op lockprof cpupool_op tmem_op
> >>  	tmem_control getscheduler setscheduler
> >>  };
> >> +allow dom0_t xen_t:xen2 {
> >> +    pmu_ctrl
> >> +};
> >>  allow dom0_t xen_t:mmu memorymap;
> >>  
> >>  # Allow dom0 to use these domctls on itself. For domctls acting on other
> >> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
> >> index 7b1dfe6..6a07737 100644
> >> --- a/xen/arch/x86/domain.c
> >> +++ b/xen/arch/x86/domain.c
> >> @@ -1503,7 +1503,7 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
> >>      if ( is_hvm_vcpu(prev) )
> >>      {
> >>          if (prev != next)
> >> -            vpmu_save(prev);
> >> +            vpmu_switch_from(prev, next);
> >>  
> >>          if ( !list_empty(&prev->arch.hvm_vcpu.tm_list) )
> >>              pt_save_timer(prev);
> >> @@ -1546,9 +1546,9 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
> >>                             !is_hardware_domain(next->domain));
> >>      }
> >>  
> >> -    if (is_hvm_vcpu(next) && (prev != next) )
> >> +    if ( is_hvm_vcpu(prev) && (prev != next) )
> >>          /* Must be done with interrupts enabled */
> >> -        vpmu_load(next);
> >> +        vpmu_switch_to(prev, next);
> >>  
> >>      context_saved(prev);
> >>  
> >> diff --git a/xen/arch/x86/hvm/svm/vpmu.c b/xen/arch/x86/hvm/svm/vpmu.c
> >> index 124b147..37d8228 100644
> >> --- a/xen/arch/x86/hvm/svm/vpmu.c
> >> +++ b/xen/arch/x86/hvm/svm/vpmu.c
> >> @@ -479,14 +479,14 @@ struct arch_vpmu_ops amd_vpmu_ops = {
> >>      .arch_vpmu_dump = amd_vpmu_dump
> >>  };
> >>  
> >> -int svm_vpmu_initialise(struct vcpu *v, unsigned int vpmu_flags)
> >> +int svm_vpmu_initialise(struct vcpu *v)
> >>  {
> >>      struct vpmu_struct *vpmu = vcpu_vpmu(v);
> >>      uint8_t family = current_cpu_data.x86;
> >>      int ret = 0;
> >>  
> >>      /* vpmu enabled? */
> >> -    if ( !vpmu_flags )
> >> +    if ( vpmu_mode == XENPMU_MODE_OFF )
> >>          return 0;
> >>  
> >>      switch ( family )
> >> diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c
> >> index beff5c3..c0a45cd 100644
> >> --- a/xen/arch/x86/hvm/vmx/vpmu_core2.c
> >> +++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c
> >> @@ -703,13 +703,13 @@ static int core2_vpmu_do_interrupt(struct cpu_user_regs *regs)
> >>      return 1;
> >>  }
> >>  
> >> -static int core2_vpmu_initialise(struct vcpu *v, unsigned int vpmu_flags)
> >> +static int core2_vpmu_initialise(struct vcpu *v)
> >>  {
> >>      struct vpmu_struct *vpmu = vcpu_vpmu(v);
> >>      u64 msr_content;
> >>      static bool_t ds_warned;
> >>  
> >> -    if ( !(vpmu_flags & VPMU_BOOT_BTS) )
> >> +    if ( !(vpmu_features & XENPMU_FEATURE_INTEL_BTS) )
> >>          goto func_out;
> >>      /* Check the 'Debug Store' feature in the CPUID.EAX[1]:EDX[21] */
> >>      while ( boot_cpu_has(X86_FEATURE_DS) )
> >> @@ -824,7 +824,7 @@ struct arch_vpmu_ops core2_no_vpmu_ops = {
> >>      .do_cpuid = core2_no_vpmu_do_cpuid,
> >>  };
> >>  
> >> -int vmx_vpmu_initialise(struct vcpu *v, unsigned int vpmu_flags)
> >> +int vmx_vpmu_initialise(struct vcpu *v)
> >>  {
> >>      struct vpmu_struct *vpmu = vcpu_vpmu(v);
> >>      uint8_t family = current_cpu_data.x86;
> >> @@ -832,7 +832,7 @@ int vmx_vpmu_initialise(struct vcpu *v, unsigned int vpmu_flags)
> >>      int ret = 0;
> >>  
> >>      vpmu->arch_vpmu_ops = &core2_no_vpmu_ops;
> >> -    if ( !vpmu_flags )
> >> +    if ( vpmu_mode == XENPMU_MODE_OFF )
> >>          return 0;
> >>  
> >>      if ( family == 6 )
> >> @@ -875,7 +875,7 @@ int vmx_vpmu_initialise(struct vcpu *v, unsigned int vpmu_flags)
> >>          /* future: */
> >>          case 0x3d:
> >>          case 0x4e:
> >> -            ret = core2_vpmu_initialise(v, vpmu_flags);
> >> +            ret = core2_vpmu_initialise(v);
> >>              if ( !ret )
> >>                  vpmu->arch_vpmu_ops = &core2_vpmu_ops;
> >>              return ret;
> >> diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c
> >> index 071b869..5fcee0e 100644
> >> --- a/xen/arch/x86/hvm/vpmu.c
> >> +++ b/xen/arch/x86/hvm/vpmu.c
> >> @@ -21,6 +21,8 @@
> >>  #include <xen/config.h>
> >>  #include <xen/sched.h>
> >>  #include <xen/xenoprof.h>
> >> +#include <xen/event.h>
> >> +#include <xen/guest_access.h>
> >>  #include <asm/regs.h>
> >>  #include <asm/types.h>
> >>  #include <asm/msr.h>
> >> @@ -32,13 +34,22 @@
> >>  #include <asm/hvm/svm/vmcb.h>
> >>  #include <asm/apic.h>
> >>  #include <public/pmu.h>
> >> +#include <xen/tasklet.h>
> >> +#include <xsm/xsm.h>
> >> +
> >> +#include <compat/pmu.h>
> >> +CHECK_pmu_params;
> >> +CHECK_pmu_intel_ctxt;
> >> +CHECK_pmu_amd_ctxt;
> >> +CHECK_pmu_cntr_pair;
> >>  
> >>  /*
> >>   * "vpmu" :     vpmu generally enabled
> >>   * "vpmu=off" : vpmu generally disabled
> >>   * "vpmu=bts" : vpmu enabled and Intel BTS feature switched on.
> >>   */
> >> -static unsigned int __read_mostly opt_vpmu_enabled;
> >> +uint64_t __read_mostly vpmu_mode = XENPMU_MODE_OFF;
> >> +uint64_t __read_mostly vpmu_features = 0;
> >>  static void parse_vpmu_param(char *s);
> >>  custom_param("vpmu", parse_vpmu_param);
> >>  
> >> @@ -52,7 +63,7 @@ static void __init parse_vpmu_param(char *s)
> >>          break;
> >>      default:
> >>          if ( !strcmp(s, "bts") )
> >> -            opt_vpmu_enabled |= VPMU_BOOT_BTS;
> >> +            vpmu_features |= XENPMU_FEATURE_INTEL_BTS;
> >>          else if ( *s )
> >>          {
> >>              printk("VPMU: unknown flag: %s - vpmu disabled!\n", s);
> >> @@ -60,7 +71,8 @@ static void __init parse_vpmu_param(char *s)
> >>          }
> >>          /* fall through */
> >>      case 1:
> >> -        opt_vpmu_enabled |= VPMU_BOOT_ENABLED;
> >> +        /* Default VPMU mode */
> >> +        vpmu_mode = XENPMU_MODE_SELF;
> >>          break;
> >>      }
> >>  }
> >> @@ -77,6 +89,9 @@ int vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content, uint64_t supported)
> >>  {
> >>      struct vpmu_struct *vpmu = vcpu_vpmu(current);
> >>  
> >> +    if ( !(vpmu_mode & (XENPMU_MODE_SELF | XENPMU_MODE_HV)) )
> >> +        return 0;
> >> +
> >>      if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->do_wrmsr )
> >>          return vpmu->arch_vpmu_ops->do_wrmsr(msr, msr_content, supported);
> >>      return 0;
> >> @@ -86,6 +101,9 @@ int vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
> >>  {
> >>      struct vpmu_struct *vpmu = vcpu_vpmu(current);
> >>  
> >> +    if ( !(vpmu_mode & (XENPMU_MODE_SELF | XENPMU_MODE_HV)) )
> >> +        return 0;
> >> +
> >>      if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->do_rdmsr )
> >>          return vpmu->arch_vpmu_ops->do_rdmsr(msr, msr_content);
> >>      return 0;
> >> @@ -242,19 +260,19 @@ void vpmu_initialise(struct vcpu *v)
> >>      switch ( vendor )
> >>      {
> >>      case X86_VENDOR_AMD:
> >> -        if ( svm_vpmu_initialise(v, opt_vpmu_enabled) != 0 )
> >> -            opt_vpmu_enabled = 0;
> >> +        if ( svm_vpmu_initialise(v) != 0 )
> >> +            vpmu_mode = XENPMU_MODE_OFF;
> >>          break;
> >>  
> >>      case X86_VENDOR_INTEL:
> >> -        if ( vmx_vpmu_initialise(v, opt_vpmu_enabled) != 0 )
> >> -            opt_vpmu_enabled = 0;
> >> +        if ( vmx_vpmu_initialise(v) != 0 )
> >> +            vpmu_mode = XENPMU_MODE_OFF;
> >>          break;
> >>  
> >>      default:
> >>          printk("VPMU: Initialization failed. "
> >>                 "Unknown CPU vendor %d\n", vendor);
> >> -        opt_vpmu_enabled = 0;
> >> +        vpmu_mode = XENPMU_MODE_OFF;
> >>          break;
> >>      }
> >>  }
> >> @@ -276,3 +294,175 @@ void vpmu_dump(struct vcpu *v)
> >>          vpmu->arch_vpmu_ops->arch_vpmu_dump(v);
> >>  }
> >>  
> >> +static atomic_t vpmu_sched_counter;
> >> +
> >> +static void vpmu_sched_checkin(unsigned long unused)
> >> +{
> >> +    atomic_inc(&vpmu_sched_counter);
> >> +}
> >> +
> >> +static int vpmu_force_context_switch(void)
> >> +{
> >> +    unsigned i, j, allbutself_num, mycpu;
> >> +    static s_time_t start, now;
> >> +    struct tasklet **sync_task;
> >> +    struct vcpu *curr_vcpu = current;
> >> +    int ret = 0;
> >> +
> >> +    allbutself_num = num_online_cpus() - 1;
> >> +
> >> +    sync_task = xzalloc_array(struct tasklet *, allbutself_num);
> >> +    if ( !sync_task )
> >> +    {
> >> +        printk(XENLOG_WARNING "vpmu_force_context_switch: out of memory\n");
> >> +        return -ENOMEM;
> >> +    }
> >> +
> >> +    for ( i = 0; i < allbutself_num; i++ )
> >> +    {
> >> +        sync_task[i] = xmalloc(struct tasklet);
> >> +        if ( sync_task[i] == NULL )
> >> +        {
> >> +            printk(XENLOG_WARNING "vpmu_force_context_switch: out of memory\n");
> >> +            ret = -ENOMEM;
> >> +            goto out;
> >> +        }
> >> +        tasklet_init(sync_task[i], vpmu_sched_checkin, 0);
> > 
> > Only a question of understanding.
> > Is there a special reason not to use a single memory allocation
> > except for memory fragmentation on systems with a large number of cpus?
> > 
> >      struct tasklet *sync_task;
> >      sync_task = xmalloc(sizeof(struct tasklet) * allbutself_num);
> 
> Apart from this then needing to be xmalloc_array() - yes, the
> reason here is to avoid non-order-zero runtime allocations. I.e.
> the alternative would be to provide something vmalloc()-like to
> be used here (or open code it as we do in a couple of other
> places).

Thank you for the hint!
Dietmar.

> 
> Jan

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v12 for-xen-4.5 01/20] common/symbols: Export hypervisor symbols to privileged guest
  2014-09-29 13:47             ` Jan Beulich
@ 2014-09-29 14:16               ` Boris Ostrovsky
  2014-09-29 14:33                 ` Jan Beulich
  0 siblings, 1 reply; 92+ messages in thread
From: Boris Ostrovsky @ 2014-09-29 14:16 UTC (permalink / raw)
  To: Jan Beulich, Konrad Rzeszutek Wilk
  Cc: kevin.tian, keir, suravee.suthikulpanit, andrew.cooper3, tim,
	dietmar.hahn, xen-devel, Aravind.Gopalakrishnan, jun.nakajima,
	dgdegra

On 09/29/2014 09:47 AM, Jan Beulich wrote:
>>>> On 29.09.14 at 15:29, <boris.ostrovsky@oracle.com> wrote:
>> On 09/29/2014 02:43 AM, Jan Beulich wrote:
>>>>>> On 26.09.14 at 18:49, <konrad.wilk@oracle.com> wrote:
>>>> On Fri, Sep 26, 2014 at 04:10:09PM +0100, Jan Beulich wrote:
>>>>>>>> On 26.09.14 at 16:58, <konrad.wilk@oracle.com> wrote:
>>>>>> If I move them just a bit:
>>>>>>
>>>>>>
>>>>>> diff --git a/xen/include/public/platform.h b/xen/include/public/platform.h
>>>>>> index 4f21b17..b97e476 100644
>>>>>> --- a/xen/include/public/platform.h
>>>>>> +++ b/xen/include/public/platform.h
>>>>>> @@ -538,9 +538,9 @@ struct xenpf_symdata {
>>>>>>                          /*      we reached the end                        */
>>>>>>    
>>>>>>        /* OUT variables */
>>>>>> -    char type;
>>>>>> -    XEN_GUEST_HANDLE(char) name;
>>>>>>        uint64_t address;
>>>>>> +    XEN_GUEST_HANDLE(char) name;
>>>>>> +    char type;
>>>>>>    };
>>>>>>    typedef struct xenpf_symdata xenpf_symdata_t;
>>>>>>    DEFINE_XEN_GUEST_HANDLE(xenpf_symdata_t);
>>>>>>
>>>>>>
>>>>>> 'pahole' is satisfied:
>>>>>>
>>>>>> struct xenpf_symdata {
>>>>>>       uint32_t                   namelen;              /*     0     4 */
>>>>>>       uint32_t                   symnum;               /*     4     4 */
>>>>>>       uint64_t                   address;              /*     8     8 */
>>>>>>       __guest_handle_char        name;                 /*    16     8 */
>>>>>>       char                       type;                 /*    24     1 */
>>>>>>
>>>>>>       /* size: 32, cachelines: 1, members: 5 */
>>>>>>       /* padding: 7 */
>>>>>>       /* last cacheline: 32 bytes */
>>>>>> };
>>>>>>
>>>>>>
>>>>>> With that change, Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>>>>> This change buys us exactly nothing: Structure size doesn't change,
>>>>> and 7 bytes of padding are still there.
>>>> It does allow us to put more parameters (if we want to) at the end of the
>>>> structure instead of fitting them in between.
>>> Regardless of where the gap is, adding further fields in the future
>>> would work only if the code now checked that this field is zero
>>> (which first of all would require it being given a name). I keep
>>> pointing out that this should be done for all padding fields, but I'm
>>> afraid I may have missed doing so on this occasion.
>> I am not sure I understand how setting fields to zero would help with
>> figuring out whether a new field has been added. I can see how it can
>> in some cases but not in general.
> If you check that padding fields are zero now, meaning can be
> assigned to them later on, while if you allow them to be uninitialized,
> that's not an option.

What if the new field is meant to be zero? You can't guarantee that if 
pad is zero it is still a pad on the "other side" of the call, can you?

Besides, in this particular case, the structure is set up in the caller, 
so presumably it should be the one clearing all pads (which it does, but 
not for this specific reason).

(Also, if we were to add a new field to xenpf_symdata there is plenty of 
space to add a new one (or three) at the end --- it is a part of a 
128-byte union and xenpf_symdata is currently 32 bytes.)

-boris

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v12 for-xen-4.5 09/20] x86/VPMU: Add public xenpmu.h
  2014-09-25 19:28 ` [PATCH v12 for-xen-4.5 09/20] x86/VPMU: Add public xenpmu.h Boris Ostrovsky
  2014-09-26 20:49   ` Tian, Kevin
@ 2014-09-29 14:17   ` Jan Beulich
  2014-09-29 14:30     ` Jan Beulich
  2014-09-29 14:57     ` Boris Ostrovsky
  1 sibling, 2 replies; 92+ messages in thread
From: Jan Beulich @ 2014-09-29 14:17 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: kevin.tian, keir, suravee.suthikulpanit, andrew.cooper3, tim,
	dietmar.hahn, xen-devel, Aravind.Gopalakrishnan, jun.nakajima,
	dgdegra

>>> On 25.09.14 at 21:28, <boris.ostrovsky@oracle.com> wrote:
> @@ -385,7 +389,8 @@ static int amd_vpmu_initialise(struct vcpu *v)
>  	 }
>      }
>  
> -    ctxt = xzalloc(struct amd_vpmu_context);
> +    ctxt = xzalloc_bytes(sizeof(struct xen_pmu_amd_ctxt) +
> +                         2 * sizeof(uint64_t) * AMD_MAX_COUNTERS);
>      if ( !ctxt )
>      {
>          gdprintk(XENLOG_WARNING, "Insufficient memory for PMU, "
> @@ -394,7 +399,11 @@ static int amd_vpmu_initialise(struct vcpu *v)
>          return -ENOMEM;
>      }
>  
> +    ctxt->counters = sizeof(struct xen_pmu_amd_ctxt);
> +    ctxt->ctrls = ctxt->counters + sizeof(uint64_t) * AMD_MAX_COUNTERS;

Is using the compile time count really necessary? I.e. is the runtime
limit (which hopefully is going to be lower) not possible here? If not,
why is doing so on the VMX side possible?

> @@ -228,6 +229,11 @@ void vpmu_initialise(struct vcpu *v)
>      struct vpmu_struct *vpmu = vcpu_vpmu(v);
>      uint8_t vendor = current_cpu_data.x86_vendor;
>  
> +    BUILD_BUG_ON(sizeof(struct xen_pmu_intel_ctxt) > XENPMU_CTXT_PAD_SZ);
> +    BUILD_BUG_ON(sizeof(struct xen_pmu_amd_ctxt) > XENPMU_CTXT_PAD_SZ);
> +    BUILD_BUG_ON(sizeof(struct xen_pmu_regs) > XENPMU_REGS_PAD_SZ);
> +    BUILD_BUG_ON(sizeof(struct compat_pmu_regs) > XENPMU_REGS_PAD_SZ);

I'm having trouble finding where struct compat_pmu_regs gets defined
(largely since you're not adding anything to xen/include/xlat.h).

>  #define vpmu_vcpu(vpmu)   (container_of((vpmu), struct vcpu, \
>                                            arch.hvm_vcpu.vpmu))
> -#define vpmu_domain(vpmu) (vpmu_vcpu(vpmu)->domain)

Is this really useful to delete, i.e. are you absolutely sure that no
future use will ever arise?

> --- a/xen/include/public/arch-x86/xen-x86_64.h
> +++ b/xen/include/public/arch-x86/xen-x86_64.h
> @@ -174,6 +174,14 @@ struct cpu_user_regs {
>  typedef struct cpu_user_regs cpu_user_regs_t;
>  DEFINE_XEN_GUEST_HANDLE(cpu_user_regs_t);
>  
> +struct xen_pmu_regs {
> +    __DECL_REG(ip);
> +    __DECL_REG(sp);

Do you really need __DECL_REG() here? I.e. can't these two fields
be just xen_ulong_t e[is]p and the structure definition then be
shared with 32-bit code (and hence moved altogether into pmu.h)?

Jan

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v12 for-xen-4.5 09/20] x86/VPMU: Add public xenpmu.h
  2014-09-29 14:17   ` Jan Beulich
@ 2014-09-29 14:30     ` Jan Beulich
  2014-09-29 15:19       ` Boris Ostrovsky
  2014-09-29 14:57     ` Boris Ostrovsky
  1 sibling, 1 reply; 92+ messages in thread
From: Jan Beulich @ 2014-09-29 14:30 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: kevin.tian, keir, suravee.suthikulpanit, andrew.cooper3, tim,
	dietmar.hahn, xen-devel, Aravind.Gopalakrishnan, jun.nakajima,
	dgdegra

>>> On 29.09.14 at 16:17, <JBeulich@suse.com> wrote:
>>>> On 25.09.14 at 21:28, <boris.ostrovsky@oracle.com> wrote:
>> --- a/xen/include/public/arch-x86/xen-x86_64.h
>> +++ b/xen/include/public/arch-x86/xen-x86_64.h
>> @@ -174,6 +174,14 @@ struct cpu_user_regs {
>>  typedef struct cpu_user_regs cpu_user_regs_t;
>>  DEFINE_XEN_GUEST_HANDLE(cpu_user_regs_t);
>>  
>> +struct xen_pmu_regs {
>> +    __DECL_REG(ip);
>> +    __DECL_REG(sp);
> 
> Do you really need __DECL_REG() here? I.e. can't these two fields
> be just xen_ulong_t e[is]p and the structure definition then be
> shared with 32-bit code (and hence moved altogether into pmu.h)?

Otoh - is cs useful at all on 64-bit?

And thinking of that - is esp without ss useful on 32-bit? And
are cs (and maybe ss) useful without knowing the execution
mode of the target?

Jan

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v12 for-xen-4.5 01/20] common/symbols: Export hypervisor symbols to privileged guest
  2014-09-29 14:16               ` Boris Ostrovsky
@ 2014-09-29 14:33                 ` Jan Beulich
  0 siblings, 0 replies; 92+ messages in thread
From: Jan Beulich @ 2014-09-29 14:33 UTC (permalink / raw)
  To: Boris Ostrovsky, Konrad Rzeszutek Wilk
  Cc: kevin.tian, keir, suravee.suthikulpanit, andrew.cooper3, tim,
	dietmar.hahn, xen-devel, Aravind.Gopalakrishnan, jun.nakajima,
	dgdegra

>>> On 29.09.14 at 16:16, <boris.ostrovsky@oracle.com> wrote:
> On 09/29/2014 09:47 AM, Jan Beulich wrote:
>>>>> On 29.09.14 at 15:29, <boris.ostrovsky@oracle.com> wrote:
>>> I am not sure I understand how setting fields to zero would help with
>>> figuring out whether a new field has been added. I can see how it can
>>> in some cases but not in general.
>> If you check that padding fields are zero now, meaning can be
>> assigned to them later on, while if you allow them to be uninitialized,
>> that's not an option.
> 
> What if the new field is meant to be zero? You can't guarantee that if 
> pad is zero it is still a pad on the "other side" of the call, can you?

I don't understand what you are trying to tell me here.

> Besides, in this particular case, the structure is set up in the caller, 
> so presumably it should be the one clearing all pads (which it does, but 
> not for this specific reason).

Of course it's the caller to zero the field. And the callee (hypervisor)
to check that it's zero. I never said the hypervisor should clear it.
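Concretely, something along these lines (the "pad" field name and the
check are purely illustrative, not what the patch currently contains):

    struct xenpf_symdata {
        uint32_t namelen;
        uint32_t symnum;
        uint64_t address;
        XEN_GUEST_HANDLE(char) name;
        char type;
        uint8_t pad[7];            /* caller must zero this today */
    };

    /* Hypervisor side: reject non-zero padding, so the bits can be
     * given meaning in a future revision. */
    for ( i = 0; i < ARRAY_SIZE(symdata.pad); i++ )
        if ( symdata.pad[i] )
            return -EINVAL;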

Jan

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v12 for-xen-4.5 09/20] x86/VPMU: Add public xenpmu.h
  2014-09-29 14:17   ` Jan Beulich
  2014-09-29 14:30     ` Jan Beulich
@ 2014-09-29 14:57     ` Boris Ostrovsky
  2014-09-29 15:40       ` Jan Beulich
  1 sibling, 1 reply; 92+ messages in thread
From: Boris Ostrovsky @ 2014-09-29 14:57 UTC (permalink / raw)
  To: Jan Beulich
  Cc: kevin.tian, keir, suravee.suthikulpanit, andrew.cooper3, tim,
	dietmar.hahn, xen-devel, Aravind.Gopalakrishnan, jun.nakajima,
	dgdegra

On 09/29/2014 10:17 AM, Jan Beulich wrote:
>>>> On 25.09.14 at 21:28, <boris.ostrovsky@oracle.com> wrote:
>> @@ -385,7 +389,8 @@ static int amd_vpmu_initialise(struct vcpu *v)
>>   	 }
>>       }
>>   
>> -    ctxt = xzalloc(struct amd_vpmu_context);
>> +    ctxt = xzalloc_bytes(sizeof(struct xen_pmu_amd_ctxt) +
>> +                         2 * sizeof(uint64_t) * AMD_MAX_COUNTERS);
>>       if ( !ctxt )
>>       {
>>           gdprintk(XENLOG_WARNING, "Insufficient memory for PMU, "
>> @@ -394,7 +399,11 @@ static int amd_vpmu_initialise(struct vcpu *v)
>>           return -ENOMEM;
>>       }
>>   
>> +    ctxt->counters = sizeof(struct xen_pmu_amd_ctxt);
>> +    ctxt->ctrls = ctxt->counters + sizeof(uint64_t) * AMD_MAX_COUNTERS;
> Is using the compile time count really necessary? I.e. is the runtime
> limit (which hopefully is going to be lower) not possible here? If not,
> why is doing so on the VMX side possible?


I can use runtime value.


>
>> @@ -228,6 +229,11 @@ void vpmu_initialise(struct vcpu *v)
>>       struct vpmu_struct *vpmu = vcpu_vpmu(v);
>>       uint8_t vendor = current_cpu_data.x86_vendor;
>>   
>> +    BUILD_BUG_ON(sizeof(struct xen_pmu_intel_ctxt) > XENPMU_CTXT_PAD_SZ);
>> +    BUILD_BUG_ON(sizeof(struct xen_pmu_amd_ctxt) > XENPMU_CTXT_PAD_SZ);
>> +    BUILD_BUG_ON(sizeof(struct xen_pmu_regs) > XENPMU_REGS_PAD_SZ);
>> +    BUILD_BUG_ON(sizeof(struct compat_pmu_regs) > XENPMU_REGS_PAD_SZ);
> I'm having trouble finding where struct compat_pmu_regs gets defined
> (largely since you're not adding anything to xen/include/xlat.h).


It is generated into include/compat/arch-x86/xen-x86_32.h which is 
included via xen.h. And so is xlat.h -- but I didn't think I needed to 
add anything there (via xlat.lst) since I didn't see any reason for 
checking or translating it.


>
>>   #define vpmu_vcpu(vpmu)   (container_of((vpmu), struct vcpu, \
>>                                             arch.hvm_vcpu.vpmu))
>> -#define vpmu_domain(vpmu) (vpmu_vcpu(vpmu)->domain)
> Is this really useful to delete, i.e. are you absolutely sure that no
> future use will ever arise?

We never use it (and never had, apparently) so I didn't see any reason 
to carry it forward.

>
>> --- a/xen/include/public/arch-x86/xen-x86_64.h
>> +++ b/xen/include/public/arch-x86/xen-x86_64.h
>> @@ -174,6 +174,14 @@ struct cpu_user_regs {
>>   typedef struct cpu_user_regs cpu_user_regs_t;
>>   DEFINE_XEN_GUEST_HANDLE(cpu_user_regs_t);
>>   
>> +struct xen_pmu_regs {
>> +    __DECL_REG(ip);
>> +    __DECL_REG(sp);
> Do you really need __DECL_REG() here? I.e. can't these two fields
> be just xen_ulong_t e[is]p and the structure definition then be
> shared with 32-bit code (and hence moved altogether into pmu.h)?

I wasn't sure which way to go. The reason I did it this way was that
I was essentially following the cpu_user_regs implementation.


-boris

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v12 for-xen-4.5 11/20] x86/VPMU: Interface for setting PMU mode and flags
  2014-09-25 19:28 ` [PATCH v12 for-xen-4.5 11/20] x86/VPMU: Interface for setting PMU mode and flags Boris Ostrovsky
                     ` (2 preceding siblings ...)
  2014-09-29 13:25   ` Dietmar Hahn
@ 2014-09-29 15:14   ` Jan Beulich
  2014-09-29 15:34     ` Boris Ostrovsky
  2014-10-01  0:48   ` Tian, Kevin
  4 siblings, 1 reply; 92+ messages in thread
From: Jan Beulich @ 2014-09-29 15:14 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: kevin.tian, keir, suravee.suthikulpanit, andrew.cooper3, tim,
	dietmar.hahn, xen-devel, Aravind.Gopalakrishnan, jun.nakajima,
	dgdegra

>>> On 25.09.14 at 21:28, <boris.ostrovsky@oracle.com> wrote:
> +static int vpmu_force_context_switch(void)
> +{
> +    unsigned i, j, allbutself_num, mycpu;
> +    static s_time_t start, now;

I don't think "now" needs to be static. In fact it would perhaps be
better moved into the narrower scope below.

> +    struct tasklet **sync_task;
> +    struct vcpu *curr_vcpu = current;
> +    int ret = 0;
> +
> +    allbutself_num = num_online_cpus() - 1;
> +
> +    sync_task = xzalloc_array(struct tasklet *, allbutself_num);

So while the allocation further down got broken up properly, this
still degenerates to an order-greater-0 allocation for beyond 512
CPUs. I think you'd be better off (ab)using per-CPU data here.
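Something like the following (untested) sketch, using the standard
per-CPU accessors, would avoid the runtime allocation altogether:

    static DEFINE_PER_CPU(struct tasklet, vpmu_sync_task);

    ...
    for_each_online_cpu ( i )
        if ( i != smp_processor_id() )
            tasklet_schedule_on_cpu(&per_cpu(vpmu_sync_task, i), i);

(with a one-time tasklet_init() of each element done elsewhere, e.g.
from a CPU notifier).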

> +    if ( !sync_task )
> +    {
> +        printk(XENLOG_WARNING "vpmu_force_context_switch: out of memory\n");
> +        return -ENOMEM;
> +    }
> +
> +    for ( i = 0; i < allbutself_num; i++ )
> +    {
> +        sync_task[i] = xmalloc(struct tasklet);
> +        if ( sync_task[i] == NULL )
> +        {
> +            printk(XENLOG_WARNING "vpmu_force_context_switch: out of memory\n");
> +            ret = -ENOMEM;
> +            goto out;
> +        }
> +        tasklet_init(sync_task[i], vpmu_sched_checkin, 0);
> +    }
> +
> +    atomic_set(&vpmu_sched_counter, 0);
> +
> +    j = 0;
> +    mycpu = smp_processor_id();
> +    for_each_online_cpu( i )
> +    {
> +        if ( i != mycpu )
> +            tasklet_schedule_on_cpu(sync_task[j++], i);
> +    }
> +
> +    vpmu_save(curr_vcpu);
> +
> +    start = NOW();
> +
> +    /*
> +     * Note that we may fail here if a CPU is hot-plugged while we are
> +     * waiting. We will then time out.
> +     */
> +    while ( atomic_read(&vpmu_sched_counter) != allbutself_num )
> +    {
> +        cpu_relax();
> +
> +        now = NOW();
> +
> +        /* Give up after 5 seconds */
> +        if ( now > start + SECONDS(5) )
> +        {
> +            printk(XENLOG_WARNING
> +                   "vpmu_force_context_switch: failed to sync\n");
> +            ret = -EBUSY;
> +            break;
> +        }
> +
> +        /* Or after 2 milliseconds if need to be preempted */

/* Or after 2 ms (arbitrary value) if need to be preempted. */

> +        if ( (now > start + MILLISECS(2)) && hypercall_preempt_check() )
> +        {
> +            ret = -EAGAIN;
> +            break;
> +        }

And then this won't do what (I hope) you want on any continuation:
"start" doesn't get updated again in that case.

Jan

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v12 for-xen-4.5 09/20] x86/VPMU: Add public xenpmu.h
  2014-09-29 14:30     ` Jan Beulich
@ 2014-09-29 15:19       ` Boris Ostrovsky
  2014-09-29 15:41         ` Jan Beulich
  0 siblings, 1 reply; 92+ messages in thread
From: Boris Ostrovsky @ 2014-09-29 15:19 UTC (permalink / raw)
  To: Jan Beulich
  Cc: kevin.tian, keir, suravee.suthikulpanit, andrew.cooper3, tim,
	dietmar.hahn, xen-devel, Aravind.Gopalakrishnan, jun.nakajima,
	dgdegra

On 09/29/2014 10:30 AM, Jan Beulich wrote:
>>>> On 29.09.14 at 16:17, <JBeulich@suse.com> wrote:
>>>>> On 25.09.14 at 21:28, <boris.ostrovsky@oracle.com> wrote:
>>> --- a/xen/include/public/arch-x86/xen-x86_64.h
>>> +++ b/xen/include/public/arch-x86/xen-x86_64.h
>>> @@ -174,6 +174,14 @@ struct cpu_user_regs {
>>>   typedef struct cpu_user_regs cpu_user_regs_t;
>>>   DEFINE_XEN_GUEST_HANDLE(cpu_user_regs_t);
>>>   
>>> +struct xen_pmu_regs {
>>> +    __DECL_REG(ip);
>>> +    __DECL_REG(sp);
>> Do you really need __DECL_REG() here? I.e. can't these two fields
>> be just xen_ulong_t e[is]p and the structure definition then be
>> shared with 32-bit code (and hence moved altogether into pmu.h)?
> Otoh - is cs useful at all on 64-bit?

perf uses user_mode() to figure which mode we are in and that requires CS.

>
> And thinking of that - is esp without ss useful on 32-bit? And
> are cs (and maybe ss) useful without knowing the execution
> mode of the target?

I don't know exactly how ESP is used (by perf, which is the only tool
that I have been testing with). For performance counters it is not used
at all; I added it only because it will clearly be needed for stack
unwinding when that becomes supported. But presumably SS would also be
necessary, yes.

CS is useful because only CPL field is looked at.

-boris

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v12 for-xen-4.5 12/20] x86/VPMU: Initialize PMU for PV(H) guests
  2014-09-25 19:28 ` [PATCH v12 for-xen-4.5 12/20] x86/VPMU: Initialize PMU for PV(H) guests Boris Ostrovsky
  2014-09-26 22:16   ` Daniel De Graaf
@ 2014-09-29 15:25   ` Jan Beulich
  2014-09-29 15:41     ` Boris Ostrovsky
  2014-10-01  0:16   ` Tian, Kevin
  2 siblings, 1 reply; 92+ messages in thread
From: Jan Beulich @ 2014-09-29 15:25 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: kevin.tian, keir, suravee.suthikulpanit, andrew.cooper3, tim,
	dietmar.hahn, xen-devel, Aravind.Gopalakrishnan, jun.nakajima,
	dgdegra

>>> On 25.09.14 at 21:28, <boris.ostrovsky@oracle.com> wrote:
> @@ -389,14 +390,26 @@ static int amd_vpmu_initialise(struct vcpu *v)
>  	 }
>      }
>  
> -    ctxt = xzalloc_bytes(sizeof(struct xen_pmu_amd_ctxt) +
> -                         2 * sizeof(uint64_t) * AMD_MAX_COUNTERS);
> -    if ( !ctxt )
> +    regs_size = 2 * sizeof(uint64_t) * AMD_MAX_COUNTERS;
> +    if ( is_hvm_domain(v->domain) )
>      {
> -        gdprintk(XENLOG_WARNING, "Insufficient memory for PMU, "
> -            " PMU feature is unavailable on domain %d vcpu %d.\n",
> -            v->vcpu_id, v->domain->domain_id);
> -        return -ENOMEM;
> +        ctxt = xzalloc_bytes(sizeof(struct xen_pmu_amd_ctxt) + regs_size);
> +        if ( !ctxt )
> +        {
> +            gdprintk(XENLOG_WARNING, "Insufficient memory for PMU, "
> +                "PMU feature is unavailable\n");
> +            return -ENOMEM;
> +        }
> +    }
> +    else
> +    {
> +        if ( sizeof(struct xen_pmu_data) + regs_size > PAGE_SIZE )

This is a compile time constant condition - no reason to issue a
message and return failure at runtime, just BUILD_BUG_ON() instead.
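I.e. something like (using the same constants the hunk above already
relies on):

    BUILD_BUG_ON(sizeof(struct xen_pmu_data) +
                 2 * sizeof(uint64_t) * AMD_MAX_COUNTERS > PAGE_SIZE);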

> --- a/xen/arch/x86/hvm/vmx/vpmu_core2.c
> +++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c
> @@ -356,25 +356,45 @@ static int core2_vpmu_alloc_resource(struct vcpu *v)
>      struct vpmu_struct *vpmu = vcpu_vpmu(v);
>      struct xen_pmu_intel_ctxt *core2_vpmu_cxt = NULL;
>      uint64_t *p = NULL;
> +    unsigned int regs_size;
>  
> -    if ( !acquire_pmu_ownership(PMU_OWNER_HVM) )
> -        return 0;
> -
> -    wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0);
> -    if ( vmx_add_host_load_msr(MSR_CORE_PERF_GLOBAL_CTRL) )
> +    p = xzalloc_bytes(sizeof(uint64_t));
> +    if ( !p )
>          goto out_err;
>  
> -    if ( vmx_add_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL) )
> -        goto out_err;
> -    vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, 0);
> -
> -    core2_vpmu_cxt = xzalloc_bytes(sizeof(struct xen_pmu_intel_ctxt) +
> -                                   sizeof(uint64_t) * fixed_pmc_cnt +
> -                                   sizeof(struct xen_pmu_cntr_pair) *
> -                                   arch_pmc_cnt);
> -    p = xzalloc(uint64_t);
> -    if ( !core2_vpmu_cxt || !p )
> -        goto out_err;
> +    if ( has_hvm_container_domain(v->domain) )
> +    {
> +        if ( is_hvm_domain(v->domain) && !acquire_pmu_ownership(PMU_OWNER_HVM) )
> +            goto out_err;
> +
> +        wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0);
> +        if ( vmx_add_host_load_msr(MSR_CORE_PERF_GLOBAL_CTRL) )
> +            goto out_err_hvm;
> +        if ( vmx_add_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL) )
> +            goto out_err_hvm;
> +        vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, 0);
> +    }
> +
> +    regs_size = sizeof(uint64_t) * fixed_pmc_cnt +
> +                sizeof(struct xen_pmu_cntr_pair) * arch_pmc_cnt;
> +    if ( is_hvm_domain(v->domain) )
> +    {
> +        core2_vpmu_cxt = xzalloc_bytes(sizeof(struct xen_pmu_intel_ctxt) +
> +                                       regs_size);
> +        if ( !core2_vpmu_cxt )
> +            goto out_err_hvm;
> +    }
> +    else
> +    {
> +        if ( sizeof(struct xen_pmu_data) + regs_size > PAGE_SIZE )
> +        {
> +            printk(XENLOG_WARNING

XENLOG_G_WARNING

Also, with the constituents of regs_size not changing at runtime,
issuing the message just once (and then perhaps in some __init
function) would seem the better approach.
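A possible shape for that (function name and message wording are
illustrative only, and whether the counts are already known at initcall
time would need checking):

    static int __init vpmu_regs_size_check(void)
    {
        unsigned int regs_size = sizeof(uint64_t) * fixed_pmc_cnt +
                                 sizeof(struct xen_pmu_cntr_pair) *
                                 arch_pmc_cnt;

        if ( sizeof(struct xen_pmu_data) + regs_size > PAGE_SIZE )
            printk(XENLOG_WARNING
                   "VPMU: PMU register bank does not fit into a page\n");

        return 0;
    }
    __initcall(vpmu_regs_size_check);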

> +static int pvpmu_init(struct domain *d, xen_pmu_params_t *params)
> +{
> +    struct vcpu *v;
> +    struct page_info *page;
> +    uint64_t gfn = params->val;
> +
> +    if ( (params->vcpu >= d->max_vcpus) || (d->vcpu == NULL) ||
> +         (d->vcpu[params->vcpu] == NULL) )
> +        return -EINVAL;
> +
> +    page = get_page_from_gfn(d, gfn, NULL, P2M_ALLOC);
> +    if ( !page )
> +        return -EINVAL;
> +
> +    if ( !get_page_type(page, PGT_writable_page) )
> +    {
> +        put_page(page);
> +        return -EINVAL;
> +    }
> +
> +    v = d->vcpu[params->vcpu];
> +    v->arch.vpmu.xenpmu_data = __map_domain_page_global(page);
> +    if ( !v->arch.vpmu.xenpmu_data )
> +    {
> +        put_page_and_type(page);
> +        return -EINVAL;
> +    }
> +
> +    vpmu_initialise(v);
> +
> +    return 0;
> +}
> +
> +static void pvpmu_finish(struct domain *d, xen_pmu_params_t *params)
> +{
> +    struct vcpu *v;
> +    uint64_t mfn;
> +
> +    if ( (params->vcpu >= d->max_vcpus) || (d->vcpu == NULL) ||
> +         (d->vcpu[params->vcpu] == NULL) )
> +        return;
> +
> +    v = d->vcpu[params->vcpu];
> +    if ( v != current )
> +        vcpu_pause(v);
> +
> +    if ( v->arch.vpmu.xenpmu_data )
> +    {
> +        mfn = domain_page_map_to_mfn(v->arch.vpmu.xenpmu_data);
> +        if ( mfn_valid(mfn) )

Isn't this a must knowing that v->arch.vpmu.xenpmu_data is not
NULL? I.e. ASSERT()?
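I.e. roughly (keeping the rest of the teardown as is):

    mfn = domain_page_map_to_mfn(v->arch.vpmu.xenpmu_data);
    ASSERT(mfn_valid(mfn));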

Jan

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v12 for-xen-4.5 11/20] x86/VPMU: Interface for setting PMU mode and flags
  2014-09-29 15:14   ` Jan Beulich
@ 2014-09-29 15:34     ` Boris Ostrovsky
  0 siblings, 0 replies; 92+ messages in thread
From: Boris Ostrovsky @ 2014-09-29 15:34 UTC (permalink / raw)
  To: Jan Beulich
  Cc: kevin.tian, keir, suravee.suthikulpanit, andrew.cooper3, tim,
	dietmar.hahn, xen-devel, Aravind.Gopalakrishnan, jun.nakajima,
	dgdegra

On 09/29/2014 11:14 AM, Jan Beulich wrote:
>>>> On 25.09.14 at 21:28, <boris.ostrovsky@oracle.com> wrote:
>> +static int vpmu_force_context_switch(void)
>> +{
>> +    unsigned i, j, allbutself_num, mycpu;
>> +    static s_time_t start, now;
> I don't think "now" needs to be static. In fact it would perhaps be
> better moved into the narrower scope below.
>
>> +    struct tasklet **sync_task;
>> +    struct vcpu *curr_vcpu = current;
>> +    int ret = 0;
>> +
>> +    allbutself_num = num_online_cpus() - 1;
>> +
>> +    sync_task = xzalloc_array(struct tasklet *, allbutself_num);
> So while the allocation further down got broken up properly, this
> still degenerates to an order-greater-0 allocation for beyond 512
> CPUs. I think you'd be better off (ab)using per-CPU data here.
>
>> +    if ( !sync_task )
>> +    {
>> +        printk(XENLOG_WARNING "vpmu_force_context_switch: out of memory\n");
>> +        return -ENOMEM;
>> +    }
>> +
>> +    for ( i = 0; i < allbutself_num; i++ )
>> +    {
>> +        sync_task[i] = xmalloc(struct tasklet);
>> +        if ( sync_task[i] == NULL )
>> +        {
>> +            printk(XENLOG_WARNING "vpmu_force_context_switch: out of memory\n");
>> +            ret = -ENOMEM;
>> +            goto out;
>> +        }
>> +        tasklet_init(sync_task[i], vpmu_sched_checkin, 0);
>> +    }
>> +
>> +    atomic_set(&vpmu_sched_counter, 0);
>> +
>> +    j = 0;
>> +    mycpu = smp_processor_id();
>> +    for_each_online_cpu( i )
>> +    {
>> +        if ( i != mycpu )
>> +            tasklet_schedule_on_cpu(sync_task[j++], i);
>> +    }
>> +
>> +    vpmu_save(curr_vcpu);
>> +
>> +    start = NOW();
>> +
>> +    /*
>> +     * Note that we may fail here if a CPU is hot-plugged while we are
>> +     * waiting. We will then time out.
>> +     */
>> +    while ( atomic_read(&vpmu_sched_counter) != allbutself_num )
>> +    {
>> +        cpu_relax();
>> +
>> +        now = NOW();
>> +
>> +        /* Give up after 5 seconds */
>> +        if ( now > start + SECONDS(5) )
>> +        {
>> +            printk(XENLOG_WARNING
>> +                   "vpmu_force_context_switch: failed to sync\n");
>> +            ret = -EBUSY;
>> +            break;
>> +        }
>> +
>> +        /* Or after 2 milliseconds if need to be preempted */
> /* Or after 2 ms (arbitrary value) if need to be preempted. */
>
>> +        if ( (now > start + MILLISECS(2)) && hypercall_preempt_check() )
>> +        {
>> +            ret = -EAGAIN;
>> +            break;
>> +        }
> And then this won't do what (I hope) you want on any continuation:
> "start" doesn't get updated again in that case.

Which is fine, 'start' doesn't need to be updated; it's not (well,
shouldn't be) a static any longer (just like 'now').

We don't have continuation any more; -EAGAIN will be passed directly to
the user and the whole operation will need to be restarted by the user.
So 'start' will get reassigned as NOW().
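In other words, the expected caller pattern is a plain retry loop
(xenpmu_mode_set() below is a made-up stand-in for whatever toolstack
wrapper around HYPERVISOR_xenpmu_op ends up being used):

    /* Hypothetical wrapper; shown only to illustrate the semantics. */
    do {
        rc = xenpmu_mode_set(XENPMU_MODE_SELF);
    } while ( rc == -EAGAIN );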

-boris

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v12 for-xen-4.5 09/20] x86/VPMU: Add public xenpmu.h
  2014-09-29 14:57     ` Boris Ostrovsky
@ 2014-09-29 15:40       ` Jan Beulich
  2014-09-29 15:56         ` Boris Ostrovsky
  0 siblings, 1 reply; 92+ messages in thread
From: Jan Beulich @ 2014-09-29 15:40 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: kevin.tian, keir, suravee.suthikulpanit, andrew.cooper3, tim,
	dietmar.hahn, xen-devel, Aravind.Gopalakrishnan, jun.nakajima,
	dgdegra

>>> On 29.09.14 at 16:57, <boris.ostrovsky@oracle.com> wrote:
> On 09/29/2014 10:17 AM, Jan Beulich wrote:
>>>>> On 25.09.14 at 21:28, <boris.ostrovsky@oracle.com> wrote:
>>> @@ -228,6 +229,11 @@ void vpmu_initialise(struct vcpu *v)
>>>       struct vpmu_struct *vpmu = vcpu_vpmu(v);
>>>       uint8_t vendor = current_cpu_data.x86_vendor;
>>>   
>>> +    BUILD_BUG_ON(sizeof(struct xen_pmu_intel_ctxt) > XENPMU_CTXT_PAD_SZ);
>>> +    BUILD_BUG_ON(sizeof(struct xen_pmu_amd_ctxt) > XENPMU_CTXT_PAD_SZ);
>>> +    BUILD_BUG_ON(sizeof(struct xen_pmu_regs) > XENPMU_REGS_PAD_SZ);
>>> +    BUILD_BUG_ON(sizeof(struct compat_pmu_regs) > XENPMU_REGS_PAD_SZ);
>> I'm having trouble finding where struct compat_pmu_regs gets defined
>> (largely since you're not adding anything to xen/include/xlat.h).
> 
> 
> It is generated into include/compat/arch-x86/xen-x86_32.h which is 
> included via xen.h. And so is xlat.h -- but I didn't think I needed to 
> add anything there (via xlat.lst) since I didn't see any reason for 
> checking or translating it.

Right. In which case the check above is a bit pointless, and
possibly confusing (as just happened to me now). If you
need the struct in a subsequent patch, move the check there.
If you don't ever need it till the end of your series, please
annotate the above stating that this is just for completeness
despite the hypervisor itself never using the structure.

>>>   #define vpmu_vcpu(vpmu)   (container_of((vpmu), struct vcpu, \
>>>                                             arch.hvm_vcpu.vpmu))
>>> -#define vpmu_domain(vpmu) (vpmu_vcpu(vpmu)->domain)
>>> Is this really useful to delete, i.e. are you absolutely sure that no
>>> future use will ever arise?
> 
> We never use it (and never had, apparently) so I didn't see any reason 
> to carry it forward.

Oh, if it was never used, then the change is unrelated here - I was
taking it being here as a sign that the patch replaced the last
existing user.

Jan

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v12 for-xen-4.5 12/20] x86/VPMU: Initialize PMU for PV(H) guests
  2014-09-29 15:25   ` Jan Beulich
@ 2014-09-29 15:41     ` Boris Ostrovsky
  2014-09-29 15:42       ` Jan Beulich
  0 siblings, 1 reply; 92+ messages in thread
From: Boris Ostrovsky @ 2014-09-29 15:41 UTC (permalink / raw)
  To: Jan Beulich
  Cc: kevin.tian, keir, suravee.suthikulpanit, andrew.cooper3, tim,
	dietmar.hahn, xen-devel, Aravind.Gopalakrishnan, jun.nakajima,
	dgdegra

On 09/29/2014 11:25 AM, Jan Beulich wrote:
>>>> On 25.09.14 at 21:28, <boris.ostrovsky@oracle.com> wrote:
>> @@ -389,14 +390,26 @@ static int amd_vpmu_initialise(struct vcpu *v)
>>   	 }
>>       }
>>   
>> -    ctxt = xzalloc_bytes(sizeof(struct xen_pmu_amd_ctxt) +
>> -                         2 * sizeof(uint64_t) * AMD_MAX_COUNTERS);
>> -    if ( !ctxt )
>> +    regs_size = 2 * sizeof(uint64_t) * AMD_MAX_COUNTERS;
>> +    if ( is_hvm_domain(v->domain) )
>>       {
>> -        gdprintk(XENLOG_WARNING, "Insufficient memory for PMU, "
>> -            " PMU feature is unavailable on domain %d vcpu %d.\n",
>> -            v->vcpu_id, v->domain->domain_id);
>> -        return -ENOMEM;
>> +        ctxt = xzalloc_bytes(sizeof(struct xen_pmu_amd_ctxt) + regs_size);
>> +        if ( !ctxt )
>> +        {
>> +            gdprintk(XENLOG_WARNING, "Insufficient memory for PMU, "
>> +                "PMU feature is unavailable\n");
>> +            return -ENOMEM;
>> +        }
>> +    }
>> +    else
>> +    {
>> +        if ( sizeof(struct xen_pmu_data) + regs_size > PAGE_SIZE )
> This is a compile time constant condition - no reason to issue a
> message and return failure at runtime, just BUILD_BUG_ON() instead.

It will not be if I replace AMD_MAX_COUNTERS with the runtime register
count, as you asked in an earlier comment.

-boris

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v12 for-xen-4.5 09/20] x86/VPMU: Add public xenpmu.h
  2014-09-29 15:19       ` Boris Ostrovsky
@ 2014-09-29 15:41         ` Jan Beulich
  2014-09-29 15:48           ` Boris Ostrovsky
  0 siblings, 1 reply; 92+ messages in thread
From: Jan Beulich @ 2014-09-29 15:41 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: kevin.tian, keir, suravee.suthikulpanit, andrew.cooper3, tim,
	dietmar.hahn, xen-devel, Aravind.Gopalakrishnan, jun.nakajima,
	dgdegra

>>> On 29.09.14 at 17:19, <boris.ostrovsky@oracle.com> wrote:
> On 09/29/2014 10:30 AM, Jan Beulich wrote:
>>>>> On 29.09.14 at 16:17, <JBeulich@suse.com> wrote:
>>>>>> On 25.09.14 at 21:28, <boris.ostrovsky@oracle.com> wrote:
>>>> --- a/xen/include/public/arch-x86/xen-x86_64.h
>>>> +++ b/xen/include/public/arch-x86/xen-x86_64.h
>>>> @@ -174,6 +174,14 @@ struct cpu_user_regs {
>>>>   typedef struct cpu_user_regs cpu_user_regs_t;
>>>>   DEFINE_XEN_GUEST_HANDLE(cpu_user_regs_t);
>>>>   
>>>> +struct xen_pmu_regs {
>>>> +    __DECL_REG(ip);
>>>> +    __DECL_REG(sp);
>>> Do you really need __DECL_REG() here? I.e. can't these two fields
>>> be just xen_ulong_t e[is]p and the structure definition then be
>>> shared with 32-bit code (and hence moved altogether into pmu.h)?
>> Otoh - is cs useful at all on 64-bit?
> 
> perf uses user_mode() to figure which mode we are in and that requires CS.

But that won't be complete without also looking at EFLAGS.VM.

Jan

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v12 for-xen-4.5 12/20] x86/VPMU: Initialize PMU for PV(H) guests
  2014-09-29 15:41     ` Boris Ostrovsky
@ 2014-09-29 15:42       ` Jan Beulich
  2014-09-29 16:04         ` Boris Ostrovsky
  0 siblings, 1 reply; 92+ messages in thread
From: Jan Beulich @ 2014-09-29 15:42 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: kevin.tian, keir, suravee.suthikulpanit, andrew.cooper3, tim,
	dietmar.hahn, xen-devel, Aravind.Gopalakrishnan, jun.nakajima,
	dgdegra

>>> On 29.09.14 at 17:41, <boris.ostrovsky@oracle.com> wrote:
> On 09/29/2014 11:25 AM, Jan Beulich wrote:
>>>>> On 25.09.14 at 21:28, <boris.ostrovsky@oracle.com> wrote:
>>> @@ -389,14 +390,26 @@ static int amd_vpmu_initialise(struct vcpu *v)
>>>   	 }
>>>       }
>>>   
>>> -    ctxt = xzalloc_bytes(sizeof(struct xen_pmu_amd_ctxt) +
>>> -                         2 * sizeof(uint64_t) * AMD_MAX_COUNTERS);
>>> -    if ( !ctxt )
>>> +    regs_size = 2 * sizeof(uint64_t) * AMD_MAX_COUNTERS;
>>> +    if ( is_hvm_domain(v->domain) )
>>>       {
>>> -        gdprintk(XENLOG_WARNING, "Insufficient memory for PMU, "
>>> -            " PMU feature is unavailable on domain %d vcpu %d.\n",
>>> -            v->vcpu_id, v->domain->domain_id);
>>> -        return -ENOMEM;
>>> +        ctxt = xzalloc_bytes(sizeof(struct xen_pmu_amd_ctxt) + regs_size);
>>> +        if ( !ctxt )
>>> +        {
>>> +            gdprintk(XENLOG_WARNING, "Insufficient memory for PMU, "
>>> +                "PMU feature is unavailable\n");
>>> +            return -ENOMEM;
>>> +        }
>>> +    }
>>> +    else
>>> +    {
>>> +        if ( sizeof(struct xen_pmu_data) + regs_size > PAGE_SIZE )
>> This is a compile time constant condition - no reason to issue a
>> message and return failure at runtime, just BUILD_BUG_ON() instead.
> 
> It will not be if I replace AMD_MAX_COUNTERS with the runtime register
> count, as you asked in an earlier comment.

For which case see the respective VMX side comment.

Jan

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v12 for-xen-4.5 09/20] x86/VPMU: Add public xenpmu.h
  2014-09-29 15:41         ` Jan Beulich
@ 2014-09-29 15:48           ` Boris Ostrovsky
  0 siblings, 0 replies; 92+ messages in thread
From: Boris Ostrovsky @ 2014-09-29 15:48 UTC (permalink / raw)
  To: Jan Beulich
  Cc: kevin.tian, keir, suravee.suthikulpanit, andrew.cooper3, tim,
	dietmar.hahn, xen-devel, Aravind.Gopalakrishnan, jun.nakajima,
	dgdegra

On 09/29/2014 11:41 AM, Jan Beulich wrote:
>>>> On 29.09.14 at 17:19, <boris.ostrovsky@oracle.com> wrote:
>> On 09/29/2014 10:30 AM, Jan Beulich wrote:
>>>>>> On 29.09.14 at 16:17, <JBeulich@suse.com> wrote:
>>>>>>> On 25.09.14 at 21:28, <boris.ostrovsky@oracle.com> wrote:
>>>>> --- a/xen/include/public/arch-x86/xen-x86_64.h
>>>>> +++ b/xen/include/public/arch-x86/xen-x86_64.h
>>>>> @@ -174,6 +174,14 @@ struct cpu_user_regs {
>>>>>    typedef struct cpu_user_regs cpu_user_regs_t;
>>>>>    DEFINE_XEN_GUEST_HANDLE(cpu_user_regs_t);
>>>>>    
>>>>> +struct xen_pmu_regs {
>>>>> +    __DECL_REG(ip);
>>>>> +    __DECL_REG(sp);
>>>> Do you really need __DECL_REG() here? I.e. can't these two fields
>>>> be just xen_ulong_t e[is]p and the structure definition then be
>>>> shared with 32-bit code (and hence moved altogether into pmu.h)?
>>> Otoh - is cs useful at all on 64-bit?
>> perf uses user_mode() to figure which mode we are in and that requires CS.
> But that won't be complete without also looking at EFLAGS.VM.
>

True. In the perf case, they explicitly state that v8086 mode is ruled
out, so the EFLAGS check can be bypassed. But for completeness I should
put EFLAGS in.
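The check would then look something like this (a sketch only: the
"flags" field doesn't exist in xen_pmu_regs yet, and X86_EFLAGS_VM is
assumed to be the usual EFLAGS.VM mask):

    /* User mode means either virtual-8086 mode or CPL 3 in CS. */
    static inline int xenpmu_regs_user_mode(const struct xen_pmu_regs *r)
    {
        return (r->flags & X86_EFLAGS_VM) || ((r->cs & 3) == 3);
    }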

-boris

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v12 for-xen-4.5 13/20] x86/VPMU: Save VPMU state for PV guests during context switch
  2014-09-25 19:28 ` [PATCH v12 for-xen-4.5 13/20] x86/VPMU: Save VPMU state for PV guests during context switch Boris Ostrovsky
@ 2014-09-29 15:52   ` Jan Beulich
  0 siblings, 0 replies; 92+ messages in thread
From: Jan Beulich @ 2014-09-29 15:52 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: kevin.tian, keir, suravee.suthikulpanit, andrew.cooper3, tim,
	dietmar.hahn, xen-devel, Aravind.Gopalakrishnan, jun.nakajima,
	dgdegra

>>> On 25.09.14 at 21:28, <boris.ostrovsky@oracle.com> wrote:
> Save VPMU state during context switch for both HVM and PV(H) guests.
> 
> A subsequent patch ("x86/VPMU: NMI-based VPMU support") will make it possible
> for vpmu_switch_to() to call vmx_vmcs_try_enter()->vcpu_pause() which needs
> is_running to be correctly set/cleared. To prepare for that, call
> context_saved() before vpmu_switch_to() is executed. (Note that while
> this change could have been delayed until that later patch, the changes
> are harmless to existing code and so we do it here.)
> 
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Acked-by: Jan Beulich <jbeulich@suse.com>

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v12 for-xen-4.5 09/20] x86/VPMU: Add public xenpmu.h
  2014-09-29 15:40       ` Jan Beulich
@ 2014-09-29 15:56         ` Boris Ostrovsky
  0 siblings, 0 replies; 92+ messages in thread
From: Boris Ostrovsky @ 2014-09-29 15:56 UTC (permalink / raw)
  To: Jan Beulich
  Cc: kevin.tian, keir, suravee.suthikulpanit, andrew.cooper3, tim,
	dietmar.hahn, xen-devel, Aravind.Gopalakrishnan, jun.nakajima,
	dgdegra

On 09/29/2014 11:40 AM, Jan Beulich wrote:
>>>> On 29.09.14 at 16:57, <boris.ostrovsky@oracle.com> wrote:
>> On 09/29/2014 10:17 AM, Jan Beulich wrote:
>>>>>> On 25.09.14 at 21:28, <boris.ostrovsky@oracle.com> wrote:
>>>> @@ -228,6 +229,11 @@ void vpmu_initialise(struct vcpu *v)
>>>>        struct vpmu_struct *vpmu = vcpu_vpmu(v);
>>>>        uint8_t vendor = current_cpu_data.x86_vendor;
>>>>    
>>>> +    BUILD_BUG_ON(sizeof(struct xen_pmu_intel_ctxt) > XENPMU_CTXT_PAD_SZ);
>>>> +    BUILD_BUG_ON(sizeof(struct xen_pmu_amd_ctxt) > XENPMU_CTXT_PAD_SZ);
>>>> +    BUILD_BUG_ON(sizeof(struct xen_pmu_regs) > XENPMU_REGS_PAD_SZ);
>>>> +    BUILD_BUG_ON(sizeof(struct compat_pmu_regs) > XENPMU_REGS_PAD_SZ);
>>> I'm having trouble finding where struct compat_pmu_regs gets defined
>>> (largely since you're not adding anything to xen/include/xlat.h).
>>
>> It is generated into include/compat/arch-x86/xen-x86_32.h which is
>> included via xen.h. And so is xlat.h -- but I didn't think I needed to
>> add anything there (via xlat.lst) since I didn't see any reason for
>> checking or translating it.
> Right. In which case the check above is a bit pointless, and
> possibly confusing (just like it happened to me now). If you
> need the struct in a subsequent patch, move the check there.
> If you don't ever need it till the end of your series, please
> annotate the above stating that this is just for completeness
> despite the hypervisor itself never using the structure.

The hypervisor does use it, but in a later patch (#16, interrupt
handling). I'll move the check there.


>>>>    #define vpmu_vcpu(vpmu)   (container_of((vpmu), struct vcpu, \
>>>>                                              arch.hvm_vcpu.vpmu))
>>>> -#define vpmu_domain(vpmu) (vpmu_vcpu(vpmu)->domain)
>>> Is this really useful to delete, i.e. are absolutely sure that no future
>>> use will ever arise?
>> We never use it (and never had, apparently) so I didn't see any reason
>> to carry it forward.
> Oh, if it was never used, then the change is unrelated here - I was
> taking it being here as a sign that the patch replaced the last
> existing user.

I'll add a comment about the removal of this macro to the commit message.

-boris

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v12 for-xen-4.5 12/20] x86/VPMU: Initialize PMU for PV(H) guests
  2014-09-29 15:42       ` Jan Beulich
@ 2014-09-29 16:04         ` Boris Ostrovsky
  2014-09-29 16:10           ` Jan Beulich
  0 siblings, 1 reply; 92+ messages in thread
From: Boris Ostrovsky @ 2014-09-29 16:04 UTC (permalink / raw)
  To: Jan Beulich
  Cc: kevin.tian, keir, suravee.suthikulpanit, andrew.cooper3, tim,
	dietmar.hahn, xen-devel, Aravind.Gopalakrishnan, jun.nakajima,
	dgdegra

On 09/29/2014 11:42 AM, Jan Beulich wrote:
>>>> On 29.09.14 at 17:41, <boris.ostrovsky@oracle.com> wrote:
>> On 09/29/2014 11:25 AM, Jan Beulich wrote:
>>>>>> On 25.09.14 at 21:28, <boris.ostrovsky@oracle.com> wrote:
>>>> @@ -389,14 +390,26 @@ static int amd_vpmu_initialise(struct vcpu *v)
>>>>    	 }
>>>>        }
>>>>    
>>>> -    ctxt = xzalloc_bytes(sizeof(struct xen_pmu_amd_ctxt) +
>>>> -                         2 * sizeof(uint64_t) * AMD_MAX_COUNTERS);
>>>> -    if ( !ctxt )
>>>> +    regs_size = 2 * sizeof(uint64_t) * AMD_MAX_COUNTERS;
>>>> +    if ( is_hvm_domain(v->domain) )
>>>>        {
>>>> -        gdprintk(XENLOG_WARNING, "Insufficient memory for PMU, "
>>>> -            " PMU feature is unavailable on domain %d vcpu %d.\n",
>>>> -            v->vcpu_id, v->domain->domain_id);
>>>> -        return -ENOMEM;
>>>> +        ctxt = xzalloc_bytes(sizeof(struct xen_pmu_amd_ctxt) + regs_size);
>>>> +        if ( !ctxt )
>>>> +        {
>>>> +            gdprintk(XENLOG_WARNING, "Insufficient memory for PMU, "
>>>> +                "PMU feature is unavailable\n");
>>>> +            return -ENOMEM;
>>>> +        }
>>>> +    }
>>>> +    else
>>>> +    {
>>>> +        if ( sizeof(struct xen_pmu_data) + regs_size > PAGE_SIZE )
>>> This is a compile time constant condition - no reason to issue a
>>> message and return failure at runtime, just BUILD_BUG_ON() instead.
>> It will not be if I replace AMD_MAX_COUNTERS with runtime register
>> count, as you asked in an earlier comment.
> For which case see the respective VMX side comment.

Not sure I understand what you mean here. It will be almost exactly the
same.

-boris

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v12 for-xen-4.5 15/20] x86/VPMU: Add support for PMU register handling on PV guests
  2014-09-25 19:28 ` [PATCH v12 for-xen-4.5 15/20] x86/VPMU: Add support for PMU register handling on PV guests Boris Ostrovsky
  2014-09-26 16:34   ` Konrad Rzeszutek Wilk
@ 2014-09-29 16:04   ` Jan Beulich
  2014-10-01  0:17   ` Tian, Kevin
  2 siblings, 0 replies; 92+ messages in thread
From: Jan Beulich @ 2014-09-29 16:04 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: kevin.tian, keir, suravee.suthikulpanit, andrew.cooper3, tim,
	dietmar.hahn, xen-devel, Aravind.Gopalakrishnan, jun.nakajima,
	dgdegra

>>> On 25.09.14 at 21:28, <boris.ostrovsky@oracle.com> wrote:
> Intercept accesses to PMU MSRs and process them in VPMU module.
> 
> Dump VPMU state for all domains (HVM and PV) when requested.
> 
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

Acked-by: Jan Beulich <jbeulich@suse.com>
provided ...

> @@ -2561,7 +2569,22 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
>              if ( v->arch.debugreg[7] & DR7_ACTIVE_MASK )
>                  wrmsrl(regs->_ecx, msr_content);
>              break;
> -
> +        case MSR_P6_PERFCTR(0)...MSR_P6_PERFCTR(7):
> +        case MSR_P6_EVNTSEL(0)...MSR_P6_EVNTSEL(3):
> +        case MSR_CORE_PERF_FIXED_CTR0...MSR_CORE_PERF_FIXED_CTR2:
> +        case MSR_CORE_PERF_FIXED_CTR_CTRL...MSR_CORE_PERF_GLOBAL_OVF_CTRL:
> +            if ( boot_cpu_data.x86_vendor == X86_VENDOR_INTEL )
> +            {
> +                vpmu_msr = 1;
> +        case MSR_AMD_FAM15H_EVNTSEL0...MSR_AMD_FAM15H_PERFCTR5:
> +                if ( vpmu_msr || (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) )
> +                {
> +                    if ( vpmu_do_wrmsr(regs->ecx, msr_content, 0) )
> +                        goto fail;
> +                }
> +                break;
> +            }
> +            /*FALLTHROUGH*/
>          default:
>              if ( wrmsr_hypervisor_regs(regs->ecx, msr_content) == 1 )
>                  break;
> @@ -2593,6 +2616,7 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
>          break;
>  
>      case 0x32: /* RDMSR */
> +        vpmu_msr = 0;
>          switch ( (u32)regs->ecx )
>          {
>          case MSR_FS_BASE:
> @@ -2663,7 +2687,29 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
>                              [regs->_ecx - MSR_AMD64_DR1_ADDRESS_MASK + 1];
>              regs->edx = 0;
>              break;
> +        case MSR_IA32_PERF_CAPABILITIES:
> +            /* No extra capabilities are supported */
> +            regs->eax = regs->edx = 0;
> +            break;
> +        case MSR_P6_PERFCTR(0)...MSR_P6_PERFCTR(7):
> +        case MSR_P6_EVNTSEL(0)...MSR_P6_EVNTSEL(3):
> +        case MSR_CORE_PERF_FIXED_CTR0...MSR_CORE_PERF_FIXED_CTR2:
> +        case MSR_CORE_PERF_FIXED_CTR_CTRL...MSR_CORE_PERF_GLOBAL_OVF_CTRL:
> +            if ( boot_cpu_data.x86_vendor == X86_VENDOR_INTEL )
> +            {
> +                vpmu_msr = 1;
> +        case MSR_AMD_FAM15H_EVNTSEL0...MSR_AMD_FAM15H_PERFCTR5:
> +                if ( vpmu_msr || (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) )
> +                {
> +                    if ( vpmu_do_rdmsr(regs->ecx, &msr_content) )
> +                        goto fail;
>  
> +                    regs->eax = (uint32_t)msr_content;
> +                    regs->edx = (uint32_t)(msr_content >> 32);
> +                }
> +                break;
> +            }
> +            /*FALLTHROUGH*/
>          default:
>              if ( rdmsr_hypervisor_regs(regs->ecx, &val) )
>              {

... you retain the blank lines and suitably add another one in each
code chunk.

Jan

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v12 for-xen-4.5 12/20] x86/VPMU: Initialize PMU for PV(H) guests
  2014-09-29 16:04         ` Boris Ostrovsky
@ 2014-09-29 16:10           ` Jan Beulich
  0 siblings, 0 replies; 92+ messages in thread
From: Jan Beulich @ 2014-09-29 16:10 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: kevin.tian, keir, suravee.suthikulpanit, andrew.cooper3, tim,
	dietmar.hahn, xen-devel, Aravind.Gopalakrishnan, jun.nakajima,
	dgdegra

>>> On 29.09.14 at 18:04, <boris.ostrovsky@oracle.com> wrote:
> On 09/29/2014 11:42 AM, Jan Beulich wrote:
>>>>> On 29.09.14 at 17:41, <boris.ostrovsky@oracle.com> wrote:
>>> On 09/29/2014 11:25 AM, Jan Beulich wrote:
>>>>>>> On 25.09.14 at 21:28, <boris.ostrovsky@oracle.com> wrote:
>>>>> @@ -389,14 +390,26 @@ static int amd_vpmu_initialise(struct vcpu *v)
>>>>>    	 }
>>>>>        }
>>>>>    
>>>>> -    ctxt = xzalloc_bytes(sizeof(struct xen_pmu_amd_ctxt) +
>>>>> -                         2 * sizeof(uint64_t) * AMD_MAX_COUNTERS);
>>>>> -    if ( !ctxt )
>>>>> +    regs_size = 2 * sizeof(uint64_t) * AMD_MAX_COUNTERS;
>>>>> +    if ( is_hvm_domain(v->domain) )
>>>>>        {
>>>>> -        gdprintk(XENLOG_WARNING, "Insufficient memory for PMU, "
>>>>> -            " PMU feature is unavailable on domain %d vcpu %d.\n",
>>>>> -            v->vcpu_id, v->domain->domain_id);
>>>>> -        return -ENOMEM;
>>>>> +        ctxt = xzalloc_bytes(sizeof(struct xen_pmu_amd_ctxt) + regs_size);
>>>>> +        if ( !ctxt )
>>>>> +        {
>>>>> +            gdprintk(XENLOG_WARNING, "Insufficient memory for PMU, "
>>>>> +                "PMU feature is unavailable\n");
>>>>> +            return -ENOMEM;
>>>>> +        }
>>>>> +    }
>>>>> +    else
>>>>> +    {
>>>>> +        if ( sizeof(struct xen_pmu_data) + regs_size > PAGE_SIZE )
>>>> This is a compile time constant condition - no reason to issue a
>>>> message and return failure at runtime, just BUILD_BUG_ON() instead.
>>> It will not be if I replace AMD_MAX_COUNTERS with runtime register
>>> count, as you asked in an earlier comment.
>> For which case see the respective VMX side comment.
> 
> Not sure I understand what you mean here. it will be almost exactly the 
> same.

I said on the respective VMX code that issuing the message
repeatedly at normal runtime is pointless - it suffices to issue it
once during boot.

Jan

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v12 for-xen-4.5 16/20] x86/VPMU: Handle PMU interrupts for PV guests
  2014-09-25 19:28 ` [PATCH v12 for-xen-4.5 16/20] x86/VPMU: Handle PMU interrupts for " Boris Ostrovsky
  2014-09-26 22:09   ` Daniel De Graaf
@ 2014-09-30  8:11   ` Jan Beulich
  2014-09-30 15:07     ` Boris Ostrovsky
  1 sibling, 1 reply; 92+ messages in thread
From: Jan Beulich @ 2014-09-30  8:11 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: kevin.tian, keir, suravee.suthikulpanit, andrew.cooper3, tim,
	dietmar.hahn, xen-devel, Aravind.Gopalakrishnan, jun.nakajima,
	dgdegra

>>> On 25.09.14 at 21:28, <boris.ostrovsky@oracle.com> wrote:
> --- a/xen/arch/x86/hvm/vpmu.c
> +++ b/xen/arch/x86/hvm/vpmu.c
> @@ -80,44 +80,191 @@ static void __init parse_vpmu_param(char *s)
>  
>  void vpmu_lvtpc_update(uint32_t val)
>  {
> -    struct vpmu_struct *vpmu = vcpu_vpmu(current);
> +    struct vcpu *curr = current;
> +    struct vpmu_struct *vpmu = vcpu_vpmu(curr);
>  
>      vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR | (val & APIC_LVT_MASKED);
> -    apic_write(APIC_LVTPC, vpmu->hw_lapic_lvtpc);
> +
> +    /* Postpone APIC updates for PV(H) guests if PMU interrupt is pending */
> +    if ( is_hvm_domain(curr->domain) ||

Please don't open-code is_hvm_vcpu() (more instances elsewhere).

> +         !(vpmu->xenpmu_data && (vpmu->xenpmu_data->pmu_flags & PMU_CACHED)) )

I think readability would benefit if you resolved the !(&&) to !||!
(making it the proper inverse of what you do in vpmu_do_wrmsr()
and vpmu_do_rdmsr()).
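
I.e. something like (untested):

    if ( is_hvm_vcpu(curr) ||
         !vpmu->xenpmu_data ||
         !(vpmu->xenpmu_data->pmu_flags & PMU_CACHED) )
        apic_write(APIC_LVTPC, vpmu->hw_lapic_lvtpc);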

> +static struct vcpu *choose_hwdom_vcpu(void)
> +{
> +    struct vcpu *v;
> +    unsigned idx = smp_processor_id() % hardware_domain->max_vcpus;
> +
> +    if ( hardware_domain->vcpu == NULL )
> +        return NULL;
> +
> +    v = hardware_domain->vcpu[idx];
> +
> +    /*
> +     * If index is not populated search downwards the vcpu array until
> +     * a valid vcpu can be found
> +     */
> +    while ( !v && idx-- )
> +        v = hardware_domain->vcpu[idx];

Each time I get here I wonder what case this is good for.

>  int vpmu_do_interrupt(struct cpu_user_regs *regs)
>  {
> -    struct vcpu *v = current;
> -    struct vpmu_struct *vpmu = vcpu_vpmu(v);
> +    struct vcpu *sampled = current, *sampling;
> +    struct vpmu_struct *vpmu;
> +
> +    /* dom0 will handle interrupt for special domains (e.g. idle domain) */
> +    if ( sampled->domain->domain_id >= DOMID_FIRST_RESERVED )
> +    {
> +        sampling = choose_hwdom_vcpu();
> +        if ( !sampling )
> +            return 0;
> +    }
> +    else
> +        sampling = sampled;
> +
> +    vpmu = vcpu_vpmu(sampling);
> +    if ( !is_hvm_domain(sampling->domain) )
> +    {
> +        /* PV(H) guest */
> +        const struct cpu_user_regs *cur_regs;
> +
> +        if ( !vpmu->xenpmu_data )
> +            return 0;
> +
> +        if ( vpmu->xenpmu_data->pmu_flags & PMU_CACHED )
> +            return 1;
> +
> +        if ( is_pvh_domain(sampled->domain) &&

Here and below - is this really the right condition? I.e. is the
opposite case (doing nothing here, but the one further down
having an else) really meant to cover both HVM and PV? The outer
!is_hvm_() doesn't count here as that acts on sampling, not
sampled.

> +             !vpmu->arch_vpmu_ops->do_interrupt(regs) )
> +            return 0;
> +
> +        /* PV guest will be reading PMU MSRs from xenpmu_data */
> +        vpmu_set(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED);
> +        vpmu->arch_vpmu_ops->arch_vpmu_save(sampling);
> +        vpmu_reset(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED);
> +
> +        /* Store appropriate registers in xenpmu_data */
> +        if ( is_pv_32bit_domain(sampling->domain) )

I think this needs a 32-bit PVH fixme annotation?

> +        {
> +            /*
> +             * 32-bit dom0 cannot process Xen's addresses (which are 64 bit)
> +             * and therefore we treat it the same way as a non-privileged
> +             * PV 32-bit domain.
> +             */
> +            struct compat_pmu_regs *cmp;
> +
> +            cur_regs = guest_cpu_user_regs();
> +
> +            cmp = (void *)&vpmu->xenpmu_data->pmu.r.regs;
> +            cmp->eip = cur_regs->rip;
> +            cmp->esp = cur_regs->rsp;
> +            cmp->cs = cur_regs->cs;
> +            if ( (cmp->cs & 3) == 1 )
> +                cmp->cs &= ~3;
> +        }
> +        else
> +        {
> +            struct xen_pmu_regs *r = &vpmu->xenpmu_data->pmu.r.regs;
> +
> +            /* Non-privileged domains are always in XENPMU_MODE_SELF mode */
> +            if ( (vpmu_mode & XENPMU_MODE_SELF) ||
> +                 (!is_hardware_domain(sampled->domain) &&
> +                  !is_idle_vcpu(sampled)) )
> +                cur_regs = guest_cpu_user_regs();
> +            else
> +                cur_regs = regs;
> +
> +            r->rip = cur_regs->rip;
> +            r->rsp = cur_regs->rsp;
> +
> +            if ( !is_pvh_domain(sampled->domain) )

(This is the other instance.)

> +            {
> +                r->cs = cur_regs->cs;
> +                if ( sampled->arch.flags & TF_kernel_mode )
> +                    r->cs &= ~3;

And once again I wonder how the consumer of this data is to tell
apart guest kernel and hypervisor addresses.

> +            }
> +            else
> +            {
> +                struct segment_register seg_cs;
> +
> +                hvm_get_segment_register(sampled, x86_seg_cs, &seg_cs);
> +                r->cs = seg_cs.sel;
> +            }
> +        }
> +
> +        vpmu->xenpmu_data->domain_id = DOMID_SELF;
> +        vpmu->xenpmu_data->vcpu_id = sampled->vcpu_id;
> +        vpmu->xenpmu_data->pcpu_id = smp_processor_id();
> +
> +        vpmu->xenpmu_data->pmu_flags |= PMU_CACHED;
> +        vpmu->hw_lapic_lvtpc |= APIC_LVT_MASKED;
> +        apic_write(APIC_LVTPC, vpmu->hw_lapic_lvtpc);
> +
> +        send_guest_vcpu_virq(sampling, VIRQ_XENPMU);
> +
> +        return 1;
> +    }
>  
>      if ( vpmu->arch_vpmu_ops )

So getting here you assume sampling == sampled (because of
sampling being HVM) - is that worth an assertion (with comment) to
cover the case of HVM Dom0 becoming possible in the future?
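
E.g. (untested):

    /* HVM guests are (so far) never sampled on another domain's behalf. */
    ASSERT(sampling == sampled);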

> @@ -502,14 +652,6 @@ long do_xenpmu_op(int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg)
>          spin_unlock(&xenpmu_mode_lock);
>          break;
>  
> -    case XENPMU_lvtpc_set:
> -        if ( current->arch.vpmu.xenpmu_data == NULL )
> -            return -EINVAL;
> -        vpmu_lvtpc_update(current->arch.vpmu.xenpmu_data->pmu.l.lapic_lvtpc);
> -        ret = 0;
> -        break;
> -    }

Is there an actual point in moving this down:

> @@ -548,6 +690,21 @@ long do_xenpmu_op(int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg)
>          pvpmu_finish(current->domain, &pmu_params);
>          break;
>  
> +    case XENPMU_lvtpc_set:
> +        curr = current;
> +        if ( curr->arch.vpmu.xenpmu_data == NULL )
> +            return -EINVAL;
> +        vpmu_lvtpc_update(curr->arch.vpmu.xenpmu_data->pmu.l.lapic_lvtpc);
> +        break;
> +
> +    case XENPMU_flush:
> +        curr = current;
> +        curr->arch.vpmu.xenpmu_data->pmu_flags &= ~PMU_CACHED;
> +        vpmu_lvtpc_update(curr->arch.vpmu.xenpmu_data->pmu.l.lapic_lvtpc);
> +        vpmu_load(curr);
> +        break;
> +    }

And if there is - why can't this be put in its final place right when the
operation gets added?

Jan

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v12 for-xen-4.5 17/20] x86/VPMU: Merge vpmu_rdmsr and vpmu_wrmsr
  2014-09-25 19:28 ` [PATCH v12 for-xen-4.5 17/20] x86/VPMU: Merge vpmu_rdmsr and vpmu_wrmsr Boris Ostrovsky
@ 2014-09-30  8:13   ` Jan Beulich
  0 siblings, 0 replies; 92+ messages in thread
From: Jan Beulich @ 2014-09-30  8:13 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: kevin.tian, keir, suravee.suthikulpanit, andrew.cooper3, tim,
	dietmar.hahn, xen-devel, Aravind.Gopalakrishnan, jun.nakajima,
	dgdegra

>>> On 25.09.14 at 21:28, <boris.ostrovsky@oracle.com> wrote:
> +static inline int vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content,
> +                                uint64_t supported)
> +{
> +    uint64_t val = msr_content;
> +    return vpmu_do_msr(msr, &val, supported, 1);
> +}

What is the local variable good for? The function parameter already
is a variable distinct from the caller's (since this is an inline function,
not a macro).
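
I.e. (untested) the wrapper could simply be

    static inline int vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content,
                                    uint64_t supported)
    {
        /* msr_content is already a local copy, so pass its address directly */
        return vpmu_do_msr(msr, &msr_content, supported, 1);
    }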

Jan

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v12 for-xen-4.5 18/20] x86/VPMU: Add privileged PMU mode
  2014-09-25 19:28 ` [PATCH v12 for-xen-4.5 18/20] x86/VPMU: Add privileged PMU mode Boris Ostrovsky
@ 2014-09-30  8:18   ` Jan Beulich
  2014-09-30 15:16     ` Boris Ostrovsky
  0 siblings, 1 reply; 92+ messages in thread
From: Jan Beulich @ 2014-09-30  8:18 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: kevin.tian, keir, suravee.suthikulpanit, andrew.cooper3, tim,
	dietmar.hahn, xen-devel, Aravind.Gopalakrishnan, jun.nakajima,
	dgdegra

>>> On 25.09.14 at 21:28, <boris.ostrovsky@oracle.com> wrote:
> --- a/xen/arch/x86/traps.c
> +++ b/xen/arch/x86/traps.c
> @@ -2579,6 +2579,10 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
>          case MSR_AMD_FAM15H_EVNTSEL0...MSR_AMD_FAM15H_PERFCTR5:
>                  if ( vpmu_msr || (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) )
>                  {
> +                    if ( (vpmu_mode & XENPMU_MODE_ALL) &&
> +                         !is_hardware_domain(v->domain) )
> +                        break;
> +
>                      if ( vpmu_do_wrmsr(regs->ecx, msr_content, 0) )
>                          goto fail;
>                  }
> @@ -2701,6 +2705,14 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
>          case MSR_AMD_FAM15H_EVNTSEL0...MSR_AMD_FAM15H_PERFCTR5:
>                  if ( vpmu_msr || (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) )
>                  {
> +                    if ( (vpmu_mode & XENPMU_MODE_ALL) &&
> +                         !is_hardware_domain(v->domain) )
> +                    {
> +                        /* Don't leak PMU MSRs to unprivileged domains */
> +                        regs->eax = regs->edx = 0;
> +                        break;
> +                    }
> +
>                      if ( vpmu_do_rdmsr(regs->ecx, &msr_content) )
>                          goto fail;
>  

Is ignoring writes and returning zeroes for reads really reasonable in
this case? I.e. is the guest validly being told that there is a (v)PMU?
Because if it's not, it has no business accessing these MSRs and
hence should probably get a #GP instead.

Jan

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v12 for-xen-4.5 19/20] x86/VPMU: NMI-based VPMU support
  2014-09-25 19:28 ` [PATCH v12 for-xen-4.5 19/20] x86/VPMU: NMI-based VPMU support Boris Ostrovsky
@ 2014-09-30  8:37   ` Jan Beulich
  2014-10-01  0:18     ` Boris Ostrovsky
  0 siblings, 1 reply; 92+ messages in thread
From: Jan Beulich @ 2014-09-30  8:37 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: kevin.tian, keir, suravee.suthikulpanit, andrew.cooper3, tim,
	dietmar.hahn, xen-devel, Aravind.Gopalakrishnan, jun.nakajima,
	dgdegra

>>> On 25.09.14 at 21:28, <boris.ostrovsky@oracle.com> wrote:
>  static void __init parse_vpmu_param(char *s)
>  {
> -    switch ( parse_bool(s) )
> -    {
> -    case 0:
> -        break;
> -    default:
> -        if ( !strcmp(s, "bts") )
> -            vpmu_features |= XENPMU_FEATURE_INTEL_BTS;
> -        else if ( *s )
> +    char *ss;
> +
> +    vpmu_mode = XENPMU_MODE_SELF;
> +    if (*s == '\0')
> +        return;
> +
> +    do {
> +        ss = strchr(s, ',');
> +        if ( ss )
> +            *ss = '\0';
> +
> +        switch  ( parse_bool(s) )
>          {
> -            printk("VPMU: unknown flag: %s - vpmu disabled!\n", s);
> -            break;
> +        case 0:
> +            vpmu_mode = XENPMU_MODE_OFF;
> +            /* FALLTHROUGH */
> +        case 1:
> +            return;

If you do this much of redundant code folding, then I think you
should also go the final step and fold above five lines into the
code below:

> +        default:
> +            if ( !strcmp(s, "nmi") )
> +                vpmu_interrupt_type = APIC_DM_NMI;
> +            else if ( !strcmp(s, "bts") )
> +                vpmu_features |= XENPMU_FEATURE_INTEL_BTS;
> +            else
> +            {
> +                printk("VPMU: unknown flag: %s - vpmu disabled!\n", s);

case 0:

> +                vpmu_mode = XENPMU_MODE_OFF;

case 1:
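
Putting it all together, something like this (untested):

    switch ( parse_bool(s) )
    {
    default:
        if ( !strcmp(s, "nmi") )
            vpmu_interrupt_type = APIC_DM_NMI;
        else if ( !strcmp(s, "bts") )
            vpmu_features |= XENPMU_FEATURE_INTEL_BTS;
        else
        {
            printk("VPMU: unknown flag: %s - vpmu disabled!\n", s);
    case 0:
            vpmu_mode = XENPMU_MODE_OFF;
    case 1:
            return;
        }
        break;
    }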

> @@ -91,6 +113,24 @@ void vpmu_lvtpc_update(uint32_t val)
>          apic_write(APIC_LVTPC, vpmu->hw_lapic_lvtpc);
>  }
>  
> +static void vpmu_send_interrupt(struct vcpu *v)
> +{
> +    struct vlapic *vlapic;
> +    u32 vlapic_lvtpc;
> +
> +    ASSERT( is_hvm_vcpu(v) );
> +
> +    vlapic = vcpu_vlapic(v);
> +    if ( !is_vlapic_lvtpc_enabled(vlapic) )
> +        return;
> +
> +    vlapic_lvtpc = vlapic_get_reg(vlapic, APIC_LVTPC);
> +    if ( GET_APIC_DELIVERY_MODE(vlapic_lvtpc) == APIC_MODE_FIXED )
> +        vlapic_set_irq(vcpu_vlapic(v), vlapic_lvtpc & APIC_VECTOR_MASK, 0);
> +    else
> +        v->nmi_pending = 1;

Is APIC_MODE_NMI guaranteed to be the only alternative to
APIC_MODE_FIXED here (even for a buggy guest)? I don't recall
having seen code preventing other modes to be set, but even if
such code exists, an ASSERT() here seems quite desirable to me
(perhaps after re-structuring this to a switch() this could also be
a debug log message).
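
E.g. (untested):

    switch ( GET_APIC_DELIVERY_MODE(vlapic_lvtpc) )
    {
    case APIC_MODE_FIXED:
        vlapic_set_irq(vlapic, vlapic_lvtpc & APIC_VECTOR_MASK, 0);
        break;
    case APIC_MODE_NMI:
        v->nmi_pending = 1;
        break;
    default:
        /* Assumed here: any other delivery mode is a guest bug worth logging. */
        gdprintk(XENLOG_WARNING, "Unsupported LVTPC delivery mode\n");
        break;
    }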

> @@ -232,8 +273,9 @@ int vpmu_do_interrupt(struct cpu_user_regs *regs)
>                  if ( sampled->arch.flags & TF_kernel_mode )
>                      r->cs &= ~3;
>              }
> -            else
> +            else if ( !(vpmu_interrupt_type & APIC_DM_NMI) )

Even if right now only APIC_DM_FIXED and APIC_DM_NMI are
possible, this is a latent bug: APIC_DM_NMI is not by itself a mask
(also elsewhere).
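
I.e. an equality test would be safer (untested):

            else if ( vpmu_interrupt_type != APIC_DM_NMI )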

> --- a/xen/include/xen/softirq.h
> +++ b/xen/include/xen/softirq.h
> @@ -8,6 +8,7 @@ enum {
>      NEW_TLBFLUSH_CLOCK_PERIOD_SOFTIRQ,
>      RCU_SOFTIRQ,
>      TASKLET_SOFTIRQ,
> +    PMU_SOFTIRQ,
>      NR_COMMON_SOFTIRQS
>  };

Shouldn't this be an arch specific softirq?

Jan

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v12 for-xen-4.5 20/20] x86/VPMU: Move VPMU files up from hvm/ directory
  2014-09-25 19:28 ` [PATCH v12 for-xen-4.5 20/20] x86/VPMU: Move VPMU files up from hvm/ directory Boris Ostrovsky
@ 2014-09-30  8:40   ` Jan Beulich
  0 siblings, 0 replies; 92+ messages in thread
From: Jan Beulich @ 2014-09-30  8:40 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: kevin.tian, keir, suravee.suthikulpanit, andrew.cooper3, tim,
	dietmar.hahn, xen-devel, Aravind.Gopalakrishnan, jun.nakajima,
	dgdegra

>>> On 25.09.14 at 21:28, <boris.ostrovsky@oracle.com> wrote:
> Since PMU is now not HVM specific we can move VPMU-related files up from
> arch/x86/hvm/ directory.
> 
> Specifically:
>     arch/x86/hvm/vpmu.c -> arch/x86/vpmu.c
>     arch/x86/hvm/svm/vpmu.c -> arch/x86/vpmu_amd.c
>     arch/x86/hvm/vmx/vpmu_core2.c -> arch/x86/vpmu_intel.c
>     include/asm-x86/hvm/vpmu.h -> include/asm-x86/vpmu.h
> 
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> ---
>  xen/arch/x86/Makefile                               | 1 +
>  xen/arch/x86/hvm/Makefile                           | 1 -
>  xen/arch/x86/hvm/svm/Makefile                       | 1 -
>  xen/arch/x86/hvm/vlapic.c                           | 2 +-
>  xen/arch/x86/hvm/vmx/Makefile                       | 1 -
>  xen/arch/x86/oprofile/op_model_ppro.c               | 2 +-
>  xen/arch/x86/traps.c                                | 2 +-
>  xen/arch/x86/{hvm => }/vpmu.c                       | 2 +-
>  xen/arch/x86/{hvm/svm/vpmu.c => vpmu_amd.c}         | 2 +-
>  xen/arch/x86/{hvm/vmx/vpmu_core2.c => vpmu_intel.c} | 2 +-

I wonder whether those wouldn't better go into xen/arch/x86/cpu/
(along the lines of where Linux'es perf code sits).

Jan

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v12 for-xen-4.5 16/20] x86/VPMU: Handle PMU interrupts for PV guests
  2014-09-30  8:11   ` Jan Beulich
@ 2014-09-30 15:07     ` Boris Ostrovsky
  2014-09-30 15:44       ` Jan Beulich
  0 siblings, 1 reply; 92+ messages in thread
From: Boris Ostrovsky @ 2014-09-30 15:07 UTC (permalink / raw)
  To: Jan Beulich
  Cc: kevin.tian, keir, suravee.suthikulpanit, andrew.cooper3, tim,
	dietmar.hahn, xen-devel, Aravind.Gopalakrishnan, jun.nakajima,
	dgdegra


On 09/30/2014 04:11 AM, Jan Beulich wrote:
>>>> On 25.09.14 at 21:28, <boris.ostrovsky@oracle.com> wrote:
>> --- a/xen/arch/x86/hvm/vpmu.c
>> +++ b/xen/arch/x86/hvm/vpmu.c
>> @@ -80,44 +80,191 @@ static void __init parse_vpmu_param(char *s)
>>   
>>   void vpmu_lvtpc_update(uint32_t val)
>>   {
>> -    struct vpmu_struct *vpmu = vcpu_vpmu(current);
>> +    struct vcpu *curr = current;
>> +    struct vpmu_struct *vpmu = vcpu_vpmu(curr);
>>   
>>       vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR | (val & APIC_LVT_MASKED);
>> -    apic_write(APIC_LVTPC, vpmu->hw_lapic_lvtpc);
>> +
>> +    /* Postpone APIC updates for PV(H) guests if PMU interrupt is pending */
>> +    if ( is_hvm_domain(curr->domain) ||
> Please don't open-code is_hvm_vcpu() (more instances elsewhere).

Why is this open-coded?

>
>> +         !(vpmu->xenpmu_data && (vpmu->xenpmu_data->pmu_flags & PMU_CACHED)) )
> I think readability would benefit if you resolved the !(&&) to !||!
> (making it the proper inverse of what you do in vpmu_do_wrmsr()
> and vpmu_do_rdmsr()).
>
>> +static struct vcpu *choose_hwdom_vcpu(void)
>> +{
>> +    struct vcpu *v;
>> +    unsigned idx = smp_processor_id() % hardware_domain->max_vcpus;
>> +
>> +    if ( hardware_domain->vcpu == NULL )
>> +        return NULL;
>> +
>> +    v = hardware_domain->vcpu[idx];
>> +
>> +    /*
>> +     * If index is not populated search downwards the vcpu array until
>> +     * a valid vcpu can be found
>> +     */
>> +    while ( !v && idx-- )
>> +        v = hardware_domain->vcpu[idx];
> Each time I get here I wonder what case this is good for.

I thought we could have a case where the first hardware_domain->vcpu[idx] is
NULL, so we walk the array down until we find the first non-NULL vcpu.
Hot unplug, for example (we may be calling this from NMI context).

>
>>   int vpmu_do_interrupt(struct cpu_user_regs *regs)
>>   {
>> -    struct vcpu *v = current;
>> -    struct vpmu_struct *vpmu = vcpu_vpmu(v);
>> +    struct vcpu *sampled = current, *sampling;
>> +    struct vpmu_struct *vpmu;
>> +
>> +    /* dom0 will handle interrupt for special domains (e.g. idle domain) */
>> +    if ( sampled->domain->domain_id >= DOMID_FIRST_RESERVED )
>> +    {
>> +        sampling = choose_hwdom_vcpu();
>> +        if ( !sampling )
>> +            return 0;
>> +    }
>> +    else
>> +        sampling = sampled;
>> +
>> +    vpmu = vcpu_vpmu(sampling);
>> +    if ( !is_hvm_domain(sampling->domain) )
>> +    {
>> +        /* PV(H) guest */
>> +        const struct cpu_user_regs *cur_regs;
>> +
>> +        if ( !vpmu->xenpmu_data )
>> +            return 0;
>> +
>> +        if ( vpmu->xenpmu_data->pmu_flags & PMU_CACHED )
>> +            return 1;
>> +
>> +        if ( is_pvh_domain(sampled->domain) &&
> Here and below - is this really the right condition? I.e. is the
> opposite case (doing nothing here, but the one further down
> having an else) really meant to cover both HVM and PV? The outer
> !is_hvm_() doesn't count here as that acts on sampling, not
> sampled.

This is a test for an error in do_interrupt() --- if it reported a failure
then there is no reason to proceed further.

>
>> +             !vpmu->arch_vpmu_ops->do_interrupt(regs) )
>> +            return 0;
>> +
>>
>> +
>> +            /* Non-privileged domains are always in XENPMU_MODE_SELF mode */
>> +            if ( (vpmu_mode & XENPMU_MODE_SELF) ||
>> +                 (!is_hardware_domain(sampled->domain) &&
>> +                  !is_idle_vcpu(sampled)) )
>> +                cur_regs = guest_cpu_user_regs();
>> +            else
>> +                cur_regs = regs;
>> +
>> +            r->rip = cur_regs->rip;
>> +            r->rsp = cur_regs->rsp;
>> +
>> +            if ( !is_pvh_domain(sampled->domain) )
> (This is the other instance.)

This actually should be !has_hvm_container_domain(sampled->domain).

>
>> +            {
>> +                r->cs = cur_regs->cs;
>> +                if ( sampled->arch.flags & TF_kernel_mode )
>> +                    r->cs &= ~3;
> And once again I wonder how the consumer of this data is to tell
> apart guest kernel and hypervisor addresses.

Based on the RIP --- perf, for example, searches through various symbol 
tables.

I suppose I can set xenpmu_data->domain_id below to either DOMID_SELF 
for guest and DOMID_XEN for the hypervisor.
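
I.e. something like this sketch (where 'sample_in_xen' stands for whatever
condition ends up identifying a hypervisor-context sample):

    vpmu->xenpmu_data->domain_id = sample_in_xen ? DOMID_XEN : DOMID_SELF;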


>
>> +            }
>> +            else
>> +            {
>> +                struct segment_register seg_cs;
>> +
>> +                hvm_get_segment_register(sampled, x86_seg_cs, &seg_cs);
>> +                r->cs = seg_cs.sel;
>> +            }
>> +        }
>> +
>> +        vpmu->xenpmu_data->domain_id = DOMID_SELF;
>> +        vpmu->xenpmu_data->vcpu_id = sampled->vcpu_id;
>> +        vpmu->xenpmu_data->pcpu_id = smp_processor_id();
>> +
>> +        vpmu->xenpmu_data->pmu_flags |= PMU_CACHED;
>> +        vpmu->hw_lapic_lvtpc |= APIC_LVT_MASKED;
>> +        apic_write(APIC_LVTPC, vpmu->hw_lapic_lvtpc);
>> +
>> +        send_guest_vcpu_virq(sampling, VIRQ_XENPMU);
>> +
>> +        return 1;
>> +    }
>>   

-boris

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v12 for-xen-4.5 18/20] x86/VPMU: Add privileged PMU mode
  2014-09-30  8:18   ` Jan Beulich
@ 2014-09-30 15:16     ` Boris Ostrovsky
  0 siblings, 0 replies; 92+ messages in thread
From: Boris Ostrovsky @ 2014-09-30 15:16 UTC (permalink / raw)
  To: Jan Beulich
  Cc: kevin.tian, keir, suravee.suthikulpanit, andrew.cooper3, tim,
	dietmar.hahn, xen-devel, Aravind.Gopalakrishnan, jun.nakajima,
	dgdegra


On 09/30/2014 04:18 AM, Jan Beulich wrote:
>>>> On 25.09.14 at 21:28, <boris.ostrovsky@oracle.com> wrote:
>> --- a/xen/arch/x86/traps.c
>> +++ b/xen/arch/x86/traps.c
>> @@ -2579,6 +2579,10 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
>>           case MSR_AMD_FAM15H_EVNTSEL0...MSR_AMD_FAM15H_PERFCTR5:
>>                   if ( vpmu_msr || (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) )
>>                   {
>> +                    if ( (vpmu_mode & XENPMU_MODE_ALL) &&
>> +                         !is_hardware_domain(v->domain) )
>> +                        break;
>> +
>>                       if ( vpmu_do_wrmsr(regs->ecx, msr_content, 0) )
>>                           goto fail;
>>                   }
>> @@ -2701,6 +2705,14 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
>>           case MSR_AMD_FAM15H_EVNTSEL0...MSR_AMD_FAM15H_PERFCTR5:
>>                   if ( vpmu_msr || (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) )
>>                   {
>> +                    if ( (vpmu_mode & XENPMU_MODE_ALL) &&
>> +                         !is_hardware_domain(v->domain) )
>> +                    {
>> +                        /* Don't leak PMU MSRs to unprivileged domains */
>> +                        regs->eax = regs->edx = 0;
>> +                        break;
>> +                    }
>> +
>>                       if ( vpmu_do_rdmsr(regs->ecx, &msr_content) )
>>                           goto fail;
>>   
> Is ignoring writes and returning zeroes for reads really reasonable in
> this case? I.e. is the guest validly being told that there is a (v)PMU?
> Because if it's not, it has no business accessing these MSRs and
> hence should probably get a #GP instead.

VPMU mode can be changed to XENPMU_MODE_ALL at any time, so a guest that
started with a fully enabled PMU (e.g. when the mode was XENPMU_MODE_SELF) may
continue accessing the MSRs. I don't think it should suddenly start
getting #GPs.

-boris

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v12 for-xen-4.5 16/20] x86/VPMU: Handle PMU interrupts for PV guests
  2014-09-30 15:07     ` Boris Ostrovsky
@ 2014-09-30 15:44       ` Jan Beulich
  2014-09-30 16:37         ` Boris Ostrovsky
  0 siblings, 1 reply; 92+ messages in thread
From: Jan Beulich @ 2014-09-30 15:44 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: kevin.tian, keir, suravee.suthikulpanit, andrew.cooper3, tim,
	dietmar.hahn, xen-devel, Aravind.Gopalakrishnan, jun.nakajima,
	dgdegra

>>> On 30.09.14 at 17:07, <boris.ostrovsky@oracle.com> wrote:

> On 09/30/2014 04:11 AM, Jan Beulich wrote:
>>>>> On 25.09.14 at 21:28, <boris.ostrovsky@oracle.com> wrote:
>>> --- a/xen/arch/x86/hvm/vpmu.c
>>> +++ b/xen/arch/x86/hvm/vpmu.c
>>> @@ -80,44 +80,191 @@ static void __init parse_vpmu_param(char *s)
>>>   
>>>   void vpmu_lvtpc_update(uint32_t val)
>>>   {
>>> -    struct vpmu_struct *vpmu = vcpu_vpmu(current);
>>> +    struct vcpu *curr = current;
>>> +    struct vpmu_struct *vpmu = vcpu_vpmu(curr);
>>>   
>>>       vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR | (val & APIC_LVT_MASKED);
>>> -    apic_write(APIC_LVTPC, vpmu->hw_lapic_lvtpc);
>>> +
>>> +    /* Postpone APIC updates for PV(H) guests if PMU interrupt is pending */
>>> +    if ( is_hvm_domain(curr->domain) ||
>> Please don't open-code is_hvm_vcpu() (more instances elsewhere).
> 
> Why is this open-coded?

The above should really be is_hvm_vcpu(curr).

>>> +         !(vpmu->xenpmu_data && (vpmu->xenpmu_data->pmu_flags & PMU_CACHED)) )
>> I think readability would benefit if you resolved the !(&&) to !||!
>> (making it the proper inverse of what you do in vpmu_do_wrmsr()
>> and vpmu_do_rdmsr()).
>>
>>> +static struct vcpu *choose_hwdom_vcpu(void)
>>> +{
>>> +    struct vcpu *v;
>>> +    unsigned idx = smp_processor_id() % hardware_domain->max_vcpus;
>>> +
>>> +    if ( hardware_domain->vcpu == NULL )
>>> +        return NULL;
>>> +
>>> +    v = hardware_domain->vcpu[idx];
>>> +
>>> +    /*
>>> +     * If index is not populated search downwards the vcpu array until
>>> +     * a valid vcpu can be found
>>> +     */
>>> +    while ( !v && idx-- )
>>> +        v = hardware_domain->vcpu[idx];
>> Each time I get here I wonder what case this is good for.
> 
> I thought we could have a case where the first hardware_domain->vcpu[idx] is
> NULL, so we walk the array down until we find the first non-NULL vcpu.
> Hot unplug, for example (we may be calling this from NMI context).

Hot unplug of a vCPU is a guest thing - this doesn't destroy the
vCPU in the hypervisor.

>>>   int vpmu_do_interrupt(struct cpu_user_regs *regs)
>>>   {
>>> -    struct vcpu *v = current;
>>> -    struct vpmu_struct *vpmu = vcpu_vpmu(v);
>>> +    struct vcpu *sampled = current, *sampling;
>>> +    struct vpmu_struct *vpmu;
>>> +
>>> +    /* dom0 will handle interrupt for special domains (e.g. idle domain) */
>>> +    if ( sampled->domain->domain_id >= DOMID_FIRST_RESERVED )
>>> +    {
>>> +        sampling = choose_hwdom_vcpu();
>>> +        if ( !sampling )
>>> +            return 0;
>>> +    }
>>> +    else
>>> +        sampling = sampled;
>>> +
>>> +    vpmu = vcpu_vpmu(sampling);
>>> +    if ( !is_hvm_domain(sampling->domain) )
>>> +    {
>>> +        /* PV(H) guest */
>>> +        const struct cpu_user_regs *cur_regs;
>>> +
>>> +        if ( !vpmu->xenpmu_data )
>>> +            return 0;
>>> +
>>> +        if ( vpmu->xenpmu_data->pmu_flags & PMU_CACHED )
>>> +            return 1;
>>> +
>>> +        if ( is_pvh_domain(sampled->domain) &&
>> Here and below - is this really the right condition? I.e. is the
>> opposite case (doing nothing here, but the one further down
>> having an else) really meant to cover both HVM and PV? The outer
>> !is_hvm_() doesn't count here as that acts on sampling, not
>> sampled.
> 
>> This is a test for an error in do_interrupt() --- if it reported a failure
>> then there is no reason to proceed further.

That's not the question. Why is this done only for PVH?

>>> +            {
>>> +                r->cs = cur_regs->cs;
>>> +                if ( sampled->arch.flags & TF_kernel_mode )
>>> +                    r->cs &= ~3;
>> And once again I wonder how the consumer of this data is to tell
>> apart guest kernel and hypervisor addresses.
> 
> Based on the RIP --- perf, for example, searches through various symbol 
> tables.

That doesn't help when profiling HVM/PVH guests - addresses are
ambiguous in that case.

> I suppose I can set xenpmu_data->domain_id below to either DOMID_SELF 
> for guest and DOMID_XEN for the hypervisor.

That's an option, but I'm really having reservations against simulating
ring-0 execution in PV guests here. It would certainly be better if we
could report reality here, but I can see reservations on the consumer
(perf) side against us doing so.

Jan

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v12 for-xen-4.5 16/20] x86/VPMU: Handle PMU interrupts for PV guests
  2014-09-30 15:44       ` Jan Beulich
@ 2014-09-30 16:37         ` Boris Ostrovsky
  2014-10-01  6:49           ` Jan Beulich
  0 siblings, 1 reply; 92+ messages in thread
From: Boris Ostrovsky @ 2014-09-30 16:37 UTC (permalink / raw)
  To: Jan Beulich
  Cc: kevin.tian, keir, suravee.suthikulpanit, andrew.cooper3, tim,
	dietmar.hahn, xen-devel, Aravind.Gopalakrishnan, jun.nakajima,
	dgdegra


On 09/30/2014 11:44 AM, Jan Beulich wrote:
>
>>>> +static struct vcpu *choose_hwdom_vcpu(void)
>>>> +{
>>>> +    struct vcpu *v;
>>>> +    unsigned idx = smp_processor_id() % hardware_domain->max_vcpus;
>>>> +
>>>> +    if ( hardware_domain->vcpu == NULL )
>>>> +        return NULL;
>>>> +
>>>> +    v = hardware_domain->vcpu[idx];
>>>> +
>>>> +    /*
>>>> +     * If index is not populated search downwards the vcpu array until
>>>> +     * a valid vcpu can be found
>>>> +     */
>>>> +    while ( !v && idx-- )
>>>> +        v = hardware_domain->vcpu[idx];
>>> Each time I get here I wonder what case this is good for.
>> I thought we could have a case where the first hardware_domain->vcpu[idx] is
>> NULL, so we walk the array down until we find the first non-NULL vcpu.
>> Hot unplug, for example (we may be calling this from NMI context).
> Hot unplug of a vCPU is a guest thing - this doesn't destroy the
> vCPU in the hypervisor.

OK, I don't need this loop then.
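
The function then shrinks to (untested):

    static struct vcpu *choose_hwdom_vcpu(void)
    {
        unsigned int idx = smp_processor_id() % hardware_domain->max_vcpus;

        if ( hardware_domain->vcpu == NULL )
            return NULL;

        return hardware_domain->vcpu[idx];
    }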

>
>>>>    int vpmu_do_interrupt(struct cpu_user_regs *regs)
>>>>    {
>>>> -    struct vcpu *v = current;
>>>> -    struct vpmu_struct *vpmu = vcpu_vpmu(v);
>>>> +    struct vcpu *sampled = current, *sampling;
>>>> +    struct vpmu_struct *vpmu;
>>>> +
>>>> +    /* dom0 will handle interrupt for special domains (e.g. idle domain) */
>>>> +    if ( sampled->domain->domain_id >= DOMID_FIRST_RESERVED )
>>>> +    {
>>>> +        sampling = choose_hwdom_vcpu();
>>>> +        if ( !sampling )
>>>> +            return 0;
>>>> +    }
>>>> +    else
>>>> +        sampling = sampled;
>>>> +
>>>> +    vpmu = vcpu_vpmu(sampling);
>>>> +    if ( !is_hvm_domain(sampling->domain) )
>>>> +    {
>>>> +        /* PV(H) guest */
>>>> +        const struct cpu_user_regs *cur_regs;
>>>> +
>>>> +        if ( !vpmu->xenpmu_data )
>>>> +            return 0;
>>>> +
>>>> +        if ( vpmu->xenpmu_data->pmu_flags & PMU_CACHED )
>>>> +            return 1;
>>>> +
>>>> +        if ( is_pvh_domain(sampled->domain) &&
>>> Here and below - is this really the right condition? I.e. is the
>>> opposite case (doing nothing here, but the one further down
>>> having an else) really meant to cover both HVM and PV? The outer
>>> !is_hvm_() doesn't count here as that acts on sampling, not
>>> sampled.
>> This is a test for an error in do_interrupt() --- if it reported a failure
>> then there is no reason to proceed further.
> That's not the question. Why is this done only for PVH?

This should be sampling, i.e. the guest that is managing the HW PMU MSRs,
not sampled.

>
>>>> +            {
>>>> +                r->cs = cur_regs->cs;
>>>> +                if ( sampled->arch.flags & TF_kernel_mode )
>>>> +                    r->cs &= ~3;
>>> And once again I wonder how the consumer of this data is to tell
>>> apart guest kernel and hypervisor addresses.
>> Based on the RIP --- perf, for example, searches through various symbol
>> tables.
> That doesn't help when profiling HVM/PVH guests - addresses are
> ambiguous in that case.

Hypervisor traces are only sent to dom0, which is currently PV only. The 
key here, of course, is the word 'currently'.

>
>> I suppose I can set xenpmu_data->domain_id below to either DOMID_SELF
>> for guest and DOMID_XEN for the hypervisor.
> That's an option, but I'm really having reservations against simulating
> ring-0 execution in PV guests here. It would certainly be better if we
> could report reality here, but I can see reservations on the consumer
> (perf) side against us doing so.

Yes, perf will probably not like it --- as I mentioned in an earlier 
message, it calls user_mode(regs) which is essentially !!(regs->cs & 3).

-boris

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [PATCH v12 for-xen-4.5 12/20] x86/VPMU: Initialize PMU for PV(H) guests
  2014-09-25 19:28 ` [PATCH v12 for-xen-4.5 12/20] x86/VPMU: Initialize PMU for PV(H) guests Boris Ostrovsky
  2014-09-26 22:16   ` Daniel De Graaf
  2014-09-29 15:25   ` Jan Beulich
@ 2014-10-01  0:16   ` Tian, Kevin
  2 siblings, 0 replies; 92+ messages in thread
From: Tian, Kevin @ 2014-10-01  0:16 UTC (permalink / raw)
  To: Boris Ostrovsky, jbeulich, suravee.suthikulpanit,
	Aravind.Gopalakrishnan, dietmar.hahn, dgdegra
  Cc: andrew.cooper3, xen-devel, keir, Nakajima, Jun, tim

> From: Boris Ostrovsky [mailto:boris.ostrovsky@oracle.com]
> Sent: Thursday, September 25, 2014 12:29 PM
> 
> Code for initializing/tearing down PMU for PV guests
> 
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Acked-by: Kevin Tian <kevin.tian@intel.com>

> ---
>  tools/flask/policy/policy/modules/xen/xen.te |  4 ++
>  xen/arch/x86/hvm/hvm.c                       |  3 +-
>  xen/arch/x86/hvm/svm/svm.c                   |  4 +-
>  xen/arch/x86/hvm/svm/vpmu.c                  | 43 +++++++++-----
>  xen/arch/x86/hvm/vmx/vmx.c                   |  4 +-
>  xen/arch/x86/hvm/vmx/vpmu_core2.c            | 83 ++++++++++++++++++++--------
>  xen/arch/x86/hvm/vpmu.c                      | 81 +++++++++++++++++++++++++++
>  xen/common/event_channel.c                   |  1 +
>  xen/include/asm-x86/hvm/vpmu.h               |  1 +
>  xen/include/public/pmu.h                     |  2 +
>  xen/include/public/xen.h                     |  1 +
>  xen/include/xsm/dummy.h                      |  3 +
>  xen/xsm/flask/hooks.c                        |  4 ++
>  xen/xsm/flask/policy/access_vectors          |  2 +
>  14 files changed, 195 insertions(+), 41 deletions(-)
> 
> diff --git a/tools/flask/policy/policy/modules/xen/xen.te b/tools/flask/policy/policy/modules/xen/xen.te
> index fb761cd..6744c36 100644
> --- a/tools/flask/policy/policy/modules/xen/xen.te
> +++ b/tools/flask/policy/policy/modules/xen/xen.te
> @@ -116,6 +116,10 @@ domain_comms(dom0_t, dom0_t)
>  # Allow all domains to use (unprivileged parts of) the tmem hypercall
>  allow domain_type xen_t:xen tmem_op;
> 
> +# Allow all domains to use PMU (but not to change its settings --- that's what
> +# pmu_ctrl is for)
> +allow domain_type xen_t:xen2 pmu_use;
> +
> 
> ###############################################################################
>  #
>  # Domain creation
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index bb45593..ec4a021 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -4832,7 +4832,8 @@ static hvm_hypercall_t *const pvh_hypercall64_table[NR_hypercalls] = {
>      [ __HYPERVISOR_physdev_op ]      = (hvm_hypercall_t *)hvm_physdev_op,
>      HYPERCALL(hvm_op),
>      HYPERCALL(sysctl),
> -    HYPERCALL(domctl)
> +    HYPERCALL(domctl),
> +    HYPERCALL(xenpmu_op)
>  };
> 
>  int hvm_do_hypercall(struct cpu_user_regs *regs)
> diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
> index 5d404ce..319e5da 100644
> --- a/xen/arch/x86/hvm/svm/svm.c
> +++ b/xen/arch/x86/hvm/svm/svm.c
> @@ -1157,7 +1157,9 @@ static int svm_vcpu_initialise(struct vcpu *v)
>          return rc;
>      }
> 
> -    vpmu_initialise(v);
> +    /* PVH's VPMU is initialized via hypercall */
> +    if ( is_hvm_domain(v->domain) )
> +        vpmu_initialise(v);
> 
>      svm_guest_osvw_init(v);
> 
> diff --git a/xen/arch/x86/hvm/svm/vpmu.c b/xen/arch/x86/hvm/svm/vpmu.c
> index 37d8228..be3ab27 100644
> --- a/xen/arch/x86/hvm/svm/vpmu.c
> +++ b/xen/arch/x86/hvm/svm/vpmu.c
> @@ -362,6 +362,7 @@ static int amd_vpmu_initialise(struct vcpu *v)
>      struct xen_pmu_amd_ctxt *ctxt;
>      struct vpmu_struct *vpmu = vcpu_vpmu(v);
>      uint8_t family = current_cpu_data.x86;
> +    unsigned int regs_size;
> 
>      if ( vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
>          return 0;
> @@ -389,14 +390,26 @@ static int amd_vpmu_initialise(struct vcpu *v)
>  	 }
>      }
> 
> -    ctxt = xzalloc_bytes(sizeof(struct xen_pmu_amd_ctxt) +
> -                         2 * sizeof(uint64_t) * AMD_MAX_COUNTERS);
> -    if ( !ctxt )
> +    regs_size = 2 * sizeof(uint64_t) * AMD_MAX_COUNTERS;
> +    if ( is_hvm_domain(v->domain) )
>      {
> -        gdprintk(XENLOG_WARNING, "Insufficient memory for PMU, "
> -            " PMU feature is unavailable on domain %d vcpu %d.\n",
> -            v->vcpu_id, v->domain->domain_id);
> -        return -ENOMEM;
> +        ctxt = xzalloc_bytes(sizeof(struct xen_pmu_amd_ctxt) + regs_size);
> +        if ( !ctxt )
> +        {
> +            gdprintk(XENLOG_WARNING, "Insufficient memory for PMU, "
> +                "PMU feature is unavailable\n");
> +            return -ENOMEM;
> +        }
> +    }
> +    else
> +    {
> +        if ( sizeof(struct xen_pmu_data) + regs_size > PAGE_SIZE )
> +        {
> +            gdprintk(XENLOG_WARNING,
> +                    "Register bank does not fit into VPMU shared page\n");
> +            return -ENOSPC;
> +        }
> +        ctxt = &v->arch.vpmu.xenpmu_data->pmu.c.amd;
>      }
> 
>      ctxt->counters = sizeof(struct xen_pmu_amd_ctxt);
> @@ -415,17 +428,19 @@ static void amd_vpmu_destroy(struct vcpu *v)
>      if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
>          return;
> 
> -    if ( has_hvm_container_domain(v->domain) && is_msr_bitmap_on(vpmu) )
> -        amd_vpmu_unset_msr_bitmap(v);
> +    if ( has_hvm_container_domain(v->domain) )
> +    {
> +        if ( is_msr_bitmap_on(vpmu) )
> +            amd_vpmu_unset_msr_bitmap(v);
> 
> -    xfree(vpmu->context);
> -    vpmu_reset(vpmu, VPMU_CONTEXT_ALLOCATED);
> +        if ( is_hvm_domain(v->domain) )
> +            xfree(vpmu->context);
> 
> -    if ( vpmu_is_set(vpmu, VPMU_RUNNING) )
> -    {
> -        vpmu_reset(vpmu, VPMU_RUNNING);
>          release_pmu_ownship(PMU_OWNER_HVM);
>      }
> +
> +    vpmu->context = NULL;
> +    vpmu_clear(vpmu);
>  }
> 
>  /* VPMU part of the 'q' keyhandler */
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index 84119ed..bebe879 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -116,7 +116,9 @@ static int vmx_vcpu_initialise(struct vcpu *v)
>          return rc;
>      }
> 
> -    vpmu_initialise(v);
> +    /* PVH's VPMU is initialized via hypercall */
> +    if ( is_hvm_domain(v->domain) )
> +        vpmu_initialise(v);
> 
>      vmx_install_vlapic_mapping(v);
> 
> diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c
> index c0a45cd..5c0f99a 100644
> --- a/xen/arch/x86/hvm/vmx/vpmu_core2.c
> +++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c
> @@ -356,25 +356,45 @@ static int core2_vpmu_alloc_resource(struct vcpu *v)
>      struct vpmu_struct *vpmu = vcpu_vpmu(v);
>      struct xen_pmu_intel_ctxt *core2_vpmu_cxt = NULL;
>      uint64_t *p = NULL;
> +    unsigned int regs_size;
> 
> -    if ( !acquire_pmu_ownership(PMU_OWNER_HVM) )
> -        return 0;
> -
> -    wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0);
> -    if ( vmx_add_host_load_msr(MSR_CORE_PERF_GLOBAL_CTRL) )
> +    p = xzalloc_bytes(sizeof(uint64_t));
> +    if ( !p )
>          goto out_err;
> 
> -    if ( vmx_add_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL) )
> -        goto out_err;
> -    vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, 0);
> -
> -    core2_vpmu_cxt = xzalloc_bytes(sizeof(struct xen_pmu_intel_ctxt) +
> -                                   sizeof(uint64_t) * fixed_pmc_cnt +
> -                                   sizeof(struct xen_pmu_cntr_pair) *
> -                                   arch_pmc_cnt);
> -    p = xzalloc(uint64_t);
> -    if ( !core2_vpmu_cxt || !p )
> -        goto out_err;
> +    if ( has_hvm_container_domain(v->domain) )
> +    {
> +        if ( is_hvm_domain(v->domain) && !acquire_pmu_ownership(PMU_OWNER_HVM) )
> +            goto out_err;
> +
> +        wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0);
> +        if ( vmx_add_host_load_msr(MSR_CORE_PERF_GLOBAL_CTRL) )
> +            goto out_err_hvm;
> +        if ( vmx_add_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL) )
> +            goto out_err_hvm;
> +        vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, 0);
> +    }
> +
> +    regs_size = sizeof(uint64_t) * fixed_pmc_cnt +
> +                sizeof(struct xen_pmu_cntr_pair) * arch_pmc_cnt;
> +    if ( is_hvm_domain(v->domain) )
> +    {
> +        core2_vpmu_cxt = xzalloc_bytes(sizeof(struct xen_pmu_intel_ctxt) +
> +                                       regs_size);
> +        if ( !core2_vpmu_cxt )
> +            goto out_err_hvm;
> +    }
> +    else
> +    {
> +        if ( sizeof(struct xen_pmu_data) + regs_size > PAGE_SIZE )
> +        {
> +            printk(XENLOG_WARNING
> +                   "Register bank does not fit into VPMU share page\n");
> +            goto out_err_hvm;
> +        }
> +
> +        core2_vpmu_cxt = &v->arch.vpmu.xenpmu_data->pmu.c.intel;
> +    }
> 
>      core2_vpmu_cxt->fixed_counters = sizeof(struct xen_pmu_intel_ctxt);
>      core2_vpmu_cxt->arch_counters = core2_vpmu_cxt->fixed_counters +
> @@ -387,10 +407,12 @@ static int core2_vpmu_alloc_resource(struct vcpu *v)
> 
>      return 1;
> 
> -out_err:
> -    release_pmu_ownship(PMU_OWNER_HVM);
> -
> + out_err_hvm:
>      xfree(core2_vpmu_cxt);
> +    if ( is_hvm_domain(v->domain) )
> +        release_pmu_ownship(PMU_OWNER_HVM);
> +
> + out_err:
>      xfree(p);
> 
>      printk("Failed to allocate VPMU resources for domain %u vcpu %u\n",
> @@ -756,6 +778,11 @@ static int core2_vpmu_initialise(struct vcpu *v)
>      arch_pmc_cnt = core2_get_arch_pmc_count();
>      fixed_pmc_cnt = core2_get_fixed_pmc_count();
>      check_pmc_quirk();
> +
> +    /* PV domains can allocate resources immediately */
> +    if ( is_pv_domain(v->domain) && !core2_vpmu_alloc_resource(v) )
> +        return -EIO;
> +
>      return 0;
>  }
> 
> @@ -766,12 +793,20 @@ static void core2_vpmu_destroy(struct vcpu *v)
>      if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
>          return;
> 
> -    xfree(vpmu->context);
> +    if ( has_hvm_container_domain(v->domain) )
> +    {
> +        if ( cpu_has_vmx_msr_bitmap )
> +            core2_vpmu_unset_msr_bitmap(v->arch.hvm_vmx.msr_bitmap);
> +
> +        if ( is_hvm_domain(v->domain) )
> +            xfree(vpmu->context);
> +
> +        release_pmu_ownship(PMU_OWNER_HVM);
> +    }
> +
>      xfree(vpmu->priv_context);
> -    if ( has_hvm_container_domain(v->domain) && cpu_has_vmx_msr_bitmap )
> -        core2_vpmu_unset_msr_bitmap(v->arch.hvm_vmx.msr_bitmap);
> -    release_pmu_ownship(PMU_OWNER_HVM);
> -    vpmu_reset(vpmu, VPMU_CONTEXT_ALLOCATED);
> +    vpmu->context = NULL;
> +    vpmu_clear(vpmu);
>  }
> 
>  struct arch_vpmu_ops core2_vpmu_ops = {
> diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c
> index 5fcee0e..dde3367 100644
> --- a/xen/arch/x86/hvm/vpmu.c
> +++ b/xen/arch/x86/hvm/vpmu.c
> @@ -26,6 +26,7 @@
>  #include <asm/regs.h>
>  #include <asm/types.h>
>  #include <asm/msr.h>
> +#include <asm/p2m.h>
>  #include <asm/hvm/support.h>
>  #include <asm/hvm/vmx/vmx.h>
>  #include <asm/hvm/vmx/vmcs.h>
> @@ -256,6 +257,7 @@ void vpmu_initialise(struct vcpu *v)
>          vpmu_destroy(v);
>      vpmu_clear(vpmu);
>      vpmu->context = NULL;
> +    vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR | APIC_LVT_MASKED;
> 
>      switch ( vendor )
>      {
> @@ -282,7 +284,74 @@ void vpmu_destroy(struct vcpu *v)
>      struct vpmu_struct *vpmu = vcpu_vpmu(v);
> 
>      if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->arch_vpmu_destroy )
> +    {
> +        /* Unload VPMU first. This will stop counters */
> +        on_selected_cpus(cpumask_of(vcpu_vpmu(v)->last_pcpu),
> +                         vpmu_save_force, v, 1);
> +
>          vpmu->arch_vpmu_ops->arch_vpmu_destroy(v);
> +    }
> +}
> +
> +static int pvpmu_init(struct domain *d, xen_pmu_params_t *params)
> +{
> +    struct vcpu *v;
> +    struct page_info *page;
> +    uint64_t gfn = params->val;
> +
> +    if ( (params->vcpu >= d->max_vcpus) || (d->vcpu == NULL) ||
> +         (d->vcpu[params->vcpu] == NULL) )
> +        return -EINVAL;
> +
> +    page = get_page_from_gfn(d, gfn, NULL, P2M_ALLOC);
> +    if ( !page )
> +        return -EINVAL;
> +
> +    if ( !get_page_type(page, PGT_writable_page) )
> +    {
> +        put_page(page);
> +        return -EINVAL;
> +    }
> +
> +    v = d->vcpu[params->vcpu];
> +    v->arch.vpmu.xenpmu_data = __map_domain_page_global(page);
> +    if ( !v->arch.vpmu.xenpmu_data )
> +    {
> +        put_page_and_type(page);
> +        return -EINVAL;
> +    }
> +
> +    vpmu_initialise(v);
> +
> +    return 0;
> +}
> +
> +static void pvpmu_finish(struct domain *d, xen_pmu_params_t *params)
> +{
> +    struct vcpu *v;
> +    uint64_t mfn;
> +
> +    if ( (params->vcpu >= d->max_vcpus) || (d->vcpu == NULL) ||
> +         (d->vcpu[params->vcpu] == NULL) )
> +        return;
> +
> +    v = d->vcpu[params->vcpu];
> +    if ( v != current )
> +        vcpu_pause(v);
> +
> +    if ( v->arch.vpmu.xenpmu_data )
> +    {
> +        mfn = domain_page_map_to_mfn(v->arch.vpmu.xenpmu_data);
> +        if ( mfn_valid(mfn) )
> +        {
> +            unmap_domain_page_global(v->arch.vpmu.xenpmu_data);
> +            put_page_and_type(mfn_to_page(mfn));
> +        }
> +    }
> +    vpmu_destroy(v);
> +
> +    if ( v != current )
> +        vcpu_unpause(v);
>  }
> 
>  /* Dump some vpmu informations on console. Used in keyhandler dump_domains(). */
> @@ -460,6 +529,18 @@ long do_xenpmu_op(int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg)
>              return -EFAULT;
>          break;
> 
> +    case XENPMU_init:
> +        if ( copy_from_guest(&pmu_params, arg, 1) )
> +            return -EFAULT;
> +        ret = pvpmu_init(current->domain, &pmu_params);
> +        break;
> +
> +    case XENPMU_finish:
> +        if ( copy_from_guest(&pmu_params, arg, 1) )
> +            return -EFAULT;
> +        pvpmu_finish(current->domain, &pmu_params);
> +        break;
> +
>      default:
>          ret = -EINVAL;
>      }
> diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c
> index 7d6de54..a991b2d 100644
> --- a/xen/common/event_channel.c
> +++ b/xen/common/event_channel.c
> @@ -108,6 +108,7 @@ static int virq_is_global(uint32_t virq)
>      case VIRQ_TIMER:
>      case VIRQ_DEBUG:
>      case VIRQ_XENOPROF:
> +    case VIRQ_XENPMU:
>          rc = 0;
>          break;
>      case VIRQ_ARCH_0 ... VIRQ_ARCH_7:
> diff --git a/xen/include/asm-x86/hvm/vpmu.h b/xen/include/asm-x86/hvm/vpmu.h
> index c612e1a..93f1fc2 100644
> --- a/xen/include/asm-x86/hvm/vpmu.h
> +++ b/xen/include/asm-x86/hvm/vpmu.h
> @@ -62,6 +62,7 @@ struct vpmu_struct {
>      void *context;      /* May be shared with PV guest */
>      void *priv_context; /* hypervisor-only */
>      struct arch_vpmu_ops *arch_vpmu_ops;
> +    xen_pmu_data_t *xenpmu_data;
>  };
> 
>  /* VPMU states */
> diff --git a/xen/include/public/pmu.h b/xen/include/public/pmu.h
> index c2293be..b8c5682 100644
> --- a/xen/include/public/pmu.h
> +++ b/xen/include/public/pmu.h
> @@ -25,6 +25,8 @@
>  #define XENPMU_mode_set        1
>  #define XENPMU_feature_get     2
>  #define XENPMU_feature_set     3
> +#define XENPMU_init            4
> +#define XENPMU_finish          5
>  /* ` } */
> 
>  /* Parameters structure for HYPERVISOR_xenpmu_op call */
> diff --git a/xen/include/public/xen.h b/xen/include/public/xen.h
> index 0766790..e4d0b79 100644
> --- a/xen/include/public/xen.h
> +++ b/xen/include/public/xen.h
> @@ -161,6 +161,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_ulong_t);
>  #define VIRQ_MEM_EVENT  10 /* G. (DOM0) A memory event has occurred */
>  #define VIRQ_XC_RESERVED 11 /* G. Reserved for XenClient */
>  #define VIRQ_ENOMEM     12 /* G. (DOM0) Low on heap memory */
> +#define VIRQ_XENPMU     13 /* V.  PMC interrupt */
> 
>  /* Architecture-specific VIRQ definitions. */
>  #define VIRQ_ARCH_0    16
> diff --git a/xen/include/xsm/dummy.h b/xen/include/xsm/dummy.h
> index d423c1c..29dae2e 100644
> --- a/xen/include/xsm/dummy.h
> +++ b/xen/include/xsm/dummy.h
> @@ -663,6 +663,9 @@ static XSM_INLINE int xsm_pmu_op (XSM_DEFAULT_ARG struct domain *d, int op)
>      case XENPMU_feature_set:
>      case XENPMU_feature_get:
>          return xsm_default_action(XSM_PRIV, d, current->domain);
> +    case XENPMU_init:
> +    case XENPMU_finish:
> +        return xsm_default_action(XSM_HOOK, d, current->domain);
>      default:
>          return -EPERM;
>      }
> diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
> index b437a24..8bd4a3d 100644
> --- a/xen/xsm/flask/hooks.c
> +++ b/xen/xsm/flask/hooks.c
> @@ -1498,6 +1498,10 @@ static int flask_pmu_op (struct domain *d, int op)
>      case XENPMU_feature_get:
>          return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_XEN2,
>                              XEN2__PMU_CTRL, NULL);
> +    case XENPMU_init:
> +    case XENPMU_finish:
> +        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_XEN2,
> +                            XEN2__PMU_USE, NULL);
>      default:
>          return -EPERM;
>      }
> diff --git a/xen/xsm/flask/policy/access_vectors b/xen/xsm/flask/policy/access_vectors
> index 64c7378..36b69c6 100644
> --- a/xen/xsm/flask/policy/access_vectors
> +++ b/xen/xsm/flask/policy/access_vectors
> @@ -83,6 +83,8 @@ class xen2
>      get_symbol
>  # PMU control
>      pmu_ctrl
> +# PMU use (any domain has access)
> +    pmu_use
>  }
> 
>  # Classes domain and domain2 consist of operations that a domain performs on
> --
> 1.8.1.4
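
The guest side of this registration protocol, for context, would look roughly
like the following. This is a sketch only, modeled on pvpmu_init() above: the
HYPERVISOR_xenpmu_op() wrapper and the page helpers are assumed to be the
Linux-style ones, and error handling is trimmed.

    /* Sketch: register a per-VCPU PMU page with Xen.  pvpmu_init()
     * interprets params->val as the GFN of a writable guest page. */
    static int pvpmu_register(unsigned int cpu)
    {
        struct xen_pmu_params xp;
        struct xen_pmu_data *xenpmu_data;

        xenpmu_data = (struct xen_pmu_data *)get_zeroed_page(GFP_KERNEL);
        if ( xenpmu_data == NULL )
            return -ENOMEM;

        xp.val = pfn_to_mfn(PFN_DOWN(__pa(xenpmu_data))); /* frame, not address */
        xp.vcpu = cpu;

        return HYPERVISOR_xenpmu_op(XENPMU_init, &xp);
    }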


* Re: [PATCH v12 for-xen-4.5 15/20] x86/VPMU: Add support for PMU register handling on PV guests
  2014-09-25 19:28 ` [PATCH v12 for-xen-4.5 15/20] x86/VPMU: Add support for PMU register handling on PV guests Boris Ostrovsky
  2014-09-26 16:34   ` Konrad Rzeszutek Wilk
  2014-09-29 16:04   ` Jan Beulich
@ 2014-10-01  0:17   ` Tian, Kevin
  2 siblings, 0 replies; 92+ messages in thread
From: Tian, Kevin @ 2014-10-01  0:17 UTC (permalink / raw)
  To: Boris Ostrovsky, jbeulich, suravee.suthikulpanit,
	Aravind.Gopalakrishnan, dietmar.hahn, dgdegra
  Cc: andrew.cooper3, xen-devel, keir, Nakajima, Jun, tim

> From: Boris Ostrovsky [mailto:boris.ostrovsky@oracle.com]
> Sent: Thursday, September 25, 2014 12:29 PM
> 
> Intercept accesses to PMU MSRs and process them in VPMU module.
> 
> Dump VPMU state for all domains (HVM and PV) when requested.
> 
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

Acked-by: Kevin Tian <kevin.tian@intel.com>

> ---
>  xen/arch/x86/domain.c             |  3 +--
>  xen/arch/x86/hvm/vmx/vpmu_core2.c | 49 ++++++++++++++++++++++++++++++++------
>  xen/arch/x86/hvm/vpmu.c           |  7 ++++++
>  xen/arch/x86/traps.c              | 50 +++++++++++++++++++++++++++++++++++++--
>  xen/include/public/pmu.h          |  1 +
>  5 files changed, 99 insertions(+), 11 deletions(-)
> 
> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
> index 57b3c80..0388913 100644
> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -2030,8 +2030,7 @@ void arch_dump_vcpu_info(struct vcpu *v)
>  {
>      paging_dump_vcpu_info(v);
> 
> -    if ( is_hvm_vcpu(v) )
> -        vpmu_dump(v);
> +    vpmu_dump(v);
>  }
> 
>  void domain_cpuid(
> diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c
> index 1f21297..0f605bd 100644
> --- a/xen/arch/x86/hvm/vmx/vpmu_core2.c
> +++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c
> @@ -27,6 +27,7 @@
>  #include <asm/regs.h>
>  #include <asm/types.h>
>  #include <asm/apic.h>
> +#include <asm/traps.h>
>  #include <asm/msr.h>
>  #include <asm/msr-index.h>
>  #include <asm/hvm/support.h>
> @@ -294,12 +295,18 @@ static inline void __core2_vpmu_save(struct vcpu *v)
>          rdmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, fixed_counters[i]);
>      for ( i = 0; i < arch_pmc_cnt; i++ )
>          rdmsrl(MSR_IA32_PERFCTR0 + i, xen_pmu_cntr_pair[i].counter);
> +
> +    if ( !has_hvm_container_domain(v->domain) )
> +        rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, core2_vpmu_cxt->global_status);
>  }
> 
>  static int core2_vpmu_save(struct vcpu *v)
>  {
>      struct vpmu_struct *vpmu = vcpu_vpmu(v);
> 
> +    if ( !has_hvm_container_domain(v->domain) )
> +        wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0);
> +
>      if ( !vpmu_are_all_set(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED) )
>          return 0;
> 
> @@ -337,6 +344,13 @@ static inline void __core2_vpmu_load(struct vcpu *v)
>      wrmsrl(MSR_CORE_PERF_FIXED_CTR_CTRL, core2_vpmu_cxt->fixed_ctrl);
>      wrmsrl(MSR_IA32_DS_AREA, core2_vpmu_cxt->ds_area);
>      wrmsrl(MSR_IA32_PEBS_ENABLE, core2_vpmu_cxt->pebs_enable);
> +
> +    if ( !has_hvm_container_domain(v->domain) )
> +    {
> +        wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, core2_vpmu_cxt->global_ovf_ctrl);
> +        core2_vpmu_cxt->global_ovf_ctrl = 0;
> +        wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, core2_vpmu_cxt->global_ctrl);
> +    }
>  }
> 
>  static void core2_vpmu_load(struct vcpu *v)
> @@ -447,7 +461,6 @@ static int core2_vpmu_msr_common_check(u32 msr_index, int *type, int *index)
>  static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content,
>                                 uint64_t supported)
>  {
> -    u64 global_ctrl;
>      int i, tmp;
>      int type = -1, index = -1;
>      struct vcpu *v = current;
> @@ -491,7 +504,12 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content,
>      switch ( msr )
>      {
>      case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
> +        if ( msr_content & ~(0xC000000000000000 |
> +                             (((1ULL << fixed_pmc_cnt) - 1) << 32) |
> +                             ((1ULL << arch_pmc_cnt) - 1)) )
> +            return 1;
>          core2_vpmu_cxt->global_status &= ~msr_content;
> +        wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, msr_content);
>          return 0;
>      case MSR_CORE_PERF_GLOBAL_STATUS:
>          gdprintk(XENLOG_INFO, "Can not write readonly MSR: "
> @@ -519,14 +537,18 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content,
>          gdprintk(XENLOG_WARNING, "Guest setting of DTS is ignored.\n");
>          return 0;
>      case MSR_CORE_PERF_GLOBAL_CTRL:
> -        global_ctrl = msr_content;
> +        core2_vpmu_cxt->global_ctrl = msr_content;
>          break;
>      case MSR_CORE_PERF_FIXED_CTR_CTRL:
>          if ( msr_content &
>               ( ~((1ull << (fixed_pmc_cnt * FIXED_CTR_CTRL_BITS)) - 1)) )
>              return 1;
> 
> -        vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, &global_ctrl);
> +        if ( has_hvm_container_domain(v->domain) )
> +            vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL,
> +                               &core2_vpmu_cxt->global_ctrl);
> +        else
> +            rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, core2_vpmu_cxt->global_ctrl);
>          *enabled_cntrs &= ~(((1ULL << fixed_pmc_cnt) - 1) << 32);
>          if ( msr_content != 0 )
>          {
> @@ -551,7 +573,11 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content,
>              if ( msr_content & (~((1ull << 32) - 1)) )
>                  return 1;
> 
> -            vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, &global_ctrl);
> +            if ( has_hvm_container_domain(v->domain) )
> +                vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL,
> +                                   &core2_vpmu_cxt->global_ctrl);
> +            else
> +                rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, core2_vpmu_cxt->global_ctrl);
> 
>              if ( msr_content & (1ULL << 22) )
>                  *enabled_cntrs |= 1ULL << tmp;
> @@ -565,9 +591,15 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content,
>      if ( type != MSR_TYPE_GLOBAL )
>          wrmsrl(msr, msr_content);
>      else
> -        vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, msr_content);
> +    {
> +        if ( has_hvm_container_domain(v->domain) )
> +            vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, msr_content);
> +        else
> +            wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, msr_content);
> +    }
> 
> -    if ( (global_ctrl & *enabled_cntrs) || (core2_vpmu_cxt->ds_area != 0) )
> +    if ( (core2_vpmu_cxt->global_ctrl & *enabled_cntrs) ||
> +         (core2_vpmu_cxt->ds_area != 0) )
>          vpmu_set(vpmu, VPMU_RUNNING);
>      else
>          vpmu_reset(vpmu, VPMU_RUNNING);
> @@ -594,7 +626,10 @@ static int core2_vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
>              *msr_content = core2_vpmu_cxt->global_status;
>              break;
>          case MSR_CORE_PERF_GLOBAL_CTRL:
> -            vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, msr_content);
> +            if ( has_hvm_container_domain(v->domain) )
> +                vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, msr_content);
> +            else
> +                rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, *msr_content);
>              break;
>          default:
>              rdmsrl(msr, *msr_content);
> diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c
> index dde3367..542e23e 100644
> --- a/xen/arch/x86/hvm/vpmu.c
> +++ b/xen/arch/x86/hvm/vpmu.c
> @@ -501,6 +501,13 @@ long do_xenpmu_op(int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg)
> 
>          spin_unlock(&xenpmu_mode_lock);
>          break;
> +
> +    case XENPMU_lvtpc_set:
> +        if ( current->arch.vpmu.xenpmu_data == NULL )
> +            return -EINVAL;
> +
> +        vpmu_lvtpc_update(current->arch.vpmu.xenpmu_data->pmu.l.lapic_lvtpc);
> +        ret = 0;
> +        break;
>      }
> 
>      case XENPMU_mode_get:
> diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
> index 10fc2ca..cc70514 100644
> --- a/xen/arch/x86/traps.c
> +++ b/xen/arch/x86/traps.c
> @@ -72,6 +72,7 @@
>  #include <asm/apic.h>
>  #include <asm/mc146818rtc.h>
>  #include <asm/hpet.h>
> +#include <asm/hvm/vpmu.h>
>  #include <public/arch-x86/cpuid.h>
>  #include <xsm/xsm.h>
> 
> @@ -896,8 +897,10 @@ void pv_cpuid(struct cpu_user_regs *regs)
>          __clear_bit(X86_FEATURE_TOPOEXT % 32, &c);
>          break;
> 
> +    case 0x0000000a: /* Architectural Performance Monitor Features (Intel) */
> +        break;
> +
>      case 0x00000005: /* MONITOR/MWAIT */
> -    case 0x0000000a: /* Architectural Performance Monitor Features */
>      case 0x0000000b: /* Extended Topology Enumeration */
>      case 0x8000000a: /* SVM revision and features */
>      case 0x8000001b: /* Instruction Based Sampling */
> @@ -913,6 +916,9 @@ void pv_cpuid(struct cpu_user_regs *regs)
>      }
> 
>   out:
> +    /* VPMU may decide to modify some of the leaves */
> +    vpmu_do_cpuid(regs->eax, &a, &b, &c, &d);
> +
>      regs->eax = a;
>      regs->ebx = b;
>      regs->ecx = c;
> @@ -1935,6 +1941,7 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
>      char io_emul_stub[32];
>      void (*io_emul)(struct cpu_user_regs *) __attribute__((__regparm__(1)));
>      uint64_t val, msr_content;
> +    bool_t vpmu_msr;
> 
>      if ( !read_descriptor(regs->cs, v, regs,
>                            &code_base, &code_limit, &ar,
> @@ -2425,6 +2432,7 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
>          uint32_t eax = regs->eax;
>          uint32_t edx = regs->edx;
>          msr_content = ((uint64_t)edx << 32) | eax;
> +        vpmu_msr = 0;
>          switch ( (u32)regs->ecx )
>          {
>          case MSR_FS_BASE:
> @@ -2561,7 +2569,22 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
>              if ( v->arch.debugreg[7] & DR7_ACTIVE_MASK )
>                  wrmsrl(regs->_ecx, msr_content);
>              break;
> -
> +        case MSR_P6_PERFCTR(0)...MSR_P6_PERFCTR(7):
> +        case MSR_P6_EVNTSEL(0)...MSR_P6_EVNTSEL(3):
> +        case MSR_CORE_PERF_FIXED_CTR0...MSR_CORE_PERF_FIXED_CTR2:
> +        case MSR_CORE_PERF_FIXED_CTR_CTRL...MSR_CORE_PERF_GLOBAL_OVF_CTRL:
> +            if ( boot_cpu_data.x86_vendor == X86_VENDOR_INTEL )
> +            {
> +                vpmu_msr = 1;
> +        case MSR_AMD_FAM15H_EVNTSEL0...MSR_AMD_FAM15H_PERFCTR5:
> +                if ( vpmu_msr || (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) )
> +                {
> +                    if ( vpmu_do_wrmsr(regs->ecx, msr_content, 0) )
> +                        goto fail;
> +                }
> +                break;
> +            }
> +            /*FALLTHROUGH*/
>          default:
>              if ( wrmsr_hypervisor_regs(regs->ecx, msr_content) == 1 )
>                  break;
> @@ -2593,6 +2616,7 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
>          break;
> 
>      case 0x32: /* RDMSR */
> +        vpmu_msr = 0;
>          switch ( (u32)regs->ecx )
>          {
>          case MSR_FS_BASE:
> @@ -2663,7 +2687,29 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
>                              [regs->_ecx - MSR_AMD64_DR1_ADDRESS_MASK + 1];
>              regs->edx = 0;
>              break;
> +        case MSR_IA32_PERF_CAPABILITIES:
> +            /* No extra capabilities are supported */
> +            regs->eax = regs->edx = 0;
> +            break;
> +        case MSR_P6_PERFCTR(0)...MSR_P6_PERFCTR(7):
> +        case MSR_P6_EVNTSEL(0)...MSR_P6_EVNTSEL(3):
> +        case MSR_CORE_PERF_FIXED_CTR0...MSR_CORE_PERF_FIXED_CTR2:
> +        case MSR_CORE_PERF_FIXED_CTR_CTRL...MSR_CORE_PERF_GLOBAL_OVF_CTRL:
> +            if ( boot_cpu_data.x86_vendor == X86_VENDOR_INTEL )
> +            {
> +                vpmu_msr = 1;
> +        case MSR_AMD_FAM15H_EVNTSEL0...MSR_AMD_FAM15H_PERFCTR5:
> +                if ( vpmu_msr || (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) )
> +                {
> +                    if ( vpmu_do_rdmsr(regs->ecx, &msr_content) )
> +                        goto fail;
> +
> +                    regs->eax = (uint32_t)msr_content;
> +                    regs->edx = (uint32_t)(msr_content >> 32);
> +                }
> +                break;
> +            }
> +            /*FALLTHROUGH*/
>          default:
>              if ( rdmsr_hypervisor_regs(regs->ecx, &val) )
>              {
> diff --git a/xen/include/public/pmu.h b/xen/include/public/pmu.h
> index b8c5682..68a5fb8 100644
> --- a/xen/include/public/pmu.h
> +++ b/xen/include/public/pmu.h
> @@ -27,6 +27,7 @@
>  #define XENPMU_feature_set     3
>  #define XENPMU_init            4
>  #define XENPMU_finish          5
> +#define XENPMU_lvtpc_set       6
>  /* ` } */
> 
>  /* Parameters structure for HYPERVISOR_xenpmu_op call */
> --
> 1.8.1.4


* Re: [PATCH v12 for-xen-4.5 19/20] x86/VPMU: NMI-based VPMU support
  2014-09-30  8:37   ` Jan Beulich
@ 2014-10-01  0:18     ` Boris Ostrovsky
  2014-10-01  7:32       ` Jan Beulich
  0 siblings, 1 reply; 92+ messages in thread
From: Boris Ostrovsky @ 2014-10-01  0:18 UTC (permalink / raw)
  To: Jan Beulich
  Cc: kevin.tian, keir, suravee.suthikulpanit, andrew.cooper3, tim,
	dietmar.hahn, xen-devel, Aravind.Gopalakrishnan, jun.nakajima,
	dgdegra


On 09/30/2014 04:37 AM, Jan Beulich wrote:
>   
> +static void vpmu_send_interrupt(struct vcpu *v)
> +{
> +    struct vlapic *vlapic;
> +    u32 vlapic_lvtpc;
> +
> +    ASSERT( is_hvm_vcpu(v) );
> +
> +    vlapic = vcpu_vlapic(v);
> +    if ( !is_vlapic_lvtpc_enabled(vlapic) )
> +        return;
> +
> +    vlapic_lvtpc = vlapic_get_reg(vlapic, APIC_LVTPC);
> +    if ( GET_APIC_DELIVERY_MODE(vlapic_lvtpc) == APIC_MODE_FIXED )
> +        vlapic_set_irq(vcpu_vlapic(v), vlapic_lvtpc & APIC_VECTOR_MASK, 0);
> +    else
> +        v->nmi_pending = 1;
> Is APIC_MODE_NMI guaranteed to be the only alternative to
> APIC_MODE_FIXED here (even for a buggy guest)? I don't recall
> having seen code preventing other modes to be set, but even if
> such code exists, an ASSERT() here seems quite desirable to me
> (perhaps after re-structuring this to a switch() this could also be
> a debug log message).

This was a simple code move from the original VPMU implementation 
(vpmu_do_interrupt()) to a separate routine, done only to avoid duplicating 
this code since it may now be called from different places.

This is the HVM guest's view of the LVTPC, not Xen's, and has nothing to do 
with the VPMU interrupt mode (i.e. vector vs NMI), if that's what you were 
thinking about.
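
For reference, the switch()-based restructuring Jan suggests might look
like this (a sketch only, not part of the posted series; the default case
is the hypothetical addition):

    switch ( GET_APIC_DELIVERY_MODE(vlapic_lvtpc) )
    {
    case APIC_MODE_FIXED:
        vlapic_set_irq(vlapic, vlapic_lvtpc & APIC_VECTOR_MASK, 0);
        break;
    case APIC_MODE_NMI:
        v->nmi_pending = 1;
        break;
    default:
        /* Nothing prevents a guest from programming another mode */
        gdprintk(XENLOG_WARNING, "Unsupported LVTPC delivery mode\n");
        break;
    }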

-boris


* Re: [PATCH v12 for-xen-4.5 11/20] x86/VPMU: Interface for setting PMU mode and flags
  2014-09-25 19:28 ` [PATCH v12 for-xen-4.5 11/20] x86/VPMU: Interface for setting PMU mode and flags Boris Ostrovsky
                     ` (3 preceding siblings ...)
  2014-09-29 15:14   ` Jan Beulich
@ 2014-10-01  0:48   ` Tian, Kevin
  2014-10-01  0:56     ` Boris Ostrovsky
  4 siblings, 1 reply; 92+ messages in thread
From: Tian, Kevin @ 2014-10-01  0:48 UTC (permalink / raw)
  To: Boris Ostrovsky, jbeulich, suravee.suthikulpanit,
	Aravind.Gopalakrishnan, dietmar.hahn, dgdegra
  Cc: andrew.cooper3, xen-devel, keir, Nakajima, Jun, tim

> From: Boris Ostrovsky [mailto:boris.ostrovsky@oracle.com]
> Sent: Thursday, September 25, 2014 12:29 PM
> 
> Add runtime interface for setting PMU mode and flags. Three main modes are
> provided:
> * XENPMU_MODE_OFF:  PMU is not virtualized
> * XENPMU_MODE_SELF: Guests can access PMU MSRs and receive PMU
> interrupts.
> * XENPMU_MODE_HV: Same as XENPMU_MODE_SELF for non-privileged guests, dom0
>   can profile itself and the hypervisor.
> 
> Note that PMU modes are different from what can be provided at Xen's boot
> line with 'vpmu' argument. An 'off' (or '0') value is equivalent to
> XENPMU_MODE_OFF.
> Any other value, on the other hand, will cause VPMU mode to be set to
> XENPMU_MODE_SELF during boot.
> 
> For feature flags only Intel's BTS is currently supported.
> 
> Mode and flags are set via HYPERVISOR_xenpmu_op hypercall.
> 
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

In general it's OK to me:
Acked-by: Kevin Tian <kevin.tian@intel.com>

Just a small comment:

> +static void vpmu_sched_checkin(unsigned long unused)
> +{
> +    atomic_inc(&vpmu_sched_counter);
> +}
> +
> +static int vpmu_force_context_switch(void)
> +{
> +    unsigned i, j, allbutself_num, mycpu;
> +    static s_time_t start, now;

any reason for such static variables when there's no continuation any more?

Thanks
Kevin
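
As the quoted commit message notes, the mode is switched at run time via
HYPERVISOR_xenpmu_op. In dom0 kernel context that would be roughly the
following (a sketch; the mode travels in the .val field, matching the
xen_pmu_params layout used elsewhere in the series):

    /* Let dom0 profile itself and the hypervisor (dom0 only) */
    struct xen_pmu_params xp = { .val = XENPMU_MODE_HV };
    int err = HYPERVISOR_xenpmu_op(XENPMU_mode_set, &xp);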


* Re: [PATCH v12 for-xen-4.5 11/20] x86/VPMU: Interface for setting PMU mode and flags
  2014-10-01  0:48   ` Tian, Kevin
@ 2014-10-01  0:56     ` Boris Ostrovsky
  0 siblings, 0 replies; 92+ messages in thread
From: Boris Ostrovsky @ 2014-10-01  0:56 UTC (permalink / raw)
  To: Tian, Kevin, jbeulich, suravee.suthikulpanit,
	Aravind.Gopalakrishnan, dietmar.hahn, dgdegra
  Cc: andrew.cooper3, xen-devel, keir, Nakajima, Jun, tim

On 09/30/2014 08:48 PM, Tian, Kevin wrote:
>> From: Boris Ostrovsky [mailto:boris.ostrovsky@oracle.com]
>> Sent: Thursday, September 25, 2014 12:29 PM
>>
>> Add runtime interface for setting PMU mode and flags. Three main modes are
>> provided:
>> * XENPMU_MODE_OFF:  PMU is not virtualized
>> * XENPMU_MODE_SELF: Guests can access PMU MSRs and receive PMU
>> interrupts.
>> * XENPMU_MODE_HV: Same as XENPMU_MODE_SELF for non-privileged guests, dom0
>>    can profile itself and the hypervisor.
>>
>> Note that PMU modes are different from what can be provided at Xen's boot
>> line with 'vpmu' argument. An 'off' (or '0') value is equivalent to
>> XENPMU_MODE_OFF.
>> Any other value, on the other hand, will cause VPMU mode to be set to
>> XENPMU_MODE_SELF during boot.
>>
>> For feature flags only Intel's BTS is currently supported.
>>
>> Mode and flags are set via HYPERVISOR_xenpmu_op hypercall.
>>
>> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> In general it's OK to me:
> Acked-by: Kevin Tian <kevin.tian@intel.com>
>
> Just a small comment:
>
>> +static void vpmu_sched_checkin(unsigned long unused)
>> +{
>> +    atomic_inc(&vpmu_sched_counter);
>> +}
>> +
>> +static int vpmu_force_context_switch(void)
>> +{
>> +    unsigned i, j, allbutself_num, mycpu;
>> +    static s_time_t start, now;
> any reason for such static variables when there's no continuation any more?

No reason at all. They are gone in v13.

Thanks.
-boris


* Re: [PATCH v12 for-xen-4.5 16/20] x86/VPMU: Handle PMU interrupts for PV guests
  2014-09-30 16:37         ` Boris Ostrovsky
@ 2014-10-01  6:49           ` Jan Beulich
  2014-10-01 12:53             ` Boris Ostrovsky
  0 siblings, 1 reply; 92+ messages in thread
From: Jan Beulich @ 2014-10-01  6:49 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: kevin.tian, keir, suravee.suthikulpanit, andrew.cooper3, tim,
	dietmar.hahn, xen-devel, Aravind.Gopalakrishnan, jun.nakajima,
	dgdegra

>>> On 30.09.14 at 18:37, <boris.ostrovsky@oracle.com> wrote:
> On 09/30/2014 11:44 AM, Jan Beulich wrote:
>>>>> +            {
>>>>> +                r->cs = cur_regs->cs;
>>>>> +                if ( sampled->arch.flags & TF_kernel_mode )
>>>>> +                    r->cs &= ~3;
>>>> And once again I wonder how the consumer of this data is to tell
>>>> apart guest kernel and hypervisor addresses.
>>> Based on the RIP --- perf, for example, searches through various symbol
>>> tables.
>> That doesn't help when profiling HVM/PVH guests - addresses are
>> ambiguous in that case.
> 
> Hypervisor traces are only sent to dom0, which is currently PV only. The 
> key here, of course, is the word 'currently'.

So you completely ignore PVH Dom0? Experimental or not, I don't
think that's the way to go. Furthermore the check around this is
once again using sampled, not sampling.

Looking at the separation of hypervisor vs guest context to report
again

            /* Non-privileged domains are always in XENPMU_MODE_SELF mode */
            if ( (vpmu_mode & XENPMU_MODE_SELF) ||
                 (!is_hardware_domain(sampled->domain) &&
                  !is_idle_vcpu(sampled)) )
                cur_regs = guest_cpu_user_regs();
            else
                cur_regs = regs;

I now additionally wonder why the condition here isn't just the SELF
check: If the interrupt happened while in the hypervisor, why would
you override this unconditionally to report a guest sample instead?
Shouldn't the profiling domain tell you what it wants in that case
(global vs guest local view)?

>>> I suppose I can set xenpmu_data->domain_id below to either DOMID_SELF
>>> for guest and DOMID_XEN for the hypervisor.
>> That's an option, but I'm really having reservations against simulating
>> ring-0 execution in PV guests here. It would certainly be better if we
>> could report reality here, but I can see reservations on the consumer
>> (perf) side against us doing so.
> 
> Yes, perf will probably not like it --- as I mentioned in an earlier 
> message, it calls user_mode(regs) which is essentially !!(regs->cs & 3).

So you're crippling the Xen implementation in order to please one
of potentially many consumers... Along the lines of what I said
above, I think this ought to be controlled by the consumer of the
interface, defaulting to not doing any masking here.

Jan


* Re: [PATCH v12 for-xen-4.5 19/20] x86/VPMU: NMI-based VPMU support
  2014-10-01  0:18     ` Boris Ostrovsky
@ 2014-10-01  7:32       ` Jan Beulich
  0 siblings, 0 replies; 92+ messages in thread
From: Jan Beulich @ 2014-10-01  7:32 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: kevin.tian, keir, suravee.suthikulpanit, andrew.cooper3, tim,
	dietmar.hahn, xen-devel, Aravind.Gopalakrishnan, jun.nakajima,
	dgdegra

>>> On 01.10.14 at 02:18, <boris.ostrovsky@oracle.com> wrote:

> On 09/30/2014 04:37 AM, Jan Beulich wrote:
>>   
>> +static void vpmu_send_interrupt(struct vcpu *v)
>> +{
>> +    struct vlapic *vlapic;
>> +    u32 vlapic_lvtpc;
>> +
>> +    ASSERT( is_hvm_vcpu(v) );
>> +
>> +    vlapic = vcpu_vlapic(v);
>> +    if ( !is_vlapic_lvtpc_enabled(vlapic) )
>> +        return;
>> +
>> +    vlapic_lvtpc = vlapic_get_reg(vlapic, APIC_LVTPC);
>> +    if ( GET_APIC_DELIVERY_MODE(vlapic_lvtpc) == APIC_MODE_FIXED )
>> +        vlapic_set_irq(vcpu_vlapic(v), vlapic_lvtpc & APIC_VECTOR_MASK, 0);
>> +    else
>> +        v->nmi_pending = 1;
>> Is APIC_MODE_NMI guaranteed to be the only alternative to
>> APIC_MODE_FIXED here (even for a buggy guest)? I don't recall
>> having seen code preventing other modes to be set, but even if
>> such code exists, an ASSERT() here seems quite desirable to me
>> (perhaps after re-structuring this to a switch() this could also be
>> a debug log message).
> 
> This was a simple code move from original VPMU implementation 
> (vpmu_do_interrupt()) to a separate routine, only to avoid duplication 
> of this code since it may now be called from different places.
> 
> This is HVM guest's view of LVTPC, not Xen's, and has nothing to do with 
> VPMU interrupt mode (i.e. vector vs NMI) if that's what you were 
> thinking about.

Indeed I subsequently realized that you just moved broken code.
But I'd really have expected you to notice the brokenness and
either fix it along with moving it, or include a separate prereq
patch to address this (which I have now done:
http://lists.xenproject.org/archives/html/xen-devel/2014-10/msg00014.html).

Jan


* Re: [PATCH v12 for-xen-4.5 16/20] x86/VPMU: Handle PMU interrupts for PV guests
  2014-10-01  6:49           ` Jan Beulich
@ 2014-10-01 12:53             ` Boris Ostrovsky
  2014-10-01 13:18               ` Jan Beulich
  0 siblings, 1 reply; 92+ messages in thread
From: Boris Ostrovsky @ 2014-10-01 12:53 UTC (permalink / raw)
  To: Jan Beulich
  Cc: kevin.tian, keir, suravee.suthikulpanit, andrew.cooper3, tim,
	dietmar.hahn, xen-devel, Aravind.Gopalakrishnan, jun.nakajima,
	dgdegra

On 10/01/2014 02:49 AM, Jan Beulich wrote:
>>>> On 30.09.14 at 18:37, <boris.ostrovsky@oracle.com> wrote:
>> On 09/30/2014 11:44 AM, Jan Beulich wrote:
>>>>>> +            {
>>>>>> +                r->cs = cur_regs->cs;
>>>>>> +                if ( sampled->arch.flags & TF_kernel_mode )
>>>>>> +                    r->cs &= ~3;
>>>>> And once again I wonder how the consumer of this data is to tell
>>>>> apart guest kernel and hypervisor addresses.
>>>> Based on the RIP --- perf, for example, searches through various symbol
>>>> tables.
>>> That doesn't help when profiling HVM/PVH guests - addresses are
>>> ambiguous in that case.
>> Hypervisor traces are only sent to dom0, which is currently PV only. The
>> key here, of course, is the word 'currently'.
> So you completely ignore PVH Dom0? Experimental or not, I don't
> think that's the way to go.

As I mentioned in an earlier reply, I will set domain_id in the reported 
structure to DOMID_XEN when we are reporting a hypervisor sample.

> Furthermore the check around this is
> once again using sampled, not sampling.

Which check are you referring to?

>
> Looking at the separation of hypervisor vs guest context to report
> again
>
>              /* Non-privileged domains are always in XENPMU_MODE_SELF mode */
>              if ( (vpmu_mode & XENPMU_MODE_SELF) ||
>                   (!is_hardware_domain(sampled->domain) &&
>                    !is_idle_vcpu(sampled)) )
>                  cur_regs = guest_cpu_user_regs();
>              else
>                  cur_regs = regs;
>
> I now additionally wonder why the condition here isn't just the SELF
> check: If the interrupt happened while in the hypervisor, why would
> you override this unconditionally to report a guest sample instead?
> Shouldn't the profiling domain tell you what it wants in that case
> (global vs guest local view)?

The second part of the check (!is_hardware_domain(sampled->domain) && 
!is_idle_vcpu(sampled)) is to prevent sending a hypervisor sample to a 
non-privileged guest. vpmu_mode may be, for example, XENPMU_MODE_HV, but 
that only means that dom0 can get hypervisor samples.

Perhaps the comment is confusing in that it may imply that each domain 
has its own XENPMU_MODE, which is not true --- vpmu_mode is a global. I 
should have said "Non-privileged domains are *effectively* always 
in XENPMU_MODE_SELF mode".

>
>>>> I suppose I can set xenpmu_data->domain_id below to either DOMID_SELF
>>>> for guest and DOMID_XEN for the hypervisor.
>>> That's an option, but I'm really having reservations against simulating
>>> ring-0 execution in PV guests here. It would certainly be better if we
>>> could report reality here, but I can see reservations on the consumer
>>> (perf) side against us doing so.
>> Yes, perf will probably not like it --- as I mentioned in an earlier
>> message, it calls user_mode(regs) which is essentially !!(regs->cs & 3).
> So you're crippling the Xen implementation in order to please one
> of potentially many consumers... Along the lines of what I said
> above, I think this ought to be controlled by the consumer of the
> interface, defaulting to not doing any masking here.

I can add a return value (flags, for example) to indicate whether we are 
in user or kernel mode. I don't want to provide another control 
interface for this.
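
Such a flag might look like this (a sketch; the flag name and its placement
in the shared structure are hypothetical, not from the posted series):

    /* Hypothetical: sample-mode flag in the shared PMU page */
    #define PMU_SAMPLE_USER (1 << 0)

    r->cs = cur_regs->cs;   /* CS reported unmodified, no ring-0 simulation */
    if ( !(sampled->arch.flags & TF_kernel_mode) )
        xenpmu_data->pmu.pmu_flags |= PMU_SAMPLE_USER;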

-boris


* Re: [PATCH v12 for-xen-4.5 16/20] x86/VPMU: Handle PMU interrupts for PV guests
  2014-10-01 12:53             ` Boris Ostrovsky
@ 2014-10-01 13:18               ` Jan Beulich
  2014-10-01 14:08                 ` Boris Ostrovsky
  0 siblings, 1 reply; 92+ messages in thread
From: Jan Beulich @ 2014-10-01 13:18 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: kevin.tian, keir, suravee.suthikulpanit, andrew.cooper3, tim,
	dietmar.hahn, xen-devel, Aravind.Gopalakrishnan, jun.nakajima,
	dgdegra

>>> On 01.10.14 at 14:53, <boris.ostrovsky@oracle.com> wrote:
> On 10/01/2014 02:49 AM, Jan Beulich wrote:
>>>>> On 30.09.14 at 18:37, <boris.ostrovsky@oracle.com> wrote:
>>> On 09/30/2014 11:44 AM, Jan Beulich wrote:
>>>>>>> +            {
>>>>>>> +                r->cs = cur_regs->cs;
>>>>>>> +                if ( sampled->arch.flags & TF_kernel_mode )
>>>>>>> +                    r->cs &= ~3;
>>>>>> And once again I wonder how the consumer of this data is to tell
>>>>>> apart guest kernel and hypervisor addresses.
>>>>> Based on the RIP --- perf, for example, searches through various symbol
>>>>> tables.
>>>> That doesn't help when profiling HVM/PVH guests - addresses are
>>>> ambiguous in that case.
>>> Hypervisor traces are only sent to dom0, which is currently PV only. The
>>> key here, of course, is the word 'currently'.
>> So you completely ignore PVH Dom0? Experimental or not, I don't
>> think that's the way to go.
> 
> As I mentioned in an earlier reply, I will set domain_id in the reported 
> structure to DOMID_XEN when we are reporting a hypervisor sample.
> 
>> Furthermore the check around this is
>> once again using sampled, not sampling.
> 
> Which check are you referring to?

The if() right outside (above) the still visible patch context.

>> Looking at the separation of hypervisor vs guest context to report
>> again
>>
>>              /* Non-privileged domains are always in XENPMU_MODE_SELF mode */
>>              if ( (vpmu_mode & XENPMU_MODE_SELF) ||
>>                   (!is_hardware_domain(sampled->domain) &&
>>                    !is_idle_vcpu(sampled)) )
>>                  cur_regs = guest_cpu_user_regs();
>>              else
>>                  cur_regs = regs;
>>
>> I now additionally wonder why the condition here isn't just the SELF
>> check: If the interrupt happened while in the hypervisor, why would
>> you override this unconditionally to report a guest sample instead?
>> Shouldn't the profiling domain tell you what it wants in that case
>> (global vs guest local view)?
> 
> The second part of the check (!is_hardware_domain(sampled->domain) && 
>> !is_idle_vcpu(sampled)) is to prevent sending a hypervisor sample to a 
> non-privileged guest. vpmu_mode may be, for example, XENPMU_MODE_HV but 
> that only means that dom0 can get hypervisor samples.

Right, but that's not what the code above does: Instead of sending
the hypervisor sample to Dom0 it converts it to a guest mode one.

>>>>> I suppose I can set xenpmu_data->domain_id below to either DOMID_SELF
>>>>> for guest and DOMID_XEN for the hypervisor.
>>>> That's an option, but I'm really having reservations against simulating
>>>> ring-0 execution in PV guests here. It would certainly be better if we
>>>> could report reality here, but I can see reservations on the consumer
>>>> (perf) side against us doing so.
>>> Yes, perf will probably not like it --- as I mentioned in an earlier
>>> message, it calls user_mode(regs) which is essentially !!(regs->cs & 3).
>> So you're crippling the Xen implementation in order to please one
>> of potentially many consumers... Along the lines of what I said
>> above, I think this ought to be controlled by the consumer of the
>> interface, defaulting to not doing any masking here.
> 
> I can add a return value (flags, for example) to indicate whether we are 
> in user or kernel mode. I don't want to provide another control 
> interface for this.

That would be fine too, I think.

Jan


* Re: [PATCH v12 for-xen-4.5 16/20] x86/VPMU: Handle PMU interrupts for PV guests
  2014-10-01 13:18               ` Jan Beulich
@ 2014-10-01 14:08                 ` Boris Ostrovsky
  2014-10-01 14:26                   ` Jan Beulich
  0 siblings, 1 reply; 92+ messages in thread
From: Boris Ostrovsky @ 2014-10-01 14:08 UTC (permalink / raw)
  To: Jan Beulich
  Cc: kevin.tian, keir, suravee.suthikulpanit, andrew.cooper3, tim,
	dietmar.hahn, xen-devel, Aravind.Gopalakrishnan, jun.nakajima,
	dgdegra

On 10/01/2014 09:18 AM, Jan Beulich wrote:
>>>> On 01.10.14 at 14:53, <boris.ostrovsky@oracle.com> wrote:
>> On 10/01/2014 02:49 AM, Jan Beulich wrote:
>>>>>> On 30.09.14 at 18:37, <boris.ostrovsky@oracle.com> wrote:
>>>> On 09/30/2014 11:44 AM, Jan Beulich wrote:
>>>>>>>> +            {
>>>>>>>> +                r->cs = cur_regs->cs;
>>>>>>>> +                if ( sampled->arch.flags & TF_kernel_mode )
>>>>>>>> +                    r->cs &= ~3;
>>>>>>> And once again I wonder how the consumer of this data is to tell
>>>>>>> apart guest kernel and hypervisor addresses.
>>>>>> Based on the RIP --- perf, for example, searches through various symbol
>>>>>> tables.
>>>>> That doesn't help when profiling HVM/PVH guests - addresses are
>>>>> ambiguous in that case.
>>>> Hypervisor traces are only sent to dom0, which is currently PV only. The
>>>> key here, of course, is the word 'currently'.
>>> So you completely ignore PVH Dom0? Experimental or not, I don't
>>> think that's the way to go.
>> As I mentioned in an earlier reply, I will set domain_id in the reported
>> structure to DOMID_XEN when we are reporting a hypervisor sample.
>>
>>> Furthermore the check around this is
>>> once again using sampled, not sampling.
>> Which check are you referring to?
> The if() right outside (above) the still visible patch context.

Why should it be 'sampling'? I am collecting registers from the sampled vcpu, 
so I need to look at that domain's flags to determine the mode, don't I?


>
>>> Looking at the separation of hypervisor vs guest context to report
>>> again
>>>
>>>               /* Non-privileged domains are always in XENPMU_MODE_SELF mode */
>>>               if ( (vpmu_mode & XENPMU_MODE_SELF) ||
>>>                    (!is_hardware_domain(sampled->domain) &&
>>>                     !is_idle_vcpu(sampled)) )
>>>                   cur_regs = guest_cpu_user_regs();
>>>               else
>>>                   cur_regs = regs;
>>>
>>> I now additionally wonder why the condition here isn't just the SELF
>>> check: If the interrupt happened while in the hypervisor, why would
>>> you override this unconditionally to report a guest sample instead?
>>> Shouldn't the profiling domain tell you what it wants in that case
>>> (global vs guest local view)?
>> The second part of the check (!is_hardware_domain(sampled->domain) &&
>> !is_idle_vcpu(sampled)) is to prevent sending hypervisor sample to a
>> non-privileged guest. vpmu_mode may be, for example, XENPMU_MODE_HV but
>> that only means that dom0 can get hypervisor samples.
> Right, but that's not what the code above does: Instead of sending
> the hypervisor sample to Dom0 it converts it to a guest mode one.

Oh, I see --- when we get interrupted while in a non-privileged guest's 
context (but in the hypervisor) I send the guest's registers, not Xen's.

I think just the SELF check is not sufficient though; we need to make sure 
that we are not sending a hypervisor sample to non-dom0. So
     if ( (vpmu_mode & XENPMU_MODE_SELF) || !is_hardware_domain(sampling->domain) )

-boris


* Re: [PATCH v12 for-xen-4.5 16/20] x86/VPMU: Handle PMU interrupts for PV guests
  2014-10-01 14:08                 ` Boris Ostrovsky
@ 2014-10-01 14:26                   ` Jan Beulich
  2014-10-01 18:06                     ` Boris Ostrovsky
  0 siblings, 1 reply; 92+ messages in thread
From: Jan Beulich @ 2014-10-01 14:26 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: kevin.tian, keir, suravee.suthikulpanit, andrew.cooper3, tim,
	dietmar.hahn, xen-devel, Aravind.Gopalakrishnan, jun.nakajima,
	dgdegra

>>> On 01.10.14 at 16:08, <boris.ostrovsky@oracle.com> wrote:
> On 10/01/2014 09:18 AM, Jan Beulich wrote:
>>>>> On 01.10.14 at 14:53, <boris.ostrovsky@oracle.com> wrote:
>>> On 10/01/2014 02:49 AM, Jan Beulich wrote:
>>>>>>> On 30.09.14 at 18:37, <boris.ostrovsky@oracle.com> wrote:
>>>>> On 09/30/2014 11:44 AM, Jan Beulich wrote:
>>>>>>>>> +            {
>>>>>>>>> +                r->cs = cur_regs->cs;
>>>>>>>>> +                if ( sampled->arch.flags & TF_kernel_mode )
>>>>>>>>> +                    r->cs &= ~3;
>>>>>>>> And once again I wonder how the consumer of this data is to tell
>>>>>>>> apart guest kernel and hypervisor addresses.
>>>>>>> Based on the RIP --- perf, for example, searches through various symbol
>>>>>>> tables.
>>>>>> That doesn't help when profiling HVM/PVH guests - addresses are
>>>>>> ambiguous in that case.
>>>>> Hypervisor traces are only sent to dom0, which is currently PV only. The
>>>>> key here, of course, is the word 'currently'.
>>>> So you completely ignore PVH Dom0? Experimental or not, I don't
>>>> think that's the way to go.
>>> As I mentioned in an earlier reply, I will set domain_id in the reported
>>> structure to DOMID_XEN when we are reporting a hypervisor sample.
>>>
>>>> Furthermore the check around this is
>>>> once again using sampled, not sampling.
>>> Which check are you referring to?
>> The if() right outside (above) the still visible patch context.
> 
> Why should it be 'sampling'? I am collecting registers from sampled vcpu 
> so I need to look at that domain's flags to determine the mode, don't I?

You're right - I don't know what I was thinking (or whether I
misplaced the question).

>>>> Looking at the separation of hypervisor vs guest context to report
>>>> again
>>>>
>>>>               /* Non-privileged domains are always in XENPMU_MODE_SELF mode */
>>>>               if ( (vpmu_mode & XENPMU_MODE_SELF) ||
>>>>                    (!is_hardware_domain(sampled->domain) &&
>>>>                     !is_idle_vcpu(sampled)) )
>>>>                   cur_regs = guest_cpu_user_regs();
>>>>               else
>>>>                   cur_regs = regs;
>>>>
>>>> I now additionally wonder why the condition here isn't just the SELF
>>>> check: If the interrupt happened while in the hypervisor, why would
>>>> you override this unconditionally to report a guest sample instead?
>>>> Shouldn't the profiling domain tell you what it wants in that case
>>>> (global vs guest local view)?
>>> The second part of the check (!is_hardware_domain(sampled->domain) &&
>>> !is_idle_vcpu(sampled)) is to prevent sending a hypervisor sample to a
>>> non-privileged guest. vpmu_mode may be, for example, XENPMU_MODE_HV but
>>> that only means that dom0 can get hypervisor samples.
>> Right, but that's not what the code above does: Instead of sending
>> the hypervisor sample to Dom0 it converts it to a guest mode one.
> 
> Oh, I see --- when we get interrupted while in a non-privileged guest's 
> context (but in the hypervisor) I send the guest's registers, not Xen's.
> 
> I think just the SELF check is not sufficient though; we need to make sure 
> that we are not sending a hypervisor sample to non-dom0. So
>      if ( (vpmu_mode & XENPMU_MODE_SELF) || !is_hardware_domain(sampling->domain) )

Actually I think instead the determination of sampling needs to
depend on the register context rather than solely on the current
domain's ID.

Jan


* Re: [PATCH v12 for-xen-4.5 16/20] x86/VPMU: Handle PMU interrupts for PV guests
  2014-10-01 14:26                   ` Jan Beulich
@ 2014-10-01 18:06                     ` Boris Ostrovsky
  2014-10-02  6:56                       ` Jan Beulich
  0 siblings, 1 reply; 92+ messages in thread
From: Boris Ostrovsky @ 2014-10-01 18:06 UTC (permalink / raw)
  To: Jan Beulich
  Cc: kevin.tian, keir, suravee.suthikulpanit, andrew.cooper3, tim,
	dietmar.hahn, xen-devel, Aravind.Gopalakrishnan, jun.nakajima,
	dgdegra

On 10/01/2014 10:26 AM, Jan Beulich wrote:
>
>>>>> Looking at the separation of hypervisor vs guest context to report
>>>>> again
>>>>>
>>>>>                /* Non-privileged domains are always in XENPMU_MODE_SELF mode */
>>>>>                if ( (vpmu_mode & XENPMU_MODE_SELF) ||
>>>>>                     (!is_hardware_domain(sampled->domain) &&
>>>>>                      !is_idle_vcpu(sampled)) )
>>>>>                    cur_regs = guest_cpu_user_regs();
>>>>>                else
>>>>>                    cur_regs = regs;
>>>>>
>>>>> I now additionally wonder why the condition here isn't just the SELF
>>>>> check: If the interrupt happened while in the hypervisor, why would
>>>>> you override this unconditionally to report a guest sample instead?
>>>>> Shouldn't the profiling domain tell you what it wants in that case
>>>>> (global vs guest local view)?
>>>> The second part of the check (!is_hardware_domain(sampled->domain) &&
>>>> !is_idle_vcpu(sampled)) is to prevent sending hypervisor sample to a
>>>> non-privileged guest. vpmu_mode may be, for example, XENPMU_MODE_HV but
>>>> that only means that dom0 can get hypervisor samples.
>>> Right, but that's not what the code above does: Instead of sending
>>> the hypervisor sample to Dom0 it converts it to a guest mode one.
>> Oh, I see --- when we get interrupted while in a non-privileged guest's
>> context (but in hypervisor) I send guest's registers, not Xen's.
>>
>> I think just SELF check is not sufficient though, we need to make sure
>> that we are not sending hypervisor sample to non-dom0. So
>>       if ( (vpmu_mode & XENPMU_MODE_SELF) ||
>> !is_hardware_domain(sampling->domain) )
> Actually I think instead the determination of sampling needs to
> depend on the register context rather than solely on the current
> domain's ID.

Not sure I follow this --- we do need to take domainID into account to 
avoid sending non-dom0 a hypervisor sample.

Or are you saying that what to send depends on both RIP and domainID? 
Something like

     if ( vpmu_mode & XENPMU_MODE_SELF )
         cur_regs = guest_cpu_user_regs();
     else if ( (regs->rip >= XEN_VIRT_START) && (regs->rip < XEN_VIRT_END) &&
               is_hardware_domain(sampling->domain) )
         cur_regs = regs;
     else
         cur_regs = guest_cpu_user_regs();

-boris


* Re: [PATCH v12 for-xen-4.5 16/20] x86/VPMU: Handle PMU interrupts for PV guests
  2014-10-01 18:06                     ` Boris Ostrovsky
@ 2014-10-02  6:56                       ` Jan Beulich
  2014-10-02 13:53                         ` Boris Ostrovsky
  0 siblings, 1 reply; 92+ messages in thread
From: Jan Beulich @ 2014-10-02  6:56 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: kevin.tian, keir, suravee.suthikulpanit, andrew.cooper3, tim,
	dietmar.hahn, xen-devel, Aravind.Gopalakrishnan, jun.nakajima,
	dgdegra

>>> On 01.10.14 at 20:06, <boris.ostrovsky@oracle.com> wrote:
> On 10/01/2014 10:26 AM, Jan Beulich wrote:
>>
>>>>>> Looking at the separation of hypervisor vs guest context to report
>>>>>> again
>>>>>>
>>>>>>                /* Non-privileged domains are always in XENPMU_MODE_SELF mode */
>>>>>>                if ( (vpmu_mode & XENPMU_MODE_SELF) ||
>>>>>>                     (!is_hardware_domain(sampled->domain) &&
>>>>>>                      !is_idle_vcpu(sampled)) )
>>>>>>                    cur_regs = guest_cpu_user_regs();
>>>>>>                else
>>>>>>                    cur_regs = regs;
>>>>>>
>>>>>> I now additionally wonder why the condition here isn't just the SELF
>>>>>> check: If the interrupt happened while in the hypervisor, why would
>>>>>> you override this unconditionally to report a guest sample instead?
>>>>>> Shouldn't the profiling domain tell you what it wants in that case
>>>>>> (global vs guest local view)?
>>>>> The second part of the check (!is_hardware_domain(sampled->domain) &&
>>>>> !is_idle_vcpu(sampled)) is to prevent sending a hypervisor sample to a
>>>>> non-privileged guest. vpmu_mode may be, for example, XENPMU_MODE_HV but
>>>>> that only means that dom0 can get hypervisor samples.
>>>> Right, but that's not what the code above does: Instead of sending
>>>> the hypervisor sample to Dom0 it converts it to a guest mode one.
>>> Oh, I see --- when we get interrupted while in a non-privileged guest's
>>> context (but in the hypervisor) I send the guest's registers, not Xen's.
>>>
>>> I think just the SELF check is not sufficient though; we need to make sure
>>> that we are not sending a hypervisor sample to non-dom0. So
>>>       if ( (vpmu_mode & XENPMU_MODE_SELF) || !is_hardware_domain(sampling->domain) )
>> Actually I think instead the determination of sampling needs to
>> depend on the register context rather than solely on the current
>> domain's ID.
> 
> Not sure I follow this --- we do need to take domainID into account to 
> avoid sending non-dom0 a hypervisor sample.
> 
> Or are you saying that what to send depends on both RIP and domainID? 

Yes - that's what I'm trying to say. Or, as said above, make the
determination of "sampling" dependent on register state.

And in the end, considering a model where there's both a
local and a global profiler active, one sample referring to
hypervisor context could easily result in two events: one
(with the hypervisor register state) to the global profiler, and
a second (with the surrounding guest register state) to the
guest one. But iiuc your current implementation doesn't allow
that (yet).
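
A rough sketch of that dual-delivery model, with a hypothetical
vpmu_deliver_sample() helper standing in for the existing
copy-registers-and-send-VIRQ_XENPMU path (not implemented in this series):

    if ( in_xen_context )  /* e.g. regs->rip in [XEN_VIRT_START, XEN_VIRT_END) */
    {
        if ( vpmu_mode & XENPMU_MODE_HV )      /* global profiler's view */
            vpmu_deliver_sample(hardware_domain->vcpu[0], regs);
        /* guest-local view for the interrupted guest */
        vpmu_deliver_sample(sampled, guest_cpu_user_regs());
    }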

Jan


* Re: [PATCH v12 for-xen-4.5 16/20] x86/VPMU: Handle PMU interrupts for PV guests
  2014-10-02  6:56                       ` Jan Beulich
@ 2014-10-02 13:53                         ` Boris Ostrovsky
  0 siblings, 0 replies; 92+ messages in thread
From: Boris Ostrovsky @ 2014-10-02 13:53 UTC (permalink / raw)
  To: Jan Beulich
  Cc: kevin.tian, keir, suravee.suthikulpanit, andrew.cooper3, tim,
	dietmar.hahn, xen-devel, Aravind.Gopalakrishnan, jun.nakajima,
	dgdegra

On 10/02/2014 02:56 AM, Jan Beulich wrote:
>
> And in the end, considering a model where there's both a
> local and a global profiler active, one sample referring to
> hypervisor context could easily result in two events: one
> (with the hypervisor register state) to the global profiler, and
> a second (with the surrounding guest register state) to the
> guest one. But iiuc your current implementation doesn't allow
> that (yet).

Right, it doesn't.

And I don't think this will work, since PMU registers need to be 
controlled by one domain only (either dom0 or the guest) for each 
profiling session. If we send the interrupt to both, one of them will 
likely get confused about why it is getting an interrupt that it didn't 
request.

-boris



Thread overview: 92+ messages
2014-09-25 19:28 [PATCH v12 for-xen-4.5 00/20] x86/PMU: Xen PMU PV(H) support Boris Ostrovsky
2014-09-25 19:28 ` [PATCH v12 for-xen-4.5 01/20] common/symbols: Export hypervisor symbols to privileged guest Boris Ostrovsky
2014-09-26 14:58   ` Konrad Rzeszutek Wilk
2014-09-26 15:10     ` Jan Beulich
2014-09-26 16:49       ` Konrad Rzeszutek Wilk
2014-09-29  6:43         ` Jan Beulich
2014-09-29 13:29           ` Boris Ostrovsky
2014-09-29 13:47             ` Jan Beulich
2014-09-29 14:16               ` Boris Ostrovsky
2014-09-29 14:33                 ` Jan Beulich
2014-09-26 21:43   ` Daniel De Graaf
2014-09-26 22:12     ` Boris Ostrovsky
2014-09-25 19:28 ` [PATCH v12 for-xen-4.5 02/20] x86/VPMU: Manage VPMU_CONTEXT_SAVE flag in vpmu_save_force() Boris Ostrovsky
2014-09-26 14:49   ` Konrad Rzeszutek Wilk
2014-09-25 19:28 ` [PATCH v12 for-xen-4.5 03/20] x86/VPMU: Set MSR bitmaps only for HVM/PVH guests Boris Ostrovsky
2014-09-26 14:59   ` Konrad Rzeszutek Wilk
2014-09-25 19:28 ` [PATCH v12 for-xen-4.5 04/20] x86/VPMU: Make vpmu macros a bit more efficient Boris Ostrovsky
2014-09-25 19:28 ` [PATCH v12 for-xen-4.5 05/20] intel/VPMU: Clean up Intel VPMU code Boris Ostrovsky
2014-09-25 19:28 ` [PATCH v12 for-xen-4.5 06/20] vmx: Merge MSR management routines Boris Ostrovsky
2014-09-26 20:48   ` Tian, Kevin
2014-09-25 19:28 ` [PATCH v12 for-xen-4.5 07/20] x86/VPMU: Handle APIC_LVTPC accesses Boris Ostrovsky
2014-09-25 19:28 ` [PATCH v12 for-xen-4.5 08/20] intel/VPMU: MSR_CORE_PERF_GLOBAL_CTRL should be initialized to zero Boris Ostrovsky
2014-09-25 19:28 ` [PATCH v12 for-xen-4.5 09/20] x86/VPMU: Add public xenpmu.h Boris Ostrovsky
2014-09-26 20:49   ` Tian, Kevin
2014-09-29 14:17   ` Jan Beulich
2014-09-29 14:30     ` Jan Beulich
2014-09-29 15:19       ` Boris Ostrovsky
2014-09-29 15:41         ` Jan Beulich
2014-09-29 15:48           ` Boris Ostrovsky
2014-09-29 14:57     ` Boris Ostrovsky
2014-09-29 15:40       ` Jan Beulich
2014-09-29 15:56         ` Boris Ostrovsky
2014-09-25 19:28 ` [PATCH v12 for-xen-4.5 10/20] x86/VPMU: Make vpmu not HVM-specific Boris Ostrovsky
2014-09-25 19:28 ` [PATCH v12 for-xen-4.5 11/20] x86/VPMU: Interface for setting PMU mode and flags Boris Ostrovsky
2014-09-26 21:04   ` Tian, Kevin
2014-09-26 21:24     ` Boris Ostrovsky
2014-09-26 22:00   ` Daniel De Graaf
2014-09-26 22:26     ` Boris Ostrovsky
2014-09-29 13:25   ` Dietmar Hahn
2014-09-29 13:56     ` Boris Ostrovsky
2014-09-29 14:03       ` Dietmar Hahn
2014-09-29 13:59     ` Jan Beulich
2014-09-29 14:05       ` Dietmar Hahn
2014-09-29 15:14   ` Jan Beulich
2014-09-29 15:34     ` Boris Ostrovsky
2014-10-01  0:48   ` Tian, Kevin
2014-10-01  0:56     ` Boris Ostrovsky
2014-09-25 19:28 ` [PATCH v12 for-xen-4.5 12/20] x86/VPMU: Initialize PMU for PV(H) guests Boris Ostrovsky
2014-09-26 22:16   ` Daniel De Graaf
2014-09-26 22:23     ` Boris Ostrovsky
2014-09-29 15:25   ` Jan Beulich
2014-09-29 15:41     ` Boris Ostrovsky
2014-09-29 15:42       ` Jan Beulich
2014-09-29 16:04         ` Boris Ostrovsky
2014-09-29 16:10           ` Jan Beulich
2014-10-01  0:16   ` Tian, Kevin
2014-09-25 19:28 ` [PATCH v12 for-xen-4.5 13/20] x86/VPMU: Save VPMU state for PV guests during context switch Boris Ostrovsky
2014-09-29 15:52   ` Jan Beulich
2014-09-25 19:28 ` [PATCH v12 for-xen-4.5 14/20] x86/VPMU: When handling MSR accesses, leave fault injection to callers Boris Ostrovsky
2014-09-25 19:28 ` [PATCH v12 for-xen-4.5 15/20] x86/VPMU: Add support for PMU register handling on PV guests Boris Ostrovsky
2014-09-26 16:34   ` Konrad Rzeszutek Wilk
2014-09-26 16:44     ` Boris Ostrovsky
2014-09-26 16:49       ` Konrad Rzeszutek Wilk
2014-09-29 16:04   ` Jan Beulich
2014-10-01  0:17   ` Tian, Kevin
2014-09-25 19:28 ` [PATCH v12 for-xen-4.5 16/20] x86/VPMU: Handle PMU interrupts for " Boris Ostrovsky
2014-09-26 22:09   ` Daniel De Graaf
2014-09-30  8:11   ` Jan Beulich
2014-09-30 15:07     ` Boris Ostrovsky
2014-09-30 15:44       ` Jan Beulich
2014-09-30 16:37         ` Boris Ostrovsky
2014-10-01  6:49           ` Jan Beulich
2014-10-01 12:53             ` Boris Ostrovsky
2014-10-01 13:18               ` Jan Beulich
2014-10-01 14:08                 ` Boris Ostrovsky
2014-10-01 14:26                   ` Jan Beulich
2014-10-01 18:06                     ` Boris Ostrovsky
2014-10-02  6:56                       ` Jan Beulich
2014-10-02 13:53                         ` Boris Ostrovsky
2014-09-25 19:28 ` [PATCH v12 for-xen-4.5 17/20] x86/VPMU: Merge vpmu_rdmsr and vpmu_wrmsr Boris Ostrovsky
2014-09-30  8:13   ` Jan Beulich
2014-09-25 19:28 ` [PATCH v12 for-xen-4.5 18/20] x86/VPMU: Add privileged PMU mode Boris Ostrovsky
2014-09-30  8:18   ` Jan Beulich
2014-09-30 15:16     ` Boris Ostrovsky
2014-09-25 19:28 ` [PATCH v12 for-xen-4.5 19/20] x86/VPMU: NMI-based VPMU support Boris Ostrovsky
2014-09-30  8:37   ` Jan Beulich
2014-10-01  0:18     ` Boris Ostrovsky
2014-10-01  7:32       ` Jan Beulich
2014-09-25 19:28 ` [PATCH v12 for-xen-4.5 20/20] x86/VPMU: Move VPMU files up from hvm/ directory Boris Ostrovsky
2014-09-30  8:40   ` Jan Beulich
2014-09-26 17:03 ` [PATCH v12 for-xen-4.5 00/20] x86/PMU: Xen PMU PV(H) support Konrad Rzeszutek Wilk
2014-09-29 13:28 ` Dietmar Hahn
