* [PATCH v9 00/20] x86/PMU: Xen PMU PV(H) support
@ 2014-08-08 16:55 Boris Ostrovsky
  2014-08-08 16:55 ` [PATCH v9 01/20] common/symbols: Export hypervisor symbols to privileged guest Boris Ostrovsky
                   ` (20 more replies)
  0 siblings, 21 replies; 49+ messages in thread
From: Boris Ostrovsky @ 2014-08-08 16:55 UTC (permalink / raw)
  To: jbeulich, kevin.tian, suravee.suthikulpanit
  Cc: keir, andrew.cooper3, tim, xen-devel, jun.nakajima, boris.ostrovsky

Here is the ninth version of PV(H) PMU patches.

Changes in v9:

* Restore VPMU context after context_saved() is called in
  context_switch(). This is needed because vpmu_load() may end up
  calling vmx_vmcs_try_enter()->vcpu_pause() and that needs is_running
  to be correctly set/cleared. (patch 18, dropped review acks)
* Added patch 2 to properly manage VPMU_CONTEXT_LOADED
* Addressed most of Jan's comments.
  o Keep track of time in vpmu_force_context_switch() to properly break
    out of a loop when using hypercall continuations
  o Fixed logic in calling vpmu_do_msr() in emulate_privileged_op()
  o Cleaned up vpmu_interrupt() wrt vcpu variable names to (hopefully)
    make it more clear which vcpu we are using
  o Cleaned up vpmu_do_wrmsr()
  o Did *not* replace sizeof(uint64_t) with sizeof(variable) in
    amd_vpmu_initialise(): throughout the code registers are declared as
    uint64_t and if we are to add a new type (e.g. reg_t) this should be
    done in a separate patch, unrelated to this series.
  o Various more minor cleanups and code style fixes
  
Changes in v8:

* Cleaned up the definitions of struct xenpf_symdata and xen_pmu_params a bit
* Added compat checks for vpmu structures
* Converted vpmu flag manipulation macros to inline routines
* Reimplemented vpmu_unload_all() to avoid long loops
* Reworked PMU fault generation and handling (new patch #12)
* Added checks for domain->vcpu[] non-NULLness
* Added more comments, renamed some routines and macros, code style cleanup


Changes in v7:

* When reading hypervisor symbols make the caller pass buffer length
  (as opposed to having this length be part of the API). Make the
  hypervisor buffer static, make xensyms_read() return zero-length
  string on end-of-symbols. Make 'type' field of xenpf_symdata a char,
  drop compat_pf_symdata definition.
* Spread PVH support across patches as opposed to lumping it into a
  separate patch
* Rename vpmu_is_set_all() to vpmu_are_all_set()
* Split VPMU cleanup patch in two
* Use memmove when copying VMX guest and host MSRs
* Make padding of xen_arch_pmu's context union a constant that does not
  depend on arch context size.
* Set interface version to 0.1
* Check pointer validity in pvpmu_init/destroy()
* Fixed crash in core2_vpmu_dump()
* Fixed crash in vmx_add_msr()
* Break handling of Intel and AMD MSRs in traps.c into separate cases
* Pass full CS selector to guests
* Add lock in pvpmu init code to prevent potential race


Changes in v6:

* Two new patches:
  o Merge VMX MSR add/remove routines in vmcs.c (patch 5)
  o Merge VPMU read/write MSR routines in vpmu.c (patch 14)
* Check for pending NMI softirq after saving VPMU context to prevent a newly-scheduled
  guest from overwriting sampled_vcpu written by the de-scheduled VCPU.
* Keep track of enabled counters on Intel. This was removed in earlier patches and
  was a mistake. As a result of this change, struct vpmu will have a pointer to private
  context data (i.e. data that is not exposed to a PV(H) guest). Use this private pointer
  on SVM as well for storing MSR bitmap status (it was unnecessarily exposed to PV guests
  earlier).
  Dropped Reviewed-by: and Tested-by: tags from patch 4 since it needs to be reviewed
  again (core2_vpmu_do_wrmsr() routine, mostly)
* Replaced references to dom0 with hardware_domain (and is_control_domain with
  is_hardware_domain for consistency)
* Prevent non-privileged domains from reading PMU MSRs in VPMU_PRIV_MODE
* Reverted unnecessary changes in vpmu_initialise()'s switch statement
* Fixed comment in vpmu_do_interrupt


Changes in v5:

* Dropped patch number 2 ("Stop AMD counters when called from vpmu_save_force()")
  as no longer needed
* Added patch number 2 that marks context as loaded before PMU registers are
  loaded. This prevents a situation where a PMU interrupt may occur while the
  context is still viewed as not loaded. (This is really a bug fix for existing
  VPMU code)
* Renamed xenpmu.h files to pmu.h
* More careful use of is_pv_domain(), is_hvm_domain(), is_pvh_domain() and
  has_hvm_container_domain(). Also explicitly disabled support for PVH until
  patch 16 to make the distinction between uses of the above macros clearer.
* Added support for disabling VPMU support during runtime.
* Disable VPMUs for non-privileged domains when switching to privileged
  profiling mode
* Added ARM stub for xen_arch_pmu_t
* Separated vpmu_mode from vpmu_features
* Moved CS register query to make sure we use the appropriate query mechanism for
  various guest types.
* LVTPC is now set from value in shared area, not copied from dom0
* Various code and comments cleanup as suggested by Jan.

Changes in v4:

* Added support for PVH guests:
  o changes in pvpmu_init() to accommodate both PV and PVH guests, still in patch 10
  o more careful use of is_hvm_domain
  o Additional patch (16)
* Moved HVM interrupt handling out of vpmu_do_interrupt() for NMI-safe handling
* Fixed dom0's VCPU selection in privileged mode
* Added a cast in the register copy for 32-bit PV guests' cpu_user_regs_t in vpmu_do_interrupt().
  (don't want to expose compat_cpu_user_regs in a public header)
* Renamed public structures by prefixing them with "xen_"
* Added an entry for xenpf_symdata in xlat.lst
* Fixed pv_cpuid check for vpmu-specific cpuid adjustments
* Various code style fixes
* Eliminated anonymous unions
* Added more verbiage to NMI patch description


Changes in v3:

* Moved PMU MSR banks out from architectural context data structures to allow
for future expansion without protocol changes
* PMU interrupts can be either NMIs or regular vector interrupts (the latter
is the default)
* Context is now marked as PMU_CACHED by the hypervisor code to avoid certain
race conditions with the guest
* Fixed races with PV guest in MSR access handlers
* More Intel VPMU cleanup
* Moved NMI-unsafe code from NMI handler
* Dropped changes to vcpu->is_running
* Added LVTPC apic handling (cached for PV guests)
* Separated privileged profiling mode into a standalone patch
* Separated NMI handling into a standalone patch


Changes in v2:

* Xen symbols are exported as a data structure (as opposed to a set of formatted
strings in v1). Even though one symbol per hypercall is returned, performance
appears to be acceptable: reading the whole file from dom0 userland takes on average
about twice as long as reading /proc/kallsyms
* More cleanup of Intel VPMU code to simplify publicly exported structures
* There are architecture-independent and x86-specific public include files (ARM
has a stub)
* General cleanup of public include files to make them more presentable (and
to make auto doc generation better)
* Setting of vcpu->is_running is now done on ARM in schedule_tail as well (making
changes to common/schedule.c architecture-independent). Note that this is not
tested since I don't have access to ARM hardware.
* PCPU ID of interrupted processor is now passed to PV guest


The following patch series adds PMU support in Xen for PV(H)
guests. There is a companion patchset for the Linux kernel. In addition,
another set of changes will be provided (later) for userland perf
code.

This version has the following limitations:
* For accurate profiling of dom0/Xen, dom0 VCPUs should be pinned.
* Hypervisor code is only profiled on processors that have running dom0 VCPUs
on them.
* No backtrace support.
* Will fail to load under XSM: we ran out of bits in the permissions vector and
this needs to be fixed separately.

A few notes that may help reviewing:

* A shared data structure (xenpmu_data_t) between each PV VCPU and hypervisor
CPU is used for passing register values as well as PMU state at the time of the
PMU interrupt.
* PMU interrupts are taken by the hypervisor either as NMIs or regular vector
interrupts for both HVM and PV(H). The interrupts are sent as NMIs to HVM guests
and as virtual interrupts to PV(H) guests.
* A PV guest's interrupt handler does not read/write PMU MSRs directly. Instead, it
accesses xenpmu_data_t and flushes it to HW before returning (see the sketch after
these notes).
* PMU mode is controlled at runtime via /sys/hypervisor/pmu/pmu/{pmu_mode,pmu_flags}
in addition to the 'vpmu' boot option (which is preserved for backward compatibility).
The following modes are provided:
  * disable: VPMU is off
  * enable: VPMU is on. Guests can profile themselves, dom0 profiles itself and Xen
  * priv_enable: dom0 only profiling. dom0 collects samples for everyone. Sampling
    in guests is suspended.
* /proc/xen/xensyms file exports hypervisor's symbols to dom0 (similar to
/proc/kallsyms)
* VPMU infrastructure is now used for HVM, PV and PVH and therefore has been moved
up from the hvm subtree
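
The sketch below illustrates the guest-side flow described in the notes above:
the PV(H) interrupt handler consumes the shared page and never touches PMU MSRs
directly. Every structure field and helper name used here
(this_cpu_xenpmu_page(), perf_process_sample(), xenpmu_flush_to_hw()) is an
illustrative assumption, not code from this series or the companion Linux
patchset.

/* Illustrative only: layout and helpers are assumptions, not the actual
 * xen_pmu_data definition from public/pmu.h. */
struct xenpmu_data_sketch {
    unsigned long ip;            /* interrupted instruction pointer       */
    unsigned long flags;         /* mode/flags captured by the hypervisor */
    unsigned long long ctr[8];   /* cached counter values                 */
};

struct xenpmu_data_sketch *this_cpu_xenpmu_page(void);  /* assumed helper */
void perf_process_sample(unsigned long ip);             /* assumed helper */
void xenpmu_flush_to_hw(void);                          /* assumed helper */

/* PV(H) guest PMU interrupt handler: no direct PMU MSR accesses. */
void xen_pmu_interrupt(void)
{
    struct xenpmu_data_sketch *xd = this_cpu_xenpmu_page();

    /* The hypervisor filled in register and counter state at the time of
     * the PMU interrupt; hand it to the guest's perf machinery. */
    perf_process_sample(xd->ip);

    /* Counter reprogramming went into the shared page only; ask for it
     * to be flushed to hardware before returning. */
    xenpmu_flush_to_hw();
}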


Boris Ostrovsky (20):
  common/symbols: Export hypervisor symbols to privileged guest
  x86/VPMU: Manage VPMU_CONTEXT_SAVE flag in vpmu_save_force()
  x86/VPMU: Set MSR bitmaps only for HVM/PVH guests
  x86/VPMU: Make vpmu macros a bit more efficient
  intel/VPMU: Clean up Intel VPMU code
  vmx: Merge MSR management routines
  x86/VPMU: Handle APIC_LVTPC accesses
  intel/VPMU: MSR_CORE_PERF_GLOBAL_CTRL should be initialized to zero
  x86/VPMU: Add public xenpmu.h
  x86/VPMU: Make vpmu not HVM-specific
  x86/VPMU: Interface for setting PMU mode and flags
  x86/VPMU: Initialize PMU for PV(H) guests
  x86/VPMU: When handling MSR accesses, leave fault injection to callers
  x86/VPMU: Add support for PMU register handling on PV guests
  x86/VPMU: Handle PMU interrupts for PV guests
  x86/VPMU: Merge vpmu_rdmsr and vpmu_wrmsr
  x86/VPMU: Add privileged PMU mode
  x86/VPMU: Save VPMU state for PV guests during context switch
  x86/VPMU: NMI-based VPMU support
  x86/VPMU: Move VPMU files up from hvm/ directory

 xen/arch/x86/Makefile                    |   1 +
 xen/arch/x86/domain.c                    |  21 +-
 xen/arch/x86/hvm/Makefile                |   1 -
 xen/arch/x86/hvm/hvm.c                   |   3 +-
 xen/arch/x86/hvm/svm/Makefile            |   1 -
 xen/arch/x86/hvm/svm/svm.c               |  14 +-
 xen/arch/x86/hvm/svm/vpmu.c              | 496 ----------------
 xen/arch/x86/hvm/vlapic.c                |   5 +-
 xen/arch/x86/hvm/vmx/Makefile            |   1 -
 xen/arch/x86/hvm/vmx/vmcs.c              | 114 ++--
 xen/arch/x86/hvm/vmx/vmx.c               |  41 +-
 xen/arch/x86/hvm/vmx/vpmu_core2.c        | 937 -------------------------------
 xen/arch/x86/hvm/vpmu.c                  | 265 ---------
 xen/arch/x86/oprofile/op_model_ppro.c    |   8 +-
 xen/arch/x86/platform_hypercall.c        |  33 ++
 xen/arch/x86/traps.c                     |  60 +-
 xen/arch/x86/vpmu.c                      | 828 +++++++++++++++++++++++++++
 xen/arch/x86/vpmu_amd.c                  | 517 +++++++++++++++++
 xen/arch/x86/vpmu_intel.c                | 934 ++++++++++++++++++++++++++++++
 xen/arch/x86/x86_64/compat/entry.S       |   4 +
 xen/arch/x86/x86_64/entry.S              |   4 +
 xen/common/event_channel.c               |   1 +
 xen/common/symbols.c                     |  54 ++
 xen/include/Makefile                     |   1 +
 xen/include/asm-x86/domain.h             |   2 +
 xen/include/asm-x86/hvm/vcpu.h           |   3 -
 xen/include/asm-x86/hvm/vmx/vmcs.h       |  10 +-
 xen/include/asm-x86/hvm/vmx/vpmu_core2.h |  51 --
 xen/include/asm-x86/hvm/vpmu.h           | 104 ----
 xen/include/asm-x86/vpmu.h               | 118 ++++
 xen/include/public/arch-arm.h            |   3 +
 xen/include/public/arch-x86/pmu.h        |  75 +++
 xen/include/public/platform.h            |  19 +
 xen/include/public/pmu.h                 |  93 +++
 xen/include/public/xen.h                 |   2 +
 xen/include/xen/hypercall.h              |   4 +
 xen/include/xen/softirq.h                |   1 +
 xen/include/xen/symbols.h                |   3 +
 xen/include/xlat.lst                     |   5 +
 39 files changed, 2906 insertions(+), 1931 deletions(-)
 delete mode 100644 xen/arch/x86/hvm/svm/vpmu.c
 delete mode 100644 xen/arch/x86/hvm/vmx/vpmu_core2.c
 delete mode 100644 xen/arch/x86/hvm/vpmu.c
 create mode 100644 xen/arch/x86/vpmu.c
 create mode 100644 xen/arch/x86/vpmu_amd.c
 create mode 100644 xen/arch/x86/vpmu_intel.c
 delete mode 100644 xen/include/asm-x86/hvm/vmx/vpmu_core2.h
 delete mode 100644 xen/include/asm-x86/hvm/vpmu.h
 create mode 100644 xen/include/asm-x86/vpmu.h
 create mode 100644 xen/include/public/arch-x86/pmu.h
 create mode 100644 xen/include/public/pmu.h

-- 
1.8.1.4


* [PATCH v9 01/20] common/symbols: Export hypervisor symbols to privileged guest
  2014-08-08 16:55 [PATCH v9 00/20] x86/PMU: Xen PMU PV(H) support Boris Ostrovsky
@ 2014-08-08 16:55 ` Boris Ostrovsky
  2014-08-08 16:55 ` [PATCH v9 02/20] x86/VPMU: Manage VPMU_CONTEXT_SAVE flag in vpmu_save_force() Boris Ostrovsky
                   ` (19 subsequent siblings)
  20 siblings, 0 replies; 49+ messages in thread
From: Boris Ostrovsky @ 2014-08-08 16:55 UTC (permalink / raw)
  To: jbeulich, kevin.tian, suravee.suthikulpanit
  Cc: keir, andrew.cooper3, tim, xen-devel, jun.nakajima, boris.ostrovsky

Export Xen's symbols as {<address><type><name>} triplets via the new
XENPF_get_symbol hypercall.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
Tested-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
---
 xen/arch/x86/platform_hypercall.c | 33 ++++++++++++++++++++++++
 xen/common/symbols.c              | 54 +++++++++++++++++++++++++++++++++++++++
 xen/include/public/platform.h     | 19 ++++++++++++++
 xen/include/xen/symbols.h         |  3 +++
 xen/include/xlat.lst              |  1 +
 5 files changed, 110 insertions(+)

diff --git a/xen/arch/x86/platform_hypercall.c b/xen/arch/x86/platform_hypercall.c
index 2162811..68bc6d9 100644
--- a/xen/arch/x86/platform_hypercall.c
+++ b/xen/arch/x86/platform_hypercall.c
@@ -23,6 +23,7 @@
 #include <xen/cpu.h>
 #include <xen/pmstat.h>
 #include <xen/irq.h>
+#include <xen/symbols.h>
 #include <asm/current.h>
 #include <public/platform.h>
 #include <acpi/cpufreq/processor_perf.h>
@@ -601,6 +602,38 @@ ret_t do_platform_op(XEN_GUEST_HANDLE_PARAM(xen_platform_op_t) u_xenpf_op)
     }
     break;
 
+    case XENPF_get_symbol:
+    {
+        static char name[KSYM_NAME_LEN + 1]; /* protected by xenpf_lock */
+        XEN_GUEST_HANDLE(char) nameh;
+        uint32_t namelen, copylen;
+
+        guest_from_compat_handle(nameh, op->u.symdata.name);
+
+        ret = xensyms_read(&op->u.symdata.symnum, &op->u.symdata.type,
+                           &op->u.symdata.address, name);
+
+        namelen = strlen(name) + 1;
+
+        if ( namelen > op->u.symdata.namelen )
+        {
+            /* Caller's buffer is too small for the whole string */
+            if ( op->u.symdata.namelen )
+                name[op->u.symdata.namelen] = '\0';
+            copylen = op->u.symdata.namelen;
+        }
+        else
+            copylen = namelen;
+
+        op->u.symdata.namelen = namelen;
+
+        if ( !ret && copy_to_guest(nameh, name, copylen) )
+            ret = -EFAULT;
+        if ( !ret && __copy_field_to_guest(u_xenpf_op, op, u.symdata) )
+            ret = -EFAULT;
+    }
+    break;
+
     default:
         ret = -ENOSYS;
         break;
diff --git a/xen/common/symbols.c b/xen/common/symbols.c
index bc2fde6..2c0942d 100644
--- a/xen/common/symbols.c
+++ b/xen/common/symbols.c
@@ -17,6 +17,8 @@
 #include <xen/lib.h>
 #include <xen/string.h>
 #include <xen/spinlock.h>
+#include <public/platform.h>
+#include <xen/guest_access.h>
 
 #ifdef SYMBOLS_ORIGIN
 extern const unsigned int symbols_offsets[1];
@@ -148,3 +150,55 @@ const char *symbols_lookup(unsigned long addr,
     *offset = addr - symbols_address(low);
     return namebuf;
 }
+
+/*
+ * Get symbol type information. This is encoded as a single char at the
+ * beginning of the symbol name.
+ */
+static char symbols_get_symbol_type(unsigned int off)
+{
+    /*
+     * Get just the first code, look it up in the token table,
+     * and return the first char from this token.
+     */
+    return symbols_token_table[symbols_token_index[symbols_names[off + 1]]];
+}
+
+int xensyms_read(uint32_t *symnum, char *type,
+                 uint64_t *address, char *name)
+{
+    /*
+     * Symbols are most likely accessed sequentially so we remember position
+     * from previous read. This can help us avoid the extra call to
+     * get_symbol_offset().
+     */
+    static uint64_t next_symbol, next_offset;
+    static DEFINE_SPINLOCK(symbols_mutex);
+
+    if ( *symnum > symbols_num_syms )
+        return -ERANGE;
+    if ( *symnum == symbols_num_syms )
+    {
+        /* No more symbols */
+        name[0] = '\0';
+        return 0;
+    }
+
+    spin_lock(&symbols_mutex);
+
+    if ( *symnum == 0 )
+        next_offset = next_symbol = 0;
+    if ( next_symbol != *symnum )
+        /* Non-sequential access */
+        next_offset = get_symbol_offset(*symnum);
+
+    *type = symbols_get_symbol_type(next_offset);
+    next_offset = symbols_expand_symbol(next_offset, name);
+    *address = symbols_offsets[*symnum] + SYMBOLS_ORIGIN;
+
+    next_symbol = ++*symnum;
+
+    spin_unlock(&symbols_mutex);
+
+    return 0;
+}
diff --git a/xen/include/public/platform.h b/xen/include/public/platform.h
index 053b9fa..abf12a1 100644
--- a/xen/include/public/platform.h
+++ b/xen/include/public/platform.h
@@ -527,6 +527,24 @@ struct xenpf_core_parking {
 typedef struct xenpf_core_parking xenpf_core_parking_t;
 DEFINE_XEN_GUEST_HANDLE(xenpf_core_parking_t);
 
+#define XENPF_get_symbol   61
+struct xenpf_symdata {
+    /* IN/OUT variables */
+    uint32_t namelen; /* IN:  size of name buffer                       */
+                      /* OUT: strlen(name) of hypervisor symbol (may be */
+                      /*      larger than what's been copied to guest)  */
+    uint32_t symnum;  /* IN:  Symbol to read                            */
+                      /* OUT: Next available symbol. If same as IN then */
+                      /*      we reached the end                        */
+
+    /* OUT variables */
+    char type;
+    uint64_t address;
+    XEN_GUEST_HANDLE(char) name;
+};
+typedef struct xenpf_symdata xenpf_symdata_t;
+DEFINE_XEN_GUEST_HANDLE(xenpf_symdata_t);
+
 /*
  * ` enum neg_errnoval
  * ` HYPERVISOR_platform_op(const struct xen_platform_op*);
@@ -553,6 +571,7 @@ struct xen_platform_op {
         struct xenpf_cpu_hotadd        cpu_add;
         struct xenpf_mem_hotadd        mem_add;
         struct xenpf_core_parking      core_parking;
+        struct xenpf_symdata           symdata;
         uint8_t                        pad[128];
     } u;
 };
diff --git a/xen/include/xen/symbols.h b/xen/include/xen/symbols.h
index 87cd77d..1fa0537 100644
--- a/xen/include/xen/symbols.h
+++ b/xen/include/xen/symbols.h
@@ -11,4 +11,7 @@ const char *symbols_lookup(unsigned long addr,
                            unsigned long *offset,
                            char *namebuf);
 
+int xensyms_read(uint32_t *symnum, char *type,
+                 uint64_t *address, char *name);
+
 #endif /*_XEN_SYMBOLS_H*/
diff --git a/xen/include/xlat.lst b/xen/include/xlat.lst
index 9a35dd7..c8fafef 100644
--- a/xen/include/xlat.lst
+++ b/xen/include/xlat.lst
@@ -86,6 +86,7 @@
 ?	processor_px			platform.h
 !	psd_package			platform.h
 ?	xenpf_enter_acpi_sleep		platform.h
+!	xenpf_symdata			platform.h
 ?	xenpf_pcpuinfo			platform.h
 ?	xenpf_pcpu_version		platform.h
 !	sched_poll			sched.h
-- 
1.8.1.4
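
For reviewers, the snippet below is a hypothetical dom0-side walk over the new
hypercall, written purely from the xenpf_symdata semantics above. The
issue_platform_op() wrapper and the header path are assumptions standing in for
whatever privcmd/libxc plumbing a real caller would use; it is a sketch, not
part of the patch.

#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <xen/platform.h>     /* xen_platform_op, XENPF_get_symbol (path assumed) */

#define NAME_BUF_LEN 128      /* arbitrary local buffer size */

int issue_platform_op(struct xen_platform_op *op);   /* assumed wrapper */

static void dump_xen_symbols(void)
{
    char name[NAME_BUF_LEN];
    struct xen_platform_op op;
    uint32_t symnum = 0;

    for ( ;; )
    {
        memset(&op, 0, sizeof(op));
        memset(name, 0, sizeof(name));
        op.cmd = XENPF_get_symbol;
        op.interface_version = XENPF_INTERFACE_VERSION;
        op.u.symdata.symnum = symnum;
        /* Leave one byte spare so a truncated copy stays NUL-terminated. */
        op.u.symdata.namelen = sizeof(name) - 1;
        set_xen_guest_handle(op.u.symdata.name, name);

        if ( issue_platform_op(&op) != 0 )
            break;                    /* e.g. -ERANGE for a bad symnum */

        if ( name[0] == '\0' )
            break;                    /* zero-length name: end of symbols */

        printf("%016llx %c %s\n",
               (unsigned long long)op.u.symdata.address,
               op.u.symdata.type, name);

        symnum = op.u.symdata.symnum; /* OUT: next symbol to request */
    }
}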


* [PATCH v9 02/20] x86/VPMU: Manage VPMU_CONTEXT_SAVE flag in vpmu_save_force()
  2014-08-08 16:55 [PATCH v9 00/20] x86/PMU: Xen PMU PV(H) support Boris Ostrovsky
  2014-08-08 16:55 ` [PATCH v9 01/20] common/symbols: Export hypervisor symbols to privileged guest Boris Ostrovsky
@ 2014-08-08 16:55 ` Boris Ostrovsky
  2014-08-11 13:28   ` Jan Beulich
  2014-08-08 16:55 ` [PATCH v9 03/20] x86/VPMU: Set MSR bitmaps only for HVM/PVH guests Boris Ostrovsky
                   ` (18 subsequent siblings)
  20 siblings, 1 reply; 49+ messages in thread
From: Boris Ostrovsky @ 2014-08-08 16:55 UTC (permalink / raw)
  To: jbeulich, kevin.tian, suravee.suthikulpanit
  Cc: keir, andrew.cooper3, tim, xen-devel, jun.nakajima, boris.ostrovsky

vpmu_load() may leave VPMU_CONTEXT_SAVE set after calling vpmu_save_force() if
the context is not loaded.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
 xen/arch/x86/hvm/vpmu.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c
index 63765fa..26e971b 100644
--- a/xen/arch/x86/hvm/vpmu.c
+++ b/xen/arch/x86/hvm/vpmu.c
@@ -130,6 +130,8 @@ static void vpmu_save_force(void *arg)
     if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) )
         return;
 
+    vpmu_set(vpmu, VPMU_CONTEXT_SAVE);
+
     if ( vpmu->arch_vpmu_ops )
         (void)vpmu->arch_vpmu_ops->arch_vpmu_save(v);
 
@@ -178,7 +180,6 @@ void vpmu_load(struct vcpu *v)
          */
         if ( vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) )
         {
-            vpmu_set(vpmu, VPMU_CONTEXT_SAVE);
             on_selected_cpus(cpumask_of(vpmu->last_pcpu),
                              vpmu_save_force, (void *)v, 1);
             vpmu_reset(vpmu, VPMU_CONTEXT_LOADED);
@@ -195,7 +196,6 @@ void vpmu_load(struct vcpu *v)
         vpmu = vcpu_vpmu(prev);
 
         /* Someone ran here before us */
-        vpmu_set(vpmu, VPMU_CONTEXT_SAVE);
         vpmu_save_force(prev);
         vpmu_reset(vpmu, VPMU_CONTEXT_LOADED);
 
-- 
1.8.1.4


* [PATCH v9 03/20] x86/VPMU: Set MSR bitmaps only for HVM/PVH guests
  2014-08-08 16:55 [PATCH v9 00/20] x86/PMU: Xen PMU PV(H) support Boris Ostrovsky
  2014-08-08 16:55 ` [PATCH v9 01/20] common/symbols: Export hypervisor symbols to privileged guest Boris Ostrovsky
  2014-08-08 16:55 ` [PATCH v9 02/20] x86/VPMU: Manage VPMU_CONTEXT_SAVE flag in vpmu_save_force() Boris Ostrovsky
@ 2014-08-08 16:55 ` Boris Ostrovsky
  2014-08-08 16:55 ` [PATCH v9 04/20] x86/VPMU: Make vpmu macros a bit more efficient Boris Ostrovsky
                   ` (17 subsequent siblings)
  20 siblings, 0 replies; 49+ messages in thread
From: Boris Ostrovsky @ 2014-08-08 16:55 UTC (permalink / raw)
  To: jbeulich, kevin.tian, suravee.suthikulpanit
  Cc: keir, andrew.cooper3, tim, xen-devel, jun.nakajima, boris.ostrovsky

In preparation for making VPMU code shared with PV, make sure that we update
MSR bitmaps only for HVM/PVH guests.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
---
 xen/arch/x86/hvm/svm/vpmu.c       | 21 +++++++++++++--------
 xen/arch/x86/hvm/vmx/vpmu_core2.c |  8 +++++---
 2 files changed, 18 insertions(+), 11 deletions(-)

diff --git a/xen/arch/x86/hvm/svm/vpmu.c b/xen/arch/x86/hvm/svm/vpmu.c
index 3ac7d53..4742e72 100644
--- a/xen/arch/x86/hvm/svm/vpmu.c
+++ b/xen/arch/x86/hvm/svm/vpmu.c
@@ -244,7 +244,8 @@ static int amd_vpmu_save(struct vcpu *v)
 
     context_save(v);
 
-    if ( !vpmu_is_set(vpmu, VPMU_RUNNING) && ctx->msr_bitmap_set )
+    if ( !vpmu_is_set(vpmu, VPMU_RUNNING) &&
+         has_hvm_container_domain(v->domain) && ctx->msr_bitmap_set )
         amd_vpmu_unset_msr_bitmap(v);
 
     return 1;
@@ -284,8 +285,9 @@ static int amd_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
 
     /* For all counters, enable guest only mode for HVM guest */
-    if ( (get_pmu_reg_type(msr) == MSR_TYPE_CTRL) &&
-        !(is_guest_mode(msr_content)) )
+    if ( has_hvm_container_domain(v->domain) &&
+         (get_pmu_reg_type(msr) == MSR_TYPE_CTRL) &&
+         !is_guest_mode(msr_content) )
     {
         set_guest_mode(msr_content);
     }
@@ -300,8 +302,9 @@ static int amd_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
         apic_write(APIC_LVTPC, PMU_APIC_VECTOR);
         vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR;
 
-        if ( !((struct amd_vpmu_context *)vpmu->context)->msr_bitmap_set )
-            amd_vpmu_set_msr_bitmap(v);
+        if ( has_hvm_container_domain(v->domain) &&
+             !((struct amd_vpmu_context *)vpmu->context)->msr_bitmap_set )
+             amd_vpmu_set_msr_bitmap(v);
     }
 
     /* stop saving & restore if guest stops first counter */
@@ -311,8 +314,9 @@ static int amd_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
         apic_write(APIC_LVTPC, PMU_APIC_VECTOR | APIC_LVT_MASKED);
         vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR | APIC_LVT_MASKED;
         vpmu_reset(vpmu, VPMU_RUNNING);
-        if ( ((struct amd_vpmu_context *)vpmu->context)->msr_bitmap_set )
-            amd_vpmu_unset_msr_bitmap(v);
+        if ( has_hvm_container_domain(v->domain) &&
+             ((struct amd_vpmu_context *)vpmu->context)->msr_bitmap_set )
+             amd_vpmu_unset_msr_bitmap(v);
         release_pmu_ownship(PMU_OWNER_HVM);
     }
 
@@ -403,7 +407,8 @@ static void amd_vpmu_destroy(struct vcpu *v)
     if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
         return;
 
-    if ( ((struct amd_vpmu_context *)vpmu->context)->msr_bitmap_set )
+    if ( has_hvm_container_domain(v->domain) &&
+         ((struct amd_vpmu_context *)vpmu->context)->msr_bitmap_set )
         amd_vpmu_unset_msr_bitmap(v);
 
     xfree(vpmu->context);
diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c
index ccd14d9..9664035 100644
--- a/xen/arch/x86/hvm/vmx/vpmu_core2.c
+++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c
@@ -335,7 +335,8 @@ static int core2_vpmu_save(struct vcpu *v)
     __core2_vpmu_save(v);
 
     /* Unset PMU MSR bitmap to trap lazy load. */
-    if ( !vpmu_is_set(vpmu, VPMU_RUNNING) && cpu_has_vmx_msr_bitmap )
+    if ( !vpmu_is_set(vpmu, VPMU_RUNNING) &&
+         has_hvm_container_domain(v->domain) && cpu_has_vmx_msr_bitmap )
         core2_vpmu_unset_msr_bitmap(v->arch.hvm_vmx.msr_bitmap);
 
     return 1;
@@ -448,7 +449,8 @@ static int core2_vpmu_msr_common_check(u32 msr_index, int *type, int *index)
     {
         __core2_vpmu_load(current);
         vpmu_set(vpmu, VPMU_CONTEXT_LOADED);
-        if ( cpu_has_vmx_msr_bitmap )
+        if ( has_hvm_container_domain(current->domain) &&
+             cpu_has_vmx_msr_bitmap )
             core2_vpmu_set_msr_bitmap(current->arch.hvm_vmx.msr_bitmap);
     }
     return 1;
@@ -815,7 +817,7 @@ static void core2_vpmu_destroy(struct vcpu *v)
         return;
     xfree(core2_vpmu_cxt->pmu_enable);
     xfree(vpmu->context);
-    if ( cpu_has_vmx_msr_bitmap )
+    if ( has_hvm_container_domain(v->domain) && cpu_has_vmx_msr_bitmap )
         core2_vpmu_unset_msr_bitmap(v->arch.hvm_vmx.msr_bitmap);
     release_pmu_ownship(PMU_OWNER_HVM);
     vpmu_reset(vpmu, VPMU_CONTEXT_ALLOCATED);
-- 
1.8.1.4


* [PATCH v9 04/20] x86/VPMU: Make vpmu macros a bit more efficient
  2014-08-08 16:55 [PATCH v9 00/20] x86/PMU: Xen PMU PV(H) support Boris Ostrovsky
                   ` (2 preceding siblings ...)
  2014-08-08 16:55 ` [PATCH v9 03/20] x86/VPMU: Set MSR bitmaps only for HVM/PVH guests Boris Ostrovsky
@ 2014-08-08 16:55 ` Boris Ostrovsky
  2014-08-08 16:55 ` [PATCH v9 05/20] intel/VPMU: Clean up Intel VPMU code Boris Ostrovsky
                   ` (16 subsequent siblings)
  20 siblings, 0 replies; 49+ messages in thread
From: Boris Ostrovsky @ 2014-08-08 16:55 UTC (permalink / raw)
  To: jbeulich, kevin.tian, suravee.suthikulpanit
  Cc: keir, andrew.cooper3, tim, xen-devel, jun.nakajima, boris.ostrovsky

Introduce vpmu_are_all_set(), which allows testing multiple bits at once. Convert
the macros into inlines for better compiler checking.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
Tested-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
---
 xen/arch/x86/hvm/vmx/vpmu_core2.c |  5 +----
 xen/arch/x86/hvm/vpmu.c           |  3 +--
 xen/include/asm-x86/hvm/vpmu.h    | 25 +++++++++++++++++++++----
 3 files changed, 23 insertions(+), 10 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c
index 9664035..5e44745 100644
--- a/xen/arch/x86/hvm/vmx/vpmu_core2.c
+++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c
@@ -326,10 +326,7 @@ static int core2_vpmu_save(struct vcpu *v)
 {
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
 
-    if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_SAVE) )
-        return 0;
-
-    if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) ) 
+    if ( !vpmu_are_all_set(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED) )
         return 0;
 
     __core2_vpmu_save(v);
diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c
index 26e971b..42cd57e 100644
--- a/xen/arch/x86/hvm/vpmu.c
+++ b/xen/arch/x86/hvm/vpmu.c
@@ -145,8 +145,7 @@ void vpmu_save(struct vcpu *v)
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
     int pcpu = smp_processor_id();
 
-    if ( !(vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) &&
-           vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED)) )
+    if ( !vpmu_are_all_set(vpmu, VPMU_CONTEXT_ALLOCATED | VPMU_CONTEXT_LOADED) )
        return;
 
     vpmu->last_pcpu = pcpu;
diff --git a/xen/include/asm-x86/hvm/vpmu.h b/xen/include/asm-x86/hvm/vpmu.h
index 40f63fb..9df14dd 100644
--- a/xen/include/asm-x86/hvm/vpmu.h
+++ b/xen/include/asm-x86/hvm/vpmu.h
@@ -81,10 +81,27 @@ struct vpmu_struct {
 #define VPMU_CPU_HAS_BTS                    0x200 /* Has Branch Trace Store */
 
 
-#define vpmu_set(_vpmu, _x)    ((_vpmu)->flags |= (_x))
-#define vpmu_reset(_vpmu, _x)  ((_vpmu)->flags &= ~(_x))
-#define vpmu_is_set(_vpmu, _x) ((_vpmu)->flags & (_x))
-#define vpmu_clear(_vpmu)      ((_vpmu)->flags = 0)
+static inline void vpmu_set(struct vpmu_struct *vpmu, const u32 mask)
+{
+    vpmu->flags |= mask;
+}
+static inline void vpmu_reset(struct vpmu_struct *vpmu, const u32 mask)
+{
+    vpmu->flags &= ~mask;
+}
+static inline void vpmu_clear(struct vpmu_struct *vpmu)
+{
+    vpmu->flags = 0;
+}
+static inline bool_t vpmu_is_set(const struct vpmu_struct *vpmu, const u32 mask)
+{
+    return !!(vpmu->flags & mask);
+}
+static inline bool_t vpmu_are_all_set(const struct vpmu_struct *vpmu,
+                                      const u32 mask)
+{
+    return !!((vpmu->flags & mask) == mask);
+}
 
 int vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content);
 int vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content);
-- 
1.8.1.4
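
A standalone illustration (not Xen code) of why both test helpers exist:
vpmu_is_set() answers "is ANY bit of the mask set?", while the new
vpmu_are_all_set() answers "are ALL bits of the mask set?". The flag values and
helper copies below are local stand-ins.

#include <stdint.h>
#include <stdio.h>

#define CTX_ALLOCATED 0x1
#define CTX_LOADED    0x2

/* Same logic as vpmu_is_set()/vpmu_are_all_set(), minus struct vpmu_struct. */
static int any_set(uint32_t flags, uint32_t mask) { return !!(flags & mask); }
static int all_set(uint32_t flags, uint32_t mask) { return (flags & mask) == mask; }

int main(void)
{
    uint32_t flags = CTX_ALLOCATED;               /* allocated but not loaded */
    uint32_t mask  = CTX_ALLOCATED | CTX_LOADED;

    printf("any=%d all=%d\n", any_set(flags, mask), all_set(flags, mask));
    /* prints "any=1 all=0": only vpmu_are_all_set() catches the difference */
    return 0;
}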


* [PATCH v9 05/20] intel/VPMU: Clean up Intel VPMU code
  2014-08-08 16:55 [PATCH v9 00/20] x86/PMU: Xen PMU PV(H) support Boris Ostrovsky
                   ` (3 preceding siblings ...)
  2014-08-08 16:55 ` [PATCH v9 04/20] x86/VPMU: Make vpmu macros a bit more efficient Boris Ostrovsky
@ 2014-08-08 16:55 ` Boris Ostrovsky
  2014-08-11 13:45   ` Jan Beulich
  2014-08-08 16:55 ` [PATCH v9 06/20] vmx: Merge MSR management routines Boris Ostrovsky
                   ` (15 subsequent siblings)
  20 siblings, 1 reply; 49+ messages in thread
From: Boris Ostrovsky @ 2014-08-08 16:55 UTC (permalink / raw)
  To: jbeulich, kevin.tian, suravee.suthikulpanit
  Cc: keir, andrew.cooper3, tim, xen-devel, jun.nakajima, boris.ostrovsky

Remove struct pmumsr and core2_pmu_enable. Replace static MSR structures with
fields in core2_vpmu_context.

Call core2_get_pmc_count() once, during initialization.

Properly clean up when core2_vpmu_alloc_resource() fails and add routines
to remove MSRs from VMCS.


Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
---
 xen/arch/x86/hvm/vmx/vmcs.c              |  51 ++++
 xen/arch/x86/hvm/vmx/vpmu_core2.c        | 385 ++++++++++++++-----------------
 xen/include/asm-x86/hvm/vmx/vmcs.h       |   2 +
 xen/include/asm-x86/hvm/vmx/vpmu_core2.h |  19 --
 4 files changed, 228 insertions(+), 229 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index 8ffc562..1f9ac39 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -1208,6 +1208,32 @@ int vmx_add_guest_msr(u32 msr)
     return 0;
 }
 
+void vmx_rm_guest_msr(u32 msr)
+{
+    struct vcpu *curr = current;
+    unsigned int idx, msr_count = curr->arch.hvm_vmx.msr_count;
+    struct vmx_msr_entry *msr_area = curr->arch.hvm_vmx.msr_area;
+
+    if ( msr_area == NULL )
+        return;
+
+    for ( idx = 0; idx < msr_count; idx++ )
+        if ( msr_area[idx].index == msr )
+            break;
+
+    if ( idx == msr_count )
+        return;
+
+    --msr_count;
+    memmove(&msr_area[idx], &msr_area[idx + 1],
+            sizeof(struct vmx_msr_entry) * (msr_count - idx));
+    msr_area[msr_count].index = 0;
+
+    curr->arch.hvm_vmx.msr_count = msr_count;
+    __vmwrite(VM_EXIT_MSR_STORE_COUNT, msr_count);
+    __vmwrite(VM_ENTRY_MSR_LOAD_COUNT, msr_count);
+}
+
 int vmx_add_host_load_msr(u32 msr)
 {
     struct vcpu *curr = current;
@@ -1238,6 +1264,31 @@ int vmx_add_host_load_msr(u32 msr)
     return 0;
 }
 
+void vmx_rm_host_load_msr(u32 msr)
+{
+    struct vcpu *curr = current;
+    unsigned int idx,  msr_count = curr->arch.hvm_vmx.host_msr_count;
+    struct vmx_msr_entry *msr_area = curr->arch.hvm_vmx.host_msr_area;
+
+    if ( msr_area == NULL )
+        return;
+
+    for ( idx = 0; idx < msr_count; idx++ )
+        if ( msr_area[idx].index == msr )
+            break;
+
+    if ( idx == msr_count )
+        return;
+
+    --msr_count;
+    memmove(&msr_area[idx], &msr_area[idx + 1],
+            sizeof(struct vmx_msr_entry) * (msr_count - idx));
+    msr_area[msr_count].index = 0;
+
+    curr->arch.hvm_vmx.host_msr_count = msr_count;
+    __vmwrite(VM_EXIT_MSR_LOAD_COUNT, msr_count);
+}
+
 void vmx_set_eoi_exit_bitmap(struct vcpu *v, u8 vector)
 {
     if ( !test_and_set_bit(vector, v->arch.hvm_vmx.eoi_exit_bitmap) )
diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c
index 5e44745..f3f382d 100644
--- a/xen/arch/x86/hvm/vmx/vpmu_core2.c
+++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c
@@ -69,6 +69,27 @@
 static bool_t __read_mostly full_width_write;
 
 /*
+ * MSR_CORE_PERF_FIXED_CTR_CTRL contains the configuration of all fixed
+ * counters. 4 bits for every counter.
+ */
+#define FIXED_CTR_CTRL_BITS 4
+#define FIXED_CTR_CTRL_MASK ((1 << FIXED_CTR_CTRL_BITS) - 1)
+
+#define VPMU_CORE2_MAX_FIXED_PMCS     4
+struct core2_vpmu_context {
+    u64 fixed_ctrl;
+    u64 ds_area;
+    u64 pebs_enable;
+    u64 global_ovf_status;
+    u64 enabled_cntrs;  /* Follows PERF_GLOBAL_CTRL MSR format */
+    u64 fix_counters[VPMU_CORE2_MAX_FIXED_PMCS];
+    struct arch_msr_pair arch_msr_pair[1];
+};
+
+/* Number of general-purpose and fixed performance counters */
+static unsigned int __read_mostly arch_pmc_cnt, fixed_pmc_cnt;
+
+/*
  * QUIRK to workaround an issue on various family 6 cpus.
  * The issue leads to endless PMC interrupt loops on the processor.
  * If the interrupt handler is running and a pmc reaches the value 0, this
@@ -88,11 +109,8 @@ static void check_pmc_quirk(void)
         is_pmc_quirk = 0;    
 }
 
-static int core2_get_pmc_count(void);
 static void handle_pmc_quirk(u64 msr_content)
 {
-    int num_gen_pmc = core2_get_pmc_count();
-    int num_fix_pmc  = 3;
     int i;
     u64 val;
 
@@ -100,7 +118,7 @@ static void handle_pmc_quirk(u64 msr_content)
         return;
 
     val = msr_content;
-    for ( i = 0; i < num_gen_pmc; i++ )
+    for ( i = 0; i < arch_pmc_cnt; i++ )
     {
         if ( val & 0x1 )
         {
@@ -112,7 +130,7 @@ static void handle_pmc_quirk(u64 msr_content)
         val >>= 1;
     }
     val = msr_content >> 32;
-    for ( i = 0; i < num_fix_pmc; i++ )
+    for ( i = 0; i < fixed_pmc_cnt; i++ )
     {
         if ( val & 0x1 )
         {
@@ -125,128 +143,91 @@ static void handle_pmc_quirk(u64 msr_content)
     }
 }
 
-static const u32 core2_fix_counters_msr[] = {
-    MSR_CORE_PERF_FIXED_CTR0,
-    MSR_CORE_PERF_FIXED_CTR1,
-    MSR_CORE_PERF_FIXED_CTR2
-};
-
 /*
- * MSR_CORE_PERF_FIXED_CTR_CTRL contains the configuration of all fixed
- * counters. 4 bits for every counter.
+ * Read the number of general counters via CPUID.EAX[0xa].EAX[8..15]
  */
-#define FIXED_CTR_CTRL_BITS 4
-#define FIXED_CTR_CTRL_MASK ((1 << FIXED_CTR_CTRL_BITS) - 1)
-
-/* The index into the core2_ctrls_msr[] of this MSR used in core2_vpmu_dump() */
-#define MSR_CORE_PERF_FIXED_CTR_CTRL_IDX 0
-
-/* Core 2 Non-architectual Performance Control MSRs. */
-static const u32 core2_ctrls_msr[] = {
-    MSR_CORE_PERF_FIXED_CTR_CTRL,
-    MSR_IA32_PEBS_ENABLE,
-    MSR_IA32_DS_AREA
-};
-
-struct pmumsr {
-    unsigned int num;
-    const u32 *msr;
-};
-
-static const struct pmumsr core2_fix_counters = {
-    VPMU_CORE2_NUM_FIXED,
-    core2_fix_counters_msr
-};
+static int core2_get_arch_pmc_count(void)
+{
+    u32 eax;
 
-static const struct pmumsr core2_ctrls = {
-    VPMU_CORE2_NUM_CTRLS,
-    core2_ctrls_msr
-};
-static int arch_pmc_cnt;
+    eax = cpuid_eax(0xa);
+    return MASK_EXTR(eax, PMU_GENERAL_NR_MASK);
+}
 
 /*
- * Read the number of general counters via CPUID.EAX[0xa].EAX[8..15]
+ * Read the number of fixed counters via CPUID.EDX[0xa].EDX[0..4]
  */
-static int core2_get_pmc_count(void)
+static int core2_get_fixed_pmc_count(void)
 {
-    u32 eax, ebx, ecx, edx;
-
-    if ( arch_pmc_cnt == 0 )
-    {
-        cpuid(0xa, &eax, &ebx, &ecx, &edx);
-        arch_pmc_cnt = (eax & PMU_GENERAL_NR_MASK) >> PMU_GENERAL_NR_SHIFT;
-    }
+    u32 eax;
 
-    return arch_pmc_cnt;
+    eax = cpuid_eax(0xa);
+    return MASK_EXTR(eax, PMU_FIXED_NR_MASK);
 }
 
 static u64 core2_calc_intial_glb_ctrl_msr(void)
 {
-    int arch_pmc_bits = (1 << core2_get_pmc_count()) - 1;
-    u64 fix_pmc_bits  = (1 << 3) - 1;
-    return ((fix_pmc_bits << 32) | arch_pmc_bits);
+    int arch_pmc_bits = (1 << arch_pmc_cnt) - 1;
+    u64 fix_pmc_bits  = (1 << fixed_pmc_cnt) - 1;
+
+    return (fix_pmc_bits << 32) | arch_pmc_bits;
 }
 
 /* edx bits 5-12: Bit width of fixed-function performance counters  */
 static int core2_get_bitwidth_fix_count(void)
 {
-    u32 eax, ebx, ecx, edx;
+    u32 edx;
 
-    cpuid(0xa, &eax, &ebx, &ecx, &edx);
-    return ((edx & PMU_FIXED_WIDTH_MASK) >> PMU_FIXED_WIDTH_SHIFT);
+    edx = cpuid_edx(0xa);
+    return MASK_EXTR(edx, PMU_FIXED_WIDTH_MASK);
 }
 
 static int is_core2_vpmu_msr(u32 msr_index, int *type, int *index)
 {
-    int i;
     u32 msr_index_pmc;
 
-    for ( i = 0; i < core2_fix_counters.num; i++ )
+    switch ( msr_index )
     {
-        if ( core2_fix_counters.msr[i] == msr_index )
+    case MSR_CORE_PERF_FIXED_CTR_CTRL:
+    case MSR_IA32_DS_AREA:
+    case MSR_IA32_PEBS_ENABLE:
+        *type = MSR_TYPE_CTRL;
+        return 1;
+
+    case MSR_CORE_PERF_GLOBAL_CTRL:
+    case MSR_CORE_PERF_GLOBAL_STATUS:
+    case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
+        *type = MSR_TYPE_GLOBAL;
+        return 1;
+
+    default:
+
+        if ( (msr_index >= MSR_CORE_PERF_FIXED_CTR0) &&
+             (msr_index < MSR_CORE_PERF_FIXED_CTR0 + fixed_pmc_cnt) )
         {
+            *index = msr_index - MSR_CORE_PERF_FIXED_CTR0;
             *type = MSR_TYPE_COUNTER;
-            *index = i;
             return 1;
         }
-    }
 
-    for ( i = 0; i < core2_ctrls.num; i++ )
-    {
-        if ( core2_ctrls.msr[i] == msr_index )
+        if ( (msr_index >= MSR_P6_EVNTSEL0) &&
+             (msr_index < MSR_P6_EVNTSEL0 + arch_pmc_cnt) )
         {
-            *type = MSR_TYPE_CTRL;
-            *index = i;
+            *index = msr_index - MSR_P6_EVNTSEL0;
+            *type = MSR_TYPE_ARCH_CTRL;
             return 1;
         }
-    }
-
-    if ( (msr_index == MSR_CORE_PERF_GLOBAL_CTRL) ||
-         (msr_index == MSR_CORE_PERF_GLOBAL_STATUS) ||
-         (msr_index == MSR_CORE_PERF_GLOBAL_OVF_CTRL) )
-    {
-        *type = MSR_TYPE_GLOBAL;
-        return 1;
-    }
 
-    msr_index_pmc = msr_index & MSR_PMC_ALIAS_MASK;
-    if ( (msr_index_pmc >= MSR_IA32_PERFCTR0) &&
-         (msr_index_pmc < (MSR_IA32_PERFCTR0 + core2_get_pmc_count())) )
-    {
-        *type = MSR_TYPE_ARCH_COUNTER;
-        *index = msr_index_pmc - MSR_IA32_PERFCTR0;
-        return 1;
-    }
-
-    if ( (msr_index >= MSR_P6_EVNTSEL0) &&
-         (msr_index < (MSR_P6_EVNTSEL0 + core2_get_pmc_count())) )
-    {
-        *type = MSR_TYPE_ARCH_CTRL;
-        *index = msr_index - MSR_P6_EVNTSEL0;
-        return 1;
+        msr_index_pmc = msr_index & MSR_PMC_ALIAS_MASK;
+        if ( (msr_index_pmc >= MSR_IA32_PERFCTR0) &&
+             (msr_index_pmc < (MSR_IA32_PERFCTR0 + arch_pmc_cnt)) )
+        {
+            *type = MSR_TYPE_ARCH_COUNTER;
+            *index = msr_index_pmc - MSR_IA32_PERFCTR0;
+            return 1;
+        }
+        return 0;
     }
-
-    return 0;
 }
 
 static void core2_vpmu_set_msr_bitmap(unsigned long *msr_bitmap)
@@ -254,13 +235,13 @@ static void core2_vpmu_set_msr_bitmap(unsigned long *msr_bitmap)
     int i;
 
     /* Allow Read/Write PMU Counters MSR Directly. */
-    for ( i = 0; i < core2_fix_counters.num; i++ )
+    for ( i = 0; i < fixed_pmc_cnt; i++ )
     {
-        clear_bit(msraddr_to_bitpos(core2_fix_counters.msr[i]), msr_bitmap);
-        clear_bit(msraddr_to_bitpos(core2_fix_counters.msr[i]),
+        clear_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR0 + i), msr_bitmap);
+        clear_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR0 + i),
                   msr_bitmap + 0x800/BYTES_PER_LONG);
     }
-    for ( i = 0; i < core2_get_pmc_count(); i++ )
+    for ( i = 0; i < arch_pmc_cnt; i++ )
     {
         clear_bit(msraddr_to_bitpos(MSR_IA32_PERFCTR0+i), msr_bitmap);
         clear_bit(msraddr_to_bitpos(MSR_IA32_PERFCTR0+i),
@@ -275,26 +256,28 @@ static void core2_vpmu_set_msr_bitmap(unsigned long *msr_bitmap)
     }
 
     /* Allow Read PMU Non-global Controls Directly. */
-    for ( i = 0; i < core2_ctrls.num; i++ )
-        clear_bit(msraddr_to_bitpos(core2_ctrls.msr[i]), msr_bitmap);
-    for ( i = 0; i < core2_get_pmc_count(); i++ )
-        clear_bit(msraddr_to_bitpos(MSR_P6_EVNTSEL0+i), msr_bitmap);
+    for ( i = 0; i < arch_pmc_cnt; i++ )
+         clear_bit(msraddr_to_bitpos(MSR_P6_EVNTSEL0 + i), msr_bitmap);
+
+    clear_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR_CTRL), msr_bitmap);
+    clear_bit(msraddr_to_bitpos(MSR_IA32_PEBS_ENABLE), msr_bitmap);
+    clear_bit(msraddr_to_bitpos(MSR_IA32_DS_AREA), msr_bitmap);
 }
 
 static void core2_vpmu_unset_msr_bitmap(unsigned long *msr_bitmap)
 {
     int i;
 
-    for ( i = 0; i < core2_fix_counters.num; i++ )
+    for ( i = 0; i < fixed_pmc_cnt; i++ )
     {
-        set_bit(msraddr_to_bitpos(core2_fix_counters.msr[i]), msr_bitmap);
-        set_bit(msraddr_to_bitpos(core2_fix_counters.msr[i]),
+        set_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR0 + i), msr_bitmap);
+        set_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR0 + i),
                 msr_bitmap + 0x800/BYTES_PER_LONG);
     }
-    for ( i = 0; i < core2_get_pmc_count(); i++ )
+    for ( i = 0; i < arch_pmc_cnt; i++ )
     {
-        set_bit(msraddr_to_bitpos(MSR_IA32_PERFCTR0+i), msr_bitmap);
-        set_bit(msraddr_to_bitpos(MSR_IA32_PERFCTR0+i),
+        set_bit(msraddr_to_bitpos(MSR_IA32_PERFCTR0 + i), msr_bitmap);
+        set_bit(msraddr_to_bitpos(MSR_IA32_PERFCTR0 + i),
                 msr_bitmap + 0x800/BYTES_PER_LONG);
 
         if ( full_width_write )
@@ -305,10 +288,12 @@ static void core2_vpmu_unset_msr_bitmap(unsigned long *msr_bitmap)
         }
     }
 
-    for ( i = 0; i < core2_ctrls.num; i++ )
-        set_bit(msraddr_to_bitpos(core2_ctrls.msr[i]), msr_bitmap);
-    for ( i = 0; i < core2_get_pmc_count(); i++ )
-        set_bit(msraddr_to_bitpos(MSR_P6_EVNTSEL0+i), msr_bitmap);
+    for ( i = 0; i < arch_pmc_cnt; i++ )
+        set_bit(msraddr_to_bitpos(MSR_P6_EVNTSEL0 + i), msr_bitmap);
+
+    set_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR_CTRL), msr_bitmap);
+    set_bit(msraddr_to_bitpos(MSR_IA32_PEBS_ENABLE), msr_bitmap);
+    set_bit(msraddr_to_bitpos(MSR_IA32_DS_AREA), msr_bitmap);
 }
 
 static inline void __core2_vpmu_save(struct vcpu *v)
@@ -316,10 +301,10 @@ static inline void __core2_vpmu_save(struct vcpu *v)
     int i;
     struct core2_vpmu_context *core2_vpmu_cxt = vcpu_vpmu(v)->context;
 
-    for ( i = 0; i < core2_fix_counters.num; i++ )
-        rdmsrl(core2_fix_counters.msr[i], core2_vpmu_cxt->fix_counters[i]);
-    for ( i = 0; i < core2_get_pmc_count(); i++ )
-        rdmsrl(MSR_IA32_PERFCTR0+i, core2_vpmu_cxt->arch_msr_pair[i].counter);
+    for ( i = 0; i < fixed_pmc_cnt; i++ )
+        rdmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, core2_vpmu_cxt->fix_counters[i]);
+    for ( i = 0; i < arch_pmc_cnt; i++ )
+        rdmsrl(MSR_IA32_PERFCTR0 + i, core2_vpmu_cxt->arch_msr_pair[i].counter);
 }
 
 static int core2_vpmu_save(struct vcpu *v)
@@ -344,20 +329,22 @@ static inline void __core2_vpmu_load(struct vcpu *v)
     unsigned int i, pmc_start;
     struct core2_vpmu_context *core2_vpmu_cxt = vcpu_vpmu(v)->context;
 
-    for ( i = 0; i < core2_fix_counters.num; i++ )
-        wrmsrl(core2_fix_counters.msr[i], core2_vpmu_cxt->fix_counters[i]);
+    for ( i = 0; i < fixed_pmc_cnt; i++ )
+        wrmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, core2_vpmu_cxt->fix_counters[i]);
 
     if ( full_width_write )
         pmc_start = MSR_IA32_A_PERFCTR0;
     else
         pmc_start = MSR_IA32_PERFCTR0;
-    for ( i = 0; i < core2_get_pmc_count(); i++ )
+    for ( i = 0; i < arch_pmc_cnt; i++ )
+    {
         wrmsrl(pmc_start + i, core2_vpmu_cxt->arch_msr_pair[i].counter);
+        wrmsrl(MSR_P6_EVNTSEL0 + i, core2_vpmu_cxt->arch_msr_pair[i].control);
+    }
 
-    for ( i = 0; i < core2_ctrls.num; i++ )
-        wrmsrl(core2_ctrls.msr[i], core2_vpmu_cxt->ctrls[i]);
-    for ( i = 0; i < core2_get_pmc_count(); i++ )
-        wrmsrl(MSR_P6_EVNTSEL0+i, core2_vpmu_cxt->arch_msr_pair[i].control);
+    wrmsrl(MSR_CORE_PERF_FIXED_CTR_CTRL, core2_vpmu_cxt->fixed_ctrl);
+    wrmsrl(MSR_IA32_DS_AREA, core2_vpmu_cxt->ds_area);
+    wrmsrl(MSR_IA32_PEBS_ENABLE, core2_vpmu_cxt->pebs_enable);
 }
 
 static void core2_vpmu_load(struct vcpu *v)
@@ -376,56 +363,39 @@ static int core2_vpmu_alloc_resource(struct vcpu *v)
 {
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
     struct core2_vpmu_context *core2_vpmu_cxt;
-    struct core2_pmu_enable *pmu_enable;
 
     if ( !acquire_pmu_ownership(PMU_OWNER_HVM) )
         return 0;
 
     wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0);
     if ( vmx_add_host_load_msr(MSR_CORE_PERF_GLOBAL_CTRL) )
-        return 0;
+        goto out_err;
 
     if ( vmx_add_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL) )
-        return 0;
+        goto out_err;
     vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL,
                  core2_calc_intial_glb_ctrl_msr());
 
-    pmu_enable = xzalloc_bytes(sizeof(struct core2_pmu_enable) +
-                               core2_get_pmc_count() - 1);
-    if ( !pmu_enable )
-        goto out1;
-
     core2_vpmu_cxt = xzalloc_bytes(sizeof(struct core2_vpmu_context) +
-                    (core2_get_pmc_count()-1)*sizeof(struct arch_msr_pair));
+                    (arch_pmc_cnt-1)*sizeof(struct arch_msr_pair));
     if ( !core2_vpmu_cxt )
-        goto out2;
-    core2_vpmu_cxt->pmu_enable = pmu_enable;
+        goto out_err;
+
     vpmu->context = (void *)core2_vpmu_cxt;
 
+    vpmu_set(vpmu, VPMU_CONTEXT_ALLOCATED);
+
     return 1;
- out2:
-    xfree(pmu_enable);
- out1:
-    gdprintk(XENLOG_WARNING, "Insufficient memory for PMU, PMU feature is "
-             "unavailable on domain %d vcpu %d.\n",
-             v->vcpu_id, v->domain->domain_id);
-    return 0;
-}
 
-static void core2_vpmu_save_msr_context(struct vcpu *v, int type,
-                                       int index, u64 msr_data)
-{
-    struct core2_vpmu_context *core2_vpmu_cxt = vcpu_vpmu(v)->context;
+out_err:
+    vmx_rm_host_load_msr(MSR_CORE_PERF_GLOBAL_CTRL);
+    vmx_rm_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL);
+    release_pmu_ownship(PMU_OWNER_HVM);
 
-    switch ( type )
-    {
-    case MSR_TYPE_CTRL:
-        core2_vpmu_cxt->ctrls[index] = msr_data;
-        break;
-    case MSR_TYPE_ARCH_CTRL:
-        core2_vpmu_cxt->arch_msr_pair[index].control = msr_data;
-        break;
-    }
+    printk("Failed to allocate VPMU resources for domain %u vcpu %u\n",
+           v->vcpu_id, v->domain->domain_id);
+
+    return 0;
 }
 
 static int core2_vpmu_msr_common_check(u32 msr_index, int *type, int *index)
@@ -436,10 +406,8 @@ static int core2_vpmu_msr_common_check(u32 msr_index, int *type, int *index)
         return 0;
 
     if ( unlikely(!vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED)) &&
-	 (vpmu->context != NULL ||
-	  !core2_vpmu_alloc_resource(current)) )
+         !core2_vpmu_alloc_resource(current) )
         return 0;
-    vpmu_set(vpmu, VPMU_CONTEXT_ALLOCATED);
 
     /* Do the lazy load staff. */
     if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) )
@@ -455,8 +423,7 @@ static int core2_vpmu_msr_common_check(u32 msr_index, int *type, int *index)
 
 static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
 {
-    u64 global_ctrl, non_global_ctrl;
-    char pmu_enable = 0;
+    u64 global_ctrl;
     int i, tmp;
     int type = -1, index = -1;
     struct vcpu *v = current;
@@ -501,6 +468,7 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
         if ( msr_content & 1 )
             gdprintk(XENLOG_WARNING, "Guest is trying to enable PEBS, "
                      "which is not supported.\n");
+        core2_vpmu_cxt->pebs_enable = msr_content;
         return 1;
     case MSR_IA32_DS_AREA:
         if ( vpmu_is_set(vpmu, VPMU_CPU_HAS_DS) )
@@ -513,57 +481,48 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
                 hvm_inject_hw_exception(TRAP_gp_fault, 0);
                 return 1;
             }
-            core2_vpmu_cxt->pmu_enable->ds_area_enable = msr_content ? 1 : 0;
+            core2_vpmu_cxt->ds_area = msr_content;
             break;
         }
         gdprintk(XENLOG_WARNING, "Guest setting of DTS is ignored.\n");
         return 1;
     case MSR_CORE_PERF_GLOBAL_CTRL:
         global_ctrl = msr_content;
-        for ( i = 0; i < core2_get_pmc_count(); i++ )
-        {
-            rdmsrl(MSR_P6_EVNTSEL0+i, non_global_ctrl);
-            core2_vpmu_cxt->pmu_enable->arch_pmc_enable[i] =
-                    global_ctrl & (non_global_ctrl >> 22) & 1;
-            global_ctrl >>= 1;
-        }
-
-        rdmsrl(MSR_CORE_PERF_FIXED_CTR_CTRL, non_global_ctrl);
-        global_ctrl = msr_content >> 32;
-        for ( i = 0; i < core2_fix_counters.num; i++ )
-        {
-            core2_vpmu_cxt->pmu_enable->fixed_ctr_enable[i] =
-                (global_ctrl & 1) & ((non_global_ctrl & 0x3)? 1: 0);
-            non_global_ctrl >>= FIXED_CTR_CTRL_BITS;
-            global_ctrl >>= 1;
-        }
         break;
     case MSR_CORE_PERF_FIXED_CTR_CTRL:
-        non_global_ctrl = msr_content;
         vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, &global_ctrl);
-        global_ctrl >>= 32;
-        for ( i = 0; i < core2_fix_counters.num; i++ )
+        core2_vpmu_cxt->enabled_cntrs &=
+                ~(((1ULL << VPMU_CORE2_MAX_FIXED_PMCS) - 1) << 32);
+        if ( msr_content != 0 )
         {
-            core2_vpmu_cxt->pmu_enable->fixed_ctr_enable[i] =
-                (global_ctrl & 1) & ((non_global_ctrl & 0x3)? 1: 0);
-            non_global_ctrl >>= 4;
-            global_ctrl >>= 1;
+            u64 val = msr_content;
+            for ( i = 0; i < fixed_pmc_cnt; i++ )
+            {
+                if ( val & 3 )
+                    core2_vpmu_cxt->enabled_cntrs |= (1ULL << 32) << i;
+                val >>= FIXED_CTR_CTRL_BITS;
+            }
         }
+
+        core2_vpmu_cxt->fixed_ctrl = msr_content;
         break;
     default:
         tmp = msr - MSR_P6_EVNTSEL0;
-        vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, &global_ctrl);
-        if ( tmp >= 0 && tmp < core2_get_pmc_count() )
-            core2_vpmu_cxt->pmu_enable->arch_pmc_enable[tmp] =
-                (global_ctrl >> tmp) & (msr_content >> 22) & 1;
+        if ( tmp >= 0 && tmp < arch_pmc_cnt )
+        {
+            vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, &global_ctrl);
+
+            if ( msr_content & (1ULL << 22) )
+                core2_vpmu_cxt->enabled_cntrs |= 1ULL << tmp;
+            else
+                core2_vpmu_cxt->enabled_cntrs &= ~(1ULL << tmp);
+
+            core2_vpmu_cxt->arch_msr_pair[tmp].control = msr_content;
+        }
     }
 
-    for ( i = 0; i < core2_fix_counters.num; i++ )
-        pmu_enable |= core2_vpmu_cxt->pmu_enable->fixed_ctr_enable[i];
-    for ( i = 0; i < core2_get_pmc_count(); i++ )
-        pmu_enable |= core2_vpmu_cxt->pmu_enable->arch_pmc_enable[i];
-    pmu_enable |= core2_vpmu_cxt->pmu_enable->ds_area_enable;
-    if ( pmu_enable )
+    if ( (global_ctrl & core2_vpmu_cxt->enabled_cntrs) ||
+         (core2_vpmu_cxt->ds_area != 0)  )
         vpmu_set(vpmu, VPMU_RUNNING);
     else
         vpmu_reset(vpmu, VPMU_RUNNING);
@@ -581,7 +540,6 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
         vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR | APIC_LVT_MASKED;
     }
 
-    core2_vpmu_save_msr_context(v, type, index, msr_content);
     if ( type != MSR_TYPE_GLOBAL )
     {
         u64 mask;
@@ -597,7 +555,7 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
             if  ( msr == MSR_IA32_DS_AREA )
                 break;
             /* 4 bits per counter, currently 3 fixed counters implemented. */
-            mask = ~((1ull << (VPMU_CORE2_NUM_FIXED * FIXED_CTR_CTRL_BITS)) - 1);
+            mask = ~((1ull << (fixed_pmc_cnt * FIXED_CTR_CTRL_BITS)) - 1);
             if (msr_content & mask)
                 inject_gp = 1;
             break;
@@ -682,7 +640,7 @@ static void core2_vpmu_do_cpuid(unsigned int input,
 static void core2_vpmu_dump(const struct vcpu *v)
 {
     const struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    int i, num;
+    unsigned int i;
     const struct core2_vpmu_context *core2_vpmu_cxt = NULL;
     u64 val;
 
@@ -700,27 +658,25 @@ static void core2_vpmu_dump(const struct vcpu *v)
 
     printk("    vPMU running\n");
     core2_vpmu_cxt = vpmu->context;
-    num = core2_get_pmc_count();
+
     /* Print the contents of the counter and its configuration msr. */
-    for ( i = 0; i < num; i++ )
+    for ( i = 0; i < arch_pmc_cnt; i++ )
     {
         const struct arch_msr_pair *msr_pair = core2_vpmu_cxt->arch_msr_pair;
 
-        if ( core2_vpmu_cxt->pmu_enable->arch_pmc_enable[i] )
-            printk("      general_%d: 0x%016lx ctrl: 0x%016lx\n",
-                   i, msr_pair[i].counter, msr_pair[i].control);
+        printk("      general_%d: 0x%016lx ctrl: 0x%016lx\n",
+               i, msr_pair[i].counter, msr_pair[i].control);
     }
     /*
      * The configuration of the fixed counter is 4 bits each in the
      * MSR_CORE_PERF_FIXED_CTR_CTRL.
      */
-    val = core2_vpmu_cxt->ctrls[MSR_CORE_PERF_FIXED_CTR_CTRL_IDX];
-    for ( i = 0; i < core2_fix_counters.num; i++ )
+    val = core2_vpmu_cxt->fixed_ctrl;
+    for ( i = 0; i < fixed_pmc_cnt; i++ )
     {
-        if ( core2_vpmu_cxt->pmu_enable->fixed_ctr_enable[i] )
-            printk("      fixed_%d:   0x%016lx ctrl: %#lx\n",
-                   i, core2_vpmu_cxt->fix_counters[i],
-                   val & FIXED_CTR_CTRL_MASK);
+        printk("      fixed_%d:   0x%016lx ctrl: %#lx\n",
+               i, core2_vpmu_cxt->fix_counters[i],
+               val & FIXED_CTR_CTRL_MASK);
         val >>= FIXED_CTR_CTRL_BITS;
     }
 }
@@ -738,7 +694,7 @@ static int core2_vpmu_do_interrupt(struct cpu_user_regs *regs)
         if ( is_pmc_quirk )
             handle_pmc_quirk(msr_content);
         core2_vpmu_cxt->global_ovf_status |= msr_content;
-        msr_content = 0xC000000700000000 | ((1 << core2_get_pmc_count()) - 1);
+        msr_content = 0xC000000700000000 | ((1 << arch_pmc_cnt) - 1);
         wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, msr_content);
     }
     else
@@ -801,18 +757,27 @@ static int core2_vpmu_initialise(struct vcpu *v, unsigned int vpmu_flags)
         }
     }
 func_out:
+
+    arch_pmc_cnt = core2_get_arch_pmc_count();
+    fixed_pmc_cnt = core2_get_fixed_pmc_count();
+    if ( fixed_pmc_cnt > VPMU_CORE2_MAX_FIXED_PMCS )
+    {
+        fixed_pmc_cnt = VPMU_CORE2_MAX_FIXED_PMCS;
+        printk(XENLOG_G_WARNING "Limiting number of fixed counters to %d\n",
+               fixed_pmc_cnt);
+    }
     check_pmc_quirk();
+
     return 0;
 }
 
 static void core2_vpmu_destroy(struct vcpu *v)
 {
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    struct core2_vpmu_context *core2_vpmu_cxt = vpmu->context;
 
     if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
         return;
-    xfree(core2_vpmu_cxt->pmu_enable);
+
     xfree(vpmu->context);
     if ( has_hvm_container_domain(v->domain) && cpu_has_vmx_msr_bitmap )
         core2_vpmu_unset_msr_bitmap(v->arch.hvm_vmx.msr_bitmap);
diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index 215d93c..4cfcd47 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -480,7 +480,9 @@ void vmx_enable_intercept_for_msr(struct vcpu *v, u32 msr, int type);
 int vmx_read_guest_msr(u32 msr, u64 *val);
 int vmx_write_guest_msr(u32 msr, u64 val);
 int vmx_add_guest_msr(u32 msr);
+void vmx_rm_guest_msr(u32 msr);
 int vmx_add_host_load_msr(u32 msr);
+void vmx_rm_host_load_msr(u32 msr);
 void vmx_vmcs_switch(struct vmcs_struct *from, struct vmcs_struct *to);
 void vmx_set_eoi_exit_bitmap(struct vcpu *v, u8 vector);
 void vmx_clear_eoi_exit_bitmap(struct vcpu *v, u8 vector);
diff --git a/xen/include/asm-x86/hvm/vmx/vpmu_core2.h b/xen/include/asm-x86/hvm/vmx/vpmu_core2.h
index 60b05fd..410372d 100644
--- a/xen/include/asm-x86/hvm/vmx/vpmu_core2.h
+++ b/xen/include/asm-x86/hvm/vmx/vpmu_core2.h
@@ -23,29 +23,10 @@
 #ifndef __ASM_X86_HVM_VPMU_CORE_H_
 #define __ASM_X86_HVM_VPMU_CORE_H_
 
-/* Currently only 3 fixed counters are supported. */
-#define VPMU_CORE2_NUM_FIXED 3
-/* Currently only 3 Non-architectual Performance Control MSRs */
-#define VPMU_CORE2_NUM_CTRLS 3
-
 struct arch_msr_pair {
     u64 counter;
     u64 control;
 };
 
-struct core2_pmu_enable {
-    char ds_area_enable;
-    char fixed_ctr_enable[VPMU_CORE2_NUM_FIXED];
-    char arch_pmc_enable[1];
-};
-
-struct core2_vpmu_context {
-    struct core2_pmu_enable *pmu_enable;
-    u64 fix_counters[VPMU_CORE2_NUM_FIXED];
-    u64 ctrls[VPMU_CORE2_NUM_CTRLS];
-    u64 global_ovf_status;
-    struct arch_msr_pair arch_msr_pair[1];
-};
-
 #endif /* __ASM_X86_HVM_VPMU_CORE_H_ */
 
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v9 06/20] vmx: Merge MSR management routines
  2014-08-08 16:55 [PATCH v9 00/20] x86/PMU: Xen PMU PV(H) support Boris Ostrovsky
                   ` (4 preceding siblings ...)
  2014-08-08 16:55 ` [PATCH v9 05/20] intel/VPMU: Clean up Intel VPMU code Boris Ostrovsky
@ 2014-08-08 16:55 ` Boris Ostrovsky
  2014-08-08 16:55 ` [PATCH v9 07/20] x86/VPMU: Handle APIC_LVTPC accesses Boris Ostrovsky
                   ` (14 subsequent siblings)
  20 siblings, 0 replies; 49+ messages in thread
From: Boris Ostrovsky @ 2014-08-08 16:55 UTC (permalink / raw)
  To: jbeulich, kevin.tian, suravee.suthikulpanit
  Cc: keir, andrew.cooper3, tim, xen-devel, jun.nakajima, boris.ostrovsky

vmx_add_host_load_msr()/vmx_rm_host_load_msr() and vmx_add_guest_msr()/vmx_rm_guest_msr()
share a fair amount of code. Merge them to simplify code maintenance.
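
As an aside for reviewers, callers now select the list explicitly. A minimal
usage sketch, relying only on the vmx_add_msr()/vmx_rm_msr() prototypes and the
VMX_GUEST_MSR/VMX_HOST_MSR constants introduced below (example_track_msr() is a
hypothetical caller, not part of this patch):

    /* Hypothetical caller: track an MSR in both the guest and host lists. */
    static int example_track_msr(u32 msr)
    {
        int rc;

        /* Guest list: stored on VM exit, loaded on VM entry. */
        rc = vmx_add_msr(msr, VMX_GUEST_MSR);
        if ( rc )
            return rc;

        /* Host list: loaded on VM exit. */
        rc = vmx_add_msr(msr, VMX_HOST_MSR);
        if ( rc )
            vmx_rm_msr(msr, VMX_GUEST_MSR);

        return rc;
    }

This mirrors what core2_vpmu_alloc_resource() does for MSR_CORE_PERF_GLOBAL_CTRL
in the vpmu_core2.c hunk further down.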

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
---
 xen/arch/x86/hvm/vmx/vmcs.c        | 155 +++++++++++++++++--------------------
 xen/arch/x86/hvm/vmx/vmx.c         |   4 +-
 xen/arch/x86/hvm/vmx/vpmu_core2.c  |   8 +-
 xen/include/asm-x86/hvm/vmx/vmcs.h |  10 ++-
 4 files changed, 85 insertions(+), 92 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index 1f9ac39..80f73ea 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -1176,117 +1176,108 @@ int vmx_write_guest_msr(u32 msr, u64 val)
     return -ESRCH;
 }
 
-int vmx_add_guest_msr(u32 msr)
+int vmx_add_msr(u32 msr, u8 type)
 {
     struct vcpu *curr = current;
-    unsigned int i, msr_count = curr->arch.hvm_vmx.msr_count;
-    struct vmx_msr_entry *msr_area = curr->arch.hvm_vmx.msr_area;
+    unsigned int idx, *msr_count;
+    struct vmx_msr_entry **msr_area, *msr_area_elem;
+
+    if ( type == VMX_GUEST_MSR )
+    {
+        msr_count = &curr->arch.hvm_vmx.msr_count;
+        msr_area = &curr->arch.hvm_vmx.msr_area;
+    }
+    else
+    {
+        ASSERT(type == VMX_HOST_MSR);
+        msr_count = &curr->arch.hvm_vmx.host_msr_count;
+        msr_area = &curr->arch.hvm_vmx.host_msr_area;
+    }
 
-    if ( msr_area == NULL )
+    if ( *msr_area == NULL )
     {
-        if ( (msr_area = alloc_xenheap_page()) == NULL )
+        if ( (*msr_area = alloc_xenheap_page()) == NULL )
             return -ENOMEM;
-        curr->arch.hvm_vmx.msr_area = msr_area;
-        __vmwrite(VM_EXIT_MSR_STORE_ADDR, virt_to_maddr(msr_area));
-        __vmwrite(VM_ENTRY_MSR_LOAD_ADDR, virt_to_maddr(msr_area));
+
+        if ( type == VMX_GUEST_MSR )
+        {
+            __vmwrite(VM_EXIT_MSR_STORE_ADDR, virt_to_maddr(*msr_area));
+            __vmwrite(VM_ENTRY_MSR_LOAD_ADDR, virt_to_maddr(*msr_area));
+        }
+        else
+            __vmwrite(VM_EXIT_MSR_LOAD_ADDR, virt_to_maddr(*msr_area));
     }
 
-    for ( i = 0; i < msr_count; i++ )
-        if ( msr_area[i].index == msr )
+    for ( idx = 0; idx < *msr_count; idx++ )
+        if ( (*msr_area)[idx].index == msr )
             return 0;
 
-    if ( msr_count == (PAGE_SIZE / sizeof(struct vmx_msr_entry)) )
+    if ( *msr_count == (PAGE_SIZE / sizeof(struct vmx_msr_entry)) )
         return -ENOSPC;
 
-    msr_area[msr_count].index = msr;
-    msr_area[msr_count].mbz   = 0;
-    msr_area[msr_count].data  = 0;
-    curr->arch.hvm_vmx.msr_count = ++msr_count;
-    __vmwrite(VM_EXIT_MSR_STORE_COUNT, msr_count);
-    __vmwrite(VM_ENTRY_MSR_LOAD_COUNT, msr_count);
+    msr_area_elem = *msr_area + *msr_count;
+    msr_area_elem->index = msr;
+    msr_area_elem->mbz = 0;
 
-    return 0;
-}
-
-void vmx_rm_guest_msr(u32 msr)
-{
-    struct vcpu *curr = current;
-    unsigned int idx, msr_count = curr->arch.hvm_vmx.msr_count;
-    struct vmx_msr_entry *msr_area = curr->arch.hvm_vmx.msr_area;
-
-    if ( msr_area == NULL )
-        return;
+    ++*msr_count;
 
-    for ( idx = 0; idx < msr_count; idx++ )
-        if ( msr_area[idx].index == msr )
-            break;
-
-    if ( idx == msr_count )
-        return;
-
-    --msr_count;
-    memmove(&msr_area[idx], &msr_area[idx + 1],
-            sizeof(struct vmx_msr_entry) * (msr_count - idx));
-    msr_area[msr_count].index = 0;
-
-    curr->arch.hvm_vmx.msr_count = msr_count;
-    __vmwrite(VM_EXIT_MSR_STORE_COUNT, msr_count);
-    __vmwrite(VM_ENTRY_MSR_LOAD_COUNT, msr_count);
-}
-
-int vmx_add_host_load_msr(u32 msr)
-{
-    struct vcpu *curr = current;
-    unsigned int i, msr_count = curr->arch.hvm_vmx.host_msr_count;
-    struct vmx_msr_entry *msr_area = curr->arch.hvm_vmx.host_msr_area;
-
-    if ( msr_area == NULL )
+    if ( type == VMX_GUEST_MSR )
     {
-        if ( (msr_area = alloc_xenheap_page()) == NULL )
-            return -ENOMEM;
-        curr->arch.hvm_vmx.host_msr_area = msr_area;
-        __vmwrite(VM_EXIT_MSR_LOAD_ADDR, virt_to_maddr(msr_area));
+        msr_area_elem->data = 0;
+        __vmwrite(VM_EXIT_MSR_STORE_COUNT, *msr_count);
+        __vmwrite(VM_ENTRY_MSR_LOAD_COUNT, *msr_count);
+    }
+    else
+    {
+        rdmsrl(msr, msr_area_elem->data);
+        __vmwrite(VM_EXIT_MSR_LOAD_COUNT, *msr_count);
     }
-
-    for ( i = 0; i < msr_count; i++ )
-        if ( msr_area[i].index == msr )
-            return 0;
-
-    if ( msr_count == (PAGE_SIZE / sizeof(struct vmx_msr_entry)) )
-        return -ENOSPC;
-
-    msr_area[msr_count].index = msr;
-    msr_area[msr_count].mbz   = 0;
-    rdmsrl(msr, msr_area[msr_count].data);
-    curr->arch.hvm_vmx.host_msr_count = ++msr_count;
-    __vmwrite(VM_EXIT_MSR_LOAD_COUNT, msr_count);
 
     return 0;
 }
 
-void vmx_rm_host_load_msr(u32 msr)
+void vmx_rm_msr(u32 msr, u8 type)
 {
     struct vcpu *curr = current;
-    unsigned int idx,  msr_count = curr->arch.hvm_vmx.host_msr_count;
-    struct vmx_msr_entry *msr_area = curr->arch.hvm_vmx.host_msr_area;
+    unsigned int idx, *msr_count;
+    struct vmx_msr_entry **msr_area;
 
-    if ( msr_area == NULL )
+    if ( type == VMX_GUEST_MSR )
+    {
+        msr_count = &curr->arch.hvm_vmx.msr_count;
+        msr_area = &curr->arch.hvm_vmx.msr_area;
+    }
+    else
+    {
+        ASSERT(type == VMX_HOST_MSR);
+        msr_count = &curr->arch.hvm_vmx.host_msr_count;
+        msr_area = &curr->arch.hvm_vmx.host_msr_area;
+    }
+
+    if ( *msr_area == NULL )
         return;
 
-    for ( idx = 0; idx < msr_count; idx++ )
-        if ( msr_area[idx].index == msr )
+    for ( idx = 0; idx < *msr_count; idx++ )
+        if ( (*msr_area)[idx].index == msr )
             break;
 
-    if ( idx == msr_count )
+    if ( idx == *msr_count )
         return;
 
-    --msr_count;
-    memmove(&msr_area[idx], &msr_area[idx + 1],
-            sizeof(struct vmx_msr_entry) * (msr_count - idx));
-    msr_area[msr_count].index = 0;
+    --*msr_count;
+    memmove(&(*msr_area)[idx], &(*msr_area)[idx + 1],
+            sizeof(struct vmx_msr_entry) * (*msr_count - idx));
+    (*msr_area)[*msr_count].index = 0;
 
-    curr->arch.hvm_vmx.host_msr_count = msr_count;
-    __vmwrite(VM_EXIT_MSR_LOAD_COUNT, msr_count);
+    if ( type == VMX_GUEST_MSR )
+    {
+        __vmwrite(VM_EXIT_MSR_STORE_COUNT, *msr_count);
+        __vmwrite(VM_ENTRY_MSR_LOAD_COUNT, *msr_count);
+    }
+    else
+    {
+        __vmwrite(VM_EXIT_MSR_LOAD_COUNT, *msr_count);
+    }
 }
 
 void vmx_set_eoi_exit_bitmap(struct vcpu *v, u8 vector)
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 2caa04a..22b3325 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -2262,12 +2262,12 @@ static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content)
 
             for ( ; (rc == 0) && lbr->count; lbr++ )
                 for ( i = 0; (rc == 0) && (i < lbr->count); i++ )
-                    if ( (rc = vmx_add_guest_msr(lbr->base + i)) == 0 )
+                    if ( (rc = vmx_add_msr(lbr->base + i, VMX_GUEST_MSR)) == 0 )
                         vmx_disable_intercept_for_msr(v, lbr->base + i, MSR_TYPE_R | MSR_TYPE_W);
         }
 
         if ( (rc < 0) ||
-             (vmx_add_host_load_msr(msr) < 0) )
+             (vmx_add_msr(msr, VMX_HOST_MSR) < 0) )
             hvm_inject_hw_exception(TRAP_machine_check, 0);
         else
         {
diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c
index f3f382d..f277c2a 100644
--- a/xen/arch/x86/hvm/vmx/vpmu_core2.c
+++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c
@@ -368,10 +368,10 @@ static int core2_vpmu_alloc_resource(struct vcpu *v)
         return 0;
 
     wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0);
-    if ( vmx_add_host_load_msr(MSR_CORE_PERF_GLOBAL_CTRL) )
+    if ( vmx_add_msr(MSR_CORE_PERF_GLOBAL_CTRL, VMX_HOST_MSR) )
         goto out_err;
 
-    if ( vmx_add_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL) )
+    if ( vmx_add_msr(MSR_CORE_PERF_GLOBAL_CTRL, VMX_GUEST_MSR) )
         goto out_err;
     vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL,
                  core2_calc_intial_glb_ctrl_msr());
@@ -388,8 +388,8 @@ static int core2_vpmu_alloc_resource(struct vcpu *v)
     return 1;
 
 out_err:
-    vmx_rm_host_load_msr(MSR_CORE_PERF_GLOBAL_CTRL);
-    vmx_rm_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL);
+    vmx_rm_msr(MSR_CORE_PERF_GLOBAL_CTRL, VMX_HOST_MSR);
+    vmx_rm_msr(MSR_CORE_PERF_GLOBAL_CTRL, VMX_GUEST_MSR);
     release_pmu_ownship(PMU_OWNER_HVM);
 
     printk("Failed to allocate VPMU resources for domain %u vcpu %u\n",
diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index 4cfcd47..80bc998 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -475,14 +475,16 @@ enum vmcs_field {
 
 #define MSR_TYPE_R 1
 #define MSR_TYPE_W 2
+
+#define VMX_GUEST_MSR 0
+#define VMX_HOST_MSR  1
+
 void vmx_disable_intercept_for_msr(struct vcpu *v, u32 msr, int type);
 void vmx_enable_intercept_for_msr(struct vcpu *v, u32 msr, int type);
 int vmx_read_guest_msr(u32 msr, u64 *val);
 int vmx_write_guest_msr(u32 msr, u64 val);
-int vmx_add_guest_msr(u32 msr);
-void vmx_rm_guest_msr(u32 msr);
-int vmx_add_host_load_msr(u32 msr);
-void vmx_rm_host_load_msr(u32 msr);
+int vmx_add_msr(u32 msr, u8 type);
+void vmx_rm_msr(u32 msr, u8 type);
 void vmx_vmcs_switch(struct vmcs_struct *from, struct vmcs_struct *to);
 void vmx_set_eoi_exit_bitmap(struct vcpu *v, u8 vector);
 void vmx_clear_eoi_exit_bitmap(struct vcpu *v, u8 vector);
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v9 07/20] x86/VPMU: Handle APIC_LVTPC accesses
  2014-08-08 16:55 [PATCH v9 00/20] x86/PMU: Xen PMU PV(H) support Boris Ostrovsky
                   ` (5 preceding siblings ...)
  2014-08-08 16:55 ` [PATCH v9 06/20] vmx: Merge MSR management routines Boris Ostrovsky
@ 2014-08-08 16:55 ` Boris Ostrovsky
  2014-08-11 13:49   ` Jan Beulich
  2014-08-08 16:55 ` [PATCH v9 08/20] intel/VPMU: MSR_CORE_PERF_GLOBAL_CTRL should be initialized to zero Boris Ostrovsky
                   ` (13 subsequent siblings)
  20 siblings, 1 reply; 49+ messages in thread
From: Boris Ostrovsky @ 2014-08-08 16:55 UTC (permalink / raw)
  To: jbeulich, kevin.tian, suravee.suthikulpanit
  Cc: keir, andrew.cooper3, tim, xen-devel, jun.nakajima, boris.ostrovsky

Update APIC_LVTPC vector when HVM guest writes to it.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
Tested-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
---
 xen/arch/x86/hvm/svm/vpmu.c       |  4 ----
 xen/arch/x86/hvm/vlapic.c         |  5 ++++-
 xen/arch/x86/hvm/vmx/vpmu_core2.c | 17 -----------------
 xen/arch/x86/hvm/vpmu.c           |  8 ++++++++
 xen/include/asm-x86/hvm/vpmu.h    |  1 +
 5 files changed, 13 insertions(+), 22 deletions(-)

diff --git a/xen/arch/x86/hvm/svm/vpmu.c b/xen/arch/x86/hvm/svm/vpmu.c
index 4742e72..08301c5 100644
--- a/xen/arch/x86/hvm/svm/vpmu.c
+++ b/xen/arch/x86/hvm/svm/vpmu.c
@@ -299,8 +299,6 @@ static int amd_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
         if ( !acquire_pmu_ownership(PMU_OWNER_HVM) )
             return 1;
         vpmu_set(vpmu, VPMU_RUNNING);
-        apic_write(APIC_LVTPC, PMU_APIC_VECTOR);
-        vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR;
 
         if ( has_hvm_container_domain(v->domain) &&
              !((struct amd_vpmu_context *)vpmu->context)->msr_bitmap_set )
@@ -311,8 +309,6 @@ static int amd_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
     if ( (get_pmu_reg_type(msr) == MSR_TYPE_CTRL) &&
         (is_pmu_enabled(msr_content) == 0) && vpmu_is_set(vpmu, VPMU_RUNNING) )
     {
-        apic_write(APIC_LVTPC, PMU_APIC_VECTOR | APIC_LVT_MASKED);
-        vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR | APIC_LVT_MASKED;
         vpmu_reset(vpmu, VPMU_RUNNING);
         if ( has_hvm_container_domain(v->domain) &&
              ((struct amd_vpmu_context *)vpmu->context)->msr_bitmap_set )
diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
index cd7e872..f1b543c 100644
--- a/xen/arch/x86/hvm/vlapic.c
+++ b/xen/arch/x86/hvm/vlapic.c
@@ -38,6 +38,7 @@
 #include <asm/hvm/support.h>
 #include <asm/hvm/vmx/vmx.h>
 #include <asm/hvm/nestedhvm.h>
+#include <asm/hvm/vpmu.h>
 #include <public/hvm/ioreq.h>
 #include <public/hvm/params.h>
 
@@ -734,8 +735,10 @@ static int vlapic_reg_write(struct vcpu *v,
             vlapic_adjust_i8259_target(v->domain);
             pt_may_unmask_irq(v->domain, NULL);
         }
-        if ( (offset == APIC_LVTT) && !(val & APIC_LVT_MASKED) )
+        else if ( (offset == APIC_LVTT) && !(val & APIC_LVT_MASKED) )
             pt_may_unmask_irq(NULL, &vlapic->pt);
+        else if ( offset == APIC_LVTPC )
+            vpmu_lvtpc_update(val);
         break;
 
     case APIC_TMICT:
diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c
index f277c2a..965f2f9 100644
--- a/xen/arch/x86/hvm/vmx/vpmu_core2.c
+++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c
@@ -527,19 +527,6 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
     else
         vpmu_reset(vpmu, VPMU_RUNNING);
 
-    /* Setup LVTPC in local apic */
-    if ( vpmu_is_set(vpmu, VPMU_RUNNING) &&
-         is_vlapic_lvtpc_enabled(vcpu_vlapic(v)) )
-    {
-        apic_write_around(APIC_LVTPC, PMU_APIC_VECTOR);
-        vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR;
-    }
-    else
-    {
-        apic_write_around(APIC_LVTPC, PMU_APIC_VECTOR | APIC_LVT_MASKED);
-        vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR | APIC_LVT_MASKED;
-    }
-
     if ( type != MSR_TYPE_GLOBAL )
     {
         u64 mask;
@@ -705,10 +692,6 @@ static int core2_vpmu_do_interrupt(struct cpu_user_regs *regs)
             return 0;
     }
 
-    /* HW sets the MASK bit when performance counter interrupt occurs*/
-    vpmu->hw_lapic_lvtpc = apic_read(APIC_LVTPC) & ~APIC_LVT_MASKED;
-    apic_write_around(APIC_LVTPC, vpmu->hw_lapic_lvtpc);
-
     return 1;
 }
 
diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c
index 42cd57e..99ba766 100644
--- a/xen/arch/x86/hvm/vpmu.c
+++ b/xen/arch/x86/hvm/vpmu.c
@@ -64,6 +64,14 @@ static void __init parse_vpmu_param(char *s)
     }
 }
 
+void vpmu_lvtpc_update(uint32_t val)
+{
+    struct vpmu_struct *vpmu = vcpu_vpmu(current);
+
+    vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR | (val & APIC_LVT_MASKED);
+    apic_write(APIC_LVTPC, vpmu->hw_lapic_lvtpc);
+}
+
 int vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
 {
     struct vpmu_struct *vpmu = vcpu_vpmu(current);
diff --git a/xen/include/asm-x86/hvm/vpmu.h b/xen/include/asm-x86/hvm/vpmu.h
index 9df14dd..f58d356 100644
--- a/xen/include/asm-x86/hvm/vpmu.h
+++ b/xen/include/asm-x86/hvm/vpmu.h
@@ -103,6 +103,7 @@ static inline bool_t vpmu_are_all_set(const struct vpmu_struct *vpmu,
     return !!((vpmu->flags & mask) == mask);
 }
 
+void vpmu_lvtpc_update(uint32_t val);
 int vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content);
 int vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content);
 int vpmu_do_interrupt(struct cpu_user_regs *regs);
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v9 08/20] intel/VPMU: MSR_CORE_PERF_GLOBAL_CTRL should be initialized to zero
  2014-08-08 16:55 [PATCH v9 00/20] x86/PMU: Xen PMU PV(H) support Boris Ostrovsky
                   ` (6 preceding siblings ...)
  2014-08-08 16:55 ` [PATCH v9 07/20] x86/VPMU: Handle APIC_LVTPC accesses Boris Ostrovsky
@ 2014-08-08 16:55 ` Boris Ostrovsky
  2014-08-08 16:55 ` [PATCH v9 09/20] x86/VPMU: Add public xenpmu.h Boris Ostrovsky
                   ` (12 subsequent siblings)
  20 siblings, 0 replies; 49+ messages in thread
From: Boris Ostrovsky @ 2014-08-08 16:55 UTC (permalink / raw)
  To: jbeulich, kevin.tian, suravee.suthikulpanit
  Cc: keir, andrew.cooper3, tim, xen-devel, jun.nakajima, boris.ostrovsky

The MSR_CORE_PERF_GLOBAL_CTRL register should be set to zero initially. It is up to
the guest to set it so that counters are enabled.
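
For context, enabling counters is then entirely the guest's job: after
programming the individual event-select/fixed-control MSRs it sets the
corresponding enable bits in MSR_CORE_PERF_GLOBAL_CTRL. A rough guest-side
sketch (the bit layout follows the Intel SDM; wrmsrl() is assumed to come from
the guest kernel environment, not from this patch):

    /* Hypothetical guest code: enable general counter 0 and fixed counter 0. */
    #define EXAMPLE_GLOBAL_CTRL_PMC0    (1ULL << 0)   /* general-purpose PMC0 */
    #define EXAMPLE_GLOBAL_CTRL_FIXED0  (1ULL << 32)  /* fixed-function ctr 0 */

    wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL,
           EXAMPLE_GLOBAL_CTRL_PMC0 | EXAMPLE_GLOBAL_CTRL_FIXED0);

Until the guest performs such a write, the zero-initialized value means no
counters are active.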

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
Tested-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
---
 xen/arch/x86/hvm/vmx/vpmu_core2.c | 11 +----------
 1 file changed, 1 insertion(+), 10 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c
index 965f2f9..449f805 100644
--- a/xen/arch/x86/hvm/vmx/vpmu_core2.c
+++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c
@@ -165,14 +165,6 @@ static int core2_get_fixed_pmc_count(void)
     return MASK_EXTR(eax, PMU_FIXED_NR_MASK);
 }
 
-static u64 core2_calc_intial_glb_ctrl_msr(void)
-{
-    int arch_pmc_bits = (1 << arch_pmc_cnt) - 1;
-    u64 fix_pmc_bits  = (1 << fixed_pmc_cnt) - 1;
-
-    return (fix_pmc_bits << 32) | arch_pmc_bits;
-}
-
 /* edx bits 5-12: Bit width of fixed-function performance counters  */
 static int core2_get_bitwidth_fix_count(void)
 {
@@ -373,8 +365,7 @@ static int core2_vpmu_alloc_resource(struct vcpu *v)
 
     if ( vmx_add_msr(MSR_CORE_PERF_GLOBAL_CTRL, VMX_GUEST_MSR) )
         goto out_err;
-    vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL,
-                 core2_calc_intial_glb_ctrl_msr());
+    vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, 0);
 
     core2_vpmu_cxt = xzalloc_bytes(sizeof(struct core2_vpmu_context) +
                     (arch_pmc_cnt-1)*sizeof(struct arch_msr_pair));
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v9 09/20] x86/VPMU: Add public xenpmu.h
  2014-08-08 16:55 [PATCH v9 00/20] x86/PMU: Xen PMU PV(H) support Boris Ostrovsky
                   ` (7 preceding siblings ...)
  2014-08-08 16:55 ` [PATCH v9 08/20] intel/VPMU: MSR_CORE_PERF_GLOBAL_CTRL should be initialized to zero Boris Ostrovsky
@ 2014-08-08 16:55 ` Boris Ostrovsky
  2014-08-11 14:08   ` Jan Beulich
  2014-08-08 16:55 ` [PATCH v9 10/20] x86/VPMU: Make vpmu not HVM-specific Boris Ostrovsky
                   ` (11 subsequent siblings)
  20 siblings, 1 reply; 49+ messages in thread
From: Boris Ostrovsky @ 2014-08-08 16:55 UTC (permalink / raw)
  To: jbeulich, kevin.tian, suravee.suthikulpanit
  Cc: keir, andrew.cooper3, tim, xen-devel, jun.nakajima, boris.ostrovsky

Add public pmu.h header files and move various macros and structures that will
be shared between the hypervisor and PV guests into them.

Move MSR banks out of architectural PMU structures to allow for larger sizes
in the future. The banks are allocated immediately after the context structure,
and the PMU structures store offsets to them.
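
To make the layout concrete, here is a sketch of the Intel case; the allocation
and offset setup below restate what the vpmu_core2.c hunk in this patch does,
and vpmu_reg_pointer() is the accessor added to vpmu.h:

    /*
     * One allocation holds the fixed-size context followed by the MSR banks:
     *
     *   +----------------------------+ <- vpmu->context
     *   | struct xen_pmu_intel_ctxt  |
     *   +----------------------------+ <- ctxt->fixed_counters (byte offset)
     *   | fixed_pmc_cnt * uint64_t   |
     *   +----------------------------+ <- ctxt->arch_counters (byte offset)
     *   | arch_pmc_cnt *             |
     *   |   struct xen_pmu_cntr_pair |
     *   +----------------------------+
     */
    core2_vpmu_cxt = xzalloc_bytes(sizeof(struct xen_pmu_intel_ctxt) +
                                   sizeof(uint64_t) * fixed_pmc_cnt +
                                   sizeof(struct xen_pmu_cntr_pair) *
                                   arch_pmc_cnt);
    core2_vpmu_cxt->fixed_counters = sizeof(struct xen_pmu_intel_ctxt);
    core2_vpmu_cxt->arch_counters = core2_vpmu_cxt->fixed_counters +
                                    sizeof(uint64_t) * fixed_pmc_cnt;

    /* Banks are then reached as base address plus stored offset: */
    uint64_t *fixed_counters = vpmu_reg_pointer(core2_vpmu_cxt, fixed_counters);
    struct xen_pmu_cntr_pair *arch_counters =
        vpmu_reg_pointer(core2_vpmu_cxt, arch_counters);

Storing offsets rather than pointers keeps the context self-contained, which
helps when the same memory is later shared with a PV(H) guest (vpmu->context is
marked "May be shared with PV guest" in vpmu.h below).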

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
Tested-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
---
 xen/arch/x86/hvm/svm/vpmu.c              |  84 ++++++++++++----------
 xen/arch/x86/hvm/vmx/vpmu_core2.c        | 117 +++++++++++++++++--------------
 xen/arch/x86/hvm/vpmu.c                  |   4 ++
 xen/arch/x86/oprofile/op_model_ppro.c    |   6 +-
 xen/include/asm-x86/hvm/vmx/vpmu_core2.h |  32 ---------
 xen/include/asm-x86/hvm/vpmu.h           |  16 ++---
 xen/include/public/arch-arm.h            |   3 +
 xen/include/public/arch-x86/pmu.h        |  68 ++++++++++++++++++
 xen/include/public/pmu.h                 |  38 ++++++++++
 9 files changed, 236 insertions(+), 132 deletions(-)
 delete mode 100644 xen/include/asm-x86/hvm/vmx/vpmu_core2.h
 create mode 100644 xen/include/public/arch-x86/pmu.h
 create mode 100644 xen/include/public/pmu.h

diff --git a/xen/arch/x86/hvm/svm/vpmu.c b/xen/arch/x86/hvm/svm/vpmu.c
index 08301c5..cf56995 100644
--- a/xen/arch/x86/hvm/svm/vpmu.c
+++ b/xen/arch/x86/hvm/svm/vpmu.c
@@ -30,10 +30,7 @@
 #include <asm/apic.h>
 #include <asm/hvm/vlapic.h>
 #include <asm/hvm/vpmu.h>
-
-#define F10H_NUM_COUNTERS 4
-#define F15H_NUM_COUNTERS 6
-#define MAX_NUM_COUNTERS F15H_NUM_COUNTERS
+#include <public/pmu.h>
 
 #define MSR_F10H_EVNTSEL_GO_SHIFT   40
 #define MSR_F10H_EVNTSEL_EN_SHIFT   22
@@ -49,6 +46,10 @@ static const u32 __read_mostly *counters;
 static const u32 __read_mostly *ctrls;
 static bool_t __read_mostly k7_counters_mirrored;
 
+#define F10H_NUM_COUNTERS   4
+#define F15H_NUM_COUNTERS   6
+#define AMD_MAX_COUNTERS    6
+
 /* PMU Counter MSRs. */
 static const u32 AMD_F10H_COUNTERS[] = {
     MSR_K7_PERFCTR0,
@@ -83,12 +84,14 @@ static const u32 AMD_F15H_CTRLS[] = {
     MSR_AMD_FAM15H_EVNTSEL5
 };
 
-/* storage for context switching */
-struct amd_vpmu_context {
-    u64 counters[MAX_NUM_COUNTERS];
-    u64 ctrls[MAX_NUM_COUNTERS];
-    bool_t msr_bitmap_set;
-};
+/* Use private context as a flag for MSR bitmap */
+#define msr_bitmap_on(vpmu)    do {                                    \
+                                   (vpmu)->priv_context = (void *)-1L; \
+                               } while (0)
+#define msr_bitmap_off(vpmu)   do {                                    \
+                                   (vpmu)->priv_context = NULL;        \
+                               } while (0)
+#define is_msr_bitmap_on(vpmu) ((vpmu)->priv_context != NULL)
 
 static inline int get_pmu_reg_type(u32 addr)
 {
@@ -142,7 +145,6 @@ static void amd_vpmu_set_msr_bitmap(struct vcpu *v)
 {
     unsigned int i;
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    struct amd_vpmu_context *ctxt = vpmu->context;
 
     for ( i = 0; i < num_counters; i++ )
     {
@@ -150,14 +152,13 @@ static void amd_vpmu_set_msr_bitmap(struct vcpu *v)
         svm_intercept_msr(v, ctrls[i], MSR_INTERCEPT_WRITE);
     }
 
-    ctxt->msr_bitmap_set = 1;
+    msr_bitmap_on(vpmu);
 }
 
 static void amd_vpmu_unset_msr_bitmap(struct vcpu *v)
 {
     unsigned int i;
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    struct amd_vpmu_context *ctxt = vpmu->context;
 
     for ( i = 0; i < num_counters; i++ )
     {
@@ -165,7 +166,7 @@ static void amd_vpmu_unset_msr_bitmap(struct vcpu *v)
         svm_intercept_msr(v, ctrls[i], MSR_INTERCEPT_RW);
     }
 
-    ctxt->msr_bitmap_set = 0;
+    msr_bitmap_off(vpmu);
 }
 
 static int amd_vpmu_do_interrupt(struct cpu_user_regs *regs)
@@ -177,19 +178,22 @@ static inline void context_load(struct vcpu *v)
 {
     unsigned int i;
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    struct amd_vpmu_context *ctxt = vpmu->context;
+    struct xen_pmu_amd_ctxt *ctxt = vpmu->context;
+    uint64_t *counter_regs = vpmu_reg_pointer(ctxt, counters);
+    uint64_t *ctrl_regs = vpmu_reg_pointer(ctxt, ctrls);
 
     for ( i = 0; i < num_counters; i++ )
     {
-        wrmsrl(counters[i], ctxt->counters[i]);
-        wrmsrl(ctrls[i], ctxt->ctrls[i]);
+        wrmsrl(counters[i], counter_regs[i]);
+        wrmsrl(ctrls[i], ctrl_regs[i]);
     }
 }
 
 static void amd_vpmu_load(struct vcpu *v)
 {
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    struct amd_vpmu_context *ctxt = vpmu->context;
+    struct xen_pmu_amd_ctxt *ctxt = vpmu->context;
+    uint64_t *ctrl_regs = vpmu_reg_pointer(ctxt, ctrls);
 
     vpmu_reset(vpmu, VPMU_FROZEN);
 
@@ -198,7 +202,7 @@ static void amd_vpmu_load(struct vcpu *v)
         unsigned int i;
 
         for ( i = 0; i < num_counters; i++ )
-            wrmsrl(ctrls[i], ctxt->ctrls[i]);
+            wrmsrl(ctrls[i], ctrl_regs[i]);
 
         return;
     }
@@ -212,17 +216,17 @@ static inline void context_save(struct vcpu *v)
 {
     unsigned int i;
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    struct amd_vpmu_context *ctxt = vpmu->context;
+    struct xen_pmu_amd_ctxt *ctxt = vpmu->context;
+    uint64_t *counter_regs = vpmu_reg_pointer(ctxt, counters);
 
     /* No need to save controls -- they are saved in amd_vpmu_do_wrmsr */
     for ( i = 0; i < num_counters; i++ )
-        rdmsrl(counters[i], ctxt->counters[i]);
+        rdmsrl(counters[i], counter_regs[i]);
 }
 
 static int amd_vpmu_save(struct vcpu *v)
 {
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    struct amd_vpmu_context *ctx = vpmu->context;
     unsigned int i;
 
     /*
@@ -245,7 +249,7 @@ static int amd_vpmu_save(struct vcpu *v)
     context_save(v);
 
     if ( !vpmu_is_set(vpmu, VPMU_RUNNING) &&
-         has_hvm_container_domain(v->domain) && ctx->msr_bitmap_set )
+         has_hvm_container_domain(v->domain) && is_msr_bitmap_on(vpmu) )
         amd_vpmu_unset_msr_bitmap(v);
 
     return 1;
@@ -256,7 +260,9 @@ static void context_update(unsigned int msr, u64 msr_content)
     unsigned int i;
     struct vcpu *v = current;
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    struct amd_vpmu_context *ctxt = vpmu->context;
+    struct xen_pmu_amd_ctxt *ctxt = vpmu->context;
+    uint64_t *counter_regs = vpmu_reg_pointer(ctxt, counters);
+    uint64_t *ctrl_regs = vpmu_reg_pointer(ctxt, ctrls);
 
     if ( k7_counters_mirrored &&
         ((msr >= MSR_K7_EVNTSEL0) && (msr <= MSR_K7_PERFCTR3)) )
@@ -268,12 +274,12 @@ static void context_update(unsigned int msr, u64 msr_content)
     {
        if ( msr == ctrls[i] )
        {
-           ctxt->ctrls[i] = msr_content;
+           ctrl_regs[i] = msr_content;
            return;
        }
         else if (msr == counters[i] )
         {
-            ctxt->counters[i] = msr_content;
+            counter_regs[i] = msr_content;
             return;
         }
     }
@@ -300,8 +306,7 @@ static int amd_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
             return 1;
         vpmu_set(vpmu, VPMU_RUNNING);
 
-        if ( has_hvm_container_domain(v->domain) &&
-             !((struct amd_vpmu_context *)vpmu->context)->msr_bitmap_set )
+        if ( has_hvm_container_domain(v->domain) && is_msr_bitmap_on(vpmu) )
              amd_vpmu_set_msr_bitmap(v);
     }
 
@@ -310,8 +315,7 @@ static int amd_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
         (is_pmu_enabled(msr_content) == 0) && vpmu_is_set(vpmu, VPMU_RUNNING) )
     {
         vpmu_reset(vpmu, VPMU_RUNNING);
-        if ( has_hvm_container_domain(v->domain) &&
-             ((struct amd_vpmu_context *)vpmu->context)->msr_bitmap_set )
+        if ( has_hvm_container_domain(v->domain) && is_msr_bitmap_on(vpmu) )
              amd_vpmu_unset_msr_bitmap(v);
         release_pmu_ownship(PMU_OWNER_HVM);
     }
@@ -352,7 +356,7 @@ static int amd_vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
 
 static int amd_vpmu_initialise(struct vcpu *v)
 {
-    struct amd_vpmu_context *ctxt;
+    struct xen_pmu_amd_ctxt *ctxt;
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
     uint8_t family = current_cpu_data.x86;
 
@@ -382,7 +386,8 @@ static int amd_vpmu_initialise(struct vcpu *v)
 	 }
     }
 
-    ctxt = xzalloc(struct amd_vpmu_context);
+    ctxt = xzalloc_bytes(sizeof(struct xen_pmu_amd_ctxt) +
+                         2 * sizeof(uint64_t) * AMD_MAX_COUNTERS);
     if ( !ctxt )
     {
         gdprintk(XENLOG_WARNING, "Insufficient memory for PMU, "
@@ -391,7 +396,11 @@ static int amd_vpmu_initialise(struct vcpu *v)
         return -ENOMEM;
     }
 
+    ctxt->counters = sizeof(struct xen_pmu_amd_ctxt);
+    ctxt->ctrls = ctxt->counters + sizeof(uint64_t) * AMD_MAX_COUNTERS;
+
     vpmu->context = ctxt;
+    vpmu->priv_context = NULL;
     vpmu_set(vpmu, VPMU_CONTEXT_ALLOCATED);
     return 0;
 }
@@ -403,8 +412,7 @@ static void amd_vpmu_destroy(struct vcpu *v)
     if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
         return;
 
-    if ( has_hvm_container_domain(v->domain) &&
-         ((struct amd_vpmu_context *)vpmu->context)->msr_bitmap_set )
+    if ( has_hvm_container_domain(v->domain) && is_msr_bitmap_on(vpmu) )
         amd_vpmu_unset_msr_bitmap(v);
 
     xfree(vpmu->context);
@@ -421,7 +429,9 @@ static void amd_vpmu_destroy(struct vcpu *v)
 static void amd_vpmu_dump(const struct vcpu *v)
 {
     const struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    const struct amd_vpmu_context *ctxt = vpmu->context;
+    const struct xen_pmu_amd_ctxt *ctxt = vpmu->context;
+    const uint64_t *counter_regs = vpmu_reg_pointer(ctxt, counters);
+    const uint64_t *ctrl_regs = vpmu_reg_pointer(ctxt, ctrls);
     unsigned int i;
 
     printk("    VPMU state: 0x%x ", vpmu->flags);
@@ -451,8 +461,8 @@ static void amd_vpmu_dump(const struct vcpu *v)
         rdmsrl(ctrls[i], ctrl);
         rdmsrl(counters[i], cntr);
         printk("      %#x: %#lx (%#lx in HW)    %#x: %#lx (%#lx in HW)\n",
-               ctrls[i], ctxt->ctrls[i], ctrl,
-               counters[i], ctxt->counters[i], cntr);
+               ctrls[i], ctrl_regs[i], ctrl,
+               counters[i], counter_regs[i], cntr);
     }
 }
 
diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c
index 449f805..6ace964 100644
--- a/xen/arch/x86/hvm/vmx/vpmu_core2.c
+++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c
@@ -35,8 +35,8 @@
 #include <asm/hvm/vmx/vmcs.h>
 #include <public/sched.h>
 #include <public/hvm/save.h>
+#include <public/pmu.h>
 #include <asm/hvm/vpmu.h>
-#include <asm/hvm/vmx/vpmu_core2.h>
 
 /*
  * See Intel SDM Vol 2a Instruction Set Reference chapter 3 for CPUID
@@ -68,6 +68,10 @@
 #define MSR_PMC_ALIAS_MASK       (~(MSR_IA32_PERFCTR0 ^ MSR_IA32_A_PERFCTR0))
 static bool_t __read_mostly full_width_write;
 
+/* Intel-specific VPMU features */
+#define VPMU_CPU_HAS_DS                     0x100 /* Has Debug Store */
+#define VPMU_CPU_HAS_BTS                    0x200 /* Has Branch Trace Store */
+
 /*
  * MSR_CORE_PERF_FIXED_CTR_CTRL contains the configuration of all fixed
  * counters. 4 bits for every counter.
@@ -75,17 +79,6 @@ static bool_t __read_mostly full_width_write;
 #define FIXED_CTR_CTRL_BITS 4
 #define FIXED_CTR_CTRL_MASK ((1 << FIXED_CTR_CTRL_BITS) - 1)
 
-#define VPMU_CORE2_MAX_FIXED_PMCS     4
-struct core2_vpmu_context {
-    u64 fixed_ctrl;
-    u64 ds_area;
-    u64 pebs_enable;
-    u64 global_ovf_status;
-    u64 enabled_cntrs;  /* Follows PERF_GLOBAL_CTRL MSR format */
-    u64 fix_counters[VPMU_CORE2_MAX_FIXED_PMCS];
-    struct arch_msr_pair arch_msr_pair[1];
-};
-
 /* Number of general-purpose and fixed performance counters */
 static unsigned int __read_mostly arch_pmc_cnt, fixed_pmc_cnt;
 
@@ -222,6 +215,7 @@ static int is_core2_vpmu_msr(u32 msr_index, int *type, int *index)
     }
 }
 
+#define msraddr_to_bitpos(x) (((x)&0xffff) + ((x)>>31)*0x2000)
 static void core2_vpmu_set_msr_bitmap(unsigned long *msr_bitmap)
 {
     int i;
@@ -291,12 +285,15 @@ static void core2_vpmu_unset_msr_bitmap(unsigned long *msr_bitmap)
 static inline void __core2_vpmu_save(struct vcpu *v)
 {
     int i;
-    struct core2_vpmu_context *core2_vpmu_cxt = vcpu_vpmu(v)->context;
+    struct xen_pmu_intel_ctxt *core2_vpmu_cxt = vcpu_vpmu(v)->context;
+    uint64_t *fixed_counters = vpmu_reg_pointer(core2_vpmu_cxt, fixed_counters);
+    struct xen_pmu_cntr_pair *xen_pmu_cntr_pair =
+        vpmu_reg_pointer(core2_vpmu_cxt, arch_counters);
 
     for ( i = 0; i < fixed_pmc_cnt; i++ )
-        rdmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, core2_vpmu_cxt->fix_counters[i]);
+        rdmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, fixed_counters[i]);
     for ( i = 0; i < arch_pmc_cnt; i++ )
-        rdmsrl(MSR_IA32_PERFCTR0 + i, core2_vpmu_cxt->arch_msr_pair[i].counter);
+        rdmsrl(MSR_IA32_PERFCTR0 + i, xen_pmu_cntr_pair[i].counter);
 }
 
 static int core2_vpmu_save(struct vcpu *v)
@@ -319,10 +316,13 @@ static int core2_vpmu_save(struct vcpu *v)
 static inline void __core2_vpmu_load(struct vcpu *v)
 {
     unsigned int i, pmc_start;
-    struct core2_vpmu_context *core2_vpmu_cxt = vcpu_vpmu(v)->context;
+    struct xen_pmu_intel_ctxt *core2_vpmu_cxt = vcpu_vpmu(v)->context;
+    uint64_t *fixed_counters = vpmu_reg_pointer(core2_vpmu_cxt, fixed_counters);
+    struct xen_pmu_cntr_pair *xen_pmu_cntr_pair =
+        vpmu_reg_pointer(core2_vpmu_cxt, arch_counters);
 
     for ( i = 0; i < fixed_pmc_cnt; i++ )
-        wrmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, core2_vpmu_cxt->fix_counters[i]);
+        wrmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, fixed_counters[i]);
 
     if ( full_width_write )
         pmc_start = MSR_IA32_A_PERFCTR0;
@@ -330,8 +330,8 @@ static inline void __core2_vpmu_load(struct vcpu *v)
         pmc_start = MSR_IA32_PERFCTR0;
     for ( i = 0; i < arch_pmc_cnt; i++ )
     {
-        wrmsrl(pmc_start + i, core2_vpmu_cxt->arch_msr_pair[i].counter);
-        wrmsrl(MSR_P6_EVNTSEL0 + i, core2_vpmu_cxt->arch_msr_pair[i].control);
+        wrmsrl(pmc_start + i, xen_pmu_cntr_pair[i].counter);
+        wrmsrl(MSR_P6_EVNTSEL0 + i, xen_pmu_cntr_pair[i].control);
     }
 
     wrmsrl(MSR_CORE_PERF_FIXED_CTR_CTRL, core2_vpmu_cxt->fixed_ctrl);
@@ -354,7 +354,8 @@ static void core2_vpmu_load(struct vcpu *v)
 static int core2_vpmu_alloc_resource(struct vcpu *v)
 {
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    struct core2_vpmu_context *core2_vpmu_cxt;
+    struct xen_pmu_intel_ctxt *core2_vpmu_cxt = NULL;
+    uint64_t *p = NULL;
 
     if ( !acquire_pmu_ownership(PMU_OWNER_HVM) )
         return 0;
@@ -367,12 +368,20 @@ static int core2_vpmu_alloc_resource(struct vcpu *v)
         goto out_err;
     vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, 0);
 
-    core2_vpmu_cxt = xzalloc_bytes(sizeof(struct core2_vpmu_context) +
-                    (arch_pmc_cnt-1)*sizeof(struct arch_msr_pair));
-    if ( !core2_vpmu_cxt )
+    core2_vpmu_cxt = xzalloc_bytes(sizeof(struct xen_pmu_intel_ctxt) +
+                                   sizeof(uint64_t) * fixed_pmc_cnt +
+                                   sizeof(struct xen_pmu_cntr_pair) *
+                                   arch_pmc_cnt);
+    p = xzalloc(uint64_t);
+    if ( !core2_vpmu_cxt || !p )
         goto out_err;
 
-    vpmu->context = (void *)core2_vpmu_cxt;
+    core2_vpmu_cxt->fixed_counters = sizeof(struct xen_pmu_intel_ctxt);
+    core2_vpmu_cxt->arch_counters = core2_vpmu_cxt->fixed_counters +
+                                    sizeof(uint64_t) * fixed_pmc_cnt;
+
+    vpmu->context = core2_vpmu_cxt;
+    vpmu->priv_context = p;
 
     vpmu_set(vpmu, VPMU_CONTEXT_ALLOCATED);
 
@@ -383,6 +392,9 @@ out_err:
     vmx_rm_msr(MSR_CORE_PERF_GLOBAL_CTRL, VMX_GUEST_MSR);
     release_pmu_ownship(PMU_OWNER_HVM);
 
+    xfree(core2_vpmu_cxt);
+    xfree(p);
+
     printk("Failed to allocate VPMU resources for domain %u vcpu %u\n",
            v->vcpu_id, v->domain->domain_id);
 
@@ -419,7 +431,8 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
     int type = -1, index = -1;
     struct vcpu *v = current;
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    struct core2_vpmu_context *core2_vpmu_cxt = NULL;
+    struct xen_pmu_intel_ctxt *core2_vpmu_cxt = NULL;
+    uint64_t *enabled_cntrs;
 
     if ( !core2_vpmu_msr_common_check(msr, &type, &index) )
     {
@@ -445,10 +458,11 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
     }
 
     core2_vpmu_cxt = vpmu->context;
+    enabled_cntrs = vpmu->priv_context;
     switch ( msr )
     {
     case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
-        core2_vpmu_cxt->global_ovf_status &= ~msr_content;
+        core2_vpmu_cxt->global_status &= ~msr_content;
         return 1;
     case MSR_CORE_PERF_GLOBAL_STATUS:
         gdprintk(XENLOG_INFO, "Can not write readonly MSR: "
@@ -482,15 +496,14 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
         break;
     case MSR_CORE_PERF_FIXED_CTR_CTRL:
         vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, &global_ctrl);
-        core2_vpmu_cxt->enabled_cntrs &=
-                ~(((1ULL << VPMU_CORE2_MAX_FIXED_PMCS) - 1) << 32);
+        *enabled_cntrs &= ~(((1ULL << fixed_pmc_cnt) - 1) << 32);
         if ( msr_content != 0 )
         {
             u64 val = msr_content;
             for ( i = 0; i < fixed_pmc_cnt; i++ )
             {
                 if ( val & 3 )
-                    core2_vpmu_cxt->enabled_cntrs |= (1ULL << 32) << i;
+                    *enabled_cntrs |= (1ULL << 32) << i;
                 val >>= FIXED_CTR_CTRL_BITS;
             }
         }
@@ -501,19 +514,21 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
         tmp = msr - MSR_P6_EVNTSEL0;
         if ( tmp >= 0 && tmp < arch_pmc_cnt )
         {
+            struct xen_pmu_cntr_pair *xen_pmu_cntr_pair =
+                vpmu_reg_pointer(core2_vpmu_cxt, arch_counters);
+
             vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, &global_ctrl);
 
             if ( msr_content & (1ULL << 22) )
-                core2_vpmu_cxt->enabled_cntrs |= 1ULL << tmp;
+                *enabled_cntrs |= 1ULL << tmp;
             else
-                core2_vpmu_cxt->enabled_cntrs &= ~(1ULL << tmp);
+                *enabled_cntrs &= ~(1ULL << tmp);
 
-            core2_vpmu_cxt->arch_msr_pair[tmp].control = msr_content;
+            xen_pmu_cntr_pair[tmp].control = msr_content;
         }
     }
 
-    if ( (global_ctrl & core2_vpmu_cxt->enabled_cntrs) ||
-         (core2_vpmu_cxt->ds_area != 0)  )
+    if ( (global_ctrl & *enabled_cntrs) || (core2_vpmu_cxt->ds_area != 0) )
         vpmu_set(vpmu, VPMU_RUNNING);
     else
         vpmu_reset(vpmu, VPMU_RUNNING);
@@ -559,7 +574,7 @@ static int core2_vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
     int type = -1, index = -1;
     struct vcpu *v = current;
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    struct core2_vpmu_context *core2_vpmu_cxt = NULL;
+    struct xen_pmu_intel_ctxt *core2_vpmu_cxt = NULL;
 
     if ( core2_vpmu_msr_common_check(msr, &type, &index) )
     {
@@ -570,7 +585,7 @@ static int core2_vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
             *msr_content = 0;
             break;
         case MSR_CORE_PERF_GLOBAL_STATUS:
-            *msr_content = core2_vpmu_cxt->global_ovf_status;
+            *msr_content = core2_vpmu_cxt->global_status;
             break;
         case MSR_CORE_PERF_GLOBAL_CTRL:
             vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, msr_content);
@@ -619,10 +634,12 @@ static void core2_vpmu_dump(const struct vcpu *v)
 {
     const struct vpmu_struct *vpmu = vcpu_vpmu(v);
     unsigned int i;
-    const struct core2_vpmu_context *core2_vpmu_cxt = NULL;
+    const struct xen_pmu_intel_ctxt *core2_vpmu_cxt = vpmu->context;
     u64 val;
+    uint64_t *fixed_counters;
+    struct xen_pmu_cntr_pair *cntr_pair;
 
-    if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
+    if ( !core2_vpmu_cxt || !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
          return;
 
     if ( !vpmu_is_set(vpmu, VPMU_RUNNING) )
@@ -635,16 +652,15 @@ static void core2_vpmu_dump(const struct vcpu *v)
     }
 
     printk("    vPMU running\n");
-    core2_vpmu_cxt = vpmu->context;
+
+    cntr_pair = vpmu_reg_pointer(core2_vpmu_cxt, arch_counters);
+    fixed_counters = vpmu_reg_pointer(core2_vpmu_cxt, fixed_counters);
 
     /* Print the contents of the counter and its configuration msr. */
     for ( i = 0; i < arch_pmc_cnt; i++ )
-    {
-        const struct arch_msr_pair *msr_pair = core2_vpmu_cxt->arch_msr_pair;
-
         printk("      general_%d: 0x%016lx ctrl: 0x%016lx\n",
-               i, msr_pair[i].counter, msr_pair[i].control);
-    }
+            i, cntr_pair[i].counter, cntr_pair[i].control);
+
     /*
      * The configuration of the fixed counter is 4 bits each in the
      * MSR_CORE_PERF_FIXED_CTR_CTRL.
@@ -653,7 +669,7 @@ static void core2_vpmu_dump(const struct vcpu *v)
     for ( i = 0; i < fixed_pmc_cnt; i++ )
     {
         printk("      fixed_%d:   0x%016lx ctrl: %#lx\n",
-               i, core2_vpmu_cxt->fix_counters[i],
+               i, fixed_counters[i],
                val & FIXED_CTR_CTRL_MASK);
         val >>= FIXED_CTR_CTRL_BITS;
     }
@@ -664,14 +680,14 @@ static int core2_vpmu_do_interrupt(struct cpu_user_regs *regs)
     struct vcpu *v = current;
     u64 msr_content;
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
-    struct core2_vpmu_context *core2_vpmu_cxt = vpmu->context;
+    struct xen_pmu_intel_ctxt *core2_vpmu_cxt = vpmu->context;
 
     rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, msr_content);
     if ( msr_content )
     {
         if ( is_pmc_quirk )
             handle_pmc_quirk(msr_content);
-        core2_vpmu_cxt->global_ovf_status |= msr_content;
+        core2_vpmu_cxt->global_status |= msr_content;
         msr_content = 0xC000000700000000 | ((1 << arch_pmc_cnt) - 1);
         wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, msr_content);
     }
@@ -734,12 +750,6 @@ func_out:
 
     arch_pmc_cnt = core2_get_arch_pmc_count();
     fixed_pmc_cnt = core2_get_fixed_pmc_count();
-    if ( fixed_pmc_cnt > VPMU_CORE2_MAX_FIXED_PMCS )
-    {
-        fixed_pmc_cnt = VPMU_CORE2_MAX_FIXED_PMCS;
-        printk(XENLOG_G_WARNING "Limiting number of fixed counters to %d\n",
-               fixed_pmc_cnt);
-    }
     check_pmc_quirk();
 
     return 0;
@@ -753,6 +763,7 @@ static void core2_vpmu_destroy(struct vcpu *v)
         return;
 
     xfree(vpmu->context);
+    xfree(vpmu->priv_context);
     if ( has_hvm_container_domain(v->domain) && cpu_has_vmx_msr_bitmap )
         core2_vpmu_unset_msr_bitmap(v->arch.hvm_vmx.msr_bitmap);
     release_pmu_ownship(PMU_OWNER_HVM);
diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c
index 99ba766..7a48a4b 100644
--- a/xen/arch/x86/hvm/vpmu.c
+++ b/xen/arch/x86/hvm/vpmu.c
@@ -31,6 +31,7 @@
 #include <asm/hvm/svm/svm.h>
 #include <asm/hvm/svm/vmcb.h>
 #include <asm/apic.h>
+#include <public/pmu.h>
 
 /*
  * "vpmu" :     vpmu generally enabled
@@ -228,6 +229,9 @@ void vpmu_initialise(struct vcpu *v)
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
     uint8_t vendor = current_cpu_data.x86_vendor;
 
+    BUILD_BUG_ON(sizeof(struct xen_pmu_intel_ctxt) > XENPMU_CTXT_PAD_SZ);
+    BUILD_BUG_ON(sizeof(struct xen_pmu_amd_ctxt) > XENPMU_CTXT_PAD_SZ);
+
     if ( vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
         vpmu_destroy(v);
     vpmu_clear(vpmu);
diff --git a/xen/arch/x86/oprofile/op_model_ppro.c b/xen/arch/x86/oprofile/op_model_ppro.c
index 8b9f3f6..9e3510a 100644
--- a/xen/arch/x86/oprofile/op_model_ppro.c
+++ b/xen/arch/x86/oprofile/op_model_ppro.c
@@ -20,11 +20,15 @@
 #include <asm/regs.h>
 #include <asm/current.h>
 #include <asm/hvm/vpmu.h>
-#include <asm/hvm/vmx/vpmu_core2.h>
 
 #include "op_x86_model.h"
 #include "op_counter.h"
 
+struct arch_msr_pair {
+    u64 counter;
+    u64 control;
+};
+
 /*
  * Intel "Architectural Performance Monitoring" CPUID
  * detection/enumeration details:
diff --git a/xen/include/asm-x86/hvm/vmx/vpmu_core2.h b/xen/include/asm-x86/hvm/vmx/vpmu_core2.h
deleted file mode 100644
index 410372d..0000000
--- a/xen/include/asm-x86/hvm/vmx/vpmu_core2.h
+++ /dev/null
@@ -1,32 +0,0 @@
-
-/*
- * vpmu_core2.h: CORE 2 specific PMU virtualization for HVM domain.
- *
- * Copyright (c) 2007, Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms and conditions of the GNU General Public License,
- * version 2, as published by the Free Software Foundation.
- *
- * This program is distributed in the hope it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along with
- * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
- * Place - Suite 330, Boston, MA 02111-1307 USA.
- *
- * Author: Haitao Shan <haitao.shan@intel.com>
- */
-
-#ifndef __ASM_X86_HVM_VPMU_CORE_H_
-#define __ASM_X86_HVM_VPMU_CORE_H_
-
-struct arch_msr_pair {
-    u64 counter;
-    u64 control;
-};
-
-#endif /* __ASM_X86_HVM_VPMU_CORE_H_ */
-
diff --git a/xen/include/asm-x86/hvm/vpmu.h b/xen/include/asm-x86/hvm/vpmu.h
index f58d356..d73340f 100644
--- a/xen/include/asm-x86/hvm/vpmu.h
+++ b/xen/include/asm-x86/hvm/vpmu.h
@@ -22,6 +22,8 @@
 #ifndef __ASM_X86_HVM_VPMU_H_
 #define __ASM_X86_HVM_VPMU_H_
 
+#include <public/pmu.h>
+
 /*
  * Flag bits given as a string on the hypervisor boot parameter 'vpmu'.
  * See arch/x86/hvm/vpmu.c.
@@ -29,12 +31,9 @@
 #define VPMU_BOOT_ENABLED 0x1    /* vpmu generally enabled. */
 #define VPMU_BOOT_BTS     0x2    /* Intel BTS feature wanted. */
 
-
-#define msraddr_to_bitpos(x) (((x)&0xffff) + ((x)>>31)*0x2000)
 #define vcpu_vpmu(vcpu)   (&((vcpu)->arch.hvm_vcpu.vpmu))
 #define vpmu_vcpu(vpmu)   (container_of((vpmu), struct vcpu, \
                                           arch.hvm_vcpu.vpmu))
-#define vpmu_domain(vpmu) (vpmu_vcpu(vpmu)->domain)
 
 #define MSR_TYPE_COUNTER            0
 #define MSR_TYPE_CTRL               1
@@ -42,6 +41,9 @@
 #define MSR_TYPE_ARCH_COUNTER       3
 #define MSR_TYPE_ARCH_CTRL          4
 
+/* Start of PMU register bank */
+#define vpmu_reg_pointer(ctxt, offset) ((void *)((uintptr_t)ctxt + \
+                                                 (uintptr_t)ctxt->offset))
 
 /* Arch specific operations shared by all vpmus */
 struct arch_vpmu_ops {
@@ -64,7 +66,8 @@ struct vpmu_struct {
     u32 flags;
     u32 last_pcpu;
     u32 hw_lapic_lvtpc;
-    void *context;
+    void *context;      /* May be shared with PV guest */
+    void *priv_context; /* hypervisor-only */
     struct arch_vpmu_ops *arch_vpmu_ops;
 };
 
@@ -76,11 +79,6 @@ struct vpmu_struct {
 #define VPMU_FROZEN                         0x10  /* Stop counters while VCPU is not running */
 #define VPMU_PASSIVE_DOMAIN_ALLOCATED       0x20
 
-/* VPMU features */
-#define VPMU_CPU_HAS_DS                     0x100 /* Has Debug Store */
-#define VPMU_CPU_HAS_BTS                    0x200 /* Has Branch Trace Store */
-
-
 static inline void vpmu_set(struct vpmu_struct *vpmu, const u32 mask)
 {
     vpmu->flags |= mask;
diff --git a/xen/include/public/arch-arm.h b/xen/include/public/arch-arm.h
index ac54cd6..5b7dbc8 100644
--- a/xen/include/public/arch-arm.h
+++ b/xen/include/public/arch-arm.h
@@ -407,6 +407,9 @@ typedef uint64_t xen_callback_t;
 
 #endif
 
+/* Stub definition of PMU structure */
+typedef struct xen_arch_pmu {} xen_arch_pmu_t;
+
 #endif /*  __XEN_PUBLIC_ARCH_ARM_H__ */
 
 /*
diff --git a/xen/include/public/arch-x86/pmu.h b/xen/include/public/arch-x86/pmu.h
new file mode 100644
index 0000000..9617a4e
--- /dev/null
+++ b/xen/include/public/arch-x86/pmu.h
@@ -0,0 +1,68 @@
+#ifndef __XEN_PUBLIC_ARCH_X86_PMU_H__
+#define __XEN_PUBLIC_ARCH_X86_PMU_H__
+
+/* x86-specific PMU definitions */
+
+/* AMD PMU registers and structures */
+struct xen_pmu_amd_ctxt {
+    /* Offsets to counter and control MSRs (relative to xen_arch_pmu.c.amd) */
+    uint32_t counters;
+    uint32_t ctrls;
+};
+
+/* Intel PMU registers and structures */
+struct xen_pmu_cntr_pair {
+    uint64_t counter;
+    uint64_t control;
+};
+
+struct xen_pmu_intel_ctxt {
+    uint64_t global_ctrl;
+    uint64_t global_ovf_ctrl;
+    uint64_t global_status;
+    uint64_t fixed_ctrl;
+    uint64_t ds_area;
+    uint64_t pebs_enable;
+    uint64_t debugctl;
+    /*
+     * Offsets to fixed and architectural counter MSRs (relative to
+     * xen_arch_pmu.c.intel)
+     */
+    uint32_t fixed_counters;
+    uint32_t arch_counters;
+};
+
+struct xen_arch_pmu {
+    union {
+        struct cpu_user_regs regs;
+        uint8_t pad[256];
+    } r;
+    union {
+        uint32_t lapic_lvtpc;
+        uint64_t pad;
+    } l;
+    union {
+        struct xen_pmu_amd_ctxt amd;
+        struct xen_pmu_intel_ctxt intel;
+
+        /*
+         * Padding for contexts (fixed parts only, does not include MSR banks
+         * that are specified by offsets
+         */
+#define XENPMU_CTXT_PAD_SZ  128
+        uint8_t pad[XENPMU_CTXT_PAD_SZ];
+    } c;
+};
+typedef struct xen_arch_pmu xen_arch_pmu_t;
+
+#endif /* __XEN_PUBLIC_ARCH_X86_PMU_H__ */
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
+
diff --git a/xen/include/public/pmu.h b/xen/include/public/pmu.h
new file mode 100644
index 0000000..d237c25
--- /dev/null
+++ b/xen/include/public/pmu.h
@@ -0,0 +1,38 @@
+#ifndef __XEN_PUBLIC_PMU_H__
+#define __XEN_PUBLIC_PMU_H__
+
+#include "xen.h"
+#if defined(__i386__) || defined(__x86_64__)
+#include "arch-x86/pmu.h"
+#elif defined (__arm__) || defined (__aarch64__)
+#include "arch-arm.h"
+#else
+#error "Unsupported architecture"
+#endif
+
+#define XENPMU_VER_MAJ    0
+#define XENPMU_VER_MIN    1
+
+
+/* Shared between hypervisor and PV domain */
+struct xen_pmu_data {
+    uint32_t domain_id;
+    uint32_t vcpu_id;
+    uint32_t pcpu_id;
+    uint32_t pmu_flags;
+
+    xen_arch_pmu_t pmu;
+};
+typedef struct xen_pmu_data xen_pmu_data_t;
+
+#endif /* __XEN_PUBLIC_PMU_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v9 10/20] x86/VPMU: Make vpmu not HVM-specific
  2014-08-08 16:55 [PATCH v9 00/20] x86/PMU: Xen PMU PV(H) support Boris Ostrovsky
                   ` (8 preceding siblings ...)
  2014-08-08 16:55 ` [PATCH v9 09/20] x86/VPMU: Add public xenpmu.h Boris Ostrovsky
@ 2014-08-08 16:55 ` Boris Ostrovsky
  2014-08-08 16:55 ` [PATCH v9 11/20] x86/VPMU: Interface for setting PMU mode and flags Boris Ostrovsky
                   ` (10 subsequent siblings)
  20 siblings, 0 replies; 49+ messages in thread
From: Boris Ostrovsky @ 2014-08-08 16:55 UTC (permalink / raw)
  To: jbeulich, kevin.tian, suravee.suthikulpanit
  Cc: keir, andrew.cooper3, tim, xen-devel, jun.nakajima, boris.ostrovsky

vpmu structure will be used for both HVM and PV guests. Move it from
hvm_vcpu to arch_vcpu.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
Tested-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
---
 xen/include/asm-x86/domain.h   | 2 ++
 xen/include/asm-x86/hvm/vcpu.h | 3 ---
 xen/include/asm-x86/hvm/vpmu.h | 5 ++---
 3 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index abf55fb..07b7ba8 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -403,6 +403,8 @@ struct arch_vcpu
     void (*ctxt_switch_from) (struct vcpu *);
     void (*ctxt_switch_to) (struct vcpu *);
 
+    struct vpmu_struct vpmu;
+
     /* Virtual Machine Extensions */
     union {
         struct pv_vcpu pv_vcpu;
diff --git a/xen/include/asm-x86/hvm/vcpu.h b/xen/include/asm-x86/hvm/vcpu.h
index db37232..be4bed0 100644
--- a/xen/include/asm-x86/hvm/vcpu.h
+++ b/xen/include/asm-x86/hvm/vcpu.h
@@ -150,9 +150,6 @@ struct hvm_vcpu {
     u32                 msr_tsc_aux;
     u64                 msr_tsc_adjust;
 
-    /* VPMU */
-    struct vpmu_struct  vpmu;
-
     union {
         struct arch_vmx_struct vmx;
         struct arch_svm_struct svm;
diff --git a/xen/include/asm-x86/hvm/vpmu.h b/xen/include/asm-x86/hvm/vpmu.h
index d73340f..2d8d623 100644
--- a/xen/include/asm-x86/hvm/vpmu.h
+++ b/xen/include/asm-x86/hvm/vpmu.h
@@ -31,9 +31,8 @@
 #define VPMU_BOOT_ENABLED 0x1    /* vpmu generally enabled. */
 #define VPMU_BOOT_BTS     0x2    /* Intel BTS feature wanted. */
 
-#define vcpu_vpmu(vcpu)   (&((vcpu)->arch.hvm_vcpu.vpmu))
-#define vpmu_vcpu(vpmu)   (container_of((vpmu), struct vcpu, \
-                                          arch.hvm_vcpu.vpmu))
+#define vcpu_vpmu(vcpu)   (&(vcpu)->arch.vpmu)
+#define vpmu_vcpu(vpmu)   container_of((vpmu), struct vcpu, arch.vpmu)
 
 #define MSR_TYPE_COUNTER            0
 #define MSR_TYPE_CTRL               1
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v9 11/20] x86/VPMU: Interface for setting PMU mode and flags
  2014-08-08 16:55 [PATCH v9 00/20] x86/PMU: Xen PMU PV(H) support Boris Ostrovsky
                   ` (9 preceding siblings ...)
  2014-08-08 16:55 ` [PATCH v9 10/20] x86/VPMU: Make vpmu not HVM-specific Boris Ostrovsky
@ 2014-08-08 16:55 ` Boris Ostrovsky
  2014-08-12 10:37   ` Jan Beulich
  2014-08-08 16:55 ` [PATCH v9 12/20] x86/VPMU: Initialize PMU for PV(H) guests Boris Ostrovsky
                   ` (9 subsequent siblings)
  20 siblings, 1 reply; 49+ messages in thread
From: Boris Ostrovsky @ 2014-08-08 16:55 UTC (permalink / raw)
  To: jbeulich, kevin.tian, suravee.suthikulpanit
  Cc: keir, andrew.cooper3, tim, xen-devel, jun.nakajima, boris.ostrovsky

Add runtime interface for setting PMU mode and flags. Three main modes are
provided:
* PMU off
* PMU on: Guests can access PMU MSRs and receive PMU interrupts. dom0
  profiles itself and the hypervisor.
* dom0-only PMU: dom0 collects samples for both itself and guests.

For feature flags only Intel's BTS is currently supported.

Mode and flags are set via HYPERVISOR_xenpmu_op hypercall.
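
For illustration, a privileged domain would switch modes with something like
the following. The XENPMU_mode_set sub-op name and the xen_pmu_params layout
are assumed here for the sketch (the authoritative definitions are in the
public/pmu.h changes of this patch); XENPMU_MODE_SELF itself appears in the
hunks below:

    /* Hypothetical dom0-side call: request "self" profiling mode. */
    struct xen_pmu_params pmu_params = {
        .version.maj = XENPMU_VER_MAJ,   /* interface version 0.1 */
        .version.min = XENPMU_VER_MIN,
        .val = XENPMU_MODE_SELF,
    };

    if ( HYPERVISOR_xenpmu_op(XENPMU_mode_set, &pmu_params) != 0 )
        /* PMU mode unchanged; continue without profiling. */;

The hypervisor side validates the requested mode/flags and records them in
vpmu_mode and vpmu_features, which the context-switch and initialization paths
below then consult.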

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
Tested-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
---
 xen/arch/x86/domain.c              |   4 +-
 xen/arch/x86/hvm/svm/vpmu.c        |   4 +-
 xen/arch/x86/hvm/vmx/vpmu_core2.c  |  10 +-
 xen/arch/x86/hvm/vpmu.c            | 188 +++++++++++++++++++++++++++++++++++--
 xen/arch/x86/x86_64/compat/entry.S |   4 +
 xen/arch/x86/x86_64/entry.S        |   4 +
 xen/include/Makefile               |   1 +
 xen/include/asm-x86/hvm/vpmu.h     |  14 +--
 xen/include/public/arch-x86/pmu.h  |  11 ++-
 xen/include/public/pmu.h           |  44 ++++++++-
 xen/include/public/xen.h           |   1 +
 xen/include/xen/hypercall.h        |   4 +
 xen/include/xlat.lst               |   4 +
 13 files changed, 264 insertions(+), 29 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index e896210..04bfbff 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1482,7 +1482,7 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
 
     if ( is_hvm_vcpu(prev) )
     {
-        if (prev != next)
+        if ( (prev != next) && (vpmu_mode & XENPMU_MODE_SELF) )
             vpmu_save(prev);
 
         if ( !list_empty(&prev->arch.hvm_vcpu.tm_list) )
@@ -1526,7 +1526,7 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
                            !is_hardware_domain(next->domain));
     }
 
-    if (is_hvm_vcpu(next) && (prev != next) )
+    if ( is_hvm_vcpu(next) && (prev != next) && (vpmu_mode & XENPMU_MODE_SELF) )
         /* Must be done with interrupts enabled */
         vpmu_load(next);
 
diff --git a/xen/arch/x86/hvm/svm/vpmu.c b/xen/arch/x86/hvm/svm/vpmu.c
index cf56995..dfd5759 100644
--- a/xen/arch/x86/hvm/svm/vpmu.c
+++ b/xen/arch/x86/hvm/svm/vpmu.c
@@ -476,14 +476,14 @@ struct arch_vpmu_ops amd_vpmu_ops = {
     .arch_vpmu_dump = amd_vpmu_dump
 };
 
-int svm_vpmu_initialise(struct vcpu *v, unsigned int vpmu_flags)
+int svm_vpmu_initialise(struct vcpu *v)
 {
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
     uint8_t family = current_cpu_data.x86;
     int ret = 0;
 
     /* vpmu enabled? */
-    if ( !vpmu_flags )
+    if ( vpmu_mode == XENPMU_MODE_OFF )
         return 0;
 
     switch ( family )
diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c
index 6ace964..f56672e 100644
--- a/xen/arch/x86/hvm/vmx/vpmu_core2.c
+++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c
@@ -702,13 +702,13 @@ static int core2_vpmu_do_interrupt(struct cpu_user_regs *regs)
     return 1;
 }
 
-static int core2_vpmu_initialise(struct vcpu *v, unsigned int vpmu_flags)
+static int core2_vpmu_initialise(struct vcpu *v)
 {
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
     u64 msr_content;
     struct cpuinfo_x86 *c = &current_cpu_data;
 
-    if ( !(vpmu_flags & VPMU_BOOT_BTS) )
+    if ( !(vpmu_features & XENPMU_FEATURE_INTEL_BTS) )
         goto func_out;
     /* Check the 'Debug Store' feature in the CPUID.EAX[1]:EDX[21] */
     if ( cpu_has(c, X86_FEATURE_DS) )
@@ -820,7 +820,7 @@ struct arch_vpmu_ops core2_no_vpmu_ops = {
     .do_cpuid = core2_no_vpmu_do_cpuid,
 };
 
-int vmx_vpmu_initialise(struct vcpu *v, unsigned int vpmu_flags)
+int vmx_vpmu_initialise(struct vcpu *v)
 {
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
     uint8_t family = current_cpu_data.x86;
@@ -828,7 +828,7 @@ int vmx_vpmu_initialise(struct vcpu *v, unsigned int vpmu_flags)
     int ret = 0;
 
     vpmu->arch_vpmu_ops = &core2_no_vpmu_ops;
-    if ( !vpmu_flags )
+    if ( vpmu_mode == XENPMU_MODE_OFF )
         return 0;
 
     if ( family == 6 )
@@ -871,7 +871,7 @@ int vmx_vpmu_initialise(struct vcpu *v, unsigned int vpmu_flags)
         /* future: */
         case 0x3d:
         case 0x4e:
-            ret = core2_vpmu_initialise(v, vpmu_flags);
+            ret = core2_vpmu_initialise(v);
             if ( !ret )
                 vpmu->arch_vpmu_ops = &core2_vpmu_ops;
             return ret;
diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c
index 7a48a4b..794c1c3 100644
--- a/xen/arch/x86/hvm/vpmu.c
+++ b/xen/arch/x86/hvm/vpmu.c
@@ -21,6 +21,8 @@
 #include <xen/config.h>
 #include <xen/sched.h>
 #include <xen/xenoprof.h>
+#include <xen/event.h>
+#include <xen/guest_access.h>
 #include <asm/regs.h>
 #include <asm/types.h>
 #include <asm/msr.h>
@@ -32,13 +34,21 @@
 #include <asm/hvm/svm/vmcb.h>
 #include <asm/apic.h>
 #include <public/pmu.h>
+#include <xen/tasklet.h>
+
+#include <compat/pmu.h>
+CHECK_pmu_params;
+CHECK_pmu_intel_ctxt;
+CHECK_pmu_amd_ctxt;
+CHECK_pmu_cntr_pair;
 
 /*
  * "vpmu" :     vpmu generally enabled
  * "vpmu=off" : vpmu generally disabled
  * "vpmu=bts" : vpmu enabled and Intel BTS feature switched on.
  */
-static unsigned int __read_mostly opt_vpmu_enabled;
+uint64_t __read_mostly vpmu_mode = XENPMU_MODE_OFF;
+uint64_t __read_mostly vpmu_features = 0;
 static void parse_vpmu_param(char *s);
 custom_param("vpmu", parse_vpmu_param);
 
@@ -52,7 +62,7 @@ static void __init parse_vpmu_param(char *s)
         break;
     default:
         if ( !strcmp(s, "bts") )
-            opt_vpmu_enabled |= VPMU_BOOT_BTS;
+            vpmu_features |= XENPMU_FEATURE_INTEL_BTS;
         else if ( *s )
         {
             printk("VPMU: unknown flag: %s - vpmu disabled!\n", s);
@@ -60,7 +70,7 @@ static void __init parse_vpmu_param(char *s)
         }
         /* fall through */
     case 1:
-        opt_vpmu_enabled |= VPMU_BOOT_ENABLED;
+        vpmu_mode = XENPMU_MODE_SELF;
         break;
     }
 }
@@ -77,6 +87,9 @@ int vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
 {
     struct vpmu_struct *vpmu = vcpu_vpmu(current);
 
+    if ( !(vpmu_mode & XENPMU_MODE_SELF) )
+        return 0;
+
     if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->do_wrmsr )
         return vpmu->arch_vpmu_ops->do_wrmsr(msr, msr_content);
     return 0;
@@ -86,6 +99,9 @@ int vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
 {
     struct vpmu_struct *vpmu = vcpu_vpmu(current);
 
+    if ( !(vpmu_mode & XENPMU_MODE_SELF) )
+        return 0;
+
     if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->do_rdmsr )
         return vpmu->arch_vpmu_ops->do_rdmsr(msr, msr_content);
     return 0;
@@ -240,19 +256,19 @@ void vpmu_initialise(struct vcpu *v)
     switch ( vendor )
     {
     case X86_VENDOR_AMD:
-        if ( svm_vpmu_initialise(v, opt_vpmu_enabled) != 0 )
-            opt_vpmu_enabled = 0;
+        if ( svm_vpmu_initialise(v) != 0 )
+            vpmu_mode = XENPMU_MODE_OFF;
         break;
 
     case X86_VENDOR_INTEL:
-        if ( vmx_vpmu_initialise(v, opt_vpmu_enabled) != 0 )
-            opt_vpmu_enabled = 0;
+        if ( vmx_vpmu_initialise(v) != 0 )
+            vpmu_mode = XENPMU_MODE_OFF;
         break;
 
     default:
         printk("VPMU: Initialization failed. "
                "Unknown CPU vendor %d\n", vendor);
-        opt_vpmu_enabled = 0;
+        vpmu_mode = XENPMU_MODE_OFF;
         break;
     }
 }
@@ -274,3 +290,159 @@ void vpmu_dump(struct vcpu *v)
         vpmu->arch_vpmu_ops->arch_vpmu_dump(v);
 }
 
+static atomic_t vpmu_sched_counter;
+
+static void vpmu_sched_checkin(unsigned long unused)
+{
+    atomic_inc(&vpmu_sched_counter);
+}
+
+static int
+vpmu_force_context_switch(XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg)
+{
+    unsigned i, j, allbutself_num;
+    cpumask_t allbutself;
+    static s_time_t start;
+    static struct tasklet *sync_task;
+    int ret = 0;
+
+    allbutself_num = num_online_cpus() - 1;
+
+    if ( sync_task ) /* if true, we are in hypercall continuation */
+        goto cont_wait;
+
+    cpumask_andnot(&allbutself, &cpu_online_map,
+                   cpumask_of(smp_processor_id()));
+
+    sync_task = xmalloc_array(struct tasklet, allbutself_num);
+    if ( !sync_task )
+    {
+        printk("vpmu_force_context_switch: out of memory\n");
+        return -ENOMEM;
+    }
+
+    for ( i = 0; i < allbutself_num; i++ )
+        tasklet_init(&sync_task[i], vpmu_sched_checkin, 0);
+
+    atomic_set(&vpmu_sched_counter, 0);
+
+    j = 0;
+    for_each_cpu ( i, &allbutself )
+        tasklet_schedule_on_cpu(&sync_task[j++], i);
+
+    vpmu_save(current);
+
+    start = NOW();
+
+ cont_wait:
+    /*
+     * Note that we may fail here if a CPU is hot-unplugged while we are
+     * waiting. We will then time out.
+     */
+    while ( atomic_read(&vpmu_sched_counter) != allbutself_num )
+    {
+        /* Give up after 5 seconds */
+        if ( NOW() > start + SECONDS(5) )
+        {
+            printk("vpmu_force_context_switch: failed to sync\n");
+            ret = -EBUSY;
+            break;
+        }
+        cpu_relax();
+        if ( hypercall_preempt_check() )
+            return hypercall_create_continuation(
+                __HYPERVISOR_xenpmu_op, "ih", XENPMU_mode_set, arg);
+    }
+
+    for ( i = 0; i < allbutself_num; i++ )
+        tasklet_kill(&sync_task[i]);
+    xfree(sync_task);
+    sync_task = NULL;
+
+    return ret;
+}
+
+long do_xenpmu_op(int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg)
+{
+    int ret = -EINVAL;
+    xen_pmu_params_t pmu_params;
+
+    switch ( op )
+    {
+    case XENPMU_mode_set:
+    {
+        static DEFINE_SPINLOCK(xenpmu_mode_lock);
+        uint32_t current_mode;
+
+        if ( !is_control_domain(current->domain) )
+            return -EPERM;
+
+        if ( copy_from_guest(&pmu_params, arg, 1) )
+            return -EFAULT;
+
+        if ( pmu_params.val & ~XENPMU_MODE_SELF )
+            return -EINVAL;
+
+        /*
+         * Return an error if someone else is in the middle of changing the
+         * mode; this is most likely an indication of two system
+         * administrators working against each other.
+         */
+        if ( !spin_trylock(&xenpmu_mode_lock) )
+            return -EAGAIN;
+
+        current_mode = vpmu_mode;
+        vpmu_mode = pmu_params.val;
+
+        if ( vpmu_mode == XENPMU_MODE_OFF )
+        {
+            /*
+             * Make sure all (non-dom0) VCPUs have unloaded their VPMUs. This
+             * can be achieved by having all physical processors go through
+             * context_switch().
+             */
+            ret = vpmu_force_context_switch(arg);
+            if ( ret )
+                vpmu_mode = current_mode;
+        }
+        else
+            ret = 0;
+
+        spin_unlock(&xenpmu_mode_lock);
+        break;
+    }
+
+    case XENPMU_mode_get:
+        pmu_params.val = vpmu_mode;
+        pmu_params.version.maj = XENPMU_VER_MAJ;
+        pmu_params.version.min = XENPMU_VER_MIN;
+        if ( copy_to_guest(arg, &pmu_params, 1) )
+            return -EFAULT;
+        ret = 0;
+        break;
+
+    case XENPMU_feature_set:
+        if ( !is_control_domain(current->domain) )
+            return -EPERM;
+
+        if ( copy_from_guest(&pmu_params, arg, 1) )
+            return -EFAULT;
+
+        if ( pmu_params.val & ~XENPMU_FEATURE_INTEL_BTS )
+            return -EINVAL;
+
+        vpmu_features = pmu_params.val;
+
+        ret = 0;
+        break;
+
+    case XENPMU_feature_get:
+        pmu_params.val = vpmu_features;
+        if ( copy_to_guest(arg, &pmu_params, 1) )
+            return -EFAULT;
+        ret = 0;
+        break;
+     }
+
+    return ret;
+}
diff --git a/xen/arch/x86/x86_64/compat/entry.S b/xen/arch/x86/x86_64/compat/entry.S
index ac594c9..8587c46 100644
--- a/xen/arch/x86/x86_64/compat/entry.S
+++ b/xen/arch/x86/x86_64/compat/entry.S
@@ -417,6 +417,8 @@ ENTRY(compat_hypercall_table)
         .quad do_domctl
         .quad compat_kexec_op
         .quad do_tmem_op
+        .quad do_ni_hypercall           /* reserved for XenClient */
+        .quad do_xenpmu_op              /* 40 */
         .rept __HYPERVISOR_arch_0-((.-compat_hypercall_table)/8)
         .quad compat_ni_hypercall
         .endr
@@ -465,6 +467,8 @@ ENTRY(compat_hypercall_args_table)
         .byte 1 /* do_domctl                */
         .byte 2 /* compat_kexec_op          */
         .byte 1 /* do_tmem_op               */
+        .byte 0 /* reserved for XenClient   */
+        .byte 2 /* do_xenpmu_op             */  /* 40 */
         .rept __HYPERVISOR_arch_0-(.-compat_hypercall_args_table)
         .byte 0 /* compat_ni_hypercall      */
         .endr
diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
index a3ed216..704f4d1 100644
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -762,6 +762,8 @@ ENTRY(hypercall_table)
         .quad do_domctl
         .quad do_kexec_op
         .quad do_tmem_op
+        .quad do_ni_hypercall       /* reserved for XenClient */
+        .quad do_xenpmu_op          /* 40 */
         .rept __HYPERVISOR_arch_0-((.-hypercall_table)/8)
         .quad do_ni_hypercall
         .endr
@@ -810,6 +812,8 @@ ENTRY(hypercall_args_table)
         .byte 1 /* do_domctl            */
         .byte 2 /* do_kexec             */
         .byte 1 /* do_tmem_op           */
+        .byte 0 /* reserved for XenClient */
+        .byte 2 /* do_xenpmu_op         */  /* 40 */
         .rept __HYPERVISOR_arch_0-(.-hypercall_args_table)
         .byte 0 /* do_ni_hypercall      */
         .endr
diff --git a/xen/include/Makefile b/xen/include/Makefile
index f7ccbc9..de8a842 100644
--- a/xen/include/Makefile
+++ b/xen/include/Makefile
@@ -26,6 +26,7 @@ headers-y := \
 headers-$(CONFIG_X86)     += compat/arch-x86/xen-mca.h
 headers-$(CONFIG_X86)     += compat/arch-x86/xen.h
 headers-$(CONFIG_X86)     += compat/arch-x86/xen-$(compat-arch-y).h
+headers-$(CONFIG_X86)     += compat/pmu.h compat/arch-x86/pmu.h
 headers-y                 += compat/arch-$(compat-arch-y).h compat/xlat.h
 headers-$(FLASK_ENABLE)   += compat/xsm/flask_op.h
 
diff --git a/xen/include/asm-x86/hvm/vpmu.h b/xen/include/asm-x86/hvm/vpmu.h
index 2d8d623..4a0410d 100644
--- a/xen/include/asm-x86/hvm/vpmu.h
+++ b/xen/include/asm-x86/hvm/vpmu.h
@@ -24,13 +24,6 @@
 
 #include <public/pmu.h>
 
-/*
- * Flag bits given as a string on the hypervisor boot parameter 'vpmu'.
- * See arch/x86/hvm/vpmu.c.
- */
-#define VPMU_BOOT_ENABLED 0x1    /* vpmu generally enabled. */
-#define VPMU_BOOT_BTS     0x2    /* Intel BTS feature wanted. */
-
 #define vcpu_vpmu(vcpu)   (&(vcpu)->arch.vpmu)
 #define vpmu_vcpu(vpmu)   container_of((vpmu), struct vcpu, arch.vpmu)
 
@@ -58,8 +51,8 @@ struct arch_vpmu_ops {
     void (*arch_vpmu_dump)(const struct vcpu *);
 };
 
-int vmx_vpmu_initialise(struct vcpu *, unsigned int flags);
-int svm_vpmu_initialise(struct vcpu *, unsigned int flags);
+int vmx_vpmu_initialise(struct vcpu *);
+int svm_vpmu_initialise(struct vcpu *);
 
 struct vpmu_struct {
     u32 flags;
@@ -115,5 +108,8 @@ void vpmu_dump(struct vcpu *v);
 extern int acquire_pmu_ownership(int pmu_ownership);
 extern void release_pmu_ownership(int pmu_ownership);
 
+extern uint64_t vpmu_mode;
+extern uint64_t vpmu_features;
+
 #endif /* __ASM_X86_HVM_VPMU_H_*/
 
diff --git a/xen/include/public/arch-x86/pmu.h b/xen/include/public/arch-x86/pmu.h
index 9617a4e..4ffdb51 100644
--- a/xen/include/public/arch-x86/pmu.h
+++ b/xen/include/public/arch-x86/pmu.h
@@ -9,12 +9,16 @@ struct xen_pmu_amd_ctxt {
     uint32_t counters;
     uint32_t ctrls;
 };
+typedef struct xen_pmu_amd_ctxt xen_pmu_amd_ctxt_t;
+DEFINE_XEN_GUEST_HANDLE(xen_pmu_amd_ctxt_t);
 
 /* Intel PMU registers and structures */
 struct xen_pmu_cntr_pair {
     uint64_t counter;
     uint64_t control;
 };
+typedef struct xen_pmu_cntr_pair xen_pmu_cntr_pair_t;
+DEFINE_XEN_GUEST_HANDLE(xen_pmu_cntr_pair_t);
 
 struct xen_pmu_intel_ctxt {
     uint64_t global_ctrl;
@@ -31,8 +35,10 @@ struct xen_pmu_intel_ctxt {
     uint32_t fixed_counters;
     uint32_t arch_counters;
 };
+typedef struct xen_pmu_intel_ctxt xen_pmu_intel_ctxt_t;
+DEFINE_XEN_GUEST_HANDLE(xen_pmu_intel_ctxt_t);
 
-struct xen_arch_pmu {
+struct xen_pmu_arch {
     union {
         struct cpu_user_regs regs;
         uint8_t pad[256];
@@ -53,7 +59,8 @@ struct xen_arch_pmu {
         uint8_t pad[XENPMU_CTXT_PAD_SZ];
     } c;
 };
-typedef struct xen_arch_pmu xen_arch_pmu_t;
+typedef struct xen_pmu_arch xen_pmu_arch_t;
+DEFINE_XEN_GUEST_HANDLE(xen_pmu_arch_t);
 
 #endif /* __XEN_PUBLIC_ARCH_X86_PMU_H__ */
 /*
diff --git a/xen/include/public/pmu.h b/xen/include/public/pmu.h
index d237c25..0855005 100644
--- a/xen/include/public/pmu.h
+++ b/xen/include/public/pmu.h
@@ -13,6 +13,48 @@
 #define XENPMU_VER_MAJ    0
 #define XENPMU_VER_MIN    1
 
+/*
+ * ` enum neg_errnoval
+ * ` HYPERVISOR_xenpmu_op(enum xenpmu_op cmd, struct xenpmu_params *args);
+ *
+ * @cmd  == XENPMU_* (PMU operation)
+ * @args == struct xenpmu_params
+ */
+/* ` enum xenpmu_op { */
+#define XENPMU_mode_get        0 /* Also used for getting PMU version */
+#define XENPMU_mode_set        1
+#define XENPMU_feature_get     2
+#define XENPMU_feature_set     3
+/* ` } */
+
+/* Parameters structure for HYPERVISOR_xenpmu_op call */
+struct xen_pmu_params {
+    /* IN/OUT parameters */
+    struct {
+        uint32_t maj;
+        uint32_t min;
+    } version;
+    uint64_t val;
+
+    /* IN parameters */
+    uint64_t vcpu;
+};
+typedef struct xen_pmu_params xen_pmu_params_t;
+DEFINE_XEN_GUEST_HANDLE(xen_pmu_params_t);
+
+/* PMU modes:
+ * - XENPMU_MODE_OFF:   No PMU virtualization
+ * - XENPMU_MODE_SELF:  Guests can profile themselves, dom0 profiles
+ *                      itself and Xen
+ */
+#define XENPMU_MODE_OFF           0
+#define XENPMU_MODE_SELF          (1<<0)
+
+/*
+ * PMU features:
+ * - XENPMU_FEATURE_INTEL_BTS: Intel BTS support (ignored on AMD)
+ */
+#define XENPMU_FEATURE_INTEL_BTS  1
 
 /* Shared between hypervisor and PV domain */
 struct xen_pmu_data {
@@ -21,7 +63,7 @@ struct xen_pmu_data {
     uint32_t pcpu_id;
     uint32_t pmu_flags;
 
-    xen_arch_pmu_t pmu;
+    xen_pmu_arch_t pmu;
 };
 typedef struct xen_pmu_data xen_pmu_data_t;
 
diff --git a/xen/include/public/xen.h b/xen/include/public/xen.h
index a6a2092..0766790 100644
--- a/xen/include/public/xen.h
+++ b/xen/include/public/xen.h
@@ -101,6 +101,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_ulong_t);
 #define __HYPERVISOR_kexec_op             37
 #define __HYPERVISOR_tmem_op              38
 #define __HYPERVISOR_xc_reserved_op       39 /* reserved for XenClient */
+#define __HYPERVISOR_xenpmu_op            40
 
 /* Architecture-specific hypercall definitions. */
 #define __HYPERVISOR_arch_0               48
diff --git a/xen/include/xen/hypercall.h b/xen/include/xen/hypercall.h
index a9e5229..cf34547 100644
--- a/xen/include/xen/hypercall.h
+++ b/xen/include/xen/hypercall.h
@@ -14,6 +14,7 @@
 #include <public/event_channel.h>
 #include <public/tmem.h>
 #include <public/version.h>
+#include <public/pmu.h>
 #include <asm/hypercall.h>
 #include <xsm/xsm.h>
 
@@ -139,6 +140,9 @@ do_tmem_op(
 extern long
 do_xenoprof_op(int op, XEN_GUEST_HANDLE_PARAM(void) arg);
 
+extern long
+do_xenpmu_op(int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg);
+
 #ifdef CONFIG_COMPAT
 
 extern int
diff --git a/xen/include/xlat.lst b/xen/include/xlat.lst
index c8fafef..5809c60 100644
--- a/xen/include/xlat.lst
+++ b/xen/include/xlat.lst
@@ -101,6 +101,10 @@
 !	vcpu_set_singleshot_timer	vcpu.h
 ?	xenoprof_init			xenoprof.h
 ?	xenoprof_passive		xenoprof.h
+?	pmu_params			pmu.h
+?	pmu_intel_ctxt			arch-x86/pmu.h
+?	pmu_amd_ctxt			arch-x86/pmu.h
+?	pmu_cntr_pair			arch-x86/pmu.h
 ?	flask_access			xsm/flask_op.h
 !	flask_boolean			xsm/flask_op.h
 ?	flask_cache_stats		xsm/flask_op.h
-- 
1.8.1.4


* [PATCH v9 12/20] x86/VPMU: Initialize PMU for PV(H) guests
  2014-08-08 16:55 [PATCH v9 00/20] x86/PMU: Xen PMU PV(H) support Boris Ostrovsky
                   ` (10 preceding siblings ...)
  2014-08-08 16:55 ` [PATCH v9 11/20] x86/VPMU: Interface for setting PMU mode and flags Boris Ostrovsky
@ 2014-08-08 16:55 ` Boris Ostrovsky
  2014-08-12 11:59   ` Jan Beulich
  2014-08-08 16:55 ` [PATCH v9 13/20] x86/VPMU: When handling MSR accesses, leave fault injection to callers Boris Ostrovsky
                   ` (8 subsequent siblings)
  20 siblings, 1 reply; 49+ messages in thread
From: Boris Ostrovsky @ 2014-08-08 16:55 UTC (permalink / raw)
  To: jbeulich, kevin.tian, suravee.suthikulpanit
  Cc: keir, andrew.cooper3, tim, xen-devel, jun.nakajima, boris.ostrovsky

Add code for initializing and tearing down the PMU for PV guests.
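
For illustration, the guest side of the new XENPMU_init operation might look
roughly like the sketch below. alloc_shared_page(), page_to_gfn() and
HYPERVISOR_xenpmu_op() are placeholders for the guest kernel's own helpers;
the parameter layout comes from the public PMU interface:

    /* Register one page of struct xen_pmu_data with Xen for this VCPU. */
    static int example_pvpmu_init(unsigned int vcpu_id)
    {
        struct xen_pmu_data *xenpmu_data = alloc_shared_page();
        struct xen_pmu_params p;

        if ( !xenpmu_data )
            return -ENOMEM;

        p.version.maj = XENPMU_VER_MAJ;
        p.version.min = XENPMU_VER_MIN;
        p.val  = page_to_gfn(xenpmu_data);  /* GFN consumed by pvpmu_init() */
        p.vcpu = vcpu_id;

        return HYPERVISOR_xenpmu_op(XENPMU_init, &p);
    }

Teardown is symmetric: XENPMU_finish with the same vcpu field lets the
hypervisor unmap the shared page and destroy the VPMU.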

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
Tested-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
---
 xen/arch/x86/hvm/hvm.c            |  3 +-
 xen/arch/x86/hvm/svm/svm.c        |  8 +++-
 xen/arch/x86/hvm/svm/vpmu.c       | 37 ++++++++++-------
 xen/arch/x86/hvm/vmx/vmx.c        |  9 ++++-
 xen/arch/x86/hvm/vmx/vpmu_core2.c | 69 +++++++++++++++++++++----------
 xen/arch/x86/hvm/vpmu.c           | 85 ++++++++++++++++++++++++++++++++++++++-
 xen/common/event_channel.c        |  1 +
 xen/include/asm-x86/hvm/vpmu.h    |  1 +
 xen/include/public/pmu.h          |  2 +
 xen/include/public/xen.h          |  1 +
 10 files changed, 174 insertions(+), 42 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index ef2411c..11da5fc 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -4768,7 +4768,8 @@ static hvm_hypercall_t *const pvh_hypercall64_table[NR_hypercalls] = {
     [ __HYPERVISOR_physdev_op ]      = (hvm_hypercall_t *)hvm_physdev_op,
     HYPERCALL(hvm_op),
     HYPERCALL(sysctl),
-    HYPERCALL(domctl)
+    HYPERCALL(domctl),
+    HYPERCALL(xenpmu_op)
 };
 
 int hvm_do_hypercall(struct cpu_user_regs *regs)
diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index 76616ac..555e5f7 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -1158,7 +1158,9 @@ static int svm_vcpu_initialise(struct vcpu *v)
         return rc;
     }
 
-    vpmu_initialise(v);
+    /* PVH's VPMU is initialized via hypercall */
+    if ( is_hvm_domain(v->domain) )
+        vpmu_initialise(v);
 
     svm_guest_osvw_init(v);
 
@@ -1167,7 +1169,9 @@ static int svm_vcpu_initialise(struct vcpu *v)
 
 static void svm_vcpu_destroy(struct vcpu *v)
 {
-    vpmu_destroy(v);
+    /* PVH's VPMU destroyed via hypercall */
+    if ( is_hvm_domain(v->domain) )
+        vpmu_destroy(v);
     svm_destroy_vmcb(v);
     passive_domain_destroy(v);
 }
diff --git a/xen/arch/x86/hvm/svm/vpmu.c b/xen/arch/x86/hvm/svm/vpmu.c
index dfd5759..6f21e39 100644
--- a/xen/arch/x86/hvm/svm/vpmu.c
+++ b/xen/arch/x86/hvm/svm/vpmu.c
@@ -386,14 +386,21 @@ static int amd_vpmu_initialise(struct vcpu *v)
 	 }
     }
 
-    ctxt = xzalloc_bytes(sizeof(struct xen_pmu_amd_ctxt) +
-                         2 * sizeof(uint64_t) * AMD_MAX_COUNTERS);
-    if ( !ctxt )
+    if ( is_hvm_domain(v->domain) )
     {
-        gdprintk(XENLOG_WARNING, "Insufficient memory for PMU, "
-            " PMU feature is unavailable on domain %d vcpu %d.\n",
-            v->vcpu_id, v->domain->domain_id);
-        return -ENOMEM;
+        ctxt = xzalloc_bytes(sizeof(struct xen_pmu_amd_ctxt) +
+                             2 * sizeof(uint64_t) * AMD_MAX_COUNTERS);
+        if ( !ctxt )
+        {
+            gdprintk(XENLOG_WARNING, "Insufficient memory for PMU, "
+                " PMU feature is unavailable on domain %d vcpu %d.\n",
+                v->vcpu_id, v->domain->domain_id);
+            return -ENOMEM;
+        }
+    }
+    else
+    {
+        ctxt = &v->arch.vpmu.xenpmu_data->pmu.c.amd;
     }
 
     ctxt->counters = sizeof(struct xen_pmu_amd_ctxt);
@@ -412,17 +419,19 @@ static void amd_vpmu_destroy(struct vcpu *v)
     if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
         return;
 
-    if ( has_hvm_container_domain(v->domain) && is_msr_bitmap_on(vpmu) )
-        amd_vpmu_unset_msr_bitmap(v);
+    if ( has_hvm_container_domain(v->domain) )
+    {
+        if ( is_msr_bitmap_on(vpmu) )
+            amd_vpmu_unset_msr_bitmap(v);
 
-    xfree(vpmu->context);
-    vpmu_reset(vpmu, VPMU_CONTEXT_ALLOCATED);
+        if ( is_hvm_domain(v->domain) )
+            xfree(vpmu->context);
 
-    if ( vpmu_is_set(vpmu, VPMU_RUNNING) )
-    {
-        vpmu_reset(vpmu, VPMU_RUNNING);
         release_pmu_ownship(PMU_OWNER_HVM);
     }
+
+    vpmu->context = NULL;
+    vpmu_clear(vpmu);
 }
 
 /* VPMU part of the 'q' keyhandler */
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 22b3325..1533d98 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -116,7 +116,9 @@ static int vmx_vcpu_initialise(struct vcpu *v)
         return rc;
     }
 
-    vpmu_initialise(v);
+    /* PVH's VPMU is initialized via hypercall */
+    if ( is_hvm_domain(v->domain) )
+        vpmu_initialise(v);
 
     vmx_install_vlapic_mapping(v);
 
@@ -130,7 +132,10 @@ static int vmx_vcpu_initialise(struct vcpu *v)
 static void vmx_vcpu_destroy(struct vcpu *v)
 {
     vmx_destroy_vmcs(v);
-    vpmu_destroy(v);
+
+    /* PVH's VPMU is destroyed via hypercall */
+    if ( is_hvm_domain(v->domain) )
+        vpmu_destroy(v);
     passive_domain_destroy(v);
 }
 
diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c
index f56672e..d964444 100644
--- a/xen/arch/x86/hvm/vmx/vpmu_core2.c
+++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c
@@ -357,24 +357,36 @@ static int core2_vpmu_alloc_resource(struct vcpu *v)
     struct xen_pmu_intel_ctxt *core2_vpmu_cxt = NULL;
     uint64_t *p = NULL;
 
-    if ( !acquire_pmu_ownership(PMU_OWNER_HVM) )
-        return 0;
-
-    wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0);
-    if ( vmx_add_msr(MSR_CORE_PERF_GLOBAL_CTRL, VMX_HOST_MSR) )
+    p = xzalloc_bytes(sizeof(uint64_t));
+    if ( !p )
         goto out_err;
 
-    if ( vmx_add_msr(MSR_CORE_PERF_GLOBAL_CTRL, VMX_GUEST_MSR) )
-        goto out_err;
-    vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, 0);
-
-    core2_vpmu_cxt = xzalloc_bytes(sizeof(struct xen_pmu_intel_ctxt) +
-                                   sizeof(uint64_t) * fixed_pmc_cnt +
-                                   sizeof(struct xen_pmu_cntr_pair) *
-                                   arch_pmc_cnt);
-    p = xzalloc(uint64_t);
-    if ( !core2_vpmu_cxt || !p )
-        goto out_err;
+    if ( has_hvm_container_domain(v->domain) )
+    {
+        if ( !acquire_pmu_ownership(PMU_OWNER_HVM) )
+            goto out_err;
+
+        wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0);
+        if ( vmx_add_msr(MSR_CORE_PERF_GLOBAL_CTRL, VMX_HOST_MSR) )
+            goto out_err_hvm;
+        if ( vmx_add_msr(MSR_CORE_PERF_GLOBAL_CTRL, VMX_GUEST_MSR) )
+            goto out_err_hvm;
+        vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, 0);
+    }
+
+    if ( is_hvm_domain(v->domain) )
+    {
+        core2_vpmu_cxt = xzalloc_bytes(sizeof(struct xen_pmu_intel_ctxt) +
+                                       sizeof(uint64_t) * fixed_pmc_cnt +
+                                       sizeof(struct xen_pmu_cntr_pair) *
+                                       arch_pmc_cnt);
+        if ( !core2_vpmu_cxt )
+            goto out_err_hvm;
+    }
+    else
+    {
+        core2_vpmu_cxt = &v->arch.vpmu.xenpmu_data->pmu.c.intel;
+    }
 
     core2_vpmu_cxt->fixed_counters = sizeof(struct xen_pmu_intel_ctxt);
     core2_vpmu_cxt->arch_counters = core2_vpmu_cxt->fixed_counters +
@@ -387,7 +399,7 @@ static int core2_vpmu_alloc_resource(struct vcpu *v)
 
     return 1;
 
-out_err:
+out_err_hvm:
     vmx_rm_msr(MSR_CORE_PERF_GLOBAL_CTRL, VMX_HOST_MSR);
     vmx_rm_msr(MSR_CORE_PERF_GLOBAL_CTRL, VMX_GUEST_MSR);
     release_pmu_ownship(PMU_OWNER_HVM);
@@ -395,6 +407,7 @@ out_err:
     xfree(core2_vpmu_cxt);
     xfree(p);
 
+out_err:
     printk("Failed to allocate VPMU resources for domain %u vcpu %u\n",
            v->vcpu_id, v->domain->domain_id);
 
@@ -752,6 +765,10 @@ func_out:
     fixed_pmc_cnt = core2_get_fixed_pmc_count();
     check_pmc_quirk();
 
+    /* PV domains can allocate resources immediately */
+    if ( is_pv_domain(v->domain) && !core2_vpmu_alloc_resource(v) )
+        return 1;
+
     return 0;
 }
 
@@ -762,12 +779,20 @@ static void core2_vpmu_destroy(struct vcpu *v)
     if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
         return;
 
-    xfree(vpmu->context);
+    if ( has_hvm_container_domain(v->domain) )
+    {
+        if ( cpu_has_vmx_msr_bitmap )
+            core2_vpmu_unset_msr_bitmap(v->arch.hvm_vmx.msr_bitmap);
+
+        if ( is_hvm_domain(v->domain) )
+            xfree(vpmu->context);
+
+        release_pmu_ownship(PMU_OWNER_HVM);
+    }
+
     xfree(vpmu->priv_context);
-    if ( has_hvm_container_domain(v->domain) && cpu_has_vmx_msr_bitmap )
-        core2_vpmu_unset_msr_bitmap(v->arch.hvm_vmx.msr_bitmap);
-    release_pmu_ownship(PMU_OWNER_HVM);
-    vpmu_reset(vpmu, VPMU_CONTEXT_ALLOCATED);
+    vpmu->context = NULL;
+    vpmu_clear(vpmu);
 }
 
 struct arch_vpmu_ops core2_vpmu_ops = {
diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c
index 794c1c3..ab9c118 100644
--- a/xen/arch/x86/hvm/vpmu.c
+++ b/xen/arch/x86/hvm/vpmu.c
@@ -22,10 +22,13 @@
 #include <xen/sched.h>
 #include <xen/xenoprof.h>
 #include <xen/event.h>
+#include <xen/softirq.h>
+#include <xen/hypercall.h>
 #include <xen/guest_access.h>
 #include <asm/regs.h>
 #include <asm/types.h>
 #include <asm/msr.h>
+#include <asm/p2m.h>
 #include <asm/hvm/support.h>
 #include <asm/hvm/vmx/vmx.h>
 #include <asm/hvm/vmx/vmcs.h>
@@ -252,6 +255,7 @@ void vpmu_initialise(struct vcpu *v)
         vpmu_destroy(v);
     vpmu_clear(vpmu);
     vpmu->context = NULL;
+    vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR | APIC_LVT_MASKED;
 
     switch ( vendor )
     {
@@ -278,7 +282,74 @@ void vpmu_destroy(struct vcpu *v)
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
 
     if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->arch_vpmu_destroy )
+    {
+        /* Unload VPMU first. This will stop counters */
+        on_selected_cpus(cpumask_of(vcpu_vpmu(v)->last_pcpu),
+                         vpmu_save_force, v, 1);
+
         vpmu->arch_vpmu_ops->arch_vpmu_destroy(v);
+    }
+}
+
+static int pvpmu_init(struct domain *d, xen_pmu_params_t *params)
+{
+    struct vcpu *v;
+    struct page_info *page;
+    uint64_t gfn = params->val;
+
+    if ( (params->vcpu >= d->max_vcpus) || (d->vcpu == NULL) ||
+         (d->vcpu[params->vcpu] == NULL) )
+        return -EINVAL;
+
+    page = get_page_from_gfn(d, gfn, NULL, P2M_ALLOC);
+    if ( !page )
+        return -EINVAL;
+
+    if ( !get_page_type(page, PGT_writable_page) )
+    {
+        put_page(page);
+        return -EINVAL;
+    }
+
+    v = d->vcpu[params->vcpu];
+    v->arch.vpmu.xenpmu_data = __map_domain_page_global(page);
+    if ( !v->arch.vpmu.xenpmu_data )
+    {
+        put_page_and_type(page);
+        return -EINVAL;
+    }
+
+    vpmu_initialise(v);
+
+    return 0;
+}
+
+static void pvpmu_finish(struct domain *d, xen_pmu_params_t *params)
+{
+    struct vcpu *v;
+    uint64_t mfn;
+
+    if ( (params->vcpu >= d->max_vcpus) || (d->vcpu == NULL) ||
+         (d->vcpu[params->vcpu] == NULL) )
+        return;
+
+    v = d->vcpu[params->vcpu];
+    if ( v != current )
+        vcpu_pause(v);
+
+    if ( v->arch.vpmu.xenpmu_data )
+    {
+        mfn = domain_page_map_to_mfn(v->arch.vpmu.xenpmu_data);
+        if ( mfn_valid(mfn) )
+        {
+            unmap_domain_page_global(v->arch.vpmu.xenpmu_data);
+            put_page_and_type(mfn_to_page(mfn));
+        }
+    }
+    vpmu_destroy(v);
+
+    if ( v != current )
+        vcpu_unpause(v);
 }
 
 /* Dump some vpmu informations on console. Used in keyhandler dump_domains(). */
@@ -442,7 +513,19 @@ long do_xenpmu_op(int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg)
             return -EFAULT;
         ret = 0;
         break;
-     }
+
+    case XENPMU_init:
+        if ( copy_from_guest(&pmu_params, arg, 1) )
+            return -EFAULT;
+        ret = pvpmu_init(current->domain, &pmu_params);
+        break;
+
+    case XENPMU_finish:
+        if ( copy_from_guest(&pmu_params, arg, 1) )
+            return -EFAULT;
+        pvpmu_finish(current->domain, &pmu_params);
+        break;
+    }
 
     return ret;
 }
diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c
index 6853842..7f9b8c9 100644
--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -108,6 +108,7 @@ static int virq_is_global(uint32_t virq)
     case VIRQ_TIMER:
     case VIRQ_DEBUG:
     case VIRQ_XENOPROF:
+    case VIRQ_XENPMU:
         rc = 0;
         break;
     case VIRQ_ARCH_0 ... VIRQ_ARCH_7:
diff --git a/xen/include/asm-x86/hvm/vpmu.h b/xen/include/asm-x86/hvm/vpmu.h
index 4a0410d..25954c6 100644
--- a/xen/include/asm-x86/hvm/vpmu.h
+++ b/xen/include/asm-x86/hvm/vpmu.h
@@ -61,6 +61,7 @@ struct vpmu_struct {
     void *context;      /* May be shared with PV guest */
     void *priv_context; /* hypervisor-only */
     struct arch_vpmu_ops *arch_vpmu_ops;
+    xen_pmu_data_t *xenpmu_data;
 };
 
 /* VPMU states */
diff --git a/xen/include/public/pmu.h b/xen/include/public/pmu.h
index 0855005..3d55eb6 100644
--- a/xen/include/public/pmu.h
+++ b/xen/include/public/pmu.h
@@ -25,6 +25,8 @@
 #define XENPMU_mode_set        1
 #define XENPMU_feature_get     2
 #define XENPMU_feature_set     3
+#define XENPMU_init            4
+#define XENPMU_finish          5
 /* ` } */
 
 /* Parameters structure for HYPERVISOR_xenpmu_op call */
diff --git a/xen/include/public/xen.h b/xen/include/public/xen.h
index 0766790..e4d0b79 100644
--- a/xen/include/public/xen.h
+++ b/xen/include/public/xen.h
@@ -161,6 +161,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_ulong_t);
 #define VIRQ_MEM_EVENT  10 /* G. (DOM0) A memory event has occured           */
 #define VIRQ_XC_RESERVED 11 /* G. Reserved for XenClient                     */
 #define VIRQ_ENOMEM     12 /* G. (DOM0) Low on heap memory       */
+#define VIRQ_XENPMU     13 /* V.  PMC interrupt                              */
 
 /* Architecture-specific VIRQ definitions. */
 #define VIRQ_ARCH_0    16
-- 
1.8.1.4


* [PATCH v9 13/20] x86/VPMU: When handling MSR accesses, leave fault injection to callers
  2014-08-08 16:55 [PATCH v9 00/20] x86/PMU: Xen PMU PV(H) support Boris Ostrovsky
                   ` (11 preceding siblings ...)
  2014-08-08 16:55 ` [PATCH v9 12/20] x86/VPMU: Initialize PMU for PV(H) guests Boris Ostrovsky
@ 2014-08-08 16:55 ` Boris Ostrovsky
  2014-08-12 12:45   ` Jan Beulich
  2014-08-08 16:55 ` [PATCH v9 14/20] x86/VPMU: Add support for PMU register handling on PV guests Boris Ostrovsky
                   ` (7 subsequent siblings)
  20 siblings, 1 reply; 49+ messages in thread
From: Boris Ostrovsky @ 2014-08-08 16:55 UTC (permalink / raw)
  To: jbeulich, kevin.tian, suravee.suthikulpanit
  Cc: keir, andrew.cooper3, tim, xen-devel, jun.nakajima, boris.ostrovsky

This is done in preparation for subsequent PV patches.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
 xen/arch/x86/hvm/svm/svm.c        |  6 ++--
 xen/arch/x86/hvm/svm/vpmu.c       |  6 ++--
 xen/arch/x86/hvm/vmx/vmx.c        | 26 +++++++++++-----
 xen/arch/x86/hvm/vmx/vpmu_core2.c | 65 +++++++++++++++++----------------------
 4 files changed, 54 insertions(+), 49 deletions(-)

diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index 555e5f7..da5af5c 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -1642,7 +1642,8 @@ static int svm_msr_read_intercept(unsigned int msr, uint64_t *msr_content)
     case MSR_AMD_FAM15H_EVNTSEL3:
     case MSR_AMD_FAM15H_EVNTSEL4:
     case MSR_AMD_FAM15H_EVNTSEL5:
-        vpmu_do_rdmsr(msr, msr_content);
+        if ( vpmu_do_rdmsr(msr, msr_content) )
+            goto gpf;
         break;
 
     case MSR_AMD64_DR0_ADDRESS_MASK:
@@ -1793,7 +1794,8 @@ static int svm_msr_write_intercept(unsigned int msr, uint64_t msr_content)
     case MSR_AMD_FAM15H_EVNTSEL3:
     case MSR_AMD_FAM15H_EVNTSEL4:
     case MSR_AMD_FAM15H_EVNTSEL5:
-        vpmu_do_wrmsr(msr, msr_content);
+        if ( vpmu_do_wrmsr(msr, msr_content) )
+            goto gpf;
         break;
 
     case MSR_IA32_MCx_MISC(4): /* Threshold register */
diff --git a/xen/arch/x86/hvm/svm/vpmu.c b/xen/arch/x86/hvm/svm/vpmu.c
index 6f21e39..07dccb6 100644
--- a/xen/arch/x86/hvm/svm/vpmu.c
+++ b/xen/arch/x86/hvm/svm/vpmu.c
@@ -303,7 +303,7 @@ static int amd_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
         is_pmu_enabled(msr_content) && !vpmu_is_set(vpmu, VPMU_RUNNING) )
     {
         if ( !acquire_pmu_ownership(PMU_OWNER_HVM) )
-            return 1;
+            return 0;
         vpmu_set(vpmu, VPMU_RUNNING);
 
         if ( has_hvm_container_domain(v->domain) && is_msr_bitmap_on(vpmu) )
@@ -333,7 +333,7 @@ static int amd_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
 
     /* Write to hw counters */
     wrmsrl(msr, msr_content);
-    return 1;
+    return 0;
 }
 
 static int amd_vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
@@ -351,7 +351,7 @@ static int amd_vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
 
     rdmsrl(msr, *msr_content);
 
-    return 1;
+    return 0;
 }
 
 static int amd_vpmu_initialise(struct vcpu *v)
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 1533d98..c8dbe80 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -2078,12 +2078,17 @@ static int vmx_msr_read_intercept(unsigned int msr, uint64_t *msr_content)
         *msr_content |= MSR_IA32_MISC_ENABLE_BTS_UNAVAIL |
                        MSR_IA32_MISC_ENABLE_PEBS_UNAVAIL;
         /* Perhaps vpmu will change some bits. */
+        /* FALLTHROUGH */
+    case MSR_P6_PERFCTR0...MSR_P6_PERFCTR1:
+    case MSR_P6_EVNTSEL0...MSR_P6_EVNTSEL1:
+    case MSR_CORE_PERF_FIXED_CTR0...MSR_CORE_PERF_FIXED_CTR2:
+    case MSR_CORE_PERF_FIXED_CTR_CTRL...MSR_CORE_PERF_GLOBAL_OVF_CTRL:
+    case MSR_IA32_PEBS_ENABLE:
+    case MSR_IA32_DS_AREA:
         if ( vpmu_do_rdmsr(msr, msr_content) )
-            goto done;
+            goto gp_fault;
         break;
     default:
-        if ( vpmu_do_rdmsr(msr, msr_content) )
-            break;
         if ( passive_domain_do_rdmsr(msr, msr_content) )
             goto done;
         switch ( long_mode_do_msr_read(msr, msr_content) )
@@ -2256,8 +2261,8 @@ static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content)
         if ( msr_content & ~supported )
         {
             /* Perhaps some other bits are supported in vpmu. */
-            if ( !vpmu_do_wrmsr(msr, msr_content) )
-                break;
+            if ( vpmu_do_wrmsr(msr, msr_content) )
+                goto gp_fault;
         }
         if ( msr_content & IA32_DEBUGCTLMSR_LBR )
         {
@@ -2286,9 +2291,16 @@ static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content)
         if ( !nvmx_msr_write_intercept(msr, msr_content) )
             goto gp_fault;
         break;
-    default:
+    case MSR_P6_PERFCTR0...MSR_P6_PERFCTR1:
+    case MSR_P6_EVNTSEL0...MSR_P6_EVNTSEL1:
+    case MSR_CORE_PERF_FIXED_CTR0...MSR_CORE_PERF_FIXED_CTR2:
+    case MSR_CORE_PERF_FIXED_CTR_CTRL...MSR_CORE_PERF_GLOBAL_OVF_CTRL:
+    case MSR_IA32_PEBS_ENABLE:
+    case MSR_IA32_DS_AREA:
         if ( vpmu_do_wrmsr(msr, msr_content) )
-            return X86EMUL_OKAY;
+            goto gp_fault;
+        break;
+    default:
         if ( passive_domain_do_wrmsr(msr, msr_content) )
             return X86EMUL_OKAY;
 
diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c
index d964444..60e0474 100644
--- a/xen/arch/x86/hvm/vmx/vpmu_core2.c
+++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c
@@ -458,13 +458,13 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
             if ( cpu_has(&current_cpu_data, X86_FEATURE_DSCPL) )
                 supported |= IA32_DEBUGCTLMSR_BTS_OFF_OS |
                              IA32_DEBUGCTLMSR_BTS_OFF_USR;
-            if ( msr_content & supported )
+
+            if ( !(msr_content & supported) ||
+                 !vpmu_is_set(vpmu, VPMU_CPU_HAS_BTS) )
             {
-                if ( vpmu_is_set(vpmu, VPMU_CPU_HAS_BTS) )
-                    return 1;
-                gdprintk(XENLOG_WARNING, "Debug Store is not supported on this cpu\n");
-                hvm_inject_hw_exception(TRAP_gp_fault, 0);
-                return 0;
+                gdprintk(XENLOG_WARNING,
+                         "Debug Store is not supported on this cpu\n");
+                return 1;
             }
         }
         return 0;
@@ -476,18 +476,17 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
     {
     case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
         core2_vpmu_cxt->global_status &= ~msr_content;
-        return 1;
+        return 0;
     case MSR_CORE_PERF_GLOBAL_STATUS:
         gdprintk(XENLOG_INFO, "Can not write readonly MSR: "
                  "MSR_PERF_GLOBAL_STATUS(0x38E)!\n");
-        hvm_inject_hw_exception(TRAP_gp_fault, 0);
         return 1;
     case MSR_IA32_PEBS_ENABLE:
         if ( msr_content & 1 )
             gdprintk(XENLOG_WARNING, "Guest is trying to enable PEBS, "
                      "which is not supported.\n");
         core2_vpmu_cxt->pebs_enable = msr_content;
-        return 1;
+        return 0;
     case MSR_IA32_DS_AREA:
         if ( vpmu_is_set(vpmu, VPMU_CPU_HAS_DS) )
         {
@@ -496,14 +495,13 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
                 gdprintk(XENLOG_WARNING,
                          "Illegal address for IA32_DS_AREA: %#" PRIx64 "x\n",
                          msr_content);
-                hvm_inject_hw_exception(TRAP_gp_fault, 0);
                 return 1;
             }
             core2_vpmu_cxt->ds_area = msr_content;
             break;
         }
         gdprintk(XENLOG_WARNING, "Guest setting of DTS is ignored.\n");
-        return 1;
+        return 0;
     case MSR_CORE_PERF_GLOBAL_CTRL:
         global_ctrl = msr_content;
         break;
@@ -541,45 +539,43 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
         }
     }
 
-    if ( (global_ctrl & *enabled_cntrs) || (core2_vpmu_cxt->ds_area != 0) )
-        vpmu_set(vpmu, VPMU_RUNNING);
-    else
-        vpmu_reset(vpmu, VPMU_RUNNING);
-
     if ( type != MSR_TYPE_GLOBAL )
     {
         u64 mask;
-        int inject_gp = 0;
+
         switch ( type )
         {
         case MSR_TYPE_ARCH_CTRL:      /* MSR_P6_EVNTSEL[0,...] */
             mask = ~((1ull << 32) - 1);
-            if (msr_content & mask)
-                inject_gp = 1;
+            if ( msr_content & mask )
+                return 1;
             break;
         case MSR_TYPE_CTRL:           /* IA32_FIXED_CTR_CTRL */
             if  ( msr == MSR_IA32_DS_AREA )
                 break;
             /* 4 bits per counter, currently 3 fixed counters implemented. */
             mask = ~((1ull << (fixed_pmc_cnt * FIXED_CTR_CTRL_BITS)) - 1);
-            if (msr_content & mask)
-                inject_gp = 1;
+            if ( msr_content & mask )
+                return 1;
             break;
         case MSR_TYPE_COUNTER:        /* IA32_FIXED_CTR[0-2] */
             mask = ~((1ull << core2_get_bitwidth_fix_count()) - 1);
-            if (msr_content & mask)
-                inject_gp = 1;
+            if ( msr_content & mask )
+                return 1;
             break;
         }
-        if (inject_gp)
-            hvm_inject_hw_exception(TRAP_gp_fault, 0);
-        else
-            wrmsrl(msr, msr_content);
+
+        wrmsrl(msr, msr_content);
     }
     else
         vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, msr_content);
 
-    return 1;
+    if ( (global_ctrl & *enabled_cntrs) || (core2_vpmu_cxt->ds_area != 0) )
+        vpmu_set(vpmu, VPMU_RUNNING);
+    else
+        vpmu_reset(vpmu, VPMU_RUNNING);
+
+    return 0;
 }
 
 static int core2_vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
@@ -607,19 +603,14 @@ static int core2_vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
             rdmsrl(msr, *msr_content);
         }
     }
-    else
+    else if ( msr == MSR_IA32_MISC_ENABLE )
     {
         /* Extension for BTS */
-        if ( msr == MSR_IA32_MISC_ENABLE )
-        {
-            if ( vpmu_is_set(vpmu, VPMU_CPU_HAS_BTS) )
-                *msr_content &= ~MSR_IA32_MISC_ENABLE_BTS_UNAVAIL;
-        }
-        else
-            return 0;
+        if ( vpmu_is_set(vpmu, VPMU_CPU_HAS_BTS) )
+            *msr_content &= ~MSR_IA32_MISC_ENABLE_BTS_UNAVAIL;
     }
 
-    return 1;
+    return 0;
 }
 
 static void core2_vpmu_do_cpuid(unsigned int input,
-- 
1.8.1.4


* [PATCH v9 14/20] x86/VPMU: Add support for PMU register handling on PV guests
  2014-08-08 16:55 [PATCH v9 00/20] x86/PMU: Xen PMU PV(H) support Boris Ostrovsky
                   ` (12 preceding siblings ...)
  2014-08-08 16:55 ` [PATCH v9 13/20] x86/VPMU: When handling MSR accesses, leave fault injection to callers Boris Ostrovsky
@ 2014-08-08 16:55 ` Boris Ostrovsky
  2014-08-12 12:55   ` Jan Beulich
  2014-08-08 16:55 ` [PATCH v9 15/20] x86/VPMU: Handle PMU interrupts for " Boris Ostrovsky
                   ` (6 subsequent siblings)
  20 siblings, 1 reply; 49+ messages in thread
From: Boris Ostrovsky @ 2014-08-08 16:55 UTC (permalink / raw)
  To: jbeulich, kevin.tian, suravee.suthikulpanit
  Cc: keir, andrew.cooper3, tim, xen-devel, jun.nakajima, boris.ostrovsky

Intercept accesses to PMU MSRs and process them in the VPMU module.

Dump VPMU state for all domains (HVM and PV) when requested.
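
As a sketch of what this enables (not part of the patch itself), a PV guest
can now program and read a performance counter with plain MSR accesses;
emulate_privileged_op() forwards them to vpmu_do_wrmsr()/vpmu_do_rdmsr().
The wrmsrl()/rdmsrl() helpers and the event encoding are guest-side
assumptions:

    /* 0x004300c0: instructions retired, count in rings 0 and 3, enabled. */
    #define EXAMPLE_EVTSEL_INST_RETIRED  0x004300c0ULL

    static uint64_t example_count_instructions(void)
    {
        uint64_t count;

        wrmsrl(MSR_P6_PERFCTR0, 0);                            /* trapped by Xen */
        wrmsrl(MSR_P6_EVNTSEL0, EXAMPLE_EVTSEL_INST_RETIRED);  /* trapped by Xen */

        /* ... run the code being measured ... */

        rdmsrl(MSR_P6_PERFCTR0, count);                        /* trapped by Xen */
        return count;
    }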

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
Tested-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
---
 xen/arch/x86/domain.c             |  3 +--
 xen/arch/x86/hvm/vmx/vpmu_core2.c | 45 ++++++++++++++++++++++++++++------
 xen/arch/x86/hvm/vpmu.c           |  7 ++++++
 xen/arch/x86/traps.c              | 51 +++++++++++++++++++++++++++++++++++++--
 xen/include/public/pmu.h          |  1 +
 5 files changed, 96 insertions(+), 11 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 04bfbff..a810d1c 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -2012,8 +2012,7 @@ void arch_dump_vcpu_info(struct vcpu *v)
 {
     paging_dump_vcpu_info(v);
 
-    if ( is_hvm_vcpu(v) )
-        vpmu_dump(v);
+    vpmu_dump(v);
 }
 
 void domain_cpuid(
diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c
index 60e0474..08b9b8a 100644
--- a/xen/arch/x86/hvm/vmx/vpmu_core2.c
+++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c
@@ -27,6 +27,7 @@
 #include <asm/regs.h>
 #include <asm/types.h>
 #include <asm/apic.h>
+#include <asm/traps.h>
 #include <asm/msr.h>
 #include <asm/msr-index.h>
 #include <asm/hvm/support.h>
@@ -294,12 +295,18 @@ static inline void __core2_vpmu_save(struct vcpu *v)
         rdmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, fixed_counters[i]);
     for ( i = 0; i < arch_pmc_cnt; i++ )
         rdmsrl(MSR_IA32_PERFCTR0 + i, xen_pmu_cntr_pair[i].counter);
+
+    if ( !has_hvm_container_domain(v->domain) )
+        rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, core2_vpmu_cxt->global_status);
 }
 
 static int core2_vpmu_save(struct vcpu *v)
 {
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
 
+    if ( !has_hvm_container_domain(v->domain) )
+        wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0);
+
     if ( !vpmu_are_all_set(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED) )
         return 0;
 
@@ -337,6 +344,13 @@ static inline void __core2_vpmu_load(struct vcpu *v)
     wrmsrl(MSR_CORE_PERF_FIXED_CTR_CTRL, core2_vpmu_cxt->fixed_ctrl);
     wrmsrl(MSR_IA32_DS_AREA, core2_vpmu_cxt->ds_area);
     wrmsrl(MSR_IA32_PEBS_ENABLE, core2_vpmu_cxt->pebs_enable);
+
+    if ( !has_hvm_container_domain(v->domain) )
+    {
+        wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, core2_vpmu_cxt->global_ovf_ctrl);
+        core2_vpmu_cxt->global_ovf_ctrl = 0;
+        wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, core2_vpmu_cxt->global_ctrl);
+    }
 }
 
 static void core2_vpmu_load(struct vcpu *v)
@@ -439,7 +453,6 @@ static int core2_vpmu_msr_common_check(u32 msr_index, int *type, int *index)
 
 static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
 {
-    u64 global_ctrl;
     int i, tmp;
     int type = -1, index = -1;
     struct vcpu *v = current;
@@ -476,6 +489,7 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
     {
     case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
         core2_vpmu_cxt->global_status &= ~msr_content;
+        wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, msr_content);
         return 0;
     case MSR_CORE_PERF_GLOBAL_STATUS:
         gdprintk(XENLOG_INFO, "Can not write readonly MSR: "
@@ -503,10 +517,14 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
         gdprintk(XENLOG_WARNING, "Guest setting of DTS is ignored.\n");
         return 0;
     case MSR_CORE_PERF_GLOBAL_CTRL:
-        global_ctrl = msr_content;
+        core2_vpmu_cxt->global_ctrl = msr_content;
         break;
     case MSR_CORE_PERF_FIXED_CTR_CTRL:
-        vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, &global_ctrl);
+        if ( has_hvm_container_domain(v->domain) )
+            vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL,
+                               &core2_vpmu_cxt->global_ctrl);
+        else
+            rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, core2_vpmu_cxt->global_ctrl);
         *enabled_cntrs &= ~(((1ULL << fixed_pmc_cnt) - 1) << 32);
         if ( msr_content != 0 )
         {
@@ -528,7 +546,11 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
             struct xen_pmu_cntr_pair *xen_pmu_cntr_pair =
                 vpmu_reg_pointer(core2_vpmu_cxt, arch_counters);
 
-            vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, &global_ctrl);
+            if ( has_hvm_container_domain(v->domain) )
+                vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL,
+                                   &core2_vpmu_cxt->global_ctrl);
+            else
+                rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, core2_vpmu_cxt->global_ctrl);
 
             if ( msr_content & (1ULL << 22) )
                 *enabled_cntrs |= 1ULL << tmp;
@@ -568,9 +590,15 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
         wrmsrl(msr, msr_content);
     }
     else
-        vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, msr_content);
+    {
+        if ( has_hvm_container_domain(v->domain) )
+            vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, msr_content);
+        else
+            wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, msr_content);
+    }
 
-    if ( (global_ctrl & *enabled_cntrs) || (core2_vpmu_cxt->ds_area != 0) )
+    if ( (core2_vpmu_cxt->global_ctrl & *enabled_cntrs) ||
+         (core2_vpmu_cxt->ds_area != 0) )
         vpmu_set(vpmu, VPMU_RUNNING);
     else
         vpmu_reset(vpmu, VPMU_RUNNING);
@@ -597,7 +625,10 @@ static int core2_vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
             *msr_content = core2_vpmu_cxt->global_status;
             break;
         case MSR_CORE_PERF_GLOBAL_CTRL:
-            vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, msr_content);
+            if ( has_hvm_container_domain(v->domain) )
+                vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, msr_content);
+            else
+                rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, *msr_content);
             break;
         default:
             rdmsrl(msr, *msr_content);
diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c
index ab9c118..399e195 100644
--- a/xen/arch/x86/hvm/vpmu.c
+++ b/xen/arch/x86/hvm/vpmu.c
@@ -525,6 +525,13 @@ long do_xenpmu_op(int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg)
             return -EFAULT;
         pvpmu_finish(current->domain, &pmu_params);
         break;
+
+    case XENPMU_lvtpc_set:
+        if ( current->arch.vpmu.xenpmu_data == NULL )
+            return -EINVAL;
+        vpmu_lvtpc_update(current->arch.vpmu.xenpmu_data->pmu.l.lapic_lvtpc);
+        ret = 0;
+        break;
     }
 
     return ret;
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 677074b..f1830d5 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -72,6 +72,7 @@
 #include <asm/apic.h>
 #include <asm/mc146818rtc.h>
 #include <asm/hpet.h>
+#include <asm/hvm/vpmu.h>
 #include <public/arch-x86/cpuid.h>
 #include <xsm/xsm.h>
 
@@ -891,8 +892,10 @@ void pv_cpuid(struct cpu_user_regs *regs)
         __clear_bit(X86_FEATURE_TOPOEXT % 32, &c);
         break;
 
+    case 0x0000000a: /* Architectural Performance Monitor Features (Intel) */
+        break;
+
     case 0x00000005: /* MONITOR/MWAIT */
-    case 0x0000000a: /* Architectural Performance Monitor Features */
     case 0x0000000b: /* Extended Topology Enumeration */
     case 0x8000000a: /* SVM revision and features */
     case 0x8000001b: /* Instruction Based Sampling */
@@ -908,6 +911,9 @@ void pv_cpuid(struct cpu_user_regs *regs)
     }
 
  out:
+    /* VPMU may decide to modify some of the leaves */
+    vpmu_do_cpuid(regs->eax, &a, &b, &c, &d);
+
     regs->eax = a;
     regs->ebx = b;
     regs->ecx = c;
@@ -1930,6 +1936,7 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
     char io_emul_stub[32];
     void (*io_emul)(struct cpu_user_regs *) __attribute__((__regparm__(1)));
     uint64_t val, msr_content;
+    bool_t vpmu_msr;
 
     if ( !read_descriptor(regs->cs, v, regs,
                           &code_base, &code_limit, &ar,
@@ -2420,6 +2427,7 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
         uint32_t eax = regs->eax;
         uint32_t edx = regs->edx;
         msr_content = ((uint64_t)edx << 32) | eax;
+        vpmu_msr = 0;
         switch ( (u32)regs->ecx )
         {
         case MSR_FS_BASE:
@@ -2556,7 +2564,22 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
             if ( v->arch.debugreg[7] & DR7_ACTIVE_MASK )
                 wrmsrl(regs->_ecx, msr_content);
             break;
-
+        case MSR_P6_PERFCTR0...MSR_P6_PERFCTR1:
+        case MSR_P6_EVNTSEL0...MSR_P6_EVNTSEL1:
+        case MSR_CORE_PERF_FIXED_CTR0...MSR_CORE_PERF_FIXED_CTR2:
+        case MSR_CORE_PERF_FIXED_CTR_CTRL...MSR_CORE_PERF_GLOBAL_OVF_CTRL:
+            if ( boot_cpu_data.x86_vendor == X86_VENDOR_INTEL )
+                vpmu_msr = 1;
+            /* FALLTHROUGH */
+        case MSR_AMD_FAM15H_EVNTSEL0...MSR_AMD_FAM15H_PERFCTR5:
+            if ( vpmu_msr ||
+                 ((boot_cpu_data.x86_vendor == X86_VENDOR_AMD) && !vpmu_msr) )
+            {
+                if ( vpmu_do_wrmsr(regs->ecx, msr_content) )
+                    goto fail;
+                break;
+            }
+            /* FALLTHROUGH */
         default:
             if ( wrmsr_hypervisor_regs(regs->ecx, msr_content) == 1 )
                 break;
@@ -2588,6 +2611,8 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
         break;
 
     case 0x32: /* RDMSR */
+        vpmu_msr = 0;
+
         switch ( (u32)regs->ecx )
         {
         case MSR_FS_BASE:
@@ -2658,7 +2683,29 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
                             [regs->_ecx - MSR_AMD64_DR1_ADDRESS_MASK + 1];
             regs->edx = 0;
             break;
+        case MSR_IA32_PERF_CAPABILITIES:
+            /* No extra capabilities are supported */
+            regs->eax = regs->edx = 0;
+            break;
+        case MSR_P6_PERFCTR0...MSR_P6_PERFCTR1:
+        case MSR_P6_EVNTSEL0...MSR_P6_EVNTSEL1:
+        case MSR_CORE_PERF_FIXED_CTR0...MSR_CORE_PERF_FIXED_CTR2:
+        case MSR_CORE_PERF_FIXED_CTR_CTRL...MSR_CORE_PERF_GLOBAL_OVF_CTRL:
+            if ( boot_cpu_data.x86_vendor == X86_VENDOR_INTEL )
+                vpmu_msr = 1;
+            /* FALLTHROUGH */
+        case MSR_AMD_FAM15H_EVNTSEL0...MSR_AMD_FAM15H_PERFCTR5:
+            if ( vpmu_msr ||
+                 ((boot_cpu_data.x86_vendor == X86_VENDOR_AMD) && !vpmu_msr) )
+            {
+                if ( vpmu_do_rdmsr(regs->ecx, &msr_content) )
+                    goto fail;
 
+                regs->eax = (uint32_t)msr_content;
+                regs->edx = (uint32_t)(msr_content >> 32);
+                break;
+            }
+            /* FALLTHROUGH */
         default:
             if ( rdmsr_hypervisor_regs(regs->ecx, &val) )
             {
diff --git a/xen/include/public/pmu.h b/xen/include/public/pmu.h
index 3d55eb6..92f3683 100644
--- a/xen/include/public/pmu.h
+++ b/xen/include/public/pmu.h
@@ -27,6 +27,7 @@
 #define XENPMU_feature_set     3
 #define XENPMU_init            4
 #define XENPMU_finish          5
+#define XENPMU_lvtpc_set       6
 /* ` } */
 
 /* Parameters structure for HYPERVISOR_xenpmu_op call */
-- 
1.8.1.4


* [PATCH v9 15/20] x86/VPMU: Handle PMU interrupts for PV guests
  2014-08-08 16:55 [PATCH v9 00/20] x86/PMU: Xen PMU PV(H) support Boris Ostrovsky
                   ` (13 preceding siblings ...)
  2014-08-08 16:55 ` [PATCH v9 14/20] x86/VPMU: Add support for PMU register handling on PV guests Boris Ostrovsky
@ 2014-08-08 16:55 ` Boris Ostrovsky
  2014-08-12 12:58   ` Jan Beulich
  2014-08-08 16:55 ` [PATCH v9 16/20] x86/VPMU: Merge vpmu_rdmsr and vpmu_wrmsr Boris Ostrovsky
                   ` (5 subsequent siblings)
  20 siblings, 1 reply; 49+ messages in thread
From: Boris Ostrovsky @ 2014-08-08 16:55 UTC (permalink / raw)
  To: jbeulich, kevin.tian, suravee.suthikulpanit
  Cc: keir, andrew.cooper3, tim, xen-devel, jun.nakajima, boris.ostrovsky

Add support for handling PMU interrupts for PV guests.

The VPMU for the interrupted VCPU is unloaded until the guest issues the
XENPMU_flush hypercall. This allows the guest to access PMU MSR values that
are stored in the VPMU context, which is shared between the hypervisor and
the domain, thus avoiding traps into the hypervisor.
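
For illustration only (not part of this patch), the guest-side flow this
enables might look roughly like the sketch below. The handler and helper
names, and the hypercall wrapper, are invented; only the xen_pmu_data fields,
PMU_CACHED, VIRQ_XENPMU and XENPMU_flush come from this series:

    /* Hypothetical PV guest VIRQ_XENPMU handler (names invented) */
    static void xenpmu_virq_handler(void)
    {
        /* Shared page registered earlier via XENPMU_init */
        struct xen_pmu_data *pmu = this_cpu_xenpmu_data;

        /*
         * PMU_CACHED is set here: the sampled register state is in the
         * shared page and can be read without trapping into Xen.
         */
        process_sample(pmu->domain_id, pmu->vcpu_id, &pmu->pmu.r.regs);

        /* Re-arm: the hypervisor clears PMU_CACHED and reloads the VPMU */
        HYPERVISOR_xenpmu_op(XENPMU_flush, NULL);
    }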

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
Tested-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
---
 xen/arch/x86/hvm/vpmu.c  | 158 +++++++++++++++++++++++++++++++++++++++++++----
 xen/include/public/pmu.h |   7 +++
 2 files changed, 153 insertions(+), 12 deletions(-)

diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c
index 399e195..818f721 100644
--- a/xen/arch/x86/hvm/vpmu.c
+++ b/xen/arch/x86/hvm/vpmu.c
@@ -80,44 +80,167 @@ static void __init parse_vpmu_param(char *s)
 
 void vpmu_lvtpc_update(uint32_t val)
 {
-    struct vpmu_struct *vpmu = vcpu_vpmu(current);
+    struct vcpu *curr = current;
+    struct vpmu_struct *vpmu = vcpu_vpmu(curr);
 
     vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR | (val & APIC_LVT_MASKED);
-    apic_write(APIC_LVTPC, vpmu->hw_lapic_lvtpc);
+
+    /* Postpone APIC updates for PV(H) guests if PMU interrupt is pending */
+    if ( is_hvm_domain(curr->domain) ||
+         !(vpmu->xenpmu_data && (vpmu->xenpmu_data->pmu_flags & PMU_CACHED)) )
+        apic_write(APIC_LVTPC, vpmu->hw_lapic_lvtpc);
 }
 
 int vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
 {
-    struct vpmu_struct *vpmu = vcpu_vpmu(current);
+    struct vcpu *curr = current;
+    struct vpmu_struct *vpmu = vcpu_vpmu(curr);
 
     if ( !(vpmu_mode & XENPMU_MODE_SELF) )
         return 0;
 
     if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->do_wrmsr )
-        return vpmu->arch_vpmu_ops->do_wrmsr(msr, msr_content);
+    {
+        int ret = vpmu->arch_vpmu_ops->do_wrmsr(msr, msr_content);
+
+        /*
+         * We may have received a PMU interrupt during WRMSR handling
+         * and since do_wrmsr may load VPMU context we should save
+         * (and unload) it again.
+         */
+        if ( !is_hvm_domain(curr->domain) &&
+             vpmu->xenpmu_data && (vpmu->xenpmu_data->pmu_flags & PMU_CACHED) )
+        {
+            vpmu_set(vpmu, VPMU_CONTEXT_SAVE);
+            vpmu->arch_vpmu_ops->arch_vpmu_save(curr);
+            vpmu_reset(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED);
+        }
+        return ret;
+    }
     return 0;
 }
 
 int vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
 {
-    struct vpmu_struct *vpmu = vcpu_vpmu(current);
+    struct vcpu *curr = current;
+    struct vpmu_struct *vpmu = vcpu_vpmu(curr);
 
     if ( !(vpmu_mode & XENPMU_MODE_SELF) )
         return 0;
 
     if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->do_rdmsr )
-        return vpmu->arch_vpmu_ops->do_rdmsr(msr, msr_content);
+    {
+        int ret = vpmu->arch_vpmu_ops->do_rdmsr(msr, msr_content);
+
+        if ( !is_hvm_domain(curr->domain) &&
+             vpmu->xenpmu_data && (vpmu->xenpmu_data->pmu_flags & PMU_CACHED) )
+        {
+            vpmu_set(vpmu, VPMU_CONTEXT_SAVE);
+            vpmu->arch_vpmu_ops->arch_vpmu_save(curr);
+            vpmu_reset(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED);
+        }
+        return ret;
+    }
     return 0;
 }
 
+static struct vcpu *choose_hwdom_vcpu(void)
+{
+    struct vcpu *v;
+    unsigned idx = smp_processor_id() % hardware_domain->max_vcpus;
+
+    if ( hardware_domain->vcpu == NULL )
+        return NULL;
+
+    v = hardware_domain->vcpu[idx];
+
+    /*
+     * If index is not populated search downwards the vcpu array until
+     * a valid vcpu can be found
+     */
+    while ( !v && idx-- )
+        v = hardware_domain->vcpu[idx];
+
+    return v;
+}
+
 int vpmu_do_interrupt(struct cpu_user_regs *regs)
 {
-    struct vcpu *v = current;
-    struct vpmu_struct *vpmu = vcpu_vpmu(v);
+    struct vcpu *sampled = current, *sampling;
+    struct vpmu_struct *vpmu;
+
+    /* dom0 will handle interrupt for special domains (e.g. idle domain) */
+    if ( sampled->domain->domain_id >= DOMID_FIRST_RESERVED )
+    {
+        sampling = choose_hwdom_vcpu();
+        if ( !sampling )
+            return 0;
+    }
+    else
+        sampling = sampled;
+
+    vpmu = vcpu_vpmu(sampling);
+    if ( !is_hvm_domain(sampling->domain) )
+    {
+        /* PV(H) guest or dom0 is doing system profiling */
+        const struct cpu_user_regs *gregs;
+
+        if ( !vpmu->xenpmu_data )
+            return 0;
+
+        if ( vpmu->xenpmu_data->pmu_flags & PMU_CACHED )
+            return 1;
+
+        if ( is_pvh_domain(sampled->domain) &&
+             !vpmu->arch_vpmu_ops->do_interrupt(regs) )
+            return 0;
+
+        /* PV guest will be reading PMU MSRs from xenpmu_data */
+        vpmu_set(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED);
+        vpmu->arch_vpmu_ops->arch_vpmu_save(sampling);
+        vpmu_reset(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED);
+
+        /* Store appropriate registers in xenpmu_data */
+        if ( is_pv_32bit_domain(sampled->domain) )
+        {
+            /*
+             * 32-bit dom0 cannot process Xen's addresses (which are 64 bit)
+             * and therefore we treat it the same way as a non-priviledged
+             * PV 32-bit domain.
+             */
+            struct compat_cpu_user_regs *cmp;
+
+            gregs = guest_cpu_user_regs();
+
+            cmp = (void *)&vpmu->xenpmu_data->pmu.r.regs;
+            XLAT_cpu_user_regs(cmp, gregs);
+        }
+        else if ( !is_hardware_domain(sampled->domain) &&
+                  !is_idle_vcpu(sampled) )
+        {
+            /* PV(H) guest */
+            gregs = guest_cpu_user_regs();
+            vpmu->xenpmu_data->pmu.r.regs = *gregs;
+        }
+        else
+            vpmu->xenpmu_data->pmu.r.regs = *regs;
+
+        vpmu->xenpmu_data->domain_id = sampled->domain->domain_id;
+        vpmu->xenpmu_data->vcpu_id = sampled->vcpu_id;
+        vpmu->xenpmu_data->pcpu_id = smp_processor_id();
+
+        vpmu->xenpmu_data->pmu_flags |= PMU_CACHED;
+        apic_write(APIC_LVTPC, vpmu->hw_lapic_lvtpc | APIC_LVT_MASKED);
+        vpmu->hw_lapic_lvtpc |= APIC_LVT_MASKED;
+
+        send_guest_vcpu_virq(sampling, VIRQ_XENPMU);
+
+        return 1;
+    }
 
     if ( vpmu->arch_vpmu_ops )
     {
-        struct vlapic *vlapic = vcpu_vlapic(v);
+        struct vlapic *vlapic = vcpu_vlapic(sampling);
         u32 vlapic_lvtpc;
         unsigned char int_vec;
 
@@ -131,9 +254,9 @@ int vpmu_do_interrupt(struct cpu_user_regs *regs)
         int_vec = vlapic_lvtpc & APIC_VECTOR_MASK;
 
         if ( GET_APIC_DELIVERY_MODE(vlapic_lvtpc) == APIC_MODE_FIXED )
-            vlapic_set_irq(vcpu_vlapic(v), int_vec, 0);
+            vlapic_set_irq(vcpu_vlapic(sampling), int_vec, 0);
         else
-            v->nmi_pending = 1;
+            sampling->nmi_pending = 1;
         return 1;
     }
 
@@ -232,7 +355,9 @@ void vpmu_load(struct vcpu *v)
     local_irq_enable();
 
     /* Only when PMU is counting, we load PMU context immediately. */
-    if ( !vpmu_is_set(vpmu, VPMU_RUNNING) )
+    if ( !vpmu_is_set(vpmu, VPMU_RUNNING) ||
+         (!is_hvm_domain(v->domain) &&
+          vpmu->xenpmu_data->pmu_flags & PMU_CACHED) )
         return;
 
     if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->arch_vpmu_load )
@@ -436,6 +561,7 @@ vpmu_force_context_switch(XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg)
 long do_xenpmu_op(int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg)
 {
     int ret = -EINVAL;
+    struct vcpu *curr;
     xen_pmu_params_t pmu_params;
 
     switch ( op )
@@ -532,6 +658,14 @@ long do_xenpmu_op(int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg)
         vpmu_lvtpc_update(current->arch.vpmu.xenpmu_data->pmu.l.lapic_lvtpc);
         ret = 0;
         break;
+
+    case XENPMU_flush:
+        curr = current;
+        curr->arch.vpmu.xenpmu_data->pmu_flags &= ~PMU_CACHED;
+        vpmu_lvtpc_update(curr->arch.vpmu.xenpmu_data->pmu.l.lapic_lvtpc);
+        vpmu_load(curr);
+        ret = 0;
+        break;
     }
 
     return ret;
diff --git a/xen/include/public/pmu.h b/xen/include/public/pmu.h
index 92f3683..44bf43f 100644
--- a/xen/include/public/pmu.h
+++ b/xen/include/public/pmu.h
@@ -28,6 +28,7 @@
 #define XENPMU_init            4
 #define XENPMU_finish          5
 #define XENPMU_lvtpc_set       6
+#define XENPMU_flush           7 /* Write cached MSR values to HW     */
 /* ` } */
 
 /* Parameters structure for HYPERVISOR_xenpmu_op call */
@@ -59,6 +60,12 @@ DEFINE_XEN_GUEST_HANDLE(xen_pmu_params_t);
  */
 #define XENPMU_FEATURE_INTEL_BTS  1
 
+/*
+ * PMU MSRs are cached in the context so the PV guest doesn't need to trap to
+ * the hypervisor
+ */
+#define PMU_CACHED 1
+
 /* Shared between hypervisor and PV domain */
 struct xen_pmu_data {
     uint32_t domain_id;
-- 
1.8.1.4

* [PATCH v9 16/20] x86/VPMU: Merge vpmu_rdmsr and vpmu_wrmsr
  2014-08-08 16:55 [PATCH v9 00/20] x86/PMU: Xen PMU PV(H) support Boris Ostrovsky
                   ` (14 preceding siblings ...)
  2014-08-08 16:55 ` [PATCH v9 15/20] x86/VPMU: Handle PMU interrupts for " Boris Ostrovsky
@ 2014-08-08 16:55 ` Boris Ostrovsky
  2014-08-08 16:55 ` [PATCH v9 17/20] x86/VPMU: Add privileged PMU mode Boris Ostrovsky
                   ` (4 subsequent siblings)
  20 siblings, 0 replies; 49+ messages in thread
From: Boris Ostrovsky @ 2014-08-08 16:55 UTC (permalink / raw)
  To: jbeulich, kevin.tian, suravee.suthikulpanit
  Cc: keir, andrew.cooper3, tim, xen-devel, jun.nakajima, boris.ostrovsky

The two routines share most of their logic.
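
All callers then go through a single entry point and pass the access type,
e.g. (taken from the hunks below):

    if ( vpmu_do_msr(msr, msr_content, VPMU_MSR_READ) )      /* read intercept */
        goto gpf;

    if ( vpmu_do_msr(msr, &msr_content, VPMU_MSR_WRITE) )    /* write intercept */
        goto gpf;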

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
 xen/arch/x86/hvm/svm/svm.c     |  4 +--
 xen/arch/x86/hvm/vmx/vmx.c     |  6 ++--
 xen/arch/x86/hvm/vpmu.c        | 66 +++++++++++++++++++-----------------------
 xen/arch/x86/traps.c           |  4 +--
 xen/include/asm-x86/hvm/vpmu.h |  6 ++--
 5 files changed, 40 insertions(+), 46 deletions(-)

diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index da5af5c..8935404 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -1642,7 +1642,7 @@ static int svm_msr_read_intercept(unsigned int msr, uint64_t *msr_content)
     case MSR_AMD_FAM15H_EVNTSEL3:
     case MSR_AMD_FAM15H_EVNTSEL4:
     case MSR_AMD_FAM15H_EVNTSEL5:
-        if ( vpmu_do_rdmsr(msr, msr_content) )
+        if ( vpmu_do_msr(msr, msr_content, VPMU_MSR_READ) )
             goto gpf;
         break;
 
@@ -1794,7 +1794,7 @@ static int svm_msr_write_intercept(unsigned int msr, uint64_t msr_content)
     case MSR_AMD_FAM15H_EVNTSEL3:
     case MSR_AMD_FAM15H_EVNTSEL4:
     case MSR_AMD_FAM15H_EVNTSEL5:
-        if ( vpmu_do_wrmsr(msr, msr_content) )
+        if ( vpmu_do_msr(msr, &msr_content, VPMU_MSR_WRITE) )
             goto gpf;
         break;
 
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index c8dbe80..7c11550 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -2085,7 +2085,7 @@ static int vmx_msr_read_intercept(unsigned int msr, uint64_t *msr_content)
     case MSR_CORE_PERF_FIXED_CTR_CTRL...MSR_CORE_PERF_GLOBAL_OVF_CTRL:
     case MSR_IA32_PEBS_ENABLE:
     case MSR_IA32_DS_AREA:
-        if ( vpmu_do_rdmsr(msr, msr_content) )
+        if ( vpmu_do_msr(msr, msr_content, VPMU_MSR_READ) )
             goto gp_fault;
         break;
     default:
@@ -2261,7 +2261,7 @@ static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content)
         if ( msr_content & ~supported )
         {
             /* Perhaps some other bits are supported in vpmu. */
-            if ( vpmu_do_wrmsr(msr, msr_content) )
+            if ( vpmu_do_msr(msr, &msr_content, VPMU_MSR_WRITE) )
                 goto gp_fault;
         }
         if ( msr_content & IA32_DEBUGCTLMSR_LBR )
@@ -2297,7 +2297,7 @@ static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content)
     case MSR_CORE_PERF_FIXED_CTR_CTRL...MSR_CORE_PERF_GLOBAL_OVF_CTRL:
     case MSR_IA32_PEBS_ENABLE:
     case MSR_IA32_DS_AREA:
-        if ( vpmu_do_wrmsr(msr, msr_content) )
+        if ( vpmu_do_msr(msr, &msr_content, VPMU_MSR_WRITE) )
             goto gp_fault;
         break;
     default:
diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c
index 818f721..c9cf6c0 100644
--- a/xen/arch/x86/hvm/vpmu.c
+++ b/xen/arch/x86/hvm/vpmu.c
@@ -91,57 +91,49 @@ void vpmu_lvtpc_update(uint32_t val)
         apic_write(APIC_LVTPC, vpmu->hw_lapic_lvtpc);
 }
 
-int vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
+int vpmu_do_msr(unsigned int msr, uint64_t *msr_content, int rw)
 {
     struct vcpu *curr = current;
     struct vpmu_struct *vpmu = vcpu_vpmu(curr);
+    struct arch_vpmu_ops *ops = vpmu->arch_vpmu_ops;
+    int ret = 0;
 
     if ( !(vpmu_mode & XENPMU_MODE_SELF) )
         return 0;
 
-    if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->do_wrmsr )
+    switch ( rw )
     {
-        int ret = vpmu->arch_vpmu_ops->do_wrmsr(msr, msr_content);
-
-        /*
-         * We may have received a PMU interrupt during WRMSR handling
-         * and since do_wrmsr may load VPMU context we should save
-         * (and unload) it again.
-         */
-        if ( !is_hvm_domain(curr->domain) &&
-             vpmu->xenpmu_data && (vpmu->xenpmu_data->pmu_flags & PMU_CACHED) )
-        {
-            vpmu_set(vpmu, VPMU_CONTEXT_SAVE);
-            vpmu->arch_vpmu_ops->arch_vpmu_save(curr);
-            vpmu_reset(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED);
-        }
-        return ret;
-    }
-    return 0;
-}
+    case VPMU_MSR_READ:
+        if ( !ops || !ops->do_rdmsr )
+            return 0;
+        ret = ops->do_rdmsr(msr, msr_content);
+        break;
 
-int vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
-{
-    struct vcpu *curr = current;
-    struct vpmu_struct *vpmu = vcpu_vpmu(curr);
+    case VPMU_MSR_WRITE:
+        if ( !ops || !ops->do_wrmsr )
+            return 0;
+        ret = ops->do_wrmsr(msr, *msr_content);
+        break;
 
-    if ( !(vpmu_mode & XENPMU_MODE_SELF) )
+    default:
+        ASSERT(0);
         return 0;
+    }
 
-    if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->do_rdmsr )
+    /*
+     * We may have received a PMU interrupt while handling MSR access
+     * and since do_wr/rdmsr may load VPMU context we should save
+     * (and unload) it again.
+     */
+    if ( !is_hvm_domain(curr->domain) &&
+         vpmu->xenpmu_data && (vpmu->xenpmu_data->pmu_flags & PMU_CACHED) )
     {
-        int ret = vpmu->arch_vpmu_ops->do_rdmsr(msr, msr_content);
-
-        if ( !is_hvm_domain(curr->domain) &&
-             vpmu->xenpmu_data && (vpmu->xenpmu_data->pmu_flags & PMU_CACHED) )
-        {
-            vpmu_set(vpmu, VPMU_CONTEXT_SAVE);
-            vpmu->arch_vpmu_ops->arch_vpmu_save(curr);
-            vpmu_reset(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED);
-        }
-        return ret;
+        vpmu_set(vpmu, VPMU_CONTEXT_SAVE);
+        ops->arch_vpmu_save(curr);
+        vpmu_reset(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED);
     }
-    return 0;
+
+    return ret;
 }
 
 static struct vcpu *choose_hwdom_vcpu(void)
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index f1830d5..4c4292b 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -2575,7 +2575,7 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
             if ( vpmu_msr ||
                  ((boot_cpu_data.x86_vendor == X86_VENDOR_AMD) && !vpmu_msr) )
             {
-                if ( vpmu_do_wrmsr(regs->ecx, msr_content) )
+                if ( vpmu_do_msr(regs->ecx, &msr_content, VPMU_MSR_WRITE) )
                     goto fail;
                 break;
             }
@@ -2698,7 +2698,7 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
             if ( vpmu_msr ||
                  ((boot_cpu_data.x86_vendor != X86_VENDOR_AMD) && !vpmu_msr) )
             {
-                if ( vpmu_do_rdmsr(regs->ecx, &msr_content) )
+                if ( vpmu_do_msr(regs->ecx, &msr_content, VPMU_MSR_READ) )
                     goto fail;
 
                 regs->eax = (uint32_t)msr_content;
diff --git a/xen/include/asm-x86/hvm/vpmu.h b/xen/include/asm-x86/hvm/vpmu.h
index 25954c6..429ab27 100644
--- a/xen/include/asm-x86/hvm/vpmu.h
+++ b/xen/include/asm-x86/hvm/vpmu.h
@@ -94,9 +94,11 @@ static inline bool_t vpmu_are_all_set(const struct vpmu_struct *vpmu,
     return !!((vpmu->flags & mask) == mask);
 }
 
+#define VPMU_MSR_READ  0
+#define VPMU_MSR_WRITE 1
+
 void vpmu_lvtpc_update(uint32_t val);
-int vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content);
-int vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content);
+int vpmu_do_msr(unsigned int msr, uint64_t *msr_content, int rw);
 int vpmu_do_interrupt(struct cpu_user_regs *regs);
 void vpmu_do_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx,
                                        unsigned int *ecx, unsigned int *edx);
-- 
1.8.1.4

* [PATCH v9 17/20] x86/VPMU: Add privileged PMU mode
  2014-08-08 16:55 [PATCH v9 00/20] x86/PMU: Xen PMU PV(H) support Boris Ostrovsky
                   ` (15 preceding siblings ...)
  2014-08-08 16:55 ` [PATCH v9 16/20] x86/VPMU: Merge vpmu_rdmsr and vpmu_wrmsr Boris Ostrovsky
@ 2014-08-08 16:55 ` Boris Ostrovsky
  2014-08-12 13:06   ` Jan Beulich
  2014-08-08 16:55 ` [PATCH v9 18/20] x86/VPMU: Save VPMU state for PV guests during context switch Boris Ostrovsky
                   ` (3 subsequent siblings)
  20 siblings, 1 reply; 49+ messages in thread
From: Boris Ostrovsky @ 2014-08-08 16:55 UTC (permalink / raw)
  To: jbeulich, kevin.tian, suravee.suthikulpanit
  Cc: keir, andrew.cooper3, tim, xen-devel, jun.nakajima, boris.ostrovsky

Add support for a privileged PMU mode which allows the privileged domain
(dom0) to profile both itself (and the hypervisor) and the guests. While this
mode is on, profiling in the guests is disabled.
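
As an illustration (not part of the patch), dom0 might switch into this mode
with something like the sketch below; HYPERVISOR_xenpmu_op() is the usual
guest-side wrapper and XENPMU_mode_set is assumed from earlier in the series,
with the interface-version fields of xen_pmu_params_t omitted for brevity:

    xen_pmu_params_t p = { .val = XENPMU_MODE_ALL };
    int rc;

    rc = HYPERVISOR_xenpmu_op(XENPMU_mode_set, &p);
    /* rc is -EINVAL if e.g. both XENPMU_MODE_SELF and XENPMU_MODE_ALL are set */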

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
Tested-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
---
 xen/arch/x86/hvm/vpmu.c  | 92 +++++++++++++++++++++++++++++++++++-------------
 xen/arch/x86/traps.c     | 11 ++++++
 xen/include/public/pmu.h |  3 ++
 3 files changed, 81 insertions(+), 25 deletions(-)

diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c
index c9cf6c0..5b1584a 100644
--- a/xen/arch/x86/hvm/vpmu.c
+++ b/xen/arch/x86/hvm/vpmu.c
@@ -98,7 +98,9 @@ int vpmu_do_msr(unsigned int msr, uint64_t *msr_content, int rw)
     struct arch_vpmu_ops *ops = vpmu->arch_vpmu_ops;
     int ret = 0;
 
-    if ( !(vpmu_mode & XENPMU_MODE_SELF) )
+    if ( (vpmu_mode == XENPMU_MODE_OFF) ||
+         ((vpmu_mode & XENPMU_MODE_ALL) &&
+          !is_hardware_domain(current->domain)) )
         return 0;
 
     switch ( rw )
@@ -161,8 +163,12 @@ int vpmu_do_interrupt(struct cpu_user_regs *regs)
     struct vcpu *sampled = current, *sampling;
     struct vpmu_struct *vpmu;
 
-    /* dom0 will handle interrupt for special domains (e.g. idle domain) */
-    if ( sampled->domain->domain_id >= DOMID_FIRST_RESERVED )
+    /*
+     * dom0 will handle interrupt for special domains (e.g. idle domain) or,
+     * in XENPMU_MODE_ALL, for everyone.
+     */
+    if ( (vpmu_mode & XENPMU_MODE_ALL) ||
+         (sampled->domain->domain_id >= DOMID_FIRST_RESERVED) )
     {
         sampling = choose_hwdom_vcpu();
         if ( !sampling )
@@ -172,7 +178,7 @@ int vpmu_do_interrupt(struct cpu_user_regs *regs)
         sampling = sampled;
 
     vpmu = vcpu_vpmu(sampling);
-    if ( !is_hvm_domain(sampling->domain) )
+    if ( !is_hvm_domain(sampling->domain) || (vpmu_mode & XENPMU_MODE_ALL) )
     {
         /* PV(H) guest or dom0 is doing system profiling */
         const struct cpu_user_regs *gregs;
@@ -184,6 +190,7 @@ int vpmu_do_interrupt(struct cpu_user_regs *regs)
             return 1;
 
         if ( is_pvh_domain(sampled->domain) &&
+             !(vpmu_mode & XENPMU_MODE_ALL) &&
              !vpmu->arch_vpmu_ops->do_interrupt(regs) )
             return 0;
 
@@ -192,32 +199,65 @@ int vpmu_do_interrupt(struct cpu_user_regs *regs)
         vpmu->arch_vpmu_ops->arch_vpmu_save(sampling);
         vpmu_reset(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED);
 
-        /* Store appropriate registers in xenpmu_data */
-        if ( is_pv_32bit_domain(sampled->domain) )
+        if ( !is_hvm_domain(sampled->domain) )
         {
-            /*
-             * 32-bit dom0 cannot process Xen's addresses (which are 64 bit)
-             * and therefore we treat it the same way as a non-priviledged
-             * PV 32-bit domain.
-             */
-            struct compat_cpu_user_regs *cmp;
-
-            gregs = guest_cpu_user_regs();
-
-            cmp = (void *)&vpmu->xenpmu_data->pmu.r.regs;
-            XLAT_cpu_user_regs(cmp, gregs);
+            /* Store appropriate registers in xenpmu_data */
+            if ( is_pv_32bit_domain(sampled->domain) )
+            {
+                /*
+                 * 32-bit dom0 cannot process Xen's addresses (which are 64 bit)
+                 * and therefore we treat it the same way as a non-priviledged
+                 * PV 32-bit domain.
+                 */
+                struct compat_cpu_user_regs *cmp;
+
+                gregs = guest_cpu_user_regs();
+
+                cmp = (void *)&vpmu->xenpmu_data->pmu.r.regs;
+                XLAT_cpu_user_regs(cmp, gregs);
+
+                /* Adjust RPL for kernel mode */
+                if ((cmp->cs & 3) == 1)
+                    cmp->cs &= ~3;
+            }
+            else if ( !is_hardware_domain(sampled->domain) &&
+                      !is_idle_vcpu(sampled) )
+            {
+                /* 64-bit PV(H) unprivileged guest */
+                gregs = guest_cpu_user_regs();
+                vpmu->xenpmu_data->pmu.r.regs = *gregs;
+            }
+            else
+                vpmu->xenpmu_data->pmu.r.regs = *regs;
+
+            if ( !is_pvh_domain(sampled->domain) )
+            {
+                if ( sampled->arch.flags & TF_kernel_mode )
+                    sampling->arch.vpmu.xenpmu_data->pmu.r.regs.cs &= ~3;
+            }
+            else
+            {
+                struct segment_register seg_cs;
+
+                hvm_get_segment_register(sampled, x86_seg_cs, &seg_cs);
+                sampling->arch.vpmu.xenpmu_data->pmu.r.regs.cs = seg_cs.sel;
+            }
         }
-        else if ( !is_hardware_domain(sampled->domain) &&
-                  !is_idle_vcpu(sampled) )
+        else
         {
-            /* PV(H) guest */
+            /* HVM guest */
+            struct segment_register seg_cs;
+
             gregs = guest_cpu_user_regs();
             vpmu->xenpmu_data->pmu.r.regs = *gregs;
+
+            hvm_get_segment_register(sampled, x86_seg_cs, &seg_cs);
+            sampling->arch.vpmu.xenpmu_data->pmu.r.regs.cs = seg_cs.sel;
         }
-        else
-            vpmu->xenpmu_data->pmu.r.regs = *regs;
 
-        vpmu->xenpmu_data->domain_id = sampled->domain->domain_id;
+        vpmu->xenpmu_data->domain_id = (sampled == sampling) ?
+                                       DOMID_SELF :
+                                       sampled->domain->domain_id;
         vpmu->xenpmu_data->vcpu_id = sampled->vcpu_id;
         vpmu->xenpmu_data->pcpu_id = smp_processor_id();
 
@@ -569,7 +609,9 @@ long do_xenpmu_op(int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg)
         if ( copy_from_guest(&pmu_params, arg, 1) )
             return -EFAULT;
 
-        if ( pmu_params.val & ~XENPMU_MODE_SELF )
+        if ( (pmu_params.val & ~(XENPMU_MODE_SELF | XENPMU_MODE_ALL)) ||
+             ((pmu_params.val & XENPMU_MODE_SELF) &&
+              (pmu_params.val & XENPMU_MODE_ALL)) )
             return -EINVAL;
 
         /*
@@ -583,7 +625,7 @@ long do_xenpmu_op(int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg)
         current_mode = vpmu_mode;
         vpmu_mode = pmu_params.val;
 
-        if ( vpmu_mode == XENPMU_MODE_OFF )
+        if ( (vpmu_mode == XENPMU_MODE_OFF) || (vpmu_mode == XENPMU_MODE_ALL) )
         {
             /*
              * Make sure all (non-dom0) VCPUs have unloaded their VPMUs. This
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 4c4292b..c9b6ab8 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -2575,6 +2575,10 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
             if ( vpmu_msr ||
                  ((boot_cpu_data.x86_vendor == X86_VENDOR_AMD) && !vpmu_msr) )
             {
+                if ( (vpmu_mode & XENPMU_MODE_ALL) &&
+                     !is_hardware_domain(v->domain) )
+                    break;
+
                 if ( vpmu_do_msr(regs->ecx, &msr_content, VPMU_MSR_WRITE) )
                     goto fail;
                 break;
@@ -2698,6 +2702,13 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
             if ( vpmu_msr ||
                  ((boot_cpu_data.x86_vendor != X86_VENDOR_AMD) && !vpmu_msr) )
             {
+                if ( (vpmu_mode & XENPMU_MODE_ALL) &&
+                     !is_hardware_domain(v->domain) )
+                {
+                    /* Don't leak PMU MSRs to unprivileged domains */
+                    regs->eax = regs->edx = 0;
+                    break;
+                }
                 if ( vpmu_do_msr(regs->ecx, &msr_content, VPMU_MSR_READ) )
                     goto fail;
 
diff --git a/xen/include/public/pmu.h b/xen/include/public/pmu.h
index 44bf43f..3023e52 100644
--- a/xen/include/public/pmu.h
+++ b/xen/include/public/pmu.h
@@ -50,9 +50,12 @@ DEFINE_XEN_GUEST_HANDLE(xen_pmu_params_t);
  * - XENPMU_MODE_OFF:   No PMU virtualization
  * - XENPMU_MODE_SELF:  Guests can profile themselves, dom0 profiles
  *                      itself and Xen
+ * - XENPMU_MODE_ALL:   Only dom0 has access to VPMU and it profiles
+ *                      everyone: itself, the hypervisor and the guests.
  */
 #define XENPMU_MODE_OFF           0
 #define XENPMU_MODE_SELF          (1<<0)
+#define XENPMU_MODE_ALL           (1<<1)
 
 /*
  * PMU features:
-- 
1.8.1.4

* [PATCH v9 18/20] x86/VPMU: Save VPMU state for PV guests during context switch
  2014-08-08 16:55 [PATCH v9 00/20] x86/PMU: Xen PMU PV(H) support Boris Ostrovsky
                   ` (16 preceding siblings ...)
  2014-08-08 16:55 ` [PATCH v9 17/20] x86/VPMU: Add privileged PMU mode Boris Ostrovsky
@ 2014-08-08 16:55 ` Boris Ostrovsky
  2014-08-12 14:15   ` Jan Beulich
  2014-08-08 16:55 ` [PATCH v9 19/20] x86/VPMU: NMI-based VPMU support Boris Ostrovsky
                   ` (2 subsequent siblings)
  20 siblings, 1 reply; 49+ messages in thread
From: Boris Ostrovsky @ 2014-08-08 16:55 UTC (permalink / raw)
  To: jbeulich, kevin.tian, suravee.suthikulpanit
  Cc: keir, andrew.cooper3, tim, xen-devel, jun.nakajima, boris.ostrovsky

Save VPMU state during context switch for both HVM and PV guests unless we
are in PMU privileged mode (i.e. dom0 is doing all profiling).

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
 xen/arch/x86/domain.c | 18 ++++++++----------
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index a810d1c..47469ec 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1478,17 +1478,15 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
     }
 
     if ( prev != next )
-        _update_runstate_area(prev);
-
-    if ( is_hvm_vcpu(prev) )
     {
-        if ( (prev != next) && (vpmu_mode & XENPMU_MODE_SELF) )
+        _update_runstate_area(prev);
+        if ( vpmu_mode & XENPMU_MODE_SELF )
             vpmu_save(prev);
-
-        if ( !list_empty(&prev->arch.hvm_vcpu.tm_list) )
-            pt_save_timer(prev);
     }
 
+    if ( is_hvm_vcpu(prev) &&  !list_empty(&prev->arch.hvm_vcpu.tm_list) )
+        pt_save_timer(prev);
+
     local_irq_disable();
 
     set_current(next);
@@ -1526,12 +1524,12 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
                            !is_hardware_domain(next->domain));
     }
 
-    if ( is_hvm_vcpu(next) && (prev != next) && (vpmu_mode & XENPMU_MODE_SELF) )
+    context_saved(prev);
+
+    if ( (prev != next) && (vpmu_mode & XENPMU_MODE_SELF) )
         /* Must be done with interrupts enabled */
         vpmu_load(next);
 
-    context_saved(prev);
-
     if ( prev != next )
         _update_runstate_area(next);
 
-- 
1.8.1.4

* [PATCH v9 19/20] x86/VPMU: NMI-based VPMU support
  2014-08-08 16:55 [PATCH v9 00/20] x86/PMU: Xen PMU PV(H) support Boris Ostrovsky
                   ` (17 preceding siblings ...)
  2014-08-08 16:55 ` [PATCH v9 18/20] x86/VPMU: Save VPMU state for PV guests during context switch Boris Ostrovsky
@ 2014-08-08 16:55 ` Boris Ostrovsky
  2014-08-12 14:24   ` Jan Beulich
  2014-08-08 16:55 ` [PATCH v9 20/20] x86/VPMU: Move VPMU files up from hvm/ directory Boris Ostrovsky
  2014-08-11 13:32 ` [PATCH v9 00/20] x86/PMU: Xen PMU PV(H) support Jan Beulich
  20 siblings, 1 reply; 49+ messages in thread
From: Boris Ostrovsky @ 2014-08-08 16:55 UTC (permalink / raw)
  To: jbeulich, kevin.tian, suravee.suthikulpanit
  Cc: keir, andrew.cooper3, tim, xen-devel, jun.nakajima, boris.ostrovsky

Add support for using NMIs as PMU interrupts.

Most of the processing is still performed by vpmu_do_interrupt(). However,
since certain operations are not NMI-safe we defer them to a softirq that
vpmu_do_interrupt() will raise:
* For PV guests this is send_guest_vcpu_virq()
* For HVM guests these are the VLAPIC accesses and hvm_get_segment_register()
  (the latter can be called in privileged profiling mode when the interrupted
  guest is an HVM one).

With send_guest_vcpu_virq() and hvm_get_segment_register() for PV(H) and the
vlapic accesses for HVM moved to the softirq handler, the only routines/macros
that vpmu_do_interrupt() calls in NMI mode are:
* memcpy()
* querying domain type (is_XX_domain())
* guest_cpu_user_regs()
* XLAT_cpu_user_regs()
* raise_softirq()
* vcpu_vpmu()
* vpmu_ops->arch_vpmu_save()
* vpmu_ops->do_interrupt() (in the future for PVH support)

The latter two only access PMU MSRs with {rd,wr}msrl() (not the _safe versions
which would not be NMI-safe).
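
As a usage note: with the extended parser below, Xen's "vpmu=" command line
option takes a comma-separated list, so NMI delivery can be combined with the
other flags, for example (illustrative only):

    vpmu=nmi          (self-profiling mode, PMU interrupts delivered as NMIs)
    vpmu=bts,nmi      (additionally enable Intel BTS)
    vpmu=all,nmi      (privileged, dom0-only, profiling via NMIs)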

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
Tested-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
---
 xen/arch/x86/hvm/svm/vpmu.c       |   3 +-
 xen/arch/x86/hvm/vmx/vpmu_core2.c |   3 +-
 xen/arch/x86/hvm/vpmu.c           | 192 +++++++++++++++++++++++++++++++-------
 xen/include/asm-x86/hvm/vpmu.h    |   4 +-
 xen/include/xen/softirq.h         |   1 +
 5 files changed, 164 insertions(+), 39 deletions(-)

diff --git a/xen/arch/x86/hvm/svm/vpmu.c b/xen/arch/x86/hvm/svm/vpmu.c
index 07dccb6..e2446a9 100644
--- a/xen/arch/x86/hvm/svm/vpmu.c
+++ b/xen/arch/x86/hvm/svm/vpmu.c
@@ -169,7 +169,7 @@ static void amd_vpmu_unset_msr_bitmap(struct vcpu *v)
     msr_bitmap_off(vpmu);
 }
 
-static int amd_vpmu_do_interrupt(struct cpu_user_regs *regs)
+static int amd_vpmu_do_interrupt(const struct cpu_user_regs *regs)
 {
     return 1;
 }
@@ -224,6 +224,7 @@ static inline void context_save(struct vcpu *v)
         rdmsrl(counters[i], counter_regs[i]);
 }
 
+/* Must be NMI-safe */
 static int amd_vpmu_save(struct vcpu *v)
 {
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c
index 08b9b8a..e8c91eb 100644
--- a/xen/arch/x86/hvm/vmx/vpmu_core2.c
+++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c
@@ -300,6 +300,7 @@ static inline void __core2_vpmu_save(struct vcpu *v)
         rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, core2_vpmu_cxt->global_status);
 }
 
+/* Must be NMI-safe */
 static int core2_vpmu_save(struct vcpu *v)
 {
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
@@ -710,7 +711,7 @@ static void core2_vpmu_dump(const struct vcpu *v)
     }
 }
 
-static int core2_vpmu_do_interrupt(struct cpu_user_regs *regs)
+static int core2_vpmu_do_interrupt(const struct cpu_user_regs *regs)
 {
     struct vcpu *v = current;
     u64 msr_content;
diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c
index 5b1584a..3d48a29 100644
--- a/xen/arch/x86/hvm/vpmu.c
+++ b/xen/arch/x86/hvm/vpmu.c
@@ -36,6 +36,7 @@
 #include <asm/hvm/svm/svm.h>
 #include <asm/hvm/svm/vmcb.h>
 #include <asm/apic.h>
+#include <asm/nmi.h>
 #include <public/pmu.h>
 #include <xen/tasklet.h>
 
@@ -55,27 +56,53 @@ uint64_t __read_mostly vpmu_features = 0;
 static void parse_vpmu_param(char *s);
 custom_param("vpmu", parse_vpmu_param);
 
+static void pmu_softnmi(void);
+
 static DEFINE_PER_CPU(struct vcpu *, last_vcpu);
+static DEFINE_PER_CPU(struct vcpu *, sampled_vcpu);
+
+static uint32_t __read_mostly vpmu_interrupt_type = PMU_APIC_VECTOR;
 
 static void __init parse_vpmu_param(char *s)
 {
-    switch ( parse_bool(s) )
-    {
-    case 0:
-        break;
-    default:
-        if ( !strcmp(s, "bts") )
-            vpmu_features |= XENPMU_FEATURE_INTEL_BTS;
-        else if ( *s )
+    char *ss;
+
+    vpmu_mode = XENPMU_MODE_SELF;
+    if (*s == '\0')
+        return;
+
+    do {
+        ss = strchr(s, ',');
+        if ( ss )
+            *ss = '\0';
+
+        switch  ( parse_bool(s) )
         {
-            printk("VPMU: unknown flag: %s - vpmu disabled!\n", s);
+        case 0:
+            vpmu_mode = XENPMU_MODE_OFF;
+            return;
+        case -1:
+            if ( !strcmp(s, "nmi") )
+                vpmu_interrupt_type = APIC_DM_NMI;
+            else if ( !strcmp(s, "bts") )
+                vpmu_features |= XENPMU_FEATURE_INTEL_BTS;
+            else if ( !strcmp(s, "all") )
+            {
+                vpmu_mode &= ~XENPMU_MODE_SELF;
+                vpmu_mode |= XENPMU_MODE_ALL;
+            }
+            else
+            {
+                printk("VPMU: unknown flag: %s - vpmu disabled!\n", s);
+                vpmu_mode = XENPMU_MODE_OFF;
+                return;
+            }
+        default:
             break;
         }
-        /* fall through */
-    case 1:
-        vpmu_mode = XENPMU_MODE_SELF;
-        break;
-    }
+
+        s = ss + 1;
+    } while ( ss );
 }
 
 void vpmu_lvtpc_update(uint32_t val)
@@ -83,7 +110,7 @@ void vpmu_lvtpc_update(uint32_t val)
     struct vcpu *curr = current;
     struct vpmu_struct *vpmu = vcpu_vpmu(curr);
 
-    vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR | (val & APIC_LVT_MASKED);
+    vpmu->hw_lapic_lvtpc = vpmu_interrupt_type | (val & APIC_LVT_MASKED);
 
     /* Postpone APIC updates for PV(H) guests if PMU interrupt is pending */
     if ( is_hvm_domain(curr->domain) ||
@@ -91,6 +118,24 @@ void vpmu_lvtpc_update(uint32_t val)
         apic_write(APIC_LVTPC, vpmu->hw_lapic_lvtpc);
 }
 
+static void vpmu_send_interrupt(struct vcpu *v)
+{
+    struct vlapic *vlapic;
+    u32 vlapic_lvtpc;
+
+    ASSERT( is_hvm_vcpu(v) );
+
+    vlapic = vcpu_vlapic(v);
+    if ( !is_vlapic_lvtpc_enabled(vlapic) )
+        return;
+
+    vlapic_lvtpc = vlapic_get_reg(vlapic, APIC_LVTPC);
+    if ( GET_APIC_DELIVERY_MODE(vlapic_lvtpc) == APIC_MODE_FIXED )
+        vlapic_set_irq(vcpu_vlapic(v), vlapic_lvtpc & APIC_VECTOR_MASK, 0);
+    else
+        v->nmi_pending = 1;
+}
+
 int vpmu_do_msr(unsigned int msr, uint64_t *msr_content, int rw)
 {
     struct vcpu *curr = current;
@@ -158,7 +203,8 @@ static struct vcpu *choose_hwdom_vcpu(void)
     return v;
 }
 
-int vpmu_do_interrupt(struct cpu_user_regs *regs)
+/* This routine may be called in NMI context */
+int vpmu_do_interrupt(const struct cpu_user_regs *regs)
 {
     struct vcpu *sampled = current, *sampling;
     struct vpmu_struct *vpmu;
@@ -235,8 +281,9 @@ int vpmu_do_interrupt(struct cpu_user_regs *regs)
                 if ( sampled->arch.flags & TF_kernel_mode )
                     sampling->arch.vpmu.xenpmu_data->pmu.r.regs.cs &= ~3;
             }
-            else
+            else if ( !(vpmu_interrupt_type & APIC_DM_NMI) )
             {
+                /* Unsafe in NMI context, defer to softint later */
                 struct segment_register seg_cs;
 
                 hvm_get_segment_register(sampled, x86_seg_cs, &seg_cs);
@@ -251,8 +298,12 @@ int vpmu_do_interrupt(struct cpu_user_regs *regs)
             gregs = guest_cpu_user_regs();
             vpmu->xenpmu_data->pmu.r.regs = *gregs;
 
-            hvm_get_segment_register(sampled, x86_seg_cs, &seg_cs);
-            sampling->arch.vpmu.xenpmu_data->pmu.r.regs.cs = seg_cs.sel;
+            /* This is unsafe in NMI context, we'll do it in softint handler */
+            if ( !(vpmu_interrupt_type & APIC_DM_NMI ) )
+            {
+                hvm_get_segment_register(sampled, x86_seg_cs, &seg_cs);
+                sampling->arch.vpmu.xenpmu_data->pmu.r.regs.cs = seg_cs.sel;
+            }
         }
 
         vpmu->xenpmu_data->domain_id = (sampled == sampling) ?
@@ -265,30 +316,30 @@ int vpmu_do_interrupt(struct cpu_user_regs *regs)
         apic_write(APIC_LVTPC, vpmu->hw_lapic_lvtpc | APIC_LVT_MASKED);
         vpmu->hw_lapic_lvtpc |= APIC_LVT_MASKED;
 
-        send_guest_vcpu_virq(sampling, VIRQ_XENPMU);
+        if ( vpmu_interrupt_type & APIC_DM_NMI )
+        {
+            per_cpu(sampled_vcpu, smp_processor_id()) = sampled;
+            raise_softirq(PMU_SOFTIRQ);
+        }
+        else
+            send_guest_vcpu_virq(sampling, VIRQ_XENPMU);
 
         return 1;
     }
 
     if ( vpmu->arch_vpmu_ops )
     {
-        struct vlapic *vlapic = vcpu_vlapic(sampling);
-        u32 vlapic_lvtpc;
-        unsigned char int_vec;
-
         if ( !vpmu->arch_vpmu_ops->do_interrupt(regs) )
             return 0;
 
-        if ( !is_vlapic_lvtpc_enabled(vlapic) )
-            return 1;
-
-        vlapic_lvtpc = vlapic_get_reg(vlapic, APIC_LVTPC);
-        int_vec = vlapic_lvtpc & APIC_VECTOR_MASK;
-
-        if ( GET_APIC_DELIVERY_MODE(vlapic_lvtpc) == APIC_MODE_FIXED )
-            vlapic_set_irq(vcpu_vlapic(sampling), int_vec, 0);
+        if ( vpmu_interrupt_type & APIC_DM_NMI )
+        {
+            per_cpu(sampled_vcpu, smp_processor_id()) = sampled;
+            raise_softirq(PMU_SOFTIRQ);
+        }
         else
-            sampling->nmi_pending = 1;
+            vpmu_send_interrupt(sampling);
+
         return 1;
     }
 
@@ -321,6 +372,8 @@ static void vpmu_save_force(void *arg)
     vpmu_reset(vpmu, VPMU_CONTEXT_SAVE);
 
     per_cpu(last_vcpu, smp_processor_id()) = NULL;
+
+    pmu_softnmi();
 }
 
 void vpmu_save(struct vcpu *v)
@@ -338,7 +391,10 @@ void vpmu_save(struct vcpu *v)
         if ( vpmu->arch_vpmu_ops->arch_vpmu_save(v) )
             vpmu_reset(vpmu, VPMU_CONTEXT_LOADED);
 
-    apic_write(APIC_LVTPC, PMU_APIC_VECTOR | APIC_LVT_MASKED);
+    apic_write(APIC_LVTPC, vpmu_interrupt_type | APIC_LVT_MASKED);
+
+    /* Make sure there are no outstanding PMU NMIs */
+    pmu_softnmi();
 }
 
 void vpmu_load(struct vcpu *v)
@@ -392,6 +448,8 @@ void vpmu_load(struct vcpu *v)
           vpmu->xenpmu_data->pmu_flags & PMU_CACHED) )
         return;
 
+    pmu_softnmi();
+
     if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->arch_vpmu_load )
     {
         apic_write_around(APIC_LVTPC, vpmu->hw_lapic_lvtpc);
@@ -412,7 +470,7 @@ void vpmu_initialise(struct vcpu *v)
         vpmu_destroy(v);
     vpmu_clear(vpmu);
     vpmu->context = NULL;
-    vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR | APIC_LVT_MASKED;
+    vpmu->hw_lapic_lvtpc = vpmu_interrupt_type | APIC_LVT_MASKED;
 
     switch ( vendor )
     {
@@ -448,11 +506,55 @@ void vpmu_destroy(struct vcpu *v)
     }
 }
 
+/* Process the softirq set by PMU NMI handler */
+static void pmu_softnmi(void)
+{
+    struct vcpu *v, *sampled = this_cpu(sampled_vcpu);
+
+    if ( sampled == NULL )
+        return;
+    this_cpu(sampled_vcpu) = NULL;
+
+    if ( (vpmu_mode & XENPMU_MODE_ALL) ||
+         (sampled->domain->domain_id >= DOMID_FIRST_RESERVED) )
+    {
+            v = choose_hwdom_vcpu();
+            if ( !v )
+                return;
+    }
+    else
+    {
+        if ( is_hvm_domain(sampled->domain) )
+        {
+            vpmu_send_interrupt(sampled);
+            return;
+        }
+        v = sampled;
+    }
+
+    if ( has_hvm_container_domain(sampled->domain) )
+    {
+        struct segment_register seg_cs;
+
+        hvm_get_segment_register(sampled, x86_seg_cs, &seg_cs);
+        v->arch.vpmu.xenpmu_data->pmu.r.regs.cs = seg_cs.sel;
+    }
+
+    send_guest_vcpu_virq(v, VIRQ_XENPMU);
+}
+
+int pmu_nmi_interrupt(const struct cpu_user_regs *regs, int cpu)
+{
+    return vpmu_do_interrupt(regs);
+}
+
 static int pvpmu_init(struct domain *d, xen_pmu_params_t *params)
 {
     struct vcpu *v;
     struct page_info *page;
     uint64_t gfn = params->val;
+    static bool_t __read_mostly pvpmu_initted = 0;
+    static DEFINE_SPINLOCK(init_lock);
 
     if ( (params->vcpu >= d->max_vcpus) || (d->vcpu == NULL) ||
          (d->vcpu[params->vcpu] == NULL) )
@@ -476,6 +578,26 @@ static int pvpmu_init(struct domain *d, xen_pmu_params_t *params)
         return -EINVAL;
     }
 
+    spin_lock(&init_lock);
+
+    if ( !pvpmu_initted )
+    {
+        if ( reserve_lapic_nmi() == 0 )
+            set_nmi_callback(pmu_nmi_interrupt);
+        else
+        {
+            spin_unlock(&init_lock);
+            printk(XENLOG_G_ERR "Failed to reserve PMU NMI\n");
+            put_page(page);
+            return -EBUSY;
+        }
+        open_softirq(PMU_SOFTIRQ, pmu_softnmi);
+
+        pvpmu_initted = 1;
+    }
+
+    spin_unlock(&init_lock);
+
     vpmu_initialise(v);
 
     return 0;
diff --git a/xen/include/asm-x86/hvm/vpmu.h b/xen/include/asm-x86/hvm/vpmu.h
index 429ab27..ec4ffde 100644
--- a/xen/include/asm-x86/hvm/vpmu.h
+++ b/xen/include/asm-x86/hvm/vpmu.h
@@ -41,7 +41,7 @@
 struct arch_vpmu_ops {
     int (*do_wrmsr)(unsigned int msr, uint64_t msr_content);
     int (*do_rdmsr)(unsigned int msr, uint64_t *msr_content);
-    int (*do_interrupt)(struct cpu_user_regs *regs);
+    int (*do_interrupt)(const struct cpu_user_regs *regs);
     void (*do_cpuid)(unsigned int input,
                      unsigned int *eax, unsigned int *ebx,
                      unsigned int *ecx, unsigned int *edx);
@@ -99,7 +99,7 @@ static inline bool_t vpmu_are_all_set(const struct vpmu_struct *vpmu,
 
 void vpmu_lvtpc_update(uint32_t val);
 int vpmu_do_msr(unsigned int msr, uint64_t *msr_content, int rw);
-int vpmu_do_interrupt(struct cpu_user_regs *regs);
+int vpmu_do_interrupt(const struct cpu_user_regs *regs);
 void vpmu_do_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx,
                                        unsigned int *ecx, unsigned int *edx);
 void vpmu_initialise(struct vcpu *v);
diff --git a/xen/include/xen/softirq.h b/xen/include/xen/softirq.h
index 0c0d481..5829fa4 100644
--- a/xen/include/xen/softirq.h
+++ b/xen/include/xen/softirq.h
@@ -8,6 +8,7 @@ enum {
     NEW_TLBFLUSH_CLOCK_PERIOD_SOFTIRQ,
     RCU_SOFTIRQ,
     TASKLET_SOFTIRQ,
+    PMU_SOFTIRQ,
     NR_COMMON_SOFTIRQS
 };
 
-- 
1.8.1.4

* [PATCH v9 20/20] x86/VPMU: Move VPMU files up from hvm/ directory
  2014-08-08 16:55 [PATCH v9 00/20] x86/PMU: Xen PMU PV(H) support Boris Ostrovsky
                   ` (18 preceding siblings ...)
  2014-08-08 16:55 ` [PATCH v9 19/20] x86/VPMU: NMI-based VPMU support Boris Ostrovsky
@ 2014-08-08 16:55 ` Boris Ostrovsky
  2014-08-12 14:26   ` Jan Beulich
  2014-08-11 13:32 ` [PATCH v9 00/20] x86/PMU: Xen PMU PV(H) support Jan Beulich
  20 siblings, 1 reply; 49+ messages in thread
From: Boris Ostrovsky @ 2014-08-08 16:55 UTC (permalink / raw)
  To: jbeulich, kevin.tian, suravee.suthikulpanit
  Cc: keir, andrew.cooper3, tim, xen-devel, jun.nakajima, boris.ostrovsky

Since the PMU code is no longer HVM-specific we can move the VPMU-related
files up from the arch/x86/hvm/ directory.

Specifically:
    arch/x86/hvm/vpmu.c -> arch/x86/vpmu.c
    arch/x86/hvm/svm/vpmu.c -> arch/x86/vpmu_amd.c
    arch/x86/hvm/vmx/vpmu_core2.c -> arch/x86/vpmu_intel.c
    include/asm-x86/hvm/vpmu.h -> include/asm-x86/vpmu.h

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
Tested-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
---
 xen/arch/x86/Makefile                               | 1 +
 xen/arch/x86/hvm/Makefile                           | 1 -
 xen/arch/x86/hvm/svm/Makefile                       | 1 -
 xen/arch/x86/hvm/vlapic.c                           | 2 +-
 xen/arch/x86/hvm/vmx/Makefile                       | 1 -
 xen/arch/x86/oprofile/op_model_ppro.c               | 2 +-
 xen/arch/x86/traps.c                                | 2 +-
 xen/arch/x86/{hvm => }/vpmu.c                       | 4 ++--
 xen/arch/x86/{hvm/svm/vpmu.c => vpmu_amd.c}         | 2 +-
 xen/arch/x86/{hvm/vmx/vpmu_core2.c => vpmu_intel.c} | 2 +-
 xen/include/asm-x86/hvm/vmx/vmcs.h                  | 2 +-
 xen/include/asm-x86/{hvm => }/vpmu.h                | 0
 12 files changed, 9 insertions(+), 11 deletions(-)
 rename xen/arch/x86/{hvm => }/vpmu.c (99%)
 rename xen/arch/x86/{hvm/svm/vpmu.c => vpmu_amd.c} (99%)
 rename xen/arch/x86/{hvm/vmx/vpmu_core2.c => vpmu_intel.c} (99%)
 rename xen/include/asm-x86/{hvm => }/vpmu.h (100%)

diff --git a/xen/arch/x86/Makefile b/xen/arch/x86/Makefile
index c1e244d..e985cfc 100644
--- a/xen/arch/x86/Makefile
+++ b/xen/arch/x86/Makefile
@@ -59,6 +59,7 @@ obj-y += crash.o
 obj-y += tboot.o
 obj-y += hpet.o
 obj-y += xstate.o
+obj-y += vpmu.o vpmu_amd.o vpmu_intel.o
 
 obj-$(crash_debug) += gdbstub.o
 
diff --git a/xen/arch/x86/hvm/Makefile b/xen/arch/x86/hvm/Makefile
index eea5555..742b83b 100644
--- a/xen/arch/x86/hvm/Makefile
+++ b/xen/arch/x86/hvm/Makefile
@@ -22,4 +22,3 @@ obj-y += vlapic.o
 obj-y += vmsi.o
 obj-y += vpic.o
 obj-y += vpt.o
-obj-y += vpmu.o
\ No newline at end of file
diff --git a/xen/arch/x86/hvm/svm/Makefile b/xen/arch/x86/hvm/svm/Makefile
index a10a55e..760d295 100644
--- a/xen/arch/x86/hvm/svm/Makefile
+++ b/xen/arch/x86/hvm/svm/Makefile
@@ -6,4 +6,3 @@ obj-y += nestedsvm.o
 obj-y += svm.o
 obj-y += svmdebug.o
 obj-y += vmcb.o
-obj-y += vpmu.o
diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
index f1b543c..b3fe4d3 100644
--- a/xen/arch/x86/hvm/vlapic.c
+++ b/xen/arch/x86/hvm/vlapic.c
@@ -38,7 +38,7 @@
 #include <asm/hvm/support.h>
 #include <asm/hvm/vmx/vmx.h>
 #include <asm/hvm/nestedhvm.h>
-#include <asm/hvm/vpmu.h>
+#include <asm/vpmu.h>
 #include <public/hvm/ioreq.h>
 #include <public/hvm/params.h>
 
diff --git a/xen/arch/x86/hvm/vmx/Makefile b/xen/arch/x86/hvm/vmx/Makefile
index 373b3d9..04a29ce 100644
--- a/xen/arch/x86/hvm/vmx/Makefile
+++ b/xen/arch/x86/hvm/vmx/Makefile
@@ -3,5 +3,4 @@ obj-y += intr.o
 obj-y += realmode.o
 obj-y += vmcs.o
 obj-y += vmx.o
-obj-y += vpmu_core2.o
 obj-y += vvmx.o
diff --git a/xen/arch/x86/oprofile/op_model_ppro.c b/xen/arch/x86/oprofile/op_model_ppro.c
index 9e3510a..8f14d3e 100644
--- a/xen/arch/x86/oprofile/op_model_ppro.c
+++ b/xen/arch/x86/oprofile/op_model_ppro.c
@@ -19,7 +19,7 @@
 #include <asm/processor.h>
 #include <asm/regs.h>
 #include <asm/current.h>
-#include <asm/hvm/vpmu.h>
+#include <asm/vpmu.h>
 
 #include "op_x86_model.h"
 #include "op_counter.h"
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index c9b6ab8..4f36a55 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -72,7 +72,7 @@
 #include <asm/apic.h>
 #include <asm/mc146818rtc.h>
 #include <asm/hpet.h>
-#include <asm/hvm/vpmu.h>
+#include <asm/vpmu.h>
 #include <public/arch-x86/cpuid.h>
 #include <xsm/xsm.h>
 
diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/vpmu.c
similarity index 99%
rename from xen/arch/x86/hvm/vpmu.c
rename to xen/arch/x86/vpmu.c
index 3d48a29..c84dada 100644
--- a/xen/arch/x86/hvm/vpmu.c
+++ b/xen/arch/x86/vpmu.c
@@ -32,7 +32,7 @@
 #include <asm/hvm/support.h>
 #include <asm/hvm/vmx/vmx.h>
 #include <asm/hvm/vmx/vmcs.h>
-#include <asm/hvm/vpmu.h>
+#include <asm/vpmu.h>
 #include <asm/hvm/svm/svm.h>
 #include <asm/hvm/svm/vmcb.h>
 #include <asm/apic.h>
@@ -587,7 +587,7 @@ static int pvpmu_init(struct domain *d, xen_pmu_params_t *params)
         else
         {
             spin_unlock(&init_lock);
-            printk(XENLOG_G_ERR "Failed to reserve PMU NMI\n");
+            printk("XENLOG_G_ERR Failed to reserve PMU NMI\n");
             put_page(page);
             return -EBUSY;
         }
diff --git a/xen/arch/x86/hvm/svm/vpmu.c b/xen/arch/x86/vpmu_amd.c
similarity index 99%
rename from xen/arch/x86/hvm/svm/vpmu.c
rename to xen/arch/x86/vpmu_amd.c
index e2446a9..89943c8 100644
--- a/xen/arch/x86/hvm/svm/vpmu.c
+++ b/xen/arch/x86/vpmu_amd.c
@@ -29,7 +29,7 @@
 #include <xen/irq.h>
 #include <asm/apic.h>
 #include <asm/hvm/vlapic.h>
-#include <asm/hvm/vpmu.h>
+#include <asm/vpmu.h>
 #include <public/pmu.h>
 
 #define MSR_F10H_EVNTSEL_GO_SHIFT   40
diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/vpmu_intel.c
similarity index 99%
rename from xen/arch/x86/hvm/vmx/vpmu_core2.c
rename to xen/arch/x86/vpmu_intel.c
index e8c91eb..cea0022 100644
--- a/xen/arch/x86/hvm/vmx/vpmu_core2.c
+++ b/xen/arch/x86/vpmu_intel.c
@@ -37,7 +37,7 @@
 #include <public/sched.h>
 #include <public/hvm/save.h>
 #include <public/pmu.h>
-#include <asm/hvm/vpmu.h>
+#include <asm/vpmu.h>
 
 /*
  * See Intel SDM Vol 2a Instruction Set Reference chapter 3 for CPUID
diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index 80bc998..15b37f6 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -20,7 +20,7 @@
 #define __ASM_X86_HVM_VMX_VMCS_H__
 
 #include <asm/hvm/io.h>
-#include <asm/hvm/vpmu.h>
+#include <asm/vpmu.h>
 #include <irq_vectors.h>
 
 extern void vmcs_dump_vcpu(struct vcpu *v);
diff --git a/xen/include/asm-x86/hvm/vpmu.h b/xen/include/asm-x86/vpmu.h
similarity index 100%
rename from xen/include/asm-x86/hvm/vpmu.h
rename to xen/include/asm-x86/vpmu.h
-- 
1.8.1.4

* Re: [PATCH v9 02/20] x86/VPMU: Manage VPMU_CONTEXT_SAVE flag in vpmu_save_force()
  2014-08-08 16:55 ` [PATCH v9 02/20] x86/VPMU: Manage VPMU_CONTEXT_SAVE flag in vpmu_save_force() Boris Ostrovsky
@ 2014-08-11 13:28   ` Jan Beulich
  2014-08-11 15:35     ` Boris Ostrovsky
  0 siblings, 1 reply; 49+ messages in thread
From: Jan Beulich @ 2014-08-11 13:28 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: kevin.tian, keir, suravee.suthikulpanit, andrew.cooper3, tim,
	xen-devel, jun.nakajima

>>> On 08.08.14 at 18:55, <boris.ostrovsky@oracle.com> wrote:
> vpmu_load() may leave VPMU_CONTEXT_SAVE set after calling vpmu_save_force() if
> the context is not loaded.

To me this is ambiguous (and looking at the patch I can't really
resolve it): Do you mean to say this can validly happen, or that it
could have happened in error prior to the patch? In any event
please alter the description to either clearly state what the wrong
old behavior was, or what the correct new behavior is to be.

Jan

> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> ---
>  xen/arch/x86/hvm/vpmu.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c
> index 63765fa..26e971b 100644
> --- a/xen/arch/x86/hvm/vpmu.c
> +++ b/xen/arch/x86/hvm/vpmu.c
> @@ -130,6 +130,8 @@ static void vpmu_save_force(void *arg)
>      if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) )
>          return;
>  
> +    vpmu_set(vpmu, VPMU_CONTEXT_SAVE);
> +
>      if ( vpmu->arch_vpmu_ops )
>          (void)vpmu->arch_vpmu_ops->arch_vpmu_save(v);
>  
> @@ -178,7 +180,6 @@ void vpmu_load(struct vcpu *v)
>           */
>          if ( vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) )
>          {
> -            vpmu_set(vpmu, VPMU_CONTEXT_SAVE);
>              on_selected_cpus(cpumask_of(vpmu->last_pcpu),
>                               vpmu_save_force, (void *)v, 1);
>              vpmu_reset(vpmu, VPMU_CONTEXT_LOADED);
> @@ -195,7 +196,6 @@ void vpmu_load(struct vcpu *v)
>          vpmu = vcpu_vpmu(prev);
>  
>          /* Someone ran here before us */
> -        vpmu_set(vpmu, VPMU_CONTEXT_SAVE);
>          vpmu_save_force(prev);
>          vpmu_reset(vpmu, VPMU_CONTEXT_LOADED);
>  
> -- 
> 1.8.1.4

* Re: [PATCH v9 00/20] x86/PMU: Xen PMU PV(H) support
  2014-08-08 16:55 [PATCH v9 00/20] x86/PMU: Xen PMU PV(H) support Boris Ostrovsky
                   ` (19 preceding siblings ...)
  2014-08-08 16:55 ` [PATCH v9 20/20] x86/VPMU: Move VPMU files up from hvm/ directory Boris Ostrovsky
@ 2014-08-11 13:32 ` Jan Beulich
  20 siblings, 0 replies; 49+ messages in thread
From: Jan Beulich @ 2014-08-11 13:32 UTC (permalink / raw)
  To: suravee.suthikulpanit, kevin.tian, Boris Ostrovsky
  Cc: andrew.cooper3, tim, keir, jun.nakajima, xen-devel

>>> On 08.08.14 at 18:55, <boris.ostrovsky@oracle.com> wrote:
> Here is the ninth version of PV(H) PMU patches.
> 
> Changes in v9:
> 
> * Restore VPMU context after context_saved() is called in
>   context_switch(). This is needed because vpmu_load() may end up
>   calling vmx_vmcs_try_enter()->vcpu_pause() and that needs is_running
>   to be correctly set/cleared. (patch 18, dropped review acks)
> * Added patch 2 to properly manage VPMU_CONTEXT_LOADED
> * Addressed most of Jan's comments.
>   o Keep track of time in vpmu_force_context_switch() to properly break
>     out of a loop when using hypercall continuations
>   o Fixed logic in calling vpmu_do_msr() in emulate_privileged_op()
>   o Cleaned up vpmu_interrupt() wrt vcpu variable names to (hopefully)
>     make it more clear which vcpu we are using
>   o Cleaned up vpmu_do_wrmsr()
>   o Did *not* replace sizeof(uint64_t) with sizeof(variable) in
>     amd_vpmu_initialise(): throughout the code registers are declared as
>     uint64_t and if we are to add a new type (e.g. reg_t) this should be
>     done in a separate patch, unrelated to this series.

Suspicious, but I hopefully won't forget to comment on this when
reaching the actual patch.

In any event, I think you continue to send this series to an
incomplete list of people (possibly because you didn't update it
after the initial version): Both SVM and VMX have one more
maintainer than you Cc (Aravind and Eddie respectively; oddly
enough you send the patch directly to two of the others).

Jan

* Re: [PATCH v9 05/20] intel/VPMU: Clean up Intel VPMU code
  2014-08-08 16:55 ` [PATCH v9 05/20] intel/VPMU: Clean up Intel VPMU code Boris Ostrovsky
@ 2014-08-11 13:45   ` Jan Beulich
  2014-08-11 16:01     ` Boris Ostrovsky
  0 siblings, 1 reply; 49+ messages in thread
From: Jan Beulich @ 2014-08-11 13:45 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: kevin.tian, keir, suravee.suthikulpanit, andrew.cooper3, tim,
	xen-devel, jun.nakajima

>>> On 08.08.14 at 18:55, <boris.ostrovsky@oracle.com> wrote:
> --- a/xen/arch/x86/hvm/vmx/vmcs.c
> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
> @@ -1208,6 +1208,32 @@ int vmx_add_guest_msr(u32 msr)
>      return 0;
>  }
>  
> +void vmx_rm_guest_msr(u32 msr)
> +{
> +    struct vcpu *curr = current;
> +    unsigned int idx, msr_count = curr->arch.hvm_vmx.msr_count;
> +    struct vmx_msr_entry *msr_area = curr->arch.hvm_vmx.msr_area;
> +
> +    if ( msr_area == NULL )
> +        return;
> +
> +    for ( idx = 0; idx < msr_count; idx++ )
> +        if ( msr_area[idx].index == msr )
> +            break;
> +
> +    if ( idx == msr_count )
> +        return;

This absence check (producing success) together with the presence
check in the corresponding "add" function (also producing success) is
certainly bogus without reference counting: Two independent callers
of "add" would each validly assume they ought to call "rm" when done
with their job, taking away the control over the MSR from the other.
Yet if you don't think of independent callers, these presence/absence
checks don't make sense at all.
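
A minimal sketch of the refcounting alternative (the msr_refcnt field is
invented here, it is not something the series introduces):

    /* vmx_add_guest_msr(), on finding an existing entry: */
    curr->arch.hvm_vmx.msr_refcnt[idx]++;

    /* vmx_rm_guest_msr(), after the lookup loop: */
    if ( idx == msr_count || --curr->arch.hvm_vmx.msr_refcnt[idx] )
        return;
    /* last user gone - only now actually remove the entry */

i.e. "rm" stays a no-op until the last of the paired "add" callers is done
with the MSR.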

Jan

* Re: [PATCH v9 07/20] x86/VPMU: Handle APIC_LVTPC accesses
  2014-08-08 16:55 ` [PATCH v9 07/20] x86/VPMU: Handle APIC_LVTPC accesses Boris Ostrovsky
@ 2014-08-11 13:49   ` Jan Beulich
  0 siblings, 0 replies; 49+ messages in thread
From: Jan Beulich @ 2014-08-11 13:49 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: kevin.tian, keir, suravee.suthikulpanit, andrew.cooper3, tim,
	xen-devel, jun.nakajima

>>> On 08.08.14 at 18:55, <boris.ostrovsky@oracle.com> wrote:
> @@ -734,8 +735,10 @@ static int vlapic_reg_write(struct vcpu *v,
>              vlapic_adjust_i8259_target(v->domain);
>              pt_may_unmask_irq(v->domain, NULL);
>          }
> -        if ( (offset == APIC_LVTT) && !(val & APIC_LVT_MASKED) )
> +        else if ( (offset == APIC_LVTT) && !(val & APIC_LVT_MASKED) )
>              pt_may_unmask_irq(NULL, &vlapic->pt);
> +        else if ( offset == APIC_LVTPC )
> +            vpmu_lvtpc_update(val);

Each time I review a version of this series I find myself verifying that
the introduction of the "else" that wasn't previously there is benign.
It is, yes, but the patch also doesn't win anything from that change.
Hence please either drop _both_ "else"-s, or convert the whole
construct to a small switch().
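
Something along these lines, say (a sketch only - the APIC_LVT0 arm is
inferred from the context not visible in the hunk above):

    switch ( offset )
    {
    case APIC_LVT0:
        vlapic_adjust_i8259_target(v->domain);
        pt_may_unmask_irq(v->domain, NULL);
        break;
    case APIC_LVTT:
        if ( !(val & APIC_LVT_MASKED) )
            pt_may_unmask_irq(NULL, &vlapic->pt);
        break;
    case APIC_LVTPC:
        vpmu_lvtpc_update(val);
        break;
    }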

Jan

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v9 09/20] x86/VPMU: Add public xenpmu.h
  2014-08-08 16:55 ` [PATCH v9 09/20] x86/VPMU: Add public xenpmu.h Boris Ostrovsky
@ 2014-08-11 14:08   ` Jan Beulich
  2014-08-11 16:15     ` Boris Ostrovsky
  0 siblings, 1 reply; 49+ messages in thread
From: Jan Beulich @ 2014-08-11 14:08 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: kevin.tian, keir, suravee.suthikulpanit, andrew.cooper3, tim,
	xen-devel, jun.nakajima

>>> On 08.08.14 at 18:55, <boris.ostrovsky@oracle.com> wrote:
>  static int amd_vpmu_initialise(struct vcpu *v)
>  {
> -    struct amd_vpmu_context *ctxt;
> +    struct xen_pmu_amd_ctxt *ctxt;
>      struct vpmu_struct *vpmu = vcpu_vpmu(v);
>      uint8_t family = current_cpu_data.x86;
>  
> @@ -382,7 +386,8 @@ static int amd_vpmu_initialise(struct vcpu *v)
>  	 }
>      }
>  
> -    ctxt = xzalloc(struct amd_vpmu_context);
> +    ctxt = xzalloc_bytes(sizeof(struct xen_pmu_amd_ctxt) +
> +                         2 * sizeof(uint64_t) * AMD_MAX_COUNTERS);

So I guess this is one of said sizeof()-s, and I continue to think this
would benefit from using a proper field. It is my understanding that
the real (non-C89 compliant) declaration of struct xen_pmu_amd_ctxt
would be

struct xen_pmu_amd_ctxt {
    /* Offsets to counter and control MSRs (relative to xen_arch_pmu.c.amd) */
    uint32_t counters;
    uint32_t ctrls;
    uint64_t values[];
};

Just like already done elsewhere you _can_ add variable-sized
arrays to the public interface as long as you guard them suitably.
And then your statement would become

    ctxt = xzalloc_bytes(sizeof(*ctxt) +
                         2 * sizeof(*ctxt->values) * AMD_MAX_COUNTERS);

making clear what the allocation is doing.

> @@ -391,7 +396,11 @@ static int amd_vpmu_initialise(struct vcpu *v)
>          return -ENOMEM;
>      }
>  
> +    ctxt->counters = sizeof(struct xen_pmu_amd_ctxt);
> +    ctxt->ctrls = ctxt->counters + sizeof(uint64_t) * AMD_MAX_COUNTERS;

    ctxt->counters = sizeof(*ctxt);
    ctxt->ctrls = ctxt->counters + sizeof(*ctxt->values) * AMD_MAX_COUNTERS;

or

    ctxt->counters = offsetof(typeof(ctxt), values[0]);
    ctxt->ctrls = offsetof(typeof(ctxt), values[AMD_MAX_COUNTERS]);

> @@ -228,6 +229,9 @@ void vpmu_initialise(struct vcpu *v)
>      struct vpmu_struct *vpmu = vcpu_vpmu(v);
>      uint8_t vendor = current_cpu_data.x86_vendor;
>  
> +    BUILD_BUG_ON(sizeof(struct xen_pmu_intel_ctxt) > XENPMU_CTXT_PAD_SZ);
> +    BUILD_BUG_ON(sizeof(struct xen_pmu_amd_ctxt) > XENPMU_CTXT_PAD_SZ);

Don't these need to include the variable size array portions too?
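E.g. for the AMD one presumably something like (a sketch, using the same
size expression as the allocation above):

    BUILD_BUG_ON(sizeof(struct xen_pmu_amd_ctxt) +
                 2 * sizeof(uint64_t) * AMD_MAX_COUNTERS > XENPMU_CTXT_PAD_SZ);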

Jan

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v9 02/20] x86/VPMU: Manage VPMU_CONTEXT_SAVE flag in vpmu_save_force()
  2014-08-11 13:28   ` Jan Beulich
@ 2014-08-11 15:35     ` Boris Ostrovsky
  0 siblings, 0 replies; 49+ messages in thread
From: Boris Ostrovsky @ 2014-08-11 15:35 UTC (permalink / raw)
  To: Jan Beulich
  Cc: kevin.tian, keir, suravee.suthikulpanit, andrew.cooper3, tim,
	xen-devel, jun.nakajima

On 08/11/2014 09:28 AM, Jan Beulich wrote:
>>>> On 08.08.14 at 18:55, <boris.ostrovsky@oracle.com> wrote:
>> vpmu_load() may leave VPMU_CONTEXT_SAVE set after calling vpmu_save_force() if
>> the context is not loaded.
> To me this is ambiguous (and looking at the patch I can't really
> resolve it): Do you mean to say this can validly happen, or that it
> could have happened in error prior to the patch? In any event
> please alter the description to either clearly state what the wrong
> old behavior was, or what the correct new behavior is to be.

What I was trying to say (not particularly successfully, apparently) is 
that currently (i.e. prior to the patch) we may end up not clearing the 
VPMU_CONTEXT_SAVE flag after calling vpmu_save_force(). And that would 
be a bug, which this patch is fixing.

(It doesn't help that the diff doesn't show that the flag is cleared 
right after ->arch_vpmu_save() below)

-boris

>
> Jan
>
>> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>> ---
>>   xen/arch/x86/hvm/vpmu.c | 4 ++--
>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c
>> index 63765fa..26e971b 100644
>> --- a/xen/arch/x86/hvm/vpmu.c
>> +++ b/xen/arch/x86/hvm/vpmu.c
>> @@ -130,6 +130,8 @@ static void vpmu_save_force(void *arg)
>>       if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) )
>>           return;
>>   
>> +    vpmu_set(vpmu, VPMU_CONTEXT_SAVE);
>> +
>>       if ( vpmu->arch_vpmu_ops )
>>           (void)vpmu->arch_vpmu_ops->arch_vpmu_save(v);
>>   
>> @@ -178,7 +180,6 @@ void vpmu_load(struct vcpu *v)
>>            */
>>           if ( vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) )
>>           {
>> -            vpmu_set(vpmu, VPMU_CONTEXT_SAVE);
>>               on_selected_cpus(cpumask_of(vpmu->last_pcpu),
>>                                vpmu_save_force, (void *)v, 1);
>>               vpmu_reset(vpmu, VPMU_CONTEXT_LOADED);
>> @@ -195,7 +196,6 @@ void vpmu_load(struct vcpu *v)
>>           vpmu = vcpu_vpmu(prev);
>>   
>>           /* Someone ran here before us */
>> -        vpmu_set(vpmu, VPMU_CONTEXT_SAVE);
>>           vpmu_save_force(prev);
>>           vpmu_reset(vpmu, VPMU_CONTEXT_LOADED);
>>   
>> -- 
>> 1.8.1.4
>
>

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v9 05/20] intel/VPMU: Clean up Intel VPMU code
  2014-08-11 13:45   ` Jan Beulich
@ 2014-08-11 16:01     ` Boris Ostrovsky
  2014-08-11 16:13       ` Jan Beulich
  0 siblings, 1 reply; 49+ messages in thread
From: Boris Ostrovsky @ 2014-08-11 16:01 UTC (permalink / raw)
  To: Jan Beulich
  Cc: kevin.tian, keir, suravee.suthikulpanit, andrew.cooper3, tim,
	xen-devel, jun.nakajima

On 08/11/2014 09:45 AM, Jan Beulich wrote:
>>>> On 08.08.14 at 18:55, <boris.ostrovsky@oracle.com> wrote:
>> --- a/xen/arch/x86/hvm/vmx/vmcs.c
>> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
>> @@ -1208,6 +1208,32 @@ int vmx_add_guest_msr(u32 msr)
>>       return 0;
>>   }
>>   
>> +void vmx_rm_guest_msr(u32 msr)
>> +{
>> +    struct vcpu *curr = current;
>> +    unsigned int idx, msr_count = curr->arch.hvm_vmx.msr_count;
>> +    struct vmx_msr_entry *msr_area = curr->arch.hvm_vmx.msr_area;
>> +
>> +    if ( msr_area == NULL )
>> +        return;
>> +
>> +    for ( idx = 0; idx < msr_count; idx++ )
>> +        if ( msr_area[idx].index == msr )
>> +            break;
>> +
>> +    if ( idx == msr_count )
>> +        return;
> This absence check (producing success) together with the presence
> check in the corresponding "add" function (also producing success) is
> certainly bogus without reference counting: Two independent callers
> of "add" would each validly assume they ought to call "rm" when done
> with their job, taking away the control over the MSR from the other.
> Yet if you don't think of independent callers, these presence/absence
> checks don't make sense at all.


Hmm, yes, that's a problem. Refcounting would require separate data 
structures which would need to be dynamically grown (we don't know how 
many MSRs we will be tracking, and most often the number is very small).

So I wonder whether I should drop support for remove altogether, since 
this is only used in the error path and I am not sure whether adding more 
complexity is worth it.

-boris

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v9 05/20] intel/VPMU: Clean up Intel VPMU code
  2014-08-11 16:01     ` Boris Ostrovsky
@ 2014-08-11 16:13       ` Jan Beulich
  0 siblings, 0 replies; 49+ messages in thread
From: Jan Beulich @ 2014-08-11 16:13 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: kevin.tian, keir, suravee.suthikulpanit, andrew.cooper3, tim,
	xen-devel, jun.nakajima

>>> On 11.08.14 at 18:01, <boris.ostrovsky@oracle.com> wrote:
> On 08/11/2014 09:45 AM, Jan Beulich wrote:
>>>>> On 08.08.14 at 18:55, <boris.ostrovsky@oracle.com> wrote:
>>> --- a/xen/arch/x86/hvm/vmx/vmcs.c
>>> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
>>> @@ -1208,6 +1208,32 @@ int vmx_add_guest_msr(u32 msr)
>>>       return 0;
>>>   }
>>>   
>>> +void vmx_rm_guest_msr(u32 msr)
>>> +{
>>> +    struct vcpu *curr = current;
>>> +    unsigned int idx, msr_count = curr->arch.hvm_vmx.msr_count;
>>> +    struct vmx_msr_entry *msr_area = curr->arch.hvm_vmx.msr_area;
>>> +
>>> +    if ( msr_area == NULL )
>>> +        return;
>>> +
>>> +    for ( idx = 0; idx < msr_count; idx++ )
>>> +        if ( msr_area[idx].index == msr )
>>> +            break;
>>> +
>>> +    if ( idx == msr_count )
>>> +        return;
>> This absence check (producing success) together with the presence
>> check in the corresponding "add" function (also producing success) is
>> certainly bogus without reference counting: Two independent callers
>> of "add" would each validly assume they ought to call "rm" when done
>> with their job, taking away the control over the MSR from the other.
>> Yet if you don't think of independent callers, these presence/absence
>> checks don't make sense at all.
> 
> 
> Hmm, yes, that's a problem. Refcounting would require separate data 
> structures which would need to be dynamically grown (we don't know how 
> many MSRs we will be tracking, and most often the number is very small).
> 
> So I wonder whether I should drop support for remove altogether, since 
> this is only used in the error path and I am not sure whether adding more 
> complexity is worth it.

Probably not - see e.g. the debugctl and lbr handling, which also only
ever adds MSRs to that list. That's not optimal, but sufficient for now -
if we ever learn this presents a (performance) problem, we could then
start thinking about adding remove support.

Jan

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v9 09/20] x86/VPMU: Add public xenpmu.h
  2014-08-11 14:08   ` Jan Beulich
@ 2014-08-11 16:15     ` Boris Ostrovsky
  2014-08-18 16:02       ` Boris Ostrovsky
  0 siblings, 1 reply; 49+ messages in thread
From: Boris Ostrovsky @ 2014-08-11 16:15 UTC (permalink / raw)
  To: Jan Beulich
  Cc: kevin.tian, keir, jun.nakajima, andrew.cooper3, tim, xen-devel,
	suravee.suthikulpanit

On 08/11/2014 10:08 AM, Jan Beulich wrote:
>>>> On 08.08.14 at 18:55, <boris.ostrovsky@oracle.com> wrote:
>>   static int amd_vpmu_initialise(struct vcpu *v)
>>   {
>> -    struct amd_vpmu_context *ctxt;
>> +    struct xen_pmu_amd_ctxt *ctxt;
>>       struct vpmu_struct *vpmu = vcpu_vpmu(v);
>>       uint8_t family = current_cpu_data.x86;
>>   
>> @@ -382,7 +386,8 @@ static int amd_vpmu_initialise(struct vcpu *v)
>>   	 }
>>       }
>>   
>> -    ctxt = xzalloc(struct amd_vpmu_context);
>> +    ctxt = xzalloc_bytes(sizeof(struct xen_pmu_amd_ctxt) +
>> +                         2 * sizeof(uint64_t) * AMD_MAX_COUNTERS);
> So I guess this is one of said sizeof()-s, and I continue to think this
> would benefit from using a proper field. It is my understanding that
> the real (non-C89 compliant) declaration of struct xen_pmu_amd_ctxt
> would be
>
> struct xen_pmu_amd_ctxt {
>      /* Offsets to counter and control MSRs (relative to xen_arch_pmu.c.amd) */
>      uint32_t counters;
>      uint32_t ctrls;
>      uint64_t values[];
> };

OK, this should work.

>
> Just like already done elsewhere you _can_ add variable-sized
> arrays to the public interface as long as you guard them suitably.
> And then your statement would become
>
>      ctxt = xzalloc_bytes(sizeof(*ctxt) +
>                           2 * sizeof(*ctxt->values) * AMD_MAX_COUNTERS);
>
> making clear what the allocation is doing.
>
>> @@ -391,7 +396,11 @@ static int amd_vpmu_initialise(struct vcpu *v)
>>           return -ENOMEM;
>>       }
>>   
>> +    ctxt->counters = sizeof(struct xen_pmu_amd_ctxt);
>> +    ctxt->ctrls = ctxt->counters + sizeof(uint64_t) * AMD_MAX_COUNTERS;
>      ctxt->counters = sizeof(*ctxt);
>      ctxt->ctrls = ctxt->counters + sizeof(*ctxt->values) * AMD_MAX_COUNTERS;
>
> or
>
>      ctxt->counters = offsetof(typeof(ctxt), values[0]);
>      ctxt->ctrls = offsetof(typeof(ctxt), values[AMD_MAX_COUNTERS]);
>
>> @@ -228,6 +229,9 @@ void vpmu_initialise(struct vcpu *v)
>>       struct vpmu_struct *vpmu = vcpu_vpmu(v);
>>       uint8_t vendor = current_cpu_data.x86_vendor;
>>   
>> +    BUILD_BUG_ON(sizeof(struct xen_pmu_intel_ctxt) > XENPMU_CTXT_PAD_SZ);
>> +    BUILD_BUG_ON(sizeof(struct xen_pmu_amd_ctxt) > XENPMU_CTXT_PAD_SZ);
> Don't these need to include the variable size array portions too?

Not in BUILD_BUG_ON but I should check whether I have enough space when 
computing ctxt->{ctrls | counters}.

-boris

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v9 11/20] x86/VPMU: Interface for setting PMU mode and flags
  2014-08-08 16:55 ` [PATCH v9 11/20] x86/VPMU: Interface for setting PMU mode and flags Boris Ostrovsky
@ 2014-08-12 10:37   ` Jan Beulich
  2014-08-12 15:12     ` Boris Ostrovsky
  0 siblings, 1 reply; 49+ messages in thread
From: Jan Beulich @ 2014-08-12 10:37 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: kevin.tian, keir, suravee.suthikulpanit, andrew.cooper3, tim,
	xen-devel, jun.nakajima

>>> On 08.08.14 at 18:55, <boris.ostrovsky@oracle.com> wrote:
> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -1482,7 +1482,7 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
>  
>      if ( is_hvm_vcpu(prev) )
>      {
> -        if (prev != next)
> +        if ( (prev != next) && (vpmu_mode & XENPMU_MODE_SELF) )
>              vpmu_save(prev);
>  
>          if ( !list_empty(&prev->arch.hvm_vcpu.tm_list) )
> @@ -1526,7 +1526,7 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
>                             !is_hardware_domain(next->domain));
>      }
>  
> -    if (is_hvm_vcpu(next) && (prev != next) )
> +    if ( is_hvm_vcpu(next) && (prev != next) && (vpmu_mode & XENPMU_MODE_SELF) )
>          /* Must be done with interrupts enabled */
>          vpmu_load(next);

Wouldn't such vPMU internals be better hidden in the functions
themselves? I realize you can save the calls this way, but if the
condition changes again later, we'll again have to adjust this
core function rather than just the vPMU code. It's bad enough that
the vpmu_mode variable is visible to non-vPMU code.

> --- a/xen/arch/x86/hvm/vpmu.c
> +++ b/xen/arch/x86/hvm/vpmu.c
> @@ -21,6 +21,8 @@
>  #include <xen/config.h>
>  #include <xen/sched.h>
>  #include <xen/xenoprof.h>
> +#include <xen/event.h>
> +#include <xen/guest_access.h>
>  #include <asm/regs.h>
>  #include <asm/types.h>
>  #include <asm/msr.h>
> @@ -32,13 +34,21 @@
>  #include <asm/hvm/svm/vmcb.h>
>  #include <asm/apic.h>
>  #include <public/pmu.h>
> +#include <xen/tasklet.h>
> +
> +#include <compat/pmu.h>
> +CHECK_pmu_params;
> +CHECK_pmu_intel_ctxt;
> +CHECK_pmu_amd_ctxt;
> +CHECK_pmu_cntr_pair;

Such being placed in a HVM-specific file suggests that the series is badly
ordered: Anything relevant to PV should normally only be added here
_after_ the file got moved out of the hvm/ subtree. Yet since I realize
this would be a major re-work, I think you can leave this as is unless
you expect potentially just part of the series to go in, with the splitting
point being (immediately or later) after this patch (from my looking at
it I would suppose that patches 3 and 4 could go in right away - they
don't appear to depend on patches 1 and 2 - and patch 1 probably
could go in too, but it doesn't make much sense to have it in without
the rest of the series).

> @@ -274,3 +290,159 @@ void vpmu_dump(struct vcpu *v)
>          vpmu->arch_vpmu_ops->arch_vpmu_dump(v);
>  }
>  
> +static atomic_t vpmu_sched_counter;
> +
> +static void vpmu_sched_checkin(unsigned long unused)
> +{
> +    atomic_inc(&vpmu_sched_counter);
> +}
> +
> +static int
> +vpmu_force_context_switch(XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg)
> +{
> +    unsigned i, j, allbutself_num;
> +    cpumask_t allbutself;
> +    static s_time_t start;
> +    static struct tasklet *sync_task;
> +    int ret = 0;
> +
> +    allbutself_num = num_online_cpus() - 1;
> +
> +    if ( sync_task ) /* if true, we are in hypercall continuation */

"true"? This is not a boolean, so perhaps "set" would be the better
term?

> +        goto cont_wait;
> +
> +    cpumask_andnot(&allbutself, &cpu_online_map,
> +                   cpumask_of(smp_processor_id()));
> +
> +    sync_task = xmalloc_array(struct tasklet, allbutself_num);
> +    if ( !sync_task )
> +    {
> +        printk("vpmu_force_context_switch: out of memory\n");
> +        return -ENOMEM;
> +    }
> +
> +    for ( i = 0; i < allbutself_num; i++ )
> +        tasklet_init(&sync_task[i], vpmu_sched_checkin, 0);
> +
> +    atomic_set(&vpmu_sched_counter, 0);
> +
> +    j = 0;
> +    for_each_cpu ( i, &allbutself )

This looks to be the only use for the (on stack) allbutself variable,
but you could easily avoid this by using for_each_online_cpu() and
skipping the local one. I'd also recommend that you count
allbutself_num here rather than up front, since that will much more
obviously prove that you wait for exactly as many CPUs as you
scheduled. The array allocation above is bogus anyway, as on a
huge system this can easily be more than a page in size.
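I.e. roughly (a sketch, leaving the allocation question aside):

    allbutself_num = 0;
    for_each_online_cpu ( i )
    {
        if ( i == smp_processor_id() )
            continue;
        tasklet_schedule_on_cpu(&sync_task[allbutself_num++], i);
    }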

> +        tasklet_schedule_on_cpu(&sync_task[j++], i);
> +
> +    vpmu_save(current);
> +
> +    start = NOW();
> +
> + cont_wait:
> +    /*
> +     * Note that we may fail here if a CPU is hot-unplugged while we are
> +     * waiting. We will then time out.
> +     */
> +    while ( atomic_read(&vpmu_sched_counter) != allbutself_num )
> +    {
> +        /* Give up after 5 seconds */
> +        if ( NOW() > start + SECONDS(5) )
> +        {
> +            printk("vpmu_force_context_switch: failed to sync\n");
> +            ret = -EBUSY;
> +            break;
> +        }
> +        cpu_relax();
> +        if ( hypercall_preempt_check() )
> +            return hypercall_create_continuation(
> +                __HYPERVISOR_xenpmu_op, "ih", XENPMU_mode_set, arg);
> +    }
> +
> +    for ( i = 0; i < allbutself_num; i++ )
> +        tasklet_kill(&sync_task[i]);
> +    xfree(sync_task);
> +    sync_task = NULL;
> +
> +    return ret;
> +}
> +
> +long do_xenpmu_op(int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg)
> +{
> +    int ret = -EINVAL;
> +    xen_pmu_params_t pmu_params;
> +
> +    switch ( op )
> +    {
> +    case XENPMU_mode_set:
> +    {
> +        static DEFINE_SPINLOCK(xenpmu_mode_lock);
> +        uint32_t current_mode;
> +
> +        if ( !is_control_domain(current->domain) )
> +            return -EPERM;
> +
> +        if ( copy_from_guest(&pmu_params, arg, 1) )
> +            return -EFAULT;
> +
> +        if ( pmu_params.val & ~XENPMU_MODE_SELF )
> +            return -EINVAL;
> +
> +        /*
> +         * Return error is someone else is in the middle of changing mode ---
> +         * this is most likely indication of two system administrators
> +         * working against each other
> +         */
> +        if ( !spin_trylock(&xenpmu_mode_lock) )
> +            return -EAGAIN;
> +
> +        current_mode = vpmu_mode;
> +        vpmu_mode = pmu_params.val;
> +
> +        if ( vpmu_mode == XENPMU_MODE_OFF )
> +        {
> +            /*
> +             * Make sure all (non-dom0) VCPUs have unloaded their VPMUs. This
> +             * can be achieved by having all physical processors go through
> +             * context_switch().
> +             */
> +            ret = vpmu_force_context_switch(arg);
> +            if ( ret )
> +                vpmu_mode = current_mode;
> +        }
> +        else
> +            ret = 0;
> +
> +        spin_unlock(&xenpmu_mode_lock);
> +        break;

This still isn't safe: There's nothing preventing another vCPU from
issuing another XENPMU_mode_set operation while the one turning the vPMU
off is still in the process of waiting, but having exited the lock protected
region in order to allow other processing to occur. I think you simply
need another mode "being-turned-off" during which only mode
changes to XENPMU_MODE_OFF, and only by the originally requesting
vCPU, are permitted (or else your "if true, we are in hypercall
continuation" comment above wouldn't always be true either, as that
second vCPU might also issue a second turn-off request).
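
Roughly (a sketch only - XENPMU_MODE_TURNING_OFF and turn_off_vcpu are
made-up names): before starting to wait, record the intermediate state
and its owner, and reject everything else while it is pending:

    if ( vpmu_mode == XENPMU_MODE_TURNING_OFF &&
         (current != turn_off_vcpu || pmu_params.val != XENPMU_MODE_OFF) )
    {
        spin_unlock(&xenpmu_mode_lock);
        return -EAGAIN;
    }
    if ( pmu_params.val == XENPMU_MODE_OFF )
    {
        vpmu_mode = XENPMU_MODE_TURNING_OFF;
        turn_off_vcpu = current;
    }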

> +    case XENPMU_mode_get:
> +        pmu_params.val = vpmu_mode;
> +        pmu_params.version.maj = XENPMU_VER_MAJ;
> +        pmu_params.version.min = XENPMU_VER_MIN;
> +        if ( copy_to_guest(arg, &pmu_params, 1) )

You're leaking hypervisor stack contents here ...

> +    case XENPMU_feature_get:
> +        pmu_params.val = vpmu_mode;
> +        if ( copy_to_guest(arg, &pmu_params, 1) )

... and here.

> --- a/xen/include/Makefile
> +++ b/xen/include/Makefile
> @@ -26,6 +26,7 @@ headers-y := \
>  headers-$(CONFIG_X86)     += compat/arch-x86/xen-mca.h
>  headers-$(CONFIG_X86)     += compat/arch-x86/xen.h
>  headers-$(CONFIG_X86)     += compat/arch-x86/xen-$(compat-arch-y).h
> +headers-$(CONFIG_X86)     += compat/pmu.h compat/arch-x86/pmu.h

The first one isn't x86-specific, so doesn't belong here.

> --- a/xen/include/public/arch-x86/pmu.h
> +++ b/xen/include/public/arch-x86/pmu.h
> @@ -9,12 +9,16 @@ struct xen_pmu_amd_ctxt {
>      uint32_t counters;
>      uint32_t ctrls;
>  };
> +typedef struct xen_pmu_amd_ctxt xen_pmu_amd_ctxt_t;
> +DEFINE_XEN_GUEST_HANDLE(xen_pmu_amd_ctxt_t);
>  
>  /* Intel PMU registers and structures */
>  struct xen_pmu_cntr_pair {
>      uint64_t counter;
>      uint64_t control;
>  };
> +typedef struct xen_pmu_cntr_pair xen_pmu_cntr_pair_t;
> +DEFINE_XEN_GUEST_HANDLE(xen_pmu_cntr_pair_t);
>  
>  struct xen_pmu_intel_ctxt {
>      uint64_t global_ctrl;
> @@ -31,8 +35,10 @@ struct xen_pmu_intel_ctxt {
>      uint32_t fixed_counters;
>      uint32_t arch_counters;
>  };
> +typedef struct xen_pmu_intel_ctxt xen_pmu_intel_ctxt_t;
> +DEFINE_XEN_GUEST_HANDLE(xen_pmu_intel_ctxt_t);

If you really need these typedefs and handles, why don't you add
them right away when introducing this header?

> -struct xen_arch_pmu {
> +struct xen_pmu_arch {

And why can't this be named the final way from the beginning?

Jan

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v9 12/20] x86/VPMU: Initialize PMU for PV(H) guests
  2014-08-08 16:55 ` [PATCH v9 12/20] x86/VPMU: Initialize PMU for PV(H) guests Boris Ostrovsky
@ 2014-08-12 11:59   ` Jan Beulich
  0 siblings, 0 replies; 49+ messages in thread
From: Jan Beulich @ 2014-08-12 11:59 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: kevin.tian, keir, suravee.suthikulpanit, andrew.cooper3, tim,
	xen-devel, jun.nakajima

>>> On 08.08.14 at 18:55, <boris.ostrovsky@oracle.com> wrote:
> --- a/xen/arch/x86/hvm/svm/svm.c
> +++ b/xen/arch/x86/hvm/svm/svm.c
> @@ -1158,7 +1158,9 @@ static int svm_vcpu_initialise(struct vcpu *v)
>          return rc;
>      }
>  
> -    vpmu_initialise(v);
> +    /* PVH's VPMU is initialized via hypercall */
> +    if ( is_hvm_domain(v->domain) )
> +        vpmu_initialise(v);
>  
>      svm_guest_osvw_init(v);
>  
> @@ -1167,7 +1169,9 @@ static int svm_vcpu_initialise(struct vcpu *v)
>  
>  static void svm_vcpu_destroy(struct vcpu *v)
>  {
> -    vpmu_destroy(v);
> +    /* PVH's VPMU destroyed via hypercall */
> +    if ( is_hvm_domain(v->domain) )
> +        vpmu_destroy(v);

While I can see that being the case for init, is this really correct for
destroy even for a misbehaving guest?

> --- a/xen/arch/x86/hvm/svm/vpmu.c
> +++ b/xen/arch/x86/hvm/svm/vpmu.c
> @@ -386,14 +386,21 @@ static int amd_vpmu_initialise(struct vcpu *v)
>  	 }
>      }
>  
> -    ctxt = xzalloc_bytes(sizeof(struct xen_pmu_amd_ctxt) +
> -                         2 * sizeof(uint64_t) * AMD_MAX_COUNTERS);
> -    if ( !ctxt )
> +    if ( is_hvm_domain(v->domain) )
>      {
> -        gdprintk(XENLOG_WARNING, "Insufficient memory for PMU, "
> -            " PMU feature is unavailable on domain %d vcpu %d.\n",
> -            v->vcpu_id, v->domain->domain_id);
> -        return -ENOMEM;
> +        ctxt = xzalloc_bytes(sizeof(struct xen_pmu_amd_ctxt) +
> +                             2 * sizeof(uint64_t) * AMD_MAX_COUNTERS);
> +        if ( !ctxt )
> +        {
> +            gdprintk(XENLOG_WARNING, "Insufficient memory for PMU, "
> +                " PMU feature is unavailable on domain %d vcpu %d.\n",
> +                v->vcpu_id, v->domain->domain_id);

If you already have to touch this, please make it sane: gdprintk()
already prints a (domain,vcpu) pair, and two of them are unlikely
to be useful here. Also this should make use of %pv.
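I.e. presumably just something like

    gdprintk(XENLOG_WARNING,
             "Insufficient memory for PMU, PMU feature is unavailable on %pv\n",
             v);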

> +            return -ENOMEM;
> +        }
> +    }
> +    else
> +    {
> +        ctxt = &v->arch.vpmu.xenpmu_data->pmu.c.amd;
>      }

Pointless braces.

> @@ -387,7 +399,7 @@ static int core2_vpmu_alloc_resource(struct vcpu *v)
>  
>      return 1;
>  
> -out_err:
> +out_err_hvm:
>      vmx_rm_msr(MSR_CORE_PERF_GLOBAL_CTRL, VMX_HOST_MSR);
>      vmx_rm_msr(MSR_CORE_PERF_GLOBAL_CTRL, VMX_GUEST_MSR);
>      release_pmu_ownship(PMU_OWNER_HVM);
> @@ -395,6 +407,7 @@ out_err:
>      xfree(core2_vpmu_cxt);
>      xfree(p);
>  
> +out_err:

I think I said this before: Please indent labels by at least one space.

> --- a/xen/arch/x86/hvm/vpmu.c
> +++ b/xen/arch/x86/hvm/vpmu.c
> @@ -22,10 +22,13 @@
>  #include <xen/sched.h>
>  #include <xen/xenoprof.h>
>  #include <xen/event.h>
> +#include <xen/softirq.h>
> +#include <xen/hypercall.h>

Leftover additions from an earlier version?

Jan

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v9 13/20] x86/VPMU: When handling MSR accesses, leave fault injection to callers
  2014-08-08 16:55 ` [PATCH v9 13/20] x86/VPMU: When handling MSR accesses, leave fault injection to callers Boris Ostrovsky
@ 2014-08-12 12:45   ` Jan Beulich
  2014-08-12 15:47     ` Boris Ostrovsky
  0 siblings, 1 reply; 49+ messages in thread
From: Jan Beulich @ 2014-08-12 12:45 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: kevin.tian, keir, suravee.suthikulpanit, andrew.cooper3, tim,
	xen-devel, jun.nakajima

>>> On 08.08.14 at 18:55, <boris.ostrovsky@oracle.com> wrote:
> --- a/xen/arch/x86/hvm/vmx/vpmu_core2.c
> +++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c
> @@ -458,13 +458,13 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
>              if ( cpu_has(&current_cpu_data, X86_FEATURE_DSCPL) )
>                  supported |= IA32_DEBUGCTLMSR_BTS_OFF_OS |
>                               IA32_DEBUGCTLMSR_BTS_OFF_USR;
> -            if ( msr_content & supported )
> +
> +            if ( !(msr_content & supported ) ||
> +                 !vpmu_is_set(vpmu, VPMU_CPU_HAS_BTS) )
>              {
> -                if ( vpmu_is_set(vpmu, VPMU_CPU_HAS_BTS) )
> -                    return 1;
> -                gdprintk(XENLOG_WARNING, "Debug Store is not supported on this cpu\n");
> -                hvm_inject_hw_exception(TRAP_gp_fault, 0);
> -                return 0;
> +                gdprintk(XENLOG_WARNING,
> +                         "Debug Store is not supported on this cpu\n");
> +                return 1;

The code here (and in particular the #GP injection) was wrong anyway;
see the small series I sent earlier today. Please base yours on top of that.

> @@ -476,18 +476,17 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
>      {
>      case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
>          core2_vpmu_cxt->global_status &= ~msr_content;
> -        return 1;
> +        return 0;
>      case MSR_CORE_PERF_GLOBAL_STATUS:
>          gdprintk(XENLOG_INFO, "Can not write readonly MSR: "
>                   "MSR_PERF_GLOBAL_STATUS(0x38E)!\n");
> -        hvm_inject_hw_exception(TRAP_gp_fault, 0);
>          return 1;

Suspicious: Most return values get flipped, but this one (and at least
one more below) doesn't. Any such inconsistencies that are being
corrected (assuming this is intentional) as you go should be spelled
out in the description. And then the question of course is whether
it's really necessary to flip the meaning of this and some similar SVM
function's return values anyway.

> @@ -541,45 +539,43 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
>          }
>      }
>  
> -    if ( (global_ctrl & *enabled_cntrs) || (core2_vpmu_cxt->ds_area != 0) )
> -        vpmu_set(vpmu, VPMU_RUNNING);
> -    else
> -        vpmu_reset(vpmu, VPMU_RUNNING);

The moving of this code is - again without at least a brief comment -
also not obviously correct; at least to me it looks like this slightly
alters behavior (perhaps towards the better, but anyway).

> @@ -607,19 +603,14 @@ static int core2_vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
>              rdmsrl(msr, *msr_content);
>          }
>      }
> -    else
> +    else if ( msr == MSR_IA32_MISC_ENABLE )
>      {
>          /* Extension for BTS */
> -        if ( msr == MSR_IA32_MISC_ENABLE )
> -        {
> -            if ( vpmu_is_set(vpmu, VPMU_CPU_HAS_BTS) )
> -                *msr_content &= ~MSR_IA32_MISC_ENABLE_BTS_UNAVAIL;
> -        }
> -        else
> -            return 0;

Again a case where the return value stays the same even if in
general it appears to get flipped in this function.

> +        if ( vpmu_is_set(vpmu, VPMU_CPU_HAS_BTS) )
> +            *msr_content &= ~MSR_IA32_MISC_ENABLE_BTS_UNAVAIL;
>      }
>  
> -    return 1;
> +    return 0;
>  }

Jan

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v9 14/20] x86/VPMU: Add support for PMU register handling on PV guests
  2014-08-08 16:55 ` [PATCH v9 14/20] x86/VPMU: Add support for PMU register handling on PV guests Boris Ostrovsky
@ 2014-08-12 12:55   ` Jan Beulich
  0 siblings, 0 replies; 49+ messages in thread
From: Jan Beulich @ 2014-08-12 12:55 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: kevin.tian, keir, suravee.suthikulpanit, andrew.cooper3, tim,
	xen-devel, jun.nakajima

>>> On 08.08.14 at 18:55, <boris.ostrovsky@oracle.com> wrote:
> @@ -2556,7 +2564,22 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
>              if ( v->arch.debugreg[7] & DR7_ACTIVE_MASK )
>                  wrmsrl(regs->_ecx, msr_content);
>              break;
> -
> +        case MSR_P6_PERFCTR0...MSR_P6_PERFCTR1:
> +        case MSR_P6_EVNTSEL0...MSR_P6_EVNTSEL1:
> +        case MSR_CORE_PERF_FIXED_CTR0...MSR_CORE_PERF_FIXED_CTR2:
> +        case MSR_CORE_PERF_FIXED_CTR_CTRL...MSR_CORE_PERF_GLOBAL_OVF_CTRL:
> +            if ( boot_cpu_data.x86_vendor == X86_VENDOR_INTEL )
> +                vpmu_msr = 1;
> +            /* FALLTHROUGH */
> +        case MSR_AMD_FAM15H_EVNTSEL0...MSR_AMD_FAM15H_PERFCTR5:
> +            if ( vpmu_msr ||
> +                 ((boot_cpu_data.x86_vendor == X86_VENDOR_AMD) && !vpmu_msr) )

Pointlessly complicated right side of the ||: (a || (b && !a)) is the same
as (a || b). Meaning this _still_ doesn't achieve what you appear to
want, i.e. the set of Intel MSRs continues to also get handled on AMD
CPUs.
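
What you presumably want is to filter by vendor in the respective case
blocks, e.g. (a sketch only - whether "break" or some other fallback is
right for the mismatch case is a separate question):

    case MSR_P6_PERFCTR0...MSR_P6_PERFCTR1:
    /* ... other Intel ranges ... */
        if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL )
            break;
        vpmu_msr = 1;
        /* FALLTHROUGH */
    case MSR_AMD_FAM15H_EVNTSEL0...MSR_AMD_FAM15H_PERFCTR5:
        if ( vpmu_msr || boot_cpu_data.x86_vendor == X86_VENDOR_AMD )
        {
            /* hand the access to vpmu_do_wrmsr() as before */
        }
        break;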

Jan

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v9 15/20] x86/VPMU: Handle PMU interrupts for PV guests
  2014-08-08 16:55 ` [PATCH v9 15/20] x86/VPMU: Handle PMU interrupts for " Boris Ostrovsky
@ 2014-08-12 12:58   ` Jan Beulich
  0 siblings, 0 replies; 49+ messages in thread
From: Jan Beulich @ 2014-08-12 12:58 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: kevin.tian, keir, suravee.suthikulpanit, andrew.cooper3, tim,
	xen-devel, jun.nakajima

>>> On 08.08.14 at 18:55, <boris.ostrovsky@oracle.com> wrote:
> @@ -436,6 +561,7 @@ vpmu_force_context_switch(XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg)
>  long do_xenpmu_op(int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg)
>  {
>      int ret = -EINVAL;
> +    struct vcpu *curr;
>      xen_pmu_params_t pmu_params;
>  
>      switch ( op )
> @@ -532,6 +658,14 @@ long do_xenpmu_op(int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg)
>          vpmu_lvtpc_update(current->arch.vpmu.xenpmu_data->pmu.l.lapic_lvtpc);

So why does this not get switched to use "curr" at once?

Jan

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v9 17/20] x86/VPMU: Add privileged PMU mode
  2014-08-08 16:55 ` [PATCH v9 17/20] x86/VPMU: Add privileged PMU mode Boris Ostrovsky
@ 2014-08-12 13:06   ` Jan Beulich
  2014-08-12 16:14     ` Boris Ostrovsky
  0 siblings, 1 reply; 49+ messages in thread
From: Jan Beulich @ 2014-08-12 13:06 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: kevin.tian, keir, suravee.suthikulpanit, andrew.cooper3, tim,
	xen-devel, jun.nakajima

>>> On 08.08.14 at 18:55, <boris.ostrovsky@oracle.com> wrote:
> @@ -192,32 +199,65 @@ int vpmu_do_interrupt(struct cpu_user_regs *regs)
>          vpmu->arch_vpmu_ops->arch_vpmu_save(sampling);
>          vpmu_reset(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED);
>  
> -        /* Store appropriate registers in xenpmu_data */
> -        if ( is_pv_32bit_domain(sampled->domain) )
> +        if ( !is_hvm_domain(sampled->domain) )

I agree with using "sampled" here.

>          {
> -            /*
> -             * 32-bit dom0 cannot process Xen's addresses (which are 64 bit)
> -             * and therefore we treat it the same way as a non-priviledged
> -             * PV 32-bit domain.
> -             */
> -            struct compat_cpu_user_regs *cmp;
> -
> -            gregs = guest_cpu_user_regs();
> -
> -            cmp = (void *)&vpmu->xenpmu_data->pmu.r.regs;
> -            XLAT_cpu_user_regs(cmp, gregs);
> +            /* Store appropriate registers in xenpmu_data */
> +            if ( is_pv_32bit_domain(sampled->domain) )

But why is this not "sampling"? The layout of the data you present
should depend on the consumer of the data.

> +            {
> +                /*
> +                 * 32-bit dom0 cannot process Xen's addresses (which are 64 bit)
> +                 * and therefore we treat it the same way as a non-priviledged

non-privileged

> +                 * PV 32-bit domain.
> +                 */
> +                struct compat_cpu_user_regs *cmp;
> +
> +                gregs = guest_cpu_user_regs();
> +
> +                cmp = (void *)&vpmu->xenpmu_data->pmu.r.regs;
> +                XLAT_cpu_user_regs(cmp, gregs);
> +
> +                /* Adjust RPL for kernel mode */
> +                if ((cmp->cs & 3) == 1)
> +                    cmp->cs &= ~3;
> +            }

Jan

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v9 18/20] x86/VPMU: Save VPMU state for PV guests during context switch
  2014-08-08 16:55 ` [PATCH v9 18/20] x86/VPMU: Save VPMU state for PV guests during context switch Boris Ostrovsky
@ 2014-08-12 14:15   ` Jan Beulich
  0 siblings, 0 replies; 49+ messages in thread
From: Jan Beulich @ 2014-08-12 14:15 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: kevin.tian, keir, suravee.suthikulpanit, andrew.cooper3, tim,
	xen-devel, jun.nakajima

>>> On 08.08.14 at 18:55, <boris.ostrovsky@oracle.com> wrote:
> @@ -1526,12 +1524,12 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
>                             !is_hardware_domain(next->domain));
>      }
>  
> -    if ( is_hvm_vcpu(next) && (prev != next) && (vpmu_mode & XENPMU_MODE_SELF) )
> +    context_saved(prev);
> +
> +    if ( (prev != next) && (vpmu_mode & XENPMU_MODE_SELF) )
>          /* Must be done with interrupts enabled */
>          vpmu_load(next);
>  
> -    context_saved(prev);
> -
>      if ( prev != next )
>          _update_runstate_area(next);
>  

While fine afaict, this re-ordering needs an explanation in the commit
message (and not just in the overview mail).

Jan

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v9 19/20] x86/VPMU: NMI-based VPMU support
  2014-08-08 16:55 ` [PATCH v9 19/20] x86/VPMU: NMI-based VPMU support Boris Ostrovsky
@ 2014-08-12 14:24   ` Jan Beulich
  0 siblings, 0 replies; 49+ messages in thread
From: Jan Beulich @ 2014-08-12 14:24 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: kevin.tian, keir, suravee.suthikulpanit, andrew.cooper3, tim,
	xen-devel, jun.nakajima

>>> On 08.08.14 at 18:55, <boris.ostrovsky@oracle.com> wrote:
>  static void __init parse_vpmu_param(char *s)
>  {
> -    switch ( parse_bool(s) )
> -    {
> -    case 0:
> -        break;
> -    default:
> -        if ( !strcmp(s, "bts") )
> -            vpmu_features |= XENPMU_FEATURE_INTEL_BTS;
> -        else if ( *s )
> +    char *ss;
> +
> +    vpmu_mode = XENPMU_MODE_SELF;
> +    if (*s == '\0')
> +        return;
> +
> +    do {
> +        ss = strchr(s, ',');
> +        if ( ss )
> +            *ss = '\0';
> +
> +        switch  ( parse_bool(s) )
>          {
> -            printk("VPMU: unknown flag: %s - vpmu disabled!\n", s);
> +        case 0:
> +            vpmu_mode = XENPMU_MODE_OFF;
> +            return;
> +        case -1:

This should be "default".

> +            if ( !strcmp(s, "nmi") )
> +                vpmu_interrupt_type = APIC_DM_NMI;
> +            else if ( !strcmp(s, "bts") )
> +                vpmu_features |= XENPMU_FEATURE_INTEL_BTS;
> +            else if ( !strcmp(s, "all") )
> +            {
> +                vpmu_mode &= ~XENPMU_MODE_SELF;
> +                vpmu_mode |= XENPMU_MODE_ALL;
> +            }
> +            else
> +            {
> +                printk("VPMU: unknown flag: %s - vpmu disabled!\n", s);
> +                vpmu_mode = XENPMU_MODE_OFF;
> +                return;
> +            }
> +        default:

And this should be "case 1" and preceded by a fall-through comment.
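I.e. structurally something like (a sketch, with the flag handling from
above elided):

    switch ( parse_bool(s) )
    {
    case 0:
        vpmu_mode = XENPMU_MODE_OFF;
        return;
    default:
        /* ... "nmi"/"bts"/"all" handling as above ... */
        /* fall through */
    case 1:
        break;
    }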

> @@ -265,30 +316,30 @@ int vpmu_do_interrupt(struct cpu_user_regs *regs)
>          apic_write(APIC_LVTPC, vpmu->hw_lapic_lvtpc | APIC_LVT_MASKED);
>          vpmu->hw_lapic_lvtpc |= APIC_LVT_MASKED;
>  
> -        send_guest_vcpu_virq(sampling, VIRQ_XENPMU);
> +        if ( vpmu_interrupt_type & APIC_DM_NMI )
> +        {
> +            per_cpu(sampled_vcpu, smp_processor_id()) = sampled;

Please use this_cpu() instead of kind of open-coding it (also again
below, and who knows where else).
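I.e. presumably just

    this_cpu(sampled_vcpu) = sampled;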

>  static int pvpmu_init(struct domain *d, xen_pmu_params_t *params)
>  {
>      struct vcpu *v;
>      struct page_info *page;
>      uint64_t gfn = params->val;
> +    static bool_t __read_mostly pvpmu_initted = 0;

Pointless initializer. Also I think pvpmu_init_done would be a better name.

> @@ -476,6 +578,26 @@ static int pvpmu_init(struct domain *d, xen_pmu_params_t *params)
>          return -EINVAL;
>      }
>  
> +    spin_lock(&init_lock);
> +
> +    if ( !pvpmu_initted )
> +    {
> +        if ( reserve_lapic_nmi() == 0 )
> +            set_nmi_callback(pmu_nmi_interrupt);
> +        else
> +        {
> +            spin_unlock(&init_lock);
> +            printk(XENLOG_G_ERR "Failed to reserve PMU NMI\n");
> +            put_page(page);
> +            return -EBUSY;
> +        }
> +        open_softirq(PMU_SOFTIRQ, pmu_softnmi);
> +
> +        pvpmu_initted = 1;

Consistency please: Either you have an if and an else branch, and
each does all that is needed, or you have just an if branch ending in
"return" and the else branch is implicit.

Jan

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v9 20/20] x86/VPMU: Move VPMU files up from hvm/ directory
  2014-08-08 16:55 ` [PATCH v9 20/20] x86/VPMU: Move VPMU files up from hvm/ directory Boris Ostrovsky
@ 2014-08-12 14:26   ` Jan Beulich
  0 siblings, 0 replies; 49+ messages in thread
From: Jan Beulich @ 2014-08-12 14:26 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: kevin.tian, keir, suravee.suthikulpanit, andrew.cooper3, tim,
	xen-devel, jun.nakajima

>>> On 08.08.14 at 18:55, <boris.ostrovsky@oracle.com> wrote:
> @@ -587,7 +587,7 @@ static int pvpmu_init(struct domain *d, xen_pmu_params_t *params)
>          else
>          {
>              spin_unlock(&init_lock);
> -            printk(XENLOG_G_ERR "Failed to reserve PMU NMI\n");
> +            printk("XENLOG_G_ERR Failed to reserve PMU NMI\n");

Certainly not.

Jan

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v9 11/20] x86/VPMU: Interface for setting PMU mode and flags
  2014-08-12 10:37   ` Jan Beulich
@ 2014-08-12 15:12     ` Boris Ostrovsky
  2014-08-12 15:35       ` Jan Beulich
  0 siblings, 1 reply; 49+ messages in thread
From: Boris Ostrovsky @ 2014-08-12 15:12 UTC (permalink / raw)
  To: Jan Beulich
  Cc: kevin.tian, keir, suravee.suthikulpanit, andrew.cooper3, tim,
	xen-devel, jun.nakajima

On 08/12/2014 06:37 AM, Jan Beulich wrote:
>>>> On 08.08.14 at 18:55, <boris.ostrovsky@oracle.com> wrote:
>> --- a/xen/arch/x86/domain.c
>> +++ b/xen/arch/x86/domain.c
>> @@ -1482,7 +1482,7 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
>>   
>>       if ( is_hvm_vcpu(prev) )
>>       {
>> -        if (prev != next)
>> +        if ( (prev != next) && (vpmu_mode & XENPMU_MODE_SELF) )
>>               vpmu_save(prev);
>>   
>>           if ( !list_empty(&prev->arch.hvm_vcpu.tm_list) )
>> @@ -1526,7 +1526,7 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
>>                              !is_hardware_domain(next->domain));
>>       }
>>   
>> -    if (is_hvm_vcpu(next) && (prev != next) )
>> +    if ( is_hvm_vcpu(next) && (prev != next) && (vpmu_mode & XENPMU_MODE_SELF) )
>>           /* Must be done with interrupts enabled */
>>           vpmu_load(next);
> Wouldn't such vPMU internals be better hidden in the functions
> themselves? I realize you can save the calls this way, but if the
> condition changes again later, we'll again have to adjust this
> core function rather than just the vPMU code. It's bad enough that
> the vpmu_mode variable is visible to non-vPMU code.

How about an inline function?
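
Something like this, as a sketch (names made up):

    static inline void vpmu_switch_from(struct vcpu *prev)
    {
        if ( vpmu_mode & XENPMU_MODE_SELF )
            vpmu_save(prev);
    }

    static inline void vpmu_switch_to(struct vcpu *next)
    {
        if ( vpmu_mode & XENPMU_MODE_SELF )
            vpmu_load(next);
    }

so that context_switch() only ever calls vpmu_switch_from(prev) and
vpmu_switch_to(next) and doesn't need to know about vpmu_mode at all.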

>
>> --- a/xen/arch/x86/hvm/vpmu.c
>> +++ b/xen/arch/x86/hvm/vpmu.c
>> @@ -21,6 +21,8 @@
>>   #include <xen/config.h>
>>   #include <xen/sched.h>
>>   #include <xen/xenoprof.h>
>> +#include <xen/event.h>
>> +#include <xen/guest_access.h>
>>   #include <asm/regs.h>
>>   #include <asm/types.h>
>>   #include <asm/msr.h>
>> @@ -32,13 +34,21 @@
>>   #include <asm/hvm/svm/vmcb.h>
>>   #include <asm/apic.h>
>>   #include <public/pmu.h>
>> +#include <xen/tasklet.h>
>> +
>> +#include <compat/pmu.h>
>> +CHECK_pmu_params;
>> +CHECK_pmu_intel_ctxt;
>> +CHECK_pmu_amd_ctxt;
>> +CHECK_pmu_cntr_pair;
> Such being placed in a HVM-specific file suggests that the series is badly
> ordered: Anything relevant to PV should normally only be added here
> _after_ the file got moved out of the hvm/ subtree. Yet since I realize
> this would be a major re-work, I think you can leave this as is unless
> you expect potentially just part of the series to go in, with the splitting
> point being (immediately or later) after this patch (from my looking at
> it I would suppose that patches 3 and 4 could go in right away - they
> don't appear to depend on patches 1 and 2 - and patch 1 probably
> could go in too, but it doesn't make much sense to have it in without
> the rest of the series).

I would prefer the whole series to get in in one go (with patch 2 
possibly being the only exception). All other patches (including patch 
1) are not particularly useful on their own.

>> +        goto cont_wait;
>> +
>> +    cpumask_andnot(&allbutself, &cpu_online_map,
>> +                   cpumask_of(smp_processor_id()));
>> +
>> +    sync_task = xmalloc_array(struct tasklet, allbutself_num);
>> +    if ( !sync_task )
>> +    {
>> +        printk("vpmu_force_context_switch: out of memory\n");
>> +        return -ENOMEM;
>> +    }
>> +
>> +    for ( i = 0; i < allbutself_num; i++ )
>> +        tasklet_init(&sync_task[i], vpmu_sched_checkin, 0);
>> +
>> +    atomic_set(&vpmu_sched_counter, 0);
>> +
>> +    j = 0;
>> +    for_each_cpu ( i, &allbutself )
> This looks to be the only use for the (on stack) allbutself variable,
> but you could easily avoid this by using for_each_online_cpu() and
> skipping the local one. I'd also recommend that you count
> allbutself_num here rather than up front, since that will much more
> obviously prove that you wait for exactly as many CPUs as you
> scheduled. The array allocation above is bogus anyway, as on a
> huge system this can easily be more than a page in size.

I don't understand the last sentence. What does page-sized allocation 
have to do with this?

>
>> +        tasklet_schedule_on_cpu(&sync_task[j++], i);
>> +
>> +    vpmu_save(current);
>> +
>> +    start = NOW();
>> +
>> + cont_wait:
>> +    /*
>> +     * Note that we may fail here if a CPU is hot-unplugged while we are
>> +     * waiting. We will then time out.
>> +     */
>> +    while ( atomic_read(&vpmu_sched_counter) != allbutself_num )
>> +    {
>> +        /* Give up after 5 seconds */
>> +        if ( NOW() > start + SECONDS(5) )
>> +        {
>> +            printk("vpmu_force_context_switch: failed to sync\n");
>> +            ret = -EBUSY;
>> +            break;
>> +        }
>> +        cpu_relax();
>> +        if ( hypercall_preempt_check() )
>> +            return hypercall_create_continuation(
>> +                __HYPERVISOR_xenpmu_op, "ih", XENPMU_mode_set, arg);
>> +    }
>> +
>> +    for ( i = 0; i < allbutself_num; i++ )
>> +        tasklet_kill(&sync_task[i]);
>> +    xfree(sync_task);
>> +    sync_task = NULL;
>> +
>> +    return ret;
>> +}
>> +
>> +long do_xenpmu_op(int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg)
>> +{
>> +    int ret = -EINVAL;
>> +    xen_pmu_params_t pmu_params;
>> +
>> +    switch ( op )
>> +    {
>> +    case XENPMU_mode_set:
>> +    {
>> +        static DEFINE_SPINLOCK(xenpmu_mode_lock);
>> +        uint32_t current_mode;
>> +
>> +        if ( !is_control_domain(current->domain) )
>> +            return -EPERM;
>> +
>> +        if ( copy_from_guest(&pmu_params, arg, 1) )
>> +            return -EFAULT;
>> +
>> +        if ( pmu_params.val & ~XENPMU_MODE_SELF )
>> +            return -EINVAL;
>> +
>> +        /*
>> +         * Return error is someone else is in the middle of changing mode ---
>> +         * this is most likely indication of two system administrators
>> +         * working against each other
>> +         */
>> +        if ( !spin_trylock(&xenpmu_mode_lock) )
>> +            return -EAGAIN;
>> +
>> +        current_mode = vpmu_mode;
>> +        vpmu_mode = pmu_params.val;
>> +
>> +        if ( vpmu_mode == XENPMU_MODE_OFF )
>> +        {
>> +            /*
>> +             * Make sure all (non-dom0) VCPUs have unloaded their VPMUs. This
>> +             * can be achieved by having all physical processors go through
>> +             * context_switch().
>> +             */
>> +            ret = vpmu_force_context_switch(arg);
>> +            if ( ret )
>> +                vpmu_mode = current_mode;
>> +        }
>> +        else
>> +            ret = 0;
>> +
>> +        spin_unlock(&xenpmu_mode_lock);
>> +        break;
> This still isn't safe: There's nothing preventing another vCPU from
> issuing another XENPMU_mode_set operation while the one turning the vPMU
> off is still in the process of waiting, but having exited the lock protected
> region in order to allow other processing to occur.

I think that's OK: one of these two competing requests will time out in 
the while loop of vpmu_force_context_switch(). It will, in fact, likely 
be the first caller because the second one will get right into the while 
loop, without setting up the waiting data. This is a little 
counter-intuitive, but trying to set the mode simultaneously from two 
different places is asking for trouble anyway.


> I think you simply
> need another mode "being-turned-off" during which only mode
> changes to XENPMU_MODE_OFF, and only by the originally requesting
> vCPU, are permitted (or else your "if true, we are in hypercall
> continuation" comment above wouldn't always be true either, as that
> second vCPU might also issue a second turn-off request).

-boris

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v9 11/20] x86/VPMU: Interface for setting PMU mode and flags
  2014-08-12 15:12     ` Boris Ostrovsky
@ 2014-08-12 15:35       ` Jan Beulich
  2014-08-12 16:25         ` Boris Ostrovsky
  0 siblings, 1 reply; 49+ messages in thread
From: Jan Beulich @ 2014-08-12 15:35 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: kevin.tian, keir, suravee.suthikulpanit, andrew.cooper3, tim,
	xen-devel, jun.nakajima

>>> On 12.08.14 at 17:12, <boris.ostrovsky@oracle.com> wrote:
> On 08/12/2014 06:37 AM, Jan Beulich wrote:
>>>>> On 08.08.14 at 18:55, <boris.ostrovsky@oracle.com> wrote:
>>> --- a/xen/arch/x86/domain.c
>>> +++ b/xen/arch/x86/domain.c
>>> @@ -1482,7 +1482,7 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
>>>   
>>>       if ( is_hvm_vcpu(prev) )
>>>       {
>>> -        if (prev != next)
>>> +        if ( (prev != next) && (vpmu_mode & XENPMU_MODE_SELF) )
>>>               vpmu_save(prev);
>>>   
>>>           if ( !list_empty(&prev->arch.hvm_vcpu.tm_list) )
>>> @@ -1526,7 +1526,7 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
>>>                              !is_hardware_domain(next->domain));
>>>       }
>>>   
>>> -    if (is_hvm_vcpu(next) && (prev != next) )
>>> +    if ( is_hvm_vcpu(next) && (prev != next) && (vpmu_mode & XENPMU_MODE_SELF) )
>>>           /* Must be done with interrupts enabled */
>>>           vpmu_load(next);
>> Wouldn't such vPMU internals be better hidden in the functions
>> themselves? I realize you can save the calls this way, but if the
>> condition changes again later, we'll again have to adjust this
>> core function rather than just the vPMU code. It's bad enough that
>> the vpmu_mode variable is visible to non-vPMU code.
> 
> How about an inline function?

Yeah, that would perhaps do too. Albeit I'd still prefer all vPMU
logic to be handled in the called vPMU functions.

> I would prefer the whole series to get in in one go (with patch 2 
> possibly being the only exception). All other patches (including patch 
> 1) are not particularly useful on their own.

Okay. In that case in patch 1, please consider swapping struct
xenpf_symdata's address and name fields (I had put a respective
note on the patch for when committing it), shrinking the structure
for compat mode guests.

>>> +        goto cont_wait;
>>> +
>>> +    cpumask_andnot(&allbutself, &cpu_online_map,
>>> +                   cpumask_of(smp_processor_id()));
>>> +
>>> +    sync_task = xmalloc_array(struct tasklet, allbutself_num);
>>> +    if ( !sync_task )
>>> +    {
>>> +        printk("vpmu_force_context_switch: out of memory\n");
>>> +        return -ENOMEM;
>>> +    }
>>> +
>>> +    for ( i = 0; i < allbutself_num; i++ )
>>> +        tasklet_init(&sync_task[i], vpmu_sched_checkin, 0);
>>> +
>>> +    atomic_set(&vpmu_sched_counter, 0);
>>> +
>>> +    j = 0;
>>> +    for_each_cpu ( i, &allbutself )
>> This looks to be the only use for the (on stack) allbutself variable,
>> but you could easily avoid this by using for_each_online_cpu() and
>> skipping the local one. I'd also recommend that you count
>> allbutself_num here rather than up front, since that will much more
>> obviously prove that you wait for exactly as many CPUs as you
>> scheduled. The array allocation above is bogus anyway, as on a
>> huge system this can easily be more than a page in size.
> 
> I don't understand the last sentence. What does page-sized allocation 
> have to do with this?

We dislike allocations larger than a page at runtime.

>>> +long do_xenpmu_op(int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg)
>>> +{
>>> +    int ret = -EINVAL;
>>> +    xen_pmu_params_t pmu_params;
>>> +
>>> +    switch ( op )
>>> +    {
>>> +    case XENPMU_mode_set:
>>> +    {
>>> +        static DEFINE_SPINLOCK(xenpmu_mode_lock);
>>> +        uint32_t current_mode;
>>> +
>>> +        if ( !is_control_domain(current->domain) )
>>> +            return -EPERM;
>>> +
>>> +        if ( copy_from_guest(&pmu_params, arg, 1) )
>>> +            return -EFAULT;
>>> +
>>> +        if ( pmu_params.val & ~XENPMU_MODE_SELF )
>>> +            return -EINVAL;
>>> +
>>> +        /*
>>> +         * Return error is someone else is in the middle of changing mode ---
>>> +         * this is most likely indication of two system administrators
>>> +         * working against each other
>>> +         */
>>> +        if ( !spin_trylock(&xenpmu_mode_lock) )
>>> +            return -EAGAIN;
>>> +
>>> +        current_mode = vpmu_mode;
>>> +        vpmu_mode = pmu_params.val;
>>> +
>>> +        if ( vpmu_mode == XENPMU_MODE_OFF )
>>> +        {
>>> +            /*
>>> +             * Make sure all (non-dom0) VCPUs have unloaded their VPMUs. This
>>> +             * can be achieved by having all physical processors go through
>>> +             * context_switch().
>>> +             */
>>> +            ret = vpmu_force_context_switch(arg);
>>> +            if ( ret )
>>> +                vpmu_mode = current_mode;
>>> +        }
>>> +        else
>>> +            ret = 0;
>>> +
>>> +        spin_unlock(&xenpmu_mode_lock);
>>> +        break;
>> This still isn't safe: There's nothing preventing another vCPU from
>> issuing another XENPMU_mode_set operation while the one turning the vPMU
>> off is still in the process of waiting, but having exited the lock protected
>> region in order to allow other processing to occur.
> 
> I think that's OK: one of these two competing requests will time out in 
> the while loop of vpmu_force_context_switch(). It will, in fact, likely 
> be the first caller because the second one will get right into the while 
> loop, without setting up the waiting data. This is a little 
> counter-intuitive, but trying to set the mode simultaneously from two 
> different places is asking for trouble anyway.

Sure it is, but we nevertheless want the hypervisor to be safe. So I
consider what you currently have acceptable only if you can prove
that nothing bad can happen no matter in what order multiple
requests get issued - from looking over your code I wasn't able to
convince myself of this.

Jan

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v9 13/20] x86/VPMU: When handling MSR accesses, leave fault injection to callers
  2014-08-12 12:45   ` Jan Beulich
@ 2014-08-12 15:47     ` Boris Ostrovsky
  2014-08-12 16:00       ` Jan Beulich
  0 siblings, 1 reply; 49+ messages in thread
From: Boris Ostrovsky @ 2014-08-12 15:47 UTC (permalink / raw)
  To: Jan Beulich
  Cc: kevin.tian, keir, suravee.suthikulpanit, andrew.cooper3, tim,
	xen-devel, jun.nakajima

On 08/12/2014 08:45 AM, Jan Beulich wrote:
>>>> On 08.08.14 at 18:55, <boris.ostrovsky@oracle.com> wrote:
>> --- a/xen/arch/x86/hvm/vmx/vpmu_core2.c
>> +++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c
>> @@ -458,13 +458,13 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
>>               if ( cpu_has(&current_cpu_data, X86_FEATURE_DSCPL) )
>>                   supported |= IA32_DEBUGCTLMSR_BTS_OFF_OS |
>>                                IA32_DEBUGCTLMSR_BTS_OFF_USR;
>> -            if ( msr_content & supported )
>> +
>> +            if ( !(msr_content & supported ) ||
>> +                 !vpmu_is_set(vpmu, VPMU_CPU_HAS_BTS) )
>>               {
>> -                if ( vpmu_is_set(vpmu, VPMU_CPU_HAS_BTS) )
>> -                    return 1;
>> -                gdprintk(XENLOG_WARNING, "Debug Store is not supported on this cpu\n");
>> -                hvm_inject_hw_exception(TRAP_gp_fault, 0);
>> -                return 0;
>> +                gdprintk(XENLOG_WARNING,
>> +                         "Debug Store is not supported on this cpu\n");
>> +                return 1;
> The code here (and in particular the #GP injection) was wrong anyway;
> see the small series I sent earlier today. Please base your on top of that.
>
>> @@ -476,18 +476,17 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
>>       {
>>       case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
>>           core2_vpmu_cxt->global_status &= ~msr_content;
>> -        return 1;
>> +        return 0;
>>       case MSR_CORE_PERF_GLOBAL_STATUS:
>>           gdprintk(XENLOG_INFO, "Can not write readonly MSR: "
>>                    "MSR_PERF_GLOBAL_STATUS(0x38E)!\n");
>> -        hvm_inject_hw_exception(TRAP_gp_fault, 0);
>>           return 1;
> Suspicious: Most return values get flipped, but this one (and at least
> one more below) doesn't. Any such inconsistencies that are being
> corrected (assuming this is intentional) as you go should be spelled
> out in the description. And then the question of course is whether
> it's really necessary to flip the meaning of this and some similar SVM
> function's return values anyway.

A return value of 1 from vpmu_do_msr() will now indicate that the 
routine encountered a fault, as opposed to indicating whether the MSR 
access was to a VPMU register.

I will add a comment to that effect.
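
To illustrate the new convention, here is a rough sketch of how a caller
might react to it (the exact call site and return codes are assumptions,
not taken from the patch):

/* Sketch only: a non-zero return from vpmu_do_wrmsr() now means "fault". */
if ( vpmu_do_wrmsr(msr, msr_content) )
{
    /* Fault injection is now the caller's job. */
    hvm_inject_hw_exception(TRAP_gp_fault, 0);
    return X86EMUL_EXCEPTION;
}
return X86EMUL_OKAY;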

>
>> @@ -541,45 +539,43 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
>>           }
>>       }
>>   
>> -    if ( (global_ctrl & *enabled_cntrs) || (core2_vpmu_cxt->ds_area != 0) )
>> -        vpmu_set(vpmu, VPMU_RUNNING);
>> -    else
>> -        vpmu_reset(vpmu, VPMU_RUNNING);
> The moving of this code is - again without at least a brief comment -
> also not obviously correct; at least to me it looks like this slightly
> alters behavior (perhaps towards the better, but anyway).

Right. That's because we don't want to change VPMU state if a fault is 
encountered (which is possible with the current code).

Come to think of it, this is still not enough --- we may leave registers 
updated when a fault happens. I should fix that too.
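
As a rough illustration of that validate-then-commit ordering (both helper
names below are hypothetical, standing in for the existing checks and
updates):

/* Sketch only: validate first, commit afterwards, so a faulting write
 * leaves both the VPMU context and the hardware MSR untouched. */
if ( !wrmsr_is_valid(msr, msr_content) )      /* hypothetical check */
    return 1;                                 /* caller injects #GP */

update_vpmu_context(vpmu, msr, msr_content);  /* hypothetical commit */
wrmsrl(msr, msr_content);
return 0;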

-boris

>
>> @@ -607,19 +603,14 @@ static int core2_vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
>>               rdmsrl(msr, *msr_content);
>>           }
>>       }
>> -    else
>> +    else if ( msr == MSR_IA32_MISC_ENABLE )
>>       {
>>           /* Extension for BTS */
>> -        if ( msr == MSR_IA32_MISC_ENABLE )
>> -        {
>> -            if ( vpmu_is_set(vpmu, VPMU_CPU_HAS_BTS) )
>> -                *msr_content &= ~MSR_IA32_MISC_ENABLE_BTS_UNAVAIL;
>> -        }
>> -        else
>> -            return 0;
> Again a case where the return value stays the same even if in
> general it appears to get flipped in this function.
>
>> +        if ( vpmu_is_set(vpmu, VPMU_CPU_HAS_BTS) )
>> +            *msr_content &= ~MSR_IA32_MISC_ENABLE_BTS_UNAVAIL;
>>       }
>>   
>> -    return 1;
>> +    return 0;
>>   }
> Jan
>

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v9 13/20] x86/VPMU: When handling MSR accesses, leave fault injection to callers
  2014-08-12 15:47     ` Boris Ostrovsky
@ 2014-08-12 16:00       ` Jan Beulich
  2014-08-12 16:30         ` Boris Ostrovsky
  0 siblings, 1 reply; 49+ messages in thread
From: Jan Beulich @ 2014-08-12 16:00 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: kevin.tian, keir, suravee.suthikulpanit, andrew.cooper3, tim,
	xen-devel, jun.nakajima

>>> On 12.08.14 at 17:47, <boris.ostrovsky@oracle.com> wrote:
> On 08/12/2014 08:45 AM, Jan Beulich wrote:
>>>>> On 08.08.14 at 18:55, <boris.ostrovsky@oracle.com> wrote:
>>> @@ -476,18 +476,17 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
>>>       {
>>>       case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
>>>           core2_vpmu_cxt->global_status &= ~msr_content;
>>> -        return 1;
>>> +        return 0;
>>>       case MSR_CORE_PERF_GLOBAL_STATUS:
>>>           gdprintk(XENLOG_INFO, "Can not write readonly MSR: "
>>>                    "MSR_PERF_GLOBAL_STATUS(0x38E)!\n");
>>> -        hvm_inject_hw_exception(TRAP_gp_fault, 0);
>>>           return 1;
>> Suspicious: Most return values get flipped, but this one (and at least
>> one more below) doesn't. Any such inconsistencies that are being
>> corrected (assuming this is intentional) as you go should be spelled
>> out in the description. And then the question of course is whether
>> it's really necessary to flip the meaning of this and some similar SVM
>> function's return values anyway.
> 
> A return value of 1 from vpmu_do_msr() will now indicate that the 
> routine encountered a fault, as opposed to indicating whether the MSR 
> access was to a VPMU register.

And how would it now indicate that latter fact?

Jan

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v9 17/20] x86/VPMU: Add privileged PMU mode
  2014-08-12 13:06   ` Jan Beulich
@ 2014-08-12 16:14     ` Boris Ostrovsky
  0 siblings, 0 replies; 49+ messages in thread
From: Boris Ostrovsky @ 2014-08-12 16:14 UTC (permalink / raw)
  To: Jan Beulich
  Cc: kevin.tian, keir, suravee.suthikulpanit, andrew.cooper3, tim,
	xen-devel, jun.nakajima

On 08/12/2014 09:06 AM, Jan Beulich wrote:
>>>> On 08.08.14 at 18:55, <boris.ostrovsky@oracle.com> wrote:
>> @@ -192,32 +199,65 @@ int vpmu_do_interrupt(struct cpu_user_regs *regs)
>>           vpmu->arch_vpmu_ops->arch_vpmu_save(sampling);
>>           vpmu_reset(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED);
>>   
>> -        /* Store appropriate registers in xenpmu_data */
>> -        if ( is_pv_32bit_domain(sampled->domain) )
>> +        if ( !is_hvm_domain(sampled->domain) )
> I agree with using "sampled" here.
>
>>           {
>> -            /*
>> -             * 32-bit dom0 cannot process Xen's addresses (which are 64 bit)
>> -             * and therefore we treat it the same way as a non-priviledged
>> -             * PV 32-bit domain.
>> -             */
>> -            struct compat_cpu_user_regs *cmp;
>> -
>> -            gregs = guest_cpu_user_regs();
>> -
>> -            cmp = (void *)&vpmu->xenpmu_data->pmu.r.regs;
>> -            XLAT_cpu_user_regs(cmp, gregs);
>> +            /* Store appropriate registers in xenpmu_data */
>> +            if ( is_pv_32bit_domain(sampled->domain) )
> But why is this not "sampling"? The layout of the data you present
> should depend on the consumer of the data


Right.

-boris

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v9 11/20] x86/VPMU: Interface for setting PMU mode and flags
  2014-08-12 15:35       ` Jan Beulich
@ 2014-08-12 16:25         ` Boris Ostrovsky
  2014-08-14 16:32           ` Jan Beulich
  0 siblings, 1 reply; 49+ messages in thread
From: Boris Ostrovsky @ 2014-08-12 16:25 UTC (permalink / raw)
  To: Jan Beulich
  Cc: kevin.tian, keir, suravee.suthikulpanit, andrew.cooper3, tim,
	xen-devel, jun.nakajima

On 08/12/2014 11:35 AM, Jan Beulich wrote:
>>>> On 12.08.14 at 17:12, <boris.ostrovsky@oracle.com> wrote:
>> On 08/12/2014 06:37 AM, Jan Beulich wrote:
>>>>>> On 08.08.14 at 18:55, <boris.ostrovsky@oracle.com> wrote:
>>>> --- a/xen/arch/x86/domain.c
>>>> +++ b/xen/arch/x86/domain.c
>>>> @@ -1482,7 +1482,7 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
>>>>    
>>>>        if ( is_hvm_vcpu(prev) )
>>>>        {
>>>> -        if (prev != next)
>>>> +        if ( (prev != next) && (vpmu_mode & XENPMU_MODE_SELF) )
>>>>                vpmu_save(prev);
>>>>    
>>>>            if ( !list_empty(&prev->arch.hvm_vcpu.tm_list) )
>>>> @@ -1526,7 +1526,7 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
>>>>                               !is_hardware_domain(next->domain));
>>>>        }
>>>>    
>>>> -    if (is_hvm_vcpu(next) && (prev != next) )
>>>> +    if ( is_hvm_vcpu(next) && (prev != next) && (vpmu_mode & XENPMU_MODE_SELF) )
>>>>            /* Must be done with interrupts enabled */
>>>>            vpmu_load(next);
>>> Wouldn't such vPMU internals be better hidden in the functions
>>> themselves? I realize you can save the calls this way, but if the
>>> condition changes again later, we'll again have to adjust this
>>> core function rather than just the vPMU code. It's bad enough that
>>> the vpmu_mode variable is visible to non-vPMU code.
>> How about an inline function?
> Yeah, that would perhaps do too. Albeit I'd still prefer all vPMU
> logic to be handled in the called vPMU functions.

I was thinking about an inline in vpmu.h. Something like

inline void vpmu_next(struct vcpu *prev, struct vcpu *next)
{
    if ( is_hvm_vcpu(next) && (prev != next) &&
         (vpmu_mode & XENPMU_MODE_SELF) )
        /* Must be done with interrupts enabled */
        vpmu_load(next);
}

(and similar for vpmu_save())
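
For completeness, the save-side counterpart could be sketched as follows
(the name vpmu_prev is made up here; the condition mirrors the quoted
context_switch() hunk):

inline void vpmu_prev(struct vcpu *prev, struct vcpu *next)
{
    if ( is_hvm_vcpu(prev) && (prev != next) &&
         (vpmu_mode & XENPMU_MODE_SELF) )
        vpmu_save(prev);
}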


>> I would prefer the whole series to get in in one go (with patch 2
>> possibly being the only exception). All other patches (including patch
>> 1) are not particularly useful on their own.
> Okay. In that case in patch 1, please consider swapping struct
> xenpf_symdata's address and name fields (I had put a respective
> note on the patch for when committing it), shrinking the structure
> for compat mode guests.
>
>>>> +        goto cont_wait;
>>>> +
>>>> +    cpumask_andnot(&allbutself, &cpu_online_map,
>>>> +                   cpumask_of(smp_processor_id()));
>>>> +
>>>> +    sync_task = xmalloc_array(struct tasklet, allbutself_num);
>>>> +    if ( !sync_task )
>>>> +    {
>>>> +        printk("vpmu_force_context_switch: out of memory\n");
>>>> +        return -ENOMEM;
>>>> +    }
>>>> +
>>>> +    for ( i = 0; i < allbutself_num; i++ )
>>>> +        tasklet_init(&sync_task[i], vpmu_sched_checkin, 0);
>>>> +
>>>> +    atomic_set(&vpmu_sched_counter, 0);
>>>> +
>>>> +    j = 0;
>>>> +    for_each_cpu ( i, &allbutself )
>>> This looks to be the only use for the (on stack) allbutself variable,
>>> but you could easily avoid this by using for_each_online_cpu() and
>>> skipping the local one. I'd also recommend that you count
>>> allbutself_num here rather than up front, since that will much more
>>> obviously prove that you wait for exactly as many CPUs as you
>>> scheduled. The array allocation above is bogus anyway, as on a
>>> huge system this can easily be more than a page in size.
>> I don't understand the last sentence. What does page-sized allocation
>> have to do with this?
> We dislike greater than page size allocations at runtime.


I can allocate an array of pointers and then allocate the tasklets in the 
for(i) loop above if that makes it more palatable, but I still need the 
allbutself_num value earlier than what you suggested.
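
A minimal sketch of that alternative (names reused from the quoted hunk;
the cleanup path is only indicated in a comment, and xzalloc_array is
assumed to be available):

struct tasklet **sync_task;
unsigned int i;

/* Array of pointers plus one tasklet per CPU, so no single runtime
 * allocation is larger than a page. */
sync_task = xzalloc_array(struct tasklet *, allbutself_num);
if ( !sync_task )
    return -ENOMEM;

for ( i = 0; i < allbutself_num; i++ )
{
    sync_task[i] = xmalloc(struct tasklet);
    if ( !sync_task[i] )
        /* Free sync_task[0..i-1] and the pointer array here (omitted). */
        return -ENOMEM;
    tasklet_init(sync_task[i], vpmu_sched_checkin, 0);
}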



>
>>>> +long do_xenpmu_op(int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg)
>>>> +{
>>>> +    int ret = -EINVAL;
>>>> +    xen_pmu_params_t pmu_params;
>>>> +
>>>> +    switch ( op )
>>>> +    {
>>>> +    case XENPMU_mode_set:
>>>> +    {
>>>> +        static DEFINE_SPINLOCK(xenpmu_mode_lock);
>>>> +        uint32_t current_mode;
>>>> +
>>>> +        if ( !is_control_domain(current->domain) )
>>>> +            return -EPERM;
>>>> +
>>>> +        if ( copy_from_guest(&pmu_params, arg, 1) )
>>>> +            return -EFAULT;
>>>> +
>>>> +        if ( pmu_params.val & ~XENPMU_MODE_SELF )
>>>> +            return -EINVAL;
>>>> +
>>>> +        /*
>>>> +         * Return error is someone else is in the middle of changing mode ---
>>>> +         * this is most likely indication of two system administrators
>>>> +         * working against each other
>>>> +         */
>>>> +        if ( !spin_trylock(&xenpmu_mode_lock) )
>>>> +            return -EAGAIN;
>>>> +
>>>> +        current_mode = vpmu_mode;
>>>> +        vpmu_mode = pmu_params.val;
>>>> +
>>>> +        if ( vpmu_mode == XENPMU_MODE_OFF )
>>>> +        {
>>>> +            /*
>>>> +             * Make sure all (non-dom0) VCPUs have unloaded their VPMUs. This
>>>> +             * can be achieved by having all physical processors go through
>>>> +             * context_switch().
>>>> +             */
>>>> +            ret = vpmu_force_context_switch(arg);
>>>> +            if ( ret )
>>>> +                vpmu_mode = current_mode;
>>>> +        }
>>>> +        else
>>>> +            ret = 0;
>>>> +
>>>> +        spin_unlock(&xenpmu_mode_lock);
>>>> +        break;
>>> This still isn't safe: There's nothing preventing another vCPU to issue
>>> another XENPMU_mode_set operation while the one turning the vPMU
>>> off is still in the process of waiting, but having exited the lock protected
>>> region in order to allow other processing to occur.
>> I think that's OK: one of these two competing requests will timeout in
>> the while loop of vpmu_force_context_switch(). It will, in fact, likely
>> be the first caller because the second one will get right into the while
>> loop, without setting up the waiting data. This is a little
>> counter-intuitive but trying to set mode simultaneously from two
>> different places is asking for trouble anyway.
> Sure it is, but we nevertheless want the hypervisor to be safe. So I
> consider what you currently have acceptable only if you can prove
> that nothing bad can happen no matter in what order multiple
> requests get issued - from looking over your code I wasn't able to
> convince myself of this.


Would a comment with what I said above (possibly a bit massaged) be 
sufficient?

-boris

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v9 13/20] x86/VPMU: When handling MSR accesses, leave fault injection to callers
  2014-08-12 16:00       ` Jan Beulich
@ 2014-08-12 16:30         ` Boris Ostrovsky
  0 siblings, 0 replies; 49+ messages in thread
From: Boris Ostrovsky @ 2014-08-12 16:30 UTC (permalink / raw)
  To: Jan Beulich
  Cc: kevin.tian, keir, suravee.suthikulpanit, andrew.cooper3, tim,
	xen-devel, jun.nakajima

On 08/12/2014 12:00 PM, Jan Beulich wrote:
>>>> On 12.08.14 at 17:47, <boris.ostrovsky@oracle.com> wrote:
>> On 08/12/2014 08:45 AM, Jan Beulich wrote:
>>>>>> On 08.08.14 at 18:55, <boris.ostrovsky@oracle.com> wrote:
>>>> @@ -476,18 +476,17 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content)
>>>>        {
>>>>        case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
>>>>            core2_vpmu_cxt->global_status &= ~msr_content;
>>>> -        return 1;
>>>> +        return 0;
>>>>        case MSR_CORE_PERF_GLOBAL_STATUS:
>>>>            gdprintk(XENLOG_INFO, "Can not write readonly MSR: "
>>>>                     "MSR_PERF_GLOBAL_STATUS(0x38E)!\n");
>>>> -        hvm_inject_hw_exception(TRAP_gp_fault, 0);
>>>>            return 1;
>>> Suspicious: Most return values get flipped, but this one (and at least
>>> one more below) doesn't. Any such inconsistencies that are being
>>> corrected (assuming this is intentional) as you go should be spelled
>>> out in the description. And then the question of course is whether
>>> it's really necessary to flip the meaning of this and some similar SVM
>>> function's return values anyway.
>> A return value of 1 from vpmu_do_msr() will now indicate that the
>> routine encountered a fault, as opposed to indicating whether the MSR
>> access was to a VPMU register.
> And how would it now indicate that latter fact?

It won't, which is why I made changes to vmx_msr_read/write_intercept() 
to call this routine only when we know that a VPMU register is possibly 
being accessed.

I thought about making vpmu_do_msr()'s return value tri-state (-1 for 
faults, 0 when no VPMU register was accessed, 1 for successful accesses) 
but decided against it.
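
To illustrate, a hedged sketch of such a caller-side filter (the MSR list
and the gp_fault label are assumptions about the intercept code, not taken
from the patch):

/* Sketch: only hand known VPMU MSRs to vpmu_do_rdmsr(); everything else
 * stays on the regular read path. */
switch ( msr )
{
case MSR_CORE_PERF_GLOBAL_CTRL:
case MSR_CORE_PERF_GLOBAL_STATUS:
case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
case MSR_IA32_DS_AREA:
    if ( vpmu_do_rdmsr(msr, msr_content) )
        goto gp_fault;        /* non-zero now means "inject a fault" */
    break;
default:
    /* Not a VPMU register: existing handling. */
    break;
}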

-boris

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v9 11/20] x86/VPMU: Interface for setting PMU mode and flags
  2014-08-12 16:25         ` Boris Ostrovsky
@ 2014-08-14 16:32           ` Jan Beulich
  0 siblings, 0 replies; 49+ messages in thread
From: Jan Beulich @ 2014-08-14 16:32 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: kevin.tian, keir, suravee.suthikulpanit, andrew.cooper3, tim,
	xen-devel, jun.nakajima

>>> On 12.08.14 at 18:25, <boris.ostrovsky@oracle.com> wrote:
> On 08/12/2014 11:35 AM, Jan Beulich wrote:
>>>>> On 12.08.14 at 17:12, <boris.ostrovsky@oracle.com> wrote:
>>> How about an inline function?
>> Yeah, that would perhaps do too. Albeit I'd still prefer all vPMU
>> logic to be handled in the called vPMU functions.
> 
> I was thinking about an inline in vpmu.h. Something like
> 
> inline void vpmu_next(struct vcpu *prev, struct vcpu *next)
> {
>     if ( is_hvm_vcpu(next) && (prev != next) &&
>          (vpmu_mode & XENPMU_MODE_SELF) )
>         /* Must be done with interrupts enabled */
>         vpmu_load(next);
> }
> 
> (and similar for vpmu_save())

That doesn't change my opinion: perhaps acceptable, but it does
isolation the Linux-like way rather than through true separation.

>>> I think that's OK: one of these two competing requests will timeout in
>>> the while loop of vpmu_force_context_switch(). It will, in fact, likely
>>> be the first caller because the second one will get right into the while
>>> loop, without setting up the waiting data. This is a little
>>> counter-intuitive but trying to set mode simultaneously from two
>>> different places is asking for trouble anyway.
>> Sure it is, but we nevertheless want the hypervisor to be safe. So I
>> consider what you currently have acceptable only if you can prove
>> that nothing bad can happen no matter in what order multiple
>> requests get issued - from looking over your code I wasn't able to
>> convince myself of this.
> 
> 
> Would a comment with what I said above (possibly a bit massaged) be 
> sufficient?

Perhaps, as long as that comment supplies the requested proof.

Jan

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v9 09/20] x86/VPMU: Add public xenpmu.h
  2014-08-11 16:15     ` Boris Ostrovsky
@ 2014-08-18 16:02       ` Boris Ostrovsky
  2014-08-18 20:23         ` Jan Beulich
  0 siblings, 1 reply; 49+ messages in thread
From: Boris Ostrovsky @ 2014-08-18 16:02 UTC (permalink / raw)
  To: Jan Beulich
  Cc: kevin.tian, keir, jun.nakajima, andrew.cooper3, tim, xen-devel,
	suravee.suthikulpanit

On 08/11/2014 12:15 PM, Boris Ostrovsky wrote:
> On 08/11/2014 10:08 AM, Jan Beulich wrote:
>>>>> On 08.08.14 at 18:55, <boris.ostrovsky@oracle.com> wrote:
>>>   static int amd_vpmu_initialise(struct vcpu *v)
>>>   {
>>> -    struct amd_vpmu_context *ctxt;
>>> +    struct xen_pmu_amd_ctxt *ctxt;
>>>       struct vpmu_struct *vpmu = vcpu_vpmu(v);
>>>       uint8_t family = current_cpu_data.x86;
>>>   @@ -382,7 +386,8 @@ static int amd_vpmu_initialise(struct vcpu *v)
>>>        }
>>>       }
>>>   -    ctxt = xzalloc(struct amd_vpmu_context);
>>> +    ctxt = xzalloc_bytes(sizeof(struct xen_pmu_amd_ctxt) +
>>> +                         2 * sizeof(uint64_t) * AMD_MAX_COUNTERS);
>> So I guess this is one of said sizeof()-s, and I continue to think this
>> would benefit from using a proper field. It is my understanding that
>> the real (non-C89 compliant) declaration of struct xen_pmu_amd_ctxt
>> would be
>>
>> struct xen_pmu_amd_ctxt {
>>      /* Offsets to counter and control MSRs (relative to 
>> xen_arch_pmu.c.amd) */
>>      uint32_t counters;
>>      uint32_t ctrls;
>>      uint64_t values[];
>> };
>
> OK, this should work.

Actually, no, it won't: we decided early on that register banks should 
be outside the fixed portion of the structure. Keeping values[] inside 
xen_pmu_amd_ctxt doesn't necessarily preclude this *if* it is never 
referred to by code and is only used for determining the type. But that 
would be an unreasonable requirement IMO.

I understand the dislike of sizeof(uint64_t). OTOH, your original 
concern was that we may forget to update the size calculation if the 
value changes type sometime later. But the type cannot be anything but 
uint64_t since it is used in rd/wrmsr().

I also considered introducing pmu_reg_t (typedef'd to uint64_t) but we'd 
be back to rd/wrmsr() again -- even though the compiler may not flag it, 
this would still be rather unnatural.
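
To make the layout concrete, here is a small sketch of how the stored
offsets relate to the register banks that follow the fixed part of the
structure (the accessor lines are illustrative, not taken from the patch):

/* The counter and control banks live right after the fixed part of
 * struct xen_pmu_amd_ctxt; only their offsets are stored in it. */
ctxt->counters = sizeof(struct xen_pmu_amd_ctxt);
ctxt->ctrls    = ctxt->counters + sizeof(uint64_t) * AMD_MAX_COUNTERS;

/* Illustrative accessors for the two banks. */
uint64_t *counter_regs = (uint64_t *)((char *)ctxt + ctxt->counters);
uint64_t *ctrl_regs    = (uint64_t *)((char *)ctxt + ctxt->ctrls);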

-boris

>
>>
>> Just like already done elsewhere you _can_ add variable-sized
>> arrays to the public interface as long as you guard them suitably.
>> And then your statement would become
>>
>>      ctxt = xzalloc_bytes(sizeof(*ctxt) +
>>                           2 * sizeof(*ctxt->values) * AMD_MAX_COUNTERS);
>>
>> making clear what the allocation is doing.
>>
>>> @@ -391,7 +396,11 @@ static int amd_vpmu_initialise(struct vcpu *v)
>>>           return -ENOMEM;
>>>       }
>>>   +    ctxt->counters = sizeof(struct xen_pmu_amd_ctxt);
>>> +    ctxt->ctrls = ctxt->counters + sizeof(uint64_t) * 
>>> AMD_MAX_COUNTERS;
>>      ctxt->counters = sizeof(*ctxt);
>>      ctxt->ctrls = ctxt->counters + sizeof(*ctxt->values) * 
>> AMD_MAX_COUNTERS;
>>
>> or
>>
>>      ctxt->counters = offsetof(typeof(ctxt), values[0]);
>>      ctxt->ctrls = offsetof(typeof(ctxt), values[AMD_MAX_COUNTERS]);
>>
>>> @@ -228,6 +229,9 @@ void vpmu_initialise(struct vcpu *v)
>>>       struct vpmu_struct *vpmu = vcpu_vpmu(v);
>>>       uint8_t vendor = current_cpu_data.x86_vendor;
>>>   +    BUILD_BUG_ON(sizeof(struct xen_pmu_intel_ctxt) > 
>>> XENPMU_CTXT_PAD_SZ);
>>> +    BUILD_BUG_ON(sizeof(struct xen_pmu_amd_ctxt) > 
>>> XENPMU_CTXT_PAD_SZ);
>> Don't these need to include the variable size array portions too?
>
> Not in BUILD_BUG_ON but I should check whether I have enough space 
> when computing ctxt->{ctrls | counters}.
>
> -boris
>

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v9 09/20] x86/VPMU: Add public xenpmu.h
  2014-08-18 16:02       ` Boris Ostrovsky
@ 2014-08-18 20:23         ` Jan Beulich
  0 siblings, 0 replies; 49+ messages in thread
From: Jan Beulich @ 2014-08-18 20:23 UTC (permalink / raw)
  To: boris.ostrovsky
  Cc: kevin.tian, keir, suravee.suthikulpanit, andrew.cooper3, tim,
	xen-devel, jun.nakajima

>>> Boris Ostrovsky <boris.ostrovsky@oracle.com> 08/18/14 6:02 PM >>>
>On 08/11/2014 12:15 PM, Boris Ostrovsky wrote:
>> On 08/11/2014 10:08 AM, Jan Beulich wrote:
>>>> +    ctxt = xzalloc_bytes(sizeof(struct xen_pmu_amd_ctxt) +
>>>> +                         2 * sizeof(uint64_t) * AMD_MAX_COUNTERS);
>>> So I guess this is one of said sizeof()-s, and I continue to think this
>>> would benefit from using a proper field. It is my understanding that
>>> the real (non-C89 compliant) declaration of struct xen_pmu_amd_ctxt
>>> would be
>>>
>>> struct xen_pmu_amd_ctxt {
>>>      /* Offsets to counter and control MSRs (relative to 
>>> xen_arch_pmu.c.amd) */
>>>      uint32_t counters;
>>>      uint32_t ctrls;
>>>      uint64_t values[];
>>> };
>>
>> OK, this should work.
>
>Actually, no, it won't: we decided early on that register banks should 
>be outside the fixed portion of the structure. Keeping values[] inside 
>xen_pmu_amd_ctxt doesn't necessarily preclude this *if* it is never 
>referred to by code and is only used for determining the type. But that 
>would be an unreasonable requirement IMO.
>
>I understand the dislike of sizeof(uint64_t). OTOH, your original 
>concern was that we may forget to update the size calculation if the 
>value changes type sometime later. But the type cannot be anything but 
>uint64_t since it is used in rd/wrmsr().

I see; stay with the explicit sizeof(uint64_t) here then, but please try
to use sizeof(variable-or-field) elsewhere when possible.
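
For instance (a trivial illustration, not from the patch), tying the size
to the object rather than the type:

xen_pmu_params_t pmu_params;

memset(&pmu_params, 0, sizeof(pmu_params));  /* rather than sizeof(xen_pmu_params_t) */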

Jan

^ permalink raw reply	[flat|nested] 49+ messages in thread

end of thread, other threads:[~2014-08-18 20:23 UTC | newest]

Thread overview: 49+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-08-08 16:55 [PATCH v9 00/20] x86/PMU: Xen PMU PV(H) support Boris Ostrovsky
2014-08-08 16:55 ` [PATCH v9 01/20] common/symbols: Export hypervisor symbols to privileged guest Boris Ostrovsky
2014-08-08 16:55 ` [PATCH v9 02/20] x86/VPMU: Manage VPMU_CONTEXT_SAVE flag in vpmu_save_force() Boris Ostrovsky
2014-08-11 13:28   ` Jan Beulich
2014-08-11 15:35     ` Boris Ostrovsky
2014-08-08 16:55 ` [PATCH v9 03/20] x86/VPMU: Set MSR bitmaps only for HVM/PVH guests Boris Ostrovsky
2014-08-08 16:55 ` [PATCH v9 04/20] x86/VPMU: Make vpmu macros a bit more efficient Boris Ostrovsky
2014-08-08 16:55 ` [PATCH v9 05/20] intel/VPMU: Clean up Intel VPMU code Boris Ostrovsky
2014-08-11 13:45   ` Jan Beulich
2014-08-11 16:01     ` Boris Ostrovsky
2014-08-11 16:13       ` Jan Beulich
2014-08-08 16:55 ` [PATCH v9 06/20] vmx: Merge MSR management routines Boris Ostrovsky
2014-08-08 16:55 ` [PATCH v9 07/20] x86/VPMU: Handle APIC_LVTPC accesses Boris Ostrovsky
2014-08-11 13:49   ` Jan Beulich
2014-08-08 16:55 ` [PATCH v9 08/20] intel/VPMU: MSR_CORE_PERF_GLOBAL_CTRL should be initialized to zero Boris Ostrovsky
2014-08-08 16:55 ` [PATCH v9 09/20] x86/VPMU: Add public xenpmu.h Boris Ostrovsky
2014-08-11 14:08   ` Jan Beulich
2014-08-11 16:15     ` Boris Ostrovsky
2014-08-18 16:02       ` Boris Ostrovsky
2014-08-18 20:23         ` Jan Beulich
2014-08-08 16:55 ` [PATCH v9 10/20] x86/VPMU: Make vpmu not HVM-specific Boris Ostrovsky
2014-08-08 16:55 ` [PATCH v9 11/20] x86/VPMU: Interface for setting PMU mode and flags Boris Ostrovsky
2014-08-12 10:37   ` Jan Beulich
2014-08-12 15:12     ` Boris Ostrovsky
2014-08-12 15:35       ` Jan Beulich
2014-08-12 16:25         ` Boris Ostrovsky
2014-08-14 16:32           ` Jan Beulich
2014-08-08 16:55 ` [PATCH v9 12/20] x86/VPMU: Initialize PMU for PV(H) guests Boris Ostrovsky
2014-08-12 11:59   ` Jan Beulich
2014-08-08 16:55 ` [PATCH v9 13/20] x86/VPMU: When handling MSR accesses, leave fault injection to callers Boris Ostrovsky
2014-08-12 12:45   ` Jan Beulich
2014-08-12 15:47     ` Boris Ostrovsky
2014-08-12 16:00       ` Jan Beulich
2014-08-12 16:30         ` Boris Ostrovsky
2014-08-08 16:55 ` [PATCH v9 14/20] x86/VPMU: Add support for PMU register handling on PV guests Boris Ostrovsky
2014-08-12 12:55   ` Jan Beulich
2014-08-08 16:55 ` [PATCH v9 15/20] x86/VPMU: Handle PMU interrupts for " Boris Ostrovsky
2014-08-12 12:58   ` Jan Beulich
2014-08-08 16:55 ` [PATCH v9 16/20] x86/VPMU: Merge vpmu_rdmsr and vpmu_wrmsr Boris Ostrovsky
2014-08-08 16:55 ` [PATCH v9 17/20] x86/VPMU: Add privileged PMU mode Boris Ostrovsky
2014-08-12 13:06   ` Jan Beulich
2014-08-12 16:14     ` Boris Ostrovsky
2014-08-08 16:55 ` [PATCH v9 18/20] x86/VPMU: Save VPMU state for PV guests during context switch Boris Ostrovsky
2014-08-12 14:15   ` Jan Beulich
2014-08-08 16:55 ` [PATCH v9 19/20] x86/VPMU: NMI-based VPMU support Boris Ostrovsky
2014-08-12 14:24   ` Jan Beulich
2014-08-08 16:55 ` [PATCH v9 20/20] x86/VPMU: Move VPMU files up from hvm/ directory Boris Ostrovsky
2014-08-12 14:26   ` Jan Beulich
2014-08-11 13:32 ` [PATCH v9 00/20] x86/PMU: Xen PMU PV(H) support Jan Beulich
