[PATCH v8 00/14] Guest LBR Enabling

* [PATCH v8 00/14] Guest LBR Enabling
@ 2019-08-06  7:16 Wei Wang
  2019-08-06  7:16 ` [PATCH v8 01/14] perf/x86: fix the variable type of the lbr msrs Wei Wang
                   ` (15 more replies)
  0 siblings, 16 replies; 20+ messages in thread
From: Wei Wang @ 2019-08-06  7:16 UTC (permalink / raw)
  To: linux-kernel, kvm, ak, peterz, pbonzini
  Cc: kan.liang, mingo, rkrcmar, like.xu, wei.w.wang, jannh,
	arei.gonglei, jmattson

Last Branch Recording (LBR) is a performance monitor unit (PMU) feature
on Intel CPUs that captures branch related info. This patch series enables
this feature to KVM guests.

Each guest can be configured to expose this LBR feature to the guest via
userspace setting the enabling param in KVM_CAP_X86_GUEST_LBR (patch 3).

About the lbr emulation method:
Since the vcpu get scheduled in, the lbr related msrs are made
interceptible. This makes guest first access to a lbr related msr always
vm-exit to kvm, so that kvm can know whether the lbr feature is used
during the vcpu time slice. The kvm lbr msr handler does the following
things:
  - create an lbr perf event (task pinned) for the vcpu thread.
    The perf event mainly serves 2 purposes:
      -- follow the host perf scheduling rules to manage the vcpu's usage
         of lbr (e.g. a cpu pinned lbr event could reclaim lbr and thus
         stopping the vcpu's use);
      -- have the host perf do context switching of the lbr state on the
         vcpu thread switching.
  - pass the lbr related msrs through to the guest.
    This enables the following guest accesses to the lbr related msrs
    without vm-exit, as long as the vcpu's lbr event owns the lbr feature.
    A cpu pinned lbr event on the host could come and take over the lbr
    feature via IPI calls. In this case, the pass-through will be
    cancelled (patch 13), and the guest following accesses to the lbr msrs
    will vm-exit to kvm and accesses will be forbidden in the handler.

If the guest doesn't touch any of the lbr related msrs (likely the guest
doesn't need to run lbr in the near future), the vcpu's lbr perf event
will be freed (please see patch 12 commit for more details).

* Tests
Conclusion: the profiling results on the guest are similar to that on the host.

Run: ./perf -b ./test_program

- Test on the host:
Overhead  Command  Source Shared Object  Source Symbol    Target Symbol   
  22.35%  ftest    libc-2.23.so          [.] __random     [.] __random        
   8.20%  ftest    ftest                 [.] qux          [.] qux             
   5.88%  ftest    ftest                 [.] random@plt   [.] __random        
   5.88%  ftest    libc-2.23.so          [.] __random     [.] __random_r  
   5.79%  ftest    ftest                 [.] main         [.] random@plt  
   5.60%  ftest    ftest                 [.] main         [.] foo             
   5.24%  ftest    libc-2.23.so          [.] __random     [.] main            
   5.20%  ftest    libc-2.23.so          [.] __random_r   [.] __random        
   5.00%  ftest    ftest                 [.] foo          [.] qux             
   4.91%  ftest    ftest                 [.] main         [.] bar             
   4.83%  ftest    ftest                 [.] bar          [.] qux             
   4.57%  ftest    ftest                 [.] main         [.] main            
   4.38%  ftest    ftest                 [.] foo          [.] main            
   4.13%  ftest    ftest                 [.] qux          [.] foo             
   3.89%  ftest    ftest                 [.] qux          [.] bar             
   3.86%  ftest    ftest                 [.] bar          [.] main            

- Test on the guest:
Overhead  Command  Source Shaged Object  Source Symbol    Target Symbol
  22.36%  ftest    libc-2.23.so          [.] random       [.] random  
   8.55%  ftest    ftest                 [.] qux          [.] qux                    
   5.79%  ftest    libc-2.23.so          [.] random       [.] random_r                     
   5.64%  ftest    ftest                 [.] random@plt   [.] random                     
   5.58%  ftest    ftest                 [.] main         [.] random@plt                       
   5.55%  ftest    ftest                 [.] main         [.] foo                       
   5.41%  ftest    libc-2.23.so          [.] random       [.] main                 
   5.31%  ftest    libc-2.23.so          [.] random_r     [.] random                      
   5.11%  ftest    ftest                 [.] foo          [.] qux                     
   4.93%  ftest    ftest                 [.] main         [.] main                     
   4.59%  ftest    ftest                 [.] qux          [.] bar                       
   4.49%  ftest    ftest                 [.] bar          [.] main                       
   4.42%  ftest    ftest                 [.] bar          [.] qux                       
   4.16%  ftest    ftest                 [.] main         [.] bar                       
   3.95%  ftest    ftest                 [.] qux          [.] foo                        
   3.79%  ftest    ftest                 [.] foo          [.] main
(due to the lib version difference, "random" is equavlent to __random above)

v7->v8 Changelog:
  - Patch 3:
    -- document KVM_CAP_X86_GUEST_LBR in api.txt
    -- make the check of KVM_CAP_X86_GUEST_LBR return the size of
       struct x86_perf_lbr_stack, to let userspace do a compatibility
       check.
  - Patch 7:
    -- support perf scheduler to not assign a counter for the perf event
       that has PERF_EV_CAP_NO_COUNTER set (rather than skipping the perf
       scheduler). This allows the scheduler to detect lbr usage conflicts
       via get_event_constraints, and lower priority events will finally
       fail to use lbr.
    -- define X86_PMC_IDX_NA as "-1", which represents a never assigned
       counter id. There are other places that use "-1", but could be
       updated to use the new macro in another patch series.
  - Patch 8:
    -- move the event->owner assignment into perf_event_alloc to have it
       set before event_init is called. Please see this patch's commit for
       reasons.
  - Patch 9:
    -- use "exclude_host" and "is_kernel_event" to decide if the lbr event
       is used for the vcpu lbr emulation, which doesn't need a counter,
       and removes the usage of the previous new perf_event_create API.
    -- remove the unused attr fields.
  - Patch 10:
    -- set a hardware reserved bit (bit 62 of LBR_SELECT) to reg->config
       for the vcpu lbr emulation event. This makes the config different
       from other host lbr event, so that they don't share the lbr.
       Please see the comments in the patch for the reasons why they
       shouldn't share.
  - Patch 12:
    -- disable interrupt and check if the vcpu lbr event owns the lbr
       feature before kvm writing to the lbr related msr. This avoids kvm
       updating the lbr msrs after lbr has been reclaimed by other events
       via ipi.
    -- remove arch v4 related support.
  - Patch 13:
    -- double check if the vcpu lbr event owns the lbr feature before
       vm-entry into the guest. The lbr pass-through will be cancelled if
       lbr feature has been reclaimed by a cpu pinned lbr event.

Previous:
https://lkml.kernel.org/r/1562548999-37095-1-git-send-email-wei.w.wang@intel.com

Wei Wang (14):
  perf/x86: fix the variable type of the lbr msrs
  perf/x86: add a function to get the addresses of the lbr stack msrs
  KVM/x86: KVM_CAP_X86_GUEST_LBR
  KVM/x86: intel_pmu_lbr_enable
  KVM/x86/vPMU: tweak kvm_pmu_get_msr
  KVM/x86: expose MSR_IA32_PERF_CAPABILITIES to the guest
  perf/x86: support to create a perf event without counter allocation
  perf/core: set the event->owner before event_init
  KVM/x86/vPMU: APIs to create/free lbr perf event for a vcpu thread
  perf/x86/lbr: don't share lbr for the vcpu usage case
  perf/x86: save/restore LBR_SELECT on vcpu switching
  KVM/x86/lbr: lbr emulation
  KVM/x86/vPMU: check the lbr feature before entering guest
  KVM/x86: remove the common handling of the debugctl msr

 Documentation/virt/kvm/api.txt    |  26 +++
 arch/x86/events/core.c            |  36 ++-
 arch/x86/events/intel/core.c      |   3 +
 arch/x86/events/intel/lbr.c       |  95 +++++++-
 arch/x86/events/perf_event.h      |   6 +-
 arch/x86/include/asm/kvm_host.h   |   5 +
 arch/x86/include/asm/perf_event.h |  17 ++
 arch/x86/kvm/cpuid.c              |   2 +-
 arch/x86/kvm/pmu.c                |  24 +-
 arch/x86/kvm/pmu.h                |  11 +-
 arch/x86/kvm/pmu_amd.c            |   7 +-
 arch/x86/kvm/vmx/pmu_intel.c      | 476 +++++++++++++++++++++++++++++++++++++-
 arch/x86/kvm/vmx/vmx.c            |   4 +-
 arch/x86/kvm/vmx/vmx.h            |   2 +
 arch/x86/kvm/x86.c                |  47 ++--
 include/linux/perf_event.h        |  18 ++
 include/uapi/linux/kvm.h          |   1 +
 kernel/events/core.c              |  19 +-
 18 files changed, 738 insertions(+), 61 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 20+ messages in thread