linuxppc-dev.lists.ozlabs.org archive mirror
* [V6 00/11] perf: New conditional branch filter
@ 2014-05-05  9:09 Anshuman Khandual
From: Anshuman Khandual @ 2014-05-05  9:09 UTC
  To: linuxppc-dev, linux-kernel
  Cc: mikey, ak, eranian, michael, acme, sukadev, mingo

This patchset is a re-spin of the original branch stack sampling patchset,
which introduced the new PERF_SAMPLE_BRANCH_COND branch filter. It also enables
SW based branch filtering on book3s powerpc platforms that have PMU HW backed
branch stack sampling support.

Summary of code changes in this patchset:

(1) Introduce a new PERF_SAMPLE_BRANCH_COND branch filter
(2) Add the "cond" branch filter option to the "perf record" tool
(3) Enable PERF_SAMPLE_BRANCH_COND on X86 platforms
(4) Enable PERF_SAMPLE_BRANCH_COND on the POWER8 platform
(5) Update the documentation for the "perf record" tool
(6) Add some new powerpc instruction analysis functions to the code-patching library
(7) Enable SW based branch filter support for powerpc book3s
(8) Change the BHRB configuration in POWER8 to accommodate SW branch filters
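
As a rough illustration of items (1) and (2), a comma-separated "-j" list can
be thought of as the bitwise OR of the corresponding PERF_SAMPLE_BRANCH_* bits.
The sketch below is Python rather than the perf tool's C. The bit positions are
assumed to mirror enum perf_branch_sample_type in the uapi perf_event.h header
with this series applied and should be checked against the header. The names
BRANCH_FILTER_BITS and parse_branch_filter are hypothetical.

```python
# Bit positions assumed to mirror enum perf_branch_sample_type;
# verify them against the kernel headers you build with.
BRANCH_FILTER_BITS = {
    "u": 1 << 0,          # PERF_SAMPLE_BRANCH_USER
    "k": 1 << 1,          # PERF_SAMPLE_BRANCH_KERNEL
    "hv": 1 << 2,         # PERF_SAMPLE_BRANCH_HV
    "any": 1 << 3,        # PERF_SAMPLE_BRANCH_ANY
    "any_call": 1 << 4,   # PERF_SAMPLE_BRANCH_ANY_CALL
    "any_ret": 1 << 5,    # PERF_SAMPLE_BRANCH_ANY_RETURN
    "ind_call": 1 << 6,   # PERF_SAMPLE_BRANCH_IND_CALL
    "cond": 1 << 10,      # PERF_SAMPLE_BRANCH_COND (new in this series)
}


def parse_branch_filter(spec):
    """Turn a "-j" style string such as "cond,any_ret" into a bit mask."""
    mask = 0
    for name in spec.split(","):
        name = name.strip()
        if name not in BRANCH_FILTER_BITS:
            raise ValueError("unknown branch filter: %r" % name)
        mask |= BRANCH_FILTER_BITS[name]
    return mask


if __name__ == "__main__":
    print(hex(parse_branch_filter("cond,any_ret")))
```

Because the filters are independent bits, a combined request selects the union
of the individual branch classes.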

With this new SW enablement, branch filter support for book3s platforms has
been extended to include all the combinations discussed below, demonstrated with
a sample test application program (included here).
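
To give a feel for what SW based filtering has to do, here is a minimal Python
sketch that classifies a 32-bit powerpc instruction word by decoding its branch
fields (primary opcode, BO, extended opcode and LK bit, as defined in the Power
ISA). It is not the code from this series. The function names and the mapping
of filter names to instruction forms (for example, treating bclr without LK as
a return) are assumptions made for illustration only.

```python
def classify_branch(instr):
    """Return the set of filter classes matched by a 32-bit instruction word."""
    opcode = (instr >> 26) & 0x3F   # primary opcode
    bo = (instr >> 21) & 0x1F       # BO field of bc/bclr/bcctr
    xo = (instr >> 1) & 0x3FF       # extended opcode for XL-form
    lk = instr & 1                  # link bit: LR is updated when set

    if opcode == 18:                # b, ba, bl, bla
        conditional, indirect, via_lr = False, False, False
    elif opcode == 16:              # bc and friends
        conditional, indirect, via_lr = (bo & 0x14) != 0x14, False, False
    elif opcode == 19 and xo in (16, 528):   # bclr[l] / bcctr[l]
        conditional, indirect, via_lr = (bo & 0x14) != 0x14, True, xo == 16
    else:
        return set()                # not a branch instruction

    classes = set()
    if lk:
        classes.add("any_call")     # assumption: LK set means function call
        if indirect:
            classes.add("ind_call")
    elif via_lr:
        classes.add("any_ret")      # assumption: bclr without LK is a return
    if conditional:
        classes.add("cond")
    return classes


def matches_filter(instr, requested):
    """Keep a branch if it falls into any of the requested classes."""
    return bool(classify_branch(instr) & set(requested))


if __name__ == "__main__":
    for word in (0x4E800020, 0x4E800421, 0x48000001, 0x41820008):
        print(hex(word), sorted(classify_branch(word)))
```

A combined request such as "cond,any_ret" then reduces to checking whether the
instruction falls into at least one of the requested classes.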

Changes in V2
=============
(1) Enabled PPC64 SW branch filtering support
(2) Incorporated changes required for all previous comments

Changes in V3
=============
(1) Split the SW branch filter enablement into multiple patches
(2) Added PMU neutral SW branch filtering code, PMU specific HW branch filtering code
(3) Added new instruction analysis functionality into powerpc code-patching library
(4) Changed name for some of the functions
(5) Fixed a couple of spelling mistakes
(6) Changed code documentation in multiple places

Changes in V4
=============
(1) Changed the commit message for patch (01/10)
(2) Changed the patch (02/10) to accommodate review comments from Michael Ellerman
(3) Rebased the patchset against Linus's latest tree

Changes in V5
=============
(1) Added a precursor patch to clean up the indentation problem in power_pmu_bhrb_read
(2) Added a precursor patch to re-arrange the P8 PMU BHRB filter config, which improves clarity
(3) Merged the previous 10th patch into the 8th patch
(4) Moved SW based branch analysis code from core perf into code-patching library as suggested by Michael
(5) Simplified the logic in branch analysis library
(6) Fixed some ambiguities in documentation at various places
(7) Added some more in-code documentation blocks at various places
(8) Renamed some local variables and functions
(9) Fixed some indentation and white space errors in the code
(10) Implemented almost all the review comments and suggestions made by Michael Ellerman on V4 patchset
(11) Enabled privilege mode SW branch filter
(12) Simplified and generalized the SW implemented conditional branch filter
(13) PERF_SAMPLE_BRANCH_COND filter is now supported only through SW implementation
(14) Adjusted other patches to deal with the above changes

Changes in V6
=============
(1) Rebased the patchset against the master
(2) Added "Reviewed-by: Andi Kleen" to the first four patches in the series, which change the
    generic or X86 perf code. [https://lkml.org/lkml/2014/4/7/130]

HW implemented branch filters
=============================

(1) perf record -j any_call -e branch-misses:u ./cprog

# Overhead  Command  Source Shared Object            Source Symbol  Target Shared Object         Target Symbol
# ........  .......  ....................  .......................  ....................  ....................
#
     7.85%    cprog  cprog                 [.] sw_3_1               cprog                 [.] success_3_1_2   
     5.66%    cprog  cprog                 [.] sw_3_1               cprog                 [.] sw_3_1_2        
     5.65%    cprog  cprog                 [.] hw_1_1               cprog                 [.] symbol1         
     5.42%    cprog  cprog                 [.] sw_3_1               cprog                 [.] sw_3_1_3        
     5.40%    cprog  cprog                 [.] callme               cprog                 [.] hw_1_1          
     5.40%    cprog  cprog                 [.] sw_3_1               cprog                 [.] success_3_1_1   
     5.40%    cprog  cprog                 [.] sw_3_1               cprog                 [.] sw_3_1_1        
     5.39%    cprog  cprog                 [.] sw_4_2               cprog                 [.] lr_addr         
     5.39%    cprog  cprog                 [.] callme               cprog                 [.] sw_4_2          
     5.39%    cprog  [unknown]             [.] 00000000             cprog                 [.] ctr_addr        
     5.38%    cprog  cprog                 [.] hw_1_2               cprog                 [.] symbol2         
     5.38%    cprog  cprog                 [.] callme               cprog                 [.] hw_1_2          
     5.16%    cprog  cprog                 [.] sw_3_1               cprog                 [.] success_3_1_3   
     5.15%    cprog  cprog                 [.] callme               cprog                 [.] sw_3_2          
     5.14%    cprog  cprog                 [.] callme               cprog                 [.] hw_2_2          
     2.96%    cprog  cprog                 [.] callme               cprog                 [.] sw_3_1          
     2.94%    cprog  cprog                 [.] callme               cprog                 [.] hw_2_1          
     2.71%    cprog  cprog                 [.] main                 cprog                 [.] callme          
     2.71%    cprog  [unknown]             [.] 00000000             cprog                 [.] lr_addr         
     2.70%    cprog  cprog                 [.] sw_4_1               cprog                 [.] ctr_addr        
     2.70%    cprog  cprog                 [.] callme               cprog                 [.] sw_4_1          
     0.09%    cprog  [unknown]             [.] 0xf7ad76c4           [unknown]             [.] 0xf7ac22c0      
     0.00%    cprog  libc-2.11.2.so        [.] vfprintf             libc-2.11.2.so        [.] __errno_location
     0.00%    cprog  libc-2.11.2.so        [.] printf               libc-2.11.2.so        [.] vfprintf        
     0.00%    cprog  libc-2.11.2.so        [.] _IO_file_doallocate  libc-2.11.2.so        [.] isatty          
     0.00%    cprog  libc-2.11.2.so        [.] _IO_file_doallocate  libc-2.11.2.so        [.] mmap            
     0.00%    cprog  libc-2.11.2.so        [.] isatty               libc-2.11.2.so        [.] tcgetattr       
     0.00%    cprog  cprog                 [.] main                 [unknown]             [.] 0x10000950      
     0.00%    cprog  [unknown]             [.] 00000000             libc-2.11.2.so        [.] _IO_file_stat   
     0.00%    cprog  [unknown]             [.] 0xf7acfca4           cprog                 [.] _fini           
     0.00%    cprog  [unknown]             [k] 00000000             cprog                 [k] ctr_addr        
     0.00%    cprog  [unknown]             [k] 00000000             cprog                 [k] lr_addr         

SW implemented branch filters
=============================

(2) perf record -j cond -e branch-misses:u ./cprog

# Overhead  Command  Source Shared Object           Source Symbol  Target Shared Object           Target Symbol
# ........  .......  ....................  ......................  ....................  ......................
#
    25.82%    cprog  [unknown]             [.] 00000000            cprog                 [.] sw_3_1            
    12.66%    cprog  cprog                 [.] sw_4_2              cprog                 [.] lr_addr           
    12.63%    cprog  [unknown]             [.] 00000000            cprog                 [.] callme            
     9.42%    cprog  cprog                 [.] hw_2_2              cprog                 [.] address2          
     9.39%    cprog  cprog                 [.] sw_3_1              cprog                 [.] success_3_1_2     
     4.91%    cprog  cprog                 [.] sw_3_1              cprog                 [.] success_3_1_1     
     4.91%    cprog  cprog                 [.] sw_3_1              cprog                 [.] success_3_1_3     
     3.35%    cprog  cprog                 [.] sw_3_1_3            cprog                 [.] sw_3_1            
     3.34%    cprog  cprog                 [.] sw_3_1_1            cprog                 [.] sw_3_1            
     3.31%    cprog  cprog                 [.] hw_1_2              cprog                 [.] symbol2           
     3.31%    cprog  cprog                 [.] sw_4_1              cprog                 [.] ctr_addr          
     3.29%    cprog  cprog                 [.] hw_2_1              cprog                 [.] address1          
     3.27%    cprog  cprog                 [.] sw_3_1_2            cprog                 [.] sw_3_1            
     0.32%    cprog  [unknown]             [.] 0xf7c62328          [unknown]             [.] 0xf7c62320        
     0.01%    cprog  libc-2.11.2.so        [.] vfprintf            libc-2.11.2.so        [.] vfprintf          
     0.01%    cprog  libc-2.11.2.so        [.] _IO_file_xsputn     libc-2.11.2.so        [.] _IO_file_xsputn   
     0.01%    cprog  libc-2.11.2.so        [.] _IO_default_xsputn  libc-2.11.2.so        [.] _IO_default_xsputn
     0.01%    cprog  libc-2.11.2.so        [.] strchrnul           libc-2.11.2.so        [.] strchrnul         
     0.01%    cprog  [unknown]             [.] 00000000            libc-2.11.2.so        [.] _IO_file_xsputn   
     0.01%    cprog  [unknown]             [k] 00000000            cprog                 [k] callme            


(3) perf record -j any_ret -e branch-misses:u ./cprog

# Overhead  Command  Source Shared Object          Source Symbol  Target Shared Object          Target Symbol
# ........  .......  ....................  .....................  ....................  .....................
#
    15.61%    cprog  [unknown]             [.] 00000000           cprog                 [.] sw_3_1           
     6.28%    cprog  cprog                 [.] symbol2            cprog                 [.] hw_1_2           
     6.28%    cprog  cprog                 [.] ctr_addr           cprog                 [.] sw_4_1           
     6.26%    cprog  cprog                 [.] success_3_1_3      cprog                 [.] sw_3_1           
     6.24%    cprog  cprog                 [.] symbol1            cprog                 [.] hw_1_1           
     6.24%    cprog  cprog                 [.] sw_4_2             cprog                 [.] callme           
     6.21%    cprog  [unknown]             [.] 00000000           cprog                 [.] callme           
     6.19%    cprog  cprog                 [.] lr_addr            cprog                 [.] sw_4_2           
     3.16%    cprog  cprog                 [.] hw_1_2             cprog                 [.] callme           
     3.15%    cprog  cprog                 [.] success_3_1_1      cprog                 [.] sw_3_1           
     3.15%    cprog  cprog                 [.] sw_4_1             cprog                 [.] callme           
     3.14%    cprog  cprog                 [.] callme             cprog                 [.] main             
     3.13%    cprog  cprog                 [.] hw_1_1             cprog                 [.] callme           
     3.13%    cprog  cprog                 [.] sw_3_1_1           cprog                 [.] sw_3_1           
     3.12%    cprog  cprog                 [.] back2              cprog                 [.] callme           
     3.12%    cprog  cprog                 [.] sw_3_1             cprog                 [.] callme           
     3.11%    cprog  cprog                 [.] back1              cprog                 [.] callme           
     3.11%    cprog  cprog                 [.] sw_3_1_2           cprog                 [.] sw_3_1           
     3.11%    cprog  cprog                 [.] sw_3_1_3           cprog                 [.] sw_3_1           
     3.10%    cprog  cprog                 [.] sw_3_2             cprog                 [.] callme           
     3.09%    cprog  cprog                 [.] success_3_1_2      cprog                 [.] sw_3_1           
     0.03%    cprog  [unknown]             [.] 0x100009b0         [unknown]             [.] 0xf7d5581c       
     0.01%    cprog  libc-2.11.2.so        [.] _IO_file_overflow  libc-2.11.2.so        [.] _IO_file_xsputn  
     0.01%    cprog  libc-2.11.2.so        [.] _IO_file_setbuf    [unknown]             [.] 0x0fee1084       
     0.01%    cprog  [unknown]             [.] 0xf7d5589c         libc-2.11.2.so        [.] printf           
     0.01%    cprog  [unknown]             [.] 00000000           libc-2.11.2.so        [.] _IO_file_overflow
     0.01%    cprog  [unknown]             [.] 00000000           libc-2.11.2.so        [.] _IO_file_setbuf  
     0.01%    cprog  [unknown]             [k] 00000000           cprog                 [k] callme           

(4) perf record -j ind_call -e branch-misses:u ./cprog

# Overhead  Command  Source Shared Object   Source Symbol  Target Shared Object          Target Symbol
# ........  .......  ....................  ..............  ....................  .....................
#
    42.59%    cprog  [unknown]             [.] 00000000    cprog                 [.] sw_3_1           
    25.88%    cprog  cprog                 [.] sw_4_2      cprog                 [.] lr_addr          
    25.65%    cprog  [unknown]             [.] 00000000    cprog                 [.] callme           
     5.58%    cprog  cprog                 [.] sw_4_1      cprog                 [.] ctr_addr         
     0.23%    cprog  [unknown]             [k] 00000000    cprog                 [k] callme           
     0.05%    cprog  [unknown]             [.] 00000000    [unknown]             [.] 0xf79fd740       
     0.03%    cprog  [unknown]             [.] 00000000    libc-2.11.2.so        [.] _IO_file_overflow


(5) perf record -j any_call,any_ret -e branch-misses:u ./cprog

# Overhead  Command  Source Shared Object              Source Symbol  Target Shared Object          Target Symbol
# ........  .......  ....................  .........................  ....................  .....................
#
    10.00%    cprog  [unknown]             [.] 00000000               cprog                 [.] sw_3_1           
     4.20%    cprog  cprog                 [.] sw_4_2                 cprog                 [.] lr_addr          
     4.17%    cprog  cprog                 [.] lr_addr                cprog                 [.] sw_4_2           
     4.16%    cprog  cprog                 [.] symbol1                cprog                 [.] hw_1_1           
     4.12%    cprog  [unknown]             [.] 00000000               cprog                 [.] callme           
     4.12%    cprog  cprog                 [.] symbol2                cprog                 [.] hw_1_2           
     4.11%    cprog  cprog                 [.] success_3_1_3          cprog                 [.] sw_3_1           
     4.11%    cprog  cprog                 [.] ctr_addr               cprog                 [.] sw_4_1           
     4.10%    cprog  cprog                 [.] sw_4_2                 cprog                 [.] callme           
     2.42%    cprog  cprog                 [.] callme                 cprog                 [.] sw_4_2           
     2.40%    cprog  cprog                 [.] sw_3_1_3               cprog                 [.] sw_3_1           
     2.40%    cprog  cprog                 [.] sw_3_1                 cprog                 [.] sw_3_1_3         
     2.39%    cprog  cprog                 [.] hw_1_2                 cprog                 [.] symbol2          
     2.39%    cprog  cprog                 [.] back1                  cprog                 [.] callme           
     2.39%    cprog  cprog                 [.] sw_3_1_1               cprog                 [.] sw_3_1           
     2.39%    cprog  cprog                 [.] sw_3_1                 cprog                 [.] sw_3_1_1         
     2.39%    cprog  cprog                 [.] sw_3_1                 cprog                 [.] callme           
     2.39%    cprog  cprog                 [.] sw_4_1                 cprog                 [.] ctr_addr         
     2.39%    cprog  cprog                 [.] callme                 cprog                 [.] hw_1_2           
     2.39%    cprog  cprog                 [.] callme                 cprog                 [.] sw_3_1           
     2.39%    cprog  cprog                 [.] sw_3_1_2               cprog                 [.] sw_3_1           
     2.39%    cprog  cprog                 [.] sw_3_1                 cprog                 [.] sw_3_1_2         
     2.38%    cprog  cprog                 [.] hw_1_1                 cprog                 [.] symbol1          
     2.38%    cprog  cprog                 [.] callme                 cprog                 [.] hw_1_1           
     1.78%    cprog  cprog                 [.] back2                  cprog                 [.] callme           
     1.78%    cprog  cprog                 [.] hw_1_1                 cprog                 [.] callme           
     1.76%    cprog  cprog                 [.] success_3_1_2          cprog                 [.] sw_3_1           
     1.76%    cprog  cprog                 [.] sw_3_1                 cprog                 [.] success_3_1_2    
     1.76%    cprog  cprog                 [.] sw_3_2                 cprog                 [.] callme           
     1.76%    cprog  cprog                 [.] callme                 cprog                 [.] sw_3_2           
     1.73%    cprog  cprog                 [.] success_3_1_1          cprog                 [.] sw_3_1           
     1.73%    cprog  cprog                 [.] sw_3_1                 cprog                 [.] success_3_1_1    
     1.73%    cprog  cprog                 [.] hw_1_2                 cprog                 [.] callme           
     1.71%    cprog  cprog                 [.] sw_3_1                 cprog                 [.] success_3_1_3    
     1.71%    cprog  cprog                 [.] sw_4_1                 cprog                 [.] callme           
     1.71%    cprog  cprog                 [.] callme                 cprog                 [.] main             
     0.05%    cprog  [unknown]             [k] 00000000               cprog                 [k] callme           
     0.03%    cprog  [unknown]             [.] 0xf7aa9d4c             [unknown]             [.] 0xf7aa5f80       
     0.01%    cprog  libc-2.11.2.so        [.] __errno_location       libc-2.11.2.so        [.] vfprintf         
     0.01%    cprog  libc-2.11.2.so        [.] vfprintf               libc-2.11.2.so        [.] __errno_location 
     0.01%    cprog  libc-2.11.2.so        [.] _IO_doallocbuf         libc-2.11.2.so        [.] _IO_file_overflow
     0.01%    cprog  cprog                 [.] __do_global_dtors_aux  [unknown]             [.] 0xf7a9fc74       
     0.01%    cprog  [unknown]             [.] 0xf7a9fca4             cprog                 [.] _fini            

(6) perf record -j any_call,ind_call -e branch-misses:u ./cprog

# Overhead  Command  Source Shared Object           Source Symbol  Target Shared Object           Target Symbol
# ........  .......  ....................  ......................  ....................  ......................
#
    17.38%    cprog  [unknown]             [.] 00000000            cprog                 [.] sw_3_1            
     7.76%    cprog  cprog                 [.] sw_4_2              cprog                 [.] lr_addr           
     7.64%    cprog  [unknown]             [.] 00000000            cprog                 [.] callme            
     6.00%    cprog  cprog                 [.] sw_3_1              cprog                 [.] sw_3_1_1          
     6.00%    cprog  cprog                 [.] callme              cprog                 [.] sw_3_1            
     5.98%    cprog  cprog                 [.] sw_4_1              cprog                 [.] ctr_addr          
     5.97%    cprog  cprog                 [.] hw_1_1              cprog                 [.] symbol1           
     5.97%    cprog  cprog                 [.] hw_1_2              cprog                 [.] symbol2           
     5.97%    cprog  cprog                 [.] sw_3_1              cprog                 [.] sw_3_1_3          
     5.97%    cprog  cprog                 [.] callme              cprog                 [.] hw_1_1            
     5.97%    cprog  cprog                 [.] callme              cprog                 [.] hw_1_2            
     5.96%    cprog  cprog                 [.] callme              cprog                 [.] sw_4_2            
     5.95%    cprog  cprog                 [.] sw_3_1              cprog                 [.] sw_3_1_2          
     1.83%    cprog  cprog                 [.] sw_3_1              cprog                 [.] success_3_1_2     
     1.82%    cprog  cprog                 [.] sw_3_1              cprog                 [.] success_3_1_1     
     1.82%    cprog  cprog                 [.] sw_3_1              cprog                 [.] success_3_1_3     
     1.82%    cprog  cprog                 [.] callme              cprog                 [.] sw_3_2            
     0.14%    cprog  [unknown]             [k] 00000000            cprog                 [k] callme            
     0.01%    cprog  libc-2.11.2.so        [.] vfprintf            libc-2.11.2.so        [.] strchrnul         
     0.01%    cprog  libc-2.11.2.so        [.] _IO_file_xsputn     libc-2.11.2.so        [.] _IO_default_xsputn
     0.01%    cprog  libc-2.11.2.so        [.] _IO_default_xsputn  libc-2.11.2.so        [.] _IO_file_overflow 
     0.01%    cprog  ld-2.11.2.so          [.] calloc              [unknown]             [.] 0xf795b390        
     0.01%    cprog  [unknown]             [.] 0x0fee00fc          libc-2.11.2.so        [.] _IO_file_overflow 
     0.01%    cprog  [unknown]             [.] 00000000            ld-2.11.2.so          [.] calloc            
     0.01%    cprog  [unknown]             [.] 0xf794b41c          [unknown]             [.] 0xf794ab70        

(7) perf record -j cond,any_ret -e branch-misses:u ./cprog

# Overhead  Command  Source Shared Object           Source Symbol  Target Shared Object           Target Symbol
# ........  .......  ....................  ......................  ....................  ......................
#
    12.43%    cprog  [unknown]             [.] 00000000            cprog                 [.] sw_3_1            
     4.91%    cprog  cprog                 [.] lr_addr             cprog                 [.] sw_4_2            
     4.89%    cprog  [unknown]             [.] 00000000            cprog                 [.] callme            
     4.87%    cprog  cprog                 [.] sw_4_2              cprog                 [.] lr_addr           
     4.87%    cprog  cprog                 [.] symbol1             cprog                 [.] hw_1_1            
     4.19%    cprog  cprog                 [.] hw_2_2              cprog                 [.] address2          
     4.19%    cprog  cprog                 [.] back2               cprog                 [.] callme            
     4.19%    cprog  cprog                 [.] sw_3_2              cprog                 [.] callme            
     4.18%    cprog  cprog                 [.] hw_1_1              cprog                 [.] callme            
     4.18%    cprog  cprog                 [.] success_3_1_2       cprog                 [.] sw_3_1            
     4.18%    cprog  cprog                 [.] sw_3_1              cprog                 [.] success_3_1_2     
     4.16%    cprog  cprog                 [.] sw_4_2              cprog                 [.] callme            
     4.13%    cprog  cprog                 [.] ctr_addr            cprog                 [.] sw_4_1            
     4.12%    cprog  cprog                 [.] symbol2             cprog                 [.] hw_1_2            
     4.12%    cprog  cprog                 [.] success_3_1_3       cprog                 [.] sw_3_1            
     3.43%    cprog  cprog                 [.] callme              cprog                 [.] main              
     3.42%    cprog  cprog                 [.] sw_3_1              cprog                 [.] success_3_1_3     
     3.41%    cprog  cprog                 [.] success_3_1_1       cprog                 [.] sw_3_1            
     3.41%    cprog  cprog                 [.] sw_3_1              cprog                 [.] success_3_1_1     
     3.41%    cprog  cprog                 [.] sw_4_1              cprog                 [.] callme            
     3.40%    cprog  cprog                 [.] hw_1_2              cprog                 [.] callme            
     0.73%    cprog  cprog                 [.] sw_3_1_3            cprog                 [.] sw_3_1            
     0.73%    cprog  cprog                 [.] sw_4_1              cprog                 [.] ctr_addr          
     0.72%    cprog  cprog                 [.] hw_1_2              cprog                 [.] symbol2           
     0.72%    cprog  cprog                 [.] sw_3_1_1            cprog                 [.] sw_3_1            
     0.70%    cprog  cprog                 [.] hw_2_1              cprog                 [.] address1          
     0.70%    cprog  cprog                 [.] back1               cprog                 [.] callme            
     0.70%    cprog  cprog                 [.] sw_3_1_2            cprog                 [.] sw_3_1            
     0.70%    cprog  cprog                 [.] sw_3_1              cprog                 [.] callme            
     0.19%    cprog  [unknown]             [.] 0xf7c12328          [unknown]             [.] 0xf7c12320        
     0.01%    cprog  libc-2.11.2.so        [.] __errno_location    libc-2.11.2.so        [.] vfprintf          
     0.01%    cprog  libc-2.11.2.so        [.] vfprintf            libc-2.11.2.so        [.] vfprintf          
     0.01%    cprog  libc-2.11.2.so        [.] _IO_file_overflow   [unknown]             [.] 0x0fee0100        
     0.01%    cprog  libc-2.11.2.so        [.] _IO_default_xsputn  libc-2.11.2.so        [.] _IO_default_xsputn
     0.01%    cprog  [unknown]             [.] 00000000            libc-2.11.2.so        [.] _IO_file_overflow 

(8) perf record -j cond,ind_call -e branch-misses:u ./cprog

# Overhead  Command  Source Shared Object   Source Symbol  Target Shared Object      Target Symbol
# ........  .......  ....................  ..............  ....................  .................
#
    20.70%    cprog  [unknown]             [.] 00000000    cprog                 [.] sw_3_1       
     9.99%    cprog  cprog                 [.] sw_4_2      cprog                 [.] lr_addr      
     9.91%    cprog  [unknown]             [.] 00000000    cprog                 [.] callme       
     9.45%    cprog  cprog                 [.] sw_3_1_3    cprog                 [.] sw_3_1       
     9.44%    cprog  cprog                 [.] hw_2_1      cprog                 [.] address1     
     9.43%    cprog  cprog                 [.] sw_3_1_1    cprog                 [.] sw_3_1       
     9.42%    cprog  cprog                 [.] hw_1_2      cprog                 [.] symbol2      
     9.42%    cprog  cprog                 [.] sw_3_1_2    cprog                 [.] sw_3_1       
     9.42%    cprog  cprog                 [.] sw_4_1      cprog                 [.] ctr_addr     
     0.65%    cprog  cprog                 [.] sw_3_1      cprog                 [.] success_3_1_1
     0.62%    cprog  cprog                 [.] sw_3_1      cprog                 [.] success_3_1_3
     0.56%    cprog  cprog                 [.] hw_2_2      cprog                 [.] address2     
     0.55%    cprog  cprog                 [.] sw_3_1      cprog                 [.] success_3_1_2
     0.29%    cprog  [unknown]             [.] 0xf7f72328  [unknown]             [.] 0xf7f72320   
     0.10%    cprog  [unknown]             [k] 00000000    cprog                 [k] callme       
     0.02%    cprog  libc-2.11.2.so        [.] _IO_setb    libc-2.11.2.so        [.] _IO_setb     

(9) perf record -e branch-misses:u -j any_call,any_ret,ind_call,cond ./cprog

# Overhead  Command  Source Shared Object       Source Symbol  Target Shared Object            Target Symbol
# ........  .......  ....................  ..................  ....................  .......................
#
     9.31%    cprog  [unknown]             [.] 00000000        cprog                 [.] sw_3_1             
     4.04%    cprog  cprog                 [.] symbol1         cprog                 [.] hw_1_1             
     4.03%    cprog  cprog                 [.] lr_addr         cprog                 [.] sw_4_2             
     4.03%    cprog  cprog                 [.] sw_4_2          cprog                 [.] lr_addr            
     4.00%    cprog  [unknown]             [.] 00000000        cprog                 [.] callme             
     3.88%    cprog  cprog                 [.] ctr_addr        cprog                 [.] sw_4_1             
     3.87%    cprog  cprog                 [.] sw_4_2          cprog                 [.] callme             
     3.86%    cprog  cprog                 [.] symbol2         cprog                 [.] hw_1_2             
     3.86%    cprog  cprog                 [.] success_3_1_3   cprog                 [.] sw_3_1             
     2.49%    cprog  cprog                 [.] sw_4_1          cprog                 [.] ctr_addr           
     2.47%    cprog  cprog                 [.] hw_1_1          cprog                 [.] symbol1            
     2.47%    cprog  cprog                 [.] sw_3_1_1        cprog                 [.] sw_3_1             
     2.47%    cprog  cprog                 [.] sw_3_1          cprog                 [.] sw_3_1_1           
     2.47%    cprog  cprog                 [.] callme          cprog                 [.] hw_1_1             
     2.47%    cprog  cprog                 [.] callme          cprog                 [.] sw_3_1             
     2.47%    cprog  cprog                 [.] hw_1_2          cprog                 [.] symbol2            
     2.47%    cprog  cprog                 [.] hw_2_1          cprog                 [.] address1           
     2.47%    cprog  cprog                 [.] back1           cprog                 [.] callme             
     2.47%    cprog  cprog                 [.] sw_3_1_3        cprog                 [.] sw_3_1             
     2.47%    cprog  cprog                 [.] sw_3_1          cprog                 [.] sw_3_1_3           
     2.47%    cprog  cprog                 [.] sw_3_1          cprog                 [.] callme             
     2.47%    cprog  cprog                 [.] callme          cprog                 [.] hw_1_2             
     2.47%    cprog  cprog                 [.] callme          cprog                 [.] sw_4_2             
     2.46%    cprog  cprog                 [.] sw_3_1_2        cprog                 [.] sw_3_1             
     2.46%    cprog  cprog                 [.] sw_3_1          cprog                 [.] sw_3_1_2           
     1.57%    cprog  cprog                 [.] success_3_1_2   cprog                 [.] sw_3_1             
     1.57%    cprog  cprog                 [.] sw_3_1          cprog                 [.] success_3_1_2      
     1.57%    cprog  cprog                 [.] hw_1_1          cprog                 [.] callme             
     1.56%    cprog  cprog                 [.] hw_2_2          cprog                 [.] address2           
     1.56%    cprog  cprog                 [.] back2           cprog                 [.] callme             
     1.56%    cprog  cprog                 [.] sw_3_2          cprog                 [.] callme             
     1.56%    cprog  cprog                 [.] callme          cprog                 [.] sw_3_2             
     1.41%    cprog  cprog                 [.] success_3_1_1   cprog                 [.] sw_3_1             
     1.41%    cprog  cprog                 [.] sw_3_1          cprog                 [.] success_3_1_1      
     1.40%    cprog  cprog                 [.] sw_4_1          cprog                 [.] callme             
     1.39%    cprog  cprog                 [.] hw_1_2          cprog                 [.] callme             
     1.39%    cprog  cprog                 [.] sw_3_1          cprog                 [.] success_3_1_3      
     1.39%    cprog  cprog                 [.] callme          cprog                 [.] main               
     0.14%    cprog  [unknown]             [.] 0xf7d72328      [unknown]             [.] 0xf7d72320         
     0.03%    cprog  [unknown]             [k] 00000000        cprog                 [k] callme             
     0.01%    cprog  libc-2.11.2.so        [.] _IO_doallocbuf  libc-2.11.2.so        [.] _IO_doallocbuf     
     0.01%    cprog  libc-2.11.2.so        [.] printf          cprog                 [.] main               
     0.01%    cprog  libc-2.11.2.so        [.] _IO_doallocbuf  libc-2.11.2.so        [.] _IO_file_doallocate
     0.01%    cprog  ld-2.11.2.so          [.] malloc          [unknown]             [.] 0xf7d8b380         
     0.01%    cprog  cprog                 [.] main            [unknown]             [.] 0x0fe7f63c         
     0.01%    cprog  [unknown]             [.] 0xf7d8b388      ld-2.11.2.so          [.] __libc_memalign    
     0.01%    cprog  [unknown]             [.] 00000000        ld-2.11.2.so          [.] malloc             

Please refer to the V4 version of the patchset to learn about the sample test case and its makefile.

Anshuman Khandual (11):
  perf: Add PERF_SAMPLE_BRANCH_COND
  perf, tool: Conditional branch filter 'cond' added to perf record
  x86, perf: Add conditional branch filtering support
  perf, documentation: Description for conditional branch filter
  powerpc, perf: Re-arrange BHRB processing
  powerpc, perf: Re-arrange PMU based branch filter processing in POWER8
  powerpc, perf: Change the name of HW PMU branch filter tracking variable
  powerpc, lib: Add new branch analysis support functions
  powerpc, perf: Enable SW filtering in branch stack sampling framework
  power8, perf: Adapt BHRB PMU configuration to work with SW filters
  powerpc, perf: Enable privilege mode SW branch filters

 arch/powerpc/include/asm/code-patching.h     |  16 ++
 arch/powerpc/include/asm/perf_event_server.h |   6 +-
 arch/powerpc/lib/code-patching.c             |  80 +++++++
 arch/powerpc/perf/core-book3s.c              | 323 ++++++++++++++++++++++-----
 arch/powerpc/perf/power8-pmu.c               |  70 ++++--
 arch/x86/kernel/cpu/perf_event_intel_lbr.c   |   5 +
 include/uapi/linux/perf_event.h              |   3 +-
 tools/perf/Documentation/perf-record.txt     |   3 +-
 tools/perf/builtin-record.c                  |   1 +
 9 files changed, 429 insertions(+), 78 deletions(-)

-- 
1.7.11.7

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [V6 01/11] perf: Add PERF_SAMPLE_BRANCH_COND
  2014-05-05  9:09 [V6 00/11] perf: New conditional branch filter Anshuman Khandual
@ 2014-05-05  9:09 ` Anshuman Khandual
  2014-05-05  9:09 ` [V6 02/11] perf, tool: Conditional branch filter 'cond' added to perf record Anshuman Khandual
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 18+ messages in thread
From: Anshuman Khandual @ 2014-05-05  9:09 UTC (permalink / raw)
  To: linuxppc-dev, linux-kernel
  Cc: mikey, ak, eranian, michael, acme, sukadev, mingo

This patch introduces a new branch filter, PERF_SAMPLE_BRANCH_COND, which
extends the existing perf ABI. Architectures can provide this
functionality either with HW filtering support (if present) or with
SW filtering of captured branch instructions.
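
From user space, the new bit would be requested through the
branch_sample_type field of struct perf_event_attr. The following is
only a minimal sketch, assuming a kernel and UAPI header that already
carry PERF_SAMPLE_BRANCH_COND; the helper name init_cond_branch_attr
and the chosen event and sample period are illustrative and not part
of this patchset.

```c
#include <linux/perf_event.h>
#include <string.h>

/*
 * Hypothetical helper: fill in an attr that samples CPU cycles and asks
 * for a branch stack restricted to user-space conditional branches.
 */
void init_cond_branch_attr(struct perf_event_attr *attr)
{
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->type = PERF_TYPE_HARDWARE;
	attr->config = PERF_COUNT_HW_CPU_CYCLES;
	attr->sample_period = 100000;	/* arbitrary value for illustration */

	/* Branch stack sampling must be requested in sample_type ... */
	attr->sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_BRANCH_STACK;

	/* ... and the filter goes into branch_sample_type. */
	attr->branch_sample_type = PERF_SAMPLE_BRANCH_USER |
				   PERF_SAMPLE_BRANCH_COND;

	attr->exclude_kernel = 1;
	attr->exclude_hv = 1;
}
```

The filled attr would then be passed to the perf_event_open system
call. Whether the request succeeds depends on the HW or SW filtering
support of the architecture.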

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Reviewed-by: Stephane Eranian <eranian@google.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
---
 include/uapi/linux/perf_event.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 853bc1c..696f69b4 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -163,8 +163,9 @@ enum perf_branch_sample_type {
 	PERF_SAMPLE_BRANCH_ABORT_TX	= 1U << 7, /* transaction aborts */
 	PERF_SAMPLE_BRANCH_IN_TX	= 1U << 8, /* in transaction */
 	PERF_SAMPLE_BRANCH_NO_TX	= 1U << 9, /* not in transaction */
+	PERF_SAMPLE_BRANCH_COND		= 1U << 10, /* conditional branches */
 
-	PERF_SAMPLE_BRANCH_MAX		= 1U << 10, /* non-ABI */
+	PERF_SAMPLE_BRANCH_MAX		= 1U << 11, /* non-ABI */
 };
 
 #define PERF_SAMPLE_BRANCH_PLM_ALL \
-- 
1.7.11.7

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [V6 02/11] perf, tool: Conditional branch filter 'cond' added to perf record
  2014-05-05  9:09 [V6 00/11] perf: New conditional branch filter Anshuman Khandual
  2014-05-05  9:09 ` [V6 01/11] perf: Add PERF_SAMPLE_BRANCH_COND Anshuman Khandual
@ 2014-05-05  9:09 ` Anshuman Khandual
  2014-05-05  9:09 ` [V6 03/11] x86, perf: Add conditional branch filtering support Anshuman Khandual
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 18+ messages in thread
From: Anshuman Khandual @ 2014-05-05  9:09 UTC (permalink / raw)
  To: linuxppc-dev, linux-kernel
  Cc: mikey, ak, eranian, michael, acme, sukadev, mingo

Add perf record support for the new branch stack filter criterion
PERF_SAMPLE_BRANCH_COND.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Reviewed-by: Stephane Eranian <eranian@google.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
---
 tools/perf/builtin-record.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 8ce62ef..dfe6b9d 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -583,6 +583,7 @@ static const struct branch_mode branch_modes[] = {
 	BRANCH_OPT("abort_tx", PERF_SAMPLE_BRANCH_ABORT_TX),
 	BRANCH_OPT("in_tx", PERF_SAMPLE_BRANCH_IN_TX),
 	BRANCH_OPT("no_tx", PERF_SAMPLE_BRANCH_NO_TX),
+	BRANCH_OPT("cond", PERF_SAMPLE_BRANCH_COND),
 	BRANCH_END
 };
 
-- 
1.7.11.7

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [V6 03/11] x86, perf: Add conditional branch filtering support
  2014-05-05  9:09 [V6 00/11] perf: New conditional branch filter Anshuman Khandual
  2014-05-05  9:09 ` [V6 01/11] perf: Add PERF_SAMPLE_BRANCH_COND Anshuman Khandual
  2014-05-05  9:09 ` [V6 02/11] perf, tool: Conditional branch filter 'cond' added to perf record Anshuman Khandual
@ 2014-05-05  9:09 ` Anshuman Khandual
  2014-05-05  9:09 ` [V6 04/11] perf, documentation: Description for conditional branch filter Anshuman Khandual
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 18+ messages in thread
From: Anshuman Khandual @ 2014-05-05  9:09 UTC (permalink / raw)
  To: linuxppc-dev, linux-kernel
  Cc: mikey, ak, eranian, michael, acme, sukadev, mingo

This patch adds conditional branch filtering support,
enabling it for PERF_SAMPLE_BRANCH_COND in the perf branch
stack sampling framework by utilizing an available
software filter X86_BR_JCC.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Reviewed-by: Stephane Eranian <eranian@google.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
---
 arch/x86/kernel/cpu/perf_event_intel_lbr.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_lbr.c b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
index d82d155..9dd2459 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_lbr.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
@@ -384,6 +384,9 @@ static void intel_pmu_setup_sw_lbr_filter(struct perf_event *event)
 	if (br_type & PERF_SAMPLE_BRANCH_NO_TX)
 		mask |= X86_BR_NO_TX;
 
+	if (br_type & PERF_SAMPLE_BRANCH_COND)
+		mask |= X86_BR_JCC;
+
 	/*
 	 * stash actual user request into reg, it may
 	 * be used by fixup code for some CPU
@@ -678,6 +681,7 @@ static const int nhm_lbr_sel_map[PERF_SAMPLE_BRANCH_MAX] = {
 	 * NHM/WSM erratum: must include IND_JMP to capture IND_CALL
 	 */
 	[PERF_SAMPLE_BRANCH_IND_CALL] = LBR_IND_CALL | LBR_IND_JMP,
+	[PERF_SAMPLE_BRANCH_COND]     = LBR_JCC,
 };
 
 static const int snb_lbr_sel_map[PERF_SAMPLE_BRANCH_MAX] = {
@@ -689,6 +693,7 @@ static const int snb_lbr_sel_map[PERF_SAMPLE_BRANCH_MAX] = {
 	[PERF_SAMPLE_BRANCH_ANY_CALL]	= LBR_REL_CALL | LBR_IND_CALL
 					| LBR_FAR,
 	[PERF_SAMPLE_BRANCH_IND_CALL]	= LBR_IND_CALL,
+	[PERF_SAMPLE_BRANCH_COND]       = LBR_JCC,
 };
 
 /* core */
-- 
1.7.11.7

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [V6 04/11] perf, documentation: Description for conditional branch filter
  2014-05-05  9:09 [V6 00/11] perf: New conditional branch filter Anshuman Khandual
                   ` (2 preceding siblings ...)
  2014-05-05  9:09 ` [V6 03/11] x86, perf: Add conditional branch filtering support Anshuman Khandual
@ 2014-05-05  9:09 ` Anshuman Khandual
  2014-05-05  9:09 ` [V6 05/11] powerpc, perf: Re-arrange BHRB processing Anshuman Khandual
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 18+ messages in thread
From: Anshuman Khandual @ 2014-05-05  9:09 UTC (permalink / raw)
  To: linuxppc-dev, linux-kernel
  Cc: mikey, ak, eranian, michael, acme, sukadev, mingo

Add documentation for the conditional branch filter.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Reviewed-by: Stephane Eranian <eranian@google.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
---
 tools/perf/Documentation/perf-record.txt | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index c71b0f3..d460049 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -184,9 +184,10 @@ following filters are defined:
 	- in_tx: only when the target is in a hardware transaction
 	- no_tx: only when the target is not in a hardware transaction
 	- abort_tx: only when the target is a hardware transaction abort
+	- cond: conditional branches
 
 +
-The option requires at least one branch type among any, any_call, any_ret, ind_call.
+The option requires at least one branch type among any, any_call, any_ret, ind_call, cond.
 The privilege levels may be omitted, in which case, the privilege levels of the associated
 event are applied to the branch filter. Both kernel (k) and hypervisor (hv) privilege
 levels are subject to permissions.  When sampling on multiple events, branch stack sampling
-- 
1.7.11.7

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [V6 05/11] powerpc, perf: Re-arrange BHRB processing
  2014-05-05  9:09 [V6 00/11] perf: New conditional branch filter Anshuman Khandual
                   ` (3 preceding siblings ...)
  2014-05-05  9:09 ` [V6 04/11] perf, documentation: Description for conditional branch filter Anshuman Khandual
@ 2014-05-05  9:09 ` Anshuman Khandual
  2014-05-05  9:09 ` [V6 06/11] powerpc, perf: Re-arrange PMU based branch filter processing in POWER8 Anshuman Khandual
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 18+ messages in thread
From: Anshuman Khandual @ 2014-05-05  9:09 UTC (permalink / raw)
  To: linuxppc-dev, linux-kernel
  Cc: mikey, ak, eranian, michael, acme, sukadev, mingo

This patch cleans up some existing indentation problems and
re-organizes the BHRB processing code with a helper function
named `update_branch_entry`, making it more readable. This patch
does not change any functionality.
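
The entry pairing that the re-arranged loop performs can be sketched
outside the kernel as below. This is only an illustration under stated
assumptions: the BHRB_* bit values are assumed (they are not shown in
this patch), the raw entries come from an array instead of read_bhrb(),
and decode_bhrb, struct branch_entry and the resolve_to callback (a
stand-in for power_pmu_bhrb_to()) are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed bit layout of a raw BHRB entry; not taken from this patch. */
#define BHRB_PREDICTION	0x1ULL
#define BHRB_TARGET	0x2ULL
#define BHRB_EA		0xFFFFFFFFFFFFFFFCULL

struct branch_entry {
	uint64_t from;
	uint64_t to;
	int mispred;
};

/* Hypothetical stand-in for power_pmu_bhrb_to(); may be NULL. */
typedef uint64_t (*resolve_to_fn)(uint64_t from);

/* Decode raw entries (most recent first); returns entries written. */
size_t decode_bhrb(const uint64_t *raw, size_t nr_raw,
		   struct branch_entry *out, size_t nr_out,
		   resolve_to_fn resolve_to)
{
	size_t r = 0, u = 0;

	while (r < nr_raw && u < nr_out) {
		uint64_t val = raw[r++];
		uint64_t addr, to;
		int pred;

		if (!val)
			break;		/* terminal marker */

		addr = val & BHRB_EA;
		pred = (val & BHRB_PREDICTION) != 0;
		if (!addr)
			continue;	/* invalid entry */

		if (val & BHRB_TARGET) {
			/* Target entry: the next entry holds the from address */
			to = addr;
			val = (r < nr_raw) ? raw[r++] : 0;
			addr = val & BHRB_EA;
			if (val & BHRB_TARGET) {
				/* Two targets in a row: rewind and retry */
				r--;
				addr = 0;
			}
		} else {
			/* I-form or B-form: target comes from the instruction */
			to = resolve_to ? resolve_to(addr) : 0;
		}

		out[u].from = addr;
		out[u].to = to;
		out[u].mispred = pred;
		u++;
	}
	return u;
}
```

Passing NULL as resolve_to leaves the target of immediate-form branches
as zero, which keeps the sketch independent of reading instructions
from memory.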

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
---
 arch/powerpc/perf/core-book3s.c | 102 ++++++++++++++++++++--------------------
 1 file changed, 52 insertions(+), 50 deletions(-)

diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 4520c93..66bea54 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -402,11 +402,21 @@ static __u64 power_pmu_bhrb_to(u64 addr)
 	return target - (unsigned long)&instr + addr;
 }
 
+/* Update individual branch entry */
+void update_branch_entry(struct cpu_hw_events *cpuhw, int u_index, u64 from, u64 to, int pred)
+{
+	cpuhw->bhrb_entries[u_index].from = from;
+	cpuhw->bhrb_entries[u_index].to = to;
+	cpuhw->bhrb_entries[u_index].mispred = pred;
+	cpuhw->bhrb_entries[u_index].predicted = ~pred;
+	return;
+}
+
 /* Processing BHRB entries */
 void power_pmu_bhrb_read(struct cpu_hw_events *cpuhw)
 {
 	u64 val;
-	u64 addr;
+	u64 addr, tmp;
 	int r_index, u_index, pred;
 
 	r_index = 0;
@@ -417,62 +427,54 @@ void power_pmu_bhrb_read(struct cpu_hw_events *cpuhw)
 		if (!val)
 			/* Terminal marker: End of valid BHRB entries */
 			break;
-		else {
-			addr = val & BHRB_EA;
-			pred = val & BHRB_PREDICTION;
 
-			if (!addr)
-				/* invalid entry */
-				continue;
+		addr = val & BHRB_EA;
+		pred = val & BHRB_PREDICTION;
 
-			/* Branches are read most recent first (ie. mfbhrb 0 is
-			 * the most recent branch).
-			 * There are two types of valid entries:
-			 * 1) a target entry which is the to address of a
-			 *    computed goto like a blr,bctr,btar.  The next
-			 *    entry read from the bhrb will be branch
-			 *    corresponding to this target (ie. the actual
-			 *    blr/bctr/btar instruction).
-			 * 2) a from address which is an actual branch.  If a
-			 *    target entry proceeds this, then this is the
-			 *    matching branch for that target.  If this is not
-			 *    following a target entry, then this is a branch
-			 *    where the target is given as an immediate field
-			 *    in the instruction (ie. an i or b form branch).
-			 *    In this case we need to read the instruction from
-			 *    memory to determine the target/to address.
+		if (!addr)
+			/* invalid entry */
+			continue;
+
+		/* Branches are read most recent first (ie. mfbhrb 0 is
+		 * the most recent branch).
+		 * There are two types of valid entries:
+		 * 1) a target entry which is the to address of a
+		 *    computed goto like a blr,bctr,btar.  The next
+		 *    entry read from the bhrb will be branch
+		 *    corresponding to this target (ie. the actual
+		 *    blr/bctr/btar instruction).
+		 * 2) a from address which is an actual branch.  If a
+		 *    target entry proceeds this, then this is the
+		 *    matching branch for that target.  If this is not
+		 *    following a target entry, then this is a branch
+		 *    where the target is given as an immediate field
+		 *    in the instruction (ie. an i or b form branch).
+		 *    In this case we need to read the instruction from
+		 *    memory to determine the target/to address.
+		 */
+		if (val & BHRB_TARGET) {
+			/* Target branches use two entries
+			 * (ie. computed gotos/XL form)
 			 */
+			tmp = addr;
 
+			/* Get from address in next entry */
+			val = read_bhrb(r_index++);
+			addr = val & BHRB_EA;
 			if (val & BHRB_TARGET) {
-				/* Target branches use two entries
-				 * (ie. computed gotos/XL form)
-				 */
-				cpuhw->bhrb_entries[u_index].to = addr;
-				cpuhw->bhrb_entries[u_index].mispred = pred;
-				cpuhw->bhrb_entries[u_index].predicted = ~pred;
-
-				/* Get from address in next entry */
-				val = read_bhrb(r_index++);
-				addr = val & BHRB_EA;
-				if (val & BHRB_TARGET) {
-					/* Shouldn't have two targets in a
-					   row.. Reset index and try again */
-					r_index--;
-					addr = 0;
-				}
-				cpuhw->bhrb_entries[u_index].from = addr;
-			} else {
-				/* Branches to immediate field 
-				   (ie I or B form) */
-				cpuhw->bhrb_entries[u_index].from = addr;
-				cpuhw->bhrb_entries[u_index].to =
-					power_pmu_bhrb_to(addr);
-				cpuhw->bhrb_entries[u_index].mispred = pred;
-				cpuhw->bhrb_entries[u_index].predicted = ~pred;
+				/* Shouldn't have two targets in a
+				   row.. Reset index and try again */
+				r_index--;
+				addr = 0;
 			}
-			u_index++;
-
+			update_branch_entry(cpuhw, u_index, addr, tmp, pred);
+		} else {
+			/* Branches to immediate field
+			   (ie I or B form) */
+			tmp = power_pmu_bhrb_to(addr);
+			update_branch_entry(cpuhw, u_index, addr, tmp, pred);
 		}
+		u_index++;
 	}
 	cpuhw->bhrb_stack.nr = u_index;
 	return;
-- 
1.7.11.7

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [V6 06/11] powerpc, perf: Re-arrange PMU based branch filter processing in POWER8
  2014-05-05  9:09 [V6 00/11] perf: New conditional branch filter Anshuman Khandual
                   ` (4 preceding siblings ...)
  2014-05-05  9:09 ` [V6 05/11] powerpc, perf: Re-arrange BHRB processing Anshuman Khandual
@ 2014-05-05  9:09 ` Anshuman Khandual
  2014-05-05  9:09 ` [V6 07/11] powerpc, perf: Change the name of HW PMU branch filter tracking variable Anshuman Khandual
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 18+ messages in thread
From: Anshuman Khandual @ 2014-05-05  9:09 UTC (permalink / raw)
  To: linuxppc-dev, linux-kernel
  Cc: mikey, ak, eranian, michael, acme, sukadev, mingo

This patch does some code re-arrangements to make it clear that
it ignores any separate privilege level branch filter request
and does not support any combinations of HW PMU branch filters.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
---
 arch/powerpc/perf/power8-pmu.c | 21 +++++++--------------
 1 file changed, 7 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/perf/power8-pmu.c b/arch/powerpc/perf/power8-pmu.c
index fe2763b..13f47f5 100644
--- a/arch/powerpc/perf/power8-pmu.c
+++ b/arch/powerpc/perf/power8-pmu.c
@@ -635,8 +635,6 @@ static int power8_generic_events[] = {
 
 static u64 power8_bhrb_filter_map(u64 branch_sample_type)
 {
-	u64 pmu_bhrb_filter = 0;
-
 	/* BHRB and regular PMU events share the same privilege state
 	 * filter configuration. BHRB is always recorded along with a
 	 * regular PMU event. As the privilege state filter is handled
@@ -644,20 +642,15 @@ static u64 power8_bhrb_filter_map(u64 branch_sample_type)
 	 * PMU event, we ignore any separate BHRB specific request.
 	 */
 
-	/* No branch filter requested */
-	if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY)
-		return pmu_bhrb_filter;
-
-	/* Invalid branch filter options - HW does not support */
-	if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY_RETURN)
-		return -1;
+	/* Ignore user, kernel, hv bits */
+	branch_sample_type &= ~PERF_SAMPLE_BRANCH_PLM_ALL;
 
-	if (branch_sample_type & PERF_SAMPLE_BRANCH_IND_CALL)
-		return -1;
+	/* No branch filter requested */
+	if (branch_sample_type == PERF_SAMPLE_BRANCH_ANY)
+		return 0;
 
-	if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY_CALL) {
-		pmu_bhrb_filter |= POWER8_MMCRA_IFM1;
-		return pmu_bhrb_filter;
+	if (branch_sample_type == PERF_SAMPLE_BRANCH_ANY_CALL) {
+		return POWER8_MMCRA_IFM1;
 	}
 
 	/* Every thing else is unsupported */
-- 
1.7.11.7

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [V6 07/11] powerpc, perf: Change the name of HW PMU branch filter tracking variable
  2014-05-05  9:09 [V6 00/11] perf: New conditional branch filter Anshuman Khandual
                   ` (5 preceding siblings ...)
  2014-05-05  9:09 ` [V6 06/11] powerpc, perf: Re-arrange PMU based branch filter processing in POWER8 Anshuman Khandual
@ 2014-05-05  9:09 ` Anshuman Khandual
  2014-05-05  9:09 ` [V6 08/11] powerpc, lib: Add new branch analysis support functions Anshuman Khandual
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 18+ messages in thread
From: Anshuman Khandual @ 2014-05-05  9:09 UTC (permalink / raw)
  To: linuxppc-dev, linux-kernel
  Cc: mikey, ak, eranian, michael, acme, sukadev, mingo

This patch renames the variable 'bhrb_filter' to 'bhrb_hw_filter' so
that a second variable, which will track SW filters in generic powerpc
book3s code, can be added in the subsequent patch. This patch does not
change any functionality.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
---
 arch/powerpc/perf/core-book3s.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 66bea54..1d7e909 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -47,7 +47,7 @@ struct cpu_hw_events {
 	int n_txn_start;
 
 	/* BHRB bits */
-	u64				bhrb_filter;	/* BHRB HW branch filter */
+	u64				bhrb_hw_filter;	/* BHRB HW branch filter */
 	int				bhrb_users;
 	void				*bhrb_context;
 	struct	perf_branch_stack	bhrb_stack;
@@ -1298,7 +1298,7 @@ static void power_pmu_enable(struct pmu *pmu)
 
 	mb();
 	if (cpuhw->bhrb_users)
-		ppmu->config_bhrb(cpuhw->bhrb_filter);
+		ppmu->config_bhrb(cpuhw->bhrb_hw_filter);
 
 	write_mmcr0(cpuhw, mmcr0);
 
@@ -1405,7 +1405,7 @@ nocheck:
  out:
 	if (has_branch_stack(event)) {
 		power_pmu_bhrb_enable(event);
-		cpuhw->bhrb_filter = ppmu->bhrb_filter_map(
+		cpuhw->bhrb_hw_filter = ppmu->bhrb_filter_map(
 					event->attr.branch_sample_type);
 	}
 
@@ -1788,10 +1788,10 @@ static int power_pmu_event_init(struct perf_event *event)
 	err = power_check_constraints(cpuhw, events, cflags, n + 1);
 
 	if (has_branch_stack(event)) {
-		cpuhw->bhrb_filter = ppmu->bhrb_filter_map(
+		cpuhw->bhrb_hw_filter = ppmu->bhrb_filter_map(
 					event->attr.branch_sample_type);
 
-		if(cpuhw->bhrb_filter == -1)
+		if(cpuhw->bhrb_hw_filter == -1)
 			return -EOPNOTSUPP;
 	}
 
-- 
1.7.11.7

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [V6 08/11] powerpc, lib: Add new branch analysis support functions
  2014-05-05  9:09 [V6 00/11] perf: New conditional branch filter Anshuman Khandual
                   ` (6 preceding siblings ...)
  2014-05-05  9:09 ` [V6 07/11] powerpc, perf: Change the name of HW PMU branch filter tracking variable Anshuman Khandual
@ 2014-05-05  9:09 ` Anshuman Khandual
  2014-05-05  9:09 ` [V6 09/11] powerpc, perf: Enable SW filtering in branch stack sampling framework Anshuman Khandual
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 18+ messages in thread
From: Anshuman Khandual @ 2014-05-05  9:09 UTC (permalink / raw)
  To: linuxppc-dev, linux-kernel
  Cc: mikey, ak, eranian, michael, acme, sukadev, mingo

Add generic powerpc branch analysis support to the code patching
library. The subsequent patch uses it for SW based filtering of
branch records in perf.
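
The classification can be sketched in stand-alone C as below. This is
not the code from the patch: it is a variation that additionally
decodes the XL-form extended opcode (assumed here to be 16 for bclr,
528 for bcctr and 560 for bctar) so that branches to LR, CTR and TAR
are told apart. All function and macro names in the sketch are
hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Primary opcodes of the three branch forms */
#define OP_IFORM	18u
#define OP_BFORM	16u
#define OP_XLFORM	19u

/* Assumed XL-form extended opcodes: bclr, bcctr, bctar */
#define XO_BCLR		16u
#define XO_BCCTR	528u
#define XO_BCTAR	560u

#define BO_ALWAYS_BITS	0x02800000u	/* same value as BO_ALWAYS */
#define LK_BIT		0x1u		/* same value as BRANCH_SET_LINK */

static unsigned int primary_opcode(uint32_t instr)
{
	return (instr >> 26) & 0x3F;
}

static unsigned int extended_opcode(uint32_t instr)
{
	return (instr >> 1) & 0x3FF;
}

/* Opcode 19 also holds non-branch instructions, so check the XO too */
static bool is_xlform_branch(uint32_t instr)
{
	unsigned int xo = extended_opcode(instr);

	return primary_opcode(instr) == OP_XLFORM &&
	       (xo == XO_BCLR || xo == XO_BCCTR || xo == XO_BCTAR);
}

static bool is_any_branch(uint32_t instr)
{
	unsigned int op = primary_opcode(instr);

	return op == OP_IFORM || op == OP_BFORM || is_xlform_branch(instr);
}

/* Return: branch to LR that does not set the link register */
bool branch_is_return(uint32_t instr)
{
	return is_xlform_branch(instr) &&
	       extended_opcode(instr) == XO_BCLR && !(instr & LK_BIT);
}

/* Conditional: B-form or XL-form whose BO field is not "branch always" */
bool branch_is_conditional(uint32_t instr)
{
	if (primary_opcode(instr) != OP_BFORM && !is_xlform_branch(instr))
		return false;
	return (instr & BO_ALWAYS_BITS) != BO_ALWAYS_BITS;
}

/* Call: any branch form with the link bit set */
bool branch_is_call(uint32_t instr)
{
	return is_any_branch(instr) && (instr & LK_BIT);
}

/* Indirect call: XL-form branch with the link bit set */
bool branch_is_indirect_call(uint32_t instr)
{
	return is_xlform_branch(instr) && (instr & LK_BIT);
}
```

For example, blr (0x4E800020) would be classified as a return, beq
(0x41820000) as a conditional branch, and bctrl (0x4E800421) as both a
call and an indirect call.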

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/code-patching.h | 16 +++++++
 arch/powerpc/lib/code-patching.c         | 80 ++++++++++++++++++++++++++++++++
 2 files changed, 96 insertions(+)

diff --git a/arch/powerpc/include/asm/code-patching.h b/arch/powerpc/include/asm/code-patching.h
index 97e02f9..39919d4 100644
--- a/arch/powerpc/include/asm/code-patching.h
+++ b/arch/powerpc/include/asm/code-patching.h
@@ -22,6 +22,16 @@
 #define BRANCH_SET_LINK	0x1
 #define BRANCH_ABSOLUTE	0x2
 
+#define XL_FORM_LR  0x4C000020
+#define XL_FORM_CTR 0x4C000420
+#define XL_FORM_TAR 0x4C000460
+
+#define BO_ALWAYS    0x02800000
+#define BO_CTR       0x02000000
+#define BO_CRBI_OFF  0x00800000
+#define BO_CRBI_ON   0x01800000
+#define BO_CRBI_HINT 0x00400000
+
 unsigned int create_branch(const unsigned int *addr,
 			   unsigned long target, int flags);
 unsigned int create_cond_branch(const unsigned int *addr,
@@ -56,4 +66,10 @@ static inline unsigned long ppc_function_entry(void *func)
 #endif
 }
 
+/* Perf branch filters */
+bool instr_is_return_branch(unsigned int instr);
+bool instr_is_conditional_branch(unsigned int instr);
+bool instr_is_func_call(unsigned int instr);
+bool instr_is_indirect_func_call(unsigned int instr);
+
 #endif /* _ASM_POWERPC_CODE_PATCHING_H */
diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index d5edbeb..a06f8b3 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -77,6 +77,7 @@ static unsigned int branch_opcode(unsigned int instr)
 	return (instr >> 26) & 0x3F;
 }
 
+/* Forms of branch instruction */
 static int instr_is_branch_iform(unsigned int instr)
 {
 	return branch_opcode(instr) == 18;
@@ -87,6 +88,85 @@ static int instr_is_branch_bform(unsigned int instr)
 	return branch_opcode(instr) == 16;
 }
 
+static int instr_is_branch_xlform(unsigned int instr)
+{
+	return branch_opcode(instr) == 19;
+}
+
+/* Classification of XL-form instruction */
+static int is_xlform_lr(unsigned int instr)
+{
+	return (instr & XL_FORM_LR) == XL_FORM_LR;
+}
+
+/* BO field analysis (B-form or XL-form) */
+static int is_bo_always(unsigned int instr)
+{
+	return (instr & BO_ALWAYS) == BO_ALWAYS;
+}
+
+/* Link bit is set */
+static int is_branch_link_set(unsigned int instr)
+{
+	return (instr & BRANCH_SET_LINK) == BRANCH_SET_LINK;
+}
+
+/* 
+ * Generic software implemented branch filters used
+ * by perf branch stack sampling when PMU does not
+ * process them for some reason.
+ */
+
+/* PERF_SAMPLE_BRANCH_ANY_RETURN */
+bool instr_is_return_branch(unsigned int instr)
+{
+	/*
+	 * Conditional and unconditional branch to LR register
+	 * without setting the link register.
+	 */
+	if (is_xlform_lr(instr) && !is_branch_link_set(instr))
+		return true;
+
+	return false;
+}
+
+/* PERF_SAMPLE_BRANCH_COND */
+bool instr_is_conditional_branch(unsigned int instr)
+{
+	/* I-form instruction - excluded */
+	if (instr_is_branch_iform(instr))
+		return false;
+
+	/* B-form or XL-form instruction */
+	if (instr_is_branch_bform(instr) || instr_is_branch_xlform(instr))  {
+
+		/* Not branch always */
+		if (!is_bo_always(instr))
+			return true;
+	}
+	return false;
+}
+
+/* PERF_SAMPLE_BRANCH_ANY_CALL */
+bool instr_is_func_call(unsigned int instr)
+{
+	/* LR should be set */
+	if (is_branch_link_set(instr))
+		return true;
+
+	return false;
+}
+
+/* PERF_SAMPLE_BRANCH_IND_CALL */
+bool instr_is_indirect_func_call(unsigned int instr)
+{
+	/* XL-form instruction with LR set */
+	if (instr_is_branch_xlform(instr) && is_branch_link_set(instr))
+		return true;
+
+	return false;
+}
+
 int instr_is_relative_branch(unsigned int instr)
 {
 	if (instr & BRANCH_ABSOLUTE)
-- 
1.7.11.7

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [V6 09/11] powerpc, perf: Enable SW filtering in branch stack sampling framework
  2014-05-05  9:09 [V6 00/11] perf: New conditional branch filter Anshuman Khandual
                   ` (7 preceding siblings ...)
  2014-05-05  9:09 ` [V6 08/11] powerpc, lib: Add new branch analysis support functions Anshuman Khandual
@ 2014-05-05  9:09 ` Anshuman Khandual
  2014-05-05  9:09 ` [V6 10/11] power8, perf: Adapt BHRB PMU configuration to work with SW filters Anshuman Khandual
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 18+ messages in thread
From: Anshuman Khandual @ 2014-05-05  9:09 UTC (permalink / raw)
  To: linuxppc-dev, linux-kernel
  Cc: mikey, ak, eranian, michael, acme, sukadev, mingo

This patch enables SW based post processing of BHRB captured branches
to be able to meet more user defined branch filtration criteria in the
perf branch stack sampling framework. These changes increase the number
of branch filters and their valid combinations on any powerpc64 server
platform with BHRB support. A summary of the code changes follows.

(1) struct cpu_hw_events

	Introduced two new variables to track various filter values and mask

	(a) bhrb_sw_filter	Tracks SW implemented branch filter flags
	(b) bhrb_filter		Tracks both (SW and HW) branch filter flags

(2) Event creation

	The kernel will figure out supported BHRB branch filters through a PMU
	callback 'bhrb_filter_map'. This function will find out how many of the
	requested branch filters can be supported in the PMU HW. It will not
	try to invalidate any branch filter combinations. Event creation will not
	error out because of lack of HW based branch filters. Meanwhile it will
	track the overall supported branch filters in the 'bhrb_filter' variable.

	Once the PMU callback returns, the kernel will process the user branch
	filter request against available SW filters (bhrb_sw_filter_map) while
	looking at the 'bhrb_filter'. During this phase, all the branch filters
	which are still pending from the user requested list will have to be
	supported in SW, failing which the event creation will error out.

(3) SW branch filter

	During the BHRB data capture inside the PMU interrupt context, each
	of the captured 'perf_branch_entry.from' will be checked for compliance
	with applicable SW branch filters. If the entry does not conform to the
	filter requirements, it will be discarded from the final perf branch
	stack buffer.

(4) Supported SW based branch filters

	(a) PERF_SAMPLE_BRANCH_ANY_RETURN
	(b) PERF_SAMPLE_BRANCH_IND_CALL
	(c) PERF_SAMPLE_BRANCH_ANY_CALL
	(d) PERF_SAMPLE_BRANCH_COND

	Please refer to the patch to understand the classification of instructions
	into these branch filter categories.

(5) Multiple branch filter semantics

	Book3s server implementation follows the same OR semantics (as implemented in
	x86) while dealing with multiple branch filters at any point of time. SW
	branch filter analysis is carried out on the data set captured in the PMU HW.
	So the resulting set of data (after applying the SW filters) will inherently
	be an AND with the HW captured set. Hence any combination of HW and SW branch
	filters will be invalid. HW based branch filters are more efficient and faster
	compared to SW implemented branch filters. So at first the PMU should decide
	whether it can support all the requested branch filters itself or not. In case
	it can support all the branch filters in an OR manner, we don't apply any SW
	branch filter on top of the HW captured set (which is the final set). This
	preserves the OR semantic of multiple branch filters as required. But in case
	the PMU cannot support all the requested branch filters in an OR manner,
	it should not apply any of its filters and leave it up to the SW to handle them
	all. It's the PMU code's responsibility to uphold this protocol to be able to
	conform to the overall OR semantic of perf branch stack sampling framework.
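
The protocol described in (2), (3) and (5) can be sketched in
stand-alone C as below. It is only an illustration under stated
assumptions: the F_* bits, struct filter_plan, plan_filters and
apply_sw_filter are hypothetical names, the HW is assumed to support
only a lone "any call" filter, a request that includes "any" is assumed
to need no filtering at all, and each captured branch is assumed to be
already classified into a bit mask of the categories it belongs to.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical filter bits, standing in for PERF_SAMPLE_BRANCH_* */
enum {
	F_ANY		= 1 << 0,
	F_ANY_CALL	= 1 << 1,
	F_ANY_RETURN	= 1 << 2,
	F_IND_CALL	= 1 << 3,
	F_COND		= 1 << 4,
};

struct filter_plan {
	uint64_t hw;	/* filters applied by the PMU */
	uint64_t sw;	/* filters applied in software afterwards */
};

/*
 * Either the HW handles the whole request or it handles none of it.
 * Mixing HW and SW filters would AND the two sets and break the OR
 * semantics. Returns 0 on success, -1 if a filter is unsupported.
 */
int plan_filters(uint64_t requested, struct filter_plan *plan)
{
	const uint64_t sw_capable = F_ANY_CALL | F_ANY_RETURN |
				    F_IND_CALL | F_COND;

	plan->hw = 0;
	plan->sw = 0;

	if (requested & F_ANY)
		return 0;		/* every branch qualifies */

	if (requested == F_ANY_CALL) {
		plan->hw = F_ANY_CALL;	/* HW covers the full request */
		return 0;
	}

	if (requested & ~sw_capable)
		return -1;		/* nothing can satisfy this */

	plan->sw = requested;		/* SW handles all of them */
	return 0;
}

struct classified_branch {
	uint64_t from;
	uint64_t kinds;		/* F_* categories this branch belongs to */
};

/* Keep entries matching any requested SW filter; returns new count */
size_t apply_sw_filter(struct classified_branch *b, size_t n, uint64_t sw)
{
	size_t i, kept = 0;

	for (i = 0; i < n; i++) {
		if (sw == 0 || (b[i].kinds & sw))
			b[kept++] = b[i];
	}
	return kept;
}
```

With a request of "any call" plus "cond", the plan leaves the HW filter
empty and hands both filters to SW, so a captured branch is kept when
it matches either of them.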

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/perf_event_server.h |   6 +-
 arch/powerpc/perf/core-book3s.c              | 188 ++++++++++++++++++++++++++-
 arch/powerpc/perf/power8-pmu.c               |   2 +-
 3 files changed, 187 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/include/asm/perf_event_server.h b/arch/powerpc/include/asm/perf_event_server.h
index 9ed73714..93a9a8a 100644
--- a/arch/powerpc/include/asm/perf_event_server.h
+++ b/arch/powerpc/include/asm/perf_event_server.h
@@ -19,6 +19,10 @@
 #define MAX_EVENT_ALTERNATIVES	8
 #define MAX_LIMITED_HWCOUNTERS	2
 
+#define for_each_branch_sample_type(x) \
+        for ((x) = PERF_SAMPLE_BRANCH_USER; \
+             (x) < PERF_SAMPLE_BRANCH_MAX; (x) <<= 1)
+
 /*
  * This struct provides the constants and functions needed to
  * describe the PMU on a particular POWER-family CPU.
@@ -35,7 +39,7 @@ struct power_pmu {
 				unsigned long *valp);
 	int		(*get_alternatives)(u64 event_id, unsigned int flags,
 				u64 alt[]);
-	u64             (*bhrb_filter_map)(u64 branch_sample_type);
+	u64             (*bhrb_filter_map)(u64 branch_sample_type, u64 *bhrb_filter);
 	void            (*config_bhrb)(u64 pmu_bhrb_filter);
 	void		(*disable_pmc)(unsigned int pmc, unsigned long mmcr[]);
 	int		(*limited_pmc_event)(u64 event_id);
diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 1d7e909..a94cc43 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -46,8 +46,9 @@ struct cpu_hw_events {
 	unsigned int group_flag;
 	int n_txn_start;
 
-	/* BHRB bits */
 	u64				bhrb_hw_filter;	/* BHRB HW branch filter */
+	u64				bhrb_sw_filter;	/* BHRB SW branch filter */
+	u64				bhrb_filter;	/* Branch filter mask */
 	int				bhrb_users;
 	void				*bhrb_context;
 	struct	perf_branch_stack	bhrb_stack;
@@ -412,6 +413,152 @@ void update_branch_entry(struct cpu_hw_events *cpuhw, int u_index, u64 from, u64
 	return;
 }
 
+/*
+ * Instruction opcode analysis
+ *
+ * Analyse instruction opcodes and classify them
+ * into various branch filter options available.
+ * This follows the standard semantics of OR which
+ * means that instructions which conform to `any`
+ * of the requested branch filters get picked up.
+ */
+static bool check_instruction(unsigned int *addr, u64 sw_filter)
+{
+	if (sw_filter & PERF_SAMPLE_BRANCH_ANY_RETURN) {
+		if (instr_is_return_branch(*addr))
+			return true;
+	}
+
+	if (sw_filter & PERF_SAMPLE_BRANCH_IND_CALL) {
+		if (instr_is_indirect_func_call(*addr))
+			return true;
+	}
+
+	if (sw_filter & PERF_SAMPLE_BRANCH_ANY_CALL) {
+		if (instr_is_func_call(*addr))
+			return true;
+	}
+
+	if (sw_filter & PERF_SAMPLE_BRANCH_COND) {
+		if (instr_is_conditional_branch(*addr))
+			return true;
+	}
+	return false;
+}
+
+/* 
+ * Access the instruction contained in the address and check
+ * whether it complies with the applicable SW branch filters.
+ */
+static bool keep_branch(u64 from, u64 sw_filter)
+{
+	unsigned int instr;
+	bool ret;
+
+	/*
+	 * The "from" branch for every branch record has to go
+	 * through this filter verification. So this quick check
+	 * here for no SW filters will improve performance.
+	 */
+	if (sw_filter == 0)
+		return true;
+
+	if (is_kernel_addr(from)) {
+		return check_instruction((unsigned int *) from, sw_filter);
+	} else {
+		/*
+		 * Userspace address needs to be
+		 * copied first before analysis.
+		 */
+		pagefault_disable();
+		ret =  __get_user_inatomic(instr, (unsigned int __user *) from);
+
+		/*
+		 * If the instruction could not be accessed
+		 * from user space, we still 'okay' the entry.
+		 */
+		if (ret) {
+			pagefault_enable();
+			return true;
+		}
+		pagefault_enable();
+		return check_instruction(&instr, sw_filter);
+	}
+}
+
+/*
+ * Validate whether all the requested branch filters
+ * are getting processed either in the PMU or in SW.
+ */
+static int all_filters_covered(u64 branch_sample_type, u64 bhrb_filter)
+{
+	u64 x;
+
+	if (bhrb_filter == PERF_SAMPLE_BRANCH_ANY)
+		return true;
+
+	for_each_branch_sample_type(x) {
+		if (!(branch_sample_type & x))
+			continue;
+		/*
+		 * Privilege filter requests have been already
+		 * taken care during the base PMU configuration.
+		 */
+		if ((x == PERF_SAMPLE_BRANCH_USER)
+			|| (x == PERF_SAMPLE_BRANCH_KERNEL)
+				|| (x == PERF_SAMPLE_BRANCH_HV))
+			continue;
+
+		/*
+		 * Requested filter not available either
+		 * in PMU or in SW.
+		 */
+		if (!(bhrb_filter & x))
+			return false;
+	}
+	return true;
+}
+
+/* SW implemented branch filters */
+static unsigned int power_sw_filter[] = { PERF_SAMPLE_BRANCH_ANY_CALL,
+					  PERF_SAMPLE_BRANCH_COND,
+					  PERF_SAMPLE_BRANCH_ANY_RETURN,
+					  PERF_SAMPLE_BRANCH_IND_CALL };
+
+/*
+ * Required SW based branch filters
+ *
+ * This is called after figuring out what all branch filters the
+ * PMU HW supports for the requested branch filter set. Here we
+ * will go through all the SW implemented branch filters one by
+ * one and pick them up if they are not already supported in the PMU.
+ */
+static u64 bhrb_sw_filter_map(u64 branch_sample_type, u64 *bhrb_filter)
+{
+	u64 branch_sw_filter = 0;
+	unsigned int i;
+
+	if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY) {
+		WARN_ON(*bhrb_filter != PERF_SAMPLE_BRANCH_ANY);
+		return branch_sw_filter;
+	}
+
+	/*
+	 * PMU supported branch filters must be implemented in SW
+	 * when the PMU is unable to process them for some reason.
+	 */
+	for (i = 0; i < ARRAY_SIZE(power_sw_filter); i++) {
+		if (branch_sample_type & power_sw_filter[i]) {
+			if (!(*bhrb_filter & power_sw_filter[i])) {
+				branch_sw_filter |= power_sw_filter[i];
+				*bhrb_filter |= power_sw_filter[i];
+			}
+		}
+	}
+
+	return branch_sw_filter;
+}
+
 /* Processing BHRB entries */
 void power_pmu_bhrb_read(struct cpu_hw_events *cpuhw)
 {
@@ -474,6 +621,11 @@ void power_pmu_bhrb_read(struct cpu_hw_events *cpuhw)
 			tmp = power_pmu_bhrb_to(addr);
 			update_branch_entry(cpuhw, u_index, addr, tmp, pred);
 		}
+
+		/* Apply SW branch filters and drop the entry if required */
+		if (!keep_branch(cpuhw->bhrb_entries[u_index].from,
+						cpuhw->bhrb_sw_filter))
+			u_index--;
 		u_index++;
 	}
 	cpuhw->bhrb_stack.nr = u_index;
@@ -1297,6 +1449,8 @@ static void power_pmu_enable(struct pmu *pmu)
 	mmcr0 = ebb_switch_in(ebb, cpuhw->mmcr[0]);
 
 	mb();
+
+	/* Enable PMU based branch filters */
 	if (cpuhw->bhrb_users)
 		ppmu->config_bhrb(cpuhw->bhrb_hw_filter);
 
@@ -1405,8 +1559,12 @@ nocheck:
  out:
 	if (has_branch_stack(event)) {
 		power_pmu_bhrb_enable(event);
-		cpuhw->bhrb_hw_filter = ppmu->bhrb_filter_map(
-					event->attr.branch_sample_type);
+		cpuhw->bhrb_hw_filter = ppmu->bhrb_filter_map
+					(event->attr.branch_sample_type,
+							&cpuhw->bhrb_filter);
+		cpuhw->bhrb_sw_filter = bhrb_sw_filter_map
+					(event->attr.branch_sample_type,
+							&cpuhw->bhrb_filter);
 	}
 
 	perf_pmu_enable(event->pmu);
@@ -1787,11 +1945,27 @@ static int power_pmu_event_init(struct perf_event *event)
 	cpuhw = &get_cpu_var(cpu_hw_events);
 	err = power_check_constraints(cpuhw, events, cflags, n + 1);
 
+	/*
+	 * BHRB branch filters implemented in PMU will take
+	 * effect when we enable the event and data set
+	 * collected thereafter will be compliant with those
+	 * branch filters, whereas the SW branch filters will
+	 * be applied during the post processing of BHRB data.
+	 */
 	if (has_branch_stack(event)) {
-		cpuhw->bhrb_hw_filter = ppmu->bhrb_filter_map(
-					event->attr.branch_sample_type);
-
-		if(cpuhw->bhrb_hw_filter == -1)
+		/* Query available PMU branch filter support */
+		cpuhw->bhrb_hw_filter = ppmu->bhrb_filter_map
+				(event->attr.branch_sample_type,
+						&cpuhw->bhrb_filter);
+
+		/* Query available SW branch filter support */
+		cpuhw->bhrb_sw_filter = bhrb_sw_filter_map
+				(event->attr.branch_sample_type,
+						&cpuhw->bhrb_filter);
+
+		/* Check overall coverage of branch filter request */
+		if (!all_filters_covered(event->attr.branch_sample_type,
+						cpuhw->bhrb_filter))
 			return -EOPNOTSUPP;
 	}
 
diff --git a/arch/powerpc/perf/power8-pmu.c b/arch/powerpc/perf/power8-pmu.c
index 13f47f5..699b1dd 100644
--- a/arch/powerpc/perf/power8-pmu.c
+++ b/arch/powerpc/perf/power8-pmu.c
@@ -633,7 +633,7 @@ static int power8_generic_events[] = {
 	[PERF_COUNT_HW_CACHE_MISSES] =			PM_LD_MISS_L1,
 };
 
-static u64 power8_bhrb_filter_map(u64 branch_sample_type)
+static u64 power8_bhrb_filter_map(u64 branch_sample_type, u64 *bhrb_filter)
 {
 	/* BHRB and regular PMU events share the same privilege state
 	 * filter configuration. BHRB is always recorded along with a
-- 
1.7.11.7

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [V6 10/11] power8, perf: Adapt BHRB PMU configuration to work with SW filters
  2014-05-05  9:09 [V6 00/11] perf: New conditional branch filter Anshuman Khandual
                   ` (8 preceding siblings ...)
  2014-05-05  9:09 ` [V6 09/11] powerpc, perf: Enable SW filtering in branch stack sampling framework Anshuman Khandual
@ 2014-05-05  9:09 ` Anshuman Khandual
  2014-05-05  9:09 ` [V6 11/11] powerpc, perf: Enable privilege mode SW branch filters Anshuman Khandual
  2014-05-27 12:09 ` [V6 00/11] perf: New conditional branch filter Stephane Eranian
  11 siblings, 0 replies; 18+ messages in thread
From: Anshuman Khandual @ 2014-05-05  9:09 UTC (permalink / raw)
  To: linuxppc-dev, linux-kernel
  Cc: mikey, ak, eranian, michael, acme, sukadev, mingo

The powerpc kernel now supports SW based branch filters for book3s systems, with
some specific requirements on how HW supported branch filters are handled in order
to preserve the overall OR semantics of the perf branch stack sampling framework.
This patch adapts the BHRB branch filter configuration to meet those protocols.
The POWER8 PMU can handle only one HW based branch filter request at a time.
For all other combinations the PMU passes the request on to SW.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
---
 arch/powerpc/perf/power8-pmu.c | 50 ++++++++++++++++++++++++++++++++++++------
 1 file changed, 43 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/perf/power8-pmu.c b/arch/powerpc/perf/power8-pmu.c
index 699b1dd..4743bde 100644
--- a/arch/powerpc/perf/power8-pmu.c
+++ b/arch/powerpc/perf/power8-pmu.c
@@ -635,6 +635,16 @@ static int power8_generic_events[] = {
 
 static u64 power8_bhrb_filter_map(u64 branch_sample_type, u64 *bhrb_filter)
 {
+	u64 x, pmu_bhrb_filter;
+	pmu_bhrb_filter = 0;
+	*bhrb_filter = 0;
+
+	/* No branch filter requested */
+	if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY) {
+		*bhrb_filter = PERF_SAMPLE_BRANCH_ANY;
+		return pmu_bhrb_filter;
+	}
+
 	/* BHRB and regular PMU events share the same privilege state
 	 * filter configuration. BHRB is always recorded along with a
 	 * regular PMU event. As the privilege state filter is handled
@@ -645,16 +655,42 @@ static u64 power8_bhrb_filter_map(u64 branch_sample_type, u64 *bhrb_filter)
 	/* Ignore user, kernel, hv bits */
 	branch_sample_type &= ~PERF_SAMPLE_BRANCH_PLM_ALL;
 
-	/* No branch filter requested */
-	if (branch_sample_type == PERF_SAMPLE_BRANCH_ANY)
-		return 0;
+	/*
+	 * P8 does not support ORing of PMU HW branch filters. Hence
+	 * if multiple branch filters are requested which includes filters
+	 * supported in PMU, still go ahead and clear the PMU based HW branch
+	 * filter component as in this case all the filters will be processed
+	 * in SW.
+	 */
 
-	if (branch_sample_type == PERF_SAMPLE_BRANCH_ANY_CALL) {
-		return POWER8_MMCRA_IFM1;
+	for_each_branch_sample_type(x) {
+		/* Ignore privilege branch filters */
+		if ((x == PERF_SAMPLE_BRANCH_USER)
+			|| (x == PERF_SAMPLE_BRANCH_KERNEL)
+				|| (x == PERF_SAMPLE_BRANCH_HV))
+			continue;
+
+		if (!(branch_sample_type & x))
+			continue;
+
+		/* Supported individual PMU branch filters */
+		if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY_CALL) {
+			branch_sample_type &= ~PERF_SAMPLE_BRANCH_ANY_CALL;
+			if (branch_sample_type) {
+				/* Multiple branch filters will be processed in SW */
+				pmu_bhrb_filter = 0;
+				*bhrb_filter = 0;
+				return pmu_bhrb_filter;
+			} else {
+				/* Individual branch filter will be processed in PMU */
+				pmu_bhrb_filter |= POWER8_MMCRA_IFM1;
+				*bhrb_filter    |= PERF_SAMPLE_BRANCH_ANY_CALL;
+				return pmu_bhrb_filter;
+			}
+		}
 	}
 
-	/* Every thing else is unsupported */
-	return -1;
+	return pmu_bhrb_filter;
 }
 
 static void power8_config_bhrb(u64 pmu_bhrb_filter)
-- 
1.7.11.7

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [V6 11/11] powerpc, perf: Enable privilege mode SW branch filters
  2014-05-05  9:09 [V6 00/11] perf: New conditional branch filter Anshuman Khandual
                   ` (9 preceding siblings ...)
  2014-05-05  9:09 ` [V6 10/11] power8, perf: Adapt BHRB PMU configuration to work with SW filters Anshuman Khandual
@ 2014-05-05  9:09 ` Anshuman Khandual
  2014-05-27 12:09 ` [V6 00/11] perf: New conditional branch filter Stephane Eranian
  11 siblings, 0 replies; 18+ messages in thread
From: Anshuman Khandual @ 2014-05-05  9:09 UTC (permalink / raw)
  To: linuxppc-dev, linux-kernel
  Cc: mikey, ak, eranian, michael, acme, sukadev, mingo

This patch enables privilege mode SW branch filters. It also modifies
the POWER8 PMU branch filter configuration so that the privilege mode
branch filter implemented as part of the base PMU event configuration
is reflected in the BHRB filter mask. As a result, the SW will skip the
privilege mode branch filters and not try to process them itself.
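
The combination of privilege mode and generic filters can be sketched as
follows. This is a simplified user-space model, not the patch code: the F_*
bits, enum branch_kind and keep() are hypothetical, the kernel address
boundary is an assumed 64-bit value, and the classification of the "from"
instruction is passed in instead of being decoded. The privilege check on
the "to" address acts as an AND gate, and the generic filters on the "from"
instruction are then ORed with each other. Keeping entries when only
privilege bits are requested is a choice made for this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative filter bits; these are not the kernel's values. */
#define F_USER		(1ULL << 0)
#define F_KERNEL	(1ULL << 1)
#define F_COND		(1ULL << 2)
#define F_RETURN	(1ULL << 3)
#define F_PLM		(F_USER | F_KERNEL)

/* Assumed kernel address boundary for a 64-bit powerpc system. */
#define KERNEL_BASE	0xc000000000000000ULL

/* Hypothetical pre-decoded kind of the "from" instruction. */
enum branch_kind { BR_OTHER, BR_COND, BR_RETURN };

static bool is_kernel(uint64_t addr)
{
	return addr >= KERNEL_BASE;
}

static bool keep(uint64_t to, enum branch_kind kind, uint64_t sw_filter)
{
	/* No SW filter requested: keep every entry. */
	if (sw_filter == 0)
		return true;

	/* Privilege filters on the "to" address (AND with the rest). */
	if (sw_filter & F_PLM) {
		bool ok = ((sw_filter & F_USER) && !is_kernel(to)) ||
			  ((sw_filter & F_KERNEL) && is_kernel(to));
		if (!ok)
			return false;
	}

	/* Sketch choice: only privilege bits requested, nothing more to check. */
	if (!(sw_filter & ~F_PLM))
		return true;

	/* Generic filters on the "from" instruction (OR with each other). */
	if ((sw_filter & F_COND) && kind == BR_COND)
		return true;
	if ((sw_filter & F_RETURN) && kind == BR_RETURN)
		return true;

	return false;
}
```

For example, with F_USER | F_RETURN requested, a return branch landing at a
user address is kept, while the same branch landing at or above KERNEL_BASE
is dropped.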

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
---
 arch/powerpc/perf/core-book3s.c | 53 +++++++++++++++++++++++++++++++----------
 arch/powerpc/perf/power8-pmu.c  | 13 ++++++++--
 2 files changed, 52 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index a94cc43..297cddb 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -26,6 +26,9 @@
 #define BHRB_PREDICTION		0x0000000000000001
 #define BHRB_EA			0xFFFFFFFFFFFFFFFCUL
 
+#define POWER_ADDR_USER		0
+#define POWER_ADDR_KERNEL	1
+
 struct cpu_hw_events {
 	int n_events;
 	int n_percpu;
@@ -450,10 +453,10 @@ static bool check_instruction(unsigned int *addr, u64 sw_filter)
  * Access the instruction contained in the address and check
  * whether it complies with the applicable SW branch filters.
  */
-static bool keep_branch(u64 from, u64 sw_filter)
+static bool keep_branch(u64 from, u64 to, u64 sw_filter)
 {
 	unsigned int instr;
-	bool ret;
+	bool to_plm, ret, flag;
 
 	/*
 	 * The "from" branch for every branch record has to go
@@ -463,6 +466,37 @@ static bool keep_branch(u64 from, u64 sw_filter)
 	if (sw_filter == 0)
 		return true;
 
+	to_plm = is_kernel_addr(to) ? POWER_ADDR_KERNEL : POWER_ADDR_USER;
+
+	/*
+	 * Applying privilege mode SW branch filters first on the
+	 * 'to' address makes an AND semantic with the SW generic
+	 * branch filters (OR with each other) being applied on the
+	 * from address thereafter.
+	 */
+
+	/* Ignore PERF_SAMPLE_BRANCH_HV */
+	sw_filter &= ~PERF_SAMPLE_BRANCH_HV;
+
+	/* Privilege mode branch filters for "TO" address */
+	if (sw_filter & PERF_SAMPLE_BRANCH_PLM_ALL) {
+		flag = false;
+
+		if (sw_filter & PERF_SAMPLE_BRANCH_USER) {
+			if(to_plm == POWER_ADDR_USER)
+				flag = true;
+		}
+
+		if (sw_filter & PERF_SAMPLE_BRANCH_KERNEL) {
+			if(to_plm == POWER_ADDR_KERNEL)
+				flag = true;
+		}
+
+		if (!flag)
+			return false;
+	}
+
+	/* Generic branch filters for "FROM" address */
 	if (is_kernel_addr(from)) {
 		return check_instruction((unsigned int *) from, sw_filter);
 	} else {
@@ -501,15 +535,6 @@ static int all_filters_covered(u64 branch_sample_type, u64 bhrb_filter)
 		if (!(branch_sample_type & x))
 			continue;
 		/*
-		 * Privilege filter requests have been already
-		 * taken care during the base PMU configuration.
-		 */
-		if ((x == PERF_SAMPLE_BRANCH_USER)
-			|| (x == PERF_SAMPLE_BRANCH_KERNEL)
-				|| (x == PERF_SAMPLE_BRANCH_HV))
-			continue;
-
-		/*
 		 * Requested filter not available either
 		 * in PMU or in SW.
 		 */
@@ -520,7 +545,10 @@ static int all_filters_covered(u64 branch_sample_type, u64 bhrb_filter)
 }
 
 /* SW implemented branch filters */
-static unsigned int power_sw_filter[] = { PERF_SAMPLE_BRANCH_ANY_CALL,
+static unsigned int power_sw_filter[] = { PERF_SAMPLE_BRANCH_USER,
+					  PERF_SAMPLE_BRANCH_KERNEL,
+					  PERF_SAMPLE_BRANCH_HV,
+					  PERF_SAMPLE_BRANCH_ANY_CALL,
 					  PERF_SAMPLE_BRANCH_COND,
 					  PERF_SAMPLE_BRANCH_ANY_RETURN,
 					  PERF_SAMPLE_BRANCH_IND_CALL };
@@ -624,6 +652,7 @@ void power_pmu_bhrb_read(struct cpu_hw_events *cpuhw)
 
 		/* Apply SW branch filters and drop the entry if required */
 		if (!keep_branch(cpuhw->bhrb_entries[u_index].from,
+					cpuhw->bhrb_entries[u_index].to,
 						cpuhw->bhrb_sw_filter))
 			u_index--;
 		u_index++;
diff --git a/arch/powerpc/perf/power8-pmu.c b/arch/powerpc/perf/power8-pmu.c
index 4743bde..b6e21da 100644
--- a/arch/powerpc/perf/power8-pmu.c
+++ b/arch/powerpc/perf/power8-pmu.c
@@ -649,9 +649,19 @@ static u64 power8_bhrb_filter_map(u64 branch_sample_type, u64 *bhrb_filter)
 	 * filter configuration. BHRB is always recorded along with a
 	 * regular PMU event. As the privilege state filter is handled
 	 * in the basic PMC configuration of the accompanying regular
-	 * PMU event, we ignore any separate BHRB specific request.
+	 * PMU event, we ignore any separate BHRB specific request. But
+	 * this needs to be communicated with the branch filter mask.
 	 */
 
+	if (branch_sample_type & PERF_SAMPLE_BRANCH_USER)
+		*bhrb_filter |= PERF_SAMPLE_BRANCH_USER;
+
+	if (branch_sample_type & PERF_SAMPLE_BRANCH_KERNEL)
+		*bhrb_filter |= PERF_SAMPLE_BRANCH_KERNEL;
+
+	if (branch_sample_type & PERF_SAMPLE_BRANCH_HV)
+		*bhrb_filter |= PERF_SAMPLE_BRANCH_HV;
+
 	/* Ignore user, kernel, hv bits */
 	branch_sample_type &= ~PERF_SAMPLE_BRANCH_PLM_ALL;
 
@@ -679,7 +689,6 @@ static u64 power8_bhrb_filter_map(u64 branch_sample_type, u64 *bhrb_filter)
 			if (branch_sample_type) {
 				/* Multiple branch filters will be processed in SW */
 				pmu_bhrb_filter = 0;
-				*bhrb_filter = 0;
 				return pmu_bhrb_filter;
 			} else {
 				/* Individual branch filter will be processed in PMU */
-- 
1.7.11.7

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [V6 00/11] perf: New conditional branch filter
  2014-05-05  9:09 [V6 00/11] perf: New conditional branch filter Anshuman Khandual
                   ` (10 preceding siblings ...)
  2014-05-05  9:09 ` [V6 11/11] powerpc, perf: Enable privilege mode SW branch filters Anshuman Khandual
@ 2014-05-27 12:09 ` Stephane Eranian
  2014-05-28  8:04   ` Anshuman Khandual
  11 siblings, 1 reply; 18+ messages in thread
From: Stephane Eranian @ 2014-05-27 12:09 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: Michael Neuling, ak, Peter Zijlstra, LKML, Michael Ellerman,
	Linux PPC dev, Arnaldo Carvalho de Melo, Sukadev Bhattiprolu,
	Ingo Molnar

Hi,


On Mon, May 5, 2014 at 11:09 AM, Anshuman Khandual
<khandual@linux.vnet.ibm.com> wrote:
>
>                 This patchset is the re-spin of the original branch stack sampling
> patchset which introduced new PERF_SAMPLE_BRANCH_COND branch filter. This patchset
> also enables SW based branch filtering support for book3s powerpc platforms which
> have PMU HW backed branch stack sampling support.
>
> Summary of code changes in this patchset:
>
> (1) Introduces a new PERF_SAMPLE_BRANCH_COND branch filter
> (2) Add the "cond" branch filter options in the "perf record" tool
> (3) Enable PERF_SAMPLE_BRANCH_COND in X86 platforms
> (4) Enable PERF_SAMPLE_BRANCH_COND in POWER8 platform
> (5) Update the documentation regarding "perf record" tool
> (6) Add some new powerpc instruction analysis functions in code-patching library
> (7) Enable SW based branch filter support for powerpc book3s
> (8) Changed BHRB configuration in POWER8 to accommodate SW branch filters
>
I have been looking at those patches and have run some tests.
I found a few issues so far.

I am running:
$ perf record -j any_ret -e cycles:u test_program
$ perf report -D

Most entries are okay and match the filter, however some do not make sense:

3642586996762 0x15d0 [0x108]: PERF_RECORD_SAMPLE(IP, 2): 17921/17921: 0x10001170 period: 613678 addr: 0
.... branch stack: nr:9
.....  0: 00000000100011cc -> 0000000010000e38
.....  1: 0000000010001150 -> 00000000100011bc
.....  2: 0000000010001208 -> 0000000010000e38
.....  3: 0000000010001160 -> 00000000100011f8
.....  4: 00000000100011cc -> 0000000010000e38
.....  5: 0000000010001150 -> 00000000100011bc
.....  6: 0000000010001208 -> 0000000010000e38
.....  7: 0000000010001160 -> 00000000100011f8
.....  8: 0000000000000000 -> 0000000010001160
^^^^^^
Entry 8 does not make sense, unless 0x0 is a valid return branch instruction
address. If an address is invalid, the whole entry needs to be eliminated. It
is okay to have fewer than the max number of entries supported by HW.
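
A minimal sketch of that elimination, in plain C and outside the kernel. It
assumes that a zero "from" address is what marks an entry as invalid; the
struct branch_entry type and the drop_invalid_entries() helper are
hypothetical names, not code from the patchset:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one branch stack entry. */
struct branch_entry {
	uint64_t from;
	uint64_t to;
};

/*
 * Compact the array in place, dropping every entry whose "from"
 * address is zero (assumed here to mean "invalid"). Returns the
 * number of entries kept, which may be fewer than nr.
 */
static size_t drop_invalid_entries(struct branch_entry *e, size_t nr)
{
	size_t out = 0;

	for (size_t i = 0; i < nr; i++) {
		if (e[i].from == 0)
			continue;
		e[out++] = e[i];
	}

	return out;
}
```

The returned count would then be what gets reported as nr for the sample.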

I also had cases where monitoring only at the user level got me branch
addresses in the 0xc0000000...... range. My test program is linked statically.

When eliminating the bogus entries, my tests yielded only return branch
instruction addresses, which is good. I will run more tests.


> With this new SW enablement, the branch filter support for book3s platforms have
> been extended to include all these combinations discussed below with a sample test
> application program (included here).
>
> Changes in V2
> =============
> (1) Enabled PPC64 SW branch filtering support
> (2) Incorporated changes required for all previous comments
>
> Changes in V3
> =============
> (1) Split the SW branch filter enablement into multiple patches
> (2) Added PMU neutral SW branch filtering code, PMU specific HW branch filtering code
> (3) Added new instruction analysis functionality into powerpc code-patching library
> (4) Changed name for some of the functions
> (5) Fixed couple of spelling mistakes
> (6) Changed code documentation in multiple places
>
> Changes in V4
> =============
> (1) Changed the commit message for patch (01/10)
> (2) Changed the patch (02/10) to accommodate review comments from Michael Ellerman
> (3) Rebased the patchset against latest Linus's tree
>
> Changes in V5
> =============
> (1) Added a precursor patch to cleanup the indentation problem in power_pmu_bhrb_read
> (2) Added a precursor patch to re-arrange P8 PMU BHRB filter config which improved the clarity
> (3) Merged the previous 10th patch into the 8th patch
> (4) Moved SW based branch analysis code from core perf into code-patching library as suggested by Michael
> (5) Simplified the logic in branch analysis library
> (6) Fixed some ambiguities in documentation at various places
> (7) Added some more in-code documentation blocks at various places
> (8) Renamed some local variable and function names
> (9) Fixed some indentation and white space errors in the code
> (10) Implemented almost all the review comments and suggestions made by Michael Ellerman on V4 patchset
> (11) Enabled privilege mode SW branch filter
> (12) Simplified and generalized the SW implemented conditional branch filter
> (13) PERF_SAMPLE_BRANCH_COND filter is now supported only through SW implementation
> (14) Adjusted other patches to deal with the above changes
>
> Changes in V6
> =============
> (1) Rebased the patchset against the master
> (2) Added "Reviewed-by: Andi Kleen" in the first four patches in the series which changes the
>     generic or X86 perf code. [https://lkml.org/lkml/2014/4/7/130]
>
> HW implemented branch filters
> =============================
>
> (1) perf record -j any_call -e branch-misses:u ./cprog
>
> # Overhead  Command  Source Shared Object            Source Symbol  Target Shared Object         Target Symbol
> # ........  .......  ....................  .......................  ....................  ....................
> #
>      7.85%    cprog  cprog                 [.] sw_3_1               cprog                 [.] success_3_1_2
>      5.66%    cprog  cprog                 [.] sw_3_1               cprog                 [.] sw_3_1_2
>      5.65%    cprog  cprog                 [.] hw_1_1               cprog                 [.] symbol1
>      5.42%    cprog  cprog                 [.] sw_3_1               cprog                 [.] sw_3_1_3
>      5.40%    cprog  cprog                 [.] callme               cprog                 [.] hw_1_1
>      5.40%    cprog  cprog                 [.] sw_3_1               cprog                 [.] success_3_1_1
>      5.40%    cprog  cprog                 [.] sw_3_1               cprog                 [.] sw_3_1_1
>      5.39%    cprog  cprog                 [.] sw_4_2               cprog                 [.] lr_addr
>      5.39%    cprog  cprog                 [.] callme               cprog                 [.] sw_4_2
>      5.39%    cprog  [unknown]             [.] 00000000             cprog                 [.] ctr_addr
>      5.38%    cprog  cprog                 [.] hw_1_2               cprog                 [.] symbol2
>      5.38%    cprog  cprog                 [.] callme               cprog                 [.] hw_1_2
>      5.16%    cprog  cprog                 [.] sw_3_1               cprog                 [.] success_3_1_3
>      5.15%    cprog  cprog                 [.] callme               cprog                 [.] sw_3_2
>      5.14%    cprog  cprog                 [.] callme               cprog                 [.] hw_2_2
>      2.96%    cprog  cprog                 [.] callme               cprog                 [.] sw_3_1
>      2.94%    cprog  cprog                 [.] callme               cprog                 [.] hw_2_1
>      2.71%    cprog  cprog                 [.] main                 cprog                 [.] callme
>      2.71%    cprog  [unknown]             [.] 00000000             cprog                 [.] lr_addr
>      2.70%    cprog  cprog                 [.] sw_4_1               cprog                 [.] ctr_addr
>      2.70%    cprog  cprog                 [.] callme               cprog                 [.] sw_4_1
>      0.09%    cprog  [unknown]             [.] 0xf7ad76c4           [unknown]             [.] 0xf7ac22c0
>      0.00%    cprog  libc-2.11.2.so        [.] vfprintf             libc-2.11.2.so        [.] __errno_location
>      0.00%    cprog  libc-2.11.2.so        [.] printf               libc-2.11.2.so        [.] vfprintf
>      0.00%    cprog  libc-2.11.2.so        [.] _IO_file_doallocate  libc-2.11.2.so        [.] isatty
>      0.00%    cprog  libc-2.11.2.so        [.] _IO_file_doallocate  libc-2.11.2.so        [.] mmap
>      0.00%    cprog  libc-2.11.2.so        [.] isatty               libc-2.11.2.so        [.] tcgetattr
>      0.00%    cprog  cprog                 [.] main                 [unknown]             [.] 0x10000950
>      0.00%    cprog  [unknown]             [.] 00000000             libc-2.11.2.so        [.] _IO_file_stat
>      0.00%    cprog  [unknown]             [.] 0xf7acfca4           cprog                 [.] _fini
>      0.00%    cprog  [unknown]             [k] 00000000             cprog                 [k] ctr_addr
>      0.00%    cprog  [unknown]             [k] 00000000             cprog                 [k] lr_addr
>
> SW implemented branch filters
> =============================
>
> (2) perf record -j cond -e branch-misses:u ./cprog
>
> # Overhead  Command  Source Shared Object           Source Symbol  Target Shared Object           Target Symbol
> # ........  .......  ....................  ......................  ....................  ......................
> #
>     25.82%    cprog  [unknown]             [.] 00000000            cprog                 [.] sw_3_1
>     12.66%    cprog  cprog                 [.] sw_4_2              cprog                 [.] lr_addr
>     12.63%    cprog  [unknown]             [.] 00000000            cprog                 [.] callme
>      9.42%    cprog  cprog                 [.] hw_2_2              cprog                 [.] address2
>      9.39%    cprog  cprog                 [.] sw_3_1              cprog                 [.] success_3_1_2
>      4.91%    cprog  cprog                 [.] sw_3_1              cprog                 [.] success_3_1_1
>      4.91%    cprog  cprog                 [.] sw_3_1              cprog                 [.] success_3_1_3
>      3.35%    cprog  cprog                 [.] sw_3_1_3            cprog                 [.] sw_3_1
>      3.34%    cprog  cprog                 [.] sw_3_1_1            cprog                 [.] sw_3_1
>      3.31%    cprog  cprog                 [.] hw_1_2              cprog                 [.] symbol2
>      3.31%    cprog  cprog                 [.] sw_4_1              cprog                 [.] ctr_addr
>      3.29%    cprog  cprog                 [.] hw_2_1              cprog                 [.] address1
>      3.27%    cprog  cprog                 [.] sw_3_1_2            cprog                 [.] sw_3_1
>      0.32%    cprog  [unknown]             [.] 0xf7c62328          [unknown]             [.] 0xf7c62320
>      0.01%    cprog  libc-2.11.2.so        [.] vfprintf            libc-2.11.2.so        [.] vfprintf
>      0.01%    cprog  libc-2.11.2.so        [.] _IO_file_xsputn     libc-2.11.2.so        [.] _IO_file_xsputn
>      0.01%    cprog  libc-2.11.2.so        [.] _IO_default_xsputn  libc-2.11.2.so        [.] _IO_default_xsputn
>      0.01%    cprog  libc-2.11.2.so        [.] strchrnul           libc-2.11.2.so        [.] strchrnul
>      0.01%    cprog  [unknown]             [.] 00000000            libc-2.11.2.so        [.] _IO_file_xsputn
>      0.01%    cprog  [unknown]             [k] 00000000            cprog                 [k] callme
>
>
> (3) perf record -j any_ret -e branch-misses:u ./cprog
>
> # Overhead  Command  Source Shared Object          Source Symbol  Target Shared Object          Target Symbol
> # ........  .......  ....................  .....................  ....................  .....................
> #
>     15.61%    cprog  [unknown]             [.] 00000000           cprog                 [.] sw_3_1
>      6.28%    cprog  cprog                 [.] symbol2            cprog                 [.] hw_1_2
>      6.28%    cprog  cprog                 [.] ctr_addr           cprog                 [.] sw_4_1
>      6.26%    cprog  cprog                 [.] success_3_1_3      cprog                 [.] sw_3_1
>      6.24%    cprog  cprog                 [.] symbol1            cprog                 [.] hw_1_1
>      6.24%    cprog  cprog                 [.] sw_4_2             cprog                 [.] callme
>      6.21%    cprog  [unknown]             [.] 00000000           cprog                 [.] callme
>      6.19%    cprog  cprog                 [.] lr_addr            cprog                 [.] sw_4_2
>      3.16%    cprog  cprog                 [.] hw_1_2             cprog                 [.] callme
>      3.15%    cprog  cprog                 [.] success_3_1_1      cprog                 [.] sw_3_1
>      3.15%    cprog  cprog                 [.] sw_4_1             cprog                 [.] callme
>      3.14%    cprog  cprog                 [.] callme             cprog                 [.] main
>      3.13%    cprog  cprog                 [.] hw_1_1             cprog                 [.] callme
>      3.13%    cprog  cprog                 [.] sw_3_1_1           cprog                 [.] sw_3_1
>      3.12%    cprog  cprog                 [.] back2              cprog                 [.] callme
>      3.12%    cprog  cprog                 [.] sw_3_1             cprog                 [.] callme
>      3.11%    cprog  cprog                 [.] back1              cprog                 [.] callme
>      3.11%    cprog  cprog                 [.] sw_3_1_2           cprog                 [.] sw_3_1
>      3.11%    cprog  cprog                 [.] sw_3_1_3           cprog                 [.] sw_3_1
>      3.10%    cprog  cprog                 [.] sw_3_2             cprog                 [.] callme
>      3.09%    cprog  cprog                 [.] success_3_1_2      cprog                 [.] sw_3_1
>      0.03%    cprog  [unknown]             [.] 0x100009b0         [unknown]             [.] 0xf7d5581c
>      0.01%    cprog  libc-2.11.2.so        [.] _IO_file_overflow  libc-2.11.2.so        [.] _IO_file_xsputn
>      0.01%    cprog  libc-2.11.2.so        [.] _IO_file_setbuf    [unknown]             [.] 0x0fee1084
>      0.01%    cprog  [unknown]             [.] 0xf7d5589c         libc-2.11.2.so        [.] printf
>      0.01%    cprog  [unknown]             [.] 00000000           libc-2.11.2.so        [.] _IO_file_overflow
>      0.01%    cprog  [unknown]             [.] 00000000           libc-2.11.2.so        [.] _IO_file_setbuf
>      0.01%    cprog  [unknown]             [k] 00000000           cprog                 [k] callme
>
> (4) perf record -j ind_call  -e branch-misses:u ./cprog
>
> # Overhead  Command  Source Shared Object   Source Symbol  Target Shared Object          Target Symbol
> # ........  .......  ....................  ..............  ....................  .....................
> #
>     42.59%    cprog  [unknown]             [.] 00000000    cprog                 [.] sw_3_1
>     25.88%    cprog  cprog                 [.] sw_4_2      cprog                 [.] lr_addr
>     25.65%    cprog  [unknown]             [.] 00000000    cprog                 [.] callme
>      5.58%    cprog  cprog                 [.] sw_4_1      cprog                 [.] ctr_addr
>      0.23%    cprog  [unknown]             [k] 00000000    cprog                 [k] callme
>      0.05%    cprog  [unknown]             [.] 00000000    [unknown]             [.] 0xf79fd740
>      0.03%    cprog  [unknown]             [.] 00000000    libc-2.11.2.so        [.] _IO_file_overflow
>
>
> (5) perf record -j any_call,any_ret -e branch-misses:u ./cprog
>
> # Overhead  Command  Source Shared Object              Source Symbol  Target Shared Object          Target Symbol
> # ........  .......  ....................  .........................  ....................  .....................
> #
>     10.00%    cprog  [unknown]             [.] 00000000               cprog                 [.] sw_3_1
>      4.20%    cprog  cprog                 [.] sw_4_2                 cprog                 [.] lr_addr
>      4.17%    cprog  cprog                 [.] lr_addr                cprog                 [.] sw_4_2
>      4.16%    cprog  cprog                 [.] symbol1                cprog                 [.] hw_1_1
>      4.12%    cprog  [unknown]             [.] 00000000               cprog                 [.] callme
>      4.12%    cprog  cprog                 [.] symbol2                cprog                 [.] hw_1_2
>      4.11%    cprog  cprog                 [.] success_3_1_3          cprog                 [.] sw_3_1
>      4.11%    cprog  cprog                 [.] ctr_addr               cprog                 [.] sw_4_1
>      4.10%    cprog  cprog                 [.] sw_4_2                 cprog                 [.] callme
>      2.42%    cprog  cprog                 [.] callme                 cprog                 [.] sw_4_2
>      2.40%    cprog  cprog                 [.] sw_3_1_3               cprog                 [.] sw_3_1
>      2.40%    cprog  cprog                 [.] sw_3_1                 cprog                 [.] sw_3_1_3
>      2.39%    cprog  cprog                 [.] hw_1_2                 cprog                 [.] symbol2
>      2.39%    cprog  cprog                 [.] back1                  cprog                 [.] callme
>      2.39%    cprog  cprog                 [.] sw_3_1_1               cprog                 [.] sw_3_1
>      2.39%    cprog  cprog                 [.] sw_3_1                 cprog                 [.] sw_3_1_1
>      2.39%    cprog  cprog                 [.] sw_3_1                 cprog                 [.] callme
>      2.39%    cprog  cprog                 [.] sw_4_1                 cprog                 [.] ctr_addr
>      2.39%    cprog  cprog                 [.] callme                 cprog                 [.] hw_1_2
>      2.39%    cprog  cprog                 [.] callme                 cprog                 [.] sw_3_1
>      2.39%    cprog  cprog                 [.] sw_3_1_2               cprog                 [.] sw_3_1
>      2.39%    cprog  cprog                 [.] sw_3_1                 cprog                 [.] sw_3_1_2
>      2.38%    cprog  cprog                 [.] hw_1_1                 cprog                 [.] symbol1
>      2.38%    cprog  cprog                 [.] callme                 cprog                 [.] hw_1_1
>      1.78%    cprog  cprog                 [.] back2                  cprog                 [.] callme
>      1.78%    cprog  cprog                 [.] hw_1_1                 cprog                 [.] callme
>      1.76%    cprog  cprog                 [.] success_3_1_2          cprog                 [.] sw_3_1
>      1.76%    cprog  cprog                 [.] sw_3_1                 cprog                 [.] success_3_1_2
>      1.76%    cprog  cprog                 [.] sw_3_2                 cprog                 [.] callme
>      1.76%    cprog  cprog                 [.] callme                 cprog                 [.] sw_3_2
>      1.73%    cprog  cprog                 [.] success_3_1_1          cprog                 [.] sw_3_1
>      1.73%    cprog  cprog                 [.] sw_3_1                 cprog                 [.] success_3_1_1
>      1.73%    cprog  cprog                 [.] hw_1_2                 cprog                 [.] callme
>      1.71%    cprog  cprog                 [.] sw_3_1                 cprog                 [.] success_3_1_3
>      1.71%    cprog  cprog                 [.] sw_4_1                 cprog                 [.] callme
>      1.71%    cprog  cprog                 [.] callme                 cprog                 [.] main
>      0.05%    cprog  [unknown]             [k] 00000000               cprog                 [k] callme
>      0.03%    cprog  [unknown]             [.] 0xf7aa9d4c             [unknown]             [.] 0xf7aa5f80
>      0.01%    cprog  libc-2.11.2.so        [.] __errno_location       libc-2.11.2.so        [.] vfprintf
>      0.01%    cprog  libc-2.11.2.so        [.] vfprintf               libc-2.11.2.so        [.] __errno_location
>      0.01%    cprog  libc-2.11.2.so        [.] _IO_doallocbuf         libc-2.11.2.so        [.] _IO_file_overflow
>      0.01%    cprog  cprog                 [.] __do_global_dtors_aux  [unknown]             [.] 0xf7a9fc74
>      0.01%    cprog  [unknown]             [.] 0xf7a9fca4             cprog                 [.] _fini
>
> (6) perf record -j any_call,ind_call -e branch-misses:u ./cprog
>
> # Overhead  Command  Source Shared Object           Source Symbol  Target Shared Object           Target Symbol
> # ........  .......  ....................  ......................  ....................  ......................
> #
>     17.38%    cprog  [unknown]             [.] 00000000            cprog                 [.] sw_3_1
>      7.76%    cprog  cprog                 [.] sw_4_2              cprog                 [.] lr_addr
>      7.64%    cprog  [unknown]             [.] 00000000            cprog                 [.] callme
>      6.00%    cprog  cprog                 [.] sw_3_1              cprog                 [.] sw_3_1_1
>      6.00%    cprog  cprog                 [.] callme              cprog                 [.] sw_3_1
>      5.98%    cprog  cprog                 [.] sw_4_1              cprog                 [.] ctr_addr
>      5.97%    cprog  cprog                 [.] hw_1_1              cprog                 [.] symbol1
>      5.97%    cprog  cprog                 [.] hw_1_2              cprog                 [.] symbol2
>      5.97%    cprog  cprog                 [.] sw_3_1              cprog                 [.] sw_3_1_3
>      5.97%    cprog  cprog                 [.] callme              cprog                 [.] hw_1_1
>      5.97%    cprog  cprog                 [.] callme              cprog                 [.] hw_1_2
>      5.96%    cprog  cprog                 [.] callme              cprog                 [.] sw_4_2
>      5.95%    cprog  cprog                 [.] sw_3_1              cprog                 [.] sw_3_1_2
>      1.83%    cprog  cprog                 [.] sw_3_1              cprog                 [.] success_3_1_2
>      1.82%    cprog  cprog                 [.] sw_3_1              cprog                 [.] success_3_1_1
>      1.82%    cprog  cprog                 [.] sw_3_1              cprog                 [.] success_3_1_3
>      1.82%    cprog  cprog                 [.] callme              cprog                 [.] sw_3_2
>      0.14%    cprog  [unknown]             [k] 00000000            cprog                 [k] callme
>      0.01%    cprog  libc-2.11.2.so        [.] vfprintf            libc-2.11.2.so        [.] strchrnul
>      0.01%    cprog  libc-2.11.2.so        [.] _IO_file_xsputn     libc-2.11.2.so        [.] _IO_default_xsputn
>      0.01%    cprog  libc-2.11.2.so        [.] _IO_default_xsputn  libc-2.11.2.so        [.] _IO_file_overflow
>      0.01%    cprog  ld-2.11.2.so          [.] calloc              [unknown]             [.] 0xf795b390
>      0.01%    cprog  [unknown]             [.] 0x0fee00fc          libc-2.11.2.so        [.] _IO_file_overflow
>      0.01%    cprog  [unknown]             [.] 00000000            ld-2.11.2.so          [.] calloc
>      0.01%    cprog  [unknown]             [.] 0xf794b41c          [unknown]             [.] 0xf794ab70
>
> (7) perf record -j cond,any_ret -e branch-misses:u ./cprog
>
> # Overhead  Command  Source Shared Object           Source Symbol  Target Shared Object           Target Symbol
> # ........  .......  ....................  ......................  ....................  ......................
> #
>     12.43%    cprog  [unknown]             [.] 00000000            cprog                 [.] sw_3_1
>      4.91%    cprog  cprog                 [.] lr_addr             cprog                 [.] sw_4_2
>      4.89%    cprog  [unknown]             [.] 00000000            cprog                 [.] callme
>      4.87%    cprog  cprog                 [.] sw_4_2              cprog                 [.] lr_addr
>      4.87%    cprog  cprog                 [.] symbol1             cprog                 [.] hw_1_1
>      4.19%    cprog  cprog                 [.] hw_2_2              cprog                 [.] address2
>      4.19%    cprog  cprog                 [.] back2               cprog                 [.] callme
>      4.19%    cprog  cprog                 [.] sw_3_2              cprog                 [.] callme
>      4.18%    cprog  cprog                 [.] hw_1_1              cprog                 [.] callme
>      4.18%    cprog  cprog                 [.] success_3_1_2       cprog                 [.] sw_3_1
>      4.18%    cprog  cprog                 [.] sw_3_1              cprog                 [.] success_3_1_2
>      4.16%    cprog  cprog                 [.] sw_4_2              cprog                 [.] callme
>      4.13%    cprog  cprog                 [.] ctr_addr            cprog                 [.] sw_4_1
>      4.12%    cprog  cprog                 [.] symbol2             cprog                 [.] hw_1_2
>      4.12%    cprog  cprog                 [.] success_3_1_3       cprog                 [.] sw_3_1
>      3.43%    cprog  cprog                 [.] callme              cprog                 [.] main
>      3.42%    cprog  cprog                 [.] sw_3_1              cprog                 [.] success_3_1_3
>      3.41%    cprog  cprog                 [.] success_3_1_1       cprog                 [.] sw_3_1
>      3.41%    cprog  cprog                 [.] sw_3_1              cprog                 [.] success_3_1_1
>      3.41%    cprog  cprog                 [.] sw_4_1              cprog                 [.] callme
>      3.40%    cprog  cprog                 [.] hw_1_2              cprog                 [.] callme
>      0.73%    cprog  cprog                 [.] sw_3_1_3            cprog                 [.] sw_3_1
>      0.73%    cprog  cprog                 [.] sw_4_1              cprog                 [.] ctr_addr
>      0.72%    cprog  cprog                 [.] hw_1_2              cprog                 [.] symbol2
>      0.72%    cprog  cprog                 [.] sw_3_1_1            cprog                 [.] sw_3_1
>      0.70%    cprog  cprog                 [.] hw_2_1              cprog                 [.] address1
>      0.70%    cprog  cprog                 [.] back1               cprog                 [.] callme
>      0.70%    cprog  cprog                 [.] sw_3_1_2            cprog                 [.] sw_3_1
>      0.70%    cprog  cprog                 [.] sw_3_1              cprog                 [.] callme
>      0.19%    cprog  [unknown]             [.] 0xf7c12328          [unknown]             [.] 0xf7c12320
>      0.01%    cprog  libc-2.11.2.so        [.] __errno_location    libc-2.11.2.so        [.] vfprintf
>      0.01%    cprog  libc-2.11.2.so        [.] vfprintf            libc-2.11.2.so        [.] vfprintf
>      0.01%    cprog  libc-2.11.2.so        [.] _IO_file_overflow   [unknown]             [.] 0x0fee0100
>      0.01%    cprog  libc-2.11.2.so        [.] _IO_default_xsputn  libc-2.11.2.so        [.] _IO_default_xsputn
>      0.01%    cprog  [unknown]             [.] 00000000            libc-2.11.2.so        [.] _IO_file_overflow
>
> (8) perf record -j cond,ind_call -e branch-misses:u ./cprog
>
> # Overhead  Command  Source Shared Object   Source Symbol  Target Shared Object      Target Symbol
> # ........  .......  ....................  ..............  ....................  .................
> #
>     20.70%    cprog  [unknown]             [.] 00000000    cprog                 [.] sw_3_1
>      9.99%    cprog  cprog                 [.] sw_4_2      cprog                 [.] lr_addr
>      9.91%    cprog  [unknown]             [.] 00000000    cprog                 [.] callme
>      9.45%    cprog  cprog                 [.] sw_3_1_3    cprog                 [.] sw_3_1
>      9.44%    cprog  cprog                 [.] hw_2_1      cprog                 [.] address1
>      9.43%    cprog  cprog                 [.] sw_3_1_1    cprog                 [.] sw_3_1
>      9.42%    cprog  cprog                 [.] hw_1_2      cprog                 [.] symbol2
>      9.42%    cprog  cprog                 [.] sw_3_1_2    cprog                 [.] sw_3_1
>      9.42%    cprog  cprog                 [.] sw_4_1      cprog                 [.] ctr_addr
>      0.65%    cprog  cprog                 [.] sw_3_1      cprog                 [.] success_3_1_1
>      0.62%    cprog  cprog                 [.] sw_3_1      cprog                 [.] success_3_1_3
>      0.56%    cprog  cprog                 [.] hw_2_2      cprog                 [.] address2
>      0.55%    cprog  cprog                 [.] sw_3_1      cprog                 [.] success_3_1_2
>      0.29%    cprog  [unknown]             [.] 0xf7f72328  [unknown]             [.] 0xf7f72320
>      0.10%    cprog  [unknown]             [k] 00000000    cprog                 [k] callme
>      0.02%    cprog  libc-2.11.2.so        [.] _IO_setb    libc-2.11.2.so        [.] _IO_setb
>
> (9) perf record -e branch-misses:u -j any_call,any_ret,ind_call,cond ./cprog
>
> # Overhead  Command  Source Shared Object       Source Symbol  Target Shared Object            Target Symbol
> # ........  .......  ....................  ..................  ....................  .......................
> #
>      9.31%    cprog  [unknown]             [.] 00000000        cprog                 [.] sw_3_1
>      4.04%    cprog  cprog                 [.] symbol1         cprog                 [.] hw_1_1
>      4.03%    cprog  cprog                 [.] lr_addr         cprog                 [.] sw_4_2
>      4.03%    cprog  cprog                 [.] sw_4_2          cprog                 [.] lr_addr
>      4.00%    cprog  [unknown]             [.] 00000000        cprog                 [.] callme
>      3.88%    cprog  cprog                 [.] ctr_addr        cprog                 [.] sw_4_1
>      3.87%    cprog  cprog                 [.] sw_4_2          cprog                 [.] callme
>      3.86%    cprog  cprog                 [.] symbol2         cprog                 [.] hw_1_2
>      3.86%    cprog  cprog                 [.] success_3_1_3   cprog                 [.] sw_3_1
>      2.49%    cprog  cprog                 [.] sw_4_1          cprog                 [.] ctr_addr
>      2.47%    cprog  cprog                 [.] hw_1_1          cprog                 [.] symbol1
>      2.47%    cprog  cprog                 [.] sw_3_1_1        cprog                 [.] sw_3_1
>      2.47%    cprog  cprog                 [.] sw_3_1          cprog                 [.] sw_3_1_1
>      2.47%    cprog  cprog                 [.] callme          cprog                 [.] hw_1_1
>      2.47%    cprog  cprog                 [.] callme          cprog                 [.] sw_3_1
>      2.47%    cprog  cprog                 [.] hw_1_2          cprog                 [.] symbol2
>      2.47%    cprog  cprog                 [.] hw_2_1          cprog                 [.] address1
>      2.47%    cprog  cprog                 [.] back1           cprog                 [.] callme
>      2.47%    cprog  cprog                 [.] sw_3_1_3        cprog                 [.] sw_3_1
>      2.47%    cprog  cprog                 [.] sw_3_1          cprog                 [.] sw_3_1_3
>      2.47%    cprog  cprog                 [.] sw_3_1          cprog                 [.] callme
>      2.47%    cprog  cprog                 [.] callme          cprog                 [.] hw_1_2
>      2.47%    cprog  cprog                 [.] callme          cprog                 [.] sw_4_2
>      2.46%    cprog  cprog                 [.] sw_3_1_2        cprog                 [.] sw_3_1
>      2.46%    cprog  cprog                 [.] sw_3_1          cprog                 [.] sw_3_1_2
>      1.57%    cprog  cprog                 [.] success_3_1_2   cprog                 [.] sw_3_1
>      1.57%    cprog  cprog                 [.] sw_3_1          cprog                 [.] success_3_1_2
>      1.57%    cprog  cprog                 [.] hw_1_1          cprog                 [.] callme
>      1.56%    cprog  cprog                 [.] hw_2_2          cprog                 [.] address2
>      1.56%    cprog  cprog                 [.] back2           cprog                 [.] callme
>      1.56%    cprog  cprog                 [.] sw_3_2          cprog                 [.] callme
>      1.56%    cprog  cprog                 [.] callme          cprog                 [.] sw_3_2
>      1.41%    cprog  cprog                 [.] success_3_1_1   cprog                 [.] sw_3_1
>      1.41%    cprog  cprog                 [.] sw_3_1          cprog                 [.] success_3_1_1
>      1.40%    cprog  cprog                 [.] sw_4_1          cprog                 [.] callme
>      1.39%    cprog  cprog                 [.] hw_1_2          cprog                 [.] callme
>      1.39%    cprog  cprog                 [.] sw_3_1          cprog                 [.] success_3_1_3
>      1.39%    cprog  cprog                 [.] callme          cprog                 [.] main
>      0.14%    cprog  [unknown]             [.] 0xf7d72328      [unknown]             [.] 0xf7d72320
>      0.03%    cprog  [unknown]             [k] 00000000        cprog                 [k] callme
>      0.01%    cprog  libc-2.11.2.so        [.] _IO_doallocbuf  libc-2.11.2.so        [.] _IO_doallocbuf
>      0.01%    cprog  libc-2.11.2.so        [.] printf          cprog                 [.] main
>      0.01%    cprog  libc-2.11.2.so        [.] _IO_doallocbuf  libc-2.11.2.so        [.] _IO_file_doallocate
>      0.01%    cprog  ld-2.11.2.so          [.] malloc          [unknown]             [.] 0xf7d8b380
>      0.01%    cprog  cprog                 [.] main            [unknown]             [.] 0x0fe7f63c
>      0.01%    cprog  [unknown]             [.] 0xf7d8b388      ld-2.11.2.so          [.] __libc_memalign
>      0.01%    cprog  [unknown]             [.] 00000000        ld-2.11.2.so          [.] malloc
>
> Please refer to the V4 version of the patchset to learn about the sample test case and its makefile.
>
> Anshuman Khandual (11):
>   perf: Add PERF_SAMPLE_BRANCH_COND
>   perf, tool: Conditional branch filter 'cond' added to perf record
>   x86, perf: Add conditional branch filtering support
>   perf, documentation: Description for conditional branch filter
>   powerpc, perf: Re-arrange BHRB processing
>   powerpc, perf: Re-arrange PMU based branch filter processing in POWER8
>   powerpc, perf: Change the name of HW PMU branch filter tracking variable
>   powerpc, lib: Add new branch analysis support functions
>   powerpc, perf: Enable SW filtering in branch stack sampling framework
>   power8, perf: Adapt BHRB PMU configuration to work with SW filters
>   powerpc, perf: Enable privilege mode SW branch filters
>
>  arch/powerpc/include/asm/code-patching.h     |  16 ++
>  arch/powerpc/include/asm/perf_event_server.h |   6 +-
>  arch/powerpc/lib/code-patching.c             |  80 +++++++
>  arch/powerpc/perf/core-book3s.c              | 323 ++++++++++++++++++++++-----
>  arch/powerpc/perf/power8-pmu.c               |  70 ++++--
>  arch/x86/kernel/cpu/perf_event_intel_lbr.c   |   5 +
>  include/uapi/linux/perf_event.h              |   3 +-
>  tools/perf/Documentation/perf-record.txt     |   3 +-
>  tools/perf/builtin-record.c                  |   1 +
>  9 files changed, 429 insertions(+), 78 deletions(-)
>
> --
> 1.7.11.7
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [V6 00/11] perf: New conditional branch filter
  2014-05-27 12:09 ` [V6 00/11] perf: New conditional branch filter Stephane Eranian
@ 2014-05-28  8:04   ` Anshuman Khandual
  2014-06-02 12:59     ` Stephane Eranian
  0 siblings, 1 reply; 18+ messages in thread
From: Anshuman Khandual @ 2014-05-28  8:04 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: Michael Neuling, ak, Peter Zijlstra, LKML, Michael Ellerman,
	Linux PPC dev, Arnaldo Carvalho de Melo, Sukadev Bhattiprolu,
	Ingo Molnar

On 05/27/2014 05:39 PM, Stephane Eranian wrote:
> I have been looking at those patches and ran some tests.
> And I found a few issues so far.
> 
> I am running:
> $ perf record -j any_ret -e cycles:u test_program
> $ perf report -D
> 
> Most entries are okay and match the filter, however some do not make sense:
> 
> 3642586996762 0x15d0 [0x108]: PERF_RECORD_SAMPLE(IP, 2): 17921/17921:
> 0x10001170 period: 613678 addr: 0
> .... branch stack: nr:9
> .....  0: 00000000100011cc -> 0000000010000e38
> .....  1: 0000000010001150 -> 00000000100011bc
> .....  2: 0000000010001208 -> 0000000010000e38
> .....  3: 0000000010001160 -> 00000000100011f8
> .....  4: 00000000100011cc -> 0000000010000e38
> .....  5: 0000000010001150 -> 00000000100011bc
> .....  6: 0000000010001208 -> 0000000010000e38
> .....  7: 0000000010001160 -> 00000000100011f8
> .....  8: 0000000000000000 -> 0000000010001160
> ^^^^^^
> Entry 8 does not make sense, unless 0x0 is a valid return branch
> instruction address.
> If an address is invalid, the whole entry needs to be eliminated. It
> is okay to have
> less than the max number of entries supported by HW.

Hey Stephane,

Okay. The same behaviour is also reflected in the test results that I
shared in the patchset. Here is that section.

(3) perf record -j any_ret -e branch-misses:u ./cprog

# Overhead  Command  Source Shared Object          Source Symbol  Target Shared Object          Target Symbol
# ........  .......  ....................  .....................  ....................  .....................
#
    15.61%    cprog  [unknown]             [.] 00000000           cprog                 [.] sw_3_1           
     6.28%    cprog  cprog                 [.] symbol2            cprog                 [.] hw_1_2           
     6.28%    cprog  cprog                 [.] ctr_addr           cprog                 [.] sw_4_1           
     6.26%    cprog  cprog                 [.] success_3_1_3      cprog                 [.] sw_3_1           
     6.24%    cprog  cprog                 [.] symbol1            cprog                 [.] hw_1_1           
     6.24%    cprog  cprog                 [.] sw_4_2             cprog                 [.] callme           
     6.21%    cprog  [unknown]             [.] 00000000           cprog                 [.] callme           
     6.19%    cprog  cprog                 [.] lr_addr            cprog                 [.] sw_4_2           
     3.16%    cprog  cprog                 [.] hw_1_2             cprog                 [.] callme           
     3.15%    cprog  cprog                 [.] success_3_1_1      cprog                 [.] sw_3_1           
     3.15%    cprog  cprog                 [.] sw_4_1             cprog                 [.] callme           
     3.14%    cprog  cprog                 [.] callme             cprog                 [.] main             
     3.13%    cprog  cprog                 [.] hw_1_1             cprog                 [.] callme

So a lot of the samples above have 0x0 as the "from" address. This originates from the code
section below, inside the function "power_pmu_bhrb_read", where we hit two back-to-back
target addresses. We zero out the from address for the first target address and re-read
the second address again. That's how we get zero as the from address. This is how the
HW captures the samples. I was reluctant to drop these samples, but I agree that these kinds
of samples can be dropped if we need to.

if (val & BHRB_TARGET) {
	/* Shouldn't have two targets in a
	   row.. Reset index and try again */
	r_index--;
	addr = 0;
}
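
For illustration only, dropping such entries could look roughly like the
sketch below. It is a standalone user-space model, not the kernel code:
"struct branch_entry" and "drop_invalid_entries" are hypothetical names
made up for this example, and it simply assumes that a zero "from" address
always marks a placeholder entry that should be removed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a branch stack entry (from -> to). */
struct branch_entry {
	uint64_t from;
	uint64_t to;
};

/*
 * Compact the array in place, keeping only the entries whose "from"
 * address is non-zero. Returns the number of entries kept, which may
 * be less than nr.
 */
static size_t drop_invalid_entries(struct branch_entry *e, size_t nr)
{
	size_t in, out = 0;

	assert(e != NULL || nr == 0);

	for (in = 0; in < nr; in++) {
		/* Assumption: from == 0 means the entry is a placeholder. */
		if (e[in].from == 0)
			continue;
		e[out++] = e[in];
	}
	return out;
}
```

The caller would then report the returned count as the number of branch
stack entries, so the sample simply carries fewer entries than the HW
maximum.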

> I also had cases where monitoring only at the user level, got me
> branch addresses in the
> 0xc0000000...... range. My test program is linked statically.
> 

That's weird. I would need more information and details on this. BTW,
what system are you running on? Could you please share the
/proc/cpuinfo details of the same?

> when eliminating the bogus entries, my tests yielded only return
> branch instruction addresses
> which is good. Will run more tests.

Sure. Thanks for the tests and comments.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [V6 00/11] perf: New conditional branch filter
  2014-05-28  8:04   ` Anshuman Khandual
@ 2014-06-02 12:59     ` Stephane Eranian
  2014-06-02 16:04       ` Anshuman Khandual
  2014-06-02 22:52       ` Michael Neuling
  0 siblings, 2 replies; 18+ messages in thread
From: Stephane Eranian @ 2014-06-02 12:59 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: Michael Neuling, ak, Peter Zijlstra, LKML, Michael Ellerman,
	Linux PPC dev, Arnaldo Carvalho de Melo, Sukadev Bhattiprolu,
	Ingo Molnar

On Wed, May 28, 2014 at 10:04 AM, Anshuman Khandual
<khandual@linux.vnet.ibm.com> wrote:
> On 05/27/2014 05:39 PM, Stephane Eranian wrote:
>> I have been looking at those patches and ran some tests.
>> And I found a few issues so far.
>>
>> I am running:
>> $ perf record -j any_ret -e cycles:u test_program
>> $ perf report -D
>>
>> Most entries are okay and match the filter, however some do not make sense:
>>
>> 3642586996762 0x15d0 [0x108]: PERF_RECORD_SAMPLE(IP, 2): 17921/17921:
>> 0x10001170 period: 613678 addr: 0
>> .... branch stack: nr:9
>> .....  0: 00000000100011cc -> 0000000010000e38
>> .....  1: 0000000010001150 -> 00000000100011bc
>> .....  2: 0000000010001208 -> 0000000010000e38
>> .....  3: 0000000010001160 -> 00000000100011f8
>> .....  4: 00000000100011cc -> 0000000010000e38
>> .....  5: 0000000010001150 -> 00000000100011bc
>> .....  6: 0000000010001208 -> 0000000010000e38
>> .....  7: 0000000010001160 -> 00000000100011f8
>> .....  8: 0000000000000000 -> 0000000010001160
>> ^^^^^^
>> Entry 8 does not make sense, unless 0x0 is a valid return branch
>> instruction address.
>> If an address is invalid, the whole entry needs to be eliminated. It
>> is okay to have
>> less than the max number of entries supported by HW.
>
> Hey Stephane,
>
> Okay. The same behaviour is also reflected in the test results what I have
> shared in the patchset. Here is that section.
>
> (3) perf record -j any_ret -e branch-misses:u ./cprog
>
> # Overhead  Command  Source Shared Object          Source Symbol  Target Shared Object          Target Symbol
> # ........  .......  ....................  .....................  ....................  .....................
> #
>     15.61%    cprog  [unknown]             [.] 00000000           cprog                 [.] sw_3_1
>      6.28%    cprog  cprog                 [.] symbol2            cprog                 [.] hw_1_2
>      6.28%    cprog  cprog                 [.] ctr_addr           cprog                 [.] sw_4_1
>      6.26%    cprog  cprog                 [.] success_3_1_3      cprog                 [.] sw_3_1
>      6.24%    cprog  cprog                 [.] symbol1            cprog                 [.] hw_1_1
>      6.24%    cprog  cprog                 [.] sw_4_2             cprog                 [.] callme
>      6.21%    cprog  [unknown]             [.] 00000000           cprog                 [.] callme
>      6.19%    cprog  cprog                 [.] lr_addr            cprog                 [.] sw_4_2
>      3.16%    cprog  cprog                 [.] hw_1_2             cprog                 [.] callme
>      3.15%    cprog  cprog                 [.] success_3_1_1      cprog                 [.] sw_3_1
>      3.15%    cprog  cprog                 [.] sw_4_1             cprog                 [.] callme
>      3.14%    cprog  cprog                 [.] callme             cprog                 [.] main
>      3.13%    cprog  cprog                 [.] hw_1_1             cprog                 [.] callme
>
> So a lot of samples above have 0x0 as the "from" address. This originates from the code
> section here inside the function "power_pmu_bhrb_read", where we hit two back to back

Could you explain the back-to-back case a bit more here?
Back-to-back returns to me means something like:

int foo()
{
  ...
   return bar();
}

int bar()
{
  return 0;
}

Not counting the leaf optimization here, bar returns to foo, which
immediately returns: two back-to-back returns.
Is that the case you're talking about here?

> target addresses. So we zero out the from address for the first target address and re-read
> the second address over again. So thats how we get zero as the from address. This is how the
> HW capture the samples. I was reluctant to drop these samples but I agree that these kind of
> samples can be dropped if we need to.
>
I think we need to make it as simple as possible for tools, i.e.,
avoid having to decode the disassembly to figure out what happened.
Here, address 0 is not usable.

> if (val & BHRB_TARGET) {
>         /* Shouldn't have two targets in a
>            row.. Reset index and try again */
>         r_index--;
>         addr = 0;
> }

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [V6 00/11] perf: New conditional branch filter
  2014-06-02 12:59     ` Stephane Eranian
@ 2014-06-02 16:04       ` Anshuman Khandual
  2014-06-02 16:25         ` Stephane Eranian
  2014-06-02 22:52       ` Michael Neuling
  1 sibling, 1 reply; 18+ messages in thread
From: Anshuman Khandual @ 2014-06-02 16:04 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: Michael Neuling, ak, Peter Zijlstra, LKML, Michael Ellerman,
	Linux PPC dev, Arnaldo Carvalho de Melo, Sukadev Bhattiprolu,
	Ingo Molnar

On 06/02/2014 06:29 PM, Stephane Eranian wrote:
> On Wed, May 28, 2014 at 10:04 AM, Anshuman Khandual
> <khandual@linux.vnet.ibm.com> wrote:
>> On 05/27/2014 05:39 PM, Stephane Eranian wrote:
>>> I have been looking at those patches and ran some tests.
>>> And I found a few issues so far.
>>>
>>> I am running:
>>> $ perf record -j any_ret -e cycles:u test_program
>>> $ perf report -D
>>>
>>> Most entries are okay and match the filter, however some do not make sense:
>>>
>>> 3642586996762 0x15d0 [0x108]: PERF_RECORD_SAMPLE(IP, 2): 17921/17921:
>>> 0x10001170 period: 613678 addr: 0
>>> .... branch stack: nr:9
>>> .....  0: 00000000100011cc -> 0000000010000e38
>>> .....  1: 0000000010001150 -> 00000000100011bc
>>> .....  2: 0000000010001208 -> 0000000010000e38
>>> .....  3: 0000000010001160 -> 00000000100011f8
>>> .....  4: 00000000100011cc -> 0000000010000e38
>>> .....  5: 0000000010001150 -> 00000000100011bc
>>> .....  6: 0000000010001208 -> 0000000010000e38
>>> .....  7: 0000000010001160 -> 00000000100011f8
>>> .....  8: 0000000000000000 -> 0000000010001160
>>> ^^^^^^
>>> Entry 8 does not make sense, unless 0x0 is a valid return branch
>>> instruction address.
>>> If an address is invalid, the whole entry needs to be eliminated. It
>>> is okay to have
>>> less than the max number of entries supported by HW.
>>
>> Hey Stephane,
>>
>> Okay. The same behaviour is also reflected in the test results what I have
>> shared in the patchset. Here is that section.
>>
>> (3) perf record -j any_ret -e branch-misses:u ./cprog
>>
>> # Overhead  Command  Source Shared Object          Source Symbol  Target Shared Object          Target Symbol
>> # ........  .......  ....................  .....................  ....................  .....................
>> #
>>     15.61%    cprog  [unknown]             [.] 00000000           cprog                 [.] sw_3_1
>>      6.28%    cprog  cprog                 [.] symbol2            cprog                 [.] hw_1_2
>>      6.28%    cprog  cprog                 [.] ctr_addr           cprog                 [.] sw_4_1
>>      6.26%    cprog  cprog                 [.] success_3_1_3      cprog                 [.] sw_3_1
>>      6.24%    cprog  cprog                 [.] symbol1            cprog                 [.] hw_1_1
>>      6.24%    cprog  cprog                 [.] sw_4_2             cprog                 [.] callme
>>      6.21%    cprog  [unknown]             [.] 00000000           cprog                 [.] callme
>>      6.19%    cprog  cprog                 [.] lr_addr            cprog                 [.] sw_4_2
>>      3.16%    cprog  cprog                 [.] hw_1_2             cprog                 [.] callme
>>      3.15%    cprog  cprog                 [.] success_3_1_1      cprog                 [.] sw_3_1
>>      3.15%    cprog  cprog                 [.] sw_4_1             cprog                 [.] callme
>>      3.14%    cprog  cprog                 [.] callme             cprog                 [.] main
>>      3.13%    cprog  cprog                 [.] hw_1_1             cprog                 [.] callme
>>
>> So a lot of samples above have 0x0 as the "from" address. This originates from the code
>> section here inside the function "power_pmu_bhrb_read", where we hit two back to back
> 
> Could you explain the back-to-back case a bit more here?
> Back-to-back returns to me means something like:
> 
> int foo()
> {
>   ...
>    return bar();
> }
> 
> int bar()
> {
>   return 0;
> }
> 
> Not counting the leaf optimization here, bar return to foo which
> immediately returns: 2 back-2-back returns.
> Is that the case you're talking about here?
> 

No. Filtering of return branches has been implemented in SW only, so the PMU as such does not capture
only return branches. It captures all the branches it encounters. During the capture process the
PMU might *record* two back-to-back "target addresses" (without capturing the from address for the
first one), so we are unable to figure out the "from address" for the first. This leaves us with one
branch record where we have the target address but not the from address, and so we set the from
address to zero. With the current logic all branch records with a zero "from address" pass through
the filter and become part of the final set. I was not too sure how to deal with these cases.
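
To make the pairing described above concrete, here is a minimal standalone sketch. It is not the code from "power_pmu_bhrb_read"; the flag bit ENTRY_TARGET, the mask ENTRY_ADDR, the struct and the function name are all hypothetical, and it only models the case where a target entry is expected to be followed by its from entry.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoding, for illustration only: bit 1 marks a target entry. */
#define ENTRY_TARGET 0x2ULL
#define ENTRY_ADDR   (~0x3ULL)

/* Hypothetical stand-in for a branch record; not the kernel's struct. */
struct branch_rec {
	uint64_t from;
	uint64_t to;
};

/*
 * Pair raw buffer entries into (from, to) records. A target entry should be
 * followed by its from entry. If the next entry is another target, or the
 * buffer ends, the from address is left as zero and the next entry is
 * re-read as a new target. "out" must have room for nr_raw records.
 */
static size_t pair_entries(const uint64_t *raw, size_t nr_raw,
			   struct branch_rec *out)
{
	size_t r = 0, u = 0;

	while (r < nr_raw) {
		uint64_t val = raw[r++];

		if (!(val & ENTRY_TARGET))
			continue;	/* this sketch skips stray from entries */

		out[u].to = val & ENTRY_ADDR;
		out[u].from = 0;	/* unknown until a from entry is seen */

		if (r < nr_raw && !(raw[r] & ENTRY_TARGET))
			out[u].from = raw[r++] & ENTRY_ADDR;

		u++;
	}
	return u;
}
```

With two target entries in a row, the first record comes out with a zero from address, which is the kind of entry seen as entry 8 in the dump above.
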

>> target addresses. So we zero out the from address for the first target address and re-read
>> the second address over again. So thats how we get zero as the from address. This is how the
>> HW capture the samples. I was reluctant to drop these samples but I agree that these kind of
>> samples can be dropped if we need to.
>>
> I think we need to make it as simple as possible for tools, i.e.,
> avoid having to decode the
> disassembly to figure out what happened. Here address 0 is not exploitable.

That's right. Dropping the branch records where we have only the target address and not the from
address might just solve this problem.
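
A rough sketch of what dropping those records could look like, assuming a simple array of records. The struct and function names are hypothetical and this is not taken from the patchset:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a branch record; not the kernel's struct. */
struct branch_rec {
	uint64_t from;
	uint64_t to;
};

/*
 * Compact the array in place, dropping every record whose from address is
 * zero (that is, unknown). Returns the number of records kept, which may be
 * less than the number of entries the HW supports.
 */
static size_t drop_unknown_from(struct branch_rec *recs, size_t nr)
{
	size_t kept = 0;

	for (size_t i = 0; i < nr; i++) {
		if (recs[i].from == 0)
			continue;	/* only the target is known: drop it */
		recs[kept++] = recs[i];
	}
	return kept;
}
```

This keeps the remaining records in their original order, so tools would see a shorter branch stack and never a 0x0 source.
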


* Re: [V6 00/11] perf: New conditional branch filter
  2014-06-02 16:04       ` Anshuman Khandual
@ 2014-06-02 16:25         ` Stephane Eranian
  0 siblings, 0 replies; 18+ messages in thread
From: Stephane Eranian @ 2014-06-02 16:25 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: Michael Neuling, ak, Peter Zijlstra, LKML, Michael Ellerman,
	Linux PPC dev, Arnaldo Carvalho de Melo, Sukadev Bhattiprolu,
	Ingo Molnar

On Mon, Jun 2, 2014 at 6:04 PM, Anshuman Khandual
<khandual@linux.vnet.ibm.com> wrote:
> On 06/02/2014 06:29 PM, Stephane Eranian wrote:
>> On Wed, May 28, 2014 at 10:04 AM, Anshuman Khandual
>> <khandual@linux.vnet.ibm.com> wrote:
>>> On 05/27/2014 05:39 PM, Stephane Eranian wrote:
>>>> I have been looking at those patches and ran some tests.
>>>> And I found a few issues so far.
>>>>
>>>> I am running:
>>>> $ perf record -j any_ret -e cycles:u test_program
>>>> $ perf report -D
>>>>
>>>> Most entries are okay and match the filter, however some do not make sense:
>>>>
>>>> 3642586996762 0x15d0 [0x108]: PERF_RECORD_SAMPLE(IP, 2): 17921/17921:
>>>> 0x10001170 period: 613678 addr: 0
>>>> .... branch stack: nr:9
>>>> .....  0: 00000000100011cc -> 0000000010000e38
>>>> .....  1: 0000000010001150 -> 00000000100011bc
>>>> .....  2: 0000000010001208 -> 0000000010000e38
>>>> .....  3: 0000000010001160 -> 00000000100011f8
>>>> .....  4: 00000000100011cc -> 0000000010000e38
>>>> .....  5: 0000000010001150 -> 00000000100011bc
>>>> .....  6: 0000000010001208 -> 0000000010000e38
>>>> .....  7: 0000000010001160 -> 00000000100011f8
>>>> .....  8: 0000000000000000 -> 0000000010001160
>>>> ^^^^^^
>>>> Entry 8 does not make sense, unless 0x0 is a valid return branch
>>>> instruction address.
>>>> If an address is invalid, the whole entry needs to be eliminated. It
>>>> is okay to have
>>>> less than the max number of entries supported by HW.
>>>
>>> Hey Stephane,
>>>
>>> Okay. The same behaviour is also reflected in the test results what I have
>>> shared in the patchset. Here is that section.
>>>
>>> (3) perf record -j any_ret -e branch-misses:u ./cprog
>>>
>>> # Overhead  Command  Source Shared Object          Source Symbol  Target Shared Object          Target Symbol
>>> # ........  .......  ....................  .....................  ....................  .....................
>>> #
>>>     15.61%    cprog  [unknown]             [.] 00000000           cprog                 [.] sw_3_1
>>>      6.28%    cprog  cprog                 [.] symbol2            cprog                 [.] hw_1_2
>>>      6.28%    cprog  cprog                 [.] ctr_addr           cprog                 [.] sw_4_1
>>>      6.26%    cprog  cprog                 [.] success_3_1_3      cprog                 [.] sw_3_1
>>>      6.24%    cprog  cprog                 [.] symbol1            cprog                 [.] hw_1_1
>>>      6.24%    cprog  cprog                 [.] sw_4_2             cprog                 [.] callme
>>>      6.21%    cprog  [unknown]             [.] 00000000           cprog                 [.] callme
>>>      6.19%    cprog  cprog                 [.] lr_addr            cprog                 [.] sw_4_2
>>>      3.16%    cprog  cprog                 [.] hw_1_2             cprog                 [.] callme
>>>      3.15%    cprog  cprog                 [.] success_3_1_1      cprog                 [.] sw_3_1
>>>      3.15%    cprog  cprog                 [.] sw_4_1             cprog                 [.] callme
>>>      3.14%    cprog  cprog                 [.] callme             cprog                 [.] main
>>>      3.13%    cprog  cprog                 [.] hw_1_1             cprog                 [.] callme
>>>
>>> So a lot of samples above have 0x0 as the "from" address. This originates from the code
>>> section here inside the function "power_pmu_bhrb_read", where we hit two back to back
>>
>> Could you explain the back-to-back case a bit more here?
>> Back-to-back returns to me means something like:
>>
>> int foo()
>> {
>>   ...
>>    return bar();
>> }
>>
>> int bar()
>> {
>>   return 0;
>> }
>>
>> Not counting the leaf optimization here, bar return to foo which
>> immediately returns: 2 back-2-back returns.
>> Is that the case you're talking about here?
>>
>
> No. Filtering of return branches has been implemented in SW only. So PMU as such does not capture
> return only branches. It captures all the branches what it encounters. During the capture process
> PMU might *record* two back to back "target addresses" (without capturing the from address for the
> first one) for which we are unable to figure out the "from address". This leaves us with one branch
> record where we have the target address not from address and so we make it zero. With the current
> logic all branch records with "from address" as zero get filtered through and become the part of the
> final set. I was not too sure how to deal with these cases.
>
So PPC8 captures all branches, with no HW filter, and then in SW you
filter out non-return branches.
Given your description, I have to believe that sometimes the HW does
not even capture the from address. If so, then I think it is best to
drop the sample, because the target address may be the target of an
indirect branch for which there is no way to find the source.
In other words, the record cannot be exploited.

But why does the HW not capture some from addresses?
I am worried this might create some bias in the samples.

>>> target addresses. So we zero out the from address for the first target address and re-read
>>> the second address over again. So thats how we get zero as the from address. This is how the
>>> HW capture the samples. I was reluctant to drop these samples but I agree that these kind of
>>> samples can be dropped if we need to.
>>>
>> I think we need to make it as simple as possible for tools, i.e.,
>> avoid having to decode the
>> disassembly to figure out what happened. Here address 0 is not exploitable.
>
> Thats right. Dropping the branch record where we have only the target address not the from address
> might just solve this problem.
>


* Re: [V6 00/11] perf: New conditional branch filter
  2014-06-02 12:59     ` Stephane Eranian
  2014-06-02 16:04       ` Anshuman Khandual
@ 2014-06-02 22:52       ` Michael Neuling
  1 sibling, 0 replies; 18+ messages in thread
From: Michael Neuling @ 2014-06-02 22:52 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: ak, Peter Zijlstra, LKML, Michael Ellerman, Linux PPC dev,
	Arnaldo Carvalho de Melo, Sukadev Bhattiprolu, Ingo Molnar,
	Anshuman Khandual

On Mon, 2014-06-02 at 14:59 +0200, Stephane Eranian wrote:
> On Wed, May 28, 2014 at 10:04 AM, Anshuman Khandual
> <khandual@linux.vnet.ibm.com> wrote:
> > On 05/27/2014 05:39 PM, Stephane Eranian wrote:
> >> I have been looking at those patches and ran some tests.
> >> And I found a few issues so far.
> >>
> >> I am running:
> >> $ perf record -j any_ret -e cycles:u test_program
> >> $ perf report -D
> >>
> >> Most entries are okay and match the filter, however some do not make sense:
> >>
> >> 3642586996762 0x15d0 [0x108]: PERF_RECORD_SAMPLE(IP, 2): 17921/17921:
> >> 0x10001170 period: 613678 addr: 0
> >> .... branch stack: nr:9
> >> .....  0: 00000000100011cc -> 0000000010000e38
> >> .....  1: 0000000010001150 -> 00000000100011bc
> >> .....  2: 0000000010001208 -> 0000000010000e38
> >> .....  3: 0000000010001160 -> 00000000100011f8
> >> .....  4: 00000000100011cc -> 0000000010000e38
> >> .....  5: 0000000010001150 -> 00000000100011bc
> >> .....  6: 0000000010001208 -> 0000000010000e38
> >> .....  7: 0000000010001160 -> 00000000100011f8
> >> .....  8: 0000000000000000 -> 0000000010001160
> >> ^^^^^^
> >> Entry 8 does not make sense, unless 0x0 is a valid return branch
> >> instruction address.
> >> If an address is invalid, the whole entry needs to be eliminated. It
> >> is okay to have
> >> less than the max number of entries supported by HW.
> >
> > Hey Stephane,
> >
> > Okay. The same behaviour is also reflected in the test results what I have
> > shared in the patchset. Here is that section.
> >
> > (3) perf record -j any_ret -e branch-misses:u ./cprog
> >
> > # Overhead  Command  Source Shared Object          Source Symbol  Target Shared Object          Target Symbol
> > # ........  .......  ....................  .....................  ....................  .....................
> > #
> >     15.61%    cprog  [unknown]             [.] 00000000           cprog                 [.] sw_3_1
> >      6.28%    cprog  cprog                 [.] symbol2            cprog                 [.] hw_1_2
> >      6.28%    cprog  cprog                 [.] ctr_addr           cprog                 [.] sw_4_1
> >      6.26%    cprog  cprog                 [.] success_3_1_3      cprog                 [.] sw_3_1
> >      6.24%    cprog  cprog                 [.] symbol1            cprog                 [.] hw_1_1
> >      6.24%    cprog  cprog                 [.] sw_4_2             cprog                 [.] callme
> >      6.21%    cprog  [unknown]             [.] 00000000           cprog                 [.] callme
> >      6.19%    cprog  cprog                 [.] lr_addr            cprog                 [.] sw_4_2
> >      3.16%    cprog  cprog                 [.] hw_1_2             cprog                 [.] callme
> >      3.15%    cprog  cprog                 [.] success_3_1_1      cprog                 [.] sw_3_1
> >      3.15%    cprog  cprog                 [.] sw_4_1             cprog                 [.] callme
> >      3.14%    cprog  cprog                 [.] callme             cprog                 [.] main
> >      3.13%    cprog  cprog                 [.] hw_1_1             cprog                 [.] callme
> >
> > So a lot of samples above have 0x0 as the "from" address. This originates from the code
> > section here inside the function "power_pmu_bhrb_read", where we hit two back to back
>
> Could you explain the back-to-back case a bit more here?
> Back-to-back returns to me means something like:
>
> int foo()
> {
>   ...
>    return bar();
> }
>
> int bar()
> {
>   return 0;
> }
>
> Not counting the leaf optimization here, bar return to foo which
> immediately returns: 2 back-2-back returns.
> Is that the case you're talking about here?
>
> > target addresses. So we zero out the from address for the first target address and re-read
> > the second address over again. So thats how we get zero as the from address. This is how the
> > HW capture the samples. I was reluctant to drop these samples but I agree that these kind of
> > samples can be dropped if we need to.
> >
> I think we need to make it as simple as possible for tools, i.e.,
> avoid having to decode the
> disassembly to figure out what happened. Here address 0 is not exploitable.

This was my fault.  I figured if we only had partial information from
the hardware, it was best to at least export that to the tools.  If you
disagree then we can remove them.  There was a discussion a while
back on this here:
  https://lkml.org/lkml/2013/5/8/543

Because of the way the branch buffer is structured, we can certainly
lose the from address of the oldest branch in the buffer.  I've not seen
the hardware lose the from branches in the middle of the buffer but I
guess it's possible.  We'll have to get back to you on how or why this
would occur (and associated bias) after talking to some hardware folk.

FWIW, there was some discussion on how the POWER8 branch buffer works a
while back here (same thread as before):
  https://lkml.org/lkml/2013/5/8/541

Mikey

