* [PATCH V4 00/10] perf: New conditional branch filter
@ 2013-12-04 10:32 ` Anshuman Khandual
From: Anshuman Khandual @ 2013-12-04 10:32 UTC
  To: linuxppc-dev, linux-kernel
  Cc: michael, mikey, sukadev, eranian, acme, ak, mingo

This patchset is a re-spin of the original branch stack sampling patchset,
which introduced the new PERF_SAMPLE_BRANCH_COND branch filter. It also enables
SW based branch filtering on book3s powerpc platforms that have PMU HW backed
branch stack sampling support.

Summary of code changes in this patchset:

(1) Introduce a new PERF_SAMPLE_BRANCH_COND branch filter
(2) Add the "cond" branch filter option to the "perf record" tool
(3) Enable PERF_SAMPLE_BRANCH_COND on x86 platforms
(4) Enable PERF_SAMPLE_BRANCH_COND on the POWER8 platform
(5) Update the "perf record" tool documentation
(6) Add some new powerpc instruction analysis functions to the code-patching library
(7) Enable SW based branch filter support for powerpc book3s
(8) Change the BHRB configuration in POWER8 to accommodate SW branch filters
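
As a rough illustration of items (1) and (2), the sketch below shows how a
comma-separated filter list such as the one given to "perf record -j" could be
turned into a bit mask. It is not the perf tool's code: every sketch_* name is
hypothetical, and the bit values are placeholders; real code should use
enum perf_branch_sample_type from <linux/perf_event.h>.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Placeholder bit assignments (assumption). Real code should use
 * enum perf_branch_sample_type from <linux/perf_event.h> instead.
 */
enum sketch_branch_filter {
	SKETCH_BR_ANY_CALL = 1u << 0,
	SKETCH_BR_ANY_RET  = 1u << 1,
	SKETCH_BR_IND_CALL = 1u << 2,
	SKETCH_BR_COND     = 1u << 3,
};

struct sketch_filter_name {
	const char *name;
	uint32_t bit;
};

static const struct sketch_filter_name sketch_filter_names[] = {
	{ "any_call", SKETCH_BR_ANY_CALL },
	{ "any_ret",  SKETCH_BR_ANY_RET  },
	{ "ind_call", SKETCH_BR_IND_CALL },
	{ "cond",     SKETCH_BR_COND     },
};

/*
 * Parse a comma-separated filter list such as "any_call,cond".
 * Returns 0 and stores the OR of all named bits in *mask, or returns -1
 * (leaving *mask untouched) if any name is unknown or empty.
 * An empty list yields a mask of 0.
 */
int sketch_parse_branch_filters(const char *list, uint32_t *mask)
{
	const size_t n = sizeof(sketch_filter_names) / sizeof(sketch_filter_names[0]);
	uint32_t result = 0;
	const char *p = list;

	assert(list != NULL && mask != NULL);

	while (*p) {
		size_t len = strcspn(p, ",");	/* length of the current name */
		int found = 0;
		size_t i;

		for (i = 0; i < n; i++) {
			const char *name = sketch_filter_names[i].name;

			if (strlen(name) == len && strncmp(p, name, len) == 0) {
				result |= sketch_filter_names[i].bit;
				found = 1;
				break;
			}
		}
		if (!found)
			return -1;

		p += len;
		if (*p == ',')
			p++;			/* skip the separator */
	}

	*mask = result;
	return 0;
}
```

For example, parsing "any_call,cond" would set both the SKETCH_BR_ANY_CALL and
SKETCH_BR_COND bits, while an unrecognised name makes the function return -1.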

With this new SW enablement, branch filter support for book3s platforms has
been extended to cover all the combinations shown below, which were exercised
with a sample test application program (included here).
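
To give a feel for what SW based filtering involves, here is a minimal sketch
of classifying a 32-bit powerpc instruction word by branch type. It is not the
code from this patchset: all sketch_* names are hypothetical, the call/return
rules are simple heuristics based on the LK bit, and newer branch forms (such
as bctar) are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Field extraction for a 32-bit powerpc instruction word. */
uint32_t sketch_primary_opcode(uint32_t instr) { return instr >> 26; }
uint32_t sketch_xl_opcode(uint32_t instr)      { return (instr >> 1) & 0x3ff; }
uint32_t sketch_bo_field(uint32_t instr)       { return (instr >> 21) & 0x1f; }
bool     sketch_lk_set(uint32_t instr)         { return (instr & 1) != 0; }

/* bclr: primary opcode 19, extended opcode 16. */
bool sketch_is_bclr(uint32_t instr)
{
	return sketch_primary_opcode(instr) == 19 && sketch_xl_opcode(instr) == 16;
}

/* bcctr: primary opcode 19, extended opcode 528. */
bool sketch_is_bcctr(uint32_t instr)
{
	return sketch_primary_opcode(instr) == 19 && sketch_xl_opcode(instr) == 528;
}

/* Any of b/bl (opcode 18), bc (opcode 16), bclr or bcctr. */
bool sketch_is_branch(uint32_t instr)
{
	uint32_t op = sketch_primary_opcode(instr);

	return op == 18 || op == 16 || sketch_is_bclr(instr) || sketch_is_bcctr(instr);
}

/* Conditional: bc/bclr/bcctr whose BO field is not "branch always" (1z1zz). */
bool sketch_is_cond_branch(uint32_t instr)
{
	if (!sketch_is_branch(instr) || sketch_primary_opcode(instr) == 18)
		return false;
	return (sketch_bo_field(instr) & 0x14) != 0x14;
}

/* Heuristic: a branch that sets the link register is treated as a call. */
bool sketch_is_call(uint32_t instr)
{
	return sketch_is_branch(instr) && sketch_lk_set(instr);
}

/* Heuristic: a register-indirect branch that sets the link register. */
bool sketch_is_ind_call(uint32_t instr)
{
	return (sketch_is_bclr(instr) || sketch_is_bcctr(instr)) && sketch_lk_set(instr);
}

/* Heuristic: a branch to the link register that does not set it again. */
bool sketch_is_return(uint32_t instr)
{
	return sketch_is_bclr(instr) && !sketch_lk_set(instr);
}

/* Self-check against a few well-known encodings. */
void sketch_self_check(void)
{
	assert(sketch_is_return(0x4e800020u));		/* blr */
	assert(sketch_is_ind_call(0x4e800421u));	/* bctrl */
	assert(sketch_is_call(0x48000009u));		/* bl .+8 */
	assert(sketch_is_cond_branch(0x41820008u));	/* beq .+8 */
	assert(!sketch_is_branch(0x60000000u));		/* nop */
}
```

A filter such as "cond,any_ret" would then keep a captured branch entry only if
the instruction at its source address satisfies sketch_is_cond_branch() or
sketch_is_return(), and drop it otherwise.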

Changes in V2
=============
(1) Enabled PPC64 SW branch filtering support
(2) Incorporated changes addressing all previous review comments

Changes in V3
=============
(1) Split the SW branch filter enablement into multiple patches
(2) Added PMU neutral SW branch filtering code and PMU specific HW branch filtering code
(3) Added new instruction analysis functionality into powerpc code-patching library
(4) Changed name for some of the functions
(5) Fixed a couple of spelling mistakes
(6) Changed code documentation in multiple places

Changes in V4
=============
(1) Changed the commit message for patch (01/10)
(2) Changed the patch (02/10) to accommodate review comments from Michael Ellerman
(3) Rebased the patchset against Linus's latest tree

PMU HW branch filters
=====================
(1) perf record -j any_call -e branch-misses:u ./cprog
# Overhead  Command  Source Shared Object          Source Symbol  Target Shared Object             Target Symbol
# ........  .......  ....................  .....................  ....................  ........................
#
     7.00%    cprog  cprog                 [.] sw_3_1             cprog                 [.] sw_3_1_2            
     6.99%    cprog  cprog                 [.] hw_1_1             cprog                 [.] symbol1             
     6.52%    cprog  cprog                 [.] sw_3_1             cprog                 [.] success_3_1_2       
     5.41%    cprog  cprog                 [.] sw_3_1             cprog                 [.] sw_3_1_3            
     5.40%    cprog  cprog                 [.] hw_1_2             cprog                 [.] symbol2             
     5.40%    cprog  cprog                 [.] callme             cprog                 [.] hw_1_2              
     5.40%    cprog  cprog                 [.] sw_3_1             cprog                 [.] success_3_1_1       
     5.40%    cprog  cprog                 [.] callme             cprog                 [.] hw_1_1              
     5.39%    cprog  cprog                 [.] sw_3_1             cprog                 [.] sw_3_1_1            
     5.39%    cprog  cprog                 [.] sw_4_2             cprog                 [.] lr_addr             
     5.39%    cprog  cprog                 [.] callme             cprog                 [.] sw_4_2              
     5.37%    cprog  [unknown]             [.] 00000000           cprog                 [.] ctr_addr            
     4.30%    cprog  cprog                 [.] callme             cprog                 [.] hw_2_1              
     4.28%    cprog  cprog                 [.] callme             cprog                 [.] sw_3_1              
     3.82%    cprog  cprog                 [.] sw_3_1             cprog                 [.] success_3_1_3       
     3.81%    cprog  cprog                 [.] callme             cprog                 [.] hw_2_2              
     3.81%    cprog  cprog                 [.] callme             cprog                 [.] sw_3_2              
     2.71%    cprog  [unknown]             [.] 00000000           cprog                 [.] lr_addr             
     2.70%    cprog  cprog                 [.] main               cprog                 [.] callme              
     2.70%    cprog  cprog                 [.] sw_4_1             cprog                 [.] ctr_addr            
     2.70%    cprog  cprog                 [.] callme             cprog                 [.] sw_4_1              
     0.08%    cprog  [unknown]             [.] 0xf78676c4         [unknown]             [.] 0xf78522c0          
     0.02%    cprog  [unknown]             [k] 00000000           cprog                 [k] ctr_addr            
     0.01%    cprog  [kernel.kallsyms]     [.] .power_pmu_enable  [kernel.kallsyms]     [.] .power8_compute_mmcr
     0.00%    cprog  ld-2.11.2.so          [.] malloc             [unknown]             [.] 0xf786b380          
     0.00%    cprog  ld-2.11.2.so          [.] calloc             [unknown]             [.] 0xf786b390          
     0.00%    cprog  cprog                 [.] main               [unknown]             [.] 0x10000950          
     0.00%    cprog  [unknown]             [.] 00000000           [kernel.kallsyms]     [.] .power_pmu_enable  
    
(2) perf record -j cond -e branch-misses:u ./cprog

# Overhead  Command  Source Shared Object            Source Symbol  Target Shared Object            Target Symbol
# ........  .......  ....................  .......................  ....................  .......................
#
    27.73%    cprog  [unknown]             [.] 00000000             cprog                 [.] callme             
    13.03%    cprog  cprog                 [.] sw_3_1               cprog                 [.] sw_3_1             
     5.64%    cprog  [unknown]             [.] 00000000             cprog                 [.] main               
     5.62%    cprog  [unknown]             [.] 00000000             cprog                 [.] sw_4_2             
     5.46%    cprog  cprog                 [.] sw_4_2               cprog                 [.] lr_addr            
     5.40%    cprog  [unknown]             [.] 00000000             cprog                 [.] sw_4_1             
     3.72%    cprog  cprog                 [.] hw_2_1               cprog                 [.] callme             
     3.71%    cprog  cprog                 [.] main                 cprog                 [.] hw_1_1             
     3.71%    cprog  cprog                 [.] sw_3_1_2             cprog                 [.] sw_3_1             
     3.70%    cprog  cprog                 [.] sw_3_1_3             cprog                 [.] sw_3_1             
     3.70%    cprog  cprog                 [.] sw_4_1               cprog                 [.] ctr_addr           
     3.69%    cprog  cprog                 [.] hw_1_2               cprog                 [.] hw_1_2             
     3.69%    cprog  cprog                 [.] hw_2_2               cprog                 [.] callme             
     3.68%    cprog  cprog                 [.] sw_3_1_1             cprog                 [.] sw_3_1             
     1.93%    cprog  [unknown]             [.] 00000000             cprog                 [.] lr_addr            
     1.78%    cprog  [unknown]             [.] 00000000             cprog                 [.] hw_1_2             
     1.78%    cprog  [unknown]             [.] 00000000             cprog                 [.] sw_3_1             
     1.76%    cprog  [unknown]             [.] 00000000             cprog                 [.] hw_1_1             
     0.12%    cprog  [unknown]             [.] 0xf7bb25dc           [unknown]             [.] 0xf7bb27e4         
     0.07%    cprog  [unknown]             [k] 00000000             cprog                 [k] callme             
     0.07%    cprog  [unknown]             [k] 00000000             cprog                 [k] sw_4_1             
     0.00%    cprog  libc-2.11.2.so        [.] _IO_file_doallocate  libc-2.11.2.so        [.] _IO_file_doallocate
     0.00%    cprog  libc-2.11.2.so        [.] _IO_file_doallocate  libc-2.11.2.so        [.] isatty             
     0.00%    cprog  [unknown]             [.] 00000000             libc-2.11.2.so        [.] _IO_file_doallocate

SW based branch filters
=======================
(3) perf record -j any_ret -e branch-misses:u ./cprog 

# Overhead  Command  Source Shared Object         Source Symbol  Target Shared Object          Target Symbol
# ........  .......  ....................  ....................  ....................  .....................
#
    15.37%    cprog  [unknown]             [.] 00000000          cprog                 [.] sw_3_1           
     6.46%    cprog  cprog                 [.] success_3_1_3     cprog                 [.] sw_3_1           
     6.45%    cprog  cprog                 [.] symbol1           cprog                 [.] hw_1_1           
     6.41%    cprog  [unknown]             [.] 00000000          cprog                 [.] callme           
     6.39%    cprog  cprog                 [.] ctr_addr          cprog                 [.] sw_4_1           
     6.37%    cprog  cprog                 [.] symbol2           cprog                 [.] hw_1_2           
     6.36%    cprog  cprog                 [.] sw_4_2            cprog                 [.] callme           
     6.35%    cprog  cprog                 [.] lr_addr           cprog                 [.] sw_4_2           
     3.97%    cprog  cprog                 [.] back1             cprog                 [.] callme           
     3.93%    cprog  cprog                 [.] sw_3_1_2          cprog                 [.] sw_3_1           
     3.93%    cprog  cprog                 [.] sw_3_1            cprog                 [.] callme           
     3.86%    cprog  cprog                 [.] sw_3_1_3          cprog                 [.] sw_3_1           
     3.84%    cprog  cprog                 [.] sw_3_1_1          cprog                 [.] sw_3_1           
     2.54%    cprog  cprog                 [.] success_3_1_1     cprog                 [.] sw_3_1           
     2.54%    cprog  cprog                 [.] sw_4_1            cprog                 [.] callme           
     2.54%    cprog  cprog                 [.] hw_1_1            cprog                 [.] callme           
     2.53%    cprog  cprog                 [.] sw_3_2            cprog                 [.] callme           
     2.52%    cprog  cprog                 [.] callme            cprog                 [.] main             
     2.51%    cprog  cprog                 [.] hw_1_2            cprog                 [.] callme           
     2.51%    cprog  cprog                 [.] back2             cprog                 [.] callme           
     2.51%    cprog  cprog                 [.] success_3_1_2     cprog                 [.] sw_3_1           
     0.07%    cprog  [unknown]             [k] 00000000          cprog                 [k] callme           
     0.02%    cprog  [unknown]             [.] 00000000          [unknown]             [.] 0xf7e5c004       
     0.01%    cprog  libc-2.11.2.so        [.] __errno_location  libc-2.11.2.so        [.] vfprintf         
     0.01%    cprog  [unknown]             [.] 00000000          libc-2.11.2.so        [.] _IO_file_overflow

(4) perf record -j ind_call -e branch-misses:u ./cprog

# Overhead  Command  Source Shared Object        Source Symbol  Target Shared Object          Target Symbol
# ........  .......  ....................  ...................  ....................  .....................
#
    48.04%    cprog  [unknown]             [.] 00000000         cprog                 [.] sw_3_1           
    19.96%    cprog  cprog                 [.] sw_4_2           cprog                 [.] lr_addr          
    19.69%    cprog  [unknown]             [.] 00000000         cprog                 [.] callme           
    12.04%    cprog  cprog                 [.] sw_4_1           cprog                 [.] ctr_addr         
     0.18%    cprog  [unknown]             [k] 00000000         cprog                 [k] callme           
     0.02%    cprog  libc-2.11.2.so        [.] _IO_file_xsputn  libc-2.11.2.so        [.] _IO_file_overflow
     0.02%    cprog  [unknown]             [.] 00000000         libc-2.11.2.so        [.] _IO_file_xsputn  
     0.02%    cprog  [unknown]             [.] 00000000         ld-2.11.2.so          [.] malloc           
     0.02%    cprog  [unknown]             [k] 00000000         cprog                 [k] sw_3_1           

(5) perf record -j any_call,any_ret -e branch-misses:u ./cprog

# Overhead  Command  Source Shared Object            Source Symbol  Target Shared Object            Target Symbol
# ........  .......  ....................  .......................  ....................  .......................
#
    10.36%    cprog  [unknown]             [.] 00000000             cprog                 [.] sw_3_1             
     4.18%    cprog  cprog                 [.] symbol1              cprog                 [.] hw_1_1             
     4.18%    cprog  cprog                 [.] success_3_1_3        cprog                 [.] sw_3_1             
     4.17%    cprog  cprog                 [.] sw_4_2               cprog                 [.] lr_addr            
     4.16%    cprog  cprog                 [.] sw_4_2               cprog                 [.] callme             
     4.15%    cprog  cprog                 [.] ctr_addr             cprog                 [.] sw_4_1             
     4.15%    cprog  cprog                 [.] lr_addr              cprog                 [.] sw_4_2             
     4.14%    cprog  cprog                 [.] symbol2              cprog                 [.] hw_1_2             
     4.14%    cprog  [unknown]             [.] 00000000             cprog                 [.] callme             
     2.15%    cprog  cprog                 [.] sw_3_1               cprog                 [.] callme             
     2.14%    cprog  cprog                 [.] hw_1_1               cprog                 [.] symbol1            
     2.14%    cprog  cprog                 [.] callme               cprog                 [.] hw_1_1             
     2.14%    cprog  cprog                 [.] callme               cprog                 [.] sw_4_2             
     2.13%    cprog  cprog                 [.] back1                cprog                 [.] callme             
     2.12%    cprog  cprog                 [.] sw_3_1_2             cprog                 [.] sw_3_1             
     2.12%    cprog  cprog                 [.] sw_3_1               cprog                 [.] sw_3_1_2           
     2.11%    cprog  cprog                 [.] sw_3_1_3             cprog                 [.] sw_3_1             
     2.11%    cprog  cprog                 [.] sw_3_1               cprog                 [.] sw_3_1_3           
     2.11%    cprog  cprog                 [.] sw_4_1               cprog                 [.] ctr_addr           
     2.10%    cprog  cprog                 [.] hw_1_2               cprog                 [.] symbol2            
     2.10%    cprog  cprog                 [.] sw_3_1_1             cprog                 [.] sw_3_1             
     2.10%    cprog  cprog                 [.] sw_3_1               cprog                 [.] sw_3_1_1           
     2.10%    cprog  cprog                 [.] callme               cprog                 [.] hw_1_2             
     2.10%    cprog  cprog                 [.] callme               cprog                 [.] sw_3_1             
     2.05%    cprog  cprog                 [.] success_3_1_1        cprog                 [.] sw_3_1             
     2.05%    cprog  cprog                 [.] sw_3_1               cprog                 [.] success_3_1_1      
     2.05%    cprog  cprog                 [.] success_3_1_2        cprog                 [.] sw_3_1             
     2.05%    cprog  cprog                 [.] sw_3_1               cprog                 [.] success_3_1_2      
     2.04%    cprog  cprog                 [.] hw_1_1               cprog                 [.] callme             
     2.04%    cprog  cprog                 [.] back2                cprog                 [.] callme             
     2.04%    cprog  cprog                 [.] sw_4_1               cprog                 [.] callme             
     2.04%    cprog  cprog                 [.] callme               cprog                 [.] main               
     2.04%    cprog  cprog                 [.] hw_1_2               cprog                 [.] callme             
     2.04%    cprog  cprog                 [.] sw_3_2               cprog                 [.] callme             
     2.04%    cprog  cprog                 [.] callme               cprog                 [.] sw_3_2             
     2.03%    cprog  cprog                 [.] sw_3_1               cprog                 [.] success_3_1_3      
     0.03%    cprog  [unknown]             [k] 00000000             cprog                 [k] callme             
     0.01%    cprog  [unknown]             [.] 0xf7e79bb0           [unknown]             [.] 0xf7e64088         
     0.00%    cprog  libc-2.11.2.so        [.] _IO_file_doallocate  libc-2.11.2.so        [.] mmap               
     0.00%    cprog  libc-2.11.2.so        [.] mmap                 libc-2.11.2.so        [.] _IO_file_doallocate
     0.00%    cprog  [unknown]             [.] 0xf7e7589c           libc-2.11.2.so        [.] printf             
     0.00%    cprog  [unknown]             [k] 00000000             cprog                 [k] sw_3_1          

(6) perf record -j any_call,ind_call -e branch-misses:u ./cprog

# Overhead  Command  Source Shared Object   Source Symbol  Target Shared Object      Target Symbol
# ........  .......  ....................  ..............  ....................  .................
#
    23.09%    cprog  [unknown]             [.] 00000000    cprog                 [.] sw_3_1       
     8.99%    cprog  cprog                 [.] sw_4_2      cprog                 [.] lr_addr      
     8.92%    cprog  [unknown]             [.] 00000000    cprog                 [.] callme       
     5.18%    cprog  cprog                 [.] sw_3_1      cprog                 [.] success_3_1_2
     5.16%    cprog  cprog                 [.] sw_3_1      cprog                 [.] success_3_1_1
     5.16%    cprog  cprog                 [.] callme      cprog                 [.] sw_3_2       
     5.12%    cprog  cprog                 [.] sw_3_1      cprog                 [.] success_3_1_3
     3.85%    cprog  cprog                 [.] sw_3_1      cprog                 [.] sw_3_1_1     
     3.85%    cprog  cprog                 [.] callme      cprog                 [.] sw_3_1       
     3.84%    cprog  cprog                 [.] sw_4_1      cprog                 [.] ctr_addr     
     3.82%    cprog  cprog                 [.] hw_1_1      cprog                 [.] symbol1      
     3.82%    cprog  cprog                 [.] sw_3_1      cprog                 [.] sw_3_1_2     
     3.82%    cprog  cprog                 [.] sw_3_1      cprog                 [.] sw_3_1_3     
     3.82%    cprog  cprog                 [.] callme      cprog                 [.] hw_1_1       
     3.81%    cprog  cprog                 [.] hw_1_2      cprog                 [.] symbol2      
     3.81%    cprog  cprog                 [.] callme      cprog                 [.] hw_1_2       
     3.81%    cprog  cprog                 [.] callme      cprog                 [.] sw_4_2       
     0.05%    cprog  [unknown]             [k] 00000000    cprog                 [k] callme       
     0.03%    cprog  [unknown]             [.] 0xf7f7232c  [unknown]             [.] 0xf7f72334   
     0.01%    cprog  ld-2.11.2.so          [.] malloc      [unknown]             [.] 0xf7f8b380   
     0.01%    cprog  cprog                 [.] main        [unknown]             [.] 0x10000950   
     0.01%    cprog  [unknown]             [.] 00000000    ld-2.11.2.so          [.] malloc       
     0.01%    cprog  [unknown]             [.] 00000000    cprog                 [.] main         

(7) perf record -j cond,any_ret -e branch-misses:u ./cprog

# Overhead  Command  Source Shared Object          Source Symbol  Target Shared Object          Target Symbol
# ........  .......  ....................  .....................  ....................  .....................
#
    12.18%    cprog  [unknown]             [.] 00000000           cprog                 [.] sw_3_1           
     4.90%    cprog  cprog                 [.] sw_4_2             cprog                 [.] lr_addr          
     4.88%    cprog  [unknown]             [.] 00000000           cprog                 [.] callme           
     4.88%    cprog  cprog                 [.] lr_addr            cprog                 [.] sw_4_2           
     4.88%    cprog  cprog                 [.] sw_4_2             cprog                 [.] callme           
     4.86%    cprog  cprog                 [.] symbol1            cprog                 [.] hw_1_1           
     4.86%    cprog  cprog                 [.] success_3_1_3      cprog                 [.] sw_3_1           
     4.85%    cprog  cprog                 [.] symbol2            cprog                 [.] hw_1_2           
     4.85%    cprog  cprog                 [.] ctr_addr           cprog                 [.] sw_4_1           
     2.47%    cprog  cprog                 [.] sw_3_1_3           cprog                 [.] sw_3_1           
     2.46%    cprog  cprog                 [.] back1              cprog                 [.] callme           
     2.45%    cprog  cprog                 [.] hw_1_1             cprog                 [.] callme           
     2.45%    cprog  cprog                 [.] hw_2_1             cprog                 [.] address1         
     2.44%    cprog  cprog                 [.] hw_1_2             cprog                 [.] symbol2          
     2.44%    cprog  cprog                 [.] sw_3_1_1           cprog                 [.] sw_3_1           
     2.44%    cprog  cprog                 [.] sw_3_2             cprog                 [.] callme           
     2.44%    cprog  cprog                 [.] success_3_1_1      cprog                 [.] sw_3_1           
     2.44%    cprog  cprog                 [.] sw_3_1             cprog                 [.] success_3_1_1    
     2.44%    cprog  cprog                 [.] sw_3_1             cprog                 [.] success_3_1_3    
     2.43%    cprog  cprog                 [.] callme             cprog                 [.] main             
     2.43%    cprog  cprog                 [.] hw_2_2             cprog                 [.] address2         
     2.43%    cprog  cprog                 [.] sw_3_1_2           cprog                 [.] sw_3_1           
     2.43%    cprog  cprog                 [.] success_3_1_2      cprog                 [.] sw_3_1           
     2.43%    cprog  cprog                 [.] sw_3_1             cprog                 [.] success_3_1_2    
     2.43%    cprog  cprog                 [.] sw_4_1             cprog                 [.] callme           
     2.42%    cprog  cprog                 [.] sw_3_1             cprog                 [.] callme           
     2.42%    cprog  cprog                 [.] sw_4_1             cprog                 [.] ctr_addr         
     2.42%    cprog  cprog                 [.] back2              cprog                 [.] callme           
     2.40%    cprog  cprog                 [.] hw_1_2             cprog                 [.] callme           
     0.10%    cprog  [unknown]             [.] 0xf78923e0         [unknown]             [.] 0xf78923c0       
     0.03%    cprog  [unknown]             [k] 00000000           cprog                 [k] callme           
     0.01%    cprog  [unknown]             [k] 00000000           cprog                 [k] sw_3_1           
     0.01%    cprog  libc-2.11.2.so        [.] vfprintf           libc-2.11.2.so        [.] vfprintf         
     0.01%    cprog  libc-2.11.2.so        [.] _IO_file_overflow  [unknown]             [.] 0x0fee0100       
     0.01%    cprog  libc-2.11.2.so        [.] strchrnul          libc-2.11.2.so        [.] vfprintf         
     0.01%    cprog  libc-2.11.2.so        [.] strchrnul          libc-2.11.2.so        [.] strchrnul        
     0.01%    cprog  [unknown]             [.] 00000000           libc-2.11.2.so        [.] _IO_file_overflow


(8) perf record -j cond,ind_call -e branch-misses:u ./cprog

# Overhead  Command  Source Shared Object   Source Symbol  Target Shared Object        Target Symbol
# ........  .......  ....................  ..............  ....................  ...................
#
    26.21%    cprog  [unknown]             [.] 00000000    cprog                 [.] sw_3_1         
    10.50%    cprog  cprog                 [.] sw_4_2      cprog                 [.] lr_addr        
    10.38%    cprog  [unknown]             [.] 00000000    cprog                 [.] callme         
     5.31%    cprog  cprog                 [.] sw_3_1_2    cprog                 [.] sw_3_1         
     5.30%    cprog  cprog                 [.] sw_3_1_1    cprog                 [.] sw_3_1         
     5.27%    cprog  cprog                 [.] sw_3_1      cprog                 [.] success_3_1_2  
     5.26%    cprog  cprog                 [.] hw_2_2      cprog                 [.] address2       
     5.25%    cprog  cprog                 [.] hw_1_2      cprog                 [.] symbol2        
     5.25%    cprog  cprog                 [.] sw_3_1      cprog                 [.] success_3_1_3  
     5.24%    cprog  cprog                 [.] hw_2_1      cprog                 [.] address1       
     5.23%    cprog  cprog                 [.] sw_4_1      cprog                 [.] ctr_addr       
     5.20%    cprog  cprog                 [.] sw_3_1_3    cprog                 [.] sw_3_1         
     5.19%    cprog  cprog                 [.] sw_3_1      cprog                 [.] success_3_1_1  
     0.24%    cprog  [unknown]             [.] 0xf7cf23e0  [unknown]             [.] 0xf7cf23c0     
     0.11%    cprog  [unknown]             [k] 00000000    cprog                 [k] callme         
     0.01%    cprog  libc-2.11.2.so        [.] vfprintf    libc-2.11.2.so        [.] vfprintf       
     0.01%    cprog  libc-2.11.2.so        [.] vfprintf    libc-2.11.2.so        [.] _IO_file_xsputn
     0.01%    cprog  [unknown]             [.] 00000000    libc-2.11.2.so        [.] vfprintf       
     0.01%    cprog  [unknown]             [k] 00000000    cprog                 [k] sw_3_1         

(9) perf record -j any_call,cond,any_ret -e branch-misses:u ./cprog

# Overhead  Command  Source Shared Object      Source Symbol  Target Shared Object          Target Symbol
# ........  .......  ....................  .................  ....................  .....................
#
     9.96%    cprog  [unknown]             [.] 00000000       cprog                 [.] sw_3_1           
     4.06%    cprog  cprog                 [.] sw_4_2         cprog                 [.] lr_addr          
     4.04%    cprog  cprog                 [.] lr_addr        cprog                 [.] sw_4_2           
     4.03%    cprog  cprog                 [.] symbol1        cprog                 [.] hw_1_1           
     4.02%    cprog  [unknown]             [.] 00000000       cprog                 [.] callme           
     3.96%    cprog  cprog                 [.] ctr_addr       cprog                 [.] sw_4_1           
     3.94%    cprog  cprog                 [.] symbol2        cprog                 [.] hw_1_2           
     3.94%    cprog  cprog                 [.] success_3_1_3  cprog                 [.] sw_3_1           
     3.93%    cprog  cprog                 [.] sw_4_2         cprog                 [.] callme           
     2.08%    cprog  cprog                 [.] sw_3_2         cprog                 [.] callme           
     2.08%    cprog  cprog                 [.] callme         cprog                 [.] sw_3_2           
     2.07%    cprog  cprog                 [.] hw_2_2         cprog                 [.] address2         
     2.07%    cprog  cprog                 [.] success_3_1_2  cprog                 [.] sw_3_1           
     2.07%    cprog  cprog                 [.] sw_3_1         cprog                 [.] success_3_1_2    
     2.07%    cprog  cprog                 [.] back2          cprog                 [.] callme           
     2.06%    cprog  cprog                 [.] hw_1_1         cprog                 [.] callme           
     1.99%    cprog  cprog                 [.] sw_4_1         cprog                 [.] ctr_addr         
     1.98%    cprog  cprog                 [.] sw_3_1_3       cprog                 [.] sw_3_1           
     1.98%    cprog  cprog                 [.] success_3_1_1  cprog                 [.] sw_3_1           
     1.98%    cprog  cprog                 [.] sw_3_1         cprog                 [.] sw_3_1_3         
     1.98%    cprog  cprog                 [.] sw_3_1         cprog                 [.] success_3_1_1    
     1.98%    cprog  cprog                 [.] callme         cprog                 [.] sw_4_2           
     1.98%    cprog  cprog                 [.] back1          cprog                 [.] callme           
     1.97%    cprog  cprog                 [.] hw_1_1         cprog                 [.] symbol1          
     1.97%    cprog  cprog                 [.] hw_2_1         cprog                 [.] address1         
     1.97%    cprog  cprog                 [.] sw_3_1_1       cprog                 [.] sw_3_1           
     1.97%    cprog  cprog                 [.] sw_3_1         cprog                 [.] sw_3_1_1         
     1.97%    cprog  cprog                 [.] sw_3_1         cprog                 [.] success_3_1_3    
     1.97%    cprog  cprog                 [.] callme         cprog                 [.] hw_1_1           
     1.97%    cprog  cprog                 [.] callme         cprog                 [.] sw_3_1           
     1.97%    cprog  cprog                 [.] hw_1_2         cprog                 [.] symbol2          
     1.97%    cprog  cprog                 [.] hw_1_2         cprog                 [.] callme           
     1.97%    cprog  cprog                 [.] sw_4_1         cprog                 [.] callme           
     1.97%    cprog  cprog                 [.] callme         cprog                 [.] main             
     1.97%    cprog  cprog                 [.] callme         cprog                 [.] hw_1_2           
     1.96%    cprog  cprog                 [.] sw_3_1         cprog                 [.] callme           
     1.96%    cprog  cprog                 [.] sw_3_1_2       cprog                 [.] sw_3_1           
     1.96%    cprog  cprog                 [.] sw_3_1         cprog                 [.] sw_3_1_2         
     0.12%    cprog  [unknown]             [.] 0xf7ab23e0     [unknown]             [.] 0xf7ab23c0       
     0.04%    cprog  [unknown]             [k] 00000000       cprog                 [k] callme           
     0.01%    cprog  [unknown]             [k] 00000000       cprog                 [k] sw_3_1           
     0.00%    cprog  libc-2.11.2.so        [.] vfprintf       libc-2.11.2.so        [.] vfprintf         
     0.00%    cprog  libc-2.11.2.so        [.] _IO_do_write   libc-2.11.2.so        [.] _IO_do_write     
     0.00%    cprog  libc-2.11.2.so        [.] _IO_do_write   libc-2.11.2.so        [.] _IO_file_overflow
     0.00%    cprog  libc-2.11.2.so        [.] strchrnul      libc-2.11.2.so        [.] vfprintf         
     0.00%    cprog  libc-2.11.2.so        [.] strchrnul      libc-2.11.2.so        [.] strchrnul        
     0.00%    cprog  cprog                 [.] callme         cprog                 [.] hw_2_2           
     0.00%    cprog  [unknown]             [.] 00000000       libc-2.11.2.so        [.] _IO_do_write     

(10) perf record -j any_call,cond,ind_call -e branch-misses:u ./cprog

# Overhead  Command  Source Shared Object          Source Symbol  Target Shared Object          Target Symbol
# ........  .......  ....................  .....................  ....................  .....................
#
    17.81%    cprog  [unknown]             [.] 00000000           cprog                 [.] sw_3_1           
     7.19%    cprog  cprog                 [.] sw_4_2             cprog                 [.] lr_addr          
     7.12%    cprog  [unknown]             [.] 00000000           cprog                 [.] callme           
     3.71%    cprog  cprog                 [.] sw_3_1             cprog                 [.] success_3_1_2    
     3.68%    cprog  cprog                 [.] callme             cprog                 [.] sw_3_2           
     3.67%    cprog  cprog                 [.] hw_2_2             cprog                 [.] address2         
     3.57%    cprog  cprog                 [.] hw_2_1             cprog                 [.] address1         
     3.55%    cprog  cprog                 [.] hw_1_1             cprog                 [.] symbol1          
     3.55%    cprog  cprog                 [.] sw_3_1             cprog                 [.] success_3_1_1    
     3.55%    cprog  cprog                 [.] callme             cprog                 [.] hw_1_1           
     3.54%    cprog  cprog                 [.] sw_3_1_1           cprog                 [.] sw_3_1           
     3.54%    cprog  cprog                 [.] sw_3_1             cprog                 [.] sw_3_1_1         
     3.54%    cprog  cprog                 [.] sw_4_1             cprog                 [.] ctr_addr         
     3.54%    cprog  cprog                 [.] callme             cprog                 [.] sw_3_1           
     3.52%    cprog  cprog                 [.] sw_3_1_3           cprog                 [.] sw_3_1           
     3.52%    cprog  cprog                 [.] sw_3_1             cprog                 [.] sw_3_1_3         
     3.52%    cprog  cprog                 [.] sw_3_1             cprog                 [.] success_3_1_3    
     3.52%    cprog  cprog                 [.] sw_3_1_2           cprog                 [.] sw_3_1           
     3.52%    cprog  cprog                 [.] sw_3_1             cprog                 [.] sw_3_1_2         
     3.51%    cprog  cprog                 [.] hw_1_2             cprog                 [.] symbol2          
     3.51%    cprog  cprog                 [.] callme             cprog                 [.] hw_1_2           
     3.49%    cprog  cprog                 [.] callme             cprog                 [.] sw_4_2           
     0.22%    cprog  [unknown]             [.] 0xf7ca23f4         [unknown]             [.] 0xf7ca25d0       
     0.05%    cprog  [unknown]             [k] 00000000           cprog                 [k] callme           
     0.01%    cprog  libc-2.11.2.so        [.] vfprintf           libc-2.11.2.so        [.] vfprintf         
     0.01%    cprog  libc-2.11.2.so        [.] vfprintf           libc-2.11.2.so        [.] strchrnul        
     0.01%    cprog  libc-2.11.2.so        [.] _IO_file_overflow  libc-2.11.2.so        [.] _IO_file_overflow
     0.01%    cprog  libc-2.11.2.so        [.] strchrnul          libc-2.11.2.so        [.] strchrnul        
     0.01%    cprog  [unknown]             [.] 00000000           libc-2.11.2.so        [.] _IO_file_overflow
     0.01%    cprog  [unknown]             [k] 00000000           cprog                 [k] sw_3_1        

(11) perf record -j any_call,cond,any_ret,ind_call -e branch-misses:u ./cprog

# Overhead  Command  Source Shared Object      Source Symbol  Target Shared Object        Target Symbol
# ........  .......  ....................  .................  ....................  ...................
#
     9.72%    cprog  [unknown]             [.] 00000000       cprog                 [.] sw_3_1         
     3.99%    cprog  cprog                 [.] ctr_addr       cprog                 [.] sw_4_1         
     3.98%    cprog  cprog                 [.] success_3_1_3  cprog                 [.] sw_3_1         
     3.98%    cprog  cprog                 [.] symbol1        cprog                 [.] hw_1_1         
     3.98%    cprog  cprog                 [.] symbol2        cprog                 [.] hw_1_2         
     3.98%    cprog  cprog                 [.] sw_4_2         cprog                 [.] lr_addr        
     3.98%    cprog  cprog                 [.] sw_4_2         cprog                 [.] callme         
     3.97%    cprog  cprog                 [.] lr_addr        cprog                 [.] sw_4_2         
     3.91%    cprog  [unknown]             [.] 00000000       cprog                 [.] callme         
     2.22%    cprog  cprog                 [.] sw_4_1         cprog                 [.] ctr_addr       
     2.22%    cprog  cprog                 [.] callme         cprog                 [.] sw_4_2         
     2.22%    cprog  cprog                 [.] hw_2_1         cprog                 [.] address1       
     2.22%    cprog  cprog                 [.] back1          cprog                 [.] callme         
     2.21%    cprog  cprog                 [.] hw_1_2         cprog                 [.] symbol2        
     2.21%    cprog  cprog                 [.] sw_3_1         cprog                 [.] callme         
     2.21%    cprog  cprog                 [.] callme         cprog                 [.] hw_1_2         
     2.21%    cprog  cprog                 [.] sw_3_1_1       cprog                 [.] sw_3_1         
     2.21%    cprog  cprog                 [.] sw_3_1_3       cprog                 [.] sw_3_1         
     2.21%    cprog  cprog                 [.] sw_3_1         cprog                 [.] sw_3_1_1       
     2.21%    cprog  cprog                 [.] sw_3_1         cprog                 [.] sw_3_1_3       
     2.21%    cprog  cprog                 [.] callme         cprog                 [.] sw_3_1         
     2.20%    cprog  cprog                 [.] hw_1_1         cprog                 [.] symbol1        
     2.20%    cprog  cprog                 [.] sw_3_1_2       cprog                 [.] sw_3_1         
     2.20%    cprog  cprog                 [.] sw_3_1         cprog                 [.] sw_3_1_2       
     2.20%    cprog  cprog                 [.] callme         cprog                 [.] hw_1_1         
     1.77%    cprog  cprog                 [.] hw_1_1         cprog                 [.] callme         
     1.77%    cprog  cprog                 [.] success_3_1_1  cprog                 [.] sw_3_1         
     1.77%    cprog  cprog                 [.] sw_3_1         cprog                 [.] success_3_1_1  
     1.77%    cprog  cprog                 [.] success_3_1_2  cprog                 [.] sw_3_1         
     1.77%    cprog  cprog                 [.] sw_3_1         cprog                 [.] success_3_1_2  
     1.77%    cprog  cprog                 [.] sw_3_1         cprog                 [.] success_3_1_3  
     1.76%    cprog  cprog                 [.] hw_1_2         cprog                 [.] callme         
     1.76%    cprog  cprog                 [.] sw_4_1         cprog                 [.] callme         
     1.76%    cprog  cprog                 [.] sw_3_2         cprog                 [.] callme         
     1.76%    cprog  cprog                 [.] callme         cprog                 [.] main           
     1.76%    cprog  cprog                 [.] callme         cprog                 [.] sw_3_2         
     1.75%    cprog  cprog                 [.] hw_2_2         cprog                 [.] address2       
     1.75%    cprog  cprog                 [.] back2          cprog                 [.] callme         
     0.13%    cprog  [unknown]             [.] 0xf7dd23e0     [unknown]             [.] 0xf7dd23c0     
     0.07%    cprog  [unknown]             [k] 00000000       cprog                 [k] callme         
     0.00%    cprog  libc-2.11.2.so        [.] vfprintf       libc-2.11.2.so        [.] vfprintf       
     0.00%    cprog  libc-2.11.2.so        [.] vfprintf       libc-2.11.2.so        [.] _IO_file_xsputn
     0.00%    cprog  [unknown]             [.] 00000000       libc-2.11.2.so        [.] vfprintf       

Test application program
========================
(1) Makefile:
--------------------------------------------
all: sample.o cprog of.cprog of.sample

sample.o: sample.s
	as -o sample.o sample.s
cprog: cprog.c sample.o
	gcc -o cprog cprog.c sample.o
of.sample: sample.o
	objdump -d sample.o > of.sample
of.cprog: cprog
	objdump -d cprog > of.cprog
clean:
	rm sample.o cprog of.sample of.cprog
---------------------------------------------
(2) cprog.c
---------------------------------------------
#include <stdio.h>
#define LOOP_COUNT 10000

extern void callme(void);

int main(int argc, char *argv[])
{
        int i;
        for(i = 0; i < LOOP_COUNT; i++)
                callme();

        printf("end");
        return 0;
}
---------------------------------------------
(3) sample.s
---------------------------------------------
# r25, r26 and r27 are used as the first, second and third level
# save slots for LR. Registers r20, r21, r22, r23 and r24 are used
# as general purpose scratch registers.

.data

msg:
	.string "BHRB filter tests\n"
	len = . - msg
msg_1_1:
	.string "Test: hw_1_1\n"
	len_1_1 = 13
msg_1_2:
	.string "Test: hw_1_2\n"
	len_1_2 = 13
msg_2_1:
	.string "Test: hw_2_1\n"
	len_2_1 = 13
msg_2_2:
	.string "Test: hw_2_2\n"
	len_2_2 = 13
msg_3_1:
	.string "Test: sw_3_1\n"
	len_3_1 = 13
msg_3_1_1:
	.string "Test: sw_3_1_1\n"
	len_3_1_1 = 15
msg_3_1_2:
	.string "Test: sw_3_1_2\n"
	len_3_1_2 = 15
msg_3_1_3:
        .string "Test: sw_3_1_3\n"
        len_3_1_3 = 15
msg_3_2:
	.string "Test: sw_3_2\n"
	len_3_2 = 13
msg_4_1:
	.string "Test: sw_4_1\n"
	len_4_1 = 13
msg_4_2:
	.string "Test: sw_4_2\n"
	len_4_2 = 13

hw_3_1_1_passed:
	.string "\thw_3_1_1_passed\n\n"
	len_hw_3_1_1_passed = 18
hw_3_1_2_passed:
	.string "\thw_3_1_2_passed\n\n"
	len_hw_3_1_2_passed = 18
hw_3_1_3_passed:
	.string "\thw_3_1_3_passed\n\n"
	len_hw_3_1_3_passed = 18

hw_2_1_passed:
	.string "\thw_2_1_passed\n\n"
	len_hw_2_1_passed = 16

hw_2_2_passed:
	.string "\thw_2_2_passed\n\n"
	len_hw_2_2_passed = 16

hw_1_1_passed:
	.string "\thw_1_1_passed\n\n"
	len_hw_1_1_passed = 16

hw_1_2_passed:
	.string "\thw_1_2_passed\n\n"
	len_hw_1_2_passed = 16

hw_4_1_passed:
	.string "\thw_4_1_passed\n\n"
	len_hw_4_1_passed = 16

hw_4_2_passed:
	.string "\thw_4_2_passed\n\n"
	len_hw_4_2_passed = 16

msg_error:
	.string "\tError\n"
	len_error = 7
.text
	.global callme
	.global hw_1_1
	.global hw_1_2
	.global hw_2_1
	.global hw_2_2

# HW filter test symbols
symbol1:
	# Print "hw_1_1_passed"
	li      0, 4
	li      3, 1
	lis     4, hw_1_1_passed@ha
	addi    4, 4, hw_1_1_passed@l
	li      5, len_hw_1_1_passed
	sc

	blr				# PERF_SAMPLE_BRANCH_ANY_RET

hw_1_1:
        # Save LR - second level
        mflr 26

	# Print "hw_1_1 called"
	li      0, 4
	li      3, 1
	lis     4, msg_1_1@ha
	addi    4, 4, msg_1_1@l
	li      5, len_1_1
	sc

	bl symbol1			# PERF_SAMPLE_BRANCH_ANY_CALL

	# Restore LR
	mtlr 26
	blr				# PERF_SAMPLE_BRANCH_ANY_RET

symbol2:
        # Print "hw_1_2_passed"
        li      0, 4
        li      3, 1
        lis     4, hw_1_2_passed@ha
        addi    4, 4, hw_1_2_passed@l
        li      5, len_hw_1_2_passed
        sc

	blr				# PERF_SAMPLE_BRANCH_ANY_RET
hw_1_2:
	# Save LR - second level
	mflr 26

        # Print "hw_1_2 called"
        li      0, 4
        li      3, 1
        lis     4, msg_1_2@ha
        addi    4, 4, msg_1_2@l
        li      5, len_1_2
        sc

	li 4,20
	cmpi 0,4,20
	bcl 12, 4*cr0+2, symbol2	# PERF_SAMPLE_BRANCH_ANY_CALL | PERF_SAMPLE_BRANCH_COND

	mtlr 26
	blr				# PERF_SAMPLE_BRANCH_ANY_RET

# HW filter test

address1: 
	# Print "hw_2_1_passed"
        li      0, 4
        li      3, 1
        lis     4, hw_2_1_passed@ha
        addi    4, 4, hw_2_1_passed@l
        li      5, len_hw_2_1_passed
        sc
	b  back1			# PERF_SAMPLE_BRANCH_ANY

hw_2_1:
	# Print "hw_2_1 called"
	li      0, 4
	li      3, 1
	lis     4, msg_2_1@ha
	addi    4, 4, msg_2_1@l
	li      5, len_2_1
	sc
	
	# Simple conditional branch (equal)
	li	20, 12
	cmpi	3, 20, 12
	bc	12, 4*cr3+2, address1	# PERF_SAMPLE_BRANCH_COND

back1:
	blr				# PERF_SAMPLE_BRANCH_ANY_RET

address2:
        # Print "hw_2_2_passed"
        li      0, 4
        li      3, 1
        lis     4, hw_2_2_passed@ha
        addi    4, 4, hw_2_2_passed@l
        li      5, len_hw_2_2_passed
        sc
        b  back2			# PERF_SAMPLE_BRANCH_ANY

hw_2_2:
        # Print "hw_2_2 called"
	li      0, 4
	li      3, 1
	lis     4, msg_2_2@ha
	addi    4, 4, msg_2_2@l
	li      5, len_2_2
	sc

	# Simple conditional branch (less than)
	li	20, 12
	cmpi	4, 20, 20
	bc	12, 4*cr4+0, address2	# PERF_SAMPLE_BRANCH_COND
back2:
	blr				# PERF_SAMPLE_BRANCH_ANY_RET

# SW filter test symbols
sw_3_1_1:
	# Print "Test: sw_3_1_1"
        li      0, 4
        li      3, 1
        lis     4, msg_3_1_1@ha
        addi    4, 4, msg_3_1_1@l
        li      5, len_3_1_1
        sc

	li	22,0
	# Test the condition and return
	li	21, 10
	cmpi	0, 21, 10
	bclr	12, 2			# PERF_SAMPLE_BRANCH_ANY_RET | PERF_SAMPLE_BRANCH_COND

	# Should not have come here
	li      0, 4
	li      3, 1
        lis     4, msg_error@ha
        addi    4, 4, msg_error@l
        li      5, len_error
        sc
	
	# Mark the error
	li 	22, 1
	
	# Safe fall back
	blr				# PERF_SAMPLE_BRANCH_ANY_RET

sw_3_1_2:
        # Print "Test: sw_3_1_2"
        li      0, 4
        li      3, 1
        lis     4, msg_3_1_2@ha
        addi    4, 4, msg_3_1_2@l
        li      5, len_3_1_2
        sc

	li	23, 0
	# Test the condition and return
	li	21, 10
	cmpi	0, 21, 20
	bclr	12, 0			# PERF_SAMPLE_BRANCH_ANY_RET | PERF_SAMPLE_BRANCH_COND
        
	# Should not have come here
	li      0, 4
	li      3, 1
        lis     4, msg_error@ha
        addi    4, 4, msg_error@l
        li      5, len_error
        sc

	# Mark the error
	li 	23, 1

	# Safe fall back
	blr				# PERF_SAMPLE_BRANCH_ANY_RET

sw_3_1_3:
	# Print "Test: sw_3_1_3"
        li      0, 4
        li      3, 1
        lis     4, msg_3_1_3@ha
        addi    4, 4, msg_3_1_3@l
        li      5, len_3_1_3
        sc

	li	24, 0
	# Test the condition and return
	li	21, 10
	cmpi	0, 21, 5
	bclr	12, 1			# PERF_SAMPLE_BRANCH_ANY_RET | PERF_SAMPLE_BRANCH_COND
	
	# Mark the error
	li 	24, 1

	# Should not have come here
	li      0, 4
	li      3, 1
        lis     4, msg_error@ha
        addi    4, 4, msg_error@l
        li      5, len_error
        sc

	# Safe fall back
	blr				# PERF_SAMPLE_BRANCH_ANY_RET

success_3_1_1:
	li      0, 4
	li      3, 1
        lis     4, hw_3_1_1_passed@ha
        addi    4, 4, hw_3_1_1_passed@l
        li      5, len_hw_3_1_1_passed
        sc
	blr

success_3_1_2:
	li      0, 4
	li      3, 1
        lis     4, hw_3_1_2_passed@ha
        addi    4, 4, hw_3_1_2_passed@l
        li      5, len_hw_3_1_2_passed
        sc
	blr

success_3_1_3:
	li      0, 4
	li      3, 1
        lis     4, hw_3_1_3_passed@ha
        addi    4, 4, hw_3_1_3_passed@l
        li      5, len_hw_3_1_3_passed
        sc
	blr

sw_3_1:
	# Save LR
	mflr 26

        # Print "Test: sw_3_1"
        li      0, 4
        li      3, 1
        lis     4, msg_3_1@ha
        addi    4, 4, msg_3_1@l
        li      5, len_3_1
        sc

	# Equal comparison condition
	bl sw_3_1_1			# PERF_SAMPLE_BRANCH_ANY_CALL
	cmpi	0, 22, 0
	bcl	12, 2, success_3_1_1	# PERF_SAMPLE_BRANCH_ANY_CALL | PERF_SAMPLE_BRANCH_COND

	# LT comparison condition
	bl sw_3_1_2			# PERF_SAMPLE_BRANCH_ANY_CALL
	cmpi	0, 23, 0
	bcl	12, 2, success_3_1_2	# PERF_SAMPLE_BRANCH_ANY_CALL | PERF_SAMPLE_BRANCH_COND

	# GT comparison condition
	bl sw_3_1_3			# PERF_SAMPLE_BRANCH_ANY_CALL
	cmpi	0, 24, 0
	bcl	12, 2, success_3_1_3	# PERF_SAMPLE_BRANCH_ANY_CALL | PERF_SAMPLE_BRANCH_COND

	mtlr 26
	blr				# PERF_SAMPLE_BRANCH_ANY_RET
sw_3_2:
	# Print "Test: sw_3_2"
	li      0, 4
	li      3, 1
	lis     4, msg_3_2@ha
	addi    4, 4, msg_3_2@l
	li      5, len_3_1
	sc

	# FIXME: Anything more here ?
	blr				# PERF_SAMPLE_BRANCH_ANY_RET

# Indirect call tests

# CTR
ctr_addr:
        # Print "bcctr taken"
        li      0, 4
        li      3, 1
        lis     4, hw_4_1_passed@ha
        addi    4, 4, hw_4_1_passed@l
        li      5, len_hw_4_1_passed
        sc

	blr				# PERF_SAMPLE_BRANCH_ANY_RET
sw_4_1:
	# Save LR
	mflr	26

	# Print "sw_4_1 called"
        li      0, 4
        li      3, 1
        lis     4, msg_4_1@ha
        addi    4, 4, msg_4_1@l
        li      5, len_4_1
        sc

	# Save address in CTR
	lis 	20, ctr_addr@ha
	addi	20, 20, ctr_addr@l
	mtctr   20


	# Compare and jump to CTR
	li 	21, 10
	cmpi	0, 21, 10
	bcctrl  12, 4*cr0+2		# PERF_SAMPLE_BRANCH_IND_CALL

	mtlr	26
	blr				# PERF_SAMPLE_BRANCH_ANY_RET
# LR
lr_addr:
	# Print "bclrl taken"
	li      0, 4
	li      3, 1
	lis     4, hw_4_2_passed@ha
	addi    4, 4, hw_4_2_passed@l
	li      5, len_hw_4_2_passed
	sc

	blr				# PERF_SAMPLE_BRANCH_ANY_RET

sw_4_2:
	# Save LR
	mflr	26

        # Print "Test: sw_4_2"
        li      0, 4
        li      3, 1
        lis     4, msg_4_2@ha
        addi    4, 4, msg_4_2@l
        li      5, len_4_2
        sc

	# Save address in LR
	lis 	20, lr_addr@ha
	addi	20, 20, lr_addr@l
	mtlr	20


	# Compare and jump to LR
	li 	21, 10
	cmpi	0, 21, 10
	bclrl   12, 4*cr0+2		# PERF_SAMPLE_BRANCH_IND_CALL

	# Restore LR
	mtlr	26	
	blr				# PERF_SAMPLE_BRANCH_ANY_RET

callme:
	# Save LR
	mflr	25

	# Print "Branch filter Test"
	li	0, 4
	li	3, 1
	lis 	4, msg@ha
	addi	4, 4, msg@l
	li	5, len
	sc

	# PERF_SAMPLE_BRANCH_ANY_CALL
	bl hw_1_1			# PERF_SAMPLE_BRANCH_ANY_CALL
	bl hw_1_2			# PERF_SAMPLE_BRANCH_ANY_CALL
	# PERF_SAMPLE_BRANCH_COND
	bl hw_2_1			# PERF_SAMPLE_BRANCH_ANY_CALL
	bl hw_2_2			# PERF_SAMPLE_BRANCH_ANY_CALL

	# PERF_SAMPLE_BRANCH_ANY_RET
	bl sw_3_1			# PERF_SAMPLE_BRANCH_ANY_CALL
	bl sw_3_2			# PERF_SAMPLE_BRANCH_ANY_CALL
	# PERF_SAMPLE_BRANCH_IND_CALL
	bl sw_4_1			# PERF_SAMPLE_BRANCH_ANY_CALL
	bl sw_4_2			# PERF_SAMPLE_BRANCH_ANY_CALL

	# Restore LR
	mtlr 25
	blr				# PERF_SAMPLE_BRANCH_ANY_RET
--------------------------------------------------------------------

Anshuman Khandual (10):
  perf: Add PERF_SAMPLE_BRANCH_COND
  powerpc, perf: Enable conditional branch filter for POWER8
  perf, tool: Conditional branch filter 'cond' added to perf record
  x86, perf: Add conditional branch filtering support
  perf, documentation: Description for conditional branch filter
  powerpc, perf: Change the name of HW PMU branch filter tracking
    variable
  powerpc, lib: Add new branch instruction analysis support functions
  powerpc, perf: Enable SW filtering in branch stack sampling framework
  power8, perf: Change BHRB branch filter configuration
  powerpc, perf: Cleanup SW branch filter list look up

 arch/powerpc/include/asm/code-patching.h     |  30 ++++
 arch/powerpc/include/asm/perf_event_server.h |   6 +-
 arch/powerpc/lib/code-patching.c             |  54 +++++-
 arch/powerpc/perf/core-book3s.c              | 260 +++++++++++++++++++++++++--
 arch/powerpc/perf/power8-pmu.c               |  75 ++++++--
 arch/x86/kernel/cpu/perf_event_intel_lbr.c   |   5 +
 include/uapi/linux/perf_event.h              |   3 +-
 tools/perf/Documentation/perf-record.txt     |   3 +-
 tools/perf/builtin-record.c                  |   1 +
 9 files changed, 404 insertions(+), 33 deletions(-)

-- 
1.7.11.7



* [PATCH V4 00/10] perf: New conditional branch filter
@ 2013-12-04 10:32 ` Anshuman Khandual
From: Anshuman Khandual @ 2013-12-04 10:32 UTC (permalink / raw)
  To: linuxppc-dev, linux-kernel
  Cc: mikey, ak, eranian, michael, acme, sukadev, mingo

This patchset is a re-spin of the original branch stack sampling patchset, which
introduced the new PERF_SAMPLE_BRANCH_COND branch filter. It also enables SW based
branch filtering for book3s powerpc platforms that have PMU HW backed branch stack
sampling support.

Summary of code changes in this patchset:

(1) Introduce a new PERF_SAMPLE_BRANCH_COND branch filter
(2) Add the "cond" branch filter option to the "perf record" tool
(3) Enable PERF_SAMPLE_BRANCH_COND on x86 platforms
(4) Enable PERF_SAMPLE_BRANCH_COND on the POWER8 platform
(5) Update the "perf record" documentation
(6) Add new powerpc instruction analysis functions to the code-patching library
(7) Enable SW based branch filter support for powerpc book3s
(8) Change the BHRB configuration on POWER8 to accommodate SW branch filters

With this new SW enablement, branch filter support on book3s platforms has been
extended to cover all the combinations shown below, each exercised with a sample
test application program (included here).

Changes in V2
=============
(1) Enabled PPC64 SW branch filtering support
(2) Incorporated changes required for all previous comments

Changes in V3
=============
(1) Split the SW branch filter enablement into multiple patches
(2) Added PMU neutral SW branch filtering code, PMU specific HW branch filtering code
(3) Added new instruction analysis functionality into powerpc code-patching library
(4) Changed name for some of the functions
(5) Fixed a couple of spelling mistakes
(6) Changed code documentation in multiple places

Changes in V4
=============
(1) Changed the commit message for patch (01/10)
(2) Changed the patch (02/10) to accommodate review comments from Michael Ellerman
(3) Rebased the patchset against Linus's latest tree

PMU HW branch filters
=====================
(1) perf record -j any_call -e branch-misses:u ./cprog
# Overhead  Command  Source Shared Object          Source Symbol  Target Shared Object             Target Symbol
# ........  .......  ....................  .....................  ....................  ........................
#
     7.00%    cprog  cprog                 [.] sw_3_1             cprog                 [.] sw_3_1_2            
     6.99%    cprog  cprog                 [.] hw_1_1             cprog                 [.] symbol1             
     6.52%    cprog  cprog                 [.] sw_3_1             cprog                 [.] success_3_1_2       
     5.41%    cprog  cprog                 [.] sw_3_1             cprog                 [.] sw_3_1_3            
     5.40%    cprog  cprog                 [.] hw_1_2             cprog                 [.] symbol2             
     5.40%    cprog  cprog                 [.] callme             cprog                 [.] hw_1_2              
     5.40%    cprog  cprog                 [.] sw_3_1             cprog                 [.] success_3_1_1       
     5.40%    cprog  cprog                 [.] callme             cprog                 [.] hw_1_1              
     5.39%    cprog  cprog                 [.] sw_3_1             cprog                 [.] sw_3_1_1            
     5.39%    cprog  cprog                 [.] sw_4_2             cprog                 [.] lr_addr             
     5.39%    cprog  cprog                 [.] callme             cprog                 [.] sw_4_2              
     5.37%    cprog  [unknown]             [.] 00000000           cprog                 [.] ctr_addr            
     4.30%    cprog  cprog                 [.] callme             cprog                 [.] hw_2_1              
     4.28%    cprog  cprog                 [.] callme             cprog                 [.] sw_3_1              
     3.82%    cprog  cprog                 [.] sw_3_1             cprog                 [.] success_3_1_3       
     3.81%    cprog  cprog                 [.] callme             cprog                 [.] hw_2_2              
     3.81%    cprog  cprog                 [.] callme             cprog                 [.] sw_3_2              
     2.71%    cprog  [unknown]             [.] 00000000           cprog                 [.] lr_addr             
     2.70%    cprog  cprog                 [.] main               cprog                 [.] callme              
     2.70%    cprog  cprog                 [.] sw_4_1             cprog                 [.] ctr_addr            
     2.70%    cprog  cprog                 [.] callme             cprog                 [.] sw_4_1              
     0.08%    cprog  [unknown]             [.] 0xf78676c4         [unknown]             [.] 0xf78522c0          
     0.02%    cprog  [unknown]             [k] 00000000           cprog                 [k] ctr_addr            
     0.01%    cprog  [kernel.kallsyms]     [.] .power_pmu_enable  [kernel.kallsyms]     [.] .power8_compute_mmcr
     0.00%    cprog  ld-2.11.2.so          [.] malloc             [unknown]             [.] 0xf786b380          
     0.00%    cprog  ld-2.11.2.so          [.] calloc             [unknown]             [.] 0xf786b390          
     0.00%    cprog  cprog                 [.] main               [unknown]             [.] 0x10000950          
     0.00%    cprog  [unknown]             [.] 00000000           [kernel.kallsyms]     [.] .power_pmu_enable  
    
(2) perf record -j cond -e branch-misses:u ./cprog

# Overhead  Command  Source Shared Object            Source Symbol  Target Shared Object            Target Symbol
# ........  .......  ....................  .......................  ....................  .......................
#
    27.73%    cprog  [unknown]             [.] 00000000             cprog                 [.] callme             
    13.03%    cprog  cprog                 [.] sw_3_1               cprog                 [.] sw_3_1             
     5.64%    cprog  [unknown]             [.] 00000000             cprog                 [.] main               
     5.62%    cprog  [unknown]             [.] 00000000             cprog                 [.] sw_4_2             
     5.46%    cprog  cprog                 [.] sw_4_2               cprog                 [.] lr_addr            
     5.40%    cprog  [unknown]             [.] 00000000             cprog                 [.] sw_4_1             
     3.72%    cprog  cprog                 [.] hw_2_1               cprog                 [.] callme             
     3.71%    cprog  cprog                 [.] main                 cprog                 [.] hw_1_1             
     3.71%    cprog  cprog                 [.] sw_3_1_2             cprog                 [.] sw_3_1             
     3.70%    cprog  cprog                 [.] sw_3_1_3             cprog                 [.] sw_3_1             
     3.70%    cprog  cprog                 [.] sw_4_1               cprog                 [.] ctr_addr           
     3.69%    cprog  cprog                 [.] hw_1_2               cprog                 [.] hw_1_2             
     3.69%    cprog  cprog                 [.] hw_2_2               cprog                 [.] callme             
     3.68%    cprog  cprog                 [.] sw_3_1_1             cprog                 [.] sw_3_1             
     1.93%    cprog  [unknown]             [.] 00000000             cprog                 [.] lr_addr            
     1.78%    cprog  [unknown]             [.] 00000000             cprog                 [.] hw_1_2             
     1.78%    cprog  [unknown]             [.] 00000000             cprog                 [.] sw_3_1             
     1.76%    cprog  [unknown]             [.] 00000000             cprog                 [.] hw_1_1             
     0.12%    cprog  [unknown]             [.] 0xf7bb25dc           [unknown]             [.] 0xf7bb27e4         
     0.07%    cprog  [unknown]             [k] 00000000             cprog                 [k] callme             
     0.07%    cprog  [unknown]             [k] 00000000             cprog                 [k] sw_4_1             
     0.00%    cprog  libc-2.11.2.so        [.] _IO_file_doallocate  libc-2.11.2.so        [.] _IO_file_doallocate
     0.00%    cprog  libc-2.11.2.so        [.] _IO_file_doallocate  libc-2.11.2.so        [.] isatty             
     0.00%    cprog  [unknown]             [.] 00000000             libc-2.11.2.so        [.] _IO_file_doallocate

SW based branch filters
=======================
(3) perf record -j any_ret -e branch-misses:u ./cprog 

# Overhead  Command  Source Shared Object         Source Symbol  Target Shared Object          Target Symbol
# ........  .......  ....................  ....................  ....................  .....................
#
    15.37%    cprog  [unknown]             [.] 00000000          cprog                 [.] sw_3_1           
     6.46%    cprog  cprog                 [.] success_3_1_3     cprog                 [.] sw_3_1           
     6.45%    cprog  cprog                 [.] symbol1           cprog                 [.] hw_1_1           
     6.41%    cprog  [unknown]             [.] 00000000          cprog                 [.] callme           
     6.39%    cprog  cprog                 [.] ctr_addr          cprog                 [.] sw_4_1           
     6.37%    cprog  cprog                 [.] symbol2           cprog                 [.] hw_1_2           
     6.36%    cprog  cprog                 [.] sw_4_2            cprog                 [.] callme           
     6.35%    cprog  cprog                 [.] lr_addr           cprog                 [.] sw_4_2           
     3.97%    cprog  cprog                 [.] back1             cprog                 [.] callme           
     3.93%    cprog  cprog                 [.] sw_3_1_2          cprog                 [.] sw_3_1           
     3.93%    cprog  cprog                 [.] sw_3_1            cprog                 [.] callme           
     3.86%    cprog  cprog                 [.] sw_3_1_3          cprog                 [.] sw_3_1           
     3.84%    cprog  cprog                 [.] sw_3_1_1          cprog                 [.] sw_3_1           
     2.54%    cprog  cprog                 [.] success_3_1_1     cprog                 [.] sw_3_1           
     2.54%    cprog  cprog                 [.] sw_4_1            cprog                 [.] callme           
     2.54%    cprog  cprog                 [.] hw_1_1            cprog                 [.] callme           
     2.53%    cprog  cprog                 [.] sw_3_2            cprog                 [.] callme           
     2.52%    cprog  cprog                 [.] callme            cprog                 [.] main             
     2.51%    cprog  cprog                 [.] hw_1_2            cprog                 [.] callme           
     2.51%    cprog  cprog                 [.] back2             cprog                 [.] callme           
     2.51%    cprog  cprog                 [.] success_3_1_2     cprog                 [.] sw_3_1           
     0.07%    cprog  [unknown]             [k] 00000000          cprog                 [k] callme           
     0.02%    cprog  [unknown]             [.] 00000000          [unknown]             [.] 0xf7e5c004       
     0.01%    cprog  libc-2.11.2.so        [.] __errno_location  libc-2.11.2.so        [.] vfprintf         
     0.01%    cprog  [unknown]             [.] 00000000          libc-2.11.2.so        [.] _IO_file_overflow

(4) perf record -j ind_call  -e branch-misses:u ./cprog

# Overhead  Command  Source Shared Object        Source Symbol  Target Shared Object          Target Symbol
# ........  .......  ....................  ...................  ....................  .....................
#
    48.04%    cprog  [unknown]             [.] 00000000         cprog                 [.] sw_3_1           
    19.96%    cprog  cprog                 [.] sw_4_2           cprog                 [.] lr_addr          
    19.69%    cprog  [unknown]             [.] 00000000         cprog                 [.] callme           
    12.04%    cprog  cprog                 [.] sw_4_1           cprog                 [.] ctr_addr         
     0.18%    cprog  [unknown]             [k] 00000000         cprog                 [k] callme           
     0.02%    cprog  libc-2.11.2.so        [.] _IO_file_xsputn  libc-2.11.2.so        [.] _IO_file_overflow
     0.02%    cprog  [unknown]             [.] 00000000         libc-2.11.2.so        [.] _IO_file_xsputn  
     0.02%    cprog  [unknown]             [.] 00000000         ld-2.11.2.so          [.] malloc           
     0.02%    cprog  [unknown]             [k] 00000000         cprog                 [k] sw_3_1           

(5) perf record -j any_call,any_ret -e branch-misses:u ./cprog

# Overhead  Command  Source Shared Object            Source Symbol  Target Shared Object            Target Symbol
# ........  .......  ....................  .......................  ....................  .......................
#
    10.36%    cprog  [unknown]             [.] 00000000             cprog                 [.] sw_3_1             
     4.18%    cprog  cprog                 [.] symbol1              cprog                 [.] hw_1_1             
     4.18%    cprog  cprog                 [.] success_3_1_3        cprog                 [.] sw_3_1             
     4.17%    cprog  cprog                 [.] sw_4_2               cprog                 [.] lr_addr            
     4.16%    cprog  cprog                 [.] sw_4_2               cprog                 [.] callme             
     4.15%    cprog  cprog                 [.] ctr_addr             cprog                 [.] sw_4_1             
     4.15%    cprog  cprog                 [.] lr_addr              cprog                 [.] sw_4_2             
     4.14%    cprog  cprog                 [.] symbol2              cprog                 [.] hw_1_2             
     4.14%    cprog  [unknown]             [.] 00000000             cprog                 [.] callme             
     2.15%    cprog  cprog                 [.] sw_3_1               cprog                 [.] callme             
     2.14%    cprog  cprog                 [.] hw_1_1               cprog                 [.] symbol1            
     2.14%    cprog  cprog                 [.] callme               cprog                 [.] hw_1_1             
     2.14%    cprog  cprog                 [.] callme               cprog                 [.] sw_4_2             
     2.13%    cprog  cprog                 [.] back1                cprog                 [.] callme             
     2.12%    cprog  cprog                 [.] sw_3_1_2             cprog                 [.] sw_3_1             
     2.12%    cprog  cprog                 [.] sw_3_1               cprog                 [.] sw_3_1_2           
     2.11%    cprog  cprog                 [.] sw_3_1_3             cprog                 [.] sw_3_1             
     2.11%    cprog  cprog                 [.] sw_3_1               cprog                 [.] sw_3_1_3           
     2.11%    cprog  cprog                 [.] sw_4_1               cprog                 [.] ctr_addr           
     2.10%    cprog  cprog                 [.] hw_1_2               cprog                 [.] symbol2            
     2.10%    cprog  cprog                 [.] sw_3_1_1             cprog                 [.] sw_3_1             
     2.10%    cprog  cprog                 [.] sw_3_1               cprog                 [.] sw_3_1_1           
     2.10%    cprog  cprog                 [.] callme               cprog                 [.] hw_1_2             
     2.10%    cprog  cprog                 [.] callme               cprog                 [.] sw_3_1             
     2.05%    cprog  cprog                 [.] success_3_1_1        cprog                 [.] sw_3_1             
     2.05%    cprog  cprog                 [.] sw_3_1               cprog                 [.] success_3_1_1      
     2.05%    cprog  cprog                 [.] success_3_1_2        cprog                 [.] sw_3_1             
     2.05%    cprog  cprog                 [.] sw_3_1               cprog                 [.] success_3_1_2      
     2.04%    cprog  cprog                 [.] hw_1_1               cprog                 [.] callme             
     2.04%    cprog  cprog                 [.] back2                cprog                 [.] callme             
     2.04%    cprog  cprog                 [.] sw_4_1               cprog                 [.] callme             
     2.04%    cprog  cprog                 [.] callme               cprog                 [.] main               
     2.04%    cprog  cprog                 [.] hw_1_2               cprog                 [.] callme             
     2.04%    cprog  cprog                 [.] sw_3_2               cprog                 [.] callme             
     2.04%    cprog  cprog                 [.] callme               cprog                 [.] sw_3_2             
     2.03%    cprog  cprog                 [.] sw_3_1               cprog                 [.] success_3_1_3      
     0.03%    cprog  [unknown]             [k] 00000000             cprog                 [k] callme             
     0.01%    cprog  [unknown]             [.] 0xf7e79bb0           [unknown]             [.] 0xf7e64088         
     0.00%    cprog  libc-2.11.2.so        [.] _IO_file_doallocate  libc-2.11.2.so        [.] mmap               
     0.00%    cprog  libc-2.11.2.so        [.] mmap                 libc-2.11.2.so        [.] _IO_file_doallocate
     0.00%    cprog  [unknown]             [.] 0xf7e7589c           libc-2.11.2.so        [.] printf             
     0.00%    cprog  [unknown]             [k] 00000000             cprog                 [k] sw_3_1          

(6) perf record -j any_call,ind_call -e branch-misses:u ./cprog

# Overhead  Command  Source Shared Object   Source Symbol  Target Shared Object      Target Symbol
# ........  .......  ....................  ..............  ....................  .................
#
    23.09%    cprog  [unknown]             [.] 00000000    cprog                 [.] sw_3_1       
     8.99%    cprog  cprog                 [.] sw_4_2      cprog                 [.] lr_addr      
     8.92%    cprog  [unknown]             [.] 00000000    cprog                 [.] callme       
     5.18%    cprog  cprog                 [.] sw_3_1      cprog                 [.] success_3_1_2
     5.16%    cprog  cprog                 [.] sw_3_1      cprog                 [.] success_3_1_1
     5.16%    cprog  cprog                 [.] callme      cprog                 [.] sw_3_2       
     5.12%    cprog  cprog                 [.] sw_3_1      cprog                 [.] success_3_1_3
     3.85%    cprog  cprog                 [.] sw_3_1      cprog                 [.] sw_3_1_1     
     3.85%    cprog  cprog                 [.] callme      cprog                 [.] sw_3_1       
     3.84%    cprog  cprog                 [.] sw_4_1      cprog                 [.] ctr_addr     
     3.82%    cprog  cprog                 [.] hw_1_1      cprog                 [.] symbol1      
     3.82%    cprog  cprog                 [.] sw_3_1      cprog                 [.] sw_3_1_2     
     3.82%    cprog  cprog                 [.] sw_3_1      cprog                 [.] sw_3_1_3     
     3.82%    cprog  cprog                 [.] callme      cprog                 [.] hw_1_1       
     3.81%    cprog  cprog                 [.] hw_1_2      cprog                 [.] symbol2      
     3.81%    cprog  cprog                 [.] callme      cprog                 [.] hw_1_2       
     3.81%    cprog  cprog                 [.] callme      cprog                 [.] sw_4_2       
     0.05%    cprog  [unknown]             [k] 00000000    cprog                 [k] callme       
     0.03%    cprog  [unknown]             [.] 0xf7f7232c  [unknown]             [.] 0xf7f72334   
     0.01%    cprog  ld-2.11.2.so          [.] malloc      [unknown]             [.] 0xf7f8b380   
     0.01%    cprog  cprog                 [.] main        [unknown]             [.] 0x10000950   
     0.01%    cprog  [unknown]             [.] 00000000    ld-2.11.2.so          [.] malloc       
     0.01%    cprog  [unknown]             [.] 00000000    cprog                 [.] main         

(7) perf record -j cond,any_ret -e branch-misses:u ./cprog

# Overhead  Command  Source Shared Object          Source Symbol  Target Shared Object          Target Symbol
# ........  .......  ....................  .....................  ....................  .....................
#
    12.18%    cprog  [unknown]             [.] 00000000           cprog                 [.] sw_3_1           
     4.90%    cprog  cprog                 [.] sw_4_2             cprog                 [.] lr_addr          
     4.88%    cprog  [unknown]             [.] 00000000           cprog                 [.] callme           
     4.88%    cprog  cprog                 [.] lr_addr            cprog                 [.] sw_4_2           
     4.88%    cprog  cprog                 [.] sw_4_2             cprog                 [.] callme           
     4.86%    cprog  cprog                 [.] symbol1            cprog                 [.] hw_1_1           
     4.86%    cprog  cprog                 [.] success_3_1_3      cprog                 [.] sw_3_1           
     4.85%    cprog  cprog                 [.] symbol2            cprog                 [.] hw_1_2           
     4.85%    cprog  cprog                 [.] ctr_addr           cprog                 [.] sw_4_1           
     2.47%    cprog  cprog                 [.] sw_3_1_3           cprog                 [.] sw_3_1           
     2.46%    cprog  cprog                 [.] back1              cprog                 [.] callme           
     2.45%    cprog  cprog                 [.] hw_1_1             cprog                 [.] callme           
     2.45%    cprog  cprog                 [.] hw_2_1             cprog                 [.] address1         
     2.44%    cprog  cprog                 [.] hw_1_2             cprog                 [.] symbol2          
     2.44%    cprog  cprog                 [.] sw_3_1_1           cprog                 [.] sw_3_1           
     2.44%    cprog  cprog                 [.] sw_3_2             cprog                 [.] callme           
     2.44%    cprog  cprog                 [.] success_3_1_1      cprog                 [.] sw_3_1           
     2.44%    cprog  cprog                 [.] sw_3_1             cprog                 [.] success_3_1_1    
     2.44%    cprog  cprog                 [.] sw_3_1             cprog                 [.] success_3_1_3    
     2.43%    cprog  cprog                 [.] callme             cprog                 [.] main             
     2.43%    cprog  cprog                 [.] hw_2_2             cprog                 [.] address2         
     2.43%    cprog  cprog                 [.] sw_3_1_2           cprog                 [.] sw_3_1           
     2.43%    cprog  cprog                 [.] success_3_1_2      cprog                 [.] sw_3_1           
     2.43%    cprog  cprog                 [.] sw_3_1             cprog                 [.] success_3_1_2    
     2.43%    cprog  cprog                 [.] sw_4_1             cprog                 [.] callme           
     2.42%    cprog  cprog                 [.] sw_3_1             cprog                 [.] callme           
     2.42%    cprog  cprog                 [.] sw_4_1             cprog                 [.] ctr_addr         
     2.42%    cprog  cprog                 [.] back2              cprog                 [.] callme           
     2.40%    cprog  cprog                 [.] hw_1_2             cprog                 [.] callme           
     0.10%    cprog  [unknown]             [.] 0xf78923e0         [unknown]             [.] 0xf78923c0       
     0.03%    cprog  [unknown]             [k] 00000000           cprog                 [k] callme           
     0.01%    cprog  [unknown]             [k] 00000000           cprog                 [k] sw_3_1           
     0.01%    cprog  libc-2.11.2.so        [.] vfprintf           libc-2.11.2.so        [.] vfprintf         
     0.01%    cprog  libc-2.11.2.so        [.] _IO_file_overflow  [unknown]             [.] 0x0fee0100       
     0.01%    cprog  libc-2.11.2.so        [.] strchrnul          libc-2.11.2.so        [.] vfprintf         
     0.01%    cprog  libc-2.11.2.so        [.] strchrnul          libc-2.11.2.so        [.] strchrnul        
     0.01%    cprog  [unknown]             [.] 00000000           libc-2.11.2.so        [.] _IO_file_overflow


(8) perf record -j cond,ind_call -e branch-misses:u ./cprog

# Overhead  Command  Source Shared Object   Source Symbol  Target Shared Object        Target Symbol
# ........  .......  ....................  ..............  ....................  ...................
#
    26.21%    cprog  [unknown]             [.] 00000000    cprog                 [.] sw_3_1         
    10.50%    cprog  cprog                 [.] sw_4_2      cprog                 [.] lr_addr        
    10.38%    cprog  [unknown]             [.] 00000000    cprog                 [.] callme         
     5.31%    cprog  cprog                 [.] sw_3_1_2    cprog                 [.] sw_3_1         
     5.30%    cprog  cprog                 [.] sw_3_1_1    cprog                 [.] sw_3_1         
     5.27%    cprog  cprog                 [.] sw_3_1      cprog                 [.] success_3_1_2  
     5.26%    cprog  cprog                 [.] hw_2_2      cprog                 [.] address2       
     5.25%    cprog  cprog                 [.] hw_1_2      cprog                 [.] symbol2        
     5.25%    cprog  cprog                 [.] sw_3_1      cprog                 [.] success_3_1_3  
     5.24%    cprog  cprog                 [.] hw_2_1      cprog                 [.] address1       
     5.23%    cprog  cprog                 [.] sw_4_1      cprog                 [.] ctr_addr       
     5.20%    cprog  cprog                 [.] sw_3_1_3    cprog                 [.] sw_3_1         
     5.19%    cprog  cprog                 [.] sw_3_1      cprog                 [.] success_3_1_1  
     0.24%    cprog  [unknown]             [.] 0xf7cf23e0  [unknown]             [.] 0xf7cf23c0     
     0.11%    cprog  [unknown]             [k] 00000000    cprog                 [k] callme         
     0.01%    cprog  libc-2.11.2.so        [.] vfprintf    libc-2.11.2.so        [.] vfprintf       
     0.01%    cprog  libc-2.11.2.so        [.] vfprintf    libc-2.11.2.so        [.] _IO_file_xsputn
     0.01%    cprog  [unknown]             [.] 00000000    libc-2.11.2.so        [.] vfprintf       
     0.01%    cprog  [unknown]             [k] 00000000    cprog                 [k] sw_3_1         

(9) perf record -j any_call,cond,any_ret -e branch-misses:u ./cprog

# Overhead  Command  Source Shared Object      Source Symbol  Target Shared Object          Target Symbol
# ........  .......  ....................  .................  ....................  .....................
#
     9.96%    cprog  [unknown]             [.] 00000000       cprog                 [.] sw_3_1           
     4.06%    cprog  cprog                 [.] sw_4_2         cprog                 [.] lr_addr          
     4.04%    cprog  cprog                 [.] lr_addr        cprog                 [.] sw_4_2           
     4.03%    cprog  cprog                 [.] symbol1        cprog                 [.] hw_1_1           
     4.02%    cprog  [unknown]             [.] 00000000       cprog                 [.] callme           
     3.96%    cprog  cprog                 [.] ctr_addr       cprog                 [.] sw_4_1           
     3.94%    cprog  cprog                 [.] symbol2        cprog                 [.] hw_1_2           
     3.94%    cprog  cprog                 [.] success_3_1_3  cprog                 [.] sw_3_1           
     3.93%    cprog  cprog                 [.] sw_4_2         cprog                 [.] callme           
     2.08%    cprog  cprog                 [.] sw_3_2         cprog                 [.] callme           
     2.08%    cprog  cprog                 [.] callme         cprog                 [.] sw_3_2           
     2.07%    cprog  cprog                 [.] hw_2_2         cprog                 [.] address2         
     2.07%    cprog  cprog                 [.] success_3_1_2  cprog                 [.] sw_3_1           
     2.07%    cprog  cprog                 [.] sw_3_1         cprog                 [.] success_3_1_2    
     2.07%    cprog  cprog                 [.] back2          cprog                 [.] callme           
     2.06%    cprog  cprog                 [.] hw_1_1         cprog                 [.] callme           
     1.99%    cprog  cprog                 [.] sw_4_1         cprog                 [.] ctr_addr         
     1.98%    cprog  cprog                 [.] sw_3_1_3       cprog                 [.] sw_3_1           
     1.98%    cprog  cprog                 [.] success_3_1_1  cprog                 [.] sw_3_1           
     1.98%    cprog  cprog                 [.] sw_3_1         cprog                 [.] sw_3_1_3         
     1.98%    cprog  cprog                 [.] sw_3_1         cprog                 [.] success_3_1_1    
     1.98%    cprog  cprog                 [.] callme         cprog                 [.] sw_4_2           
     1.98%    cprog  cprog                 [.] back1          cprog                 [.] callme           
     1.97%    cprog  cprog                 [.] hw_1_1         cprog                 [.] symbol1          
     1.97%    cprog  cprog                 [.] hw_2_1         cprog                 [.] address1         
     1.97%    cprog  cprog                 [.] sw_3_1_1       cprog                 [.] sw_3_1           
     1.97%    cprog  cprog                 [.] sw_3_1         cprog                 [.] sw_3_1_1         
     1.97%    cprog  cprog                 [.] sw_3_1         cprog                 [.] success_3_1_3    
     1.97%    cprog  cprog                 [.] callme         cprog                 [.] hw_1_1           
     1.97%    cprog  cprog                 [.] callme         cprog                 [.] sw_3_1           
     1.97%    cprog  cprog                 [.] hw_1_2         cprog                 [.] symbol2          
     1.97%    cprog  cprog                 [.] hw_1_2         cprog                 [.] callme           
     1.97%    cprog  cprog                 [.] sw_4_1         cprog                 [.] callme           
     1.97%    cprog  cprog                 [.] callme         cprog                 [.] main             
     1.97%    cprog  cprog                 [.] callme         cprog                 [.] hw_1_2           
     1.96%    cprog  cprog                 [.] sw_3_1         cprog                 [.] callme           
     1.96%    cprog  cprog                 [.] sw_3_1_2       cprog                 [.] sw_3_1           
     1.96%    cprog  cprog                 [.] sw_3_1         cprog                 [.] sw_3_1_2         
     0.12%    cprog  [unknown]             [.] 0xf7ab23e0     [unknown]             [.] 0xf7ab23c0       
     0.04%    cprog  [unknown]             [k] 00000000       cprog                 [k] callme           
     0.01%    cprog  [unknown]             [k] 00000000       cprog                 [k] sw_3_1           
     0.00%    cprog  libc-2.11.2.so        [.] vfprintf       libc-2.11.2.so        [.] vfprintf         
     0.00%    cprog  libc-2.11.2.so        [.] _IO_do_write   libc-2.11.2.so        [.] _IO_do_write     
     0.00%    cprog  libc-2.11.2.so        [.] _IO_do_write   libc-2.11.2.so        [.] _IO_file_overflow
     0.00%    cprog  libc-2.11.2.so        [.] strchrnul      libc-2.11.2.so        [.] vfprintf         
     0.00%    cprog  libc-2.11.2.so        [.] strchrnul      libc-2.11.2.so        [.] strchrnul        
     0.00%    cprog  cprog                 [.] callme         cprog                 [.] hw_2_2           
     0.00%    cprog  [unknown]             [.] 00000000       libc-2.11.2.so        [.] _IO_do_write     

(10) perf record -j any_call,cond,ind_call -e branch-misses:u ./cprog

# Overhead  Command  Source Shared Object          Source Symbol  Target Shared Object          Target Symbol
# ........  .......  ....................  .....................  ....................  .....................
#
    17.81%    cprog  [unknown]             [.] 00000000           cprog                 [.] sw_3_1           
     7.19%    cprog  cprog                 [.] sw_4_2             cprog                 [.] lr_addr          
     7.12%    cprog  [unknown]             [.] 00000000           cprog                 [.] callme           
     3.71%    cprog  cprog                 [.] sw_3_1             cprog                 [.] success_3_1_2    
     3.68%    cprog  cprog                 [.] callme             cprog                 [.] sw_3_2           
     3.67%    cprog  cprog                 [.] hw_2_2             cprog                 [.] address2         
     3.57%    cprog  cprog                 [.] hw_2_1             cprog                 [.] address1         
     3.55%    cprog  cprog                 [.] hw_1_1             cprog                 [.] symbol1          
     3.55%    cprog  cprog                 [.] sw_3_1             cprog                 [.] success_3_1_1    
     3.55%    cprog  cprog                 [.] callme             cprog                 [.] hw_1_1           
     3.54%    cprog  cprog                 [.] sw_3_1_1           cprog                 [.] sw_3_1           
     3.54%    cprog  cprog                 [.] sw_3_1             cprog                 [.] sw_3_1_1         
     3.54%    cprog  cprog                 [.] sw_4_1             cprog                 [.] ctr_addr         
     3.54%    cprog  cprog                 [.] callme             cprog                 [.] sw_3_1           
     3.52%    cprog  cprog                 [.] sw_3_1_3           cprog                 [.] sw_3_1           
     3.52%    cprog  cprog                 [.] sw_3_1             cprog                 [.] sw_3_1_3         
     3.52%    cprog  cprog                 [.] sw_3_1             cprog                 [.] success_3_1_3    
     3.52%    cprog  cprog                 [.] sw_3_1_2           cprog                 [.] sw_3_1           
     3.52%    cprog  cprog                 [.] sw_3_1             cprog                 [.] sw_3_1_2         
     3.51%    cprog  cprog                 [.] hw_1_2             cprog                 [.] symbol2          
     3.51%    cprog  cprog                 [.] callme             cprog                 [.] hw_1_2           
     3.49%    cprog  cprog                 [.] callme             cprog                 [.] sw_4_2           
     0.22%    cprog  [unknown]             [.] 0xf7ca23f4         [unknown]             [.] 0xf7ca25d0       
     0.05%    cprog  [unknown]             [k] 00000000           cprog                 [k] callme           
     0.01%    cprog  libc-2.11.2.so        [.] vfprintf           libc-2.11.2.so        [.] vfprintf         
     0.01%    cprog  libc-2.11.2.so        [.] vfprintf           libc-2.11.2.so        [.] strchrnul        
     0.01%    cprog  libc-2.11.2.so        [.] _IO_file_overflow  libc-2.11.2.so        [.] _IO_file_overflow
     0.01%    cprog  libc-2.11.2.so        [.] strchrnul          libc-2.11.2.so        [.] strchrnul        
     0.01%    cprog  [unknown]             [.] 00000000           libc-2.11.2.so        [.] _IO_file_overflow
     0.01%    cprog  [unknown]             [k] 00000000           cprog                 [k] sw_3_1        

(11) perf record -j any_call,cond,any_ret,ind_call -e branch-misses:u ./cprog

# Overhead  Command  Source Shared Object      Source Symbol  Target Shared Object        Target Symbol
# ........  .......  ....................  .................  ....................  ...................
#
     9.72%    cprog  [unknown]             [.] 00000000       cprog                 [.] sw_3_1         
     3.99%    cprog  cprog                 [.] ctr_addr       cprog                 [.] sw_4_1         
     3.98%    cprog  cprog                 [.] success_3_1_3  cprog                 [.] sw_3_1         
     3.98%    cprog  cprog                 [.] symbol1        cprog                 [.] hw_1_1         
     3.98%    cprog  cprog                 [.] symbol2        cprog                 [.] hw_1_2         
     3.98%    cprog  cprog                 [.] sw_4_2         cprog                 [.] lr_addr        
     3.98%    cprog  cprog                 [.] sw_4_2         cprog                 [.] callme         
     3.97%    cprog  cprog                 [.] lr_addr        cprog                 [.] sw_4_2         
     3.91%    cprog  [unknown]             [.] 00000000       cprog                 [.] callme         
     2.22%    cprog  cprog                 [.] sw_4_1         cprog                 [.] ctr_addr       
     2.22%    cprog  cprog                 [.] callme         cprog                 [.] sw_4_2         
     2.22%    cprog  cprog                 [.] hw_2_1         cprog                 [.] address1       
     2.22%    cprog  cprog                 [.] back1          cprog                 [.] callme         
     2.21%    cprog  cprog                 [.] hw_1_2         cprog                 [.] symbol2        
     2.21%    cprog  cprog                 [.] sw_3_1         cprog                 [.] callme         
     2.21%    cprog  cprog                 [.] callme         cprog                 [.] hw_1_2         
     2.21%    cprog  cprog                 [.] sw_3_1_1       cprog                 [.] sw_3_1         
     2.21%    cprog  cprog                 [.] sw_3_1_3       cprog                 [.] sw_3_1         
     2.21%    cprog  cprog                 [.] sw_3_1         cprog                 [.] sw_3_1_1       
     2.21%    cprog  cprog                 [.] sw_3_1         cprog                 [.] sw_3_1_3       
     2.21%    cprog  cprog                 [.] callme         cprog                 [.] sw_3_1         
     2.20%    cprog  cprog                 [.] hw_1_1         cprog                 [.] symbol1        
     2.20%    cprog  cprog                 [.] sw_3_1_2       cprog                 [.] sw_3_1         
     2.20%    cprog  cprog                 [.] sw_3_1         cprog                 [.] sw_3_1_2       
     2.20%    cprog  cprog                 [.] callme         cprog                 [.] hw_1_1         
     1.77%    cprog  cprog                 [.] hw_1_1         cprog                 [.] callme         
     1.77%    cprog  cprog                 [.] success_3_1_1  cprog                 [.] sw_3_1         
     1.77%    cprog  cprog                 [.] sw_3_1         cprog                 [.] success_3_1_1  
     1.77%    cprog  cprog                 [.] success_3_1_2  cprog                 [.] sw_3_1         
     1.77%    cprog  cprog                 [.] sw_3_1         cprog                 [.] success_3_1_2  
     1.77%    cprog  cprog                 [.] sw_3_1         cprog                 [.] success_3_1_3  
     1.76%    cprog  cprog                 [.] hw_1_2         cprog                 [.] callme         
     1.76%    cprog  cprog                 [.] sw_4_1         cprog                 [.] callme         
     1.76%    cprog  cprog                 [.] sw_3_2         cprog                 [.] callme         
     1.76%    cprog  cprog                 [.] callme         cprog                 [.] main           
     1.76%    cprog  cprog                 [.] callme         cprog                 [.] sw_3_2         
     1.75%    cprog  cprog                 [.] hw_2_2         cprog                 [.] address2       
     1.75%    cprog  cprog                 [.] back2          cprog                 [.] callme         
     0.13%    cprog  [unknown]             [.] 0xf7dd23e0     [unknown]             [.] 0xf7dd23c0     
     0.07%    cprog  [unknown]             [k] 00000000       cprog                 [k] callme         
     0.00%    cprog  libc-2.11.2.so        [.] vfprintf       libc-2.11.2.so        [.] vfprintf       
     0.00%    cprog  libc-2.11.2.so        [.] vfprintf       libc-2.11.2.so        [.] _IO_file_xsputn
     0.00%    cprog  [unknown]             [.] 00000000       libc-2.11.2.so        [.] vfprintf       

Test application program
========================
(1) Makefile:
--------------------------------------------
all: sample.o cprog of.cprog of.sample

sample.o: sample.s
        as -o sample.o sample.s
cprog: cprog.c sample.o
        gcc -o cprog cprog.c sample.o
of.sample: sample.o
        objdump -d sample.o > of.sample
of.cprog: cprog
        objdump -d cprog > of.cprog
clean:
        rm -f sample.o cprog of.sample of.cprog
---------------------------------------------
(2) cprog.c
---------------------------------------------
#include <stdio.h>
#define LOOP_COUNT 10000

extern void callme(void);

int main(int argc, char *argv[])
{
        int i;
        for(i = 0; i < LOOP_COUNT; i++)
                callme();

        printf("end");
        return 0;
}
---------------------------------------------
(3) sample.s
---------------------------------------------
# r25, r26 and r27 are used as the first, second and third level
# stack for LR. Registers r20, r21, r22, r23 and r24 are used for
# general purposes.

.data

msg:
	.string "BHRB filter tests\n"
	len = . - msg
msg_1_1:
	.string "Test: hw_1_1\n"
	len_1_1 = 13
msg_1_2:
	.string "Test: hw_1_2\n"
	len_1_2 = 13
msg_2_1:
	.string "Test: hw_2_1\n"
	len_2_1 = 13
msg_2_2:
	.string "Test: hw_2_2\n"
	len_2_2 = 13
msg_3_1:
	.string "Test: sw_3_1\n"
	len_3_1 = 13
msg_3_1_1:
	.string "Test: sw_3_1_1\n"
	len_3_1_1 = 15
msg_3_1_2:
	.string "Test: sw_3_1_2\n"
	len_3_1_2 = 15
msg_3_1_3:
	.string "Test: sw_3_1_3\n"
	len_3_1_3 = 15
msg_3_2:
	.string "Test: sw_3_2\n"
	len_3_3 = 13
msg_4_1:
	.string "Test: sw_4_1\n"
	len_4_1 = 13
msg_4_2:
	.string "Test: sw_4_2\n"
	len_4_2 = 13

hw_3_1_1_passed:
	.string "\thw_3_1_1_passed\n\n"
	len_hw_3_1_1_passed = 18
hw_3_1_2_passed:
	.string "\thw_3_1_2_passed\n\n"
	len_hw_3_1_2_passed = 18
hw_3_1_3_passed:
	.string "\thw_3_1_3_passed\n\n"
	len_hw_3_1_3_passed = 18

hw_2_1_passed:
	.string "\thw_2_1_passed\n\n"
	len_hw_2_1_passed = 16

hw_2_2_passed:
	.string "\thw_2_2_passed\n\n"
	len_hw_2_2_passed = 16

hw_1_1_passed:
	.string "\thw_1_1_passed\n\n"
	len_hw_1_1_passed = 16

hw_1_2_passed:
	.string "\thw_1_2_passed\n\n"
	len_hw_1_2_passed = 16

hw_4_1_passed:
	.string "\thw_4_1_passed\n\n"
	len_hw_4_1_passed = 16

hw_4_2_passed:
	.string "\thw_4_2_passed\n\n"
	len_hw_4_2_passed = 16

msg_error:
	.string "\tError\n"
	len_error = 7
.text
	.global callme
	.global hw_1_1
	.global hw_1_2
	.global hw_2_1
	.global hw_2_2

# HW filter test symbols
symbol1:
	# Print "hw_1_1_passed"
	li      0, 4
	li      3, 1
	lis     4, hw_1_1_passed@ha
	addi    4, 4, hw_1_1_passed@l
	li      5, len_hw_1_1_passed
	sc

	blr				# PERF_SAMPLE_BRANCH_ANY_RET

hw_1_1:
        # Save LR - second level
        mflr 26

	# Print "hw_1_1 called"
	li      0, 4
	li      3, 1
	lis     4, msg_1_1@ha
	addi    4, 4, msg_1_1@l
	li      5, len_1_1
	sc

	bl symbol1			# PERF_SAMPLE_BRANCH_ANY_CALL

	# Restore LR
	mtlr 26
	blr				# PERF_SAMPLE_BRANCH_ANY_RET

symbol2:
        # Print "hw_1_2_passed"
        li      0, 4
        li      3, 1
        lis     4, hw_1_2_passed@ha
        addi    4, 4, hw_1_2_passed@l
        li      5, len_hw_1_2_passed
        sc

	blr				# PERF_SAMPLE_BRANCH_ANY_RET
hw_1_2:
	# Save LR - second level
	mflr 26

        # Print "hw_1_2 called"
        li      0, 4
        li      3, 1
        lis     4, msg_1_2@ha
        addi    4, 4, msg_1_2@l
        li      5, len_1_2
        sc

	li 4,20
	cmpi 0,4,20
	bcl 12, 4*cr0+2, symbol2	# PERF_SAMPLE_BRANCH_ANY_CALL | PERF_SAMPLE_BRANCH_COND

	mtlr 26
	blr				# PERF_SAMPLE_BRANCH_ANY_RET

# HW filter test

address1: 
	# Print "hw_2_1_passed"
        li      0, 4
        li      3, 1
        lis     4, hw_2_1_passed@ha
        addi    4, 4, hw_2_1_passed@l
        li      5, len_hw_2_1_passed
        sc
	b  back1			# PERF_SAMPLE_BRANCH_ANY

hw_2_1:
	# Print "hw_2_1 called"
	li      0, 4
	li      3, 1
	lis     4, msg_2_1@ha
	addi    4, 4, msg_2_1@l
	li      5, len_2_1
	sc
	
	# Simple conditional branch (equal)
	li	20, 12
	cmpi	3, 20, 12
	bc	12, 4*cr3+2, address1	# PERF_SAMPLE_BRANCH_COND

back1:
	blr				# PERF_SAMPLE_BRANCH_ANY_RET

address2:
        # Print "hw_2_2_passed"
        li      0, 4
        li      3, 1
        lis     4, hw_2_2_passed@ha
        addi    4, 4, hw_2_2_passed@l
        li      5, len_hw_2_2_passed
        sc
        b  back2			# PERF_SAMPLE_BRANCH_ANY

hw_2_2:
        # Print "hw_2_2 called"
	li      0, 4
	li      3, 1
	lis     4, msg_2_2@ha
	addi    4, 4, msg_2_2@l
	li      5, len_2_2
	sc

	# Simple conditional branch (less than)
	li	20, 12
	cmpi	4, 20, 20
	bc	12, 4*cr4+0, address2	# PERF_SAMPLE_BRANCH_COND
back2:
	blr				# PERF_SAMPLE_BRANCH_ANY_RET

# SW filter test symbols
sw_3_1_1:
	# Print "Test: sw_3_1_1"
        li      0, 4
        li      3, 1
        lis     4, msg_3_1_1@ha
        addi    4, 4, msg_3_1_1@l
        li      5, len_3_1_1
        sc

	li	22,0
	# Test the condition and return
	li	21, 10
	cmpi	0, 21, 10
	bclr	12, 2			# PERF_SAMPLE_BRANCH_ANY_RET | PERF_SAMPLE_BRANCH_COND

	# Should not have come here
	li      0, 4
	li      3, 1
        lis     4, msg_error@ha
        addi    4, 4, msg_error@l
        li      5, len_error
        sc
	
	# Mark the error
	li 	22, 1
	
	# Safe fall back
	blr				# PERF_SAMPLE_BRANCH_ANY_RET

sw_3_1_2:
        # Print "Test: sw_3_1_2"
        li      0, 4
        li      3, 1
        lis     4, msg_3_1_2@ha
        addi    4, 4, msg_3_1_2@l
        li      5, len_3_1_2
        sc

	li	23, 0
	# Test the condition and return
	li	21, 10
	cmpi	0, 21, 20
	bclr	12, 0			# PERF_SAMPLE_BRANCH_ANY_RET | PERF_SAMPLE_BRANCH_COND
        
	# Should not have come here
	li      0, 4
	li      3, 1
        lis     4, msg_error@ha
        addi    4, 4, msg_error@l
        li      5, len_error
        sc

	# Mark the error
	li 	23, 1

	# Safe fall back
	blr				# PERF_SAMPLE_BRANCH_ANY_RET

sw_3_1_3:
	# Print "Test: sw_3_1_3"
        li      0, 4
        li      3, 1
        lis     4, msg_3_1_3@ha
        addi    4, 4, msg_3_1_3@l
        li      5, len_3_1_3
        sc

	li	24, 0
	# Test the condition and return
	li	21, 10
	cmpi	0, 21, 5
	bclr	12, 1			# PERF_SAMPLE_BRANCH_ANY_RET | PERF_SAMPLE_BRANCH_COND
	
	# Mark the error
	li 	24, 1

	# Should not have come here
	li      0, 4
	li      3, 1
        lis     4, msg_error@ha
        addi    4, 4, msg_error@l
        li      5, len_error
        sc

	# Safe fall back
	blr				# PERF_SAMPLE_BRANCH_ANY_RET

success_3_1_1:
	li      0, 4
	li      3, 1
        lis     4, hw_3_1_1_passed@ha
        addi    4, 4, hw_3_1_1_passed@l
        li      5, len_hw_3_1_1_passed
        sc
	blr

success_3_1_2:
	li      0, 4
	li      3, 1
        lis     4, hw_3_1_2_passed@ha
        addi    4, 4, hw_3_1_2_passed@l
        li      5, len_hw_3_1_2_passed
        sc
	blr

success_3_1_3:
	li      0, 4
	li      3, 1
        lis     4, hw_3_1_3_passed@ha
        addi    4, 4, hw_3_1_3_passed@l
        li      5, len_hw_3_1_3_passed
        sc
	blr

sw_3_1:
	# Save LR
	mflr 26

        # Print "Test: sw_3_1"
        li      0, 4
        li      3, 1
        lis     4, msg_3_1@ha
        addi    4, 4, msg_3_1@l
        li      5, len_3_1
        sc

	# Equal comparison condition
	bl sw_3_1_1			# PERF_SAMPLE_BRANCH_ANY_CALL
	cmpi	0, 22, 0
	bcl	12, 2, success_3_1_1	# PERF_SAMPLE_BRANCH_ANY_CALL | PERF_SAMPLE_BRANCH_COND

	# LT comparison condition
	bl sw_3_1_2			# PERF_SAMPLE_BRANCH_ANY_CALL
	cmpi	0, 23, 0
	bcl	12, 2, success_3_1_2	# PERF_SAMPLE_BRANCH_ANY_CALL | PERF_SAMPLE_BRANCH_COND

	# GT comparison condition
	bl sw_3_1_3			# PERF_SAMPLE_BRANCH_ANY_CALL
	cmpi	0, 24, 0
	bcl	12, 2, success_3_1_3	# PERF_SAMPLE_BRANCH_ANY_CALL | PERF_SAMPLE_BRANCH_COND

	mtlr 26
	blr				# PERF_SAMPLE_BRANCH_ANY_RET
sw_3_2:
	# Print "Test: sw_3_2"
	li      0, 4
	li      3, 1
	lis     4, msg_3_2@ha
	addi    4, 4, msg_3_2@l
	li      5, len_3_1
	sc

	# FIXME: Anything more here ?
	blr				# PERF_SAMPLE_BRANCH_ANY_RET

# Indirect call tests

# CTR
ctr_addr:
        # Print "bcctr taken"
        li      0, 4
        li      3, 1
        lis     4, hw_4_1_passed@ha
        addi    4, 4, hw_4_1_passed@l
        li      5, len_hw_4_1_passed
        sc

	blr				# PERF_SAMPLE_BRANCH_ANY_RET
sw_4_1:
	# Save LR
	mflr	26

	# Print "sw_4_1 called"
        li      0, 4
        li      3, 1
        lis     4, msg_4_1@ha
        addi    4, 4, msg_4_1@l
        li      5, len_4_1
        sc

	# Save address in CTR
	lis 	20, ctr_addr@ha
	addi	20, 20, ctr_addr@l
	mtctr   20


	# Compare and jump to CTR
	li 	21, 10
	cmpi	0, 21, 10
	bcctrl  12, 4*cr0+2		# PERF_SAMPLE_BRANCH_IND_CALL

	mtlr	26
	blr				# PERF_SAMPLE_BRANCH_ANY_RET
# LR
lr_addr:
	# Print "bclrl taken"
	li      0, 4
	li      3, 1
	lis     4, hw_4_2_passed@ha
	addi    4, 4, hw_4_2_passed@l
	li      5, len_hw_4_2_passed
	sc

	blr				# PERF_SAMPLE_BRANCH_ANY_RET

sw_4_2:
	# Save LR
	mflr	26

        # Print "Test: sw_4_2"
        li      0, 4
        li      3, 1
        lis     4, msg_4_2@ha
        addi    4, 4, msg_4_2@l
        li      5, len_4_2
        sc

	# Save address in LR
	lis 	20, lr_addr@ha
	addi	20, 20, lr_addr@l
	mtlr	20


	# Compare and jump to LR
	li 	21, 10
	cmpi	0, 21, 10
	bclrl   12, 4*cr0+2		# PERF_SAMPLE_BRANCH_IND_CALL

	# Restore LR
	mtlr	26	
	blr				# PERF_SAMPLE_BRANCH_ANY_RET

callme:
	# Save LR
	mflr	25

	# Print "Branch filter Test"
	li	0, 4
	li	3, 1
	lis 	4, msg@ha
	addi	4, 4, msg@l
	li	5, len
	sc

	# PERF_SAMPLE_BRANCH_ANY_CALL
	bl hw_1_1			# PERF_SAMPLE_BRANCH_ANY_CALL
	bl hw_1_2			# PERF_SAMPLE_BRANCH_ANY_CALL
	# PERF_SAMPLE_BRANCH_COND
	bl hw_2_1			# PERF_SAMPLE_BRANCH_ANY_CALL
	bl hw_2_2			# PERF_SAMPLE_BRANCH_ANY_CALL

	# PERF_SAMPLE_BRANCH_ANY_RET
	bl sw_3_1			# PERF_SAMPLE_BRANCH_ANY_CALL
	bl sw_3_2			# PERF_SAMPLE_BRANCH_ANY_CALL
	# PERF_SAMPLE_BRANCH_IND_CALL
	bl sw_4_1			# PERF_SAMPLE_BRANCH_ANY_CALL
	bl sw_4_2			# PERF_SAMPLE_BRANCH_ANY_CALL

	# Restore LR
	mtlr 25
	blr				# PERF_SAMPLE_BRANCH_ANY_RET
--------------------------------------------------------------------

Anshuman Khandual (10):
  perf: Add PERF_SAMPLE_BRANCH_COND
  powerpc, perf: Enable conditional branch filter for POWER8
  perf, tool: Conditional branch filter 'cond' added to perf record
  x86, perf: Add conditional branch filtering support
  perf, documentation: Description for conditional branch filter
  powerpc, perf: Change the name of HW PMU branch filter tracking
    variable
  powerpc, lib: Add new branch instruction analysis support functions
  powerpc, perf: Enable SW filtering in branch stack sampling framework
  power8, perf: Change BHRB branch filter configuration
  powerpc, perf: Cleanup SW branch filter list look up

 arch/powerpc/include/asm/code-patching.h     |  30 ++++
 arch/powerpc/include/asm/perf_event_server.h |   6 +-
 arch/powerpc/lib/code-patching.c             |  54 +++++-
 arch/powerpc/perf/core-book3s.c              | 260 +++++++++++++++++++++++++--
 arch/powerpc/perf/power8-pmu.c               |  75 ++++++--
 arch/x86/kernel/cpu/perf_event_intel_lbr.c   |   5 +
 include/uapi/linux/perf_event.h              |   3 +-
 tools/perf/Documentation/perf-record.txt     |   3 +-
 tools/perf/builtin-record.c                  |   1 +
 9 files changed, 404 insertions(+), 33 deletions(-)

-- 
1.7.11.7

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH V4 01/10] perf: Add PERF_SAMPLE_BRANCH_COND
  2013-12-04 10:32 ` Anshuman Khandual
@ 2013-12-04 10:32   ` Anshuman Khandual
  -1 siblings, 0 replies; 57+ messages in thread
From: Anshuman Khandual @ 2013-12-04 10:32 UTC (permalink / raw)
  To: linuxppc-dev, linux-kernel
  Cc: michael, mikey, sukadev, eranian, acme, ak, mingo

The POWER8 PMU based BHRB supports filtering for conditional branches.
This patch introduces a new branch filter, PERF_SAMPLE_BRANCH_COND,
which extends the existing perf ABI. Other architectures can provide
this functionality with either HW filtering support (if present) or
with SW filtering of instructions.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Reviewed-by: Stephane Eranian <eranian@google.com>
---
 include/uapi/linux/perf_event.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index e1802d6..e2d8b8b 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -163,8 +163,9 @@ enum perf_branch_sample_type {
 	PERF_SAMPLE_BRANCH_ABORT_TX	= 1U << 7, /* transaction aborts */
 	PERF_SAMPLE_BRANCH_IN_TX	= 1U << 8, /* in transaction */
 	PERF_SAMPLE_BRANCH_NO_TX	= 1U << 9, /* not in transaction */
+	PERF_SAMPLE_BRANCH_COND		= 1U << 10, /* conditional branches */
 
-	PERF_SAMPLE_BRANCH_MAX		= 1U << 10, /* non-ABI */
+	PERF_SAMPLE_BRANCH_MAX		= 1U << 11, /* non-ABI */
 };
 
 #define PERF_SAMPLE_BRANCH_PLM_ALL \
-- 
1.7.11.7


^ permalink raw reply related	[flat|nested] 57+ messages in thread
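
A rough illustration of why the non-ABI MAX sentinel has to move along
with the new bit. The sketch mirrors the values from the hunk above in a
local enum; the sk_ prefixed names and sk_branch_type_is_valid() are
hypothetical and are not kernel identifiers. It assumes a validity check
that treats every bit at or above MAX as unknown, so a request carrying
the new bit is only accepted once MAX is 1U << 11. Arrays sized by
PERF_SAMPLE_BRANCH_MAX, such as the x86 LBR select maps touched later in
this series, depend on the same bump.

```c
#include <assert.h>
#include <stdint.h>

/* Local mirror of the values shown in the hunk; names are illustrative. */
enum sk_branch_sample_type {
	SK_BRANCH_ABORT_TX = 1U << 7,  /* transaction aborts */
	SK_BRANCH_IN_TX    = 1U << 8,  /* in transaction */
	SK_BRANCH_NO_TX    = 1U << 9,  /* not in transaction */
	SK_BRANCH_COND     = 1U << 10, /* conditional branches (new) */
	SK_BRANCH_MAX      = 1U << 11, /* non-ABI sentinel */
};

/*
 * Hypothetical validity check: a request is valid when it sets no bit
 * at or above the sentinel. 'max' must be a single bit.
 */
static int sk_branch_type_is_valid(uint64_t type, uint64_t max)
{
	assert(max != 0 && (max & (max - 1)) == 0);
	return (type & ~(max - 1)) == 0;
}
```

With the old sentinel (1U << 10) the call
sk_branch_type_is_valid(SK_BRANCH_COND, 1U << 10) returns 0, and with
SK_BRANCH_MAX it returns 1.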

* [PATCH V4 02/10] powerpc, perf: Enable conditional branch filter for POWER8
  2013-12-04 10:32 ` Anshuman Khandual
@ 2013-12-04 10:32   ` Anshuman Khandual
  -1 siblings, 0 replies; 57+ messages in thread
From: Anshuman Khandual @ 2013-12-04 10:32 UTC (permalink / raw)
  To: linuxppc-dev, linux-kernel
  Cc: michael, mikey, sukadev, eranian, acme, ak, mingo

Enable conditional branch filter support for POWER8 using the
MMCRA register based filter, and reject any combination of
HW BHRB branch filters.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
---
 arch/powerpc/perf/power8-pmu.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/arch/powerpc/perf/power8-pmu.c b/arch/powerpc/perf/power8-pmu.c
index a3f7abd..e88b9cb 100644
--- a/arch/powerpc/perf/power8-pmu.c
+++ b/arch/powerpc/perf/power8-pmu.c
@@ -586,6 +586,16 @@ static u64 power8_bhrb_filter_map(u64 branch_sample_type)
 		return pmu_bhrb_filter;
 	}
 
+	if (branch_sample_type & PERF_SAMPLE_BRANCH_COND) {
+		pmu_bhrb_filter |= POWER8_MMCRA_IFM3;
+		return pmu_bhrb_filter;
+	}
+
+	/* PMU does not support ANY combination of HW BHRB filters */
+	if ((branch_sample_type & PERF_SAMPLE_BRANCH_ANY_CALL) &&
+			(branch_sample_type & PERF_SAMPLE_BRANCH_COND))
+		return -1;
+
 	/* Every thing else is unsupported */
 	return -1;
 }
-- 
1.7.11.7


^ permalink raw reply related	[flat|nested] 57+ messages in thread
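
In the hunk above, the PERF_SAMPLE_BRANCH_COND block returns
unconditionally, so a request that sets COND never reaches the
combination check that follows it. The sketch below shows one way the
"exactly one HW filter" rule could be expressed so that combinations are
rejected regardless of ordering. It is an illustration under stated
assumptions, not the code in this patch: the sk_ names are hypothetical,
the branch type bits are local stand-ins for the perf ABI values, and the
two IFM encodings are assumed values for the MMCRA filter field.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for the perf ABI bits; names are illustrative. */
#define SK_BRANCH_PLM_ALL    0x7U        /* user | kernel | hv */
#define SK_BRANCH_ANY        (1U << 3)
#define SK_BRANCH_ANY_CALL   (1U << 4)
#define SK_BRANCH_COND       (1U << 10)

/* Assumed MMCRA IFM encodings, for illustration only. */
#define SK_MMCRA_IFM1 0x0000000040000000ULL
#define SK_MMCRA_IFM3 0x00000000C0000000ULL

#define SK_FILTER_UNSUPPORTED ((uint64_t)-1)

/* Map a branch_sample_type request to a single HW filter value. */
static uint64_t sk_bhrb_filter_map(uint64_t type)
{
	/* Privilege level bits do not select a branch filter. */
	uint64_t req = type & ~(uint64_t)SK_BRANCH_PLM_ALL;

	/* ANY means no filtering at all. */
	if (req & SK_BRANCH_ANY)
		return 0;

	/* Exactly one filter bit may be set; anything else is rejected. */
	switch (req) {
	case SK_BRANCH_ANY_CALL:
		return SK_MMCRA_IFM1;
	case SK_BRANCH_COND:
		return SK_MMCRA_IFM3;
	default:
		return SK_FILTER_UNSUPPORTED;
	}
}

/* Small usage example. */
void sk_bhrb_filter_map_demo(void)
{
	assert(sk_bhrb_filter_map(SK_BRANCH_COND) == SK_MMCRA_IFM3);
	assert(sk_bhrb_filter_map(SK_BRANCH_ANY_CALL | SK_BRANCH_COND) ==
	       SK_FILTER_UNSUPPORTED);
}
```

Matching on the whole request value, rather than testing bits one at a
time, means no early return can shadow a later combination check.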

* [PATCH V4 03/10] perf, tool: Conditional branch filter 'cond' added to perf record
  2013-12-04 10:32 ` Anshuman Khandual
@ 2013-12-04 10:32   ` Anshuman Khandual
  -1 siblings, 0 replies; 57+ messages in thread
From: Anshuman Khandual @ 2013-12-04 10:32 UTC (permalink / raw)
  To: linuxppc-dev, linux-kernel
  Cc: michael, mikey, sukadev, eranian, acme, ak, mingo

Add perf record support for the new branch stack filter criterion
PERF_SAMPLE_BRANCH_COND.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Reviewed-by: Stephane Eranian <eranian@google.com>
---
 tools/perf/builtin-record.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 7c8020a..34040f7 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -583,6 +583,7 @@ static const struct branch_mode branch_modes[] = {
 	BRANCH_OPT("abort_tx", PERF_SAMPLE_BRANCH_ABORT_TX),
 	BRANCH_OPT("in_tx", PERF_SAMPLE_BRANCH_IN_TX),
 	BRANCH_OPT("no_tx", PERF_SAMPLE_BRANCH_NO_TX),
+	BRANCH_OPT("cond", PERF_SAMPLE_BRANCH_COND),
 	BRANCH_END
 };
 
-- 
1.7.11.7


^ permalink raw reply related	[flat|nested] 57+ messages in thread
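
For readers unfamiliar with the branch_modes table, here is a minimal
sketch of how a name to mask table of this shape can turn a filter
string such as "any_call,cond" into a branch_sample_type mask. The
struct, macros and sk_parse_branch_filter() are hypothetical stand-ins
modelled on the table in the hunk above, not the perf tool's own parser,
and the table lists only a subset of the filter names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct sk_branch_mode {
	const char *name;
	uint64_t mode;
};

#define SK_BRANCH_OPT(n, m) { .name = (n), .mode = (m) }
#define SK_BRANCH_END       { .name = NULL, .mode = 0 }

/* Subset of the filter names, with local copies of the ABI bit values. */
static const struct sk_branch_mode sk_branch_modes[] = {
	SK_BRANCH_OPT("any",      1U << 3),
	SK_BRANCH_OPT("any_call", 1U << 4),
	SK_BRANCH_OPT("any_ret",  1U << 5),
	SK_BRANCH_OPT("ind_call", 1U << 6),
	SK_BRANCH_OPT("abort_tx", 1U << 7),
	SK_BRANCH_OPT("in_tx",    1U << 8),
	SK_BRANCH_OPT("no_tx",    1U << 9),
	SK_BRANCH_OPT("cond",     1U << 10),
	SK_BRANCH_END
};

/*
 * Parse a comma separated list of filter names into *mask.
 * Returns 0 on success, -1 if any name is not in the table.
 */
static int sk_parse_branch_filter(const char *str, uint64_t *mask)
{
	assert(str != NULL && mask != NULL);
	*mask = 0;

	while (*str) {
		size_t len = strcspn(str, ",");
		const struct sk_branch_mode *br;

		for (br = sk_branch_modes; br->name; br++) {
			if (strlen(br->name) == len &&
			    !strncmp(br->name, str, len))
				break;
		}
		if (!br->name)
			return -1;	/* unknown filter name */

		*mask |= br->mode;
		str += len;
		if (*str == ',')
			str++;
	}
	return 0;
}
```

Because the table is terminated by a NULL name, adding a filter is a one
line change, which is all this patch needs for "cond".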

* [PATCH V4 04/10] x86, perf: Add conditional branch filtering support
  2013-12-04 10:32 ` Anshuman Khandual
@ 2013-12-04 10:32   ` Anshuman Khandual
  -1 siblings, 0 replies; 57+ messages in thread
From: Anshuman Khandual @ 2013-12-04 10:32 UTC (permalink / raw)
  To: linuxppc-dev, linux-kernel
  Cc: michael, mikey, sukadev, eranian, acme, ak, mingo

This patch adds conditional branch filtering support for
PERF_SAMPLE_BRANCH_COND in the perf branch stack sampling
framework by using the existing software filter X86_BR_JCC
and the LBR_JCC hardware filter bit.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Reviewed-by: Stephane Eranian <eranian@google.com>
---
 arch/x86/kernel/cpu/perf_event_intel_lbr.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_lbr.c b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
index d82d155..9dd2459 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_lbr.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
@@ -384,6 +384,9 @@ static void intel_pmu_setup_sw_lbr_filter(struct perf_event *event)
 	if (br_type & PERF_SAMPLE_BRANCH_NO_TX)
 		mask |= X86_BR_NO_TX;
 
+	if (br_type & PERF_SAMPLE_BRANCH_COND)
+		mask |= X86_BR_JCC;
+
 	/*
 	 * stash actual user request into reg, it may
 	 * be used by fixup code for some CPU
@@ -678,6 +681,7 @@ static const int nhm_lbr_sel_map[PERF_SAMPLE_BRANCH_MAX] = {
 	 * NHM/WSM erratum: must include IND_JMP to capture IND_CALL
 	 */
 	[PERF_SAMPLE_BRANCH_IND_CALL] = LBR_IND_CALL | LBR_IND_JMP,
+	[PERF_SAMPLE_BRANCH_COND]     = LBR_JCC,
 };
 
 static const int snb_lbr_sel_map[PERF_SAMPLE_BRANCH_MAX] = {
@@ -689,6 +693,7 @@ static const int snb_lbr_sel_map[PERF_SAMPLE_BRANCH_MAX] = {
 	[PERF_SAMPLE_BRANCH_ANY_CALL]	= LBR_REL_CALL | LBR_IND_CALL
 					| LBR_FAR,
 	[PERF_SAMPLE_BRANCH_IND_CALL]	= LBR_IND_CALL,
+	[PERF_SAMPLE_BRANCH_COND]       = LBR_JCC,
 };
 
 /* core */
-- 
1.7.11.7


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH V4 05/10] perf, documentation: Description for conditional branch filter
  2013-12-04 10:32 ` Anshuman Khandual
@ 2013-12-04 10:32   ` Anshuman Khandual
  -1 siblings, 0 replies; 57+ messages in thread
From: Anshuman Khandual @ 2013-12-04 10:32 UTC (permalink / raw)
  To: linuxppc-dev, linux-kernel
  Cc: michael, mikey, sukadev, eranian, acme, ak, mingo

Add documentation for the conditional branch filter.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Reviewed-by: Stephane Eranian <eranian@google.com>
---
 tools/perf/Documentation/perf-record.txt | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index 43b42c4..5ecc405 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -183,9 +183,10 @@ following filters are defined:
 	- in_tx: only when the target is in a hardware transaction
 	- no_tx: only when the target is not in a hardware transaction
 	- abort_tx: only when the target is a hardware transaction abort
+	- cond: conditional branches
 
 +
-The option requires at least one branch type among any, any_call, any_ret, ind_call.
+The option requires at least one branch type among any, any_call, any_ret, ind_call, cond.
 The privilege levels may be omitted, in which case, the privilege levels of the associated
 event are applied to the branch filter. Both kernel (k) and hypervisor (hv) privilege
 levels are subject to permissions.  When sampling on multiple events, branch stack sampling
-- 
1.7.11.7


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH V4 06/10] powerpc, perf: Change the name of HW PMU branch filter tracking variable
  2013-12-04 10:32 ` Anshuman Khandual
@ 2013-12-04 10:32   ` Anshuman Khandual
  -1 siblings, 0 replies; 57+ messages in thread
From: Anshuman Khandual @ 2013-12-04 10:32 UTC (permalink / raw)
  To: linuxppc-dev, linux-kernel
  Cc: michael, mikey, sukadev, eranian, acme, ak, mingo

This patch renames the variable "bhrb_filter" to "bhrb_hw_filter" so
that a second variable, which will track SW filters in the generic
powerpc book3s code, can be added in a subsequent patch.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
---
 arch/powerpc/perf/core-book3s.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 29b89e8..2de7d48 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -47,7 +47,7 @@ struct cpu_hw_events {
 	int n_txn_start;
 
 	/* BHRB bits */
-	u64				bhrb_filter;	/* BHRB HW branch filter */
+	u64				bhrb_hw_filter;	/* BHRB HW branch filter */
 	int				bhrb_users;
 	void				*bhrb_context;
 	struct	perf_branch_stack	bhrb_stack;
@@ -1159,7 +1159,7 @@ static void power_pmu_enable(struct pmu *pmu)
 
  out:
 	if (cpuhw->bhrb_users)
-		ppmu->config_bhrb(cpuhw->bhrb_filter);
+		ppmu->config_bhrb(cpuhw->bhrb_hw_filter);
 
 	local_irq_restore(flags);
 }
@@ -1254,7 +1254,7 @@ nocheck:
  out:
 	if (has_branch_stack(event)) {
 		power_pmu_bhrb_enable(event);
-		cpuhw->bhrb_filter = ppmu->bhrb_filter_map(
+		cpuhw->bhrb_hw_filter = ppmu->bhrb_filter_map(
 					event->attr.branch_sample_type);
 	}
 
@@ -1637,10 +1637,10 @@ static int power_pmu_event_init(struct perf_event *event)
 	err = power_check_constraints(cpuhw, events, cflags, n + 1);
 
 	if (has_branch_stack(event)) {
-		cpuhw->bhrb_filter = ppmu->bhrb_filter_map(
+		cpuhw->bhrb_hw_filter = ppmu->bhrb_filter_map(
 					event->attr.branch_sample_type);
 
-		if(cpuhw->bhrb_filter == -1)
+		if(cpuhw->bhrb_hw_filter == -1)
 			return -EOPNOTSUPP;
 	}
 
-- 
1.7.11.7


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH V4 07/10] powerpc, lib: Add new branch instruction analysis support functions
  2013-12-04 10:32 ` Anshuman Khandual
@ 2013-12-04 10:32   ` Anshuman Khandual
  -1 siblings, 0 replies; 57+ messages in thread
From: Anshuman Khandual @ 2013-12-04 10:32 UTC (permalink / raw)
  To: linuxppc-dev, linux-kernel
  Cc: michael, mikey, sukadev, eranian, acme, ak, mingo

Add generic powerpc branch instruction analysis support to the code
patching library, which will help the subsequent patch on SW based
filtering of branch records in perf. This patch also makes some of
the existing local static functions non-static and exports them
through the header file to be used elsewhere.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/code-patching.h | 30 ++++++++++++++++++
 arch/powerpc/lib/code-patching.c         | 54 ++++++++++++++++++++++++++++++--
 2 files changed, 82 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/code-patching.h b/arch/powerpc/include/asm/code-patching.h
index a6f8c7a..8bab417 100644
--- a/arch/powerpc/include/asm/code-patching.h
+++ b/arch/powerpc/include/asm/code-patching.h
@@ -22,6 +22,36 @@
 #define BRANCH_SET_LINK	0x1
 #define BRANCH_ABSOLUTE	0x2
 
+#define XL_FORM_LR  0x4C000020
+#define XL_FORM_CTR 0x4C000420
+#define XL_FORM_TAR 0x4C000460
+
+#define BO_ALWAYS    0x02800000
+#define BO_CTR       0x02000000
+#define BO_CRBI_OFF  0x00800000
+#define BO_CRBI_ON   0x01800000
+#define BO_CRBI_HINT 0x00400000
+
+/* Forms of branch instruction */
+int instr_is_branch_iform(unsigned int instr);
+int instr_is_branch_bform(unsigned int instr);
+int instr_is_branch_xlform(unsigned int instr);
+
+/* Classification of XL-form instruction */
+int is_xlform_lr(unsigned int instr);
+int is_xlform_ctr(unsigned int instr);
+int is_xlform_tar(unsigned int instr);
+
+/* Branch instruction is a call */
+int is_branch_link_set(unsigned int instr);
+
+/* BO field analysis (B-form or XL-form) */
+int is_bo_always(unsigned int instr);
+int is_bo_ctr(unsigned int instr);
+int is_bo_crbi_off(unsigned int instr);
+int is_bo_crbi_on(unsigned int instr);
+int is_bo_crbi_hint(unsigned int instr);
+
 unsigned int create_branch(const unsigned int *addr,
 			   unsigned long target, int flags);
 unsigned int create_cond_branch(const unsigned int *addr,
diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index 17e5b23..cb62bd8 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -77,16 +77,66 @@ static unsigned int branch_opcode(unsigned int instr)
 	return (instr >> 26) & 0x3F;
 }
 
-static int instr_is_branch_iform(unsigned int instr)
+int instr_is_branch_iform(unsigned int instr)
 {
 	return branch_opcode(instr) == 18;
 }
 
-static int instr_is_branch_bform(unsigned int instr)
+int instr_is_branch_bform(unsigned int instr)
 {
 	return branch_opcode(instr) == 16;
 }
 
+int instr_is_branch_xlform(unsigned int instr)
+{
+	return branch_opcode(instr) == 19;
+}
+
+int is_xlform_lr(unsigned int instr)
+{
+	return (instr & XL_FORM_LR) == XL_FORM_LR;
+}
+
+int is_xlform_ctr(unsigned int instr)
+{
+	return (instr & XL_FORM_CTR) == XL_FORM_CTR;
+}
+
+int is_xlform_tar(unsigned int instr)
+{
+	return (instr & XL_FORM_TAR) == XL_FORM_TAR;
+}
+
+int is_branch_link_set(unsigned int instr)
+{
+	return (instr & BRANCH_SET_LINK) == BRANCH_SET_LINK;
+}
+
+int is_bo_always(unsigned int instr)
+{
+	return (instr & BO_ALWAYS) == BO_ALWAYS;
+}
+
+int is_bo_ctr(unsigned int instr)
+{
+	return (instr & BO_CTR) == BO_CTR;
+}
+
+int is_bo_crbi_off(unsigned int instr)
+{
+	return (instr & BO_CRBI_OFF) == BO_CRBI_OFF;
+}
+
+int is_bo_crbi_on(unsigned int instr)
+{
+	return (instr & BO_CRBI_ON) == BO_CRBI_ON;
+}
+
+int is_bo_crbi_hint(unsigned int instr)
+{
+	return (instr & BO_CRBI_HINT) == BO_CRBI_HINT;
+}
+
 int instr_is_relative_branch(unsigned int instr)
 {
 	if (instr & BRANCH_ABSOLUTE)
-- 
1.7.11.7


^ permalink raw reply related	[flat|nested] 57+ messages in thread
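
A note on the subset tests of the form (instr & X) == X used above:
since 0x4C000420 (XL_FORM_CTR) contains every bit of 0x4C000020
(XL_FORM_LR), a test against XL_FORM_LR also matches a bcctr encoding
such as 0x4E800420. The sketch below shows an alternative that extracts
the instruction fields and compares them as whole values. It is an
illustration only, not the implementation in this patch; all sk_ names
are hypothetical, and the field positions are the standard ones for the
primary opcode, the BO field, the XL-form extended opcode and the LK bit.

```c
#include <assert.h>
#include <stdint.h>

/* Primary opcode: the top 6 bits of the instruction word. */
static unsigned int sk_primary_opcode(uint32_t instr)
{
	return (instr >> 26) & 0x3F;
}

/* BO field: the 5 bits that follow the primary opcode. */
static unsigned int sk_bo_field(uint32_t instr)
{
	return (instr >> 21) & 0x1F;
}

/* XL-form extended opcode: the 10 bits just above the LK bit. */
static unsigned int sk_xl_extended_opcode(uint32_t instr)
{
	return (instr >> 1) & 0x3FF;
}

/* LK bit: set when the branch also writes the link register. */
static int sk_link_set(uint32_t instr)
{
	return instr & 0x1;
}

enum sk_xl_target { SK_XL_NONE, SK_XL_LR, SK_XL_CTR, SK_XL_TAR };

/* Classify an XL-form branch by comparing the whole extended opcode. */
static enum sk_xl_target sk_xl_branch_target(uint32_t instr)
{
	if (sk_primary_opcode(instr) != 19)
		return SK_XL_NONE;

	switch (sk_xl_extended_opcode(instr)) {
	case 16:	/* 0x4C000020: branch to LR */
		return SK_XL_LR;
	case 528:	/* 0x4C000420: branch to CTR */
		return SK_XL_CTR;
	case 560:	/* 0x4C000460: branch to TAR */
		return SK_XL_TAR;
	default:
		return SK_XL_NONE;
	}
}

/* Small usage example with the blr, bctr and bctrl encodings. */
void sk_branch_decode_demo(void)
{
	assert(sk_xl_branch_target(0x4E800020) == SK_XL_LR);	/* blr */
	assert(sk_xl_branch_target(0x4E800420) == SK_XL_CTR);	/* bctr */
	assert(sk_link_set(0x4E800421));			/* bctrl */
	assert(sk_bo_field(0x4E800020) == 20);	/* branch always */
}
```

The same whole-field approach applies to the BO patterns: BO_CRBI_ON
(0x01800000) contains BO_CRBI_OFF (0x00800000), so a subset test for the
"off" pattern is also true for a branch encoded with the "on" pattern.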

* [PATCH V4 08/10] powerpc, perf: Enable SW filtering in branch stack sampling framework
  2013-12-04 10:32 ` Anshuman Khandual
@ 2013-12-04 10:32   ` Anshuman Khandual
  -1 siblings, 0 replies; 57+ messages in thread
From: Anshuman Khandual @ 2013-12-04 10:32 UTC (permalink / raw)
  To: linuxppc-dev, linux-kernel
  Cc: michael, mikey, sukadev, eranian, acme, ak, mingo

This patch enables SW based post processing of BHRB captured branches
to be able to meet more user defined branch filtration criteria in perf
branch stack sampling framework. These changes increase the number of
branch filters and their valid combinations on any powerpc64 server
platform with BHRB support. A summary of the code changes follows.

(1) struct cpu_hw_events

	Introduced two new variables to track the filter values and mask

	(a) bhrb_sw_filter	Tracks SW implemented branch filter flags
	(b) filter_mask		Tracks both (SW and HW) branch filter flags

(2) Event creation

	The kernel will figure out the supported BHRB branch filters through a PMU
	callback, 'bhrb_filter_map'. This function will find out how many of the
	requested branch filters can be supported in the PMU HW. It will not
	try to invalidate any branch filter combinations. Event creation will not
	error out because of a lack of HW based branch filters. Meanwhile it will
	track the overall supported branch filters in the "filter_mask" variable.

	Once the PMU callback returns, the kernel will process the user branch
	filter request against the available SW filters while looking at the
	"filter_mask". During this phase, all the branch filters which are still
	pending from the user requested list will have to be supported in SW,
	failing which the event creation will error out.
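
The event creation flow described in (2) can be sketched in plain C as follows. This is a simplified user-space model, not the kernel code: the flag values, `struct filter_plan` and `plan_filters()` are hypothetical, `hw_accepted` stands in for whatever the PMU callback decided to keep, and `F_OTHER` is an invented filter with no HW or SW implementation, used only to show the error path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative flag values only; the real ones come from the perf UAPI header. */
enum {
	F_ANY_CALL   = 1u << 0,
	F_ANY_RETURN = 1u << 1,
	F_IND_CALL   = 1u << 2,
	F_COND       = 1u << 3,
	F_OTHER      = 1u << 4,	/* hypothetical filter nobody implements */
};

/* Filters that have a SW implementation in this model. */
#define SW_CAPABLE (F_ANY_CALL | F_ANY_RETURN | F_IND_CALL | F_COND)

struct filter_plan {
	uint64_t hw;	/* filters left to the PMU */
	uint64_t sw;	/* filters applied in SW */
	uint64_t mask;	/* everything that is covered somewhere */
};

/*
 * Step 1: take what the PMU accepted. Step 2: hand the still pending
 * filters to SW. Step 3: fail if any requested filter is covered nowhere.
 * Returns 0 on success and -1 on failure.
 */
static inline int plan_filters(uint64_t requested, uint64_t hw_accepted,
			       struct filter_plan *p)
{
	assert(p != NULL);

	p->hw = requested & hw_accepted;
	p->sw = requested & ~p->hw & SW_CAPABLE;
	p->mask = p->hw | p->sw;

	return (requested & ~p->mask) ? -1 : 0;
}
```

For example, a request for `F_COND | F_ANY_CALL` with `hw_accepted == 0` ends up entirely in `p->sw`, while a request that includes `F_OTHER` is rejected, mirroring the -EOPNOTSUPP path in the patch.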

(3) SW branch filter

	During the BHRB data capture inside the PMU interrupt context, each
	of the captured 'perf_branch_entry.from' will be checked for compliance
	with applicable SW branch filters. If the entry does not conform to the
	filter requirements, it will be discarded from the final perf branch
	stack buffer.
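
The discard step in (3) amounts to compacting the captured entries in place. A minimal sketch, assuming a simplified entry type and a caller-supplied predicate; `struct branch_entry`, `filter_entries()` and `from_is_8_aligned()` are hypothetical, and the predicate is only a stand-in for the real instruction check:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a perf branch entry. */
struct branch_entry {
	uint64_t from;
	uint64_t to;
};

typedef bool (*entry_pred)(uint64_t from);

/*
 * Keep only the entries whose 'from' address is accepted by keep().
 * Accepted entries are moved to the front; the new count is returned.
 */
static inline size_t filter_entries(struct branch_entry *e, size_t n,
				    entry_pred keep)
{
	size_t out = 0;

	assert(e != NULL || n == 0);
	assert(keep != NULL);

	for (size_t i = 0; i < n; i++) {
		if (keep(e[i].from))
			e[out++] = e[i];
	}
	return out;
}

/* Demo predicate only: accept 'from' addresses that are 8-byte aligned. */
static inline bool from_is_8_aligned(uint64_t from)
{
	return (from & 0x7) == 0;
}
```

The patch reaches the same result by decrementing `u_index` for a rejected entry before the common increment, so the next captured branch overwrites the rejected slot.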

(4) Supported SW based branch filters

	(a) PERF_SAMPLE_BRANCH_ANY_RETURN
	(b) PERF_SAMPLE_BRANCH_IND_CALL
	(c) PERF_SAMPLE_BRANCH_ANY_CALL
	(d) PERF_SAMPLE_BRANCH_COND

	Please refer to the patch to understand the classification of instructions
	into these branch filter categories.

(5) Multiple branch filter semantics

	The Book3S server implementation follows the same OR semantics (as
	implemented in x86) when dealing with multiple branch filters at any point
	of time. SW branch filter analysis is carried out on the data set captured
	in the PMU HW, so the resulting set of data (after applying the SW filters)
	will inherently be an AND with the HW captured set. Hence any combination
	of HW and SW branch filters will be invalid. HW based branch filters are
	more efficient and faster compared to SW implemented branch filters, so at
	first the PMU should decide whether it can support all the requested branch
	filters itself or not. In case it can support all the branch filters in an
	OR manner, we don't apply any SW branch filter on top of the HW captured
	set (which is the final set). This preserves the OR semantic of multiple
	branch filters as required. But in case the PMU cannot support all the
	requested branch filters in an OR manner, it should not apply any of its
	filters and should leave it up to the SW to handle them all. It is the PMU
	code's responsibility to uphold this protocol to be able to conform to the
	overall OR semantic of the perf branch stack sampling framework.
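
The AND-versus-OR argument in (5) can be checked with a small model. In this sketch each captured branch carries a set of "kind" bits; a HW filter drops entries before SW ever sees them, so stacking a SW filter on a HW filter intersects the two. The kind bits and `count_kept()` are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative branch kinds; one branch may belong to several. */
enum {
	K_CALL = 1u << 0,
	K_COND = 1u << 1,
};

/*
 * Count the entries that survive a HW filter followed by a SW filter.
 * A filter value of 0 means "no filtering at this stage". Within one
 * stage the bits are ORed: an entry passes if it matches any of them.
 */
static inline size_t count_kept(const unsigned *kinds, size_t n,
				unsigned hw_filter, unsigned sw_filter)
{
	size_t kept = 0;

	assert(kinds != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (hw_filter && !(kinds[i] & hw_filter))
			continue;	/* never captured by the PMU */
		if (sw_filter && !(kinds[i] & sw_filter))
			continue;	/* discarded in SW */
		kept++;
	}
	return kept;
}
```

With the sample set { call, conditional, other, conditional call }, asking for "call OR conditional" entirely in SW keeps three entries, while splitting the request into HW = conditional and SW = call keeps only the conditional call. That gap is why the PMU has to take either all of the requested filters or none of them.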

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/perf_event_server.h |   6 +-
 arch/powerpc/perf/core-book3s.c              | 266 ++++++++++++++++++++++++++-
 arch/powerpc/perf/power8-pmu.c               |   2 +-
 3 files changed, 262 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/include/asm/perf_event_server.h b/arch/powerpc/include/asm/perf_event_server.h
index 3fd2f1b..846d710 100644
--- a/arch/powerpc/include/asm/perf_event_server.h
+++ b/arch/powerpc/include/asm/perf_event_server.h
@@ -18,6 +18,10 @@
 #define MAX_EVENT_ALTERNATIVES	8
 #define MAX_LIMITED_HWCOUNTERS	2
 
+#define for_each_branch_sample_type(x) \
+        for ((x) = PERF_SAMPLE_BRANCH_USER; \
+             (x) < PERF_SAMPLE_BRANCH_MAX; (x) <<= 1)
+
 /*
  * This struct provides the constants and functions needed to
  * describe the PMU on a particular POWER-family CPU.
@@ -34,7 +38,7 @@ struct power_pmu {
 				unsigned long *valp);
 	int		(*get_alternatives)(u64 event_id, unsigned int flags,
 				u64 alt[]);
-	u64             (*bhrb_filter_map)(u64 branch_sample_type);
+	u64             (*bhrb_filter_map)(u64 branch_sample_type, u64 *filter_mask);
 	void            (*config_bhrb)(u64 pmu_bhrb_filter);
 	void		(*disable_pmc)(unsigned int pmc, unsigned long mmcr[]);
 	int		(*limited_pmc_event)(u64 event_id);
diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 2de7d48..54d39a5 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -48,6 +48,8 @@ struct cpu_hw_events {
 
 	/* BHRB bits */
 	u64				bhrb_hw_filter;	/* BHRB HW branch filter */
+	u64				bhrb_sw_filter;	/* BHRB SW branch filter */
+	u64				filter_mask;	/* Branch filter mask */
 	int				bhrb_users;
 	void				*bhrb_context;
 	struct	perf_branch_stack	bhrb_stack;
@@ -400,6 +402,228 @@ static __u64 power_pmu_bhrb_to(u64 addr)
 	return target - (unsigned long)&instr + addr;
 }
 
+/*
+ * Instruction opcode analysis
+ *
+ * Analyse instruction opcodes and classify them
+ * into various branch filter options available.
+ * This follows the standard semantics of OR which
+ * means that instructions which conform to `any`
+ * of the requested branch filters get picked up.
+ */
+static bool validate_instruction(unsigned int *addr, u64 bhrb_sw_filter)
+{
+	bool result = false;
+
+	if (bhrb_sw_filter & PERF_SAMPLE_BRANCH_ANY_RETURN) {
+
+		/* XL-form instruction */
+		if (instr_is_branch_xlform(*addr)) {
+
+			/* LR should not be set */
+			if (!is_branch_link_set(*addr)) {
+				/*
+			 	 * Conditional and unconditional
+			 	 * branch to LR register.
+			 	 */
+				if (is_xlform_lr(*addr))
+					result = true;
+			}
+		}
+	}
+
+	if (bhrb_sw_filter & PERF_SAMPLE_BRANCH_IND_CALL) {
+		/* XL-form instruction */
+		if (instr_is_branch_xlform(*addr)) {
+
+			/* LR should be set */
+			if (is_branch_link_set(*addr)) {
+				/*
+			 	 * Conditional and unconditional
+			 	 * branch to CTR.
+			 	 */
+				if (is_xlform_ctr(*addr))
+					result = true;
+
+				/*
+			 	 * Conditional and unconditional
+			 	 * branch to LR.
+			 	 */
+				if (is_xlform_lr(*addr))
+					result = true;
+
+				/*
+			 	 * Conditional and unconditional
+			 	 * branch to TAR.
+			 	 */
+				if (is_xlform_tar(*addr))
+					result = true;
+			}
+		}
+	}
+
+	/* Any-form branch */
+	if (bhrb_sw_filter & PERF_SAMPLE_BRANCH_ANY_CALL) {
+		/* LR should be set */
+		if (is_branch_link_set(*addr))
+			result = true;
+	}
+
+	if (bhrb_sw_filter & PERF_SAMPLE_BRANCH_COND) {
+
+		/* I-form instruction - excluded */
+		if (instr_is_branch_iform(*addr))
+			goto out;
+
+		/* B-form or XL-form instruction */
+		if (instr_is_branch_bform(*addr) || instr_is_branch_xlform(*addr))  {
+
+			/* Not branch always  */
+			if (!is_bo_always(*addr)) {
+
+				/* Conditional branch to CTR register */
+				if (is_bo_ctr(*addr))
+					goto out;
+
+				/* CR[BI] conditional branch with static hint */
+				if (is_bo_crbi_off(*addr) || is_bo_crbi_on(*addr)) {
+					if (is_bo_crbi_hint(*addr))
+						goto out;
+				}
+
+				result = true;
+			}
+		}
+	}
+out:
+	return result;
+}
+
+static bool check_instruction(u64 addr, u64 bhrb_sw_filter)
+{
+	unsigned int instr;
+	bool ret;
+
+	if (bhrb_sw_filter == 0)
+		return true;
+
+	if (is_kernel_addr(addr)) {
+		ret = validate_instruction((unsigned int *) addr, bhrb_sw_filter);
+	} else {
+		/*
+		 * Userspace address needs to be
+		 * copied first before analysis.
+		 */
+		pagefault_disable();
+		ret =  __get_user_inatomic(instr, (unsigned int __user *)addr);
+
+		/*
+		 * If the instruction could not be read
+		 * from user space, we still 'okay' the entry.
+		 */
+		if (ret) {
+			pagefault_enable();
+			return true;
+		}
+		pagefault_enable();
+		ret = validate_instruction(&instr, bhrb_sw_filter);
+	}
+	return ret;
+}
+
+/*
+ * Validate whether all requested branch filters
+ * are getting processed either in the PMU or in SW.
+ */
+static int match_filters(u64 branch_sample_type, u64 filter_mask)
+{
+	u64 x;
+
+	if (filter_mask == PERF_SAMPLE_BRANCH_ANY)
+		return true;
+
+	for_each_branch_sample_type(x) {
+		if (!(branch_sample_type & x))
+			continue;
+		/*
+		 * Privilege filter requests have been already
+		 * taken care during the base PMU configuration.
+		 */
+		if (x == PERF_SAMPLE_BRANCH_USER)
+			continue;
+		if (x == PERF_SAMPLE_BRANCH_KERNEL)
+			continue;
+		if (x == PERF_SAMPLE_BRANCH_HV)
+			continue;
+
+		/*
+		 * Requested filter not available either
+		 * in PMU or in SW.
+		 */
+		if (!(filter_mask & x))
+			return false;
+	}
+	return true;
+}
+
+/*
+ * Required SW based branch filters
+ *
+ * This is called after figuring out which branch filters the
+ * PMU HW supports for the requested branch filter set. Here we
+ * will go through all the SW implemented branch filters one by
+ * one and pick each up if it is not already supported in the PMU.
+ */
+static u64 branch_filter_map(u64 branch_sample_type, u64 pmu_bhrb_filter,
+			     					u64 *filter_mask)
+{
+	u64 branch_sw_filter = 0;
+
+	/* No branch filter requested */
+	if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY) {
+		WARN_ON(pmu_bhrb_filter != 0);
+		WARN_ON(*filter_mask != PERF_SAMPLE_BRANCH_ANY);
+		return branch_sw_filter;
+	}
+
+	/*
+	 * PMU supported branch filters must also be implemented in SW
+	 * in the event that the PMU is unable to process them for some
+	 * reason. This way all those branch filters can be satisfied with
+	 * SW implemented filters. But right now, there is no way to
+	 * inform the user about this decision.
+	 */
+	if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY_CALL) {
+		if (!(pmu_bhrb_filter & PERF_SAMPLE_BRANCH_ANY_CALL)) {
+			branch_sw_filter |= PERF_SAMPLE_BRANCH_ANY_CALL;
+			*filter_mask |= PERF_SAMPLE_BRANCH_ANY_CALL;
+		}
+	}
+
+	if (branch_sample_type & PERF_SAMPLE_BRANCH_COND) {
+		if (!(pmu_bhrb_filter & PERF_SAMPLE_BRANCH_COND)) {
+			branch_sw_filter |= PERF_SAMPLE_BRANCH_COND;
+			*filter_mask |= PERF_SAMPLE_BRANCH_COND;
+		}
+	}
+
+	if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY_RETURN) {
+		if (!(pmu_bhrb_filter & PERF_SAMPLE_BRANCH_ANY_RETURN)) {
+			branch_sw_filter |= PERF_SAMPLE_BRANCH_ANY_RETURN;
+			*filter_mask |= PERF_SAMPLE_BRANCH_ANY_RETURN;
+		}
+	}
+
+	if (branch_sample_type & PERF_SAMPLE_BRANCH_IND_CALL) {
+		if (!(pmu_bhrb_filter & PERF_SAMPLE_BRANCH_IND_CALL)) {
+			branch_sw_filter |= PERF_SAMPLE_BRANCH_IND_CALL;
+			*filter_mask |= PERF_SAMPLE_BRANCH_IND_CALL;
+		}
+	}
+
+	return branch_sw_filter;
+}
+
 /* Processing BHRB entries */
 void power_pmu_bhrb_read(struct cpu_hw_events *cpuhw)
 {
@@ -459,17 +683,29 @@ void power_pmu_bhrb_read(struct cpu_hw_events *cpuhw)
 					addr = 0;
 				}
 				cpuhw->bhrb_entries[u_index].from = addr;
+
+				if (!check_instruction(cpuhw->
+						bhrb_entries[u_index].from,
+							cpuhw->bhrb_sw_filter))
+					u_index--;
 			} else {
 				/* Branches to immediate field 
 				   (ie I or B form) */
 				cpuhw->bhrb_entries[u_index].from = addr;
-				cpuhw->bhrb_entries[u_index].to =
-					power_pmu_bhrb_to(addr);
-				cpuhw->bhrb_entries[u_index].mispred = pred;
-				cpuhw->bhrb_entries[u_index].predicted = ~pred;
+				if (check_instruction(cpuhw->
+						bhrb_entries[u_index].from,
+						cpuhw->bhrb_sw_filter)) {
+					cpuhw->bhrb_entries[u_index].
+						to = power_pmu_bhrb_to(addr);
+					cpuhw->bhrb_entries[u_index].
+						mispred = pred;
+					cpuhw->bhrb_entries[u_index].
+						predicted = ~pred;
+				} else {
+					u_index--;
+				}
 			}
 			u_index++;
-
 		}
 	}
 	cpuhw->bhrb_stack.nr = u_index;
@@ -1255,7 +1491,11 @@ nocheck:
 	if (has_branch_stack(event)) {
 		power_pmu_bhrb_enable(event);
 		cpuhw->bhrb_hw_filter = ppmu->bhrb_filter_map(
-					event->attr.branch_sample_type);
+					event->attr.branch_sample_type,
+					&cpuhw->filter_mask);
+		cpuhw->bhrb_sw_filter = branch_filter_map
+					(event->attr.branch_sample_type,
+					cpuhw->bhrb_hw_filter, &cpuhw->filter_mask);
 	}
 
 	perf_pmu_enable(event->pmu);
@@ -1637,10 +1877,16 @@ static int power_pmu_event_init(struct perf_event *event)
 	err = power_check_constraints(cpuhw, events, cflags, n + 1);
 
 	if (has_branch_stack(event)) {
-		cpuhw->bhrb_hw_filter = ppmu->bhrb_filter_map(
-					event->attr.branch_sample_type);
-
-		if(cpuhw->bhrb_hw_filter == -1)
+		cpuhw->bhrb_hw_filter = ppmu->bhrb_filter_map
+				(event->attr.branch_sample_type,
+				&cpuhw->filter_mask);
+		cpuhw->bhrb_sw_filter = branch_filter_map
+				(event->attr.branch_sample_type,
+				cpuhw->bhrb_hw_filter,
+				&cpuhw->filter_mask);
+
+		if(!match_filters(event->attr.branch_sample_type,
+						cpuhw->filter_mask))
 			return -EOPNOTSUPP;
 	}
 
diff --git a/arch/powerpc/perf/power8-pmu.c b/arch/powerpc/perf/power8-pmu.c
index e88b9cb..03c5b8d 100644
--- a/arch/powerpc/perf/power8-pmu.c
+++ b/arch/powerpc/perf/power8-pmu.c
@@ -559,7 +559,7 @@ static int power8_generic_events[] = {
 	[PERF_COUNT_HW_BRANCH_MISSES] =			PM_BR_MPRED_CMPL,
 };
 
-static u64 power8_bhrb_filter_map(u64 branch_sample_type)
+static u64 power8_bhrb_filter_map(u64 branch_sample_type, u64 *filter_mask)
 {
 	u64 pmu_bhrb_filter = 0;
 
-- 
1.7.11.7



* [PATCH V4 09/10] power8, perf: Change BHRB branch filter configuration
  2013-12-04 10:32 ` Anshuman Khandual
@ 2013-12-04 10:32   ` Anshuman Khandual
  -1 siblings, 0 replies; 57+ messages in thread
From: Anshuman Khandual @ 2013-12-04 10:32 UTC (permalink / raw)
  To: linuxppc-dev, linux-kernel
  Cc: michael, mikey, sukadev, eranian, acme, ak, mingo

The powerpc kernel now supports SW based branch filters for book3s systems. This
places some specific requirements on how HW supported branch filters are handled
in order to preserve the overall OR semantics of the perf branch stack sampling
framework. This patch adapts the BHRB branch filter configuration to meet those
requirements. The POWER8 PMU supports three branch filters (two of which are used
by perf branch stack sampling); they are mutually exclusive and cannot be ORed
with each other. This implies that the PMU can handle only one HW based branch
filter request at any point in time. All other combinations are passed on to SW.

Also, the combination of PERF_SAMPLE_BRANCH_ANY_CALL and PERF_SAMPLE_BRANCH_COND
can now be handled in SW, hence we don't error out on it anymore.
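
The HW/SW split described above can be sketched in plain userspace C. This is
only an illustration of the intended decision, not the code in the patch: the
BR_* flags stand in for the PERF_SAMPLE_BRANCH_* bits, the HW_FILTER_* values
are hypothetical placeholders rather than the real MMCRA IFM encodings, and
hw_filter_for() is an invented name.

```c
#include <stdint.h>

/* Stand-ins for the PERF_SAMPLE_BRANCH_* flags; values are illustrative. */
enum {
	BR_USER       = 1 << 0,
	BR_KERNEL     = 1 << 1,
	BR_HV         = 1 << 2,
	BR_ANY        = 1 << 3,
	BR_ANY_CALL   = 1 << 4,
	BR_ANY_RETURN = 1 << 5,
	BR_COND       = 1 << 10,
};

/* Hypothetical HW filter encodings, not the real MMCRA IFM values. */
#define HW_FILTER_NONE 0x0u
#define HW_FILTER_CALL 0x1u
#define HW_FILTER_COND 0x3u

/*
 * Return the HW filter to program and report in *hw_handled which
 * requested branch type (if any) the HW takes care of.
 */
static uint64_t hw_filter_for(uint64_t requested, uint64_t *hw_handled)
{
	/* Privilege bits are configured elsewhere; ignore them here. */
	uint64_t types = requested & ~(uint64_t)(BR_USER | BR_KERNEL | BR_HV);

	*hw_handled = 0;

	/* "Any branch" means no filtering at all. */
	if (types & BR_ANY) {
		*hw_handled = BR_ANY;
		return HW_FILTER_NONE;
	}

	/* The HW filters cannot be ORed: use one only if it is the sole request. */
	if (types == BR_ANY_CALL) {
		*hw_handled = BR_ANY_CALL;
		return HW_FILTER_CALL;
	}
	if (types == BR_COND) {
		*hw_handled = BR_COND;
		return HW_FILTER_COND;
	}

	/* Everything else (including combinations) is left to SW. */
	return HW_FILTER_NONE;
}
```

With this shape, a request for ANY_CALL alone is served by the HW filter, while
ANY_CALL combined with COND yields no HW filter and an empty handled mask, so
both types fall through to SW.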

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
---
 arch/powerpc/perf/power8-pmu.c | 73 +++++++++++++++++++++++++++++++-----------
 1 file changed, 54 insertions(+), 19 deletions(-)

diff --git a/arch/powerpc/perf/power8-pmu.c b/arch/powerpc/perf/power8-pmu.c
index 03c5b8d..6021349 100644
--- a/arch/powerpc/perf/power8-pmu.c
+++ b/arch/powerpc/perf/power8-pmu.c
@@ -561,7 +561,56 @@ static int power8_generic_events[] = {
 
 static u64 power8_bhrb_filter_map(u64 branch_sample_type, u64 *filter_mask)
 {
-	u64 pmu_bhrb_filter = 0;
+	u64 x, tmp, pmu_bhrb_filter = 0;
+	*filter_mask = 0;
+
+	/* No branch filter requested */
+	if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY) {
+		*filter_mask = PERF_SAMPLE_BRANCH_ANY;
+		return pmu_bhrb_filter;
+	}
+
+	/*
+	 * P8 does not support oring of PMU HW branch filters. Hence
+	 * if multiple branch filters are requested which includes filters
+	 * supported in PMU, still go ahead and clear the PMU based HW branch
+	 * filter component as in this case all the filters will be processed
+	 * in SW.
+	 */
+	tmp = branch_sample_type;
+
+	/* Remove privilege filters before comparison */
+	tmp &= ~PERF_SAMPLE_BRANCH_USER;
+	tmp &= ~PERF_SAMPLE_BRANCH_KERNEL;
+	tmp &= ~PERF_SAMPLE_BRANCH_HV;
+
+	for_each_branch_sample_type(x) {
+		/* Ignore privilege requests */
+		if ((x == PERF_SAMPLE_BRANCH_USER) || (x == PERF_SAMPLE_BRANCH_KERNEL) || (x == PERF_SAMPLE_BRANCH_HV))
+			continue;
+
+		if (!(tmp & x))
+			continue;
+
+		/* Supported HW PMU filters */
+		if (tmp & PERF_SAMPLE_BRANCH_ANY_CALL) {
+			tmp &= ~PERF_SAMPLE_BRANCH_ANY_CALL;
+			if (tmp) {
+				pmu_bhrb_filter = 0;
+				*filter_mask = 0;
+				return pmu_bhrb_filter;
+			}
+		}
+
+		if (tmp & PERF_SAMPLE_BRANCH_COND) {
+			tmp &= ~PERF_SAMPLE_BRANCH_COND;
+			if (tmp) {
+				pmu_bhrb_filter = 0;
+				*filter_mask = 0;
+				return pmu_bhrb_filter;
+			}
+		}
+	}
 
 	/* BHRB and regular PMU events share the same privilege state
 	 * filter configuration. BHRB is always recorded along with a
@@ -570,34 +619,20 @@ static u64 power8_bhrb_filter_map(u64 branch_sample_type, u64 *filter_mask)
 	 * PMU event, we ignore any separate BHRB specific request.
 	 */
 
-	/* No branch filter requested */
-	if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY)
-		return pmu_bhrb_filter;
-
-	/* Invalid branch filter options - HW does not support */
-	if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY_RETURN)
-		return -1;
-
-	if (branch_sample_type & PERF_SAMPLE_BRANCH_IND_CALL)
-		return -1;
-
+	/* Supported individual branch filters */
 	if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY_CALL) {
 		pmu_bhrb_filter |= POWER8_MMCRA_IFM1;
+		*filter_mask    |= PERF_SAMPLE_BRANCH_ANY_CALL;
 		return pmu_bhrb_filter;
 	}
 
 	if (branch_sample_type & PERF_SAMPLE_BRANCH_COND) {
 		pmu_bhrb_filter |= POWER8_MMCRA_IFM3;
+		*filter_mask    |= PERF_SAMPLE_BRANCH_COND;
 		return pmu_bhrb_filter;
 	}
 
-	/* PMU does not support ANY combination of HW BHRB filters */
-	if ((branch_sample_type & PERF_SAMPLE_BRANCH_ANY_CALL) &&
-			(branch_sample_type & PERF_SAMPLE_BRANCH_COND))
-		return -1;
-
-	/* Every thing else is unsupported */
-	return -1;
+	return pmu_bhrb_filter;
 }
 
 static void power8_config_bhrb(u64 pmu_bhrb_filter)
-- 
1.7.11.7


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH V4 10/10] powerpc, perf: Cleanup SW branch filter list look up
  2013-12-04 10:32 ` Anshuman Khandual
@ 2013-12-04 10:32   ` Anshuman Khandual
  -1 siblings, 0 replies; 57+ messages in thread
From: Anshuman Khandual @ 2013-12-04 10:32 UTC (permalink / raw)
  To: linuxppc-dev, linux-kernel
  Cc: michael, mikey, sukadev, eranian, acme, ak, mingo

This patch adds an array enumerating all available SW branch filters
in the powerpc book3s code and streamlines the lookup of SW branch
filter entries when working out which branch filters can be
supported in SW.
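
The table-driven lookup can be sketched in userspace C as follows. This is a
minimal illustration under stated assumptions: the BR_* flags stand in for the
PERF_SAMPLE_BRANCH_* bits, sw_filter_for() is an invented name, and hw_handled
is assumed to be expressed in the same flag space as the request (the patch
itself tests against pmu_bhrb_filter).

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the PERF_SAMPLE_BRANCH_* flags; values are illustrative. */
enum {
	BR_ANY_CALL   = 1 << 4,
	BR_ANY_RETURN = 1 << 5,
	BR_IND_CALL   = 1 << 6,
	BR_COND       = 1 << 10,
};

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Branch filters that have a SW implementation. */
static const uint64_t sw_filters[] = {
	BR_ANY_CALL,
	BR_COND,
	BR_ANY_RETURN,
	BR_IND_CALL,
};

/*
 * Return the set of requested filters that must be applied in SW, and
 * accumulate them into *filter_mask alongside whatever the HW covers.
 */
static uint64_t sw_filter_for(uint64_t requested, uint64_t hw_handled,
			      uint64_t *filter_mask)
{
	uint64_t sw = 0;
	size_t i;

	for (i = 0; i < ARRAY_SIZE(sw_filters); i++) {
		uint64_t f = sw_filters[i];

		/* Requested, and not already taken care of by the HW. */
		if ((requested & f) && !(hw_handled & f)) {
			sw |= f;
			*filter_mask |= f;
		}
	}
	return sw;
}
```

Adding a new SW filter then only requires a new entry in the array, which is
the point of replacing the four open-coded if blocks.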

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
---
 arch/powerpc/perf/core-book3s.c | 38 +++++++++++++-------------------------
 1 file changed, 13 insertions(+), 25 deletions(-)

diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 54d39a5..42c6428 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -566,6 +566,12 @@ static int match_filters(u64 branch_sample_type, u64 filter_mask)
 	return true;
 }
 
+/* SW implemented branch filters */
+static unsigned int power_sw_filter[] =	      { PERF_SAMPLE_BRANCH_ANY_CALL,
+						PERF_SAMPLE_BRANCH_COND,
+						PERF_SAMPLE_BRANCH_ANY_RETURN,
+						PERF_SAMPLE_BRANCH_IND_CALL };
+
 /*
  * Required SW based branch filters
  *
@@ -578,6 +584,7 @@ static u64 branch_filter_map(u64 branch_sample_type, u64 pmu_bhrb_filter,
 			     					u64 *filter_mask)
 {
 	u64 branch_sw_filter = 0;
+	unsigned int i;
 
 	/* No branch filter requested */
 	if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY) {
@@ -593,34 +600,15 @@ static u64 branch_filter_map(u64 branch_sample_type, u64 pmu_bhrb_filter,
 	 * SW implemented filters. But right now, there is now way to
 	 * initimate the user about this decision.
 	 */
-	if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY_CALL) {
-		if (!(pmu_bhrb_filter & PERF_SAMPLE_BRANCH_ANY_CALL)) {
-			branch_sw_filter |= PERF_SAMPLE_BRANCH_ANY_CALL;
-			*filter_mask |= PERF_SAMPLE_BRANCH_ANY_CALL;
-		}
-	}
-
-	if (branch_sample_type & PERF_SAMPLE_BRANCH_COND) {
-		if (!(pmu_bhrb_filter & PERF_SAMPLE_BRANCH_COND)) {
-			branch_sw_filter |= PERF_SAMPLE_BRANCH_COND;
-			*filter_mask |= PERF_SAMPLE_BRANCH_COND;
-		}
-	}
 
-	if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY_RETURN) {
-		if (!(pmu_bhrb_filter & PERF_SAMPLE_BRANCH_ANY_RETURN)) {
-			branch_sw_filter |= PERF_SAMPLE_BRANCH_ANY_RETURN;
-			*filter_mask |= PERF_SAMPLE_BRANCH_ANY_RETURN;
-		}
-	}
-
-	if (branch_sample_type & PERF_SAMPLE_BRANCH_IND_CALL) {
-		if (!(pmu_bhrb_filter & PERF_SAMPLE_BRANCH_IND_CALL)) {
-			branch_sw_filter |= PERF_SAMPLE_BRANCH_IND_CALL;
-			*filter_mask |= PERF_SAMPLE_BRANCH_IND_CALL;
+	for (i = 0; i < ARRAY_SIZE(power_sw_filter); i++) {
+		if (branch_sample_type & power_sw_filter[i]) {
+			if (!(pmu_bhrb_filter & power_sw_filter[i])) {
+				branch_sw_filter |= power_sw_filter[i];
+				*filter_mask |= power_sw_filter[i];
+			}
 		}
 	}
-
 	return branch_sw_filter;
 }
 
-- 
1.7.11.7


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH V4 00/10] perf: New conditional branch filter
  2013-12-04 10:32 ` Anshuman Khandual
@ 2013-12-05  4:47   ` Michael Ellerman
  -1 siblings, 0 replies; 57+ messages in thread
From: Michael Ellerman @ 2013-12-05  4:47 UTC (permalink / raw)
  To: acme
  Cc: linuxppc-dev, linux-kernel, mikey, ak, eranian, sukadev, mingo,
	Anshuman Khandual

On Wed, 2013-12-04 at 16:02 +0530, Anshuman Khandual wrote:
> 		This patchset is the re-spin of the original branch stack sampling
> patchset which introduced new PERF_SAMPLE_BRANCH_COND branch filter. This patchset
> also enables SW based branch filtering support for book3s powerpc platforms which
> have PMU HW backed branch stack sampling support. 
> 
> Summary of code changes in this patchset:
> 
> (1) Introduces a new PERF_SAMPLE_BRANCH_COND branch filter
> (2) Add the "cond" branch filter options in the "perf record" tool
> (3) Enable PERF_SAMPLE_BRANCH_COND in X86 platforms
> (4) Enable PERF_SAMPLE_BRANCH_COND in POWER8 platform 
> (5) Update the documentation regarding "perf record" tool


Hi Arnaldo,

Can you please take just patches 1-5 into the perf tree? And do you mind
putting them in a topic branch so Benh can merge that.

The remaining patches are powerpc specific and still need some more review.

cheers



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH V4 00/10] perf: New conditional branch filter
  2013-12-05  4:47   ` Michael Ellerman
@ 2013-12-06 13:18     ` Arnaldo Carvalho de Melo
  -1 siblings, 0 replies; 57+ messages in thread
From: Arnaldo Carvalho de Melo @ 2013-12-06 13:18 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: linuxppc-dev, linux-kernel, mikey, ak, eranian, sukadev, mingo,
	Anshuman Khandual

Em Thu, Dec 05, 2013 at 03:47:54PM +1100, Michael Ellerman escreveu:
> On Wed, 2013-12-04 at 16:02 +0530, Anshuman Khandual wrote:
> > 		This patchset is the re-spin of the original branch stack sampling
> > patchset which introduced new PERF_SAMPLE_BRANCH_COND branch filter. This patchset
> > also enables SW based branch filtering support for book3s powerpc platforms which
> > have PMU HW backed branch stack sampling support. 
> > 
> > Summary of code changes in this patchset:
> > 
> > (1) Introduces a new PERF_SAMPLE_BRANCH_COND branch filter
> > (2) Add the "cond" branch filter options in the "perf record" tool
> > (3) Enable PERF_SAMPLE_BRANCH_COND in X86 platforms
> > (4) Enable PERF_SAMPLE_BRANCH_COND in POWER8 platform 
> > (5) Update the documentation regarding "perf record" tool
> 
> 
> Hi Arnaldo,
> 
> Can you please take just patches 1-5 into the perf tree? And do you mind
> putting them in a topic branch so Benh can merge that.

This is mostly kernel code, I process the userspace ones, so I think either
Ingo or PeterZ should pick these, Ingo, Peter?

Only:

Subject: [PATCH V4 03/10] perf, tool: Conditional branch filter 'cond' added to perf record

Which is a one liner, touches tools/perf/, and I'm ok with it.


- Arnaldo
 
> The remaining patches are powerpc specific and still need some more review.
> 
> cheers
> 

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH V4 04/10] x86, perf: Add conditional branch filtering support
  2013-12-04 10:32   ` Anshuman Khandual
@ 2013-12-06 16:46     ` Andi Kleen
  -1 siblings, 0 replies; 57+ messages in thread
From: Andi Kleen @ 2013-12-06 16:46 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: linuxppc-dev, linux-kernel, michael, mikey, sukadev, eranian,
	acme, mingo

On Wed, Dec 04, 2013 at 04:02:36PM +0530, Anshuman Khandual wrote:
> This patch adds conditional branch filtering support,
> enabling it for PERF_SAMPLE_BRANCH_COND in perf branch
> stack sampling framework by utilizing an available
> software filter X86_BR_JCC.

Newer Intel CPUs have a hardware filter too for "not a conditional
branch".  I can look at implementing that.

The software option seems fine for now.

-Andi


> 
> Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
> Reviewed-by: Stephane Eranian <eranian@google.com>
> ---
>  arch/x86/kernel/cpu/perf_event_intel_lbr.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/arch/x86/kernel/cpu/perf_event_intel_lbr.c b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
> index d82d155..9dd2459 100644
> --- a/arch/x86/kernel/cpu/perf_event_intel_lbr.c
> +++ b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
> @@ -384,6 +384,9 @@ static void intel_pmu_setup_sw_lbr_filter(struct perf_event *event)
>  	if (br_type & PERF_SAMPLE_BRANCH_NO_TX)
>  		mask |= X86_BR_NO_TX;
>  
> +	if (br_type & PERF_SAMPLE_BRANCH_COND)
> +		mask |= X86_BR_JCC;
> +
>  	/*
>  	 * stash actual user request into reg, it may
>  	 * be used by fixup code for some CPU
> @@ -678,6 +681,7 @@ static const int nhm_lbr_sel_map[PERF_SAMPLE_BRANCH_MAX] = {
>  	 * NHM/WSM erratum: must include IND_JMP to capture IND_CALL
>  	 */
>  	[PERF_SAMPLE_BRANCH_IND_CALL] = LBR_IND_CALL | LBR_IND_JMP,
> +	[PERF_SAMPLE_BRANCH_COND]     = LBR_JCC,
>  };
>  
>  static const int snb_lbr_sel_map[PERF_SAMPLE_BRANCH_MAX] = {
> @@ -689,6 +693,7 @@ static const int snb_lbr_sel_map[PERF_SAMPLE_BRANCH_MAX] = {
>  	[PERF_SAMPLE_BRANCH_ANY_CALL]	= LBR_REL_CALL | LBR_IND_CALL
>  					| LBR_FAR,
>  	[PERF_SAMPLE_BRANCH_IND_CALL]	= LBR_IND_CALL,
> +	[PERF_SAMPLE_BRANCH_COND]       = LBR_JCC,
>  };
>  
>  /* core */
> -- 
> 1.7.11.7
> 

-- 
ak@linux.intel.com -- Speaking for myself only

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH V4 00/10] perf: New conditional branch filter
  2013-12-06 13:18     ` Arnaldo Carvalho de Melo
@ 2013-12-09  0:41       ` Michael Ellerman
  -1 siblings, 0 replies; 57+ messages in thread
From: Michael Ellerman @ 2013-12-09  0:41 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, peterz, mingo
  Cc: linuxppc-dev, linux-kernel, mikey, ak, eranian, sukadev,
	Anshuman Khandual

On Fri, 2013-12-06 at 10:18 -0300, Arnaldo Carvalho de Melo wrote:
> Em Thu, Dec 05, 2013 at 03:47:54PM +1100, Michael Ellerman escreveu:
> > On Wed, 2013-12-04 at 16:02 +0530, Anshuman Khandual wrote:
> > > 		This patchset is the re-spin of the original branch stack sampling
> > > patchset which introduced new PERF_SAMPLE_BRANCH_COND branch filter. This patchset
> > > also enables SW based branch filtering support for book3s powerpc platforms which
> > > have PMU HW backed branch stack sampling support. 
> > > 
> > > Summary of code changes in this patchset:
> > > 
> > > (1) Introduces a new PERF_SAMPLE_BRANCH_COND branch filter
> > > (2) Add the "cond" branch filter options in the "perf record" tool
> > > (3) Enable PERF_SAMPLE_BRANCH_COND in X86 platforms
> > > (4) Enable PERF_SAMPLE_BRANCH_COND in POWER8 platform 
> > > (5) Update the documentation regarding "perf record" tool
> > 
> > 
> > Hi Arnaldo,
> > 
> > Can you please take just patches 1-5 into the perf tree? And do you mind
> > putting them in a topic branch so Benh can merge that.
> 
> This is mostly kernel code, I process the userspace ones, so I think either
> Ingo or PeterZ should pick these, Ingo, Peter?

Urgh, sorry. MAINTAINERS just lists all of you in a block.

Added PeterZ to CC.

Peter/Ingo can you please take just patches 1-5 into the perf tree? And
do you mind putting them in a topic branch so Benh can merge that.

The generic & x86 changes have a Reviewed-by from Stephane, and the change to
tools/perf has an ack-of-sorts from Arnaldo:

> Only:
> 
> Subject: [PATCH V4 03/10] perf, tool: Conditional branch filter 'cond' added to perf record
> 
> Which is a one liner, touches tools/perf/, and I'm ok with it.


cheers



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH V4 07/10] powerpc, lib: Add new branch instruction analysis support functions
  2013-12-04 10:32   ` Anshuman Khandual
  (?)
@ 2013-12-09  6:21   ` Michael Ellerman
  2013-12-10  6:09       ` Anshuman Khandual
  -1 siblings, 1 reply; 57+ messages in thread
From: Michael Ellerman @ 2013-12-09  6:21 UTC (permalink / raw)
  To: linuxppc-dev, linux-kernel, khandual
  Cc: mikey, ak, eranian, acme, sukadev, mingo

On Wed, 2013-12-04 at 10:32:39 UTC, Anshuman Khandual wrote:
> Generic powerpc branch instruction analysis support added in the code
> patching library which will help the subsequent patch on SW based
> filtering of branch records in perf. This patch also converts and
> exports some of the existing local static functions through the header
> file to be used else where.
> 
> diff --git a/arch/powerpc/include/asm/code-patching.h b/arch/powerpc/include/asm/code-patching.h
> index a6f8c7a..8bab417 100644
> --- a/arch/powerpc/include/asm/code-patching.h
> +++ b/arch/powerpc/include/asm/code-patching.h
> @@ -22,6 +22,36 @@
>  #define BRANCH_SET_LINK	0x1
>  #define BRANCH_ABSOLUTE	0x2
>  
> +#define XL_FORM_LR  0x4C000020
> +#define XL_FORM_CTR 0x4C000420
> +#define XL_FORM_TAR 0x4C000460
> +
> +#define BO_ALWAYS    0x02800000
> +#define BO_CTR       0x02000000
> +#define BO_CRBI_OFF  0x00800000
> +#define BO_CRBI_ON   0x01800000
> +#define BO_CRBI_HINT 0x00400000
> +
> +/* Forms of branch instruction */
> +int instr_is_branch_iform(unsigned int instr);
> +int instr_is_branch_bform(unsigned int instr);
> +int instr_is_branch_xlform(unsigned int instr);
> +
> +/* Classification of XL-form instruction */
> +int is_xlform_lr(unsigned int instr);
> +int is_xlform_ctr(unsigned int instr);
> +int is_xlform_tar(unsigned int instr);
> +
> +/* Branch instruction is a call */
> +int is_branch_link_set(unsigned int instr);
> +
> +/* BO field analysis (B-form or XL-form) */
> +int is_bo_always(unsigned int instr);
> +int is_bo_ctr(unsigned int instr);
> +int is_bo_crbi_off(unsigned int instr);
> +int is_bo_crbi_on(unsigned int instr);
> +int is_bo_crbi_hint(unsigned int instr);


I think this is the wrong API.

We end up with all these micro checks, which don't actually encapsulate much,
and don't implement the logic perf needs. If we had another user for this level
of detail then it might make sense, but for a single user I think we're better
off just implementing the semantics it wants.

So that would be something more like:

bool instr_is_return_branch(unsigned int instr);
bool instr_is_conditional_branch(unsigned int instr);
bool instr_is_func_call(unsigned int instr);
bool instr_is_indirect_func_call(unsigned int instr);


These would then encapsulate something like the logic in your 8/10 patch. You
can hopefully also optimise the checking logic in each routine because you know
the exact semantics you're implementing.
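
As a rough illustration of the shape I mean, here is a minimal userspace sketch of those four routines. The PPC_* mask names are made up for the sketch, it takes uint32_t rather than unsigned int, and it assumes that CTR-decrementing and hinted branches count as conditional, which the thread has not settled. Treat it as a starting point, not as the final logic.

```c
#include <stdbool.h>
#include <stdint.h>

/* Primary opcode lives in the top 6 bits of a powerpc instruction. */
#define PPC_OP_MASK	0xfc000000u
#define PPC_OP_BC	0x40000000u	/* B-form: bc[l][a], opcode 16 */
#define PPC_OP_B	0x48000000u	/* I-form: b[l][a],  opcode 18 */

/* XL-form branches: opcode 19 plus a 10-bit extended opcode. */
#define PPC_XL_MASK	0xfc0007feu
#define PPC_XL_LR	0x4c000020u	/* bclr[l]  */
#define PPC_XL_CTR	0x4c000420u	/* bcctr[l] */
#define PPC_XL_TAR	0x4c000460u	/* bctar[l] */

#define PPC_LK		0x00000001u	/* link bit */
#define PPC_BO_ALWAYS	0x02800000u	/* BO = 1z1zz: branch always */

static bool is_xl_branch(uint32_t instr)
{
	uint32_t xl = instr & PPC_XL_MASK;

	return xl == PPC_XL_LR || xl == PPC_XL_CTR || xl == PPC_XL_TAR;
}

static bool is_any_branch(uint32_t instr)
{
	uint32_t op = instr & PPC_OP_MASK;

	return op == PPC_OP_B || op == PPC_OP_BC || is_xl_branch(instr);
}

/* Return: branch to LR with the link bit clear (blr, beqlr, ...). */
bool instr_is_return_branch(uint32_t instr)
{
	return (instr & (PPC_XL_MASK | PPC_LK)) == PPC_XL_LR;
}

/* Call: any branch form with the link bit set. */
bool instr_is_func_call(uint32_t instr)
{
	return is_any_branch(instr) && (instr & PPC_LK);
}

/* Indirect call: XL-form branch (LR, CTR or TAR) with the link bit set. */
bool instr_is_indirect_func_call(uint32_t instr)
{
	return is_xl_branch(instr) && (instr & PPC_LK);
}

/* Conditional: B-form or XL-form whose BO field is not "branch always". */
bool instr_is_conditional_branch(uint32_t instr)
{
	if ((instr & PPC_OP_MASK) != PPC_OP_BC && !is_xl_branch(instr))
		return false;

	return (instr & PPC_BO_ALWAYS) != PPC_BO_ALWAYS;
}
```

Each routine is a couple of mask-and-compare operations, so the caller never needs to know about BO bits or XL-form extended opcodes.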

cheers


* Re: [PATCH V4 08/10] powerpc, perf: Enable SW filtering in branch stack sampling framework
  2013-12-04 10:32   ` Anshuman Khandual
  (?)
@ 2013-12-09  6:21   ` Michael Ellerman
  2013-12-10  5:57       ` Anshuman Khandual
  2013-12-20 11:01       ` Anshuman Khandual
  -1 siblings, 2 replies; 57+ messages in thread
From: Michael Ellerman @ 2013-12-09  6:21 UTC (permalink / raw)
  To: linuxppc-dev, linux-kernel, khandual
  Cc: mikey, ak, eranian, acme, sukadev, mingo

On Wed, 2013-12-04 at 10:32:40 UTC, Anshuman Khandual wrote:
> This patch enables SW based post processing of BHRB captured branches
> to be able to meet more user defined branch filtration criteria in perf
> branch stack sampling framework. These changes increase the number of
> branch filters and their valid combinations on any powerpc64 server
> platform with BHRB support. Find the summary of code changes here.
> 
> (1) struct cpu_hw_events
> 
> 	Introduced two new variables to track various filter values and mask
> 
> 	(a) bhrb_sw_filter	Tracks SW implemented branch filter flags
> 	(b) filter_mask		Tracks both (SW and HW) branch filter flags

The name 'filter_mask' doesn't mean much to me. I'd rather it was 'bhrb_filter'.


> (2) Event creation
> 
> 	Kernel will figure out supported BHRB branch filters through a PMU call
> 	back 'bhrb_filter_map'. This function will find out how many of the
> 	requested branch filters can be supported in the PMU HW. It will not
> 	try to invalidate any branch filter combinations. Event creation will not
> 	error out because of lack of HW based branch filters. Meanwhile it will
> 	track the overall supported branch filters in the "filter_mask" variable.
> 
> 	Once the PMU call back returns, the kernel will process the user branch filter
> 	request against available SW filters while looking at the "filter_mask".
> 	During this phase all the branch filters which are still pending from the
> 	user requested list will have to be supported in SW, failing which the
> 	event creation will error out.
> 
> (3) SW branch filter
> 
> 	During the BHRB data capture inside the PMU interrupt context, each
> 	of the captured 'perf_branch_entry.from' will be checked for compliance
> 	with applicable SW branch filters. If the entry does not conform to the
> 	filter requirements, it will be discarded from the final perf branch
> 	stack buffer.
> 
> (4) Supported SW based branch filters
> 
> 	(a) PERF_SAMPLE_BRANCH_ANY_RETURN
> 	(b) PERF_SAMPLE_BRANCH_IND_CALL
> 	(c) PERF_SAMPLE_BRANCH_ANY_CALL
> 	(d) PERF_SAMPLE_BRANCH_COND
> 
> 	Please refer to the patch to understand the classification of instructions into
> 	these branch filter categories.
> 
> (5) Multiple branch filter semantics
> 
> 	Book3S server implementation follows the same OR semantics (as implemented in
> 	x86) while dealing with multiple branch filters at any point of time. SW
> 	branch filter analysis is carried out on the data set captured in the PMU HW.
> 	So the resulting set of data (after applying the SW filters) will inherently
> 	be an AND with the HW captured set. Hence any combination of HW and SW branch
> 	filters will be invalid. HW based branch filters are more efficient and faster
> 	compared to SW implemented branch filters. So at first the PMU should decide
> 	whether it can support all the requested branch filters itself or not. In case
> 	it can support all the branch filters in an OR manner, we don't apply any SW
> 	branch filter on top of the HW captured set (which is the final set). This
> 	preserves the OR semantic of multiple branch filters as required. But in case
> 	where the PMU cannot support all the requested branch filters in an OR manner,
> 	it should not apply any of its filters and leave it up to the SW to handle them
> 	all. It's the PMU code's responsibility to uphold this protocol to be able to
> 	conform to the overall OR semantic of perf branch stack sampling framework.


I'd prefer this level of commentary was in a block comment in the code. It's
much more likely to be seen by a future hacker than here in the commit log.
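
For what it's worth, the protocol described in (5) can be sketched in a few lines. This is only an illustration: the BR_* flags are stand-in values (not the uapi constants), HW_CAPABLE describes a hypothetical PMU that can apply just one of its filters at a time, and the privilege bits are assumed to be masked off already.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in flag values for illustration only; not the uapi constants. */
enum {
	BR_ANY		= 1 << 0,
	BR_ANY_CALL	= 1 << 1,
	BR_ANY_RETURN	= 1 << 2,
	BR_IND_CALL	= 1 << 3,
	BR_COND		= 1 << 4,
};

/* Filters the hypothetical PMU can apply in HW, one at a time only. */
#define HW_CAPABLE	(BR_ANY_CALL | BR_COND)

struct filter_split {
	uint64_t hw;	/* filters applied by the PMU */
	uint64_t sw;	/* filters applied while reading the BHRB */
};

static bool is_single_bit(uint64_t v)
{
	return v && !(v & (v - 1));
}

/*
 * Either the PMU handles the whole request, or it handles none of it
 * and SW handles everything. Mixing the two would AND the SW result
 * with the HW captured set and break the OR semantics.
 */
struct filter_split split_filters(uint64_t requested)
{
	struct filter_split s = { 0, 0 };

	/* No branch filter requested */
	if (requested & BR_ANY)
		return s;

	if (is_single_bit(requested) && (requested & HW_CAPABLE))
		s.hw = requested;
	else
		s.sw = requested;

	return s;
}
```

A single HW-capable filter lands entirely in hw, and any other request lands entirely in sw, so the two fields are never both non-zero.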


> diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
> index 2de7d48..54d39a5 100644
> --- a/arch/powerpc/perf/core-book3s.c
> +++ b/arch/powerpc/perf/core-book3s.c
> @@ -48,6 +48,8 @@ struct cpu_hw_events {
>  
>  	/* BHRB bits */
>  	u64				bhrb_hw_filter;	/* BHRB HW branch filter */
> +	u64				bhrb_sw_filter;	/* BHRB SW branch filter */
> +	u64				filter_mask;	/* Branch filter mask */
>  	int				bhrb_users;
>  	void				*bhrb_context;
>  	struct	perf_branch_stack	bhrb_stack;
> @@ -400,6 +402,228 @@ static __u64 power_pmu_bhrb_to(u64 addr)
>  	return target - (unsigned long)&instr + addr;
>  }
>  
> +/*
> + * Instruction opcode analysis
> + *
> + * Analyse instruction opcodes and classify them
> + * into various branch filter options available.
> + * This follows the standard semantics of OR which
> + * means that instructions which conforms to `any`
> + * of the requested branch filters get picked up.
> + */
> +static bool validate_instruction(unsigned int *addr, u64 bhrb_sw_filter)
> +{

"validate" is not a good name here. That implies that this routine identifies
"valid" and "invalid" instructions - but that's not really correct.

Also it's preferable to not use the same variable name for the local as for the
cpuhw->bhrb_sw_filter global. Although technically it doesn't shadow the global
it can still be confusing to a human, ie. me. A good name for the local would
just be "sw_filter" because we know in this code that we're dealing with the
BHRB.


> +	bool result = false;
> +
> +	if (bhrb_sw_filter & PERF_SAMPLE_BRANCH_ANY_RETURN) {
> +
> +		/* XL-form instruction */
> +		if (instr_is_branch_xlform(*addr)) {
> +
> +			/* LR should not be set */
> +			if (!is_branch_link_set(*addr)) {
> +				/*
> +			 	 * Conditional and unconditional
> +			 	 * branch to LR register.
> +			 	 */
> +				if (is_xlform_lr(*addr))
> +					result = true;
> +			}
> +		}
> +	}

is_xlform_lr() implies instr_is_branch_xlform(), and once you get a hit you can
short-circuit and exit the function, so this should boil down to just:

	if (bhrb_sw_filter & PERF_SAMPLE_BRANCH_ANY_RETURN)
		if (is_xlform_lr(*addr) && !is_branch_link_set(*addr))
			return true;


Having said that I think it should move into a routine in code-patching as I
said in the comments to the previous patch.


> +
> +	if (bhrb_sw_filter & PERF_SAMPLE_BRANCH_IND_CALL) {
> +		/* XL-form instruction */
> +		if (instr_is_branch_xlform(*addr)) {
> +
> +			/* LR should be set */
> +			if (is_branch_link_set(*addr)) {
> +				/*
> +			 	 * Conditional and unconditional
> +			 	 * branch to CTR.
> +			 	 */
> +				if (is_xlform_ctr(*addr))
> +					result = true;
> +
> +				/*
> +			 	 * Conditional and unconditional
> +			 	 * branch to LR.
> +			 	 */
> +				if (is_xlform_lr(*addr))
> +					result = true;
> +
> +				/*
> +			 	 * Conditional and unconditional
> +			 	 * branch to TAR.
> +			 	 */
> +				if (is_xlform_tar(*addr))
> +					result = true;

What other kind of XL-Form branch is there?

> +			}
> +		}
> +	}

The comments above all have a bogus leading space.

> +
> +	/* Any-form branch */
> +	if (bhrb_sw_filter & PERF_SAMPLE_BRANCH_ANY_CALL) {
> +		/* LR should be set */
> +		if (is_branch_link_set(*addr))
> +			result = true;

Short circuit.

> +	}
> +
> +	if (bhrb_sw_filter & PERF_SAMPLE_BRANCH_COND) {
> +
> +		/* I-form instruction - excluded */
> +		if (instr_is_branch_iform(*addr))
> +			goto out;
> +
> +		/* B-form or XL-form instruction */
> +		if (instr_is_branch_bform(*addr) || instr_is_branch_xlform(*addr))  {
> +
> +			/* Not branch always  */
> +			if (!is_bo_always(*addr)) {
> +
> +				/* Conditional branch to CTR register */
> +				if (is_bo_ctr(*addr))
> +					goto out;

We might have discussed this but why not?

> +
> +				/* CR[BI] conditional branch with static hint */

A conditional branch with a static hint is still a conditional branch?

> +				if (is_bo_crbi_off(*addr) || is_bo_crbi_on(*addr)) {
> +					if (is_bo_crbi_hint(*addr))
> +						goto out;
> +				}
> +
> +				result = true;
> +			}
> +		}
> +	}
> +out:
> +	return result;
> +}
> +
> +static bool check_instruction(u64 addr, u64 bhrb_sw_filter)
> +{


"check" is not a very descriptive name here, especially when "check" calls
"validate".

"filter" is also not good because a filter keeps some things and rejects others,
and the directionality is not clear.

I'd suggest "filter_selects_branch()" or just "keep_branch()".
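
To make the directionality concrete, here is a minimal sketch of what a short-circuiting keep_branch() could look like. The F_* flags are stand-in values, only two filters are wired up, and the table-driven loop is just one possible way to lay it out; none of this is taken from the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in filter flags for illustration only. */
#define F_ANY_RETURN	(1u << 0)
#define F_ANY_CALL	(1u << 1)

/* bclr with the link bit clear. */
static bool is_return(uint32_t instr)
{
	return (instr & 0xfc0007ffu) == 0x4c000020u;
}

/* Any I-form, B-form or XL-form branch with the link bit set. */
static bool is_call(uint32_t instr)
{
	uint32_t op = instr >> 26;
	uint32_t xo = (instr >> 1) & 0x3ff;
	bool branch = op == 16 || op == 18 ||
		      (op == 19 && (xo == 16 || xo == 528 || xo == 560));

	return branch && (instr & 1);
}

static const struct {
	uint64_t flag;
	bool (*match)(uint32_t instr);
} sw_filters[] = {
	{ F_ANY_RETURN,	is_return },
	{ F_ANY_CALL,	is_call },
};

/* True if the branch should stay in the sample. */
bool keep_branch(uint32_t instr, uint64_t sw_filter)
{
	size_t i;

	if (!sw_filter)
		return true;	/* nothing to filter in SW */

	for (i = 0; i < sizeof(sw_filters) / sizeof(sw_filters[0]); i++)
		if ((sw_filter & sw_filters[i].flag) &&
		    sw_filters[i].match(instr))
			return true;	/* OR semantics: first hit wins */

	return false;
}
```

With a name like this the caller reads naturally: if (!keep_branch(...)) drop the entry.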


> +	unsigned int instr;
> +	bool ret;
> +
> +	if (bhrb_sw_filter == 0)
> +		return true;
> +
> +	if (is_kernel_addr(addr)) {
> +		ret = validate_instruction((unsigned int *) addr, bhrb_sw_filter);

No reason not to return directly here.

That would then remove the need for an else block.

> +	} else {
> +		/*
> +		 * Userspace address needs to be
> +		 * copied first before analysis.
> +		 */
> +		pagefault_disable();
> +		ret =  __get_user_inatomic(instr, (unsigned int __user *)addr);

I suspect you borrowed this incantation from the callchain code. Unlike that
code you don't fall back to reading the page tables directly.

I'd rather see the accessor in the callchain code made generic and have you
call it here.

> +
> +		/*
> +		 * If the instruction could not be accessible
> +		 * from user space, we still 'okay' the entry.
> +		 */
> +		if (ret) {
> +			pagefault_enable();
> +			return true;
> +		}
> +		pagefault_enable();
> +		ret = validate_instruction(&instr, bhrb_sw_filter);

No reason not to return directly here.

> +	}
> +	return ret;
> +}
> +
> +/*
> + * Validate whether all requested branch filters
> + * are getting processed either in the PMU or in SW.
> + */
> +static int match_filters(u64 branch_sample_type, u64 filter_mask)

I don't really understand why we have this routine?

We should implement the filter in HW if we can, or in SW. Which filters can't we
implement in SW?

> +{
> +	u64 x;
> +
> +	if (filter_mask == PERF_SAMPLE_BRANCH_ANY)
> +		return true;
> +
> +	for_each_branch_sample_type(x) {
> +		if (!(branch_sample_type & x))
> +			continue;
> +		/*
> +		 * Privilege filter requests have been already
> +		 * taken care during the base PMU configuration.
> +		 */
> +		if (x == PERF_SAMPLE_BRANCH_USER)
> +			continue;
> +		if (x == PERF_SAMPLE_BRANCH_KERNEL)
> +			continue;
> +		if (x == PERF_SAMPLE_BRANCH_HV)
> +			continue;
> +
> +		/*
> +		 * Requested filter not available either
> +		 * in PMU or in SW.
> +		 */
> +		if (!(filter_mask & x))
> +			return false;
> +	}
> +	return true;
> +}
> +
> +/*
> + * Required SW based branch filters
> + *
> + * This is called after figuring out what all branch filters the
> + * PMU HW supports for the requested branch filter set. Here we
> + * will go through all the SW implemented branch filters one by
> + * one and pick them up if its not already supported in the PMU.
> + */
> +static u64 branch_filter_map(u64 branch_sample_type, u64 pmu_bhrb_filter,
> +			     					u64 *filter_mask)

Whitespace is foobar here ^

This function deals exclusively with the software filter IIUC, but the name
doesn't indicate that in any way.

As far as the logic goes, you return the software filter value, as well as
mutating the *filter_mask. And in all cases you make the same modification to
both. That seems very dubious.

Shouldn't this routine just set up the software filter, and leave the upper
level code to deal with the HW & SW filter values?
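
Something along these lines, as a sketch only. The BR_* flags are stand-in values, bhrb_sw_filter_map() and filters_covered() are hypothetical names, and hw_handled is assumed to be expressed in the same flag space as the request rather than as raw MMCRA bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in flag values for illustration only; not the uapi constants. */
enum {
	BR_ANY		= 1 << 0,
	BR_ANY_CALL	= 1 << 1,
	BR_ANY_RETURN	= 1 << 2,
	BR_IND_CALL	= 1 << 3,
	BR_COND		= 1 << 4,
	BR_OTHER	= 1 << 5,	/* a filter nobody implements */
};

#define SW_CAPABLE	(BR_ANY_CALL | BR_ANY_RETURN | BR_IND_CALL | BR_COND)

/* Pure function: which requested filters must be done in SW? */
uint64_t bhrb_sw_filter_map(uint64_t requested, uint64_t hw_handled)
{
	if (requested & BR_ANY)
		return 0;

	return requested & SW_CAPABLE & ~hw_handled;
}

/* Upper level: is every requested filter covered by HW or SW? */
bool filters_covered(uint64_t requested, uint64_t hw_handled, uint64_t sw)
{
	if (requested & BR_ANY)
		return true;

	return !(requested & ~(hw_handled | sw));
}
```

The routine returns one value and mutates nothing, and the caller combines the HW and SW values to decide whether to return -EOPNOTSUPP.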

> +{
> +	u64 branch_sw_filter = 0;
> +
> +	/* No branch filter requested */
> +	if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY) {
> +		WARN_ON(pmu_bhrb_filter != 0);
> +		WARN_ON(*filter_mask != PERF_SAMPLE_BRANCH_ANY);
> +		return branch_sw_filter;
> +	}
> +
> +	/*
> +	 * PMU supported branch filters must also be implemented in SW
> +	 * in the event when the PMU is unable to process them for some
> +	 * reason. This all those branch filters can be satisfied with
> +	 * SW implemented filters. But right now, there is now way to
> +	 * initimate the user about this decision.

Please proofread these comments, I don't entirely follow this one.

You say "must also be implemented in SW" - but I think it's actually "must be
implemented in SW", ie. the HW is not "also" implementing the filter.

You say "in the event when" but I think you just mean "when" - the word "event"
has a particular meaning in this code so you should only use it for that if at
all possible.

I don't follow "This all those".

You should just drop the last sentence, there is never going to be any way to
notify the user that their filter is implemented in HW vs SW, that's an
implementation detail.

> +	 */
> +	if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY_CALL) {
> +		if (!(pmu_bhrb_filter & PERF_SAMPLE_BRANCH_ANY_CALL)) {
> +			branch_sw_filter |= PERF_SAMPLE_BRANCH_ANY_CALL;
> +			*filter_mask |= PERF_SAMPLE_BRANCH_ANY_CALL;
> +		}
> +	}
> +
> +	if (branch_sample_type & PERF_SAMPLE_BRANCH_COND) {
> +		if (!(pmu_bhrb_filter & PERF_SAMPLE_BRANCH_COND)) {
> +			branch_sw_filter |= PERF_SAMPLE_BRANCH_COND;
> +			*filter_mask |= PERF_SAMPLE_BRANCH_COND;
> +		}
> +	}
> +
> +	if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY_RETURN) {
> +		if (!(pmu_bhrb_filter & PERF_SAMPLE_BRANCH_ANY_RETURN)) {
> +			branch_sw_filter |= PERF_SAMPLE_BRANCH_ANY_RETURN;
> +			*filter_mask |= PERF_SAMPLE_BRANCH_ANY_RETURN;
> +		}
> +	}
> +
> +	if (branch_sample_type & PERF_SAMPLE_BRANCH_IND_CALL) {
> +		if (!(pmu_bhrb_filter & PERF_SAMPLE_BRANCH_IND_CALL)) {
> +			branch_sw_filter |= PERF_SAMPLE_BRANCH_IND_CALL;
> +			*filter_mask |= PERF_SAMPLE_BRANCH_IND_CALL;
> +		}
> +	}
> +
> +	return branch_sw_filter;
> +}
> +
>  /* Processing BHRB entries */
>  void power_pmu_bhrb_read(struct cpu_hw_events *cpuhw)
>  {
> @@ -459,17 +683,29 @@ void power_pmu_bhrb_read(struct cpu_hw_events *cpuhw)
>  					addr = 0;
>  				}
>  				cpuhw->bhrb_entries[u_index].from = addr;
> +
> +				if (!check_instruction(cpuhw->
> +						bhrb_entries[u_index].from,
> +							cpuhw->bhrb_sw_filter))
> +					u_index--;
>  			} else {
>  				/* Branches to immediate field 
>  				   (ie I or B form) */
>  				cpuhw->bhrb_entries[u_index].from = addr;
> -				cpuhw->bhrb_entries[u_index].to =
> -					power_pmu_bhrb_to(addr);
> -				cpuhw->bhrb_entries[u_index].mispred = pred;
> -				cpuhw->bhrb_entries[u_index].predicted = ~pred;
> +				if (check_instruction(cpuhw->
> +						bhrb_entries[u_index].from,
> +						cpuhw->bhrb_sw_filter)) {
> +					cpuhw->bhrb_entries[u_index].
> +						to = power_pmu_bhrb_to(addr);
> +					cpuhw->bhrb_entries[u_index].
> +						mispred = pred;
> +					cpuhw->bhrb_entries[u_index].
> +						predicted = ~pred;
> +				} else {
> +					u_index--;
> +				}
>  			}
>  			u_index++;


This code was already in need of some unindentation, and now it's just
ridiculous.

To start with at the beginning of this routine we have:

while (..) {
	if (!val)
		break;
	else {
		// Bulk of the logic
		...
	}
}

That should almost always become:

while (..) {
	if (!val)
		break;

	// Bulk of the logic
	...
}


But in this case that's not enough. Please send a precursor patch which moves
this logic out into a helper function.


> -
>  		}
>  	}
>  	cpuhw->bhrb_stack.nr = u_index;
> @@ -1255,7 +1491,11 @@ nocheck:
>  	if (has_branch_stack(event)) {
>  		power_pmu_bhrb_enable(event);
>  		cpuhw->bhrb_hw_filter = ppmu->bhrb_filter_map(
> -					event->attr.branch_sample_type);
> +					event->attr.branch_sample_type,
> +					&cpuhw->filter_mask);
> +		cpuhw->bhrb_sw_filter = branch_filter_map
> +					(event->attr.branch_sample_type,
> +					cpuhw->bhrb_hw_filter, &cpuhw->filter_mask);
>  	}
>  
>  	perf_pmu_enable(event->pmu);
> @@ -1637,10 +1877,16 @@ static int power_pmu_event_init(struct perf_event *event)
>  	err = power_check_constraints(cpuhw, events, cflags, n + 1);
>  
>  	if (has_branch_stack(event)) {
> -		cpuhw->bhrb_hw_filter = ppmu->bhrb_filter_map(
> -					event->attr.branch_sample_type);
> -
> -		if(cpuhw->bhrb_hw_filter == -1)
> +		cpuhw->bhrb_hw_filter = ppmu->bhrb_filter_map
> +				(event->attr.branch_sample_type,
> +				&cpuhw->filter_mask);
> +		cpuhw->bhrb_sw_filter = branch_filter_map
> +				(event->attr.branch_sample_type,
> +				cpuhw->bhrb_hw_filter,
> +				&cpuhw->filter_mask);
> +
> +		if(!match_filters(event->attr.branch_sample_type,
> +						cpuhw->filter_mask))
>  			return -EOPNOTSUPP;

The above two hunks look too similar for my liking.


cheers


* Re: [PATCH V4 09/10] power8, perf: Change BHRB branch filter configuration
  2013-12-04 10:32   ` Anshuman Khandual
  (?)
@ 2013-12-09  6:21   ` Michael Ellerman
  2013-12-13  8:20       ` Anshuman Khandual
  -1 siblings, 1 reply; 57+ messages in thread
From: Michael Ellerman @ 2013-12-09  6:21 UTC (permalink / raw)
  To: linuxppc-dev, linux-kernel, khandual
  Cc: mikey, ak, eranian, acme, sukadev, mingo

On Wed, 2013-12-04 at 10:32:41 UTC, Anshuman Khandual wrote:
> Powerpc kernel now supports SW based branch filters for book3s systems with some
> specific requirements while dealing with HW supported branch filters in order to
> achieve overall OR semantics prevailing in perf branch stack sampling framework.
> This patch adapts the BHRB branch filter configuration to meet those protocols.
> POWER8 PMU does support 3 branch filters (out of which two are getting used in
> perf branch stack) which are mutually exclusive and cannot be ORed with each
> other. This implies that PMU can only handle one HW based branch filter request
> at any point of time. For all other combinations PMU will pass it on to the SW.
> 
> Also the combination of PERF_SAMPLE_BRANCH_ANY_CALL and PERF_SAMPLE_BRANCH_COND
> can now be handled in SW, hence we don't error them out anymore.
> 
> diff --git a/arch/powerpc/perf/power8-pmu.c b/arch/powerpc/perf/power8-pmu.c
> index 03c5b8d..6021349 100644
> --- a/arch/powerpc/perf/power8-pmu.c
> +++ b/arch/powerpc/perf/power8-pmu.c
> @@ -561,7 +561,56 @@ static int power8_generic_events[] = {
>  
>  static u64 power8_bhrb_filter_map(u64 branch_sample_type, u64 *filter_mask)
>  {
> -	u64 pmu_bhrb_filter = 0;
> +	u64 x, tmp, pmu_bhrb_filter = 0;
> +	*filter_mask = 0;
> +
> +	/* No branch filter requested */
> +	if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY) {
> +		*filter_mask = PERF_SAMPLE_BRANCH_ANY;
> +		return pmu_bhrb_filter;
> +	}
> +
> +	/*
> +	 * P8 does not support oring of PMU HW branch filters. Hence
> +	 * if multiple branch filters are requested which includes filters
> +	 * supported in PMU, still go ahead and clear the PMU based HW branch
> +	 * filter component as in this case all the filters will be processed
> + 	 * in SW.

Leading space there.

> +	 */
> +	tmp = branch_sample_type;
> +
> +	/* Remove privilege filters before comparison */
> +	tmp &= ~PERF_SAMPLE_BRANCH_USER;
> +	tmp &= ~PERF_SAMPLE_BRANCH_KERNEL;
> +	tmp &= ~PERF_SAMPLE_BRANCH_HV;
> +
> +	for_each_branch_sample_type(x) {
> +		/* Ignore privilege requests */
> +		if ((x == PERF_SAMPLE_BRANCH_USER) || (x == PERF_SAMPLE_BRANCH_KERNEL) || (x == PERF_SAMPLE_BRANCH_HV))
> +			continue;
> +
> +		if (!(tmp & x))
> +			continue;
> +
> +               /* Supported HW PMU filters */
> +		if (tmp & PERF_SAMPLE_BRANCH_ANY_CALL) {
> +			tmp &= ~PERF_SAMPLE_BRANCH_ANY_CALL;
> +			if (tmp) {
> +				pmu_bhrb_filter = 0;
> +				*filter_mask = 0;
> +				return pmu_bhrb_filter;
> +			}
> +		}
> +
> +		if (tmp & PERF_SAMPLE_BRANCH_COND) {
> +			tmp &= ~PERF_SAMPLE_BRANCH_COND;
> +			if (tmp) {
> +				pmu_bhrb_filter = 0;
> +				*filter_mask = 0;
> +				return pmu_bhrb_filter;
> +			}
> +		}
> +	}

>  
>  	/* BHRB and regular PMU events share the same privilege state
>  	 * filter configuration. BHRB is always recorded along with a
> @@ -570,34 +619,20 @@ static u64 power8_bhrb_filter_map(u64 branch_sample_type, u64 *filter_mask)
>  	 * PMU event, we ignore any separate BHRB specific request.
>  	 */
>  
> -	/* No branch filter requested */
> -	if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY)
> -		return pmu_bhrb_filter;
> -
> -	/* Invalid branch filter options - HW does not support */
> -	if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY_RETURN)
> -		return -1;
> -
> -	if (branch_sample_type & PERF_SAMPLE_BRANCH_IND_CALL)
> -		return -1;
> -
> +	/* Supported individual branch filters */
>  	if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY_CALL) {
>  		pmu_bhrb_filter |= POWER8_MMCRA_IFM1;
> +		*filter_mask    |= PERF_SAMPLE_BRANCH_ANY_CALL;
>  		return pmu_bhrb_filter;
>  	}
>  
>  	if (branch_sample_type & PERF_SAMPLE_BRANCH_COND) {
>  		pmu_bhrb_filter |= POWER8_MMCRA_IFM3;
> +		*filter_mask    |= PERF_SAMPLE_BRANCH_COND;
>  		return pmu_bhrb_filter;
>  	}
>  
> -	/* PMU does not support ANY combination of HW BHRB filters */
> -	if ((branch_sample_type & PERF_SAMPLE_BRANCH_ANY_CALL) &&
> -			(branch_sample_type & PERF_SAMPLE_BRANCH_COND))
> -		return -1;
> -
> -	/* Every thing else is unsupported */
> -	return -1;
> +	return pmu_bhrb_filter;
>  }


As I said in my comments on version 3 which you ignored:

    I think it would be clearer if we actually checked for the possibilities we
    allow and let everything else fall through, eg:

        /* Ignore user/kernel/hv bits */
        branch_sample_type &= ~PERF_SAMPLE_BRANCH_PLM_ALL;

        if (branch_sample_type == PERF_SAMPLE_BRANCH_ANY)
                return 0;

        if (branch_sample_type == PERF_SAMPLE_BRANCH_ANY_CALL)
                return POWER8_MMCRA_IFM1;
 
        if (branch_sample_type == PERF_SAMPLE_BRANCH_COND)
                return POWER8_MMCRA_IFM3;
        
        return -1;


cheers


* Re: [PATCH V4 10/10] powerpc, perf: Cleanup SW branch filter list look up
  2013-12-04 10:32   ` Anshuman Khandual
  (?)
@ 2013-12-09  6:21   ` Michael Ellerman
  2013-12-20 11:06       ` Anshuman Khandual
  -1 siblings, 1 reply; 57+ messages in thread
From: Michael Ellerman @ 2013-12-09  6:21 UTC (permalink / raw)
  To: linuxppc-dev, linux-kernel, khandual
  Cc: mikey, ak, eranian, acme, sukadev, mingo

On Wed, 2013-12-04 at 10:32:42 UTC, Anshuman Khandual wrote:
> This patch adds enumeration for all available SW branch filters
> in powerpc book3s code and also streamlines the look up for the
> SW branch filter entries while trying to figure out which all
> branch filters can be supported in SW.

This appears to patch code that was only added in 8/10?

Was there any reason not to do it the right way from the beginning?

cheers


* Re: [PATCH V4 08/10] powerpc, perf: Enable SW filtering in branch stack sampling framework
  2013-12-09  6:21   ` Michael Ellerman
@ 2013-12-10  5:57       ` Anshuman Khandual
  2013-12-20 11:01       ` Anshuman Khandual
  1 sibling, 0 replies; 57+ messages in thread
From: Anshuman Khandual @ 2013-12-10  5:57 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: linuxppc-dev, linux-kernel, mikey, ak, eranian, acme, sukadev, mingo

On 12/09/2013 11:51 AM, Michael Ellerman wrote:
> This code was already in need of some unindentation, and now it's just
> ridiculous.
> 
> To start with at the beginning of this routine we have:
> 
> while (..) {
> 	if (!val)
> 		break;
> 	else {
> 		// Bulk of the logic
> 		...
> 	}
> }
> 
> That should almost always become:
> 
> while (..) {
> 	if (!val)
> 		break;
> 
> 	// Bulk of the logic
> 	...
> }
> 
> 
> But in this case that's not enough. Please send a precursor patch which moves
> this logic out into a helper function.

Hey Michael,

I believe this patch should be able to take care of this.

commit d66d729715cabe0cfd8e34861a6afa8ad639ddf3
Author: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Date:   Tue Dec 10 11:10:06 2013 +0530

    power, perf: Clean up BHRB processing
    
    This patch cleans up some indentation problems and re-organizes the
    BHRB processing code with an additional helper function.
    
    Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>

diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 29b89e8..9ae96c5 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -400,11 +400,20 @@ static __u64 power_pmu_bhrb_to(u64 addr)
 	return target - (unsigned long)&instr + addr;
 }
 
+void update_branch_entry(struct cpu_hw_events *cpuhw, int u_index, u64 from, u64 to, int pred)
+{
+	cpuhw->bhrb_entries[u_index].from = from;
+	cpuhw->bhrb_entries[u_index].to = to;
+	cpuhw->bhrb_entries[u_index].mispred = pred;
+	cpuhw->bhrb_entries[u_index].predicted = ~pred;
+	return;
+}
+
 /* Processing BHRB entries */
 void power_pmu_bhrb_read(struct cpu_hw_events *cpuhw)
 {
 	u64 val;
-	u64 addr;
+	u64 addr, tmp;
 	int r_index, u_index, pred;
 
 	r_index = 0;
@@ -415,62 +424,54 @@ void power_pmu_bhrb_read(struct cpu_hw_events *cpuhw)
 		if (!val)
 			/* Terminal marker: End of valid BHRB entries */
 			break;
-		else {
-			addr = val & BHRB_EA;
-			pred = val & BHRB_PREDICTION;
 
-			if (!addr)
-				/* invalid entry */
-				continue;
+		addr = val & BHRB_EA;
+		pred = val & BHRB_PREDICTION;
 
-			/* Branches are read most recent first (ie. mfbhrb 0 is
-			 * the most recent branch).
-			 * There are two types of valid entries:
-			 * 1) a target entry which is the to address of a
-			 *    computed goto like a blr,bctr,btar.  The next
-			 *    entry read from the bhrb will be branch
-			 *    corresponding to this target (ie. the actual
-			 *    blr/bctr/btar instruction).
-			 * 2) a from address which is an actual branch.  If a
-			 *    target entry proceeds this, then this is the
-			 *    matching branch for that target.  If this is not
-			 *    following a target entry, then this is a branch
-			 *    where the target is given as an immediate field
-			 *    in the instruction (ie. an i or b form branch).
-			 *    In this case we need to read the instruction from
-			 *    memory to determine the target/to address.
+		if (!addr)
+			/* invalid entry */
+			continue;
+
+		/* Branches are read most recent first (ie. mfbhrb 0 is
+		 * the most recent branch).
+		 * There are two types of valid entries:
+		 * 1) a target entry which is the to address of a
+		 *    computed goto like a blr,bctr,btar.  The next
+		 *    entry read from the bhrb will be branch
+		 *    corresponding to this target (ie. the actual
+		 *    blr/bctr/btar instruction).
+		 * 2) a from address which is an actual branch.  If a
+		 *    target entry proceeds this, then this is the
+		 *    matching branch for that target.  If this is not
+		 *    following a target entry, then this is a branch
+		 *    where the target is given as an immediate field
+		 *    in the instruction (ie. an i or b form branch).
+		 *    In this case we need to read the instruction from
+		 *    memory to determine the target/to address.
+		 */
+		if (val & BHRB_TARGET) {
+			/* Target branches use two entries
+			 * (ie. computed gotos/XL form)
 			 */
+			tmp = addr;
 
+			/* Get from address in next entry */
+			val = read_bhrb(r_index++);
+			addr = val & BHRB_EA;
 			if (val & BHRB_TARGET) {
-				/* Target branches use two entries
-				 * (ie. computed gotos/XL form)
-				 */
-				cpuhw->bhrb_entries[u_index].to = addr;
-				cpuhw->bhrb_entries[u_index].mispred = pred;
-				cpuhw->bhrb_entries[u_index].predicted = ~pred;
-
-				/* Get from address in next entry */
-				val = read_bhrb(r_index++);
-				addr = val & BHRB_EA;
-				if (val & BHRB_TARGET) {
-					/* Shouldn't have two targets in a
-					   row.. Reset index and try again */
-					r_index--;
-					addr = 0;
-				}
-				cpuhw->bhrb_entries[u_index].from = addr;
-			} else {
-				/* Branches to immediate field 
-				   (ie I or B form) */
-				cpuhw->bhrb_entries[u_index].from = addr;
-				cpuhw->bhrb_entries[u_index].to =
-					power_pmu_bhrb_to(addr);
-				cpuhw->bhrb_entries[u_index].mispred = pred;
-				cpuhw->bhrb_entries[u_index].predicted = ~pred;
+				/* Shouldn't have two targets in a
+				   row.. Reset index and try again */
+				r_index--;
+				addr = 0;
 			}
-			u_index++;
-
+			update_branch_entry(cpuhw, u_index, addr, tmp, pred);
+		} else {
+			/* Branches to immediate field 
+			   (ie I or B form) */
+			tmp = power_pmu_bhrb_to(addr);
+			update_branch_entry(cpuhw, u_index, addr, tmp, pred);
 		}
+		u_index++;
 	}
 	cpuhw->bhrb_stack.nr = u_index;
 	return;



-				   (ie I or B form) */
-				cpuhw->bhrb_entries[u_index].from = addr;
-				cpuhw->bhrb_entries[u_index].to =
-					power_pmu_bhrb_to(addr);
-				cpuhw->bhrb_entries[u_index].mispred = pred;
-				cpuhw->bhrb_entries[u_index].predicted = ~pred;
+				/* Shouldn't have two targets in a
+				   row.. Reset index and try again */
+				r_index--;
+				addr = 0;
 			}
-			u_index++;
-
+			update_branch_entry(cpuhw, u_index, addr, tmp, pred);
+		} else {
+			/* Branches to immediate field 
+			   (ie I or B form) */
+			tmp = power_pmu_bhrb_to(addr);
+			update_branch_entry(cpuhw, u_index, addr, tmp, pred);
 		}
+		u_index++;
 	}
 	cpuhw->bhrb_stack.nr = u_index;
 	return;
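
For readers who want to follow the pairing logic outside the kernel, here is a
minimal user-space sketch of the same walk over raw BHRB values. It is an
illustration under stated assumptions, not the kernel code: the SK_* bit masks
are assumed values chosen for the sketch, the raw entries come from a plain
array rather than read_bhrb(), and sk_fake_target() is a hypothetical stand-in
for power_pmu_bhrb_to().

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed bit layout of a raw BHRB entry, for this sketch only. */
#define SK_BHRB_PREDICTION 0x1ULL	/* branch was mispredicted */
#define SK_BHRB_TARGET     0x2ULL	/* entry holds a target address */
#define SK_BHRB_EA         (~0x3ULL)	/* effective address bits */

struct sk_branch {
	uint64_t from;
	uint64_t to;
	int mispred;
};

/* Hypothetical stand-in for power_pmu_bhrb_to(): pretend every
 * immediate-form branch jumps 0x100 bytes forward. */
static uint64_t sk_fake_target(uint64_t from)
{
	return from + 0x100;
}

/* Walk raw[] (most recent branch first) and fill out[] with from/to
 * pairs. Returns the number of decoded branches; out[] needs room
 * for n entries. */
static size_t sk_decode_bhrb(const uint64_t *raw, size_t n,
			     struct sk_branch *out)
{
	size_t r = 0, u = 0;

	while (r < n) {
		uint64_t val = raw[r++];
		uint64_t addr, to;
		int pred;

		if (!val)
			break;		/* terminal marker */

		addr = val & SK_BHRB_EA;
		pred = (int)(val & SK_BHRB_PREDICTION);
		if (!addr)
			continue;	/* invalid entry */

		if (val & SK_BHRB_TARGET) {
			/* Two-entry branch: this entry is the target, the
			 * next one should carry the from address. */
			to = addr;
			val = (r < n) ? raw[r++] : 0;
			addr = val & SK_BHRB_EA;
			if (val & SK_BHRB_TARGET) {
				/* Two targets in a row: step back so the
				 * second one is re-read on the next pass. */
				r--;
				addr = 0;
			}
		} else {
			/* Immediate-form branch: compute the target. */
			to = sk_fake_target(addr);
		}

		out[u].from = addr;
		out[u].to = to;
		out[u].mispred = pred;
		u++;
	}
	return u;
}
```

Keeping the record update in one place at the bottom of the loop plays the same
role as update_branch_entry() in the patch: both branch kinds end up filling an
entry through a single code path.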

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH V4 07/10] powerpc, lib: Add new branch instruction analysis support functions
  2013-12-09  6:21   ` Michael Ellerman
@ 2013-12-10  6:09       ` Anshuman Khandual
  0 siblings, 0 replies; 57+ messages in thread
From: Anshuman Khandual @ 2013-12-10  6:09 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: linuxppc-dev, linux-kernel, mikey, ak, eranian, acme, sukadev, mingo

On 12/09/2013 11:51 AM, Michael Ellerman wrote:
> On Wed, 2013-04-12 at 10:32:39 UTC, Anshuman Khandual wrote:
>> Generic powerpc branch instruction analysis support added in the code
>> patching library which will help the subsequent patch on SW based
>> filtering of branch records in perf. This patch also converts and
>> exports some of the existing local static functions through the header
> >> file to be used elsewhere.
>>
>> diff --git a/arch/powerpc/include/asm/code-patching.h b/arch/powerpc/include/asm/code-patching.h
>> index a6f8c7a..8bab417 100644
>> --- a/arch/powerpc/include/asm/code-patching.h
>> +++ b/arch/powerpc/include/asm/code-patching.h
>> @@ -22,6 +22,36 @@
>>  #define BRANCH_SET_LINK	0x1
>>  #define BRANCH_ABSOLUTE	0x2
>>  
>> +#define XL_FORM_LR  0x4C000020
>> +#define XL_FORM_CTR 0x4C000420
>> +#define XL_FORM_TAR 0x4C000460
>> +
>> +#define BO_ALWAYS    0x02800000
>> +#define BO_CTR       0x02000000
>> +#define BO_CRBI_OFF  0x00800000
>> +#define BO_CRBI_ON   0x01800000
>> +#define BO_CRBI_HINT 0x00400000
>> +
>> +/* Forms of branch instruction */
>> +int instr_is_branch_iform(unsigned int instr);
>> +int instr_is_branch_bform(unsigned int instr);
>> +int instr_is_branch_xlform(unsigned int instr);
>> +
>> +/* Classification of XL-form instruction */
>> +int is_xlform_lr(unsigned int instr);
>> +int is_xlform_ctr(unsigned int instr);
>> +int is_xlform_tar(unsigned int instr);
>> +
>> +/* Branch instruction is a call */
>> +int is_branch_link_set(unsigned int instr);
>> +
>> +/* BO field analysis (B-form or XL-form) */
>> +int is_bo_always(unsigned int instr);
>> +int is_bo_ctr(unsigned int instr);
>> +int is_bo_crbi_off(unsigned int instr);
>> +int is_bo_crbi_on(unsigned int instr);
>> +int is_bo_crbi_hint(unsigned int instr);
> 
> 
> I think this is the wrong API.
> 
> We end up with all these micro checks, which don't actually encapsulate much,
> and don't implement the logic perf needs. If we had another user for this level
> of detail then it might make sense, but for a single user I think we're better
> off just implementing the semantics it wants.
> 

Having a comprehensive list of branch instruction analysis APIs which some other
user can also use in the future does not make it wrong. Being more elaborate and
detailed makes this one a better choice than the API you have suggested below.

> So that would be something more like:
> 
> bool instr_is_return_branch(unsigned int instr);
> bool instr_is_conditional_branch(unsigned int instr);
> bool instr_is_func_call(unsigned int instr);
> bool instr_is_indirect_func_call(unsigned int instr);
> 
> 
> These would then encapsulate something like the logic in your 8/10 patch. You
> can hopefully also optimise the checking logic in each routine because you know
> the exact semantics you're implementing.
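
To make the two API shapes easier to compare, below is a rough sketch of what
the suggested semantic helpers could look like when built directly on the
instruction encodings. It is only an illustration: the sk_ names, the SK_*
masks other than the XL-form and BO values quoted above, and the exact
definition of "call", "return" and "conditional" used here are assumptions made
for the sketch, not what the series implements.

```c
#include <stdbool.h>
#include <stdint.h>

#define SK_OPCODE_MASK  0xFC000000u	/* primary opcode field */
#define SK_OP_IFORM     0x48000000u	/* primary opcode 18: b, bl, ba, bla */
#define SK_OP_BFORM     0x40000000u	/* primary opcode 16: bc and friends */
#define SK_XL_MASK      0xFC0007FEu	/* primary + extended opcode */
#define SK_XL_FORM_LR   0x4C000020u	/* bclr  */
#define SK_XL_FORM_CTR  0x4C000420u	/* bcctr */
#define SK_XL_FORM_TAR  0x4C000460u	/* bctar */
#define SK_BO_ALWAYS    0x02800000u	/* BO = 1z1zz: branch always */
#define SK_LINK         0x1u		/* LK bit */

static bool sk_is_xlform(uint32_t instr)
{
	uint32_t op = instr & SK_XL_MASK;

	return op == SK_XL_FORM_LR || op == SK_XL_FORM_CTR ||
	       op == SK_XL_FORM_TAR;
}

static bool sk_is_branch(uint32_t instr)
{
	uint32_t op = instr & SK_OPCODE_MASK;

	return op == SK_OP_IFORM || op == SK_OP_BFORM || sk_is_xlform(instr);
}

/* Assumed meaning: a branch to LR that does not set the link register. */
static bool sk_instr_is_return_branch(uint32_t instr)
{
	return (instr & SK_XL_MASK) == SK_XL_FORM_LR && !(instr & SK_LINK);
}

/* Assumed meaning: B-form or XL-form whose BO field is not "always". */
static bool sk_instr_is_conditional_branch(uint32_t instr)
{
	bool has_bo = (instr & SK_OPCODE_MASK) == SK_OP_BFORM ||
		      sk_is_xlform(instr);

	return has_bo && (instr & SK_BO_ALWAYS) != SK_BO_ALWAYS;
}

/* Assumed meaning: any branch form with the LK bit set. */
static bool sk_instr_is_func_call(uint32_t instr)
{
	return sk_is_branch(instr) && (instr & SK_LINK);
}

/* Assumed meaning: a register-indirect (XL-form) branch with LK set. */
static bool sk_instr_is_indirect_func_call(uint32_t instr)
{
	return sk_is_xlform(instr) && (instr & SK_LINK);
}
```

With these definitions a plain blr (0x4E800020) counts as a return, bctrl
(0x4E800421) as an indirect call, and beq (for example 0x41820008) as a
conditional branch, while each helper hides the opcode and BO checks from the
caller.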


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH V4 08/10] powerpc, perf: Enable SW filtering in branch stack sampling framework
  2013-12-10  5:57       ` Anshuman Khandual
  (?)
@ 2013-12-12  8:45       ` Anshuman Khandual
  2013-12-13  2:47         ` Michael Ellerman
  -1 siblings, 1 reply; 57+ messages in thread
From: Anshuman Khandual @ 2013-12-12  8:45 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: mikey, ak, linux-kernel, eranian, linuxppc-dev, acme, sukadev, mingo

On 12/10/2013 11:27 AM, Anshuman Khandual wrote:
> On 12/09/2013 11:51 AM, Michael Ellerman wrote:
>> This code was already in need of some unindentation, and now it's just
>> ridiculous.
>>
>> To start with at the beginning of this routine we have:
>>
>> while (..) {
>> 	if (!val)
>> 		break;
>> 	else {
>> 		// Bulk of the logic
>> 		...
>> 	}
>> }
>>
>> That should almost always become:
>>
>> while (..) {
>> 	if (!val)
>> 		break;
>>
>> 	// Bulk of the logic
>> 	...
>> }
>>
>>
>> But in this case that's not enough. Please send a precursor patch which moves
>> this logic out into a helper function.
> 
> Hey Michael,
> 
> I believe this patch should be able to take care of this.
> 
> commit d66d729715cabe0cfd8e34861a6afa8ad639ddf3
> Author: Anshuman Khandual <khandual@linux.vnet.ibm.com>
> Date:   Tue Dec 10 11:10:06 2013 +0530
> 
>     power, perf: Clean up BHRB processing
>     
>     This patch cleans up some indentation problems and reorganizes the
>     BHRB processing code with an additional helper function.
>     
>     Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
> 
> diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
> index 29b89e8..9ae96c5 100644
> --- a/arch/powerpc/perf/core-book3s.c
> +++ b/arch/powerpc/perf/core-book3s.c
> @@ -400,11 +400,20 @@ static __u64 power_pmu_bhrb_to(u64 addr)
>  	return target - (unsigned long)&instr + addr;
>  }
> 
> +void update_branch_entry(struct cpu_hw_events *cpuhw, int u_index, u64 from, u64 to, int pred)
> +{
> +	cpuhw->bhrb_entries[u_index].from = from;
> +	cpuhw->bhrb_entries[u_index].to = to;
> +	cpuhw->bhrb_entries[u_index].mispred = pred;
> +	cpuhw->bhrb_entries[u_index].predicted = ~pred;
> +	return;
> +}
> +
>  /* Processing BHRB entries */
>  void power_pmu_bhrb_read(struct cpu_hw_events *cpuhw)
>  {
>  	u64 val;
> -	u64 addr;
> +	u64 addr, tmp;
>  	int r_index, u_index, pred;
> 
>  	r_index = 0;
> @@ -415,62 +424,54 @@ void power_pmu_bhrb_read(struct cpu_hw_events *cpuhw)
>  		if (!val)
>  			/* Terminal marker: End of valid BHRB entries */
>  			break;
> -		else {
> -			addr = val & BHRB_EA;
> -			pred = val & BHRB_PREDICTION;
> 
> -			if (!addr)
> -				/* invalid entry */
> -				continue;
> +		addr = val & BHRB_EA;
> +		pred = val & BHRB_PREDICTION;
> 
> -			/* Branches are read most recent first (ie. mfbhrb 0 is
> -			 * the most recent branch).
> -			 * There are two types of valid entries:
> -			 * 1) a target entry which is the to address of a
> -			 *    computed goto like a blr,bctr,btar.  The next
> -			 *    entry read from the bhrb will be branch
> -			 *    corresponding to this target (ie. the actual
> -			 *    blr/bctr/btar instruction).
> -			 * 2) a from address which is an actual branch.  If a
> -			 *    target entry proceeds this, then this is the
> -			 *    matching branch for that target.  If this is not
> -			 *    following a target entry, then this is a branch
> -			 *    where the target is given as an immediate field
> -			 *    in the instruction (ie. an i or b form branch).
> -			 *    In this case we need to read the instruction from
> -			 *    memory to determine the target/to address.
> +		if (!addr)
> +			/* invalid entry */
> +			continue;
> +
> +		/* Branches are read most recent first (ie. mfbhrb 0 is
> +		 * the most recent branch).
> +		 * There are two types of valid entries:
> +		 * 1) a target entry which is the to address of a
> +		 *    computed goto like a blr,bctr,btar.  The next
> +		 *    entry read from the bhrb will be branch
> +		 *    corresponding to this target (ie. the actual
> +		 *    blr/bctr/btar instruction).
> +		 * 2) a from address which is an actual branch.  If a
> +		 *    target entry proceeds this, then this is the
> +		 *    matching branch for that target.  If this is not
> +		 *    following a target entry, then this is a branch
> +		 *    where the target is given as an immediate field
> +		 *    in the instruction (ie. an i or b form branch).
> +		 *    In this case we need to read the instruction from
> +		 *    memory to determine the target/to address.
> +		 */
> +		if (val & BHRB_TARGET) {
> +			/* Target branches use two entries
> +			 * (ie. computed gotos/XL form)
>  			 */
> +			tmp = addr;
> 
> +			/* Get from address in next entry */
> +			val = read_bhrb(r_index++);
> +			addr = val & BHRB_EA;
>  			if (val & BHRB_TARGET) {
> -				/* Target branches use two entries
> -				 * (ie. computed gotos/XL form)
> -				 */
> -				cpuhw->bhrb_entries[u_index].to = addr;
> -				cpuhw->bhrb_entries[u_index].mispred = pred;
> -				cpuhw->bhrb_entries[u_index].predicted = ~pred;
> -
> -				/* Get from address in next entry */
> -				val = read_bhrb(r_index++);
> -				addr = val & BHRB_EA;
> -				if (val & BHRB_TARGET) {
> -					/* Shouldn't have two targets in a
> -					   row.. Reset index and try again */
> -					r_index--;
> -					addr = 0;
> -				}
> -				cpuhw->bhrb_entries[u_index].from = addr;
> -			} else {
> -				/* Branches to immediate field 
> -				   (ie I or B form) */
> -				cpuhw->bhrb_entries[u_index].from = addr;
> -				cpuhw->bhrb_entries[u_index].to =
> -					power_pmu_bhrb_to(addr);
> -				cpuhw->bhrb_entries[u_index].mispred = pred;
> -				cpuhw->bhrb_entries[u_index].predicted = ~pred;
> +				/* Shouldn't have two targets in a
> +				   row.. Reset index and try again */
> +				r_index--;
> +				addr = 0;
>  			}
> -			u_index++;
> -
> +			update_branch_entry(cpuhw, u_index, addr, tmp, pred);
> +		} else {
> +			/* Branches to immediate field 
> +			   (ie I or B form) */
> +			tmp = power_pmu_bhrb_to(addr);
> +			update_branch_entry(cpuhw, u_index, addr, tmp, pred);
>  		}
> +		u_index++;
>  	}
>  	cpuhw->bhrb_stack.nr = u_index;
>  	return;

Hey Michael,

Does the patch look okay? If so, I will send it out separately. Do let
me know. Thank you.

Regards
Anshuman


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH V4 08/10] powerpc, perf: Enable SW filtering in branch stack sampling framework
  2013-12-12  8:45       ` Anshuman Khandual
@ 2013-12-13  2:47         ` Michael Ellerman
  0 siblings, 0 replies; 57+ messages in thread
From: Michael Ellerman @ 2013-12-13  2:47 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: mikey, ak, linux-kernel, eranian, linuxppc-dev, acme, sukadev, mingo

On Thu, 2013-12-12 at 14:15 +0530, Anshuman Khandual wrote:
> On 12/10/2013 11:27 AM, Anshuman Khandual wrote:
> > On 12/09/2013 11:51 AM, Michael Ellerman wrote:
> >> This code was already in need of some unindentation, and now it's just
> >> ridiculous.
> >>
> >> To start with at the beginning of this routine we have:
> >>
> >> while (..) {
> >> 	if (!val)
> >> 		break;
> >> 	else {
> >> 		// Bulk of the logic
> >> 		...
> >> 	}
> >> }
> >>
> >> That should almost always become:
> >>
> >> while (..) {
> >> 	if (!val)
> >> 		break;
> >>
> >> 	// Bulk of the logic
> >> 	...
> >> }
> >>
> >>
> >> But in this case that's not enough. Please send a precursor patch which moves
> >> this logic out into a helper function.
> > 
> > Hey Michael,
> > 
> > I believe this patch should be able to take care of this.

...
 
> Does the patch look okay? If so, I will send it out separately. Do let
> me know. Thank you.

It's OK.

Don't send it out separately, make it the first patch in your series.

cheers



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH V4 09/10] power8, perf: Change BHRB branch filter configuration
  2013-12-09  6:21   ` Michael Ellerman
@ 2013-12-13  8:20       ` Anshuman Khandual
  0 siblings, 0 replies; 57+ messages in thread
From: Anshuman Khandual @ 2013-12-13  8:20 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: linuxppc-dev, linux-kernel, mikey, ak, eranian, acme, sukadev, mingo

On 12/09/2013 11:51 AM, Michael Ellerman wrote:
> 
> As I said in my comments on version 3 which you ignored:
> 
>     I think it would be clearer if we actually checked for the possibilities we
>     allow and let everything else fall through, eg:
> 
>         /* Ignore user/kernel/hv bits */
>         branch_sample_type &= ~PERF_SAMPLE_BRANCH_PLM_ALL;
> 
>         if (branch_sample_type == PERF_SAMPLE_BRANCH_ANY)
>                 return 0;
> 
>         if (branch_sample_type == PERF_SAMPLE_BRANCH_ANY_CALL)
>                 return POWER8_MMCRA_IFM1;
>
>         if (branch_sample_type == PERF_SAMPLE_BRANCH_COND)
>                 return POWER8_MMCRA_IFM3;
>
>         return -1;
> 

Hey Michael,

This patch only adds support for the PERF_SAMPLE_BRANCH_COND filter. If the
overall code flow does not make it clear that every combination of these HW
filters is invalid, then we can add one more patch to clean that up, either
before or after this patch, but not in this patch itself. In the end the
code section will look something like this. Does that sound good?

static u64 power8_bhrb_filter_map(u64 branch_sample_type)
{
        u64 pmu_bhrb_filter = 0;

        /* BHRB and regular PMU events share the same privilege state
         * filter configuration. BHRB is always recorded along with a
         * regular PMU event. As the privilege state filter is handled
         * in the basic PMC configuration of the accompanying regular
         * PMU event, we ignore any separate BHRB specific request.
         */

        /* Ignore user, kernel, hv bits */
        branch_sample_type &= ~PERF_SAMPLE_BRANCH_PLM_ALL;

        if (branch_sample_type == PERF_SAMPLE_BRANCH_ANY)
                return pmu_bhrb_filter;


        if (branch_sample_type == PERF_SAMPLE_BRANCH_ANY_CALL) {
                pmu_bhrb_filter |= POWER8_MMCRA_IFM1;
                return pmu_bhrb_filter;
        }

        if (branch_sample_type == PERF_SAMPLE_BRANCH_COND) {
                pmu_bhrb_filter |= POWER8_MMCRA_IFM3;
                return pmu_bhrb_filter;
        }

        /* Everything else is unsupported */
        return -1;
}
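
To show what the equality checks buy, here is a small stand-alone sketch of the
same mapping that can be compiled in user space. The SK_* constants are
assumptions for the sketch (they are meant to mirror the perf branch sample
bits and the POWER8 IFM field values, but are not taken from the headers), and
SK_FILTER_UNSUPPORTED simply spells out what "return -1" becomes in a u64.

```c
#include <stdint.h>

/* Assumed values, for illustration only. */
#define SK_BRANCH_USER		(1ULL << 0)
#define SK_BRANCH_KERNEL	(1ULL << 1)
#define SK_BRANCH_HV		(1ULL << 2)
#define SK_BRANCH_ANY		(1ULL << 3)
#define SK_BRANCH_ANY_CALL	(1ULL << 4)
#define SK_BRANCH_COND		(1ULL << 10)
#define SK_BRANCH_PLM_ALL \
	(SK_BRANCH_USER | SK_BRANCH_KERNEL | SK_BRANCH_HV)

#define SK_MMCRA_IFM1		0x40000000ULL
#define SK_MMCRA_IFM3		0xC0000000ULL
#define SK_FILTER_UNSUPPORTED	((uint64_t)-1)

static uint64_t sk_bhrb_filter_map(uint64_t branch_sample_type)
{
	/* Privilege bits are handled by the accompanying PMU event. */
	branch_sample_type &= ~SK_BRANCH_PLM_ALL;

	if (branch_sample_type == SK_BRANCH_ANY)
		return 0;

	if (branch_sample_type == SK_BRANCH_ANY_CALL)
		return SK_MMCRA_IFM1;

	if (branch_sample_type == SK_BRANCH_COND)
		return SK_MMCRA_IFM3;

	/* Any other single filter, and every combination, lands here. */
	return SK_FILTER_UNSUPPORTED;
}
```

Because each test is an exact comparison after the privilege bits are masked
off, a request such as SK_BRANCH_ANY_CALL | SK_BRANCH_COND matches none of the
cases and falls through to the unsupported value without any extra code.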


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH V4 09/10] power8, perf: Change BHRB branch filter configuration
  2013-12-13  8:20       ` Anshuman Khandual
@ 2013-12-18  0:08         ` Michael Ellerman
  -1 siblings, 0 replies; 57+ messages in thread
From: Michael Ellerman @ 2013-12-18  0:08 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: linuxppc-dev, linux-kernel, mikey, ak, eranian, acme, sukadev, mingo

On Fri, 2013-12-13 at 13:50 +0530, Anshuman Khandual wrote:
> On 12/09/2013 11:51 AM, Michael Ellerman wrote:
> > 
> > As I said in my comments on version 3 which you ignored:
> > 
> >     I think it would be clearer if we actually checked for the possibilities we
> >     allow and let everything else fall through, eg:
> > 
> >         /* Ignore user/kernel/hv bits */
> >         branch_sample_type &= ~PERF_SAMPLE_BRANCH_PLM_ALL;
> > 
> >         if (branch_sample_type == PERF_SAMPLE_BRANCH_ANY)
> >                 return 0;
> > 
> >         if (branch_sample_type == PERF_SAMPLE_BRANCH_ANY_CALL)
> >                 return POWER8_MMCRA_IFM1;
> >
> >         if (branch_sample_type == PERF_SAMPLE_BRANCH_COND)
> >                 return POWER8_MMCRA_IFM3;
> >
> >         return -1;
> > 
> 
> Hey Michael,
> 
> This patch only adds support for the PERF_SAMPLE_BRANCH_COND filter. If the
> overall code flow does not make it clear that every combination of these HW
> filters is invalid, then we can add one more patch to clean that up, either
> before or after this patch, but not in this patch itself. In the end the
> code section will look something like this. Does that sound good?

Better, but not quite.

> static u64 power8_bhrb_filter_map(u64 branch_sample_type)
> {
>         u64 pmu_bhrb_filter = 0;
> 
>         /* BHRB and regular PMU events share the same privilege state
>          * filter configuration. BHRB is always recorded along with a
>          * regular PMU event. As the privilege state filter is handled
>          * in the basic PMC configuration of the accompanying regular
>          * PMU event, we ignore any separate BHRB specific request.
>          */
> 
>         /* Ignore user, kernel, hv bits */
>         branch_sample_type &= ~PERF_SAMPLE_BRANCH_PLM_ALL;
> 
>         if (branch_sample_type == PERF_SAMPLE_BRANCH_ANY)
>                 return pmu_bhrb_filter;

return 0;

> 
> 
>         if (branch_sample_type == PERF_SAMPLE_BRANCH_ANY_CALL) {
>                 pmu_bhrb_filter |= POWER8_MMCRA_IFM1;
>                 return pmu_bhrb_filter;

return POWER8_MMCRA_IFM1;

>         }
> 
>         if (branch_sample_type == PERF_SAMPLE_BRANCH_COND) {
>                 pmu_bhrb_filter |= POWER8_MMCRA_IFM3;
>                 return pmu_bhrb_filter;

return POWER8_MMCRA_IFM3;

>         }
> 
>         /* Everything else is unsupported */
>         return -1;
> }

cheers



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH V4 09/10] power8, perf: Change BHRB branch filter configuration
  2013-12-18  0:08         ` Michael Ellerman
@ 2013-12-18  3:55           ` Anshuman Khandual
  -1 siblings, 0 replies; 57+ messages in thread
From: Anshuman Khandual @ 2013-12-18  3:55 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: linuxppc-dev, linux-kernel, mikey, ak, eranian, acme, sukadev, mingo

On 12/18/2013 05:38 AM, Michael Ellerman wrote:
> On Fri, 2013-12-13 at 13:50 +0530, Anshuman Khandual wrote:
>> On 12/09/2013 11:51 AM, Michael Ellerman wrote:
>>>
>>> As I said in my comments on version 3 which you ignored:
>>>
>>>     I think it would be clearer if we actually checked for the possibilities we
>>>     allow and let everything else fall through, eg:
>>>
>>>         /* Ignore user/kernel/hv bits */
>>>         branch_sample_type &= ~PERF_SAMPLE_BRANCH_PLM_ALL;
>>>
>>>         if (branch_sample_type == PERF_SAMPLE_BRANCH_ANY)
>>>                 return 0;
>>>
>>>         if (branch_sample_type == PERF_SAMPLE_BRANCH_ANY_CALL)
>>>                 return POWER8_MMCRA_IFM1;
>>>
>>>         if (branch_sample_type == PERF_SAMPLE_BRANCH_COND)
>>>                 return POWER8_MMCRA_IFM3;
>>>
>>>         return -1;
>>>
>>
>> Hey Michael,
>>
>> This patch only adds support for the PERF_SAMPLE_BRANCH_COND filter. If the
>> overall code flow does not make it clear that every combination of these HW
>> filters is invalid, then we can add one more patch to clean that up, either
>> before or after this patch, but not in this patch itself. In the end the
>> code section will look something like this. Does that sound good?
> 
> Better, but not quite.
> 
>> static u64 power8_bhrb_filter_map(u64 branch_sample_type)
>> {
>>         u64 pmu_bhrb_filter = 0;
>>
>>         /* BHRB and regular PMU events share the same privilege state
>>          * filter configuration. BHRB is always recorded along with a
>>          * regular PMU event. As the privilege state filter is handled
>>          * in the basic PMC configuration of the accompanying regular
>>          * PMU event, we ignore any separate BHRB specific request.
>>          */
>>
>>         /* Ignore user, kernel, hv bits */
>>         branch_sample_type &= ~PERF_SAMPLE_BRANCH_PLM_ALL;
>>
>>         if (branch_sample_type == PERF_SAMPLE_BRANCH_ANY)
>>                 return pmu_bhrb_filter;
> 
> return 0;
> 
>>
>>
>>         if (branch_sample_type == PERF_SAMPLE_BRANCH_ANY_CALL) {
>>                 pmu_bhrb_filter |= POWER8_MMCRA_IFM1;
>>                 return pmu_bhrb_filter;
> 
> return POWER8_MMCRA_IFM1;
> 
>>         }
>>
>>         if (branch_sample_type == PERF_SAMPLE_BRANCH_COND) {
>>                 pmu_bhrb_filter |= POWER8_MMCRA_IFM3;
>>                 return pmu_bhrb_filter;
> 
> return POWER8_MMCRA_IFM3;
> 
>>         }
>>
>>         /* Everything else is unsupported */
>>         return -1;
>> }
> 

Okay, I will move these changes into a separate patch that goes in before the
conditional branch filter is added here.


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH V4 07/10] powerpc, lib: Add new branch instruction analysis support functions
  2013-12-10  6:09       ` Anshuman Khandual
  (?)
@ 2013-12-20 10:06       ` Anshuman Khandual
  -1 siblings, 0 replies; 57+ messages in thread
From: Anshuman Khandual @ 2013-12-20 10:06 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: mikey, ak, linux-kernel, eranian, linuxppc-dev, acme, sukadev, mingo

On 12/10/2013 11:39 AM, Anshuman Khandual wrote:
> On 12/09/2013 11:51 AM, Michael Ellerman wrote:
>> On Wed, 2013-04-12 at 10:32:39 UTC, Anshuman Khandual wrote:
>>> Generic powerpc branch instruction analysis support added in the code
>>> patching library which will help the subsequent patch on SW based
>>> filtering of branch records in perf. This patch also converts and
>>> exports some of the existing local static functions through the header
>>> file to be used else where.
>>>
>>> diff --git a/arch/powerpc/include/asm/code-patching.h b/arch/powerpc/include/asm/code-patching.h
>>> index a6f8c7a..8bab417 100644
>>> --- a/arch/powerpc/include/asm/code-patching.h
>>> +++ b/arch/powerpc/include/asm/code-patching.h
>>> @@ -22,6 +22,36 @@
>>>  #define BRANCH_SET_LINK	0x1
>>>  #define BRANCH_ABSOLUTE	0x2
>>>  
>>> +#define XL_FORM_LR  0x4C000020
>>> +#define XL_FORM_CTR 0x4C000420
>>> +#define XL_FORM_TAR 0x4C000460
>>> +
>>> +#define BO_ALWAYS    0x02800000
>>> +#define BO_CTR       0x02000000
>>> +#define BO_CRBI_OFF  0x00800000
>>> +#define BO_CRBI_ON   0x01800000
>>> +#define BO_CRBI_HINT 0x00400000
>>> +
>>> +/* Forms of branch instruction */
>>> +int instr_is_branch_iform(unsigned int instr);
>>> +int instr_is_branch_bform(unsigned int instr);
>>> +int instr_is_branch_xlform(unsigned int instr);
>>> +
>>> +/* Classification of XL-form instruction */
>>> +int is_xlform_lr(unsigned int instr);
>>> +int is_xlform_ctr(unsigned int instr);
>>> +int is_xlform_tar(unsigned int instr);
>>> +
>>> +/* Branch instruction is a call */
>>> +int is_branch_link_set(unsigned int instr);
>>> +
>>> +/* BO field analysis (B-form or XL-form) */
>>> +int is_bo_always(unsigned int instr);
>>> +int is_bo_ctr(unsigned int instr);
>>> +int is_bo_crbi_off(unsigned int instr);
>>> +int is_bo_crbi_on(unsigned int instr);
>>> +int is_bo_crbi_hint(unsigned int instr);
>>
>>
>> I think this is the wrong API.
>>
>> We end up with all these micro checks, which don't actually encapsulate much,
>> and don't implement the logic perf needs. If we had another user for this level
>> of detail then it might make sense, but for a single user I think we're better
>> off just implementing the semantics it wants.
>>
> 
> Having a comprehensive list of branch instruction analysis APIs which some other
> user can also use in the future does not make it wrong. Being more elaborate and
> detailed makes this one a better choice than the API you have suggested below.
> 
>> So that would be something more like:
>>
>> bool instr_is_return_branch(unsigned int instr);
>> bool instr_is_conditional_branch(unsigned int instr);
>> bool instr_is_func_call(unsigned int instr);
>> bool instr_is_indirect_func_call(unsigned int instr);
>>
>>
>> These would then encapsulate something like the logic in your 8/10 patch. You
>> can hopefully also optimise the checking logic in each routine because you know
>> the exact semantics you're implementing.

Anyway, here is the patch which will supersede the present patch for adding the
required library functions. Hope this works.

commit 9d9f11a6b778b51732aaa0e7c9dea4be3385df56
Author: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Date:   Fri Dec 20 13:46:15 2013 +0530

    powerpc, lib: Add new branch analysis support functions
    
    Generic powerpc branch analysis support added in the code patching
    library which will help the subsequent patch on SW based filtering
    of branch records in perf.
    
    Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>

diff --git a/arch/powerpc/include/asm/code-patching.h b/arch/powerpc/include/asm/code-patching.h
index a6f8c7a..15700b5 100644
--- a/arch/powerpc/include/asm/code-patching.h
+++ b/arch/powerpc/include/asm/code-patching.h
@@ -22,6 +22,16 @@
 #define BRANCH_SET_LINK	0x1
 #define BRANCH_ABSOLUTE	0x2
 
+#define XL_FORM_LR  0x4C000020
+#define XL_FORM_CTR 0x4C000420
+#define XL_FORM_TAR 0x4C000460
+
+#define BO_ALWAYS    0x02800000
+#define BO_CTR       0x02000000
+#define BO_CRBI_OFF  0x00800000
+#define BO_CRBI_ON   0x01800000
+#define BO_CRBI_HINT 0x00400000
+
 unsigned int create_branch(const unsigned int *addr,
 			   unsigned long target, int flags);
 unsigned int create_cond_branch(const unsigned int *addr,
@@ -49,4 +59,10 @@ static inline unsigned long ppc_function_entry(void *func)
 #endif
 }
 
+/* Perf branch filters */
+bool instr_is_return_branch(unsigned int instr);
+bool instr_is_conditional_branch(unsigned int instr);
+bool instr_is_func_call(unsigned int instr);
+bool instr_is_indirect_func_call(unsigned int instr);
+
 #endif /* _ASM_POWERPC_CODE_PATCHING_H */
diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index 17e5b23..ad39c58 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -77,6 +77,7 @@ static unsigned int branch_opcode(unsigned int instr)
 	return (instr >> 26) & 0x3F;
 }
 
+/* Forms of branch instruction */
 static int instr_is_branch_iform(unsigned int instr)
 {
 	return branch_opcode(instr) == 18;
@@ -87,6 +88,140 @@ static int instr_is_branch_bform(unsigned int instr)
 	return branch_opcode(instr) == 16;
 }
 
+static int instr_is_branch_xlform(unsigned int instr)
+{
+	return branch_opcode(instr) == 19;
+}
+
+/* Classification of XL-form instruction */
+static int is_xlform_lr(unsigned int instr)
+{
+	return (instr & XL_FORM_LR) == XL_FORM_LR;
+}
+
+static int is_xlform_ctr(unsigned int instr)
+{
+	return (instr & XL_FORM_CTR) == XL_FORM_CTR;
+}
+
+static int is_xlform_tar(unsigned int instr)
+{
+	return (instr & XL_FORM_TAR) == XL_FORM_TAR;
+}
+
+/* BO field analysis (B-form or XL-form) */
+static int is_bo_always(unsigned int instr)
+{
+	return (instr & BO_ALWAYS) == BO_ALWAYS;
+}
+
+static int is_bo_ctr(unsigned int instr)
+{
+	return (instr & BO_CTR) == BO_CTR;
+}
+
+static int is_bo_crbi_off(unsigned int instr)
+{
+	return (instr & BO_CRBI_OFF) == BO_CRBI_OFF;
+}
+
+static int is_bo_crbi_on(unsigned int instr)
+{
+	return (instr & BO_CRBI_ON) == BO_CRBI_ON;
+}
+
+static int is_bo_crbi_hint(unsigned int instr)
+{
+	return (instr & BO_CRBI_HINT) == BO_CRBI_HINT;
+}
+
+/* Link bit is set */
+static int is_branch_link_set(unsigned int instr)
+{
+	return (instr & BRANCH_SET_LINK) == BRANCH_SET_LINK;
+}
+
+/* Perf branch filters */
+bool instr_is_return_branch(unsigned int instr)
+{
+	/*
+	 * Conditional and unconditional branch to LR register
+	 * without setting the link register.
+	 */
+	if (is_xlform_lr(instr) && !is_branch_link_set(instr))
+		return true;
+
+	return false;
+}
+
+bool instr_is_conditional_branch(unsigned int instr)
+{
+	/* I-form instruction - excluded */
+	if (instr_is_branch_iform(instr))
+		return false;
+
+	/* B-form or XL-form instruction */
+	if (instr_is_branch_bform(instr) || instr_is_branch_xlform(instr))  {
+
+		/* Not branch always */
+		if (!is_bo_always(instr)) {
+
+			/* Conditional branch to CTR register */
+			if (is_bo_ctr(instr))
+				return false;
+
+			/* CR[BI] conditional branch with static hint */
+			if (is_bo_crbi_off(instr) || is_bo_crbi_on(instr)) {
+				if (is_bo_crbi_hint(instr))
+					return false;
+			}
+			return true;
+		}
+	}
+	return false;
+}
+
+bool instr_is_func_call(unsigned int instr)
+{
+	/* LR should be set */
+	if (is_branch_link_set(instr))
+		return true;
+
+	return false;
+}
+
+bool instr_is_indirect_func_call(unsigned int instr)
+{
+	/* XL-form instruction */
+	if (instr_is_branch_xlform(instr)) {
+
+		/* LR should be set */
+		if (is_branch_link_set(instr)) {
+			/*
+			 * Conditional and unconditional
+			 * branch to CTR register.
+			 */
+			if (is_xlform_ctr(instr))
+				return true;
+
+			/*
+			 * Conditional and unconditional
+			 * branch to LR register.
+			 */
+			if (is_xlform_lr(instr))
+				return true;
+
+			/*
+			 * Conditional and unconditional
+			 * branch to TAR register.
+			 */
+			if (is_xlform_tar(instr))
+				return true;
+		}
+	}
+	return false;
+}
+
 int instr_is_relative_branch(unsigned int instr)
 {
 	if (instr & BRANCH_ABSOLUTE)
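
As a cross-check on the classification above, here is a minimal standalone sketch that decodes the instruction fields explicitly instead of using mask-and-compare. It assumes the Power ISA encodings for the XL-form branches (primary opcode 19 with extended opcode 16 for bclr, 528 for bcctr and 560 for bctar); the sketch_* names are hypothetical and the conditional test is deliberately simpler than the one in the patch (it does not exclude CTR-decrementing or hinted branches). One point it makes visible: because 0x420 and 0x460 both contain the 0x20 bit, a mask-and-compare against XL_FORM_LR also accepts bcctr and bctar encodings.

```c
#include <stdbool.h>
#include <stdint.h>

/* Field extraction for a 32-bit Power ISA instruction word */
static inline uint32_t insn_opcode(uint32_t insn) { return insn >> 26; }
static inline uint32_t insn_xo(uint32_t insn) { return (insn >> 1) & 0x3FF; }
static inline uint32_t insn_bo(uint32_t insn) { return (insn >> 21) & 0x1F; }
static inline bool insn_lk(uint32_t insn) { return (insn & 1) != 0; }

/* Extended opcodes of the XL-form branches */
enum { XO_BCLR = 16, XO_BCCTR = 528, XO_BCTAR = 560 };

static bool is_xl_branch(uint32_t insn)
{
	uint32_t xo = insn_xo(insn);

	return insn_opcode(insn) == 19 &&
	       (xo == XO_BCLR || xo == XO_BCCTR || xo == XO_BCTAR);
}

/* Branch to LR without setting the link register */
bool sketch_is_return_branch(uint32_t insn)
{
	return insn_opcode(insn) == 19 && insn_xo(insn) == XO_BCLR &&
	       !insn_lk(insn);
}

/* Any I-form, B-form or XL-form branch with LK set */
bool sketch_is_func_call(uint32_t insn)
{
	uint32_t op = insn_opcode(insn);

	return (op == 18 || op == 16 || is_xl_branch(insn)) && insn_lk(insn);
}

/* XL-form branch (LR, CTR or TAR target) with LK set */
bool sketch_is_indirect_func_call(uint32_t insn)
{
	return is_xl_branch(insn) && insn_lk(insn);
}

/* B-form or XL-form branch whose BO field is not "branch always" (1z1zz) */
bool sketch_is_conditional_branch(uint32_t insn)
{
	if (insn_opcode(insn) != 16 && !is_xl_branch(insn))
		return false;

	return (insn_bo(insn) & 0x14) != 0x14;
}
```

For example, blr (0x4E800020) is a return, bctrl (0x4E800421) is an indirect call, and beq (0x41820008) is conditional.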




^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH V4 08/10] powerpc, perf: Enable SW filtering in branch stack sampling framework
  2013-12-09  6:21   ` Michael Ellerman
@ 2013-12-20 11:01       ` Anshuman Khandual
  2013-12-20 11:01       ` Anshuman Khandual
  1 sibling, 0 replies; 57+ messages in thread
From: Anshuman Khandual @ 2013-12-20 11:01 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: linuxppc-dev, linux-kernel, mikey, ak, eranian, acme, sukadev, mingo

On 12/09/2013 11:51 AM, Michael Ellerman wrote:
> On Wed, 2013-04-12 at 10:32:40 UTC, Anshuman Khandual wrote:
>> This patch enables SW based post processing of BHRB captured branches
>> to be able to meet more user defined branch filtration criteria in perf
>> branch stack sampling framework. These changes increase the number of
>> branch filters and their valid combinations on any powerpc64 server
>> platform with BHRB support. Find the summary of code changes here.
>>
>> (1) struct cpu_hw_events
>>
>> 	Introduced two new variables to track various filter values and the mask
>>
>> 	(a) bhrb_sw_filter	Tracks SW implemented branch filter flags
>> 	(b) filter_mask		Tracks both (SW and HW) branch filter flags
> 
> The name 'filter_mask' doesn't mean much to me. I'd rather it was 'bhrb_filter'.

Done.

> 
> 
>> (2) Event creation
>>
>> 	Kernel will figure out supported BHRB branch filters through a PMU call
>> 	back 'bhrb_filter_map'. This function will find out how many of the
>> 	requested branch filters can be supported in the PMU HW. It will not
>> 	try to invalidate any branch filter combinations. Event creation will not
>> 	error out because of lack of HW based branch filters. Meanwhile it will
>> 	track the overall supported branch filters in the "filter_mask" variable.
>>
>> 	Once the PMU call back returns kernel will process the user branch filter
>> 	request against available SW filters while looking at the "filter_mask".
>> 	During this phase all the branch filters which are still pending from the
>> 	user requested list will have to be supported in SW failing which the
>> 	event creation will error out.
>>
>> (3) SW branch filter
>>
>> 	During the BHRB data capture inside the PMU interrupt context, each
>> 	of the captured 'perf_branch_entry.from' will be checked for compliance
>> 	with applicable SW branch filters. If the entry does not conform to the
>> 	filter requirements, it will be discarded from the final perf branch
>> 	stack buffer.
>>
>> (4) Supported SW based branch filters
>>
>> 	(a) PERF_SAMPLE_BRANCH_ANY_RETURN
>> 	(b) PERF_SAMPLE_BRANCH_IND_CALL
>> 	(c) PERF_SAMPLE_BRANCH_ANY_CALL
>> 	(d) PERF_SAMPLE_BRANCH_COND
>>
>> 	Please refer to the patch to understand the classification of instructions into
>> 	these branch filter categories.
>>
>> (5) Multiple branch filter semantics
>>
>> 	Book3S server implementation follows the same OR semantics (as implemented in
>> 	x86) while dealing with multiple branch filters at any point of time. SW
>> 	branch filter analysis is carried out on the data set captured in the PMU HW.
>> 	So the resulting set of data (after applying the SW filters) will inherently
>> 	be an AND with the HW captured set. Hence any combination of HW and SW branch
>> 	filters will be invalid. HW based branch filters are more efficient and faster
>> 	compared to SW implemented branch filters. So at first the PMU should decide
>> 	whether it can support all the requested branch filters itself or not. In case
>> 	it can support all the branch filters in an OR manner, we don't apply any SW
>> 	branch filter on top of the HW captured set (which is the final set). This
>> 	preserves the OR semantic of multiple branch filters as required. But in case
>> 	where the PMU cannot support all the requested branch filters in an OR manner,
>> 	it should not apply any of its filters and leave it up to the SW to handle them
>> 	all. It's the PMU code's responsibility to uphold this protocol to be able to
>> 	conform to the overall OR semantic of perf branch stack sampling framework.
> 
> 
> I'd prefer this level of commentary was in a block comment in the code. It's
> much more likely to be seen by a future hacker than here in the commit log.
> 

I felt it was too long to sit in a block comment in the code, though I have substantially
improved the in-code documentation in the next version.
 
> 
>> diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
>> index 2de7d48..54d39a5 100644
>> --- a/arch/powerpc/perf/core-book3s.c
>> +++ b/arch/powerpc/perf/core-book3s.c
>> @@ -48,6 +48,8 @@ struct cpu_hw_events {
>>  
>>  	/* BHRB bits */
>>  	u64				bhrb_hw_filter;	/* BHRB HW branch filter */
>> +	u64				bhrb_sw_filter;	/* BHRB SW branch filter */
>> +	u64				filter_mask;	/* Branch filter mask */
>>  	int				bhrb_users;
>>  	void				*bhrb_context;
>>  	struct	perf_branch_stack	bhrb_stack;
>> @@ -400,6 +402,228 @@ static __u64 power_pmu_bhrb_to(u64 addr)
>>  	return target - (unsigned long)&instr + addr;
>>  }
>>  
>> +/*
>> + * Instruction opcode analysis
>> + *
>> + * Analyse instruction opcodes and classify them
>> + * into various branch filter options available.
>> + * This follows the standard semantics of OR which
>> + * means that instructions which conforms to `any`
>> + * of the requested branch filters get picked up.
>> + */
>> +static bool validate_instruction(unsigned int *addr, u64 bhrb_sw_filter)
>> +{
> 
> "validate" is not a good name here. That implies that this routine identifies
> "valid" and "invalid" instructions - but that's not really correct.
> 

Done.

validate_instruction --> check_instruction
 
> Also it's preferable to not use the same variable name for the local as for the
> cpuhw->bhrb_sw_filter global. Although technically it doesn't shadow the global
> it can still be confusing to a human, ie. me. A good name for the local would
> just be "sw_filter" because we know in this code that we're dealing with the
> BHRB.
> 

Done.

local variable bhrb_sw_filter ---> sw_filter
 
> 
>> +	bool result = false;
>> +
>> +	if (bhrb_sw_filter & PERF_SAMPLE_BRANCH_ANY_RETURN) {
>> +
>> +		/* XL-form instruction */
>> +		if (instr_is_branch_xlform(*addr)) {
>> +
>> +			/* LR should not be set */
>> +				/*
>> +			 	 * Conditional and unconditional
>> +			 	 * branch to LR register.
>> +			 	 */
>> +				if (is_xlform_lr(*addr))
>> +					result = true;
>> +			}
>> +		}
>> +	}
> 
> is_xlform_lr() implies instr_is_branch_xlform(), and once you get a hit you can
> short-circuit and exit the function, so this should boil down to just:
> 
> 	if (bhrb_sw_filter & PERF_SAMPLE_BRANCH_ANY_RETURN)
> 		if (is_xlform_lr(*addr) && !is_branch_link_set(*addr))
> 			return true;
> 

Done

> 
> Having said that I think it should move into a routine in code-patching as I
> said in the comments to the previous patch.
> 

Done

> 
>> +
>> +	if (bhrb_sw_filter & PERF_SAMPLE_BRANCH_IND_CALL) {
>> +		/* XL-form instruction */
>> +		if (instr_is_branch_xlform(*addr)) {
>> +
>> +			/* LR should be set */
>> +			if (is_branch_link_set(*addr)) {
>> +				/*
>> +			 	 * Conditional and unconditional
>> +			 	 * branch to CTR.
>> +			 	 */
>> +				if (is_xlform_ctr(*addr))
>> +					result = true;
>> +
>> +				/*
>> +			 	 * Conditional and unconditional
>> +			 	 * branch to LR.
>> +			 	 */
>> +				if (is_xlform_lr(*addr))
>> +					result = true;
>> +
>> +				/*
>> +			 	 * Conditional and unconditional
>> +			 	 * branch to TAR.
>> +			 	 */
>> +				if (is_xlform_tar(*addr))
>> +					result = true;
> 
> What other kind of XL-Form branch is there?

I am not sure. Do you know of any?

> 
>> +			}
>> +		}
>> +	}
> 
> The comments above all have a bogus leading space.
> 

Rectified.

>> +
>> +	/* Any-form branch */
>> +	if (bhrb_sw_filter & PERF_SAMPLE_BRANCH_ANY_CALL) {
>> +		/* LR should be set */
>> +		if (is_branch_link_set(*addr))
>> +			result = true;
> 
> Short circuit.
> 

Rectified.


>> +	}
>> +
>> +	if (bhrb_sw_filter & PERF_SAMPLE_BRANCH_COND) {
>> +
>> +		/* I-form instruction - excluded */
>> +		if (instr_is_branch_iform(*addr))
>> +			goto out;
>> +
>> +		/* B-form or XL-form instruction */
>> +		if (instr_is_branch_bform(*addr) || instr_is_branch_xlform(*addr))  {
>> +
>> +			/* Not branch always  */
>> +			if (!is_bo_always(*addr)) {
>> +
>> +				/* Conditional branch to CTR register */
>> +				if (is_bo_ctr(*addr))
>> +					goto out;
> 
> We might have discussed this but why not?

I did not follow that. Discuss what?

> 
>> +
>> +				/* CR[BI] conditional branch with static hint */
> 
> A conditional branch with a static hint is still a conditional branch?
>

No, it's not.
 
>> +				if (is_bo_crbi_off(*addr) || is_bo_crbi_on(*addr)) {
>> +					if (is_bo_crbi_hint(*addr))
>> +						goto out;
>> +				}
>> +
>> +				result = true;
>> +			}
>> +		}
>> +	}
>> +out:
>> +	return result;
>> +}
>> +
>> +static bool check_instruction(u64 addr, u64 bhrb_sw_filter)
>> +{
> 
> 
> "check" is not a very descriptive name here, especially when "check" calls
> "validate".
> 
> "filter" is also not good because a filter keeps some things and rejects others,
> and the directionality is not clear.
> 
> I'd suggest "filter_selects_branch()" or just "keep_branch()".
> 

keep_branch() now calls check_instruction()

> 
>> +	unsigned int instr;
>> +	bool ret;
>> +
>> +	if (bhrb_sw_filter == 0)
>> +		return true;
>> +
>> +	if (is_kernel_addr(addr)) {
>> +		ret = validate_instruction((unsigned int *) addr, bhrb_sw_filter);
> 
> No reason not to return directly here.
> 
> That would then remove the need for an else block.

Done.

> 
>> +	} else {
>> +		/*
>> +		 * Userspace address needs to be
>> +		 * copied first before analysis.
>> +		 */
>> +		pagefault_disable();
>> +		ret =  __get_user_inatomic(instr, (unsigned int __user *)addr);
> 
> I suspect you borrowed this incantation from the callchain code. Unlike that
> code you don't fallback to reading the page tables directly.
> 
> I'd rather see the accessor in the callchain code made generic and have you
> call it here.

You mentioned that you would take care of this issue yourself.

> 
>> +
>> +		/*
>> +		 * If the instruction could not be accessible
>> +		 * from user space, we still 'okay' the entry.
>> +		 */
>> +		if (ret) {
>> +			pagefault_enable();
>> +			return true;
>> +		}
>> +		pagefault_enable();
>> +		ret = validate_instruction(&instr, bhrb_sw_filter);
> 
> No reason not to return directly here.
> 

Done.


>> +	}
>> +	return ret;
>> +}
>> +
>> +/*
>> + * Validate whether all requested branch filters
>> + * are getting processed either in the PMU or in SW.
>> + */
>> +static int match_filters(u64 branch_sample_type, u64 filter_mask)
> 
> I don't really understand why we have this routine?
> 
> We should implement the filter in HW if we can, or in SW. Which filters can't we
> implement in SW?
>

As of now on POWER8, we implement all the filters either in HW or in SW. But this framework
allows a combined HW and SW branch filter implementation where the PMU HW supports
ORing of branch filters (which is not true for POWER8). This function just runs a sanity
check to make sure that all branch filters are covered either in HW or in SW. BTW, I changed
the name of the function from "match_filters" to "all_filters_covered".
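
The sanity check described here amounts to "every requested non-privilege filter bit must appear in the covered mask". A minimal standalone sketch of that idea follows; the flag values are stand-ins assumed for illustration, and the single bitmask test replaces the per-flag loop of the quoted patch.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t u64;

/* Stand-in values, assumed for illustration only. */
#define PERF_SAMPLE_BRANCH_USER		(1ULL << 0)
#define PERF_SAMPLE_BRANCH_KERNEL	(1ULL << 1)
#define PERF_SAMPLE_BRANCH_HV		(1ULL << 2)
#define PERF_SAMPLE_BRANCH_ANY		(1ULL << 3)
#define PERF_SAMPLE_BRANCH_ANY_RETURN	(1ULL << 5)
#define PERF_SAMPLE_BRANCH_COND		(1ULL << 10)
#define PERF_SAMPLE_BRANCH_PLM_ALL \
	(PERF_SAMPLE_BRANCH_USER | PERF_SAMPLE_BRANCH_KERNEL | PERF_SAMPLE_BRANCH_HV)

static bool all_filters_covered(u64 branch_sample_type, u64 bhrb_filter)
{
	/* No filtering requested at all */
	if (bhrb_filter == PERF_SAMPLE_BRANCH_ANY)
		return true;

	/* Privilege filters are handled by the base PMU configuration */
	branch_sample_type &= ~PERF_SAMPLE_BRANCH_PLM_ALL;

	/* Any requested bit missing from the covered mask is an error */
	return (branch_sample_type & ~bhrb_filter) == 0;
}
```

The caller would turn a false result into -EOPNOTSUPP at event creation time, as in the quoted hunk.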
 
>> +{
>> +	u64 x;
>> +
>> +	if (filter_mask == PERF_SAMPLE_BRANCH_ANY)
>> +		return true;
>> +
>> +	for_each_branch_sample_type(x) {
>> +		if (!(branch_sample_type & x))
>> +			continue;
>> +		/*
>> +		 * Privilege filter requests have been already
>> +		 * taken care during the base PMU configuration.
>> +		 */
>> +		if (x == PERF_SAMPLE_BRANCH_USER)
>> +			continue;
>> +		if (x == PERF_SAMPLE_BRANCH_KERNEL)
>> +			continue;
>> +		if (x == PERF_SAMPLE_BRANCH_HV)
>> +			continue;
>> +
>> +		/*
>> +		 * Requested filter not available either
>> +		 * in PMU or in SW.
>> +		 */
>> +		if (!(filter_mask & x))
>> +			return false;
>> +	}
>> +	return true;
>> +}
>> +
>> +/*
>> + * Required SW based branch filters
>> + *
>> + * This is called after figuring out what all branch filters the
>> + * PMU HW supports for the requested branch filter set. Here we
>> + * will go through all the SW implemented branch filters one by
>> + * one and pick them up if its not already supported in the PMU.
>> + */
>> +static u64 branch_filter_map(u64 branch_sample_type, u64 pmu_bhrb_filter,
>> +			     					u64 *filter_mask)
> 
> Whitespace is foobar here ^
> 

Will fix it.

> This function deals exclusively with the software filter IIUC, but the name
> doesn't indicate that in any way.

Correct, changed the name from branch_filter_map to bhrb_sw_filter_map which
will complement bhrb_filter_map used for figuring out PMU supported filters.

> 
> As far as the logic goes, you return the software filter value, as well as
> mutating the *filter_mask. And in all cases you make the same modification to
> both. That seems very dubious.
> 

Yeah, that's right. We use cpuhw->bhrb_sw_filter to apply SW filters on
branch records once they are captured from the BHRB, and cpuhw->bhrb_filter
(cpuhw->filter_mask before) to track the overall coverage of branch filters either
in HW or SW. When we modify bhrb_filter (filter_mask before) inside this function,
it already contains the branch filters which the PMU has promised to implement
for this session.
 
> Shouldn't this routine just setup the software filter, and leave the upper
> level code to deal with the HW & SW filter values?
> 

bhrb_filter (filter_mask before) runs through two functions one after the other: first
the PMU specific bhrb_filter_map, to figure out the available HW filters for the session,
and then bhrb_sw_filter_map, to figure out the available SW filters for the session. There is
no higher level code dealing with the bhrb_filter mask.
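
The two-stage flow can be sketched in standalone form as below. This is a model under stated assumptions, not the code from the series: model_pmu_filter_map and resolve_filters are hypothetical names, the flag and MMCRA values are stand-ins, and the PMU stage is assumed to record the perf flags it claims in *bhrb_filter so that the SW stage only picks up what is still uncovered.

```c
#include <stdint.h>

typedef uint64_t u64;

/* Stand-in values, assumed for illustration only. */
#define PERF_SAMPLE_BRANCH_USER		(1ULL << 0)
#define PERF_SAMPLE_BRANCH_KERNEL	(1ULL << 1)
#define PERF_SAMPLE_BRANCH_HV		(1ULL << 2)
#define PERF_SAMPLE_BRANCH_ANY		(1ULL << 3)
#define PERF_SAMPLE_BRANCH_ANY_CALL	(1ULL << 4)
#define PERF_SAMPLE_BRANCH_ANY_RETURN	(1ULL << 5)
#define PERF_SAMPLE_BRANCH_IND_CALL	(1ULL << 6)
#define PERF_SAMPLE_BRANCH_COND		(1ULL << 10)
#define PERF_SAMPLE_BRANCH_PLM_ALL \
	(PERF_SAMPLE_BRANCH_USER | PERF_SAMPLE_BRANCH_KERNEL | PERF_SAMPLE_BRANCH_HV)

#define POWER8_MMCRA_IFM1	0x0000000040000000ULL
#define POWER8_MMCRA_IFM3	0x00000000C0000000ULL

struct bhrb_filters {
	u64 hw;			/* value for the HW filter field */
	u64 sw;			/* perf flags to be filtered in SW */
	u64 bhrb_filter;	/* perf flags covered by HW or SW */
};

/* Stage 1: the PMU claims a filter only if it can honour the whole request */
static u64 model_pmu_filter_map(u64 type, u64 *bhrb_filter)
{
	type &= ~PERF_SAMPLE_BRANCH_PLM_ALL;
	*bhrb_filter = 0;

	if (type == PERF_SAMPLE_BRANCH_ANY) {
		*bhrb_filter = PERF_SAMPLE_BRANCH_ANY;
		return 0;
	}
	if (type == PERF_SAMPLE_BRANCH_ANY_CALL) {
		*bhrb_filter = PERF_SAMPLE_BRANCH_ANY_CALL;
		return POWER8_MMCRA_IFM1;
	}
	if (type == PERF_SAMPLE_BRANCH_COND) {
		*bhrb_filter = PERF_SAMPLE_BRANCH_COND;
		return POWER8_MMCRA_IFM3;
	}
	return 0;	/* claim nothing, leave it all to SW */
}

/* Stage 2: SW picks up every supported filter the PMU did not claim */
static u64 bhrb_sw_filter_map(u64 type, u64 *bhrb_filter)
{
	const u64 sw_capable = PERF_SAMPLE_BRANCH_ANY_CALL |
			       PERF_SAMPLE_BRANCH_COND |
			       PERF_SAMPLE_BRANCH_ANY_RETURN |
			       PERF_SAMPLE_BRANCH_IND_CALL;
	u64 sw;

	if (type & PERF_SAMPLE_BRANCH_ANY)
		return 0;

	sw = type & sw_capable & ~*bhrb_filter;
	*bhrb_filter |= sw;
	return sw;
}

static struct bhrb_filters resolve_filters(u64 type)
{
	struct bhrb_filters f = { 0, 0, 0 };

	f.hw = model_pmu_filter_map(type, &f.bhrb_filter);
	f.sw = bhrb_sw_filter_map(type, &f.bhrb_filter);
	return f;
}
```

With this model a lone COND request ends up entirely in HW, while COND combined with ANY_RETURN ends up entirely in SW, which is the OR-preserving protocol described in the commit log.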

>> +{
>> +	u64 branch_sw_filter = 0;
>> +
>> +	/* No branch filter requested */
>> +	if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY) {
>> +		WARN_ON(pmu_bhrb_filter != 0);
>> +		WARN_ON(*filter_mask != PERF_SAMPLE_BRANCH_ANY);
>> +		return branch_sw_filter;
>> +	}
>> +
>> +	/*
>> +	 * PMU supported branch filters must also be implemented in SW
>> +	 * in the event when the PMU is unable to process them for some
>> +	 * reason. This all those branch filters can be satisfied with
>> +	 * SW implemented filters. But right now, there is now way to
>> +	 * initimate the user about this decision.
> 
> Please proof read these comments, I don't entirely follow this one.
> 
> You say "must also be implemented in SW" - but I think it's actually "must be
> implemented in SW", ie. the HW is not "also" implementing the filter.
> 
> You say "in the event when" but I think you just mean "when" - the word "event"
> has a particular meaning in this code so you should only use it for that if at
> all possible.
> 
> I don't follow "This all those".
> 
> You should just drop the last sentence, there is never going to be any way to
> notify the user that their filter is implemented in HW vs SW, that's an
> implementation detail.
> 

Took care of these observations.

>> +	 */
>> +	if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY_CALL) {
>> +		if (!(pmu_bhrb_filter & PERF_SAMPLE_BRANCH_ANY_CALL)) {
>> +			branch_sw_filter |= PERF_SAMPLE_BRANCH_ANY_CALL;
>> +			*filter_mask |= PERF_SAMPLE_BRANCH_ANY_CALL;
>> +		}
>> +	}
>> +
>> +	if (branch_sample_type & PERF_SAMPLE_BRANCH_COND) {
>> +		if (!(pmu_bhrb_filter & PERF_SAMPLE_BRANCH_COND)) {
>> +			branch_sw_filter |= PERF_SAMPLE_BRANCH_COND;
>> +			*filter_mask |= PERF_SAMPLE_BRANCH_COND;
>> +		}
>> +	}
>> +
>> +	if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY_RETURN) {
>> +		if (!(pmu_bhrb_filter & PERF_SAMPLE_BRANCH_ANY_RETURN)) {
>> +			branch_sw_filter |= PERF_SAMPLE_BRANCH_ANY_RETURN;
>> +			*filter_mask |= PERF_SAMPLE_BRANCH_ANY_RETURN;
>> +		}
>> +	}
>> +
>> +	if (branch_sample_type & PERF_SAMPLE_BRANCH_IND_CALL) {
>> +		if (!(pmu_bhrb_filter & PERF_SAMPLE_BRANCH_IND_CALL)) {
>> +			branch_sw_filter |= PERF_SAMPLE_BRANCH_IND_CALL;
>> +			*filter_mask |= PERF_SAMPLE_BRANCH_IND_CALL;
>> +		}
>> +	}
>> +
>> +	return branch_sw_filter;
>> +}
>> +
>>  /* Processing BHRB entries */
>>  void power_pmu_bhrb_read(struct cpu_hw_events *cpuhw)
>>  {
>> @@ -459,17 +683,29 @@ void power_pmu_bhrb_read(struct cpu_hw_events *cpuhw)
>>  					addr = 0;
>>  				}
>>  				cpuhw->bhrb_entries[u_index].from = addr;
>> +
>> +				if (!check_instruction(cpuhw->
>> +						bhrb_entries[u_index].from,
>> +							cpuhw->bhrb_sw_filter))
>> +					u_index--;
>>  			} else {
>>  				/* Branches to immediate field 
>>  				   (ie I or B form) */
>>  				cpuhw->bhrb_entries[u_index].from = addr;
>> -				cpuhw->bhrb_entries[u_index].to =
>> -					power_pmu_bhrb_to(addr);
>> -				cpuhw->bhrb_entries[u_index].mispred = pred;
>> -				cpuhw->bhrb_entries[u_index].predicted = ~pred;
>> +				if (check_instruction(cpuhw->
>> +						bhrb_entries[u_index].from,
>> +						cpuhw->bhrb_sw_filter)) {
>> +					cpuhw->bhrb_entries[u_index].
>> +						to = power_pmu_bhrb_to(addr);
>> +					cpuhw->bhrb_entries[u_index].
>> +						mispred = pred;
>> +					cpuhw->bhrb_entries[u_index].
>> +						predicted = ~pred;
>> +				} else {
>> +					u_index--;
>> +				}
>>  			}
>>  			u_index++;
> 
> 
> This code was already in need of some unindentation, and now it's just
> ridiculous.
> 
> To start with at the beginning of this routine we have:
> 
> while (..) {
> 	if (!val)
> 		break;
> 	else {
> 		// Bulk of the logic
> 		...
> 	}
> }
> 
> That should almost always become:
> 
> while (..) {
> 	if (!val)
> 		break;
> 
> 	// Bulk of the logic
> 	...
> }
> 
> 
> But in this case that's not enough. Please send a precursor patch which moves
> this logic out into a helper function.
> 

Done

> 
>> -
>>  		}
>>  	}
>>  	cpuhw->bhrb_stack.nr = u_index;
>> @@ -1255,7 +1491,11 @@ nocheck:
>>  	if (has_branch_stack(event)) {
>>  		power_pmu_bhrb_enable(event);
>>  		cpuhw->bhrb_hw_filter = ppmu->bhrb_filter_map(
>> -					event->attr.branch_sample_type);
>> +					event->attr.branch_sample_type,
>> +					&cpuhw->filter_mask);
>> +		cpuhw->bhrb_sw_filter = branch_filter_map
>> +					(event->attr.branch_sample_type,
>> +					cpuhw->bhrb_hw_filter, &cpuhw->filter_mask);
>>  	}
>>  
>>  	perf_pmu_enable(event->pmu);
>> @@ -1637,10 +1877,16 @@ static int power_pmu_event_init(struct perf_event *event)
>>  	err = power_check_constraints(cpuhw, events, cflags, n + 1);
>>  
>>  	if (has_branch_stack(event)) {
>> -		cpuhw->bhrb_hw_filter = ppmu->bhrb_filter_map(
>> -					event->attr.branch_sample_type);
>> -
>> -		if(cpuhw->bhrb_hw_filter == -1)
>> +		cpuhw->bhrb_hw_filter = ppmu->bhrb_filter_map
>> +				(event->attr.branch_sample_type,
>> +				&cpuhw->filter_mask);
>> +		cpuhw->bhrb_sw_filter = branch_filter_map
>> +				(event->attr.branch_sample_type,
>> +				cpuhw->bhrb_hw_filter,
>> +				&cpuhw->filter_mask);
>> +
>> +		if(!match_filters(event->attr.branch_sample_type,
>> +						cpuhw->filter_mask))
>>  			return -EOPNOTSUPP;
> 
> The above two hunks look too similar for my liking.

Moved the SW filter check below the else block so that it is common to both types of branches.
The original placement was meant to save some cycles by not accessing user space
(power_pmu_bhrb_to) when we already know that the "from" address is not going to pass the
SW branch filter check.
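
A standalone model of the resulting loop shape, with a single filter check after the entry is filled in, might look like the following. It is a sketch only: the struct, the callbacks and the demo_* helpers are hypothetical, and it models just the immediate-form case where the target is derived from the "from" address.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct branch_entry {
	uint64_t from;
	uint64_t to;
};

typedef bool (*keep_fn)(uint64_t from, uint64_t sw_filter);
typedef uint64_t (*target_fn)(uint64_t from);

/*
 * Fill the slot at u_index, then advance only if the SW filter keeps it.
 * A rejected entry is simply overwritten by the next one, so there is no
 * need for the u_index-- / u_index++ pairing.
 */
static size_t collect_branches(const uint64_t *raw, size_t n,
			       uint64_t sw_filter, keep_fn keep,
			       target_fn target, struct branch_entry *out)
{
	size_t i, u_index = 0;

	for (i = 0; i < n; i++) {
		out[u_index].from = raw[i];
		out[u_index].to = target(raw[i]);

		if (keep(out[u_index].from, sw_filter))
			u_index++;
	}
	return u_index;
}

/* Demo helpers: keep addresses with bit 2 clear, target is from + 0x100 */
static bool demo_keep(uint64_t from, uint64_t sw_filter)
{
	(void)sw_filter;
	return (from & 0x4) == 0;
}

static uint64_t demo_target(uint64_t from)
{
	return from + 0x100;
}

static struct branch_entry demo_out[4];

static size_t demo_run(void)
{
	static const uint64_t raw[] = { 0x1000, 0x1004, 0x1008, 0x100c };

	return collect_branches(raw, 4, 1, demo_keep, demo_target, demo_out);
}
```

The trade-off is the one mentioned above: with a common check the target lookup runs even for entries that the filter then rejects.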


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH V4 08/10] powerpc, perf: Enable SW filtering in branch stack sampling framework
@ 2013-12-20 11:01       ` Anshuman Khandual
  0 siblings, 0 replies; 57+ messages in thread
From: Anshuman Khandual @ 2013-12-20 11:01 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: mikey, ak, linux-kernel, eranian, linuxppc-dev, acme, sukadev, mingo

On 12/09/2013 11:51 AM, Michael Ellerman wrote:
> On Wed, 2013-04-12 at 10:32:40 UTC, Anshuman Khandual wrote:
>> This patch enables SW based post processing of BHRB captured branches
>> to be able to meet more user defined branch filtration criteria in perf
>> branch stack sampling framework. These changes increase the number of
>> branch filters and their valid combinations on any powerpc64 server
>> platform with BHRB support. Find the summary of code changes here.
>>
>> (1) struct cpu_hw_events
>>
>> 	Introduced two new variables to track various filter values and the mask
>>
>> 	(a) bhrb_sw_filter	Tracks SW implemented branch filter flags
>> 	(b) filter_mask		Tracks both (SW and HW) branch filter flags
> 
> The name 'filter_mask' doesn't mean much to me. I'd rather it was 'bhrb_filter'.

Done.

> 
> 
>> (2) Event creation
>>
>> 	Kernel will figure out supported BHRB branch filters through a PMU call
>> 	back 'bhrb_filter_map'. This function will find out how many of the
>> 	requested branch filters can be supported in the PMU HW. It will not
>> 	try to invalidate any branch filter combinations. Event creation will not
>> 	error out because of lack of HW based branch filters. Meanwhile it will
>> 	track the overall supported branch filters in the "filter_mask" variable.
>>
>> 	Once the PMU call back returns kernel will process the user branch filter
>> 	request against available SW filters while looking at the "filter_mask".
>> 	During this phase all the branch filters which are still pending from the
>> 	user requested list will have to be supported in SW failing which the
>> 	event creation will error out.
>>
>> (3) SW branch filter
>>
>> 	During the BHRB data capture inside the PMU interrupt context, each
>> 	of the captured 'perf_branch_entry.from' will be checked for compliance
>> 	with applicable SW branch filters. If the entry does not conform to the
>> 	filter requirements, it will be discarded from the final perf branch
>> 	stack buffer.
>>
>> (4) Supported SW based branch filters
>>
>> 	(a) PERF_SAMPLE_BRANCH_ANY_RETURN
>> 	(b) PERF_SAMPLE_BRANCH_IND_CALL
>> 	(c) PERF_SAMPLE_BRANCH_ANY_CALL
>> 	(d) PERF_SAMPLE_BRANCH_COND
>>
>> 	Please refer to the patch to understand the classification of instructions into
>> 	these branch filter categories.
>>
>> (5) Multiple branch filter semantics
>>
>> 	Book3S server implementation follows the same OR semantics (as implemented in
>> 	x86) while dealing with multiple branch filters at any point of time. SW
>> 	branch filter analysis is carried on the data set captured in the PMU HW.
>> 	So the resulting set of data (after applying the SW filters) will inherently
>> 	be an AND with the HW captured set. Hence any combination of HW and SW branch
>> 	filters will be invalid. HW based branch filters are more efficient and faster
>> 	compared to SW implemented branch filters. So at first the PMU should decide
>> 	whether it can support all the requested branch filters itself or not. In case
>> 	it can support all the branch filters in an OR manner, we don't apply any SW
>> 	branch filter on top of the HW captured set (which is the final set). This
>> 	preserves the OR semantic of multiple branch filters as required. But in case
>> 	where the PMU cannot support all the requested branch filters in an OR manner,
>> 	it should not apply any of its filters and leave it up to the SW to handle them
>> 	all. It's the PMU code's responsibility to uphold this protocol to be able to
>> 	conform to the overall OR semantic of perf branch stack sampling framework.
> 
> 
> I'd prefer this level of commentary was in a block comment in the code. It's
> much more likely to be seen by a future hacker than here in the commit log.
> 

I felt it was too big to go inside the code. However, I have improved the in-code
documentation substantially in the next version.
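
As an illustration of the AND effect described in section (5) of the commit log,
here is a minimal user-space sketch, not kernel code. The BR_* bits and all
function names are hypothetical stand-ins for the PERF_SAMPLE_BRANCH_* flags and
the PMU/SW filter stages. It shows that filtering in HW for one class and then
in SW for another yields the intersection, while leaving the whole OR-ed request
to SW keeps the expected union.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical branch classes; the kernel uses PERF_SAMPLE_BRANCH_* bits. */
#define BR_ANY_CALL (1u << 0)
#define BR_COND     (1u << 1)

/* Keep entries whose class bits intersect 'filter'; returns the new count. */
static size_t apply_filter(uint32_t *cls, size_t n, uint32_t filter)
{
	size_t i, kept = 0;

	for (i = 0; i < n; i++)
		if (cls[i] & filter)
			cls[kept++] = cls[i];
	return kept;
}

/* Mixed mode: the "PMU" keeps calls, SW then keeps conditionals (an AND). */
static size_t mixed_hw_then_sw(uint32_t *cls, size_t n)
{
	n = apply_filter(cls, n, BR_ANY_CALL);	/* stage modelled as HW */
	return apply_filter(cls, n, BR_COND);	/* stage modelled as SW */
}

/* The "PMU" captures everything; SW applies the whole OR-ed request. */
static size_t all_in_sw(uint32_t *cls, size_t n)
{
	return apply_filter(cls, n, BR_ANY_CALL | BR_COND);
}
```

With one call, two conditional branches and one unclassified branch in the
buffer, the mixed mode keeps nothing, whereas the SW-only mode keeps three
entries. This is why the protocol asks the PMU to handle either all of the
requested filters or none of them.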
 
> 
>> diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
>> index 2de7d48..54d39a5 100644
>> --- a/arch/powerpc/perf/core-book3s.c
>> +++ b/arch/powerpc/perf/core-book3s.c
>> @@ -48,6 +48,8 @@ struct cpu_hw_events {
>>  
>>  	/* BHRB bits */
>>  	u64				bhrb_hw_filter;	/* BHRB HW branch filter */
>> +	u64				bhrb_sw_filter;	/* BHRB SW branch filter */
>> +	u64				filter_mask;	/* Branch filter mask */
>>  	int				bhrb_users;
>>  	void				*bhrb_context;
>>  	struct	perf_branch_stack	bhrb_stack;
>> @@ -400,6 +402,228 @@ static __u64 power_pmu_bhrb_to(u64 addr)
>>  	return target - (unsigned long)&instr + addr;
>>  }
>>  
>> +/*
>> + * Instruction opcode analysis
>> + *
>> + * Analyse instruction opcodes and classify them
>> + * into various branch filter options available.
>> + * This follows the standard semantics of OR which
>> + * means that instructions which conforms to `any`
>> + * of the requested branch filters get picked up.
>> + */
>> +static bool validate_instruction(unsigned int *addr, u64 bhrb_sw_filter)
>> +{
> 
> "validate" is not a good name here. That implies that this routine identifies
> "valid" and "invalid" instructions - but that's not really correct.
> 

Done.

validate_instruction --> check_instruction
 
> Also it's preferable to not use the same variable name for the local as for the
> cpuhw->bhrb_sw_filter global. Although technically it doesn't shadow the global
> it can still be confusing to a human, ie. me. A good name for the local would
> just be "sw_filter" because we know in this code that we're dealing with the
> BHRB.
> 

Done.

local variable bhrb_sw_filter ---> sw_filter
 
> 
>> +	bool result = false;
>> +
>> +	if (bhrb_sw_filter & PERF_SAMPLE_BRANCH_ANY_RETURN) {
>> +
>> +		/* XL-form instruction */
>> +		if (instr_is_branch_xlform(*addr)) {
>> +
>> +			/* LR should not be set */
>> +				/*
>> +			 	 * Conditional and unconditional
>> +			 	 * branch to LR register.
>> +			 	 */
>> +				if (is_xlform_lr(*addr))
>> +					result = true;
>> +			}
>> +		}
>> +	}
> 
> is_xform_lr() implies instr_is_branch_xlform(), and once you get a hit you can
> short-circuit and exit the function, so this should boil down to just:
> 
> 	if (bhrb_sw_filter & PERF_SAMPLE_BRANCH_ANY_RETURN)
> 		if (is_xlform_lr(*addr) && !is_branch_link_set(*addr))
> 			return true;
> 

Done

> 
> Having said that I think it should move into a routine in code-patching as I
> said in the comments to the previous patch.
> 

Done

> 
>> +
>> +	if (bhrb_sw_filter & PERF_SAMPLE_BRANCH_IND_CALL) {
>> +		/* XL-form instruction */
>> +		if (instr_is_branch_xlform(*addr)) {
>> +
>> +			/* LR should be set */
>> +			if (is_branch_link_set(*addr)) {
>> +				/*
>> +			 	 * Conditional and unconditional
>> +			 	 * branch to CTR.
>> +			 	 */
>> +				if (is_xlform_ctr(*addr))
>> +					result = true;
>> +
>> +				/*
>> +			 	 * Conditional and unconditional
>> +			 	 * branch to LR.
>> +			 	 */
>> +				if (is_xlform_lr(*addr))
>> +					result = true;
>> +
>> +				/*
>> +			 	 * Conditional and unconditional
>> +			 	 * branch to TAR.
>> +			 	 */
>> +				if (is_xlform_tar(*addr))
>> +					result = true;
> 
> What other kind of XL-Form branch is there?

I am not sure. Do you know of any ?

> 
>> +			}
>> +		}
>> +	}
> 
> The comments above all have a bogus leading space.
> 

Rectified.

>> +
>> +	/* Any-form branch */
>> +	if (bhrb_sw_filter & PERF_SAMPLE_BRANCH_ANY_CALL) {
>> +		/* LR should be set */
>> +		if (is_branch_link_set(*addr))
>> +			result = true;
> 
> Short circuit.
> 

Rectified.
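
For reference, the short-circuited checks suggested in this review could look
roughly like the sketch below. It is a user-space approximation under stated
assumptions: the F_* bits and every helper name are hypothetical (they are not
the code-patching helpers from this series), and the opcode tests rely only on
the Power ISA encodings (primary opcode 16 for B-form, 18 for I-form, 19 with
extended opcode 16/528/560 for bclr/bcctr/bctar, LK in the lowest bit).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical filter bits standing in for PERF_SAMPLE_BRANCH_*. */
#define F_ANY_CALL   (1u << 0)
#define F_ANY_RETURN (1u << 1)
#define F_IND_CALL   (1u << 2)

/* Field extraction for a 32-bit Power ISA instruction word. */
static uint32_t primary_opcode(uint32_t instr) { return instr >> 26; }
static uint32_t xl_ext_opcode(uint32_t instr)  { return (instr >> 1) & 0x3ff; }
static bool link_set(uint32_t instr)           { return instr & 1; }

/* XL-form branches: bclr (XO 16), bcctr (XO 528), bctar (XO 560). */
static bool is_xl_branch(uint32_t instr)
{
	uint32_t xo = xl_ext_opcode(instr);

	return primary_opcode(instr) == 19 &&
	       (xo == 16 || xo == 528 || xo == 560);
}

static bool is_branch_to_lr(uint32_t instr)
{
	return primary_opcode(instr) == 19 && xl_ext_opcode(instr) == 16;
}

static bool is_any_branch(uint32_t instr)
{
	uint32_t op = primary_opcode(instr);

	return op == 16 || op == 18 || is_xl_branch(instr);
}

/* OR semantics: return as soon as any requested filter selects the branch. */
static bool sw_filter_selects(uint32_t instr, uint32_t sw_filter)
{
	if ((sw_filter & F_ANY_RETURN) &&
	    is_branch_to_lr(instr) && !link_set(instr))
		return true;

	if ((sw_filter & F_IND_CALL) &&
	    is_xl_branch(instr) && link_set(instr))
		return true;

	if ((sw_filter & F_ANY_CALL) &&
	    is_any_branch(instr) && link_set(instr))
		return true;

	return false;
}
```

For example, blr (0x4e800020) is selected only by the return filter, bctrl
(0x4e800421) by the indirect call filter, and bl (0x48000001) by the any-call
filter.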


>> +	}
>> +
>> +	if (bhrb_sw_filter & PERF_SAMPLE_BRANCH_COND) {
>> +
>> +		/* I-form instruction - excluded */
>> +		if (instr_is_branch_iform(*addr))
>> +			goto out;
>> +
>> +		/* B-form or XL-form instruction */
>> +		if (instr_is_branch_bform(*addr) || instr_is_branch_xlform(*addr))  {
>> +
>> +			/* Not branch always  */
>> +			if (!is_bo_always(*addr)) {
>> +
>> +				/* Conditional branch to CTR register */
>> +				if (is_bo_ctr(*addr))
>> +					goto out;
> 
> We might have discussed this but why not?

Did not get that, discuss what ?

> 
>> +
>> +				/* CR[BI] conditional branch with static hint */
> 
> A conditional branch with a static hint is still a conditional branch?
>

No its not. 
 
>> +				if (is_bo_crbi_off(*addr) || is_bo_crbi_on(*addr)) {
>> +					if (is_bo_crbi_hint(*addr))
>> +						goto out;
>> +				}
>> +
>> +				result = true;
>> +			}
>> +		}
>> +	}
>> +out:
>> +	return result;
>> +}
>> +
>> +static bool check_instruction(u64 addr, u64 bhrb_sw_filter)
>> +{
> 
> 
> "check" is not a very descriptive name here, especially when "check" calls
> "validate".
> 
> "filter" is also not good because a filter keeps some things and rejects others,
> and the directionality is not clear.
> 
> I'd suggest "filter_selects_branch()" or just "keep_branch()".
> 

keep_branch() now calls check_instruction()

> 
>> +	unsigned int instr;
>> +	bool ret;
>> +
>> +	if (bhrb_sw_filter == 0)
>> +		return true;
>> +
>> +	if (is_kernel_addr(addr)) {
>> +		ret = validate_instruction((unsigned int *) addr, bhrb_sw_filter);
> 
> No reason not to return directly here.
> 
> That would then remove the need for an else block.

Done.

> 
>> +	} else {
>> +		/*
>> +		 * Userspace address needs to be
>> +		 * copied first before analysis.
>> +		 */
>> +		pagefault_disable();
>> +		ret =  __get_user_inatomic(instr, (unsigned int __user *)addr);
> 
> I suspect you borrowed this incantation from the callchain code. Unlike that
> code you don't fallback to reading the page tables directly.
> 
> I'd rather see the accessor in the callchain code made generic and have you
> call it here.

You have mentioned to take care of this issue yourself.

> 
>> +
>> +		/*
>> +		 * If the instruction could not be accessible
>> +		 * from user space, we still 'okay' the entry.
>> +		 */
>> +		if (ret) {
>> +			pagefault_enable();
>> +			return true;
>> +		}
>> +		pagefault_enable();
>> +		ret = validate_instruction(&instr, bhrb_sw_filter);
> 
> No reason not to return directly here.
> 

Done.
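
The control flow with direct returns might be sketched as below. This is a
user-space outline only, under the assumption that the kernel/user distinction
and the pagefault-safe read are hidden behind a reader callback; the callback
types and the three example callbacks are hypothetical and exist only to
exercise the flow.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-ins: in the kernel these would be the real
 * instruction classifier and a pagefault-safe instruction read.
 */
typedef bool (*classify_fn)(uint32_t instr, uint64_t sw_filter);
typedef int (*read_instr_fn)(uint64_t addr, uint32_t *instr);

static bool keep_branch(uint64_t from, uint64_t sw_filter,
			read_instr_fn read_instr, classify_fn classify)
{
	uint32_t instr;

	/* No SW filter requested: keep every captured branch. */
	if (sw_filter == 0)
		return true;

	/* If the instruction cannot be read, still keep the entry. */
	if (read_instr(from, &instr))
		return true;

	return classify(instr, sw_filter);
}

/* Example callbacks used only to exercise the control flow. */
static int read_ok(uint64_t addr, uint32_t *instr)
{
	*instr = (uint32_t)addr;	/* pretend the address is the opcode */
	return 0;
}

static int read_fault(uint64_t addr, uint32_t *instr)
{
	(void)addr;
	(void)instr;
	return -1;			/* pretend the read faulted */
}

static bool lk_bit_set(uint32_t instr, uint64_t sw_filter)
{
	(void)sw_filter;
	return instr & 1;
}
```

Every path returns directly, so neither the local result variable nor the else
block is needed.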


>> +	}
>> +	return ret;
>> +}
>> +
>> +/*
>> + * Validate whether all requested branch filters
>> + * are getting processed either in the PMU or in SW.
>> + */
>> +static int match_filters(u64 branch_sample_type, u64 filter_mask)
> 
> I don't really understand why we have this routine?
> 
> We should implement the filter in HW if we can, or in SW. Which filters can't we
> implement in SW?
>

As of now on POWER8, we implement all the filters either in HW or in SW. But this
framework also allows a combined HW and SW branch filter implementation where the PMU
HW supports ORing of branch filters (which is not true for POWER8). This function just
runs a sanity check to make sure that all requested branch filters are covered either
in HW or in SW. BTW, I changed the name of the function from "match_filters" to
"all_filters_covered".
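
A bitwise equivalent of that sanity check could be sketched as follows. The
BR_* values are hypothetical placeholders rather than the real
PERF_SAMPLE_BRANCH_* constants, and the sketch assumes the same rules as the
quoted loop: a mask equal to "any" passes immediately, and privilege bits are
ignored because they are handled during base PMU configuration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit layout, for illustration only. */
#define BR_USER       (1ull << 0)
#define BR_KERNEL     (1ull << 1)
#define BR_HV         (1ull << 2)
#define BR_ANY        (1ull << 3)
#define BR_ANY_CALL   (1ull << 4)
#define BR_ANY_RETURN (1ull << 5)
#define BR_IND_CALL   (1ull << 6)
#define BR_COND       (1ull << 7)

#define BR_PRIV_MASK  (BR_USER | BR_KERNEL | BR_HV)

static bool all_filters_covered(uint64_t requested, uint64_t covered)
{
	if (covered == BR_ANY)
		return true;

	/* Privilege bits are handled during base PMU configuration. */
	requested &= ~BR_PRIV_MASK;

	/* Every remaining requested bit must be covered in HW or SW. */
	return (requested & ~covered) == 0;
}
```

A caller would return -EOPNOTSUPP from event init when this returns false.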
 
>> +{
>> +	u64 x;
>> +
>> +	if (filter_mask == PERF_SAMPLE_BRANCH_ANY)
>> +		return true;
>> +
>> +	for_each_branch_sample_type(x) {
>> +		if (!(branch_sample_type & x))
>> +			continue;
>> +		/*
>> +		 * Privilege filter requests have been already
>> +		 * taken care during the base PMU configuration.
>> +		 */
>> +		if (x == PERF_SAMPLE_BRANCH_USER)
>> +			continue;
>> +		if (x == PERF_SAMPLE_BRANCH_KERNEL)
>> +			continue;
>> +		if (x == PERF_SAMPLE_BRANCH_HV)
>> +			continue;
>> +
>> +		/*
>> +		 * Requested filter not available either
>> +		 * in PMU or in SW.
>> +		 */
>> +		if (!(filter_mask & x))
>> +			return false;
>> +	}
>> +	return true;
>> +}
>> +
>> +/*
>> + * Required SW based branch filters
>> + *
>> + * This is called after figuring out what all branch filters the
>> + * PMU HW supports for the requested branch filter set. Here we
>> + * will go through all the SW implemented branch filters one by
>> + * one and pick them up if its not already supported in the PMU.
>> + */
>> +static u64 branch_filter_map(u64 branch_sample_type, u64 pmu_bhrb_filter,
>> +			     					u64 *filter_mask)
> 
> Whitespace is foobar here ^
> 

Will fix it.

> This function deals exclusively with the software filter IIUI, but the name
> doesn't indicate that in any way.

Correct, changed the name from branch_filter_map to bhrb_sw_filter_map which
will complement bhrb_filter_map used for figuring out PMU supported filters.

> 
> As far as the logic goes, you return the software filter value, as well as
> mutating the *filter_mask. And in all cases you make the same modification to
> both. That seems very dubious.
> 

Yeah, that's right. We use cpuhw->bhrb_sw_filter to apply SW filters on branch
records once they are captured from the BHRB, and cpuhw->bhrb_filter (previously
cpuhw->filter_mask) to track the overall coverage of branch filters in either HW
or SW. By the time we modify bhrb_filter (previously filter_mask) inside this
function, it already contains the branch filters which the PMU has promised to
implement for this session.
 
> Shouldn't this routine just setup the software filter, and leave the upper
> level code to deal with the HW & SW filter values?
> 

bhrb_filter (previously filter_mask) runs through two functions, one after the other:
first the PMU specific bhrb_filter_map, which figures out the HW filters available for
the session, and then bhrb_sw_filter_map, which figures out the SW filters available
for the session. There is no higher level code dealing with the bhrb_filter mask.
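
One possible shape for the two-stage mapping, with the combination done in a
small upper-level helper as the review suggests, is sketched below. This is not
the code from the series: the struct, the BR_* bits, and all three function
names are hypothetical, the request is assumed to have its privilege bits
already stripped, and the PMU is modelled as "all or nothing" in the simplest
way (it takes the request only when a single supported filter is asked for).

```c
#include <stdint.h>

/* Hypothetical bit layout, for illustration only. */
#define BR_ANY_CALL   (1ull << 4)
#define BR_ANY_RETURN (1ull << 5)
#define BR_IND_CALL   (1ull << 6)
#define BR_COND       (1ull << 7)

#define SW_SUPPORTED  (BR_ANY_CALL | BR_ANY_RETURN | BR_IND_CALL | BR_COND)

struct bhrb_filters {
	uint64_t hw;		/* filters the PMU will apply */
	uint64_t sw;		/* filters applied in SW after capture */
	uint64_t covered;	/* union of both */
};

/* PMU stage: handle the request only if it is one supported filter. */
static uint64_t hw_filter_map(uint64_t requested, uint64_t hw_supported)
{
	if (requested && !(requested & (requested - 1)) &&
	    (requested & hw_supported))
		return requested;
	return 0;
}

/* SW stage: pick up whatever the PMU did not take and SW can classify. */
static uint64_t sw_filter_map(uint64_t requested, uint64_t hw)
{
	return requested & ~hw & SW_SUPPORTED;
}

/* Upper level: the only place that combines the HW and SW values. */
static struct bhrb_filters map_filters(uint64_t requested,
				       uint64_t hw_supported)
{
	struct bhrb_filters f;

	f.hw = hw_filter_map(requested, hw_supported);
	f.sw = sw_filter_map(requested, f.hw);
	f.covered = f.hw | f.sw;
	return f;
}
```

With this split, neither mapping function needs to mutate a shared mask through
a pointer; the coverage value is derived once from the two results.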

>> +{
>> +	u64 branch_sw_filter = 0;
>> +
>> +	/* No branch filter requested */
>> +	if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY) {
>> +		WARN_ON(pmu_bhrb_filter != 0);
>> +		WARN_ON(*filter_mask != PERF_SAMPLE_BRANCH_ANY);
>> +		return branch_sw_filter;
>> +	}
>> +
>> +	/*
>> +	 * PMU supported branch filters must also be implemented in SW
>> +	 * in the event when the PMU is unable to process them for some
>> +	 * reason. This all those branch filters can be satisfied with
>> +	 * SW implemented filters. But right now, there is now way to
>> +	 * initimate the user about this decision.
> 
> Please proof read these comments, I don't entirely follow this one.
> 
> You say "must also be implemented in SW" - but I think it's actually "must be
> implemented in SW", ie. the HW is not "also" implementing the filter.
> 
> You say "in the event when" but I think you just mean "when" - the word "event"
> has a particular meaning in this code so you should only use it for that if at
> all possible.
> 
> I don't follow "This all those".
> 
> You should just drop the last sentence, there is never going to be any way to
> notify the user that their filter is implemented in HW vs SW, that's an
> implementation detail.
> 

Took care of these observations.

>> +	 */
>> +	if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY_CALL) {
>> +		if (!(pmu_bhrb_filter & PERF_SAMPLE_BRANCH_ANY_CALL)) {
>> +			branch_sw_filter |= PERF_SAMPLE_BRANCH_ANY_CALL;
>> +			*filter_mask |= PERF_SAMPLE_BRANCH_ANY_CALL;
>> +		}
>> +	}
>> +
>> +	if (branch_sample_type & PERF_SAMPLE_BRANCH_COND) {
>> +		if (!(pmu_bhrb_filter & PERF_SAMPLE_BRANCH_COND)) {
>> +			branch_sw_filter |= PERF_SAMPLE_BRANCH_COND;
>> +			*filter_mask |= PERF_SAMPLE_BRANCH_COND;
>> +		}
>> +	}
>> +
>> +	if (branch_sample_type & PERF_SAMPLE_BRANCH_ANY_RETURN) {
>> +		if (!(pmu_bhrb_filter & PERF_SAMPLE_BRANCH_ANY_RETURN)) {
>> +			branch_sw_filter |= PERF_SAMPLE_BRANCH_ANY_RETURN;
>> +			*filter_mask |= PERF_SAMPLE_BRANCH_ANY_RETURN;
>> +		}
>> +	}
>> +
>> +	if (branch_sample_type & PERF_SAMPLE_BRANCH_IND_CALL) {
>> +		if (!(pmu_bhrb_filter & PERF_SAMPLE_BRANCH_IND_CALL)) {
>> +			branch_sw_filter |= PERF_SAMPLE_BRANCH_IND_CALL;
>> +			*filter_mask |= PERF_SAMPLE_BRANCH_IND_CALL;
>> +		}
>> +	}
>> +
>> +	return branch_sw_filter;
>> +}
>> +
>>  /* Processing BHRB entries */
>>  void power_pmu_bhrb_read(struct cpu_hw_events *cpuhw)
>>  {
>> @@ -459,17 +683,29 @@ void power_pmu_bhrb_read(struct cpu_hw_events *cpuhw)
>>  					addr = 0;
>>  				}
>>  				cpuhw->bhrb_entries[u_index].from = addr;
>> +
>> +				if (!check_instruction(cpuhw->
>> +						bhrb_entries[u_index].from,
>> +							cpuhw->bhrb_sw_filter))
>> +					u_index--;
>>  			} else {
>>  				/* Branches to immediate field 
>>  				   (ie I or B form) */
>>  				cpuhw->bhrb_entries[u_index].from = addr;
>> -				cpuhw->bhrb_entries[u_index].to =
>> -					power_pmu_bhrb_to(addr);
>> -				cpuhw->bhrb_entries[u_index].mispred = pred;
>> -				cpuhw->bhrb_entries[u_index].predicted = ~pred;
>> +				if (check_instruction(cpuhw->
>> +						bhrb_entries[u_index].from,
>> +						cpuhw->bhrb_sw_filter)) {
>> +					cpuhw->bhrb_entries[u_index].
>> +						to = power_pmu_bhrb_to(addr);
>> +					cpuhw->bhrb_entries[u_index].
>> +						mispred = pred;
>> +					cpuhw->bhrb_entries[u_index].
>> +						predicted = ~pred;
>> +				} else {
>> +					u_index--;
>> +				}
>>  			}
>>  			u_index++;
> 
> 
> This code was already in need of some unindentation, and now it's just
> ridiculous.
> 
> To start with at the beginning of this routine we have:
> 
> while (..) {
> 	if (!val)
> 		break;
> 	else {
> 		// Bulk of the logic
> 		...
> 	}
> }
> 
> That should almost always become:
> 
> while (..) {
> 	if (!val)
> 		break;
> 
> 	// Bulk of the logic
> 	...
> }
> 
> 
> But in this case that's not enough. Please send a precursor patch which moves
> this logic out into a helper function.
> 

Done
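
A flattened version of such a read loop could look like the sketch below. It is
a deliberately simplified user-space model, not the BHRB read routine: the real
code also handles target entries for XL-form branches and prediction bits, and
the struct, callback types, and example callbacks here are all hypothetical.
The point is the shape: break early, filter before the expensive target lookup,
and advance the output index only for kept entries instead of decrementing it
afterwards.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct branch_entry {
	uint64_t from;
	uint64_t to;
};

/* Hypothetical stand-ins for the SW filter and the target computation. */
typedef bool (*keep_fn)(uint64_t from);
typedef uint64_t (*target_fn)(uint64_t from);

/*
 * Copy raw entries to 'out', stopping at the first zero entry (end of
 * valid data) and skipping entries rejected by the filter.
 */
static size_t read_entries(const uint64_t *raw, size_t n,
			   struct branch_entry *out,
			   keep_fn keep, target_fn target)
{
	size_t r, u = 0;

	for (r = 0; r < n; r++) {
		uint64_t addr = raw[r];

		if (!addr)
			break;

		/* Filter first: rejected entries never pay for the lookup. */
		if (!keep(addr))
			continue;

		out[u].from = addr;
		out[u].to = target(addr);
		u++;
	}
	return u;
}

/* Example callbacks used only to exercise the loop. */
static bool keep_multiple_of_4(uint64_t from) { return (from & 3) == 0; }
static uint64_t plus_four(uint64_t from)      { return from + 4; }
```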

> 
>> -
>>  		}
>>  	}
>>  	cpuhw->bhrb_stack.nr = u_index;
>> @@ -1255,7 +1491,11 @@ nocheck:
>>  	if (has_branch_stack(event)) {
>>  		power_pmu_bhrb_enable(event);
>>  		cpuhw->bhrb_hw_filter = ppmu->bhrb_filter_map(
>> -					event->attr.branch_sample_type);
>> +					event->attr.branch_sample_type,
>> +					&cpuhw->filter_mask);
>> +		cpuhw->bhrb_sw_filter = branch_filter_map
>> +					(event->attr.branch_sample_type,
>> +					cpuhw->bhrb_hw_filter, &cpuhw->filter_mask);
>>  	}
>>  
>>  	perf_pmu_enable(event->pmu);
>> @@ -1637,10 +1877,16 @@ static int power_pmu_event_init(struct perf_event *event)
>>  	err = power_check_constraints(cpuhw, events, cflags, n + 1);
>>  
>>  	if (has_branch_stack(event)) {
>> -		cpuhw->bhrb_hw_filter = ppmu->bhrb_filter_map(
>> -					event->attr.branch_sample_type);
>> -
>> -		if(cpuhw->bhrb_hw_filter == -1)
>> +		cpuhw->bhrb_hw_filter = ppmu->bhrb_filter_map
>> +				(event->attr.branch_sample_type,
>> +				&cpuhw->filter_mask);
>> +		cpuhw->bhrb_sw_filter = branch_filter_map
>> +				(event->attr.branch_sample_type,
>> +				cpuhw->bhrb_hw_filter,
>> +				&cpuhw->filter_mask);
>> +
>> +		if(!match_filters(event->attr.branch_sample_type,
>> +						cpuhw->filter_mask))
>>  			return -EOPNOTSUPP;
> 
> The above two hunks look too similar for my liking.

I moved the SW filter check below the else block so that it is common to both
branch types. The original intent was to save some cycles by not accessing user
space (power_pmu_bhrb_to) when we already know that the "from" address is not
going to pass the SW branch filter check.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH V4 10/10] powerpc, perf: Cleanup SW branch filter list look up
  2013-12-09  6:21   ` Michael Ellerman
@ 2013-12-20 11:06       ` Anshuman Khandual
  0 siblings, 0 replies; 57+ messages in thread
From: Anshuman Khandual @ 2013-12-20 11:06 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: linuxppc-dev, linux-kernel, mikey, ak, eranian, acme, sukadev, mingo

On 12/09/2013 11:51 AM, Michael Ellerman wrote:
> On Wed, 2013-04-12 at 10:32:42 UTC, Anshuman Khandual wrote:
>> This patch adds enumeration for all available SW branch filters
>> in powerpc book3s code and also streamlines the look for the
>> SW branch filter entries while trying to figure out which all
>> branch filters can be supported in SW.
> 
> This appears to patch code that was only added in 8/10 ?
> 
> Was there any reason not to do it the right way from the beginning?

No reason, merged this into the 8/10th patch. Working on the V5 of
this patchset. Will send out a draft V5 version for early review.

Regards
Anshuman


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH V4 08/10] powerpc, perf: Enable SW filtering in branch stack sampling framework
  2013-12-20 11:01       ` Anshuman Khandual
@ 2013-12-24  3:29         ` Michael Ellerman
  -1 siblings, 0 replies; 57+ messages in thread
From: Michael Ellerman @ 2013-12-24  3:29 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: linuxppc-dev, linux-kernel, mikey, ak, eranian, acme, sukadev, mingo

On Fri, 2013-12-20 at 16:31 +0530, Anshuman Khandual wrote:
> On 12/09/2013 11:51 AM, Michael Ellerman wrote:
> > On Wed, 2013-04-12 at 10:32:40 UTC, Anshuman Khandual wrote:
> >> +
> >> +	if (bhrb_sw_filter & PERF_SAMPLE_BRANCH_IND_CALL) {
> >> +		/* XL-form instruction */
> >> +		if (instr_is_branch_xlform(*addr)) {
> >> +
> >> +			/* LR should be set */
> >> +			if (is_branch_link_set(*addr)) {
> >> +				/*
> >> +			 	 * Conditional and unconditional
> >> +			 	 * branch to CTR.
> >> +			 	 */
> >> +				if (is_xlform_ctr(*addr))
> >> +					result = true;
> >> +
> >> +				/*
> >> +			 	 * Conditional and unconditional
> >> +			 	 * branch to LR.
> >> +			 	 */
> >> +				if (is_xlform_lr(*addr))
> >> +					result = true;
> >> +
> >> +				/*
> >> +			 	 * Conditional and unconditional
> >> +			 	 * branch to TAR.
> >> +			 	 */
> >> +				if (is_xlform_tar(*addr))
> >> +					result = true;
> > 
> > What other kind of XL-Form branch is there?
> 
> I am not sure. Do you know of any ?

That was my point. There are no other types, so you can just do:

	if (bhrb_sw_filter & PERF_SAMPLE_BRANCH_IND_CALL)
		if (instr_is_branch_xlform(*addr) && is_branch_link_set(*addr))
			return true;

> >> +	if (bhrb_sw_filter & PERF_SAMPLE_BRANCH_COND) {
> >> +
> >> +		/* I-form instruction - excluded */
> >> +		if (instr_is_branch_iform(*addr))
> >> +			goto out;
> >> +
> >> +		/* B-form or XL-form instruction */
> >> +		if (instr_is_branch_bform(*addr) || instr_is_branch_xlform(*addr))  {
> >> +
> >> +			/* Not branch always  */
> >> +			if (!is_bo_always(*addr)) {
> >> +
> >> +				/* Conditional branch to CTR register */
> >> +				if (is_bo_ctr(*addr))
> >> +					goto out;
> > 
> > We might have discussed this but why not?
> 
> Did not get that, discuss what ?

Why are we saying a conditional branch to the CTR is not a conditional branch?

It is conditional, so I think it should be included.

> >> +
> >> +				/* CR[BI] conditional branch with static hint */
> > 
> > A conditional branch with a static hint is still a conditional branch?
> 
> No its not. 

Yes it is?

In fact they could be very interesting branches. Because the compiler or
programmer has statically hinted them, if the hint is wrong they may be a major
source of branch mispredicts.


> >> +				if (is_bo_crbi_off(*addr) || is_bo_crbi_on(*addr)) {
> >> +					if (is_bo_crbi_hint(*addr))
> >> +						goto out;
> >> +				}
> >> +
> >> +				result = true;
> >> +			}
> >> +		}
> >> +	}
> >> +out:
> >> +	return result;
> >> +}
 
> >> +	} else {
> >> +		/*
> >> +		 * Userspace address needs to be
> >> +		 * copied first before analysis.
> >> +		 */
> >> +		pagefault_disable();
> >> +		ret =  __get_user_inatomic(instr, (unsigned int __user *)addr);
> > 
> > I suspect you borrowed this incantation from the callchain code. Unlike that
> > code you don't fallback to reading the page tables directly.
> > 
> > I'd rather see the accessor in the callchain code made generic and have you
> > call it here.
> 
> You have mentioned to take care of this issue yourself.

Yes I will.

cheers



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH V4 08/10] powerpc, perf: Enable SW filtering in branch stack sampling framework
  2013-12-24  3:29         ` Michael Ellerman
@ 2013-12-24  3:50           ` Anshuman Khandual
  -1 siblings, 0 replies; 57+ messages in thread
From: Anshuman Khandual @ 2013-12-24  3:50 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: linuxppc-dev, linux-kernel, mikey, ak, eranian, acme, sukadev, mingo

On 12/24/2013 08:59 AM, Michael Ellerman wrote:
> On Fri, 2013-12-20 at 16:31 +0530, Anshuman Khandual wrote:
>> On 12/09/2013 11:51 AM, Michael Ellerman wrote:
>>> On Wed, 2013-04-12 at 10:32:40 UTC, Anshuman Khandual wrote:
>>>> +
>>>> +	if (bhrb_sw_filter & PERF_SAMPLE_BRANCH_IND_CALL) {
>>>> +		/* XL-form instruction */
>>>> +		if (instr_is_branch_xlform(*addr)) {
>>>> +
>>>> +			/* LR should be set */
>>>> +			if (is_branch_link_set(*addr)) {
>>>> +				/*
>>>> +			 	 * Conditional and unconditional
>>>> +			 	 * branch to CTR.
>>>> +			 	 */
>>>> +				if (is_xlform_ctr(*addr))
>>>> +					result = true;
>>>> +
>>>> +				/*
>>>> +			 	 * Conditional and unconditional
>>>> +			 	 * branch to LR.
>>>> +			 	 */
>>>> +				if (is_xlform_lr(*addr))
>>>> +					result = true;
>>>> +
>>>> +				/*
>>>> +			 	 * Conditional and unconditional
>>>> +			 	 * branch to TAR.
>>>> +			 	 */
>>>> +				if (is_xlform_tar(*addr))
>>>> +					result = true;
>>>
>>> What other kind of XL-Form branch is there?
>>
>> I am not sure. Do you know of any ?
> 
> That was my point. There are no other types, so you can just do:
> 
> 	if (bhrb_sw_filter & PERF_SAMPLE_BRANCH_IND_CALL)
> 		if (instr_is_branch_xlform(*addr) && is_branch_link_set(*addr))
> 			return true;
> 

Done

>>>> +	if (bhrb_sw_filter & PERF_SAMPLE_BRANCH_COND) {
>>>> +
>>>> +		/* I-form instruction - excluded */
>>>> +		if (instr_is_branch_iform(*addr))
>>>> +			goto out;
>>>> +
>>>> +		/* B-form or XL-form instruction */
>>>> +		if (instr_is_branch_bform(*addr) || instr_is_branch_xlform(*addr))  {
>>>> +
>>>> +			/* Not branch always  */
>>>> +			if (!is_bo_always(*addr)) {
>>>> +
>>>> +				/* Conditional branch to CTR register */
>>>> +				if (is_bo_ctr(*addr))
>>>> +					goto out;
>>>
>>> We might have discussed this but why not?
>>
>> Did not get that, discuss what ?
> 
> Why are we saying a conditional branch to the CTR is not a conditional branch?
> 
> It is conditional, so I think it should be included.
> 

I believe the conditional branch to the CTR register and the conditional branch
with static hint below are both excluded when processed with the BHRB PMU based
filter IFM3. Here the SW implemented filter tries to match those exclusions, so
that a user does not see any difference in the results whether the filter is
processed in the PMU or in SW.
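
To make the two positions concrete, here is a small user-space sketch of the
classification with the exclusion as an explicit switch. It is built on
assumptions: the helper names are hypothetical, "CTR-based" is taken to mean a
BO field that decrements and tests CTR (which may not be exactly what
is_bo_ctr() checks in the patch), and the static hint exclusion is left out
entirely. Only the Power ISA encodings are relied upon: B-form is primary
opcode 16, XL-form branches are opcode 19 with extended opcode 16/528/560, BO
of the form 1z1zz means branch always, and a clear 0x04 bit in BO means CTR is
decremented.

```c
#include <stdbool.h>
#include <stdint.h>

static uint32_t primary_opcode(uint32_t instr) { return instr >> 26; }
static uint32_t bo_field(uint32_t instr)       { return (instr >> 21) & 0x1f; }
static uint32_t xl_ext_opcode(uint32_t instr)  { return (instr >> 1) & 0x3ff; }

/* B-form (opcode 16) or XL-form branch (opcode 19: bclr, bcctr, bctar). */
static bool has_bo_field(uint32_t instr)
{
	uint32_t xo = xl_ext_opcode(instr);

	if (primary_opcode(instr) == 16)
		return true;
	return primary_opcode(instr) == 19 &&
	       (xo == 16 || xo == 528 || xo == 560);
}

/* BO = 1z1zz means "branch always". */
static bool bo_always(uint32_t bo)   { return (bo & 0x14) == 0x14; }

/* BO bit 0x04 clear means the branch decrements and tests CTR. */
static bool bo_uses_ctr(uint32_t bo) { return !(bo & 0x04); }

static bool is_cond_branch(uint32_t instr, bool exclude_ctr_based)
{
	uint32_t bo;

	if (!has_bo_field(instr))
		return false;	/* I-form and non-branch instructions */

	bo = bo_field(instr);
	if (bo_always(bo))
		return false;
	if (exclude_ctr_based && bo_uses_ctr(bo))
		return false;
	return true;
}
```

With exclude_ctr_based set to false, both beq (0x4182....) and bdnz
(0x4200....) count as conditional, which is the reviewer's reading. With it set
to true, bdnz is dropped, which is the kind of HW-matching exclusion described
above.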

>>>> +
>>>> +				/* CR[BI] conditional branch with static hint */
>>>
>>> A conditional branch with a static hint is still a conditional branch?
>>
>> No its not. 
> 
> Yes it is?
> 
> In fact they could be very interesting branches. Because the compiler or
> programmer has statically hinted them, if the hint is wrong they may be a major
> source of branch mispredicts.
> 
> 
>>>> +				if (is_bo_crbi_off(*addr) || is_bo_crbi_on(*addr)) {
>>>> +					if (is_bo_crbi_hint(*addr))
>>>> +						goto out;
>>>> +				}
>>>> +
>>>> +				result = true;
>>>> +			}
>>>> +		}
>>>> +	}
>>>> +out:
>>>> +	return result;
>>>> +}
> 
>>>> +	} else {
>>>> +		/*
>>>> +		 * Userspace address needs to be
>>>> +		 * copied first before analysis.
>>>> +		 */
>>>> +		pagefault_disable();
>>>> +		ret =  __get_user_inatomic(instr, (unsigned int __user *)addr);
>>>
>>> I suspect you borrowed this incantation from the callchain code. Unlike that
>>> code you don't fallback to reading the page tables directly.
>>>
>>> I'd rather see the accessor in the callchain code made generic and have you
>>> call it here.
>>
>> You have mentioned to take care of this issue yourself.
> 
> Yes I will.

Thanks !!


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH V4 08/10] powerpc, perf: Enable SW filtering in branch stack sampling framework
@ 2013-12-24  3:50           ` Anshuman Khandual
  0 siblings, 0 replies; 57+ messages in thread
From: Anshuman Khandual @ 2013-12-24  3:50 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: mikey, ak, linux-kernel, eranian, linuxppc-dev, acme, sukadev, mingo

On 12/24/2013 08:59 AM, Michael Ellerman wrote:
> On Fri, 2013-12-20 at 16:31 +0530, Anshuman Khandual wrote:
>> On 12/09/2013 11:51 AM, Michael Ellerman wrote:
>>> On Wed, 2013-04-12 at 10:32:40 UTC, Anshuman Khandual wrote:
>>>> +
>>>> +	if (bhrb_sw_filter & PERF_SAMPLE_BRANCH_IND_CALL) {
>>>> +		/* XL-form instruction */
>>>> +		if (instr_is_branch_xlform(*addr)) {
>>>> +
>>>> +			/* LR should be set */
>>>> +			if (is_branch_link_set(*addr)) {
>>>> +				/*
>>>> +			 	 * Conditional and unconditional
>>>> +			 	 * branch to CTR.
>>>> +			 	 */
>>>> +				if (is_xlform_ctr(*addr))
>>>> +					result = true;
>>>> +
>>>> +				/*
>>>> +			 	 * Conditional and unconditional
>>>> +			 	 * branch to LR.
>>>> +			 	 */
>>>> +				if (is_xlform_lr(*addr))
>>>> +					result = true;
>>>> +
>>>> +				/*
>>>> +			 	 * Conditional and unconditional
>>>> +			 	 * branch to TAR.
>>>> +			 	 */
>>>> +				if (is_xlform_tar(*addr))
>>>> +					result = true;
>>>
>>> What other kind of XL-Form branch is there?
>>
>> I am not sure. Do you know of any ?
> 
> That was my point. There are no other types, so you can just do:
> 
> 	if (bhrb_sw_filter & PERF_SAMPLE_BRANCH_IND_CALL)
> 		if (instr_is_branch_xlform(*addr) && is_branch_link_set(*addr))
> 			return true;
> 

Done
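
For reference, the simplified check above can be sketched outside the kernel
with plain masks. This is only an illustration: the helper names below are
hypothetical stand-ins for instr_is_branch_xlform() and is_branch_link_set(),
and it assumes the standard encodings (primary opcode 19 with extended opcode
16, 528 or 560 for bclr, bcctr and bctar, and LK as the lowest bit).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true for bclr, bcctr and bctar (with or without LK). */
static bool xlform_branch(uint32_t instr)
{
	unsigned int opcode = instr >> 26;        /* primary opcode, bits 0-5 */
	unsigned int xo = (instr >> 1) & 0x3ff;   /* extended opcode, bits 21-30 */

	/* Opcode 19 also holds non-branch instructions, so check XO too. */
	return opcode == 19 && (xo == 16 || xo == 528 || xo == 560);
}

/* Hypothetical helper: LK is the least significant bit of the instruction. */
static bool link_set(uint32_t instr)
{
	return (instr & 1) != 0;
}

/* Any XL-form branch that sets LR is treated as an indirect call. */
bool is_indirect_call(uint32_t instr)
{
	return xlform_branch(instr) && link_set(instr);
}
```

With these definitions bctrl (0x4e800421) and blrl (0x4e800021) are accepted,
while blr (0x4e800020) and bl (0x48000001) are not.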

>>>> +	if (bhrb_sw_filter & PERF_SAMPLE_BRANCH_COND) {
>>>> +
>>>> +		/* I-form instruction - excluded */
>>>> +		if (instr_is_branch_iform(*addr))
>>>> +			goto out;
>>>> +
>>>> +		/* B-form or XL-form instruction */
>>>> +		if (instr_is_branch_bform(*addr) || instr_is_branch_xlform(*addr))  {
>>>> +
>>>> +			/* Not branch always  */
>>>> +			if (!is_bo_always(*addr)) {
>>>> +
>>>> +				/* Conditional branch to CTR register */
>>>> +				if (is_bo_ctr(*addr))
>>>> +					goto out;
>>>
>>> We might have discussed this but why not?
>>
>> Did not get that, discuss what ?
> 
> Why are we saying a conditional branch to the CTR is not a conditional branch?
> 
> It is conditional, so I think it should be included.
> 

I believe a conditional branch to the CTR register and the conditional branch
with a static hint below are both excluded when processed by the BHRB PMU
filter IFM3. The SW filter tries to match those exclusions, so that a user
should not see any difference in results whether the filter is processed
in the PMU or in SW.

>>>> +
>>>> +				/* CR[BI] conditional branch with static hint */
>>>
>>> A conditional branch with a static hint is still a conditional branch?
>>
>> No, it's not.
> 
> Yes it is?
> 
> In fact they could be very interesting branches. Because the compiler or
> programmer has statically hinted them, if the hint is wrong they may be a major
> source of branch mispredicts.
> 
> 
>>>> +				if (is_bo_crbi_off(*addr) || is_bo_crbi_on(*addr)) {
>>>> +					if (is_bo_crbi_hint(*addr))
>>>> +						goto out;
>>>> +				}
>>>> +
>>>> +				result = true;
>>>> +			}
>>>> +		}
>>>> +	}
>>>> +out:
>>>> +	return result;
>>>> +}
> 
>>>> +	} else {
>>>> +		/*
>>>> +		 * Userspace address needs to be
>>>> +		 * copied first before analysis.
>>>> +		 */
>>>> +		pagefault_disable();
>>>> +		ret =  __get_user_inatomic(instr, (unsigned int __user *)addr);
>>>
>>> I suspect you borrowed this incantation from the callchain code. Unlike that
>>> code you don't fallback to reading the page tables directly.
>>>
>>> I'd rather see the accessor in the callchain code made generic and have you
>>> call it here.
>>
>> You mentioned that you would take care of this issue yourself.
> 
> Yes I will.

Thanks !!

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH V4 08/10] powerpc, perf: Enable SW filtering in branch stack sampling framework
  2013-12-24  3:50           ` Anshuman Khandual
@ 2013-12-24  4:35             ` Michael Ellerman
  -1 siblings, 0 replies; 57+ messages in thread
From: Michael Ellerman @ 2013-12-24  4:35 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: linuxppc-dev, linux-kernel, mikey, ak, eranian, acme, sukadev, mingo

On Tue, 2013-12-24 at 09:20 +0530, Anshuman Khandual wrote:
> On 12/24/2013 08:59 AM, Michael Ellerman wrote:
> > On Fri, 2013-12-20 at 16:31 +0530, Anshuman Khandual wrote:
> >> On 12/09/2013 11:51 AM, Michael Ellerman wrote:
> >>> On Wed, 2013-04-12 at 10:32:40 UTC, Anshuman Khandual wrote:
> >>>> +
> 
> >>>> +	if (bhrb_sw_filter & PERF_SAMPLE_BRANCH_COND) {
> >>>> +
> >>>> +		/* I-form instruction - excluded */
> >>>> +		if (instr_is_branch_iform(*addr))
> >>>> +			goto out;
> >>>> +
> >>>> +		/* B-form or XL-form instruction */
> >>>> +		if (instr_is_branch_bform(*addr) || instr_is_branch_xlform(*addr))  {
> >>>> +
> >>>> +			/* Not branch always  */
> >>>> +			if (!is_bo_always(*addr)) {
> >>>> +
> >>>> +				/* Conditional branch to CTR register */
> >>>> +				if (is_bo_ctr(*addr))
> >>>> +					goto out;
> >>>
> >>> We might have discussed this but why not?
> >>
> >> Did not get that, discuss what ?
> > 
> > Why are we saying a conditional branch to the CTR is not a conditional branch?
> > 
> > It is conditional, so I think it should be included.

> I believe a conditional branch to the CTR register and the conditional branch
> with a static hint below are both excluded when processed by the BHRB PMU
> filter IFM3. The SW filter tries to match those exclusions, so that a user
> should not see any difference in results whether the filter is processed
> in the PMU or in SW.

OK. That's what I meant by "we might have discussed this".

So you need to make it very clear in the code, with a comment, that we are
implementing the IFM3 semantics. Otherwise it's not obvious why those
semantics make sense.

And we need to make extra sure we implement the same semantics as IFM3, which I
don't think you do at the moment.

The description for IFM3 is:

   Do not record:
    * b and bl instructions, 
    * bc and bcl instructions for which the BO field indicates “Branch always.”
   
   For bclr, bclrl, bctr, bctrl, bctar, and bctarl instructions for which
   the BO field indicates “Branch always,” record only one entry
   containing the Branch target address.

So I don't think your SW filter implements that part correctly. You are
discarding all branches with "branch always" set.


   Do not record:
    * Branch instructions for which BO[0]=1, 

This is what excludes branches to CTR. But, it's only branches to CTR that
don't also depend on CR[BI] - we need to make that clear in the code.

    * Branch instructions for which the “a” bit in the BO field is set to 1.

So that's the is_bo_crbi_hint() check and rejection, but it's not related to
CR[BI] at all.

There's a note about CR[BI]:

    Do not record instructions that do not depend on the value of CR[BI].

But I think you've misinterpreted that. 

    Do not record instructions that do not depend on the value of CR[BI].

    Do     record instructions that        depend on the value of CR[BI].


In fact the only branches that don't depend on CR[BI] are "branch always"
branches, and branches with BO[0]=1, both of which were handled above.
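
To make that concrete, here is a rough userspace sketch of one reading of the
IFM3 rules quoted above. It is not the patch code: all function names are
hypothetical, the BO bit layout is taken from the ISA's BO encoding table
(BO[0] is the most significant of the five bits), and treating "record only
one entry" for XL-form "branch always" as "keep the branch" is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

static unsigned int ppc_opcode(uint32_t instr) { return instr >> 26; }
static unsigned int ppc_bo(uint32_t instr) { return (instr >> 21) & 0x1f; }
static unsigned int ppc_xo(uint32_t instr) { return (instr >> 1) & 0x3ff; }

/* bclr, bcctr, bctar: opcode 19 with extended opcode 16, 528 or 560 */
static bool xl_branch(uint32_t instr)
{
	unsigned int xo = ppc_xo(instr);

	return ppc_opcode(instr) == 19 && (xo == 16 || xo == 528 || xo == 560);
}

/* BO = 0b1z1zz: neither CTR nor CR[BI] is tested */
static bool bo_always(unsigned int bo) { return (bo & 0x14) == 0x14; }

/* BO[0] = 1: the branch does not depend on CR[BI] */
static bool bo_ignores_crbi(unsigned int bo) { return (bo & 0x10) != 0; }

/* The "a" hint bit lives in a different place depending on the BO pattern. */
static bool bo_a_bit(unsigned int bo)
{
	if ((bo & 0x14) == 0x04)        /* 0b0x1at: CR[BI] only */
		return (bo & 0x02) != 0;
	if ((bo & 0x14) == 0x10)        /* 0b1a0xt: CTR only */
		return (bo & 0x08) != 0;
	return false;                   /* no hint bits in the other patterns */
}

/* Assumed reading of IFM3: true if the branch would be kept. */
bool ifm3_keeps(uint32_t instr)
{
	bool bform = ppc_opcode(instr) == 16;
	unsigned int bo;

	if (ppc_opcode(instr) == 18)    /* b, bl */
		return false;
	if (!bform && !xl_branch(instr))
		return false;           /* not a conditional-form branch */

	bo = ppc_bo(instr);
	if (bo_always(bo))
		return !bform;          /* drop bc/bcl, keep XL-form */
	if (bo_ignores_crbi(bo))
		return false;           /* e.g. bdnz */
	if (bo_a_bit(bo))
		return false;           /* statically hinted */
	return true;
}
```

Under this reading beq (0x41820008) and beqlr (0x4d820020) are kept, while
bl, bdnz (0x4200fffc) and beq+ (0x41e20008) are dropped. Note that the "a"
bit test never looks at whether CR[BI] is tested on or off, which is the
point about is_bo_crbi_hint() above.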

cheers



^ permalink raw reply	[flat|nested] 57+ messages in thread

end of thread, other threads:[~2013-12-24  4:35 UTC | newest]

Thread overview: 57+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-12-04 10:32 [PATCH V4 00/10] perf: New conditional branch filter Anshuman Khandual
2013-12-04 10:32 ` Anshuman Khandual
2013-12-04 10:32 ` [PATCH V4 01/10] perf: Add PERF_SAMPLE_BRANCH_COND Anshuman Khandual
2013-12-04 10:32   ` Anshuman Khandual
2013-12-04 10:32 ` [PATCH V4 02/10] powerpc, perf: Enable conditional branch filter for POWER8 Anshuman Khandual
2013-12-04 10:32   ` Anshuman Khandual
2013-12-04 10:32 ` [PATCH V4 03/10] perf, tool: Conditional branch filter 'cond' added to perf record Anshuman Khandual
2013-12-04 10:32   ` Anshuman Khandual
2013-12-04 10:32 ` [PATCH V4 04/10] x86, perf: Add conditional branch filtering support Anshuman Khandual
2013-12-04 10:32   ` Anshuman Khandual
2013-12-06 16:46   ` Andi Kleen
2013-12-06 16:46     ` Andi Kleen
2013-12-04 10:32 ` [PATCH V4 05/10] perf, documentation: Description for conditional branch filter Anshuman Khandual
2013-12-04 10:32   ` Anshuman Khandual
2013-12-04 10:32 ` [PATCH V4 06/10] powerpc, perf: Change the name of HW PMU branch filter tracking variable Anshuman Khandual
2013-12-04 10:32   ` Anshuman Khandual
2013-12-04 10:32 ` [PATCH V4 07/10] powerpc, lib: Add new branch instruction analysis support functions Anshuman Khandual
2013-12-04 10:32   ` Anshuman Khandual
2013-12-09  6:21   ` Michael Ellerman
2013-12-10  6:09     ` Anshuman Khandual
2013-12-10  6:09       ` Anshuman Khandual
2013-12-20 10:06       ` Anshuman Khandual
2013-12-04 10:32 ` [PATCH V4 08/10] powerpc, perf: Enable SW filtering in branch stack sampling framework Anshuman Khandual
2013-12-04 10:32   ` Anshuman Khandual
2013-12-09  6:21   ` Michael Ellerman
2013-12-10  5:57     ` Anshuman Khandual
2013-12-10  5:57       ` Anshuman Khandual
2013-12-12  8:45       ` Anshuman Khandual
2013-12-13  2:47         ` Michael Ellerman
2013-12-20 11:01     ` Anshuman Khandual
2013-12-20 11:01       ` Anshuman Khandual
2013-12-24  3:29       ` Michael Ellerman
2013-12-24  3:29         ` Michael Ellerman
2013-12-24  3:50         ` Anshuman Khandual
2013-12-24  3:50           ` Anshuman Khandual
2013-12-24  4:35           ` Michael Ellerman
2013-12-24  4:35             ` Michael Ellerman
2013-12-04 10:32 ` [PATCH V4 09/10] power8, perf: Change BHRB branch filter configuration Anshuman Khandual
2013-12-04 10:32   ` Anshuman Khandual
2013-12-09  6:21   ` Michael Ellerman
2013-12-13  8:20     ` Anshuman Khandual
2013-12-13  8:20       ` Anshuman Khandual
2013-12-18  0:08       ` Michael Ellerman
2013-12-18  0:08         ` Michael Ellerman
2013-12-18  3:55         ` Anshuman Khandual
2013-12-18  3:55           ` Anshuman Khandual
2013-12-04 10:32 ` [PATCH V4 10/10] powerpc, perf: Cleanup SW branch filter list look up Anshuman Khandual
2013-12-04 10:32   ` Anshuman Khandual
2013-12-09  6:21   ` Michael Ellerman
2013-12-20 11:06     ` Anshuman Khandual
2013-12-20 11:06       ` Anshuman Khandual
2013-12-05  4:47 ` [PATCH V4 00/10] perf: New conditional branch filter Michael Ellerman
2013-12-05  4:47   ` Michael Ellerman
2013-12-06 13:18   ` Arnaldo Carvalho de Melo
2013-12-06 13:18     ` Arnaldo Carvalho de Melo
2013-12-09  0:41     ` Michael Ellerman
2013-12-09  0:41       ` Michael Ellerman
