Greetings,

FYI, we noticed a -65.2% regression of will-it-scale.per_thread_ops due to commit:

commit: 3c19f2312f48a3d36a4e13f5072a6a95e755b3d5 ("fs/locks: always delete_block after waiting.")
https://git.kernel.org/cgit/linux/kernel/git/jlayton/linux.git locks-4.21

in testcase: will-it-scale
on test machine: 88 threads Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz with 64G memory
with the following parameters:

	nr_task: 100%
	mode: thread
	test: lock1
	ucode: 0xb00002e
	cpufreq_governor: performance

test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process-based and a thread-based test in order to see any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale

Details are as below:
-------------------------------------------------------------------------------------------------->

To reproduce:

	git clone https://github.com/intel/lkp-tests.git
	cd lkp-tests
	bin/lkp install job.yaml   # job file is attached in this email
	bin/lkp run     job.yaml
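A note on what the headline metric measures: the lock1 testcase is a tight
fcntl(F_SETLK) lock/unlock loop, and will-it-scale.per_thread_ops is the
per-thread iteration rate of that loop. Below is a minimal standalone sketch
of the hot loop (simplified from the upstream testcase, which pins tasks to
CPUs, runs for a fixed interval, and aggregates per-task counters):

/* Sketch of the will-it-scale lock1 hot loop (simplified, not the
 * upstream source): take and drop a POSIX write lock on a private
 * temp file as fast as possible.  Build: gcc -O2 lock1-sketch.c */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
	char tmpfile[] = "/tmp/willitscale.XXXXXX";
	struct flock lck = { .l_whence = SEEK_SET, .l_start = 0, .l_len = 1 };
	unsigned long long iterations = 0;
	int fd = mkstemp(tmpfile);

	if (fd < 0)
		return 1;
	unlink(tmpfile);

	for (int i = 0; i < 5000000; i++) {
		lck.l_type = F_WRLCK;
		fcntl(fd, F_SETLK, &lck);	/* private file: always granted */
		lck.l_type = F_UNLCK;
		fcntl(fd, F_SETLK, &lck);
		iterations += 2;
	}
	printf("%llu iterations\n", iterations);
	return 0;
}

Every iteration is two fcntl() system calls that never block, so the
benchmark is essentially a stress test of the locking fast path that the
blamed commit touches.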
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
  gcc-7/performance/x86_64-rhel-7.2/thread/100%/debian-x86_64-2018-04-03.cgz/lkp-bdw-ep3b/lock1/will-it-scale/0xb00002e

commit:
  816f2fb5a2 ("fs/locks: allow a lock request to block other requests.")
  3c19f2312f ("fs/locks: always delete_block after waiting.")

816f2fb5a2fc678c 3c19f2312f48a3d36a4e13f507
---------------- --------------------------
         %stddev     %change         %stddev
             \          |                \
     71447           -65.2%      24854        will-it-scale.per_thread_ops
    138940            -2.9%     134886        will-it-scale.time.involuntary_context_switches
    279.85           -64.2%     100.29        will-it-scale.time.user_time
   6287454           -65.2%    2187242        will-it-scale.workload
      1.09            -0.7        0.42        mpstat.cpu.usr%
    371230 ±  4%      +9.5%     406403        softirqs.SCHED
      1803 ± 16%     +48.9%       2685 ±  8%  numa-meminfo.node0.PageTables
      2784 ± 10%     -30.7%       1928 ± 12%  numa-meminfo.node1.PageTables
    224.55            -1.8%     220.57        turbostat.PkgWatt
      7.70            -3.0%       7.47        turbostat.RAMWatt
    450.50 ± 17%     +49.0%     671.25 ±  8%  numa-vmstat.node0.nr_page_table_pages
    644147 ± 10%     -19.8%     516646 ± 11%  numa-vmstat.node0.numa_hit
    639812 ± 10%     -20.6%     508027 ± 12%  numa-vmstat.node0.numa_local
    696.25 ± 10%     -30.7%     482.50 ± 12%  numa-vmstat.node1.nr_page_table_pages
      4617            +2.1%       4715        proc-vmstat.nr_inactive_anon
      7097            +2.0%       7241        proc-vmstat.nr_mapped
     20507            +7.0%      21934 ±  3%  proc-vmstat.nr_shmem
      4617            +2.1%       4715        proc-vmstat.nr_zone_inactive_anon
    690109            +1.0%     696863        proc-vmstat.numa_hit
    672911            +1.0%     679694        proc-vmstat.numa_local
     23133 ±  2%      +8.9%      25196 ±  4%  proc-vmstat.pgactivate
    607.03 ±  6%     -16.0%     509.80 ± 12%  sched_debug.cfs_rq:/.util_est_enqueued.avg
     24.42 ± 28%     +38.2%      33.75 ± 22%  sched_debug.cpu.cpu_load[2].max
      2.20 ± 28%     +39.9%       3.08 ± 23%  sched_debug.cpu.cpu_load[2].stddev
     25.33 ± 12%     +23.2%      31.21 ±  9%  sched_debug.cpu.cpu_load[3].max
      2.28 ± 21%     +29.6%       2.95 ± 12%  sched_debug.cpu.cpu_load[3].stddev
     52140 ± 23%     +37.1%      71510 ±  3%  sched_debug.cpu.nr_switches.max
     53379 ± 24%     +46.4%      78158 ± 11%  sched_debug.cpu.sched_count.max
      7132 ± 12%     +32.3%       9436 ± 15%  sched_debug.cpu.sched_count.stddev
 4.587e+12            -7.5%  4.245e+12        perf-stat.branch-instructions
      0.24            -0.1        0.15        perf-stat.branch-miss-rate%
 1.107e+10           -43.0%  6.312e+09        perf-stat.branch-misses
     40.04            -2.0       38.01        perf-stat.cache-miss-rate%
 8.415e+09 ±  2%     -19.4%  6.782e+09 ±  6%  perf-stat.cache-misses
 2.101e+10           -15.1%  1.783e+10 ±  5%  perf-stat.cache-references
      3.85           +10.7%       4.26        perf-stat.cpi
      0.00 ±  2%      +0.0        0.00 ±  4%  perf-stat.dTLB-load-miss-rate%
  90399109 ±  2%      +6.6%   96381582 ±  4%  perf-stat.dTLB-load-misses
 4.956e+12           -11.6%   4.38e+12        perf-stat.dTLB-loads
      0.00 ±  8%      +0.0        0.01 ± 24%  perf-stat.dTLB-store-miss-rate%
 8.789e+11           -61.0%  3.427e+11        perf-stat.dTLB-stores
     80.76           -10.8       69.98        perf-stat.iTLB-load-miss-rate%
 3.901e+09           -63.3%   1.43e+09        perf-stat.iTLB-load-misses
   9.3e+08 ±  6%     -34.0%  6.135e+08 ±  2%  perf-stat.iTLB-loads
 1.912e+13            -9.6%  1.728e+13        perf-stat.instructions
      4901          +146.5%      12081        perf-stat.instructions-per-iTLB-miss
      0.26            -9.6%       0.23        perf-stat.ipc
     82.36            -3.7       78.70        perf-stat.node-store-miss-rate%
 2.319e+09           -20.6%  1.842e+09        perf-stat.node-store-misses
   3041599          +159.7%    7898884        perf-stat.path-length
     61.02 ± 10%     -61.0        0.00        perf-profile.calltrace.cycles-pp._raw_spin_lock.fcntl_setlk.do_fcntl.__x64_sys_fcntl.do_syscall_64
     60.64 ± 10%     -60.6        0.00        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.fcntl_setlk.do_fcntl.__x64_sys_fcntl
     98.79            +0.7       99.50        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
     98.76            +0.7       99.49        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
     98.64            +0.8       99.44        perf-profile.calltrace.cycles-pp.__x64_sys_fcntl.do_syscall_64.entry_SYSCALL_64_after_hwframe
     97.70            +1.5       99.16        perf-profile.calltrace.cycles-pp.do_fcntl.__x64_sys_fcntl.do_syscall_64.entry_SYSCALL_64_after_hwframe
     97.41            +1.6       99.05        perf-profile.calltrace.cycles-pp.fcntl_setlk.do_fcntl.__x64_sys_fcntl.do_syscall_64.entry_SYSCALL_64_after_hwframe
     35.73 ± 18%     +62.7       98.45        perf-profile.calltrace.cycles-pp.do_lock_file_wait.fcntl_setlk.do_fcntl.__x64_sys_fcntl.do_syscall_64
      0.00           +65.1       65.07        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.locks_delete_block.do_lock_file_wait.fcntl_setlk
      0.00           +65.3       65.28        perf-profile.calltrace.cycles-pp._raw_spin_lock.locks_delete_block.do_lock_file_wait.fcntl_setlk.do_fcntl
      0.00           +65.3       65.31        perf-profile.calltrace.cycles-pp.locks_delete_block.do_lock_file_wait.fcntl_setlk.do_fcntl.__x64_sys_fcntl
      1.13 ±  2%      -0.7        0.43        perf-profile.children.cycles-pp.locks_alloc_lock
      0.98 ±  2%      -0.6        0.38        perf-profile.children.cycles-pp.kmem_cache_alloc
      0.59            -0.4        0.23 ±  2%  perf-profile.children.cycles-pp.syscall_return_via_sysret
      0.53            -0.3        0.20 ±  2%  perf-profile.children.cycles-pp.entry_SYSCALL_64
      0.35 ± 11%      -0.3        0.06 ±  9%  perf-profile.children.cycles-pp.fput
      0.41 ±  2%      -0.3        0.15 ±  3%  perf-profile.children.cycles-pp.file_has_perm
      0.33 ±  2%      -0.2        0.12 ±  6%  perf-profile.children.cycles-pp.memset_erms
      0.30 ±  3%      -0.2        0.11 ±  3%  perf-profile.children.cycles-pp.security_file_lock
      0.25 ±  3%      -0.2        0.10 ±  5%  perf-profile.children.cycles-pp.security_file_fcntl
      0.24 ±  2%      -0.1        0.10 ±  4%  perf-profile.children.cycles-pp._copy_from_user
      0.22 ± 12%      -0.1        0.07 ±  5%  perf-profile.children.cycles-pp.__fget_light
      0.21 ±  3%      -0.1        0.08 ±  6%  perf-profile.children.cycles-pp.avc_has_perm
      0.20 ±  5%      -0.1        0.08        perf-profile.children.cycles-pp.___might_sleep
      0.16 ±  5%      -0.1        0.06        perf-profile.children.cycles-pp.__fget
      0.24 ±  3%      -0.1        0.17 ±  2%  perf-profile.children.cycles-pp.kmem_cache_free
      0.12 ±  5%      -0.1        0.05        perf-profile.children.cycles-pp.__might_sleep
      0.24 ± 15%      -0.1        0.18        perf-profile.children.cycles-pp.locks_insert_lock_ctx
      0.11 ±  3%      -0.0        0.10 ±  4%  perf-profile.children.cycles-pp.locks_free_lock
     98.83            +0.7       99.54        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
     98.79            +0.7       99.52        perf-profile.children.cycles-pp.do_syscall_64
     98.65            +0.8       99.44        perf-profile.children.cycles-pp.__x64_sys_fcntl
     97.71            +1.5       99.17        perf-profile.children.cycles-pp.do_fcntl
     97.42            +1.6       99.05        perf-profile.children.cycles-pp.fcntl_setlk
     94.97            +3.0       97.98        perf-profile.children.cycles-pp._raw_spin_lock
     93.97            +3.3       97.24        perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
     35.74 ± 18%     +62.7       98.46        perf-profile.children.cycles-pp.do_lock_file_wait
      0.00           +65.3       65.31        perf-profile.children.cycles-pp.locks_delete_block
      0.59            -0.4        0.23 ±  2%  perf-profile.self.cycles-pp.syscall_return_via_sysret
      0.53            -0.3        0.20 ±  2%  perf-profile.self.cycles-pp.entry_SYSCALL_64
      0.35 ± 10%      -0.3        0.06 ±  9%  perf-profile.self.cycles-pp.fput
      1.00 ±  2%      -0.3        0.74        perf-profile.self.cycles-pp._raw_spin_lock
      0.38 ±  3%      -0.2        0.14        perf-profile.self.cycles-pp.kmem_cache_alloc
      0.32 ±  2%      -0.2        0.12 ±  3%  perf-profile.self.cycles-pp.memset_erms
      0.20 ±  3%      -0.1        0.08 ±  6%  perf-profile.self.cycles-pp.avc_has_perm
      0.20 ±  2%      -0.1        0.08 ±  6%  perf-profile.self.cycles-pp.posix_lock_inode
      0.20 ±  6%      -0.1        0.08        perf-profile.self.cycles-pp.___might_sleep
      0.16 ±  4%      -0.1        0.05 ±  9%  perf-profile.self.cycles-pp.__fget
      0.15 ±  8%      -0.1        0.06        perf-profile.self.cycles-pp.fcntl_setlk
      0.24            -0.1        0.15 ±  2%  perf-profile.self.cycles-pp.kmem_cache_free
      0.11            -0.1        0.03 ±100%  perf-profile.self.cycles-pp.__x64_sys_fcntl
      0.11 ±  4%      -0.1        0.03 ±100%  perf-profile.self.cycles-pp.__might_sleep
      0.13            -0.1        0.05        perf-profile.self.cycles-pp.locks_alloc_lock
      0.13 ±  5%      -0.1        0.05        perf-profile.self.cycles-pp.file_has_perm
      0.07 ±  7%      -0.0        0.05        perf-profile.self.cycles-pp.locks_free_lock
     93.64            +3.3       96.89        perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
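For what it is worth, the profile deltas above line up with the commit
subject: before the patch the spin-lock time sat directly under fcntl_setlk,
while after it roughly 65% of all cycles sit under locks_delete_block(),
which do_lock_file_wait() now appears to call on every request, blocked or
not, presumably serializing all 88 threads on the global blocked_lock_lock
in fs/locks.c. A standalone userspace model of that shape (all names here
are illustrative, not kernel code) is:

/* Model of the suspected serialization point: each thread's own
 * "file lock" work scales, but an unconditional global critical
 * section per iteration does not.  Build: gcc -O2 -pthread model.c */
#include <pthread.h>

#define NR_THREADS 8
#define ITERS 1000000L

static pthread_spinlock_t global_lock;	/* stands in for blocked_lock_lock */

struct worker {
	pthread_mutex_t file_lock;	/* stands in for the per-file lock */
	pthread_t tid;
};

static void *worker_fn(void *arg)
{
	struct worker *w = arg;

	for (long i = 0; i < ITERS; i++) {
		/* independent per-thread work: scales with thread count */
		pthread_mutex_lock(&w->file_lock);
		pthread_mutex_unlock(&w->file_lock);
		/* global critical section taken unconditionally, as the
		 * +65% locks_delete_block rows suggest: every thread
		 * serializes here */
		pthread_spin_lock(&global_lock);
		pthread_spin_unlock(&global_lock);
	}
	return NULL;
}

int main(void)
{
	struct worker workers[NR_THREADS];

	pthread_spin_init(&global_lock, PTHREAD_PROCESS_PRIVATE);
	for (int i = 0; i < NR_THREADS; i++) {
		pthread_mutex_init(&workers[i].file_lock, NULL);
		pthread_create(&workers[i].tid, NULL, worker_fn, &workers[i]);
	}
	for (int i = 0; i < NR_THREADS; i++)
		pthread_join(workers[i].tid, NULL);
	return 0;
}

Timing this with and without the global spin-lock pair gives a rough feel
for why per_thread_ops collapses once every call takes a shared lock.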
will-it-scale.per_thread_ops

  [ASCII plot elided: bisect-good samples (*) hold near 65000-71000 ops per
   thread; bisect-bad samples (O) sit near 23000-25000]

will-it-scale.workload

  [ASCII plot elided: bisect-good samples (*) near 6e+06; bisect-bad
   samples (O) near 2.2e+06]

will-it-scale.time.user_time

  [ASCII plot elided: bisect-good samples (*) near 250-280; bisect-bad
   samples (O) near 100, with one outlier around 130]

[*] bisect-good sample
[O] bisect-bad sample

Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.

Thanks,
Rong Chen