* Re: [locking/qspinlock]  6f9a39a437:  unixbench.score -17.3% regression
@ 2020-11-25 18:56 Alex Kogan
  0 siblings, 0 replies; 2+ messages in thread
From: Alex Kogan @ 2020-11-25 18:56 UTC (permalink / raw)
  To: oliver.sang
  Cc: tglx, lkp, ying.huang, lkp, linux, feng.tang, hpa, dave.dice,
	mingo, will.deacon, arnd, jglauber, guohanjun, x86,
	zhengjun.xing, daniel.m.jordan, steven.sistare, bp,
	linux-arm-kernel, longman, linux-kernel, peterz, linux-arch

[-- Attachment #1: Type: text/plain, Size: 61044 bytes --]

Oliver, thank you for this report.

All, with nr_task=30%, the benchmark hits a sweet spot on the contention curve 
that amplifies the overhead of shuffling threads between waiting queues without 
reaping the benefit of improved locality. I was able to reproduce the regression 
on our machine, though to a lesser extent, seeing a performance drop of about 
10% for the given test.

Luckily, we have a solution for this exact scenario, which we call the 
shuffle reduction optimization, or SRO. It was part of the series until v9, 
but was dropped in v10 because it did not provide much benefit in my benchmarks. 
Now, with SRO, the regression on unixbench shrinks to about 1%, 
while the other performance numbers do not change much.

I attach the SRO patch here. IMHO, it is pretty straightforward. 
It uses randomization, but only to throttle the creation of the secondary queue.
In particular, it does not introduce any extra delays for threads waiting
in that queue once it is created.
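
To make the idea concrete before reading the attached patch, here is a minimal
userspace sketch of the throttling logic (my illustration, not the actual kernel
code; names such as should_shuffle(), probably() and SHUFFLE_REDUCTION_PROB_BITS
are made up for this sketch). In CNA, it is the scan of the main queue for a
same-node successor that moves remote-node waiters onto the secondary queue;
the sketch simply makes that scan rare until a secondary queue already exists:

/*
 * Minimal userspace sketch of the shuffle reduction idea (NOT the actual
 * kernel patch): before the lock holder scans the main queue for a waiter
 * on its own NUMA node (the scan that creates the secondary queue), flip a
 * biased pseudo-random coin.  Most of the time the scan is skipped and the
 * lock is passed MCS-style to the immediate successor, so lightly contended
 * locks avoid the shuffling overhead.  Once a secondary queue exists, the
 * coin is not consulted, so queued waiters see no extra delay.
 */
#include <stdbool.h>
#include <stdio.h>

/* Skip the scan with probability 1 - 1/2^PROB_BITS (here ~127/128). */
#define SHUFFLE_REDUCTION_PROB_BITS 7

static unsigned int seed = 0x12345678;	/* would be per-CPU in the kernel */

/* xorshift32: cheap PRNG standing in for a kernel helper. */
static unsigned int next_random(unsigned int s)
{
	s ^= s << 13;
	s ^= s >> 17;
	s ^= s << 5;
	return s;
}

/* Return true with probability 1 - 1/2^num_bits. */
static bool probably(unsigned int num_bits)
{
	seed = next_random(seed);
	return seed & ((1u << num_bits) - 1);
}

/* Decide whether the unlocking CPU should look for a same-node successor. */
static bool should_shuffle(bool secondary_queue_exists)
{
	/* Never throttle once the secondary queue is already populated. */
	if (secondary_queue_exists)
		return true;
	/* Otherwise, only rarely pay the cost of scanning the main queue. */
	return !probably(SHUFFLE_REDUCTION_PROB_BITS);
}

int main(void)
{
	int shuffles = 0, trials = 1000000, i;

	/* Model unlocks while no secondary queue exists yet. */
	for (i = 0; i < trials; i++)
		shuffles += should_shuffle(false);

	printf("scanned the main queue in %d of %d unlocks (~%.2f%%)\n",
	       shuffles, trials, 100.0 * shuffles / trials);
	return 0;
}

Compiled standalone, the sketch reports that the scan is attempted in roughly
0.8% of the unlocks while no secondary queue exists, which is the intended
throttling effect; once the secondary queue is created, the coin is bypassed,
so waiters already in that queue are never delayed by it.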

Anyway, any feedback is welcome!
Unless I hear any objections, I plan to post another version of the series 
with SRO included.

Thanks,
-- Alex

----- Original Message -----
From: oliver.sang@intel.com
To: alex.kogan@oracle.com
Cc: linux@armlinux.org.uk, peterz@infradead.org, mingo@redhat.com, will.deacon@arm.com, arnd@arndb.de, longman@redhat.com, linux-arch@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, tglx@linutronix.de, bp@alien8.de, hpa@zytor.com, x86@kernel.org, guohanjun@huawei.com, jglauber@marvell.com, steven.sistare@oracle.com, daniel.m.jordan@oracle.com, alex.kogan@oracle.com, dave.dice@oracle.com, lkp@intel.com, lkp@lists.01.org, ying.huang@intel.com, feng.tang@intel.com, zhengjun.xing@intel.com
Sent: Sunday, November 22, 2020 4:33:52 AM GMT -05:00 US/Canada Eastern
Subject: [locking/qspinlock]  6f9a39a437:  unixbench.score -17.3% regression


Greetings,

FYI, we noticed a -17.3% regression of unixbench.score due to commit:


commit: 6f9a39a4372e37907ac1fc7ede6c90932a88d174 ("[PATCH v12 5/5] locking/qspinlock: Avoid moving certain threads between waiting queues in CNA")
url: https://github.com/0day-ci/linux/commits/Alex-Kogan/Add-NUMA-awareness-to-qspinlock/20201118-072506
base: https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git 932f8c64d38bb08f69c8c26a2216ba0c36c6daa8

in testcase: unixbench
on test machine: 96 threads Intel(R) Xeon(R) CPU @ 2.30GHz with 128G memory
with following parameters:

	runtime: 300s
	nr_task: 30%
	test: context1
	cpufreq_governor: performance
	ucode: 0x4003003

test-description: UnixBench is the original BYTE UNIX benchmark suite, which aims to test the performance of Unix-like systems.
test-url: https://github.com/kdlucas/byte-unixbench



If you fix the issue, kindly add the following tag
Reported-by: kernel test robot <oliver.sang@intel.com>


Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        bin/lkp install job.yaml  # job file is attached in this email
        bin/lkp run     job.yaml

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase/ucode:
  gcc-9/performance/x86_64-rhel-8.3/30%/debian-10.4-x86_64-20200603.cgz/300s/lkp-csl-2sp4/context1/unixbench/0x4003003

commit: 
  eaf522d564 ("locking/qspinlock: Introduce starvation avoidance into CNA")
  6f9a39a437 ("locking/qspinlock: Avoid moving certain threads between waiting queues in CNA")

eaf522d56432e0e5 6f9a39a4372e37907ac1fc7ede6 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
      3715           -17.3%       3070        unixbench.score
     11584           +13.2%      13118        unixbench.time.involuntary_context_switches
      1830            +4.7%       1916        unixbench.time.percent_of_cpu_this_job_got
      7012            +5.1%       7373        unixbench.time.system_time
    141.44           -15.6%     119.37        unixbench.time.user_time
 4.338e+08           -16.4%  3.627e+08        unixbench.time.voluntary_context_switches
 5.807e+08           -17.5%  4.793e+08        unixbench.workload
    139.00 ± 67%     -71.0%      40.25        numa-vmstat.node1.nr_mlock
      1.08            -0.1        0.94        mpstat.cpu.all.irq%
      0.48 ±  2%      -0.1        0.40        mpstat.cpu.all.usr%
    956143 ±  7%     +11.0%    1060959 ±  3%  numa-meminfo.node0.MemUsed
   1185909 ±  5%      -8.8%    1081277 ±  3%  numa-meminfo.node1.MemUsed
   4402315           -16.3%    3682692        vmstat.system.cs
    235535            -4.6%     224625        vmstat.system.in
  6.42e+09           +16.4%  7.471e+09        cpuidle.C1.time
 1.941e+10 ±  7%     -20.0%  1.553e+10 ± 21%  cpuidle.C1E.time
  94497227 ±  5%     -63.8%   34185071 ± 15%  cpuidle.C1E.usage
  2.62e+08 ±  8%     -90.1%   26020649        cpuidle.POLL.time
  81581001 ±  9%     -96.1%    3221876        cpuidle.POLL.usage
     84602 ±  3%     +12.7%      95329 ±  5%  softirqs.CPU65.SCHED
     86631 ±  5%     +10.9%      96057 ±  6%  softirqs.CPU67.SCHED
     81448 ±  3%     +12.6%      91708        softirqs.CPU70.SCHED
     99715            +8.1%     107808 ±  2%  softirqs.CPU75.SCHED
     91997 ±  4%     +15.5%     106236 ±  2%  softirqs.CPU81.SCHED
    417904 ±  6%     +43.6%     600289 ± 16%  sched_debug.cfs_rq:/.MIN_vruntime.avg
   3142033            +9.7%    3446986 ±  4%  sched_debug.cfs_rq:/.MIN_vruntime.max
    969106           +20.4%    1166681 ±  8%  sched_debug.cfs_rq:/.MIN_vruntime.stddev
     44659 ± 12%     +21.1%      54091 ±  3%  sched_debug.cfs_rq:/.exec_clock.min
     12198 ± 12%     +24.5%      15181 ±  9%  sched_debug.cfs_rq:/.load.avg
    417904 ±  6%     +43.6%     600289 ± 16%  sched_debug.cfs_rq:/.max_vruntime.avg
   3142033            +9.7%    3446986 ±  4%  sched_debug.cfs_rq:/.max_vruntime.max
    969106           +20.4%    1166681 ±  8%  sched_debug.cfs_rq:/.max_vruntime.stddev
   1926443 ± 12%     +25.6%    2419565 ±  3%  sched_debug.cfs_rq:/.min_vruntime.min
      0.41 ±  2%     +16.3%       0.47 ±  3%  sched_debug.cfs_rq:/.nr_running.avg
    322.15 ±  2%     +13.5%     365.49 ±  4%  sched_debug.cfs_rq:/.util_est_enqueued.avg
     58399 ± 49%     -62.5%      21882 ± 74%  sched_debug.cpu.avg_idle.min
      3.74 ± 14%     -20.1%       2.99 ±  3%  sched_debug.cpu.clock.stddev
     20770 ± 50%     -65.0%       7271 ± 39%  sched_debug.cpu.max_idle_balance_cost.stddev
   8250432           -16.5%    6887763        sched_debug.cpu.nr_switches.avg
  11243220 ±  4%     -21.5%    8826971        sched_debug.cpu.nr_switches.max
   1603956 ± 26%     -52.5%     761566 ±  4%  sched_debug.cpu.nr_switches.stddev
   8248654           -16.5%    6885987        sched_debug.cpu.sched_count.avg
  11240496 ±  4%     -21.5%    8823964        sched_debug.cpu.sched_count.max
   1603802 ± 26%     -52.5%     761522 ±  4%  sched_debug.cpu.sched_count.stddev
   4123397           -16.5%    3441927        sched_debug.cpu.sched_goidle.avg
   5619132 ±  4%     -21.5%    4410755        sched_debug.cpu.sched_goidle.max
    801761 ± 26%     -52.5%     380727 ±  4%  sched_debug.cpu.sched_goidle.stddev
   4124921           -16.5%    3443709        sched_debug.cpu.ttwu_count.avg
   5620396 ±  4%     -21.5%    4412427        sched_debug.cpu.ttwu_count.max
    801796 ± 26%     -52.5%     380615 ±  4%  sched_debug.cpu.ttwu_count.stddev
  7.45e+09           -14.3%  6.382e+09        perf-stat.i.branch-instructions
      1.33            -0.1        1.24        perf-stat.i.branch-miss-rate%
  91615750           -22.0%   71469356        perf-stat.i.branch-misses
      3.80            +2.5        6.31 ± 13%  perf-stat.i.cache-miss-rate%
   8753636 ±  4%    +109.7%   18358392        perf-stat.i.cache-misses
 7.691e+08           -14.2%  6.597e+08        perf-stat.i.cache-references
   4428060           -16.4%    3704052        perf-stat.i.context-switches
      2.87           +11.2%       3.20        perf-stat.i.cpi
 8.789e+10            -5.6%  8.294e+10        perf-stat.i.cpu-cycles
     16303 ±  7%     -74.2%       4204 ±  2%  perf-stat.i.cycles-between-cache-misses
  8.94e+09           -14.0%  7.685e+09        perf-stat.i.dTLB-loads
 4.951e+09           -16.2%  4.149e+09        perf-stat.i.dTLB-stores
  57458394           -17.3%   47543962        perf-stat.i.iTLB-load-misses
  30827890           -15.9%   25930501        perf-stat.i.iTLB-loads
 3.327e+10           -14.6%  2.842e+10        perf-stat.i.instructions
    581.15            +3.3%     600.28        perf-stat.i.instructions-per-iTLB-miss
      0.36            -9.4%       0.33        perf-stat.i.ipc
      0.92            -5.6%       0.86        perf-stat.i.metric.GHz
      1.01 ±  4%     +17.6%       1.18 ±  4%  perf-stat.i.metric.K/sec
    230.75           -14.6%     197.02        perf-stat.i.metric.M/sec
     87.41            +8.0       95.42        perf-stat.i.node-load-miss-rate%
   1718045 ±  3%    +125.3%    3871440        perf-stat.i.node-load-misses
    227252 ±  3%     -71.5%      64814 ± 10%  perf-stat.i.node-loads
   1686277 ±  4%    +120.6%    3720452        perf-stat.i.node-store-misses
      1.23            -0.1        1.12        perf-stat.overall.branch-miss-rate%
      1.14 ±  5%      +1.6        2.78        perf-stat.overall.cache-miss-rate%
      2.64           +10.5%       2.92        perf-stat.overall.cpi
     10070 ±  4%     -55.1%       4519        perf-stat.overall.cycles-between-cache-misses
    579.14            +3.2%     597.84        perf-stat.overall.instructions-per-iTLB-miss
      0.38            -9.5%       0.34        perf-stat.overall.ipc
     88.31           +10.0       98.35        perf-stat.overall.node-load-miss-rate%
     97.96            +1.3       99.24        perf-stat.overall.node-store-miss-rate%
     22430            +3.3%      23175        perf-stat.overall.path-length
 7.434e+09           -14.4%  6.365e+09        perf-stat.ps.branch-instructions
  91428244           -22.0%   71275228        perf-stat.ps.branch-misses
   8723893 ±  4%    +109.8%   18304568        perf-stat.ps.cache-misses
 7.674e+08           -14.3%  6.578e+08        perf-stat.ps.cache-references
   4418679           -16.4%    3693530        perf-stat.ps.context-switches
  8.77e+10            -5.7%  8.271e+10        perf-stat.ps.cpu-cycles
 8.921e+09           -14.1%  7.664e+09        perf-stat.ps.dTLB-loads
  4.94e+09           -16.3%  4.137e+09        perf-stat.ps.dTLB-stores
  57330404           -17.3%   47408036        perf-stat.ps.iTLB-load-misses
  30765981           -15.9%   25859786        perf-stat.ps.iTLB-loads
  3.32e+10           -14.6%  2.834e+10        perf-stat.ps.instructions
   1712299 ±  3%    +125.4%    3860240        perf-stat.ps.node-load-misses
    226568 ±  3%     -71.4%      64722 ± 10%  perf-stat.ps.node-loads
   1680387 ±  4%    +120.8%    3709583        perf-stat.ps.node-store-misses
 1.302e+13           -14.7%  1.111e+13        perf-stat.total.instructions
   3591158 ±  5%     -25.1%    2688593        interrupts.CAL:Function_call_interrupts
      2328 ± 19%     +42.8%       3323 ±  3%  interrupts.CPU0.NMI:Non-maskable_interrupts
      2328 ± 19%     +42.8%       3323 ±  3%  interrupts.CPU0.PMI:Performance_monitoring_interrupts
    110354 ±  9%     -20.0%      88244 ±  4%  interrupts.CPU0.RES:Rescheduling_interrupts
    128508 ± 14%     -27.1%      93721 ±  3%  interrupts.CPU1.RES:Rescheduling_interrupts
      2180 ± 30%     +47.0%       3205 ± 15%  interrupts.CPU10.NMI:Non-maskable_interrupts
      2180 ± 30%     +47.0%       3205 ± 15%  interrupts.CPU10.PMI:Performance_monitoring_interrupts
    133107 ±  8%     -25.7%      98924 ±  2%  interrupts.CPU10.RES:Rescheduling_interrupts
    133955 ± 13%     -28.9%      95305 ±  6%  interrupts.CPU11.RES:Rescheduling_interrupts
    129709 ± 10%     -24.9%      97452 ±  8%  interrupts.CPU12.RES:Rescheduling_interrupts
    130073 ± 10%     -21.2%     102507 ±  2%  interrupts.CPU13.RES:Rescheduling_interrupts
    136313 ± 10%     -27.4%      99010 ±  3%  interrupts.CPU14.RES:Rescheduling_interrupts
    139937 ±  7%     -29.9%      98077 ±  7%  interrupts.CPU15.RES:Rescheduling_interrupts
    143424 ± 11%     -28.4%     102678 ±  7%  interrupts.CPU16.RES:Rescheduling_interrupts
    138084 ± 10%     -25.7%     102625 ±  5%  interrupts.CPU17.RES:Rescheduling_interrupts
    136238 ±  6%     -26.3%     100366 ±  7%  interrupts.CPU18.RES:Rescheduling_interrupts
    140011 ± 10%     -28.4%     100232 ±  4%  interrupts.CPU19.RES:Rescheduling_interrupts
    129720 ±  7%     -28.8%      92405 ±  7%  interrupts.CPU2.RES:Rescheduling_interrupts
     43177 ± 33%     -34.6%      28234 ±  5%  interrupts.CPU20.CAL:Function_call_interrupts
    143060 ±  6%     -28.5%     102289 ±  7%  interrupts.CPU20.RES:Rescheduling_interrupts
     39911 ± 20%     -30.4%      27788 ±  4%  interrupts.CPU21.CAL:Function_call_interrupts
    144644 ±  9%     -27.6%     104676 ±  6%  interrupts.CPU21.RES:Rescheduling_interrupts
     38543 ± 21%     -35.1%      25019 ± 14%  interrupts.CPU22.CAL:Function_call_interrupts
    144984 ±  7%     -29.9%     101700 ±  2%  interrupts.CPU22.RES:Rescheduling_interrupts
     37835 ± 15%     -22.9%      29155 ±  5%  interrupts.CPU23.CAL:Function_call_interrupts
      2089 ± 19%     +70.6%       3565 ± 20%  interrupts.CPU23.NMI:Non-maskable_interrupts
      2089 ± 19%     +70.6%       3565 ± 20%  interrupts.CPU23.PMI:Performance_monitoring_interrupts
    130192 ±  7%     -22.1%     101416 ±  5%  interrupts.CPU23.RES:Rescheduling_interrupts
     37142 ±  6%     -32.8%      24974 ±  6%  interrupts.CPU24.CAL:Function_call_interrupts
    142384 ±  5%     -31.7%      97277 ±  6%  interrupts.CPU24.RES:Rescheduling_interrupts
     32664 ±  9%     -22.2%      25422 ±  6%  interrupts.CPU25.CAL:Function_call_interrupts
    141175 ±  5%     -31.2%      97084 ±  2%  interrupts.CPU25.RES:Rescheduling_interrupts
     31023 ± 21%     -24.8%      23330 ±  7%  interrupts.CPU26.CAL:Function_call_interrupts
    131921 ±  4%     -28.9%      93831 ±  3%  interrupts.CPU26.RES:Rescheduling_interrupts
     32946 ± 19%     -26.2%      24303 ±  5%  interrupts.CPU27.CAL:Function_call_interrupts
    144853 ±  4%     -35.7%      93190 ±  2%  interrupts.CPU27.RES:Rescheduling_interrupts
    136419 ±  4%     -31.3%      93690        interrupts.CPU28.RES:Rescheduling_interrupts
     36609 ± 20%     -35.3%      23696 ±  5%  interrupts.CPU29.CAL:Function_call_interrupts
    145284 ± 10%     -36.1%      92871        interrupts.CPU29.RES:Rescheduling_interrupts
    122699 ±  7%     -23.8%      93459 ± 10%  interrupts.CPU3.RES:Rescheduling_interrupts
    250.50 ± 40%     -79.9%      50.25 ± 99%  interrupts.CPU3.TLB:TLB_shootdowns
     35689 ± 19%     -36.1%      22793 ± 11%  interrupts.CPU30.CAL:Function_call_interrupts
    152345 ±  4%     -40.3%      90991 ±  3%  interrupts.CPU30.RES:Rescheduling_interrupts
     33895 ± 10%     -15.1%      28774 ±  8%  interrupts.CPU31.CAL:Function_call_interrupts
    150590 ±  5%     -35.5%      97092 ±  7%  interrupts.CPU31.RES:Rescheduling_interrupts
     50156 ± 28%     -45.8%      27170 ±  7%  interrupts.CPU32.CAL:Function_call_interrupts
      3757 ±  7%     -43.6%       2120 ± 32%  interrupts.CPU32.NMI:Non-maskable_interrupts
      3757 ±  7%     -43.6%       2120 ± 32%  interrupts.CPU32.PMI:Performance_monitoring_interrupts
    150142 ±  3%     -36.3%      95673        interrupts.CPU32.RES:Rescheduling_interrupts
     39957 ± 25%     -34.5%      26158 ±  4%  interrupts.CPU33.CAL:Function_call_interrupts
    147066 ±  8%     -34.4%      96521 ±  2%  interrupts.CPU33.RES:Rescheduling_interrupts
    168.25 ±137%     -86.9%      22.00 ± 59%  interrupts.CPU33.TLB:TLB_shootdowns
     38357 ± 13%     -29.9%      26881 ±  5%  interrupts.CPU34.CAL:Function_call_interrupts
      3757 ±  5%     -28.5%       2686 ± 19%  interrupts.CPU34.NMI:Non-maskable_interrupts
      3757 ±  5%     -28.5%       2686 ± 19%  interrupts.CPU34.PMI:Performance_monitoring_interrupts
    140734 ±  2%     -33.3%      93841 ±  3%  interrupts.CPU34.RES:Rescheduling_interrupts
     37965 ± 17%     -25.8%      28175 ±  4%  interrupts.CPU35.CAL:Function_call_interrupts
      3934 ±  8%     -39.3%       2389 ± 13%  interrupts.CPU35.NMI:Non-maskable_interrupts
      3934 ±  8%     -39.3%       2389 ± 13%  interrupts.CPU35.PMI:Performance_monitoring_interrupts
    146074 ± 10%     -33.2%      97630 ±  2%  interrupts.CPU35.RES:Rescheduling_interrupts
     34131 ±  8%     -18.8%      27704 ±  9%  interrupts.CPU36.CAL:Function_call_interrupts
    149093 ±  3%     -35.0%      96945 ±  4%  interrupts.CPU36.RES:Rescheduling_interrupts
     44333 ± 47%     -39.7%      26745 ±  7%  interrupts.CPU37.CAL:Function_call_interrupts
    149936 ±  4%     -34.3%      98542 ±  3%  interrupts.CPU37.RES:Rescheduling_interrupts
     41199 ± 28%     -30.2%      28741 ±  6%  interrupts.CPU38.CAL:Function_call_interrupts
    154224 ±  3%     -31.6%     105443 ±  7%  interrupts.CPU38.RES:Rescheduling_interrupts
     36925 ±  8%     -24.3%      27942 ±  5%  interrupts.CPU39.CAL:Function_call_interrupts
    150490 ±  2%     -32.5%     101625 ±  4%  interrupts.CPU39.RES:Rescheduling_interrupts
    122742 ± 15%     -25.4%      91596 ±  5%  interrupts.CPU4.RES:Rescheduling_interrupts
    143639 ±  9%     -29.4%     101407 ±  2%  interrupts.CPU40.RES:Rescheduling_interrupts
     43235 ± 10%     -30.9%      29877 ±  4%  interrupts.CPU41.CAL:Function_call_interrupts
    158981 ±  5%     -32.8%     106760 ±  4%  interrupts.CPU41.RES:Rescheduling_interrupts
     47792 ± 33%     -37.7%      29769 ±  5%  interrupts.CPU42.CAL:Function_call_interrupts
      3455 ± 11%     -32.2%       2343 ± 36%  interrupts.CPU42.NMI:Non-maskable_interrupts
      3455 ± 11%     -32.2%       2343 ± 36%  interrupts.CPU42.PMI:Performance_monitoring_interrupts
    160241 ±  5%     -34.0%     105793 ±  4%  interrupts.CPU42.RES:Rescheduling_interrupts
     54419 ± 52%     -44.1%      30408 ±  2%  interrupts.CPU43.CAL:Function_call_interrupts
      3726 ± 11%     -38.7%       2285 ± 39%  interrupts.CPU43.NMI:Non-maskable_interrupts
      3726 ± 11%     -38.7%       2285 ± 39%  interrupts.CPU43.PMI:Performance_monitoring_interrupts
    156010           -32.4%     105516 ±  2%  interrupts.CPU43.RES:Rescheduling_interrupts
     69033 ± 79%     -56.0%      30393 ±  7%  interrupts.CPU44.CAL:Function_call_interrupts
    152478 ±  6%     -30.4%     106187 ±  4%  interrupts.CPU44.RES:Rescheduling_interrupts
     49434 ± 49%     -38.5%      30404 ±  9%  interrupts.CPU45.CAL:Function_call_interrupts
    153770 ±  7%     -32.2%     104200 ±  3%  interrupts.CPU45.RES:Rescheduling_interrupts
     56303 ± 52%     -50.4%      27914 ±  4%  interrupts.CPU46.CAL:Function_call_interrupts
      3924 ± 20%     -48.7%       2012 ± 50%  interrupts.CPU46.NMI:Non-maskable_interrupts
      3924 ± 20%     -48.7%       2012 ± 50%  interrupts.CPU46.PMI:Performance_monitoring_interrupts
    152891 ± 11%     -31.7%     104494 ±  5%  interrupts.CPU46.RES:Rescheduling_interrupts
     42970 ± 30%     -29.9%      30107 ±  9%  interrupts.CPU47.CAL:Function_call_interrupts
      3940 ±  8%     -40.8%       2332 ± 38%  interrupts.CPU47.NMI:Non-maskable_interrupts
      3940 ±  8%     -40.8%       2332 ± 38%  interrupts.CPU47.PMI:Performance_monitoring_interrupts
    146615 ±  5%     -27.7%     106013 ±  4%  interrupts.CPU47.RES:Rescheduling_interrupts
    146863 ±  5%     -18.4%     119774 ±  3%  interrupts.CPU48.RES:Rescheduling_interrupts
    136692 ±  8%     -16.3%     114405 ±  7%  interrupts.CPU49.RES:Rescheduling_interrupts
     29311 ±  6%     -12.4%      25673 ±  4%  interrupts.CPU5.CAL:Function_call_interrupts
    129497 ±  7%     -27.1%      94375 ±  6%  interrupts.CPU5.RES:Rescheduling_interrupts
    143797 ± 11%     -21.0%     113564 ±  4%  interrupts.CPU50.RES:Rescheduling_interrupts
      2891 ± 16%     +31.3%       3797 ± 12%  interrupts.CPU51.NMI:Non-maskable_interrupts
      2891 ± 16%     +31.3%       3797 ± 12%  interrupts.CPU51.PMI:Performance_monitoring_interrupts
    139766 ±  2%     -19.6%     112352 ±  8%  interrupts.CPU51.RES:Rescheduling_interrupts
    137319 ±  4%     -20.3%     109422 ±  5%  interrupts.CPU52.RES:Rescheduling_interrupts
    138705 ±  5%     -21.3%     109158 ±  8%  interrupts.CPU53.RES:Rescheduling_interrupts
      2426 ± 28%     +42.8%       3464 ± 19%  interrupts.CPU54.NMI:Non-maskable_interrupts
      2426 ± 28%     +42.8%       3464 ± 19%  interrupts.CPU54.PMI:Performance_monitoring_interrupts
    140683 ± 11%     -24.0%     106919 ±  4%  interrupts.CPU54.RES:Rescheduling_interrupts
     38238 ± 13%     -22.9%      29493 ±  6%  interrupts.CPU55.CAL:Function_call_interrupts
      3043 ±  8%     +18.7%       3612 ±  7%  interrupts.CPU55.NMI:Non-maskable_interrupts
      3043 ±  8%     +18.7%       3612 ±  7%  interrupts.CPU55.PMI:Performance_monitoring_interrupts
    143657 ± 10%     -25.0%     107806 ±  6%  interrupts.CPU55.RES:Rescheduling_interrupts
    131036 ±  8%     -21.3%     103177 ±  4%  interrupts.CPU56.RES:Rescheduling_interrupts
    131204 ± 12%     -21.2%     103444 ± 10%  interrupts.CPU57.RES:Rescheduling_interrupts
    122041 ± 12%     -15.9%     102674 ±  7%  interrupts.CPU58.RES:Rescheduling_interrupts
    167.25 ± 65%     -64.7%      59.00 ±157%  interrupts.CPU58.TLB:TLB_shootdowns
      1883 ± 33%     +61.6%       3042 ±  3%  interrupts.CPU6.NMI:Non-maskable_interrupts
      1883 ± 33%     +61.6%       3042 ±  3%  interrupts.CPU6.PMI:Performance_monitoring_interrupts
    132101 ± 12%     -27.0%      96457 ±  8%  interrupts.CPU6.RES:Rescheduling_interrupts
      1832 ± 24%     +69.3%       3102 ± 32%  interrupts.CPU64.NMI:Non-maskable_interrupts
      1832 ± 24%     +69.3%       3102 ± 32%  interrupts.CPU64.PMI:Performance_monitoring_interrupts
    107979 ±  8%     -11.6%      95452        interrupts.CPU66.RES:Rescheduling_interrupts
     97965 ±  3%     -15.1%      83199 ±  2%  interrupts.CPU69.RES:Rescheduling_interrupts
    126380 ± 11%     -24.6%      95257 ±  5%  interrupts.CPU7.RES:Rescheduling_interrupts
      1820 ± 40%     +60.9%       2929 ± 35%  interrupts.CPU70.NMI:Non-maskable_interrupts
      1820 ± 40%     +60.9%       2929 ± 35%  interrupts.CPU70.PMI:Performance_monitoring_interrupts
    171279 ±  5%     -29.4%     120994 ±  5%  interrupts.CPU72.RES:Rescheduling_interrupts
     50761 ± 40%     -35.0%      32979 ±  7%  interrupts.CPU73.CAL:Function_call_interrupts
    173132 ±  7%     -31.5%     118555 ±  5%  interrupts.CPU73.RES:Rescheduling_interrupts
     43479 ± 17%     -25.8%      32276 ±  3%  interrupts.CPU74.CAL:Function_call_interrupts
      3755 ±  9%     -31.7%       2564 ± 31%  interrupts.CPU74.NMI:Non-maskable_interrupts
      3755 ±  9%     -31.7%       2564 ± 31%  interrupts.CPU74.PMI:Performance_monitoring_interrupts
    167124 ±  7%     -28.8%     119063 ±  4%  interrupts.CPU74.RES:Rescheduling_interrupts
    164069 ±  7%     -26.6%     120499 ±  4%  interrupts.CPU75.RES:Rescheduling_interrupts
    166858 ±  6%     -28.4%     119453 ±  4%  interrupts.CPU76.RES:Rescheduling_interrupts
    157535 ±  6%     -25.5%     117419 ±  4%  interrupts.CPU77.RES:Rescheduling_interrupts
    165642 ±  8%     -25.9%     122719 ±  8%  interrupts.CPU78.RES:Rescheduling_interrupts
    162781 ±  5%     -29.0%     115600 ±  3%  interrupts.CPU79.RES:Rescheduling_interrupts
    132224 ± 11%     -26.6%      97010        interrupts.CPU8.RES:Rescheduling_interrupts
    167082 ±  9%     -30.7%     115794 ±  4%  interrupts.CPU80.RES:Rescheduling_interrupts
     49639 ± 37%     -35.1%      32228 ±  2%  interrupts.CPU81.CAL:Function_call_interrupts
    144305 ±  5%     -18.3%     117926 ±  4%  interrupts.CPU81.RES:Rescheduling_interrupts
    151333 ±  7%     -23.2%     116159 ±  3%  interrupts.CPU82.RES:Rescheduling_interrupts
    142398 ±  8%     -21.1%     112399 ±  7%  interrupts.CPU83.RES:Rescheduling_interrupts
    144455 ±  2%     -20.5%     114911        interrupts.CPU84.RES:Rescheduling_interrupts
    149850 ±  9%     -24.3%     113396 ±  5%  interrupts.CPU85.RES:Rescheduling_interrupts
     34458 ±  4%     -14.4%      29487 ±  8%  interrupts.CPU86.CAL:Function_call_interrupts
    138603 ±  6%     -22.7%     107133 ±  2%  interrupts.CPU86.RES:Rescheduling_interrupts
     39228 ±  7%     -25.5%      29231 ±  4%  interrupts.CPU87.CAL:Function_call_interrupts
    151814 ±  8%     -31.1%     104629 ±  5%  interrupts.CPU87.RES:Rescheduling_interrupts
    137356 ±  8%     -20.2%     109634 ±  3%  interrupts.CPU88.RES:Rescheduling_interrupts
    143613 ± 10%     -28.9%     102166 ± 10%  interrupts.CPU89.RES:Rescheduling_interrupts
    122375 ±  8%     -19.2%      98901 ±  3%  interrupts.CPU9.RES:Rescheduling_interrupts
    140781 ±  6%     -25.0%     105531 ±  3%  interrupts.CPU90.RES:Rescheduling_interrupts
    138917 ± 12%     -24.9%     104264 ±  5%  interrupts.CPU91.RES:Rescheduling_interrupts
    146814 ± 14%     -29.2%     103902 ±  4%  interrupts.CPU92.RES:Rescheduling_interrupts
    132220 ± 15%     -21.3%     104095 ±  2%  interrupts.CPU93.RES:Rescheduling_interrupts
    133.00 ± 88%     -87.6%      16.50 ± 72%  interrupts.CPU93.TLB:TLB_shootdowns
    125991 ±  5%     -19.0%     101995 ±  2%  interrupts.CPU94.RES:Rescheduling_interrupts
    115838 ±  9%     -17.2%      95959 ±  3%  interrupts.CPU95.RES:Rescheduling_interrupts
  13255498 ±  2%     -25.6%    9859155        interrupts.RES:Rescheduling_interrupts
      7.59 ±  2%      -1.5        6.04        perf-profile.calltrace.cycles-pp.new_sync_read.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
      7.43 ±  2%      -1.5        5.91        perf-profile.calltrace.cycles-pp.pipe_read.new_sync_read.vfs_read.ksys_read.do_syscall_64
      6.03 ±  4%      -1.0        5.06        perf-profile.calltrace.cycles-pp.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
      5.90 ±  4%      -1.0        4.95        perf-profile.calltrace.cycles-pp.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
      4.44 ±  3%      -0.9        3.51        perf-profile.calltrace.cycles-pp.schedule.pipe_read.new_sync_read.vfs_read.ksys_read
      2.29 ±  4%      -0.9        1.38 ±  2%  perf-profile.calltrace.cycles-pp.dequeue_task_fair.__schedule.schedule.pipe_read.new_sync_read
      4.07 ±  3%      -0.9        3.21        perf-profile.calltrace.cycles-pp.__schedule.schedule.pipe_read.new_sync_read.vfs_read
      2.62 ±  3%      -0.9        1.76 ±  4%  perf-profile.calltrace.cycles-pp.read
      3.68 ±  2%      -0.8        2.83        perf-profile.calltrace.cycles-pp.schedule_idle.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
      2.06 ±  4%      -0.8        1.22        perf-profile.calltrace.cycles-pp.dequeue_entity.dequeue_task_fair.__schedule.schedule.pipe_read
      3.58 ±  2%      -0.8        2.76        perf-profile.calltrace.cycles-pp.__schedule.schedule_idle.do_idle.cpu_startup_entry.start_secondary
      2.37 ±  3%      -0.8        1.58 ±  4%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.read
      2.29 ±  3%      -0.8        1.53 ±  4%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
      2.26 ±  3%      -0.8        1.50 ±  4%  perf-profile.calltrace.cycles-pp.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
      2.21 ±  3%      -0.7        1.47 ±  4%  perf-profile.calltrace.cycles-pp.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
      4.25 ±  3%      -0.7        3.51        perf-profile.calltrace.cycles-pp.stack_trace_save_tsk.__account_scheduler_latency.enqueue_entity.enqueue_task_fair.ttwu_do_activate
      2.14 ±  4%      -0.6        1.52        perf-profile.calltrace.cycles-pp.unwind_next_frame.arch_stack_walk.stack_trace_save_tsk.__account_scheduler_latency.enqueue_entity
      3.48 ±  4%      -0.6        2.90 ±  2%  perf-profile.calltrace.cycles-pp.arch_stack_walk.stack_trace_save_tsk.__account_scheduler_latency.enqueue_entity.enqueue_task_fair
      1.93 ±  3%      -0.5        1.48        perf-profile.calltrace.cycles-pp.pick_next_task_fair.__schedule.schedule_idle.do_idle.cpu_startup_entry
      1.54 ±  4%      -0.4        1.18        perf-profile.calltrace.cycles-pp.menu_select.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
      1.38 ±  3%      -0.3        1.04 ±  2%  perf-profile.calltrace.cycles-pp.set_next_entity.pick_next_task_fair.__schedule.schedule_idle.do_idle
      0.72 ±  4%      -0.1        0.58 ±  3%  perf-profile.calltrace.cycles-pp.tick_nohz_get_sleep_length.menu_select.do_idle.cpu_startup_entry.start_secondary
      0.66 ±  4%      -0.1        0.54 ±  2%  perf-profile.calltrace.cycles-pp.prepare_to_wait_event.pipe_read.new_sync_read.vfs_read.ksys_read
     46.28            +0.5       46.74        perf-profile.calltrace.cycles-pp.secondary_startup_64_no_verify
      0.14 ±173%      +0.5        0.66 ±  9%  perf-profile.calltrace.cycles-pp.cpuidle_enter.do_idle.cpu_startup_entry.start_kernel.secondary_startup_64_no_verify
      0.14 ±173%      +0.5        0.66 ±  9%  perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.do_idle.cpu_startup_entry.start_kernel
      0.15 ±173%      +0.6        0.71 ±  8%  perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_kernel.secondary_startup_64_no_verify
      0.15 ±173%      +0.6        0.71 ±  8%  perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_kernel.secondary_startup_64_no_verify
      0.15 ±173%      +0.6        0.71 ±  8%  perf-profile.calltrace.cycles-pp.start_kernel.secondary_startup_64_no_verify
      7.85 ±  2%      +0.8        8.64 ±  3%  perf-profile.calltrace.cycles-pp.write
      7.77 ±  2%      +0.8        8.58 ±  3%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.write
      7.73 ±  2%      +0.8        8.55 ±  3%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
      7.69 ±  3%      +0.8        8.53 ±  3%  perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
      7.64 ±  3%      +0.9        8.49 ±  3%  perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
     35.29            +0.9       36.15        perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
     35.15            +0.9       36.02        perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
     42.35            +1.8       44.15        perf-profile.calltrace.cycles-pp.new_sync_write.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
     42.22            +1.8       44.06        perf-profile.calltrace.cycles-pp.pipe_write.new_sync_write.vfs_write.ksys_write.do_syscall_64
     38.77            +1.9       40.67        perf-profile.calltrace.cycles-pp.cpuidle_enter.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
     38.65            +1.9       40.56        perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.do_idle.cpu_startup_entry.start_secondary
     40.84            +2.1       42.96        perf-profile.calltrace.cycles-pp.__wake_up_common_lock.pipe_write.new_sync_write.vfs_write.ksys_write
     40.50            +2.1       42.65        perf-profile.calltrace.cycles-pp.__wake_up_common.__wake_up_common_lock.pipe_write.new_sync_write.vfs_write
     40.15            +2.2       42.36        perf-profile.calltrace.cycles-pp.autoremove_wake_function.__wake_up_common.__wake_up_common_lock.pipe_write.new_sync_write
     40.07            +2.2       42.29        perf-profile.calltrace.cycles-pp.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_common_lock.pipe_write
     37.50            +2.7       40.20        perf-profile.calltrace.cycles-pp.ttwu_do_activate.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_common_lock
     37.47            +2.7       40.18        perf-profile.calltrace.cycles-pp.enqueue_task_fair.ttwu_do_activate.try_to_wake_up.autoremove_wake_function.__wake_up_common
     36.96            +2.9       39.84        perf-profile.calltrace.cycles-pp.enqueue_entity.enqueue_task_fair.ttwu_do_activate.try_to_wake_up.autoremove_wake_function
     36.62 ±  2%      +3.2       39.86        perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.cpuidle_enter.do_idle.cpu_startup_entry
     34.50            +3.3       37.80        perf-profile.calltrace.cycles-pp.__account_scheduler_latency.enqueue_entity.enqueue_task_fair.ttwu_do_activate.try_to_wake_up
     29.96 ±  2%      +4.1       34.04        perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.__account_scheduler_latency.enqueue_entity.enqueue_task_fair.ttwu_do_activate
     29.13 ±  2%      +4.1       33.22        perf-profile.calltrace.cycles-pp.__cna_queued_spin_lock_slowpath._raw_spin_lock_irqsave.__account_scheduler_latency.enqueue_entity.enqueue_task_fair
      8.30 ±  2%      -1.7        6.58        perf-profile.children.cycles-pp.ksys_read
      8.12 ±  2%      -1.7        6.42        perf-profile.children.cycles-pp.vfs_read
      7.75 ±  2%      -1.7        6.06        perf-profile.children.cycles-pp.__schedule
      7.59 ±  2%      -1.5        6.05        perf-profile.children.cycles-pp.new_sync_read
      7.45 ±  2%      -1.5        5.94        perf-profile.children.cycles-pp.pipe_read
      4.44 ±  3%      -0.9        3.52        perf-profile.children.cycles-pp.schedule
      2.65 ±  3%      -0.9        1.78 ±  4%  perf-profile.children.cycles-pp.read
      3.70 ±  2%      -0.8        2.87        perf-profile.children.cycles-pp.schedule_idle
      4.28 ±  3%      -0.7        3.54        perf-profile.children.cycles-pp.stack_trace_save_tsk
      0.80 ± 35%      -0.7        0.13 ±  5%  perf-profile.children.cycles-pp.poll_idle
      3.54 ±  3%      -0.6        2.94 ±  2%  perf-profile.children.cycles-pp.arch_stack_walk
      2.02 ±  3%      -0.6        1.43 ±  2%  perf-profile.children.cycles-pp.update_load_avg
      2.15 ±  3%      -0.5        1.67        perf-profile.children.cycles-pp.pick_next_task_fair
      2.30 ±  4%      -0.5        1.82        perf-profile.children.cycles-pp.dequeue_task_fair
      2.10 ±  4%      -0.5        1.63 ±  2%  perf-profile.children.cycles-pp.dequeue_entity
      1.56 ±  4%      -0.4        1.20        perf-profile.children.cycles-pp.menu_select
      1.39 ±  3%      -0.3        1.06 ±  2%  perf-profile.children.cycles-pp.set_next_entity
      0.46 ± 13%      -0.3        0.15 ±  3%  perf-profile.children.cycles-pp.sched_ttwu_pending
      0.92 ±  3%      -0.2        0.70 ±  2%  perf-profile.children.cycles-pp.prepare_to_wait_event
      1.13            -0.2        0.92 ±  3%  perf-profile.children.cycles-pp.asm_call_sysvec_on_stack
      0.33 ±  9%      -0.2        0.12 ±  3%  perf-profile.children.cycles-pp.asm_sysvec_call_function_single
      0.32 ± 10%      -0.2        0.11 ±  3%  perf-profile.children.cycles-pp.__sysvec_call_function_single
      0.61 ±  3%      -0.2        0.41 ±  4%  perf-profile.children.cycles-pp.__update_load_avg_cfs_rq
      0.32 ± 10%      -0.2        0.11 ±  4%  perf-profile.children.cycles-pp.sysvec_call_function_single
      0.47 ±  6%      -0.2        0.28        perf-profile.children.cycles-pp.finish_task_switch
      0.56 ±  5%      -0.2        0.36 ±  3%  perf-profile.children.cycles-pp.unwind_get_return_address
      0.50 ±  6%      -0.2        0.32 ±  4%  perf-profile.children.cycles-pp.__kernel_text_address
      0.96 ±  5%      -0.2        0.78        perf-profile.children.cycles-pp.update_curr
      0.44 ±  6%      -0.2        0.27 ±  4%  perf-profile.children.cycles-pp.kernel_text_address
      2.17 ±  4%      -0.2        2.00        perf-profile.children.cycles-pp.unwind_next_frame
      0.73 ±  3%      -0.2        0.56 ±  4%  perf-profile.children.cycles-pp.select_task_rq_fair
      0.95            -0.2        0.79 ±  2%  perf-profile.children.cycles-pp.update_rq_clock
      0.74 ±  4%      -0.1        0.59 ±  4%  perf-profile.children.cycles-pp.tick_nohz_get_sleep_length
      0.53 ±  3%      -0.1        0.40 ±  5%  perf-profile.children.cycles-pp.ktime_get
      0.41 ±  4%      -0.1        0.28 ±  3%  perf-profile.children.cycles-pp.stack_trace_consume_entry_nosched
      0.71            -0.1        0.59 ±  3%  perf-profile.children.cycles-pp.mutex_lock
      0.50 ±  2%      -0.1        0.38 ±  3%  perf-profile.children.cycles-pp.tick_nohz_idle_exit
      0.44            -0.1        0.33        perf-profile.children.cycles-pp.__orc_find
      0.52 ±  2%      -0.1        0.41 ±  3%  perf-profile.children.cycles-pp.copy_page_to_iter
      0.15 ± 19%      -0.1        0.05 ±  8%  perf-profile.children.cycles-pp.flush_smp_call_function_from_idle
      0.44 ±  4%      -0.1        0.34 ±  2%  perf-profile.children.cycles-pp.security_file_permission
      0.53 ±  2%      -0.1        0.43        perf-profile.children.cycles-pp.__switch_to
      0.48 ±  3%      -0.1        0.38 ±  3%  perf-profile.children.cycles-pp.__switch_to_asm
      0.37 ±  3%      -0.1        0.27 ±  4%  perf-profile.children.cycles-pp.__update_load_avg_se
      0.67 ±  2%      -0.1        0.57 ±  2%  perf-profile.children.cycles-pp._raw_spin_lock
      0.32 ±  4%      -0.1        0.22 ±  4%  perf-profile.children.cycles-pp.copy_page_from_iter
      0.38 ±  4%      -0.1        0.29 ±  5%  perf-profile.children.cycles-pp.select_idle_sibling
      0.45 ±  5%      -0.1        0.37 ±  4%  perf-profile.children.cycles-pp.tick_nohz_next_event
      0.29 ±  4%      -0.1        0.21 ±  3%  perf-profile.children.cycles-pp.tick_nohz_idle_enter
      0.64 ±  2%      -0.1        0.57 ±  3%  perf-profile.children.cycles-pp._raw_spin_unlock_irqrestore
      0.38 ±  3%      -0.1        0.31 ±  4%  perf-profile.children.cycles-pp.copyout
      0.27 ±  6%      -0.1        0.19 ±  6%  perf-profile.children.cycles-pp.orc_find
      0.40 ±  2%      -0.1        0.33 ±  5%  perf-profile.children.cycles-pp.copy_user_generic_unrolled
      0.35 ±  4%      -0.1        0.28        perf-profile.children.cycles-pp.pick_next_entity
      0.38 ±  4%      -0.1        0.31        perf-profile.children.cycles-pp.update_cfs_group
      0.22 ±  4%      -0.1        0.16 ±  5%  perf-profile.children.cycles-pp.___perf_sw_event
      0.30 ±  5%      -0.1        0.23 ±  3%  perf-profile.children.cycles-pp.__unwind_start
      0.32 ±  4%      -0.1        0.26        perf-profile.children.cycles-pp.ttwu_do_wakeup
      0.20 ±  4%      -0.1        0.14 ±  9%  perf-profile.children.cycles-pp.__might_sleep
      0.28 ±  6%      -0.1        0.22 ±  5%  perf-profile.children.cycles-pp.syscall_return_via_sysret
      0.27 ±  4%      -0.1        0.21 ±  3%  perf-profile.children.cycles-pp.common_file_perm
      0.18 ±  3%      -0.1        0.12 ±  3%  perf-profile.children.cycles-pp.in_sched_functions
      0.30 ±  4%      -0.1        0.24        perf-profile.children.cycles-pp.check_preempt_curr
      0.22 ±  4%      -0.1        0.17 ±  4%  perf-profile.children.cycles-pp.rcu_idle_exit
      0.34 ±  3%      -0.1        0.28 ±  2%  perf-profile.children.cycles-pp.sched_clock_cpu
      0.30 ±  4%      -0.1        0.24 ±  4%  perf-profile.children.cycles-pp.update_ts_time_stats
      0.31 ±  5%      -0.1        0.25 ±  4%  perf-profile.children.cycles-pp.nr_iowait_cpu
      0.31 ±  3%      -0.1        0.26 ±  3%  perf-profile.children.cycles-pp.sched_clock
      0.21 ±  5%      -0.1        0.16 ±  7%  perf-profile.children.cycles-pp.cpus_share_cache
      0.17 ± 10%      -0.1        0.11 ±  7%  perf-profile.children.cycles-pp.place_entity
      0.28 ±  3%      -0.1        0.23 ±  2%  perf-profile.children.cycles-pp.syscall_exit_to_user_mode
      0.18 ±  4%      -0.1        0.13 ±  3%  perf-profile.children.cycles-pp.resched_curr
      0.33 ±  2%      -0.0        0.28 ±  2%  perf-profile.children.cycles-pp.switch_mm_irqs_off
      0.29 ±  3%      -0.0        0.24 ±  2%  perf-profile.children.cycles-pp.mutex_unlock
      0.23 ±  3%      -0.0        0.18 ±  2%  perf-profile.children.cycles-pp._raw_spin_lock_irq
      0.26 ±  3%      -0.0        0.21        perf-profile.children.cycles-pp.___might_sleep
      0.20 ±  6%      -0.0        0.15 ±  2%  perf-profile.children.cycles-pp.__list_del_entry_valid
      0.29 ±  5%      -0.0        0.25 ±  3%  perf-profile.children.cycles-pp.native_sched_clock
      0.24 ±  5%      -0.0        0.19 ±  5%  perf-profile.children.cycles-pp.get_next_timer_interrupt
      0.12 ±  5%      -0.0        0.08 ±  5%  perf-profile.children.cycles-pp.cpuidle_governor_latency_req
      0.23 ±  8%      -0.0        0.19 ±  2%  perf-profile.children.cycles-pp.hrtimer_next_event_without
      0.21 ±  3%      -0.0        0.17 ±  2%  perf-profile.children.cycles-pp.read_tsc
      0.14 ±  3%      -0.0        0.10 ±  7%  perf-profile.children.cycles-pp.rcu_eqs_exit
      0.12 ±  4%      -0.0        0.09 ±  7%  perf-profile.children.cycles-pp.__entry_text_start
      0.19 ±  2%      -0.0        0.15 ±  5%  perf-profile.children.cycles-pp.__fdget_pos
      0.08 ±  6%      -0.0        0.04 ± 58%  perf-profile.children.cycles-pp.rcu_dynticks_eqs_exit
      0.07 ± 10%      -0.0        0.04 ± 57%  perf-profile.children.cycles-pp.put_prev_entity
      0.11 ± 13%      -0.0        0.08 ±  5%  perf-profile.children.cycles-pp.put_prev_task_fair
      0.17 ±  4%      -0.0        0.14 ±  3%  perf-profile.children.cycles-pp.exit_to_user_mode_prepare
      0.15 ±  7%      -0.0        0.12 ±  3%  perf-profile.children.cycles-pp.hrtimer_get_next_event
      0.16 ±  2%      -0.0        0.14 ±  6%  perf-profile.children.cycles-pp.__fget_light
      0.13 ± 10%      -0.0        0.10 ±  7%  perf-profile.children.cycles-pp.is_bpf_text_address
      0.11 ±  6%      -0.0        0.08 ±  5%  perf-profile.children.cycles-pp.file_update_time
      0.14 ±  6%      -0.0        0.11 ± 11%  perf-profile.children.cycles-pp.__wrgsbase_inactive
      0.14 ±  8%      -0.0        0.11 ±  7%  perf-profile.children.cycles-pp.available_idle_cpu
      0.09 ±  4%      -0.0        0.06 ± 13%  perf-profile.children.cycles-pp.menu_reflect
      0.13 ±  9%      -0.0        0.11 ±  6%  perf-profile.children.cycles-pp.stack_access_ok
      0.14 ±  5%      -0.0        0.12 ±  7%  perf-profile.children.cycles-pp.switch_fpu_return
      0.10 ±  8%      -0.0        0.08 ±  6%  perf-profile.children.cycles-pp.current_time
      0.09 ±  9%      -0.0        0.07 ±  7%  perf-profile.children.cycles-pp.__rdgsbase_inactive
      0.10            -0.0        0.08        perf-profile.children.cycles-pp.__calc_delta
      0.09 ± 10%      -0.0        0.07 ±  7%  perf-profile.children.cycles-pp.bpf_ksym_find
      0.07 ± 10%      -0.0        0.05        perf-profile.children.cycles-pp.pick_next_task_idle
      0.18 ±  3%      -0.0        0.16 ±  2%  perf-profile.children.cycles-pp.fsnotify
      0.17 ±  5%      -0.0        0.15 ±  2%  perf-profile.children.cycles-pp.copy_fpregs_to_fpstate
      0.07 ±  6%      -0.0        0.05        perf-profile.children.cycles-pp.put_task_stack
      0.07 ±  6%      -0.0        0.05        perf-profile.children.cycles-pp.apparmor_file_permission
      0.07 ± 12%      -0.0        0.05 ±  8%  perf-profile.children.cycles-pp.copy_user_enhanced_fast_string
      0.07 ±  6%      -0.0        0.05 ±  8%  perf-profile.children.cycles-pp.update_min_vruntime
      0.17 ±  2%      -0.0        0.16        perf-profile.children.cycles-pp.anon_pipe_buf_release
      0.07 ±  5%      -0.0        0.06        perf-profile.children.cycles-pp.atime_needs_update
      0.08 ±  5%      -0.0        0.07        perf-profile.children.cycles-pp.finish_wait
      0.48 ± 14%      +0.2        0.71 ±  8%  perf-profile.children.cycles-pp.start_kernel
     46.28            +0.5       46.74        perf-profile.children.cycles-pp.secondary_startup_64_no_verify
     46.28            +0.5       46.74        perf-profile.children.cycles-pp.cpu_startup_entry
     46.25            +0.5       46.71        perf-profile.children.cycles-pp.do_idle
      7.88 ±  2%      +0.8        8.65 ±  3%  perf-profile.children.cycles-pp.write
     42.99            +1.7       44.69        perf-profile.children.cycles-pp.ksys_write
     42.80            +1.7       44.53        perf-profile.children.cycles-pp.vfs_write
     42.37            +1.8       44.16        perf-profile.children.cycles-pp.new_sync_write
     42.23            +1.8       44.06        perf-profile.children.cycles-pp.pipe_write
     39.21            +2.1       41.33        perf-profile.children.cycles-pp.cpuidle_enter
     40.84            +2.1       42.96        perf-profile.children.cycles-pp.__wake_up_common_lock
     39.20            +2.1       41.32        perf-profile.children.cycles-pp.cpuidle_enter_state
     40.50            +2.2       42.65        perf-profile.children.cycles-pp.__wake_up_common
     40.15            +2.2       42.36        perf-profile.children.cycles-pp.autoremove_wake_function
     40.09            +2.2       42.30        perf-profile.children.cycles-pp.try_to_wake_up
     37.97            +2.4       40.36        perf-profile.children.cycles-pp.ttwu_do_activate
     37.94            +2.4       40.33        perf-profile.children.cycles-pp.enqueue_task_fair
     37.50            +2.5       40.05        perf-profile.children.cycles-pp.enqueue_entity
     36.91            +2.9       39.86        perf-profile.children.cycles-pp.intel_idle
     34.95            +3.0       37.95        perf-profile.children.cycles-pp.__account_scheduler_latency
     31.46            +3.5       35.00        perf-profile.children.cycles-pp._raw_spin_lock_irqsave
     29.71 ±  2%      +3.8       33.52        perf-profile.children.cycles-pp.__cna_queued_spin_lock_slowpath
      0.71 ± 39%      -0.7        0.05 ±  8%  perf-profile.self.cycles-pp.poll_idle
      1.08 ±  3%      -0.3        0.78        perf-profile.self.cycles-pp.update_load_avg
      1.24 ±  2%      -0.2        1.02 ±  2%  perf-profile.self.cycles-pp.__schedule
      1.86            -0.2        1.65        perf-profile.self.cycles-pp._raw_spin_lock_irqsave
      0.59 ±  3%      -0.2        0.40 ±  4%  perf-profile.self.cycles-pp.__update_load_avg_cfs_rq
      0.95 ±  3%      -0.2        0.75 ±  2%  perf-profile.self.cycles-pp.set_next_entity
      0.66 ±  4%      -0.2        0.51 ±  6%  perf-profile.self.cycles-pp.menu_select
      0.43 ±  5%      -0.1        0.28 ±  3%  perf-profile.self.cycles-pp.enqueue_task_fair
      0.53 ±  3%      -0.1        0.40 ±  2%  perf-profile.self.cycles-pp._raw_spin_lock
      0.67 ±  2%      -0.1        0.54 ±  2%  perf-profile.self.cycles-pp.stack_trace_save_tsk
      0.77 ±  2%      -0.1        0.64 ±  2%  perf-profile.self.cycles-pp.update_rq_clock
      0.72 ±  8%      -0.1        0.60        perf-profile.self.cycles-pp.update_curr
      0.44            -0.1        0.33        perf-profile.self.cycles-pp.__orc_find
      0.56 ±  2%      -0.1        0.45 ±  3%  perf-profile.self.cycles-pp.pipe_read
      0.33 ±  4%      -0.1        0.22        perf-profile.self.cycles-pp.prepare_to_wait_event
      0.48 ±  3%      -0.1        0.38 ±  3%  perf-profile.self.cycles-pp.__switch_to_asm
      0.32 ±  2%      -0.1        0.22 ±  7%  perf-profile.self.cycles-pp.ktime_get
      0.47            -0.1        0.38 ±  2%  perf-profile.self.cycles-pp.__switch_to
      0.35 ±  2%      -0.1        0.26 ±  5%  perf-profile.self.cycles-pp.select_task_rq_fair
      0.28 ±  5%      -0.1        0.20 ±  3%  perf-profile.self.cycles-pp.dequeue_entity
      0.23 ±  6%      -0.1        0.15 ±  3%  perf-profile.self.cycles-pp.stack_trace_consume_entry_nosched
      0.46 ±  3%      -0.1        0.39 ±  5%  perf-profile.self.cycles-pp.mutex_lock
      0.32 ±  4%      -0.1        0.25 ±  4%  perf-profile.self.cycles-pp.__update_load_avg_se
      0.39 ±  3%      -0.1        0.32 ±  6%  perf-profile.self.cycles-pp.copy_user_generic_unrolled
      0.45 ±  3%      -0.1        0.38        perf-profile.self.cycles-pp._raw_spin_unlock_irqrestore
      0.19 ±  6%      -0.1        0.12 ±  8%  perf-profile.self.cycles-pp.vfs_read
      0.34 ±  4%      -0.1        0.27 ±  2%  perf-profile.self.cycles-pp.pick_next_entity
      0.84 ±  2%      -0.1        0.77        perf-profile.self.cycles-pp.enqueue_entity
      0.28 ±  5%      -0.1        0.21 ±  5%  perf-profile.self.cycles-pp.syscall_return_via_sysret
      0.19 ±  4%      -0.1        0.12 ± 10%  perf-profile.self.cycles-pp.__might_sleep
      0.35 ±  3%      -0.1        0.29        perf-profile.self.cycles-pp.__wake_up_common
      0.19 ±  4%      -0.1        0.12 ±  6%  perf-profile.self.cycles-pp.___perf_sw_event
      0.47 ±  2%      -0.1        0.41 ±  2%  perf-profile.self.cycles-pp.do_idle
      0.27 ±  4%      -0.1        0.21 ±  3%  perf-profile.self.cycles-pp.__unwind_start
      0.22 ±  6%      -0.1        0.16 ±  2%  perf-profile.self.cycles-pp.finish_task_switch
      0.34 ±  3%      -0.1        0.29        perf-profile.self.cycles-pp.schedule
      0.35 ±  6%      -0.1        0.29 ±  2%  perf-profile.self.cycles-pp.update_cfs_group
      0.24 ±  6%      -0.1        0.19 ±  4%  perf-profile.self.cycles-pp.orc_find
      0.21 ±  5%      -0.1        0.16 ±  7%  perf-profile.self.cycles-pp.cpus_share_cache
      0.30 ±  7%      -0.1        0.25 ±  5%  perf-profile.self.cycles-pp.nr_iowait_cpu
      0.18 ±  4%      -0.1        0.13        perf-profile.self.cycles-pp.resched_curr
      0.29 ±  3%      -0.1        0.24 ±  2%  perf-profile.self.cycles-pp.mutex_unlock
      0.16 ±  9%      -0.1        0.11 ±  6%  perf-profile.self.cycles-pp.place_entity
      0.32 ±  5%      -0.0        0.27        perf-profile.self.cycles-pp.cpuidle_enter_state
      0.23 ±  3%      -0.0        0.18 ±  3%  perf-profile.self.cycles-pp._raw_spin_lock_irq
      0.22 ±  4%      -0.0        0.18 ±  4%  perf-profile.self.cycles-pp.common_file_perm
      0.12 ±  3%      -0.0        0.08 ± 11%  perf-profile.self.cycles-pp.in_sched_functions
      0.28 ±  3%      -0.0        0.24 ±  3%  perf-profile.self.cycles-pp.native_sched_clock
      0.20 ±  4%      -0.0        0.15 ±  2%  perf-profile.self.cycles-pp.__list_del_entry_valid
      0.12 ±  5%      -0.0        0.08 ±  6%  perf-profile.self.cycles-pp.new_sync_write
      0.25            -0.0        0.21        perf-profile.self.cycles-pp.___might_sleep
      0.20 ±  4%      -0.0        0.16 ±  5%  perf-profile.self.cycles-pp.vfs_write
      0.07 ±  7%      -0.0        0.03 ±100%  perf-profile.self.cycles-pp.main
      0.29 ±  2%      -0.0        0.25 ±  4%  perf-profile.self.cycles-pp.switch_mm_irqs_off
      0.21 ±  2%      -0.0        0.17 ±  4%  perf-profile.self.cycles-pp.read_tsc
      0.07 ±  5%      -0.0        0.04 ± 58%  perf-profile.self.cycles-pp.rcu_dynticks_eqs_exit
      0.12 ±  6%      -0.0        0.09 ±  7%  perf-profile.self.cycles-pp.new_sync_read
      0.21 ±  2%      -0.0        0.18 ±  6%  perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
      0.12 ±  6%      -0.0        0.10 ±  9%  perf-profile.self.cycles-pp.arch_stack_walk
      0.07 ±  6%      -0.0        0.04 ± 57%  perf-profile.self.cycles-pp.update_min_vruntime
      0.11 ±  4%      -0.0        0.08 ± 10%  perf-profile.self.cycles-pp.kernel_text_address
      0.23 ±  7%      -0.0        0.21 ±  5%  perf-profile.self.cycles-pp.__account_scheduler_latency
      0.14 ±  6%      -0.0        0.11 ± 11%  perf-profile.self.cycles-pp.__wrgsbase_inactive
      0.09 ±  9%      -0.0        0.06 ±  6%  perf-profile.self.cycles-pp.__entry_text_start
      0.08 ±  5%      -0.0        0.05 ±  8%  perf-profile.self.cycles-pp.copy_page_to_iter
      0.19 ±  6%      -0.0        0.17 ±  5%  perf-profile.self.cycles-pp.pipe_write
      0.15 ±  3%      -0.0        0.13 ±  5%  perf-profile.self.cycles-pp.__fget_light
      0.06 ±  6%      -0.0        0.04 ± 57%  perf-profile.self.cycles-pp.unwind_get_return_address
      0.14 ±  7%      -0.0        0.12 ±  7%  perf-profile.self.cycles-pp.switch_fpu_return
      0.09 ±  9%      -0.0        0.07 ±  7%  perf-profile.self.cycles-pp.tick_nohz_next_event
      0.08 ± 11%      -0.0        0.05 ±  8%  perf-profile.self.cycles-pp.__hrtimer_next_event_base
      0.16            -0.0        0.14 ±  6%  perf-profile.self.cycles-pp.pick_next_task_fair
      0.09 ±  9%      -0.0        0.07 ±  7%  perf-profile.self.cycles-pp.__rdgsbase_inactive
      0.06            -0.0        0.04 ± 57%  perf-profile.self.cycles-pp.copy_page_from_iter
      0.14 ±  6%      -0.0        0.11 ±  7%  perf-profile.self.cycles-pp.available_idle_cpu
      0.08 ± 16%      -0.0        0.06 ±  7%  perf-profile.self.cycles-pp.call_cpuidle
      0.10 ±  8%      -0.0        0.08 ±  5%  perf-profile.self.cycles-pp.syscall_exit_to_user_mode
      0.09 ±  5%      -0.0        0.07 ±  7%  perf-profile.self.cycles-pp.rcu_idle_exit
      0.19 ±  3%      -0.0        0.17 ±  4%  perf-profile.self.cycles-pp.dequeue_task_fair
      0.10 ±  4%      -0.0        0.08        perf-profile.self.cycles-pp.__calc_delta
      0.17 ±  2%      -0.0        0.16 ±  2%  perf-profile.self.cycles-pp.anon_pipe_buf_release
      0.17 ±  4%      -0.0        0.15 ±  2%  perf-profile.self.cycles-pp.copy_fpregs_to_fpstate
      0.06 ±  6%      -0.0        0.05        perf-profile.self.cycles-pp.copy_user_enhanced_fast_string
      0.06 ±  6%      -0.0        0.05        perf-profile.self.cycles-pp.put_task_stack
     36.91            +2.9       39.86        perf-profile.self.cycles-pp.intel_idle
     29.30 ±  2%      +3.9       33.15        perf-profile.self.cycles-pp.__cna_queued_spin_lock_slowpath


                                                                                
                       unixbench.time.voluntary_context_switches                
                                                                                
  4.4e+08 +-----------------------------------------------------------------+   
          |                                                    +..  +..+. ..|   
  4.3e+08 |-+                                                  :   +     +  |   
  4.2e+08 |-+                                                 :   +         |   
          |                                                   :             |   
  4.1e+08 |-+                                                 :             |   
    4e+08 |-+               +.            .+.. .+..+.+..+.   :              |   
          |               ..  +..+.+..+.+.    +           +..+              |   
  3.9e+08 |..+.+..+.+..+.+                                                  |   
  3.8e+08 |-+                                                               |   
          |                                                                 |   
  3.7e+08 |-+                      O    O  O    O  O                        |   
  3.6e+08 |-+                         O       O      O  O O  O O  O O  O O  |   
          |  O      O  O O  O O                                             |   
  3.5e+08 +-----------------------------------------------------------------+   
                                                                                
                                                                                                                                                                
                                  unixbench.score                               
                                                                                
  3800 +--------------------------------------------------------------------+   
       |                                                              .+.  .|   
  3700 |-+                                                     +.  .+.   +. |   
  3600 |-+                                                    :  +.         |   
       |                                                      :             |   
  3500 |-+                                                   :              |   
  3400 |-+                                    .+.+..+..+.    :              |   
       |                 .+.+..+..+.+..+..+.+.           +..+               |   
  3300 |..+.+..+..+.+..+.                                                   |   
  3200 |-+                                                                  |   
       |                                                                    |   
  3100 |-+                        O O  O  O O  O O  O  O    O  O O     O O  |   
  3000 |-+O         O  O    O                            O          O       |   
       |    O  O  O       O    O                                            |   
  2900 +--------------------------------------------------------------------+   
                                                                                
                                  unixbench.workload                            
                                                                                
    6e+08 +-----------------------------------------------------------------+   
          |                                                                 |   
  5.8e+08 |-+                                                  +..  +..+. ..|   
          |                                                    :   +     +  |   
  5.6e+08 |-+                                                 :   +         |   
          |                                                   :             |   
  5.4e+08 |-+                                                 :             |   
          |                .+.    .+..    .+..+.+..+.+..+.   :              |   
  5.2e+08 |.. .+..    .+.+.   +..+    +.+.                +..+              |   
          |  +    +.+.                                                      |   
    5e+08 |-+                                                               |   
          |                        O    O  O    O  O                        |   
  4.8e+08 |-+                         O       O      O  O O  O O  O O  O O  |   
          |  O O    O  O O  O O  O                                          |   
  4.6e+08 +-----------------------------------------------------------------+   
                                                                                
                                                                                
[*] bisect-good sample
[O] bisect-bad  sample



Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


Thanks,
Oliver Sang


[-- Attachment #2: 0006-locking-qspinlock-Introduce-the-shuffle-reduction-op.patch --]
[-- Type: text/x-patch, Size: 3133 bytes --]

From 7da41e2aab3b53579a762767314ac54a469eb52f Mon Sep 17 00:00:00 2001
From: Alex Kogan <alex.kogan@oracle.com>
Date: Wed, 25 Nov 2020 13:51:08 -0500
Subject: [PATCH v12 6/6] locking/qspinlock: Introduce the shuffle reduction
 optimization into CNA

This performance optimization chooses probabilistically to avoid moving
threads from the main queue into the secondary one when the secondary queue
is empty.

It is helpful when the lock is only lightly contended. In particular, it
makes CNA less eager to create a secondary queue, but does not introduce
any extra delays for threads waiting in that queue once it is created.

Signed-off-by: Alex Kogan <alex.kogan@oracle.com>
---
 kernel/locking/qspinlock_cna.h | 39 ++++++++++++++++++++++++++++++++++++++-
 1 file changed, 38 insertions(+), 1 deletion(-)

diff --git a/kernel/locking/qspinlock_cna.h b/kernel/locking/qspinlock_cna.h
index ac3109a..6213992 100644
--- a/kernel/locking/qspinlock_cna.h
+++ b/kernel/locking/qspinlock_cna.h
@@ -5,6 +5,7 @@
 
 #include <linux/topology.h>
 #include <linux/sched/rt.h>
+#include <linux/random.h>
 
 /*
  * Implement a NUMA-aware version of MCS (aka CNA, or compact NUMA-aware lock).
@@ -86,6 +87,34 @@ static inline bool intra_node_threshold_reached(struct cna_node *cn)
 	return current_time - threshold > 0;
 }
 
+/*
+ * Controls the probability of enabling the ordering of the main queue
+ * when the secondary queue is empty. The chosen value reduces the amount
+ * of unnecessary shuffling of threads between the two waiting queues
+ * when the contention is low, while responding fast enough and enabling
+ * the shuffling when the contention is high.
+ */
+#define SHUFFLE_REDUCTION_PROB_ARG  (7)
+
+/* Per-CPU pseudo-random number seed */
+static DEFINE_PER_CPU(u32, seed);
+
+/*
+ * Return false with probability 1 / 2^@num_bits.
+ * Intuitively, the larger @num_bits, the less likely false is to be returned.
+ * @num_bits must be a number between 0 and 31.
+ */
+static bool probably(unsigned int num_bits)
+{
+	u32 s;
+
+	s = this_cpu_read(seed);
+	s = next_pseudo_random32(s);
+	this_cpu_write(seed, s);
+
+	return s & ((1 << num_bits) - 1);
+}
+
 static void __init cna_init_nodes_per_cpu(unsigned int cpu)
 {
 	struct mcs_spinlock *base = per_cpu_ptr(&qnodes[0].mcs, cpu);
@@ -290,7 +319,15 @@ static __always_inline u32 cna_wait_head_or_lock(struct qspinlock *lock,
 {
 	struct cna_node *cn = (struct cna_node *)node;
 
-	if (!cn->start_time || !intra_node_threshold_reached(cn)) {
+	if (node->locked <= 1 && probably(SHUFFLE_REDUCTION_PROB_ARG)) {
+		/*
+		 * When the secondary queue is empty, skip the call to
+		 * cna_order_queue() with high probability. This optimization
+		 * reduces the overhead of unnecessary shuffling of threads
+		 * between waiting queues when the lock is only lightly contended.
+		 */
+		cn->partial_order = LOCAL_WAITER_FOUND;
+	} else if (!cn->start_time || !intra_node_threshold_reached(cn)) {
 		/*
 		 * We are at the head of the wait queue, no need to use
 		 * the fake NUMA node ID.
-- 
2.7.4

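As a standalone illustration of the throttle above (not part of the patch),
here is a minimal user-space sketch. next_random32() is an assumed stand-in
LCG for the kernel's next_pseudo_random32(), and the per-CPU seed is modeled
by a single global variable; the sketch only demonstrates that probably(7)
evaluates to false about once every 2^7 = 128 calls, i.e. the rate at which
SRO still lets the queue be reordered when the secondary queue is empty.

/* Sketch only: approximates the SRO gate, not kernel code. */
#include <stdio.h>
#include <stdint.h>

static uint32_t seed = 1;		/* stands in for the per-CPU seed */

/* Assumed LCG; the kernel helper may use different constants. */
static uint32_t next_random32(uint32_t s)
{
	return s * 1664525u + 1013904223u;
}

/* Return 0 (false) with probability 1 / 2^num_bits, as in the patch. */
static int probably(unsigned int num_bits)
{
	seed = next_random32(seed);
	return seed & ((1u << num_bits) - 1);
}

int main(void)
{
	unsigned long i, falses = 0, trials = 1000000;

	for (i = 0; i < trials; i++)
		if (!probably(7))	/* SHUFFLE_REDUCTION_PROB_ARG */
			falses++;

	/* Expect roughly trials / 128, i.e. about 7800 false results. */
	printf("probably(7) was false %lu times out of %lu calls\n",
	       falses, trials);
	return 0;
}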

^ permalink raw reply related	[flat|nested] 2+ messages in thread
* [PATCH v12 5/5] locking/qspinlock: Avoid moving certain threads between waiting queues in CNA
@ 2020-11-17 23:13 Alex Kogan
  2020-11-22  9:47 ` [locking/qspinlock] 6f9a39a437: unixbench.score -17.3% regression kernel test robot
  0 siblings, 1 reply; 2+ messages in thread
From: Alex Kogan @ 2020-11-17 23:13 UTC (permalink / raw)
  To: linux, peterz, mingo, will.deacon, arnd, longman, linux-arch,
	linux-arm-kernel, linux-kernel, tglx, bp, hpa, x86, guohanjun,
	jglauber
  Cc: steven.sistare, daniel.m.jordan, alex.kogan, dave.dice

Prohibit moving certain threads (e.g., in irq and nmi contexts)
to the secondary queue. Those prioritized threads will always stay
in the primary queue, and so will have a shorter wait time for the lock.

Signed-off-by: Alex Kogan <alex.kogan@oracle.com>
Reviewed-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Waiman Long <longman@redhat.com>
---
 kernel/locking/qspinlock_cna.h | 26 ++++++++++++++++++++------
 1 file changed, 20 insertions(+), 6 deletions(-)

diff --git a/kernel/locking/qspinlock_cna.h b/kernel/locking/qspinlock_cna.h
index d3e2754..ac3109a 100644
--- a/kernel/locking/qspinlock_cna.h
+++ b/kernel/locking/qspinlock_cna.h
@@ -4,6 +4,7 @@
 #endif
 
 #include <linux/topology.h>
+#include <linux/sched/rt.h>
 
 /*
  * Implement a NUMA-aware version of MCS (aka CNA, or compact NUMA-aware lock).
@@ -35,7 +36,8 @@
  * running on the same NUMA node. If it is not, that waiter is detached from the
  * main queue and moved into the tail of the secondary queue. This way, we
  * gradually filter the primary queue, leaving only waiters running on the same
- * preferred NUMA node.
+ * preferred NUMA node. Note that certain prioritized waiters (e.g., in
+ * irq and nmi contexts) are excluded from being moved to the secondary queue.
  *
  * We change the NUMA node preference after a waiter at the head of the
  * secondary queue spins for a certain amount of time (10ms, by default).
@@ -49,6 +51,8 @@
  *          Dave Dice <dave.dice@oracle.com>
  */
 
+#define CNA_PRIORITY_NODE      0xffff
+
 struct cna_node {
 	struct mcs_spinlock	mcs;
 	u16			numa_node;
@@ -121,9 +125,10 @@ static int __init cna_init_nodes(void)
 
 static __always_inline void cna_init_node(struct mcs_spinlock *node)
 {
+	bool priority = !in_task() || irqs_disabled() || rt_task(current);
 	struct cna_node *cn = (struct cna_node *)node;
 
-	cn->numa_node = cn->real_numa_node;
+	cn->numa_node = priority ? CNA_PRIORITY_NODE : cn->real_numa_node;
 	cn->start_time = 0;
 }
 
@@ -262,11 +267,13 @@ static u32 cna_order_queue(struct mcs_spinlock *node)
 	next_numa_node = ((struct cna_node *)next)->numa_node;
 
 	if (next_numa_node != numa_node) {
-		struct mcs_spinlock *nnext = READ_ONCE(next->next);
+		if (next_numa_node != CNA_PRIORITY_NODE) {
+			struct mcs_spinlock *nnext = READ_ONCE(next->next);
 
-		if (nnext) {
-			cna_splice_next(node, next, nnext);
-			next = nnext;
+			if (nnext) {
+				cna_splice_next(node, next, nnext);
+				next = nnext;
+			}
 		}
 		/*
 		 * Inherit NUMA node id of primary queue, to maintain the
@@ -285,6 +292,13 @@ static __always_inline u32 cna_wait_head_or_lock(struct qspinlock *lock,
 
 	if (!cn->start_time || !intra_node_threshold_reached(cn)) {
 		/*
+		 * We are at the head of the wait queue, no need to use
+		 * the fake NUMA node ID.
+		 */
+		if (cn->numa_node == CNA_PRIORITY_NODE)
+			cn->numa_node = cn->real_numa_node;
+
+		/*
 		 * Try and put the time otherwise spent spin waiting on
 		 * _Q_LOCKED_PENDING_MASK to use by sorting our lists.
 		 */
-- 
2.7.4

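As a standalone illustration (not part of the patch), the sketch below models
the filtering rule: waiters whose NUMA node differs from the preferred one
are moved to a secondary queue, while waiters tagged with the priority node
id are left in place. It does a one-shot partition, whereas the lock splices
waiters incrementally as they are scanned from the head of the queue; struct
waiter and PRIORITY_NODE are illustrative names, not kernel API.

#include <stdio.h>

#define PRIORITY_NODE	0xffff	/* models CNA_PRIORITY_NODE */

struct waiter {
	int		numa_node;
	struct waiter	*next;
};

int main(void)
{
	struct waiter q[] = {
		{ .numa_node = 0 }, { .numa_node = 1 },
		{ .numa_node = PRIORITY_NODE },
		{ .numa_node = 1 }, { .numa_node = 0 },
	};
	struct waiter *primary = NULL, **ptail = &primary;
	struct waiter *secondary = NULL, **stail = &secondary;
	const int preferred = 0;	/* NUMA node of the current lock holder */
	unsigned int i;
	struct waiter *w;

	/*
	 * One-shot partition: remote, non-priority waiters go to the
	 * secondary queue; local and priority waiters stay on the primary.
	 */
	for (i = 0; i < sizeof(q) / sizeof(q[0]); i++) {
		w = &q[i];
		w->next = NULL;
		if (w->numa_node == preferred ||
		    w->numa_node == PRIORITY_NODE) {
			*ptail = w;
			ptail = &w->next;
		} else {
			*stail = w;
			stail = &w->next;
		}
	}

	printf("primary:");
	for (w = primary; w; w = w->next)
		printf(" %#x", w->numa_node);
	printf("\nsecondary:");
	for (w = secondary; w; w = w->next)
		printf(" %#x", w->numa_node);
	printf("\n");
	return 0;
}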

^ permalink raw reply related	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2020-11-25 18:57 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-25 18:56 [locking/qspinlock] 6f9a39a437: unixbench.score -17.3% regression Alex Kogan
  -- strict thread matches above, loose matches on Subject: below --
2020-11-17 23:13 [PATCH v12 5/5] locking/qspinlock: Avoid moving certain threads between waiting queues in CNA Alex Kogan
2020-11-22  9:47 ` [locking/qspinlock] 6f9a39a437: unixbench.score -17.3% regression kernel test robot
